Unraveling the Genetics of Congenital Diaphragmatic Hernia: An Ongoing Challenge

Congenital diaphragmatic hernia (CDH) is a congenital structural anomaly in which the diaphragm has not developed properly. It may occur either as an isolated anomaly or with additional anomalies. It is thought to be a multifactorial disease in which genetic factors could either substantially contribute to or directly result in the developmental defect. Patients with aneuploidies, pathogenic variants or de novo Copy Number Variations (CNVs) impacting specific genes and loci typically develop CDH in the form of a monogenetic syndrome. These patients often have other associated anatomical malformations. In patients without a known monogenetic syndrome, an increased genetic burden of de novo coding variants contributes to disease development. In early years, genetic evaluation was based on karyotyping and SNP-array. Today, genomes are commonly analyzed with next generation sequencing (NGS) based approaches. While more potential pathogenic variants are being detected, analysis of the data presents a bottleneck, largely due to the lack of full appreciation of the functional consequence and/or relevance of the detected variant. The exact heritability of CDH is still unknown. Damaging de novo alterations are associated with the more severe and complex phenotypes and worse clinical outcome. Phenotypic, genetic, and likely mechanistic variability hampers individual patient diagnosis, short and long-term morbidity prediction and subsequent care strategies. Detailed phenotyping, clinical follow-up at regular intervals and detailed registries are needed to find associations between long-term morbidity, genetic alterations, and clinical parameters. Since CDH is a relatively rare disorder with only a few recurrent changes, large cohorts of patients are needed to identify genetic associations. Retrospective whole genome sequencing of historical patient cohorts will yield valuable data from which today's patients and parents will profit. Trio whole genome sequencing has excellent potential for future re-analysis and data-sharing, increasing the chance to provide a genetic diagnosis and predict clinical prognosis. In this review, we explore the pitfalls and challenges in the analysis and interpretation of genetic information, present what is currently known and what still needs further study, and propose strategies to reap the benefits of genetic screening.

INTRODUCTION
Congenital diaphragmatic hernia (CDH) [OMIM: 142340] has an estimated incidence of 1 in 1,750-5,880 live births (1)(2)(3) and is characterized by a defect of the diaphragm. This defect allows herniation of the abdominal organs into the thorax. CDH can be detected prenatally during first or second trimester ultrasounds in 50-68% of CDH pregnancies (4)(5)(6)(7). Patients are often referred to a center of expertise with a specialized multidisciplinary team for prenatal assessment, prognostic and genetic counseling and care. CDH prevalence has slightly increased in the past years (3). Still, the mortality rates have decreased, probably due to better treatment strategies (8), although this decline is more pronounced in wealthier countries than in developing countries (9).
Most of what we know of human diaphragm development is based on descriptive and functional analyses of animal models. The diaphragm muscle develops initially from transient structures located at the top of the liver: the septum transversum, the pleuroperitoneal folds, the posthepatic mesenchymal plate, and the somites. Myoblast progenitors and other mesenchymal cells (10) in the developing pleuroperitoneal folds expand and migrate to the posthepatic mesenchymal plate. Conversely, cells from the posthepatic mesenchymal plate migrate toward the pleuroperitoneal folds. Finally, the pleuroperitoneal folds fuse with the posthepatic mesenchymal plate between embryonic day (E) 12.5 and E13.5 (10,11). When complete, this membrane separates the thoracic and abdominal cavity (E14.5). In CDH, this process is disrupted and the diaphragm will not fully close (12,13). A more detailed description of diaphragm and CDH development can be found elsewhere in this issue (14).
Patients with aneuploidies, pathogenic single nucleotide variants, or de novo Copy Number Variations (CNVs) (15)(16)(17)(18) develop CDH, often in the form of a monogenetic syndrome and in combination with other anatomical malformations (2,19). Here, we discuss what is currently known and inventory what is needed to provide optimal genetic counseling for individual patients and their parents. We evaluate the genetic outcome of a CDH cohort at the Erasmus MC-Sophia Children's Hospital, Rotterdam, the Netherlands, and propose strategies to reap the benefits of genetic screening.

CDH HAS SUBTYPES BASED ON DEFECT SIZE, TYPE AND ANATOMICAL LOCATION
CDH is the most severe of the diaphragm defects; other, less frequent defects include incomplete muscularization of the diaphragm (diaphragmatic eventration) and the presence of just a thin layer of non-muscular tissue (sac hernia). Subtypes are identified by the size and anatomical location of the herniation. Most prevalent are Bochdalek hernias, which are mostly left-sided (20). Prenatal predictors for survival include associated malformations (21), defect size (7), lung volume (22), liver herniation (23), stomach position (24,25), and lung-to-head ratio (26,27). Other predictors include birth weight, Apgar score, respiratory parameters, cardiac anomalies, chromosomal changes, and pulmonary hypertension (28)(29)(30).

THE RELATION OF DEFECT SIZE AND GENETIC ALTERATIONS
Larger diaphragm defects are associated with a higher mortality rate, a higher prevalence of associated anatomical malformations, and a larger number of associated anatomical malformations (21). We hypothesized that large continuous locus or gene changes (e.g., 15q26 loss, 17q12 loss; see Table 1) can modify multiple genes involved in diaphragm formation, and impact the development of the embryo in general. In contrast, small deletions or Single Nucleotide Variants (SNVs), as seen for instance in FBN1, TGFB3, and SLC2A10 (see Table 2), would be associated with smaller defects. Therefore, we evaluated whether the size of the defect was associated with the finding of "a pathogenic genomic variant" and/or "a genetic syndrome." We compared the genetic test results and the defect size classification (n = 336). Statistical analysis did not indicate an association of the defect size with a different, uncommon genetic test result. What we did observe was an association in the category of patients with little or no follow-up (P < 0.001). This category contains patients who lack a registered defect size or a registered genetic test. It includes patients who have not been subjected to an intervention due to intrauterine fetal demise or termination of pregnancy. In the Netherlands, pregnancies in which severe genetic anomalies (e.g., Edwards syndrome, Patau syndrome) or structural malformations that are incompatible with life are observed are often terminated. The CDH defect size is not determined in those cases (see Table 1). Therefore, a complete genetic and phenotypic evaluation and subsequent association analysis in this particular group is difficult and often not performed.

ISOLATED CDH AND COMPLEX CDH
CDH may present as an isolated anomaly (isolated CDH) or patients can have one or more additional anomalies (complex CDH) (1,31). Anomalies can be found at all body sites: cardiac anomalies, anomalies of the urogenital system, limb malformations, nervous system anomalies, orofacial clefts, and gastrointestinal anomalies including intestinal atresia (3, 32). Zaiss et al. described syndromic clinical features such as hypertelorism not assigned to a specific syndrome in 7.7% of studied patients (32). Pathogenic genetic alterations, both in complex and in isolated CDH, are associated with a worse prognosis (33). Moreover, de novo pathogenic alterations are seen more often in complex CDH (34)(35)(36). Phenotypically complex patients could be more likely to receive a genetic test. In our cohort, genetic test results were described for patients with associated anomalies (n = 207) and for patients without associated anomalies (n = 311). Thus, there was no a priori bias in this respect (p = 0.923). Twenty patients with associated anomalies had pathogenic genetic alterations vs. one with isolated CDH (P < 0.001). Main outcome parameters of the Erasmus MC-Sophia Children's Hospital, Rotterdam, the Netherlands CDH cohort are depicted in Tables 3 and 4. Full cohort descriptions and analysis methods are described in Supplementary Tables S1, S2.
Comparing features of isolated CDH and complex CDH is difficult, and depends on how accurately these two groups can be distinguished. Not all patients receive the same phenotypical evaluation and registration is sometimes incomplete. For instance, not all associated anatomical malformations are detectable with ultrasound. Nevertheless, increased resolution of prenatal ultrasound over time has improved the detection of associated anatomical malformations. Neurological symptoms could develop at a later age and may not be noticeable during the first months or years of development. Furthermore, not all symptoms are observed during the often organ-specific evaluations by medical subspecialties. For instance, postnatal monitoring is essential to detect any associated neurological or ophthalmological symptoms. CDH registries would benefit from regular reevaluation of these outcome measures. In short, there is a level of uncertainty in registries regarding which patients have no associated anomalies, have no associated anomalies detected, or have no associated anomalies registered.
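As a hedged illustration of the comparison of pathogenic alterations between complex and isolated CDH, the following minimal Python sketch applies a two-sided Fisher's exact test to the reported counts. It assumes that the 207 and 311 patients with described test results are the denominators for the 20 and 1 patients with pathogenic alterations. The choice of Fisher's exact test is also an assumption, since the analysis actually used is described in the Supplementary Tables. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Assumed layout: rows = complex CDH, isolated CDH;
# columns = pathogenic alteration found, not found.
complex_with, complex_total = 20, 207
isolated_with, isolated_total = 1, 311

p_value = fisher_exact_two_sided(
    complex_with,
    complex_total - complex_with,
    isolated_with,
    isolated_total - isolated_with,
)
print(f"two-sided Fisher exact p = {p_value:.2e}")
```

Under these assumptions the resulting p-value is far below 0.001, in line with the reported significance level. An exact test is used here because one cell of the table contains a single patient.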

GENETIC ASSOCIATIONS AND CO-MORBIDITY
Long-term complications in children born with CDH include chronic lung disease, feeding difficulties, gastroesophageal reflux, growth failure, scoliosis, chest asymmetry, neurodevelopmental delay, and sensorineural hearing loss (37,38). These co-morbidities can be either a direct or indirect consequence of the CDH or be a consequence of the treatment. Damaging de novo variations in both isolated and complex CDH have been found to be associated with pulmonary hypertension, higher mortality rate, and worse neurodevelopmental outcome (33). There is a large difference in survival rates between patients with or without persistent pulmonary hypertension (39) and bronchopulmonary sequestration (40). The genetic contribution to bronchopulmonary sequestration etiology is unknown. Mutations in BMPR2 (41, 42) and several SMAD signaling molecule genes have been associated with the development of pulmonary hypertension in adults and children (43)(44)(45). A striking association between TGF-β/SMAD signaling and pulmonary hypertension has been reported in CDH, as the CDH lungs had increased miR-200b expression and decreased TGF-β/SMAD signaling (46). Increasing miR-200b decreases TGF-β signaling and reduces lung hypoplasia in a nitrofen-induced CDH and pulmonary hypertension rat model (46). Similarly, Pereira-Terra and colleagues described a specific micro-RNA signature in tracheal aspirate fluid, with upregulation of miR-200b and miR-10a and decreased TGF-β signaling (47). Patients with mutations in genes from this pathway have connective tissue disorders (48). In patients and mice, several genetic factors have been associated with lung and cardiac abnormalities (2,49)(50)(51)(52). CDH has been found in patients with connective tissue disorders such as Marfan syndrome (53), Loeys-Dietz Syndrome (54, 55) and arterial tortuosity syndrome (56). Patients with these connective tissue disorders are at increased risk of cardiovascular problems (57,58) later in life. Abnormal retinoic acid signaling can result in a diaphragm defect (59). Patients with variants in the receptors STRA6 and RARB and deletions of RBP1 at chromosome 3q22 (60,61) in the retinoic acid signaling pathway have ophthalmic symptoms (62,63). Patients with CDH may have other eye defects as well (64,65). These occurrences of direct genotype-phenotype correlations stress the importance of genetic diagnostic screening to inform parents and patients about possible co-morbidities.

CDH IS A COMPLEX GENETIC DISORDER
CDH is a multifactorial disease, but neither environmental nor genetic contributions have been fully characterized. Maternal morbidities during pregnancy such as pre-gestational hypertension (66) and pre-existent maternal obesity (67)(68)(69) are associated with an increased risk for development of CDH (77), alcohol intake (69,77)(78)(79), and smoking (75,78,80). However, to what extent these associations impact diaphragm development and the onset of CDH is not known. The mother's nutrient intake during pregnancy is associated as well (81,82); reduced vitamin A intake during pregnancy has the strongest associations with CDH (83,84). Vitamin A shortage can be detected postnatally (85). It is hard to determine whether environmental factors explain some of the non-genetic contributions on a population level or to what extent the environment interacts with the processes disturbed by genetic anomalies. Epigenetic differences acquired during the life span can be detected between monozygotic twin pairs (86)(87)(88). Evaluating these differences, and the resulting gene expression changes, is an interesting approach. There are methods to overcome cellular heterogeneity, and if epigenetic changes are present in blood these can be compared between patient and sibling (89)(90)(91).
The exact heritability, i.e., the contribution of genetic factors, is difficult to determine in light of the relatively low disease incidence, the high mortality limiting vertical transmission, and the limited numbers of twin pregnancies (92,93). Heritability can be estimated using twin studies. For CDH, the concordance rates in dizygotic and monozygotic twins are comparable. Fifty-three monozygotic twins have been described, of whom 12 were concordant for CDH (2,92). In our cohort, 24 twin pairs (15 dizygotic, 8 monozygotic, and one same sex twin pair of whom no genetic material was available to determine zygosity) are described. One dizygotic and one monozygotic twin pair were concordant for CDH. To reduce the effect of technical noise in twin comparisons, we used different alignment techniques, variant callers and statistics (see Supplementary Table S3). Neither the larger CNVs (94) nor SNVs (see Supplementary Table S3) differed between these twin siblings. Differences in phenotype can also be the result of twin-to-twin perfusion differences. Furthermore, single nucleotide changes could be located outside the coding sequence or be present at very low frequency, in which case they could not be detected with exome sequencing.
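The twin counts reported for our cohort can be turned into concordance rates with simple arithmetic. The sketch below is a minimal Python illustration and not the analysis performed for this review. It assumes that every twin pair was ascertained through at least one affected twin, so that pairs not concordant for CDH are discordant. The function names are hypothetical, and with one concordant pair per group the estimates are very imprecise.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of affected twin pairs in which both twins are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Probability that the co-twin of an affected twin is also affected."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# (concordant pairs, discordant pairs) derived from the counts in the text:
# 15 dizygotic pairs with 1 concordant, 8 monozygotic pairs with 1 concordant.
cohort = {
    "dizygotic": (1, 14),
    "monozygotic": (1, 7),
}

for zygosity, (concordant, discordant) in cohort.items():
    pairwise = pairwise_concordance(concordant, discordant)
    probandwise = probandwise_concordance(concordant, discordant)
    print(f"{zygosity}: pairwise {pairwise:.3f}, probandwise {probandwise:.3f}")
```

Pairwise concordance treats the pair as the unit of observation, whereas probandwise concordance treats the affected individual as the unit and is the quantity usually compared with population risk.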
Depending on the specific monogenetic disorder, CDH is either a common or a less prevalent feature. More than 100 (candidate) genes have been described, mostly identified from animal models or monogenetic syndromes (2,19). Monogenetic syndromes often have distinct phenotypical features and have been reviewed by Longoni et al. and Yu et al. (20,114). Monogenetic syndromes in which CDH is a frequent feature are, for instance, autosomal recessive Donnai-Barrow syndrome (OMIM: #222448, LRP2 gene), syndromic microphthalmia (#601186, #615524, STRA6, RARB), and autosomal dominant cardiac-urogenital syndrome (#618280, MYRF gene). Associated phenotypes in these syndromes are congenital heart defects, sensorineural hearing loss, microphthalmia, genitourinary malformations, craniosynostosis and myopia, with each of these syndromes having its distinct features. Detailed phenotyping might be crucial in diagnosing clusters of CDH patients: either "phenotype first," searching for an overlapping gene, or "genotype first," searching whether patients with the same affected gene have an overlapping phenotype. Interestingly, Fryns syndrome and Pentalogy of Cantrell both have CDH as a defining feature, yet the gene or genes responsible for these conditions are not yet known.
Interpretation of genetic results can be hindered by reduced penetrance (18,122) and variable expressivity (2), which may mask the causal culprit in segregation analysis (see Figure 1). Polygenic inheritance (51), locus heterogeneity (33,34,130), and contributions of different kinds of genetic variation (17,114) make it difficult to distinguish culprits from innocent bystanders. Therefore, large patient and control sample sizes are required to have enough power to classify variants as "benign," "causal," or "contributing."

FROM PATHOGENIC ALTERATION TO CDH
Finding a genetic variant predicted to be deleterious is only the first step in proving the functional effect of this DNA alteration. This is especially true for missense changes, in-frame insertion-deletions and copy number variations. Often there is only in silico evidence regarding the impact of a variant on gene function and the way in which the disturbed gene function affects a biological pathway or mechanism. What is lacking is proof of how a specific deleterious variant leads to defective diaphragm formation. Unfortunately, for most likely pathogenic CNVs and SNVs, the assumed functional consequence is based on the genetic alteration itself: i.e., a copy number loss or nonsense variant is assumed to result in reduced amounts of mRNA expression and protein. Deleterious de novo missense variants and in-frame insertion-deletions in conserved coding regions are more difficult to relate to a likely functional consequence, and this relation is often based on in silico surveys. Improving the in-vitro evaluation of candidate variants is crucial in distinguishing causal variants from noncausal variants. These experiments require tremendous effort and can be complicated by the presence of more than one candidate alteration.
Detecting a deleterious variant in a gene in multiple patients helps to prioritize candidate genes for functional evaluation and studies using animal models. In a large cohort (n = 827), seven syndromic and four recurrent CNVs were identified (104). Some of these have already been associated with CDH; e.g., 17q12 deletions, 16p13.1 duplications, 22q11 deletions, and 21q22 duplications. Furthermore, 87 CNVs were de novo, of which 54 were large (>2 Mb) deletions (104). Although non-recurrent, at least a proportion of these large de novo deletions are likely to be related to the patient's phenotype. Ten genes were enriched for de novo variants, of which mitochondrial lon peptidase 1 (LONP1) and Aly/REF export factor (ALYREF) were the most promising candidate disease genes. LONP1, MYRF as well as ZFPM2 reached or approached genome wide significance when a variant burden test was performed for all deleterious changes (i.e., including inherited variants) (104). Combining multiple "omics" and in-vitro translational approaches can potentially bridge the gap between genetic findings and animal models.
In animal models, fewer progenitors reaching the pleuroperitoneal folds (PPF) at the proper developmental stage due to decreased proliferation, increased apoptosis, migration defects or failure to differentiate into their proper cell fates have been proposed as causes for CDH (131)(132)(133)(134). Disturbances in specific processes such as retinoic acid signaling or muscle connective tissue formation were initially discovered in animal experiments; genes associated with these pathways or processes were subsequently found altered in patients (132,135)(136)(137)(138). Additionally, disturbed processes can be identified using gene enrichment strategies to find common denominators in the affected genes and loci. Longoni and colleagues described the enrichment of rare, likely deleterious variants in CDH patients in genes derived from mouse PPF embryonic transcriptomes (139), known human disease genes, their protein interaction partners and candidate genes from CNV hotspots (35). Often, these alterations were inherited and implicate non-Mendelian inheritance patterns. On the individual level, these changes can be regarded as risk factors. Combined, these changes may affect a biological pathway to such an extent that they result in CDH. Assigning such a pathway or process, for instance how these gene variants disturb myoblast progenitor cell proliferation or migration, is not easy. Animal models are not perfect, although they provide evidence of involvement of a gene when it is knocked out and the animals develop CDH at a certain frequency. However, this procedure hardly ever takes into account that genetic variation mostly does not amount to a complete loss of function of a gene. Missense variants, copy number gains and heterozygous changes could, and likely do, differ in impact or mechanism of action. Thus, in these cases, knock-out models either over- or underestimate the effect of a genetic variant.
In some cases, specific variants can be associated with the causative mechanism; e.g., the association of FBN1 variants in Marfan syndrome (53) and defects in the connective tissue. Indeed, our cohort included patients with FBN1 and TGFB3 alterations. In other patients, the affected pathway is known; e.g., patients with deletions of NR2F2 (123) have a defect in a gene that codes for a receptor that is activated by retinoic acid signaling (140). Of other genes, we know that they interact with other disease genes, are expressed in the developing diaphragm and are also associated with retinoic acid signaling (e.g., ZFPM2, GATA4). Small differences in spatial and temporal binding and organ-specific combinations of transcription factors have been suggested as links between the different syndromes with CDH (141). Most of the deleterious CNVs and aneuploidies are assumed to be pathogenic and the most likely cause of the diaphragm defect. However, how these, often continuous, gene deletions in patients impact diaphragm formation and subsequently result in CDH remains unclear.
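One common way to express enrichment of de novo variants in a gene is to compare the observed count with the count expected from a per-gene mutation rate under a Poisson model. The Python sketch below illustrates that general idea only; it is not the method of the cited study (104). The cohort size of 827 is taken from the text, whereas the mutation rate, the observed count, and the Bonferroni threshold for roughly 20,000 genes are hypothetical values chosen for illustration, as are the function names.

```python
import math


def expected_de_novo(n_trios, mutation_rate_per_copy):
    """Expected de novo count in one gene: two gene copies per child."""
    return 2 * n_trios * mutation_rate_per_copy


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected) * expected**observed / math.factorial(observed)
    total = 0.0
    k = observed
    while True:
        total += term
        k += 1
        term *= expected / k
        # Stop once further terms no longer change the sum.
        if term <= total * 1e-16:
            break
    return min(total, 1.0)


n_trios = 827                # cohort size mentioned in the text
mutation_rate = 2e-5         # hypothetical damaging de novo rate per gene copy
observed_count = 4           # hypothetical number of damaging de novo variants
genome_wide_alpha = 0.05 / 20000  # assumed Bonferroni threshold

expected_count = expected_de_novo(n_trios, mutation_rate)
p_value = poisson_upper_tail(observed_count, expected_count)
print(f"expected {expected_count:.4f}, observed {observed_count}, p = {p_value:.2e}")
print("genome-wide significant:", p_value < genome_wide_alpha)
```

The upper tail is summed directly rather than computed as one minus the cumulative probability, which keeps very small p-values from being lost to floating-point rounding.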

TEMPORAL SCREENING BIAS
Technologies differ in the resolution at which they detect genomic changes, ranging from chromosome arms and several megabases down to the single nucleotide level. Initially, patients were evaluated with karyotyping, MLPA and QF-PCR, with which only aneuploidies or chromosome (band) level changes could be detected. At the Erasmus MC-Sophia Children's Hospital, SNP-array was introduced in 2010 and has been standard practice in case of ultrasound abnormalities since 2012. The use of SNP arrays increased the detection resolution to gains and losses ranging from several megabases down to kilobases. Many patients in our cohort have retrospectively been re-evaluated with SNP-array. In 10.9% of patients a pathogenic change was detected. Similarly, 10.4% of patients registered in the EUROCAT registry (1980-2009) have a chromosomal anomaly, genetic syndrome or microdeletion (3). This was before the NGS era, and the findings mostly represent the larger genetic changes with a large phenotypic effect. Whole exome sequencing was introduced in our clinic more recently (2015), and was initially only used to evaluate the more complex patients. Correcting the temporal screening bias by screening large historical cohorts of patients and subsequently evaluating potential associations between genetic factors and long-term morbidity can benefit both future and today's patients and parents.

COLLABORATION IS KEY
Combining disease cohorts revealed that damaging de novo alterations are associated with the more severe and complex phenotypes (33,130). This strategy was pivotal in identifying disease genes (98,104,130,142). The success of this effort stresses the importance of collaborations such as the DHREAMS consortium (http://www.cdhgenetics.com). Trio whole genome-based approaches are recommended, as these enable the simultaneous detection of different types of genetic variation. Additionally, this technique is suited for continuous re-analysis. By combining and sequencing their cohorts, the CDH-EURO consortium (143) and Congenital Diaphragmatic Hernia Study Group (144) can add to the endeavors of the DHREAMS consortium. This will make it possible to identify genes that are affected in patients more often than expected by chance alone, and will keep the number of required functional tests and animal models manageable. For collaborations to work, samples need to be stored in well-managed biobanks and data should be meticulously archived for later re-analysis or re-evaluation. Privacy regulations pose new challenges for these biobanks and for data archiving and sharing (145). Sharing of patient material and data should consider the privacy of participants and their families but also acknowledge the efforts of stakeholders such as researchers and clinicians (146). An ethical and legal balance should be sought, weighing the privacy needs of individual patients against the medical benefits for the patient population.

CONCLUSIONS
Diagnostic yields of up to 37% using next generation sequencing have been proposed. These yields are reached when, in addition to genes from known monogenetic syndromes, heterozygous de novo variants in genes expressed at the proper time-point in relevant tissue in animal models are classified as likely pathogenic (105). Importantly, heritability and diagnostic yield are calculated on a population level. From a patient's or parents' perspective it matters most to know (1) whether they themselves or their children have genetic changes in their genome that explain the CDH, (2) whether subsequent children or patients' offspring are at risk of CDH, and (3) what the consequences of these changes are for the prognosis and/or the probability of complications. CDH is now mostly detected prenatally; consequently, fast, accurate, and predictive genetic diagnostics are increasingly needed. About a third of patients have a de novo variant in the coding region (104). For parents to make informed choices, it is vital to know whether a genetic variant detected in their child is causal or benign, and what the predicted consequences of this variant are.