Rare or Overlooked? Structural Disruption of Regulatory Domains in Human Neurocristopathies

In the last few years, the role of non-coding regulatory elements and their involvement in human disease have received great attention. Among the non-coding regulatory sequences, enhancers are particularly important for the proper establishment of cell type–specific gene-expression programs. Furthermore, the disruption of enhancers can lead to human disease through two main mechanisms: (i) mutations or copy number variants can directly alter enhancer sequences and thereby affect the expression of their target genes; (ii) structural variants can provoke changes in 3-D chromatin organization that alter neither the enhancers nor their target genes, but rather the physical communication between them. In this review, these pathomechanisms are mostly discussed in the context of neurocristopathies, congenital disorders caused by defects that occur during neural crest development. We highlight why, due to its contribution to multiple tissues and organs, the neural crest represents an important, yet understudied, cell type involved in multiple congenital disorders. Moreover, we discuss currently available resources and experimental models for the study of human neurocristopathies. Last, we provide some practical guidelines that can be followed when investigating human neurocristopathies caused by structural variants. Importantly, these guidelines can be useful to uncover the etiology not only of human neurocristopathies, but also of other human congenital disorders in which enhancer disruption is involved.

and long non-coding RNAs) that are encoded by genes acting in trans (on different chromosomes) (Savarese and Grosschedl, 2006). The most relevant types of non-coding cis-regulatory sequences include promoters, enhancers, silencers, and insulators (Ong and Corces, 2011; Wittkopp and Kalay, 2012). Promoters are bound by a core set of widely used and highly conserved transcriptional regulators (e.g., RNA polymerase II and the general transcription factors, or GTFs) that confer basal transcriptional activity and enable transcription initiation (Brown and Feder, 2005). In contrast, enhancers positively control the expression of their target genes in time and space (Wray, 2007) and are major determinants of cell type-specific gene-expression programs (Groudine, 2010, 2011; Buecker and Wysocka, 2012). Likewise, silencers and insulators also contribute to the establishment of specific gene-expression programs by repressing genes or blocking enhancers, respectively (Gaszner and Felsenfeld, 2006; Doni Jayavelu et al., 2020; Ngan et al., 2020; Pang and Snyder, 2020). The importance of non-coding regulatory sequences is well illustrated by the fact that up to 90% of disease-associated variants reside in non-coding sequences, preferentially within putative enhancers (Maurano et al., 2012; Krijger and De Laat, 2016).
Despite the major regulatory functions of enhancers, their identification was historically difficult because they lack strong defining sequence features (Elgar and Vavouri, 2008). However, epigenomic profiling and chromatin signatures have proven to be powerful and universal tools to identify enhancers (Heintzman et al., 2009; …). In particular, active enhancers are characterized by the binding of common coactivators (e.g., p300), an open chromatin conformation, the expression of short bidirectional RNAs (eRNAs), and flanking nucleosomes marked with H3K4me1 and H3K27ac (Heintzman et al., 2009; Lam et al., 2014).
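In practice, these chromatin signatures are often combined by intersecting peak sets. The minimal sketch below keeps open-chromatin regions (for example, from ATAC-seq or DNase-seq) that overlap both an H3K4me1 peak and an H3K27ac peak. All peak coordinates and helper names are hypothetical and invented for illustration; real pipelines typically rely on dedicated tools (such as bedtools) and apply further filters, for example to exclude promoters.

```python
# Hypothetical peak calls: (chromosome, start, end), half-open intervals.
ATAC = [("chr6", 100, 400), ("chr6", 1000, 1300), ("chr6", 5000, 5200)]
H3K4ME1 = [("chr6", 50, 600), ("chr6", 900, 1500)]
H3K27AC = [("chr6", 150, 450), ("chr6", 4900, 5300)]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_active_enhancers(open_regions, *mark_sets):
    """Open-chromatin regions that overlap at least one peak in every mark set."""
    return [
        region
        for region in open_regions
        if all(any(overlaps(region, peak) for peak in marks) for marks in mark_sets)
    ]


# Only the first region carries open chromatin plus both histone marks.
print(candidate_active_enhancers(ATAC, H3K4ME1, H3K27AC))  # [('chr6', 100, 400)]
```

The second region overlaps H3K4me1 but not H3K27ac, so this simple rule does not call it an active enhancer.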
Enhancers can be located at great distances from their target genes. This is well exemplified by the Sonic hedgehog (Shh) locus, where an extensively studied enhancer named ZRS is located in an intron of a non-target gene (Lmbr1), approximately 850 kilobases away from Shh. The ZRS enhancer specifically controls the expression of Shh in the developing limb, and consequently, the disruption of this enhancer leads to severe limb malformations (Lettice, 2003; Sagai et al., 2005). In addition, enhancers sometimes skip their most proximal genes while controlling the expression of more distally located ones (Sanyal et al., 2012). As a consequence, it is difficult to assign enhancers to their target genes. However, studying the three-dimensional (3-D) structure of DNA has helped to overcome these limitations.
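The assignment problem can be made concrete with a naive "nearest gene" rule applied to a schematic version of this locus. The coordinates and helper names below are hypothetical, chosen only so that the ZRS lies inside Lmbr1 and about 850 kb from Shh; they are not real genome positions. Under this rule, the ZRS is assigned to Lmbr1, even though its functional target is Shh.

```python
# Hypothetical schematic coordinates in kilobases (not real genome positions).
GENES = {
    "Shh": (0, 10),        # (start, end) of the gene body
    "Lmbr1": (820, 1000),  # the ZRS lies in an intron of this gene
}
ZRS_POSITION = 860  # about 850 kb from the Shh gene body


def distance_to_gene(position, span):
    """Distance from a point to a gene body; 0 if the point lies inside it."""
    start, end = span
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))


def nearest_gene(position, genes):
    """Naive enhancer-to-gene assignment: choose the closest gene body."""
    return min(genes, key=lambda name: distance_to_gene(position, genes[name]))


print(nearest_gene(ZRS_POSITION, GENES))             # Lmbr1 (not the real target)
print(distance_to_gene(ZRS_POSITION, GENES["Shh"]))  # 850
```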

LONG-RANGE GENE EXPRESSION CONTROL: FAR IN THE GENOME BUT CLOSE IN NUCLEAR SPACE
Enhancers can control the expression of genes located at very large genomic distances (i.e., long-range regulation) (Kleinjan and van Heyningen, 2005; Sagai et al., 2005). Here we focus on enhancer regulation in cis, which, at least in vertebrates, seems to be the most prevalent regulatory mechanism. Nevertheless, enhancer regulation can also occur in trans, and some interesting examples of interchromosomal enhancer-gene interactions have been described in both flies and mammals (Müller and Schaffner, 1990; Bashkirova and Lomvardas, 2019; Monahan et al., 2019). Although several mechanisms have been proposed to explain the long-range regulatory activity of enhancers, the most widely accepted one is the so-called looping model, whereby enhancers and their target genes come into close proximity in 3-D nuclear space through the formation of chromatin loops (Palstra et al., 2003).
The emergence of chromosome conformation capture (3C)-based techniques [e.g., Hi-C, 4C-seq, HiChIP (Lieberman-Aiden et al., 2009; van de Werken et al., 2012; Mumbach et al., 2016)] has greatly improved the study of 3-D genome organization and our capacity to systematically link enhancers with their target genes (Bickmore, 2013). These methods quantify interaction frequencies between loci that lie in close spatial proximity, independently of their linear genomic distance. One of the most relevant findings from studies using 3C-related methods is that genomes tend to be organized into megabase-scale regulatory domains named topologically associating domains (TADs) (Dixon et al., 2012). Genomic regions within a TAD interact with each other at high frequency while interacting less often with the rest of the genome. The majority of enhancer-gene interactions occur within TADs (Dixon et al., 2012; Nora et al., 2013; Rao et al., 2014; Spielmann et al., 2018). Moreover, TADs constrain the genomic regions that an enhancer can act upon and thus insulate enhancers from contacting ectopic target genes located in different TADs (Lupiáñez et al., 2016). The regions separating neighboring TADs are called boundaries or borders; they are preferentially bound, and established, by architectural proteins such as CTCF and cohesin (Dixon et al., 2012; Rao et al., 2014). Thus, TADs can be considered fundamental regulatory units that facilitate enhancer-gene interactions within a domain while insulating regulatory activity from neighboring domains (Figure 1A). As we discuss in the following section, these concepts of 3-D genome organization have dramatically improved our capacity to predict and interpret the pathological consequences of human structural variation (SV).
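The idea that boundaries are positions crossed by few contacts can be illustrated with a simplified insulation-style score, one common family of approaches for calling TAD boundaries from Hi-C matrices. The sketch below is not the method of any study cited here: it builds a hypothetical toy contact matrix with three equal TADs, scores each gap between adjacent bins by the mean contact frequency across that gap, and reports strict local minima as candidate boundaries. The window size and contact values are arbitrary assumptions.

```python
def toy_contact_matrix(tad_sizes, within=10.0, between=1.0):
    """Symmetric toy Hi-C matrix: high contact counts inside each TAD, low between TADs."""
    n = sum(tad_sizes)
    matrix = [[between] * n for _ in range(n)]
    start = 0
    for size in tad_sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                matrix[i][j] = within
        start += size
    return matrix


def insulation_profile(matrix, window=3):
    """Score each gap g (between bins g-1 and g) by the mean contact frequency
    between the `window` bins upstream and the `window` bins downstream of it.
    Low scores mean few contacts cross the gap, i.e., strong insulation.
    Gaps too close to the matrix edges are left as None."""
    n = len(matrix)
    scores = [None] * n
    for g in range(window, n - window + 1):
        block = [matrix[i][j] for i in range(g - window, g) for j in range(g, g + window)]
        scores[g] = sum(block) / len(block)
    return scores


def call_boundaries(scores):
    """Candidate TAD boundaries: gaps whose score is a strict local minimum."""
    calls = []
    for g in range(1, len(scores) - 1):
        neighborhood = scores[g - 1:g + 2]
        if None in neighborhood:
            continue
        left, here, right = neighborhood
        if here < left and here < right:
            calls.append(g)
    return calls


matrix = toy_contact_matrix([6, 6, 6])
print(call_boundaries(insulation_profile(matrix)))  # [6, 12]
```

Here gap 6 means "between bins 5 and 6", which is exactly where the first toy TAD ends; real Hi-C data are noisy and sparse, so practical tools add normalization and smoothing on top of this idea.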

PATHOLOGICAL DISRUPTION OF REGULATORY DOMAINS BY STRUCTURAL VARIANTS
Structural variation (SV) refers to genomic alterations, including deletions, duplications, inversions, insertions, and translocations, that can differ greatly in size, ranging from about 50 bp to several megabases (Ho et al., 2020). Germline SVs are a common cause of congenital disease (Sebat et al., 2007; Walsh et al., 2008; Xu et al., 2008; Cooper et al., 2011; Soemedi et al., 2012), and high levels of somatic SVs are a key signature of human cancer genomes (Yang et al., 2013; Sudmant et al., 2015). SVs can cause disease through several different scenarios.

FIGURE 1 | Enhancer adoption as a result of different structural variants (SV). Graphical overview of how different types of SV can lead to enhancer adoption mechanisms. In the control allele (A), Gene A and Gene B are located in two neighboring TADs. In (B–D), we illustrate how different genomic rearrangements, such as inversions (B), deletions (C), or duplications (D), can remodel the 3-D chromatin landscape (through processes such as TAD shuffling, TAD fusion, or formation of a new TAD, respectively) and increase Gene B expression due to the regulatory effects of ectopic enhancers.

The most studied and best understood SVs are those that directly affect coding sequences, for example, by deleting, duplicating, or fusing genes. However, in many other cases, the pathogenic effect of SVs might involve changes in enhancer-gene communication, whose effects on gene expression are best understood when 3-D genome architecture and TAD organization are taken into consideration. This topic has received great attention, and we direct the reader to some excellent reviews for more details (Krijger and De Laat, 2016; Lupiáñez et al., 2016; Spielmann et al., 2018). Briefly, to illustrate how SVs can disrupt gene expression control, we discuss one example in which an inversion brings enhancers from the EPHA4 TAD into the vicinity of WNT6, located in a neighboring TAD (Lupiáñez et al., 2015). This inversion causes ectopic interactions between WNT6 and the EPHA4 enhancers, leading to a pathological gain of WNT6 expression in the developing limb and severe limb malformations. This type of pathological mechanism, whereby enhancers cause the ectopic expression of non-target genes, is known as "enhancer adoption" or "enhancer hijacking" (Lettice et al., 2011). Furthermore, the same inversion causes a loss of interactions between EPHA4 and its cognate enhancers (i.e., "enhancer disconnection"), leading to loss of EPHA4 expression in the developing limb. Although in this particular case the loss of EPHA4 expression is not responsible for the limb malformations, there are known instances in which SVs have pathological consequences through similar "enhancer disconnection" mechanisms (Laugsch et al., 2019). Figures 1, 2 graphically illustrate how different SVs can lead to enhancer adoption or enhancer disconnection. In addition to enhancers, silencer elements are also involved in the establishment of cell type-specific gene-expression programs.
Although silencers have historically been difficult to identify and characterize mechanistically, several recent reports indicate that they are abundant in mammalian genomes (Gaszner and Felsenfeld, 2006; Doni Jayavelu et al., 2020; Ngan et al., 2020; Pang and Snyder, 2020). These studies also indicate that at least some silencers repress gene expression by physically interacting with their target genes (Gaszner and Felsenfeld, 2006; Doni Jayavelu et al., 2020; Ngan et al., 2020; Pang and Snyder, 2020). Therefore, as with enhancers, the disruption of silencers or of silencer-gene communication can also contribute to disease. Overall, SVs can cause disease by disrupting 3-D TAD architecture and, consequently, the communication between enhancers/silencers and genes, without altering the sequences of the genes or of the enhancers/silencers themselves. Different in silico methods and guidelines have been developed to predict and interpret the pathogenic effect of SVs (Ibn-Salem et al., 2014; McLaren et al., 2016; Ganel et al., 2017; Weischenfeldt et al., 2017; Zepeda-Mendoza et al., 2017; Bianco et al., 2018; Yauy et al., 2018; Middelkamp et al., 2019; Hertzberg et al., 2020). Whereas some of these tools, such as SVScore (Ganel et al., 2017) or the Ensembl Variant Effect Predictor (McLaren et al., 2016), are restricted to direct effects on genes or do not consider the patient-specific phenotype, others (Ibn-Salem et al., 2014; Zepeda-Mendoza et al., 2017; Middelkamp et al., 2019) are specifically designed to handle changes in enhancer-gene communication and to take the patient's particular phenotype into account. Briefly, these enhancer-gene-oriented approaches typically use TAD coordinates to delimit the genomic regions and genes that could be affected by SVs through long-range regulatory effects.
Subsequently, changes in the enhancer landscape caused by the SVs are analyzed to assess whether any of the candidate genes could be subject to either a pathological gain ("enhancer adoption") or loss ("enhancer disconnection") of function. In addition, genes predicted to become silenced due to "enhancer disconnection" can be further prioritized by mining known gene-phenotype relationships from databases such as OMIM¹. Among them, genes previously associated with the patient's phenotypes through coding mutations or deletions represent the strongest candidates. Furthermore, even if an SV directly affects a gene, this does not necessarily mean that the disruption of that gene causes the phenotype, which may instead be caused by long-range regulatory changes in the expression of other gene(s). For instance, at the SHH locus, deleting LMBR1 would cause a limb malformation. However, this would not be due to the loss of LMBR1 function, but rather to the loss of the ZRS enhancer (located in an intron of LMBR1), which, as previously described, controls SHH expression in the limb. Taking these notions into account, some in silico approaches (Ibn-Salem et al., 2014; Middelkamp et al., 2019) estimate that a considerable fraction (10–30%) of the congenital abnormalities present in patients with SVs are caused by long-range regulatory mechanisms, either on their own or together with the direct disruption of protein-coding genes. These results, together with the fact that most disease-associated variants are found within putative enhancers (Maurano et al., 2012), emphasize the relevance and usefulness of these in silico tools. Nevertheless, as we discuss more extensively in the following section, these predictions must be taken with caution because the regulatory rules dictating the compatibility between genes and enhancers seem to be more complex than previously anticipated.
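The core logic of these enhancer-gene-oriented approaches can be sketched under strong simplifying assumptions: enhancers may only contact genes in the same TAD, and an SV is modeled as a coordinate transform applied to genes, enhancers, and boundaries alike. The locus below is a hypothetical schematic loosely inspired by the EPHA4/WNT6 example (invented coordinates, not real genome positions), and all helper names are illustrative, not the API of any published tool. Pairs that newly share a TAD are flagged as possible enhancer adoption; pairs that no longer do, as possible enhancer disconnection. Promoter type, expression, distance, and enhancer redundancy are deliberately ignored, although the following section argues that they also matter.

```python
from bisect import bisect_left

# Hypothetical schematic locus (kilobase positions; not real coordinates).
GENES = {"WNT6": 50, "EPHA4": 700}
ENHANCERS = {"enh1": 250, "enh2": 300}
BOUNDARIES = [200]  # one TAD boundary separating the two domains


def tad_index(position, boundaries):
    """Number TADs by how many boundaries lie to the left of a position."""
    return bisect_left(sorted(boundaries), position)


def same_tad_pairs(genes, enhancers, boundaries):
    """Enhancer-gene pairs located in the same TAD (i.e., allowed to interact)."""
    return {
        (enh, gene)
        for enh, e_pos in enhancers.items()
        for gene, g_pos in genes.items()
        if tad_index(e_pos, boundaries) == tad_index(g_pos, boundaries)
    }


def inversion(start, end):
    """Transform for an inversion: mirror positions inside [start, end]."""
    return lambda pos: start + end - pos if start <= pos <= end else pos


def deletion(start, end):
    """Transform for a deletion of [start, end): drop features inside, shift those downstream."""
    def move(pos):
        if start <= pos < end:
            return None
        return pos - (end - start) if pos >= end else pos
    return move


def rearrange(features, transform):
    """Apply an SV transform to named features, discarding deleted ones."""
    moved = {name: transform(pos) for name, pos in features.items()}
    return {name: pos for name, pos in moved.items() if pos is not None}


def predict(transform):
    """Compare same-TAD enhancer-gene pairs before and after the SV."""
    before = same_tad_pairs(GENES, ENHANCERS, BOUNDARIES)
    new_boundaries = [p for p in map(transform, BOUNDARIES) if p is not None]
    after = same_tad_pairs(rearrange(GENES, transform),
                           rearrange(ENHANCERS, transform), new_boundaries)
    return {"adoption": sorted(after - before), "disconnection": sorted(before - after)}


print(predict(inversion(150, 400)))  # enhancers move into the WNT6 domain
print(predict(deletion(190, 210)))   # boundary removed: the two TADs fuse
```

With the inversion, both enhancers gain WNT6 and lose EPHA4 (adoption plus disconnection); with the boundary deletion, they gain WNT6 while keeping EPHA4 (TAD fusion, as in Figure 1C).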

ENHANCER RESPONSIVENESS: BEING IN THE SAME TAD IS NOT ALWAYS ENOUGH
Together with 3C technologies, the development of novel genetic engineering approaches, especially CRISPR-Cas, is allowing us to dissect regulatory domains with unprecedented depth and resolution. Overall, the emerging picture is that the regulatory rules governing the compatibility between genes and enhancers are rather complex (Arnold et al., 2017), and simply being in the same TAD is not sufficient for functional gene-enhancer interactions to take place (Ghavi-Helm, 2019). For example, the loss of CTCF or cohesin function in mammalian cells results in an almost complete elimination of TAD boundaries, yet this has rather subtle effects on gene expression (Nora et al., 2017; Rao et al., 2017). Similarly, the structural disruption of TAD organization in Drosophila results in moderate gene-expression changes. Most recently, work from the PCAWG Consortium showed that only 14% of the TAD boundary deletions found in human tumors resulted in significant changes in the expression of nearby genes (Akdemir et al., 2020). Nevertheless, these findings do not necessarily imply that TADs are not functional (Galupa et al., 2020), but rather that additional regulatory layers also contribute to the specific and functional communication between genes and enhancers. These additional layers are still largely unknown, but recent studies are starting to shed some light on this topic. Using multiplex reporter assays, the Stark lab demonstrated that distinct types of gene promoters differ greatly in their enhancer responsiveness, which seems to depend, at least partly, on the cofactors bound to them (Arnold et al., 2017; Haberle et al., 2019). In another study, the Mundlos lab showed that, upon placing a cluster of enhancers in a novel TAD with multiple genes, the genes whose promoters were marked with H3K27me3/Polycomb responded more strongly to the enhancers (Kraft et al., 2019).
In addition, extensive rearrangements within the Shh locus showed that altering the distance between Shh and its limb-specific enhancer ZRS does not appear to matter as long as both remain within the same TAD. However, when an insulator/barrier is placed between Shh and the ZRS enhancer, reducing the distance between them enables Shh to recover, at least partly, its ZRS-dependent expression (Symmons et al., 2016). This last observation, together with another report in which the regulatory elements controlling Xist expression were dissected (Galupa et al., 2020), suggests that TAD boundaries are not impenetrable, but rather partially permeable barriers that do not completely insulate genes from regulatory elements located in neighboring TADs. Moreover, TAD boundary positions are not strictly conserved among all cell types; thus, future efforts should include obtaining Hi-C maps in additional cell types and tissues (Dixon et al., 2012; McArthur and Capra, 2020). Together with cell type-specific gene-expression data, these additional Hi-C maps would help improve predictions of how SVs might affect gene expression through long-range regulatory mechanisms. Last but not least, many genes display complex regulatory landscapes in which multiple enhancers control the expression of their target genes in a totally or partially redundant manner (Osterwalder et al., 2018; Ghavi-Helm, 2019). Therefore, predicting the effects of the partial loss of the enhancers controlling a particular gene can be challenging.
In summary, to predict the long-range regulatory effects of a given SV, we should ideally consider not only TADs, but also the expression patterns of the candidate genes as well as other regulatory factors, such as the type of gene promoters, the distance between genes and enhancers, and the complexity of the affected enhancer landscapes. Since the regulatory layers contributing to enhancer-gene communication are not fully understood, it is very important to experimentally validate the functional consequences of medically relevant SVs.

LONG-RANGE REGULATORY EFFECTS IN NEURAL CREST CELLS AS AN ETIOLOGICAL MECHANISM FOR HUMAN NEUROCRISTOPATHIES
To further illustrate the pathological relevance of SVs and of long-range gene regulation, we now focus on human neurocristopathies (NCP), a group of disorders characterized by congenital malformations in anatomical structures such as the skeletal components of the head, the heart, or the peripheral nervous system (Simões-Costa and Bronner, 2013). NCP are caused by defects occurring during neural crest (NC) development, which, consequently, represents an obvious and appropriate model for the study of these human disorders. The NC is a vertebrate-specific embryonic cell population that originates in the dorsal neural tube. Once specified, NC progenitors undergo an epithelial-to-mesenchymal transition and acquire an impressive migratory capacity. Based on their anterior-posterior origin within the neural tube, the NC is divided into four different types: cranial, vagal, trunk, and sacral. The cranial NC specifically contributes to the development of the craniofacial skeleton, several anterior structures of the eye, the teeth, and the cranial ganglia. The vagal NC participates in the formation of the smooth muscle of the great vessels, the cardiac septa, and the enteric ganglia. The trunk NC contributes to the development of the dorsal root ganglia, the sympathetic ganglia, and the adrenal medulla. Last, the sacral NC contributes to the proper development of the enteric ganglia (Simões-Costa and Bronner, 2013). Hence, NC cells (NCCs) contribute to the morphogenesis and function of many different organs and tissues in the vertebrate body. Given the remarkable differentiation potential of the NC, it is not surprising that human NCP include many diverse congenital abnormalities. However, as we describe below, despite the prevalence of human NCP, the importance of non-coding pathological mechanisms in general, and of long-range regulatory mechanisms in particular, has not been extensively explored for this group of disorders.
Among the congenital abnormalities associated with human NCP, craniofacial malformations are particularly prevalent and can be found in more than 700 different syndromes (Trainor, 2010). Moreover, approximately one third of all newborns with congenital anomalies display head and face alterations, which represent a primary cause of infant mortality (Trainor, 2010). Among the NC-related craniofacial malformations, orofacial clefts are the most common, with a prevalence of 1 in 800 live births worldwide (Rahimov et al., 2012). The high incidence of craniofacial abnormalities is also seen among patients with congenital anomalies carrying SVs: 140 out of 273 patients (51.3%) described by Redin et al. (2017) display head, neck, or craniofacial defects. In addition to craniofacial malformations, which are mostly caused by defects during cranial NC development, human NCP include a broad range of abnormalities in other tissues and organs due to defects in other NC types (see Vega-Lopez et al. (2018) for an extensive review of NCP). For example, heterotaxy syndrome, a condition that causes complex congenital heart disease, can be triggered by alterations in cardiac NCCs. Congenital central hypoventilation syndrome (CCHS), in turn, is caused by impairments in trunk NCCs and is characterized by autonomic nervous system defects, shallow breathing, and the development of tumors (e.g., neuroblastoma). Last, an example of a syndrome caused by defects in cranial NCCs is branchio-oculo-facial syndrome (BOFS), which is characterized by several facial, ocular, hearing, and cutaneous anomalies. Previous studies show that BOFS is caused by heterozygous mutations or deletions that alter the coding sequence of the TFAP2A gene, which encodes a transcription factor considered to be an NC master regulator (Milunsky et al., 2008, 2011).
Interestingly, we recently described a BOFS patient who, in contrast to all previously reported cases, had two intact TFAP2A alleles. Instead, this patient carried a large heterozygous inversion that physically disconnected one of the TFAP2A alleles from its cognate NC enhancers, resulting in monoallelic and haploinsufficient TFAP2A expression in cranial NCC (Laugsch et al., 2019; Figure 2). Hence, although this patient is still a rather isolated case, it illustrates that SVs can cause NCP through long-range regulatory mechanisms.
In addition to the unique BOFS patient described above, previous studies have proposed genetic changes in non-coding regulatory sequences as the possible cause of various NCP (Amiel et al., 2010). A mutation in an enhancer located in the first intron of the RET gene was associated with susceptibility to Hirschsprung disease (Emison et al., 2005), an NCP caused by the failure of enteric NCCs to colonize the intestine (Vega-Lopez et al., 2018). Pierre Robin sequence (PRS), a neurocristopathy caused by abnormal cranial NCC development and characterized by craniofacial alterations, has been associated with deletions and point mutations of enhancers surrounding SOX9 (Benko et al., 2009). Moreover, in addition to coding mutations within SOX10 and other genes, Waardenburg syndrome might also be caused by alterations in enhancers surrounding SOX10 (Bondurand et al., 2012; Lecerf et al., 2014). It is worth noting that the functional characterization and pathological relevance of these previously studied enhancers, and of the mutations therein, were largely based on reporter assays. Although these assays provide important information about the enhancer activity of a given DNA sequence, they do not directly address whether an enhancer (and mutations therein) contributes to the expression of its predicted target gene and, thus, to the etiology of the associated human disorder. For instance, a single nucleotide polymorphism (SNP) that falls in an enhancer can alter its activity by affecting a transcription factor binding site, which can be detected by reporter assays. However, this SNP might not affect the expression of the predicted target gene due to compensatory effects of redundant enhancers (Osterwalder et al., 2018), or, given the difficulty of assigning enhancers to their target genes, the enhancer might even control the expression of a different gene.
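A common way to estimate whether a SNP disrupts a transcription factor binding site is to score the reference and alternative alleles against a position weight matrix (PWM) of log-odds values. In the minimal sketch below, the matrix, sequences, and helper names are hypothetical and not taken from any study cited here, and only the forward strand is scanned. A negative delta predicts a weaker site, the kind of effect a reporter assay can confirm, but it says nothing about which gene the enhancer regulates in its endogenous context.

```python
# Hypothetical log-odds PWM: one dict per motif position (values are invented).
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.2},
]


def best_site_score(sequence, pwm):
    """Highest PWM score over all windows of an uppercase sequence (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[k][sequence[i + k]] for k in range(width))
        for i in range(len(sequence) - width + 1)
    )


def snp_effect(sequence, index, alt, pwm):
    """Return (ref_score, alt_score, delta) for a substitution at 0-based `index`."""
    mutated = sequence[:index] + alt + sequence[index + 1:]
    ref_score = best_site_score(sequence, pwm)
    alt_score = best_site_score(mutated, pwm)
    return ref_score, alt_score, alt_score - ref_score


# Reference contains the best-scoring site ACGT; the C>A change weakens it.
ref, alt, delta = snp_effect("TTACGTTT", 3, "A", PWM)
print(round(ref, 2), round(alt, 2), round(delta, 2))  # 5.4 2.9 -2.5
```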
Novel genome editing techniques, such as CRISPR-Cas, can largely overcome the limitations of reporter assays, as endogenous enhancer loci can be genetically modified with high efficiency. In addition, many of the previous studies used animal models, such as mice, zebrafish, or chicken, which have been historically essential to molecularly characterize the neural crest and to dissect its gene regulatory networks (Sauka-Spengler and Bronner-Fraser, 2008; Green et al., 2015). However, when it comes to human congenital disorders in general and human NCP in particular, model organisms do not always faithfully recapitulate the phenotypes observed in human patients (Mestas and Hughes, 2004). Finally, in addition to enhancers, silencers should also be considered when investigating the role of non-coding regulatory sequences in human NCP (Doni Jayavelu et al., 2020; Ngan et al., 2020; Pang and Snyder, 2020). For instance, insertional mutagenesis in mice identified silencer elements contributing to the inactive state of Fam162b (Bergeron et al., 2015) and Nr2f1 (Bergeron et al., 2016) in NCC. Notably, disruption of those silencers relieved the repression of Fam162b and Nr2f1 in NCC, which ultimately caused phenotypes resembling those observed in Hirschsprung disease (Bergeron et al., 2015) and Waardenburg syndrome (Bergeron et al., 2016), respectively.

USING IN VITRO-DERIVED HUMAN NEURAL CREST CELLS TO MODEL NEUROCRISTOPATHIES: METHODS, ADVANTAGES, AND LIMITATIONS
Model organisms, especially mice, have been extensively used to investigate the etiological mechanisms of human disease. Mouse models, in particular, offer a set of optimized and robust genetic and molecular tools that can be used to investigate developmental processes and/or disease progression in an in vivo context. Consequently, work in mice and other animal models (e.g., zebrafish, chicken) has been essential to understand complex developmental and morphological processes, such as those that occur during neural crest and craniofacial development and that are disrupted in human NCP (Sauka-Spengler and Bronner-Fraser, 2008; Cordero et al., 2011). However, there are important differences between mice and humans (Mestas and Hughes, 2004), for example, in gene dosage sensitivity: for many developmental genes implicated in human congenital disorders (including NCP), humans, but not mice, are haploinsufficient. This is well illustrated by BOFS: in humans, this NCP is caused by heterozygous mutations/deletions in TFAP2A, whereas Tfap2a+/− mice appear morphologically normal (note that Tfap2a−/− mice display a severe BOFS-like phenotype) (Schorle et al., 1996; Zhang et al., 1996; Brewer et al., 2004; Milunsky et al., 2008, 2011; Leblanc et al., 2013; Li et al., 2013). Therefore, when such differences in gene dosage sensitivity are encountered, the pathological mechanisms of congenital disorders should ideally be investigated in human cellular models. In addition, working with adult tissues to study congenital disorders is not fully appropriate because gene regulatory programs differ significantly between embryonic and adult stages, not to mention the even greater differences between unrelated cell types and tissues. Taking all this into consideration, human NCC represent a relevant model to investigate human NCP.
However, accessing NCC is not easy because their embryonic and migratory nature makes them difficult to isolate. This is especially problematic in humans due to the obvious ethical restrictions that limit access to human embryos. To overcome these limitations, several labs have established robust in vitro differentiation protocols to obtain NCC from human embryonic stem cells (hESC) or human induced pluripotent stem cells (hiPSC) (Bajpai et al., 2010; Menendez et al., 2013; Mica et al., 2013; Prescott et al., 2015; Fattahi et al., 2016; Huang et al., 2016; Hackland et al., 2017; Tchieu et al., 2017; Frith et al., 2018; Laugsch et al., 2019). These differentiation protocols can be broadly divided into those involving an intermediate embryoid body step and those in which hESC/hiPSC are differentiated more directly into NCC. Each type of differentiation has its own advantages and disadvantages. For example, passing through an embryoid body step recapitulates important stages of NC differentiation, such as the epithelial-to-mesenchymal transition (EMT) whereby NC progenitors delaminate from the dorsal neural tube. On the other hand, more direct NCC differentiation protocols are faster and result in more homogeneous cell populations. Regardless, these methods have proven useful to study both human NC development and the pathomechanisms of human NCP (Bajpai et al., 2010; Menendez et al., 2013; Mica et al., 2013; Prescott et al., 2015; Fattahi et al., 2016; Huang et al., 2016; Hackland et al., 2017; Tchieu et al., 2017; Frith et al., 2018; Laugsch et al., 2019). Nevertheless, these in vitro NC differentiation systems have obvious and important limitations, since they cannot fully recapitulate the complexity and precision of in vivo embryogenesis, especially the morphogenesis of complex NCC-derived structures (e.g., the palate) or the interactions that NCC establish with their surroundings during embryonic development.
Therefore, in vitro-derived hNCCs do not represent the only or most appropriate model to study human NCP. Instead, each model has certain advantages as well as pitfalls that should be acknowledged when considering the best experimental strategy to investigate a particular NCP. In most cases, the combination of several models might be the best option as this can maximize the advantages and reduce the limitations of each individual model. In this regard, it would be beneficial to implement 3-D organoid culture systems (Lancaster and Knoblich, 2014), whereby hESC/hiPSC can be used to more faithfully recapitulate human craniofacial structures in vitro.

PRACTICAL GUIDELINES TO INVESTIGATE NEUROCRISTOPATHIES CAUSED BY STRUCTURAL VARIANTS AND INVOLVING LONG-RANGE REGULATORY MECHANISMS
Public repositories offer an increasing amount of functional genomic data obtained from in vitro-derived human NCCs or early human embryonic tissues of NC origin, which together represent a highly valuable resource to unravel the etiology of many NCPs. Currently, these data sets provide information about gene-expression levels, epigenetic profiles, and enhancer maps in cranial NCCs (derived in vitro) and craniofacial embryonic tissues (Rada-Iglesias et al., 2012;Prescott et al., 2015;Gerrard et al., 2016;Wilderman et al., 2018;Laugsch et al., 2019). However, these data are largely restricted to the cranial NC, so there is still a clear need for genomic information from more posterior NC types in order to improve our understanding of the full repertoire of human NCPs. In principle, gene-expression profiles and enhancer maps can be combined with Hi-C data (i.e., TAD maps) in order to identify gene regulatory domains during NC development (as detailed in section "Pathological Disruption of Regulatory Domains by Structural Variants"). Unfortunately, Hi-C data are still not available for either human NCCs or NC-derived embryonic tissues. Hopefully, this gap will be filled in the near future, which is important because TADs might be more variable among cell types/tissues than previously anticipated (Dixon et al., 2012;McArthur and Capra, 2020). Furthermore, smaller topological domains with important regulatory functions (e.g., sub-TADs) tend to show higher tissue specificity (Berlivet et al., 2013;Kragesteen et al., 2018;Beagan and Phillips-Cremins, 2020). Nevertheless, currently available Hi-C maps (Wang et al., 2018) generated in different human cell types can still be used to infer regulatory domains in the NC. For example, Hi-C maps derived from hESCs helped to define the TFAP2A regulatory domain in hNCCs and to predict the pathomechanism whereby an inversion causes BOFS (Laugsch et al., 2019).

Figure 3 (caption excerpt): (1) If the SV is amenable to CRISPR-Cas engineering (CRISPRable), disease modeling can be done through (i) genome editing in animal models, if such models display the same gene dosage sensitivity as humans; (ii) genome editing of WT hESC/hiPSC; or (iii) derivation of patient-specific hiPSC. (2) If the SV is not amenable to CRISPR-Cas engineering (not CRISPRable), derivation of patient-specific hiPSCs is the only option. Once these disease models are established, phenotypic and molecular characterization can be performed.

Frontiers in Genetics | www.frontiersin.org
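To make the idea of combining enhancer maps with TAD maps more concrete, the following is a minimal Python sketch of how an inversion could be screened for enhancer disconnection or enhancer adoption. It is not an established tool and rests on loudly simplifying assumptions: domains are approximated as the intervals between sorted boundary positions on a single chromosome, an enhancer is assumed to regulate only genes in the same domain, genes and enhancers are reduced to single coordinates, and the inversion is balanced with breakpoints `bp1` and `bp2` delimiting the inverted segment. All function names, element names (`e1`, `e2`, `e3`), and coordinates are hypothetical; real analyses would also need to consider sub-TADs, contact frequencies, and cell type-specific enhancer activity.

```python
from bisect import bisect_right


def domain_of(pos, boundaries):
    """Index of the domain (interval between sorted boundary positions) containing pos."""
    return bisect_right(boundaries, pos)


def invert_pos(pos, bp1, bp2):
    """Coordinate of pos after a balanced inversion of the half-open segment [bp1, bp2)."""
    return bp1 + bp2 - 1 - pos if bp1 <= pos < bp2 else pos


def domain_enhancers(gene_pos, enhancers, boundaries):
    """Enhancers sharing a domain with the gene, i.e., its assumed regulatory domain."""
    d = domain_of(gene_pos, boundaries)
    return {name for name, pos in enhancers.items() if domain_of(pos, boundaries) == d}


def predict_inversion_effect(gene_pos, enhancers, boundaries, bp1, bp2, gene_body=None):
    """Compare a gene's assumed regulatory domain before and after inverting [bp1, bp2)."""
    before = domain_enhancers(gene_pos, enhancers, boundaries)
    # Relocate the gene, the enhancers, and the boundaries, then re-sort the boundaries.
    new_gene = invert_pos(gene_pos, bp1, bp2)
    new_enhancers = {n: invert_pos(p, bp1, bp2) for n, p in enhancers.items()}
    new_boundaries = sorted(invert_pos(b, bp1, bp2) for b in boundaries)
    after = domain_enhancers(new_gene, new_enhancers, new_boundaries)
    # A breakpoint junction lies inside the gene body [start, end) if start < bp < end.
    disrupted = gene_body is not None and any(
        gene_body[0] < bp < gene_body[1] for bp in (bp1, bp2)
    )
    return {
        "gene_disrupted": disrupted,      # direct disruption of the gene itself
        "disconnected": before - after,   # candidate enhancer disconnection
        "adopted": after - before,        # candidate enhancer adoption
    }


# Hypothetical toy locus (coordinates in bp on one chromosome).
boundaries = [100, 200, 300]
enhancers = {"e1": 120, "e2": 180, "e3": 250}
result = predict_inversion_effect(150, enhancers, boundaries, bp1=170, bp2=260)
print(result)  # {'gene_disrupted': False, 'disconnected': {'e2'}, 'adopted': {'e3'}}
```

In this toy case the inversion moves a boundary between the gene and `e2`, so `e2` is flagged as disconnected, while `e3` is brought into the gene's domain and flagged as a candidate for adoption. Such predictions only prioritize hypotheses; they would still require the experimental validation discussed in the guidelines.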
We now provide some practical guidelines that can be used to uncover the pathological mechanisms whereby SVs can cause NCP (Figure 3).
• First, if a patient diagnosed with an NCP and harboring an SV is identified, it is essential to map the SV breakpoints at base-pair resolution. Different methods and tools can be used for this purpose (Zhao et al., 2013;Ugur Sezerman et al., 2019), such as BreakDancer (Chen et al., 2009), an algorithm for high-resolution mapping of genomic structural variation.
• Then, the functional genomic data described in the previous paragraph (e.g., cell type-specific gene-expression levels, epigenetic profiles, and enhancer maps from cranial NCCs and craniofacial embryonic tissues) can be used to map gene regulatory domains in the NC and NC-derived tissues. Together with in silico prediction approaches (such as those described in section "Pathological Disruption of Regulatory Domains by Structural Variants"), these maps enable prediction of the pathological mechanisms causing the NCP. Initially, these approaches can assess whether the SV directly disrupts one or more genes and/or whether it might involve long-range regulatory mechanisms (e.g., enhancer adoption, enhancer disconnection).
• Next, if a high-confidence pathomechanism is predicted, it should be validated experimentally. This is especially important for long-range mechanisms, given our still limited capacity to predict the functional relevance of enhancers and to assign them to their correct target genes. For experimental validation, two main alternatives can be considered: (i) SVs amenable to engineering with genome-editing tools (referred to as "CRISPRable" in Figure 3): the patient's SV can be recapitulated in a model organism, such as mice, to evaluate the molecular and phenotypic consequences during NC development (Lupiáñez et al., 2015;Kragesteen et al., 2018). This requires that the regulatory domain(s) potentially disrupted by the SV be evolutionarily conserved and that there be no differences in dosage sensitivity for the potentially relevant genes.
If these requirements are not fulfilled, the patient's SV can be introduced into wild-type (WT) hESCs/hiPSCs, which can then be differentiated into NCCs and extensively characterized at the molecular (e.g., gene expression, 3-D chromatin structure) and cellular levels (e.g., migration, differentiation into NC derivatives). Importantly, both mice and hESCs/hiPSCs with engineered SVs can be compared with isogenic WT controls. (ii) SVs not (easily) amenable to engineering with genome-editing tools (referred to as "not CRISPRable" in Figure 3): because of their complexity (e.g., multiple breakpoints), very large size, or type (e.g., translocations), some SVs cannot be efficiently engineered with currently available tools. Although technical advances might overcome these limitations in the future (Jiang et al., 2016;Torres-Ruiz et al., 2017), a good alternative for studying these SVs is to obtain patient fibroblasts that can then be reprogrammed into hiPSCs (Takahashi et al., 2007). Subsequently, the patient-specific hiPSCs can be differentiated into NCCs and characterized as described above. This strategy was followed to study the BOFS patient described previously (Laugsch et al., 2019), in whom an inversion causes TFAP2A haploinsufficiency in NCCs by disconnecting one of the TFAP2A alleles from its NCC-specific enhancers. One limitation of patient-specific hiPSCs is that isogenic WT controls are not readily available. In principle, this could be overcome either by repairing the SV, which might be rather difficult for certain SVs, or by using parental controls.
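As a compact summary of the guidelines above, the following Python sketch encodes the model-selection logic summarized in Figure 3 as a small decision function. It is illustrative only: the three boolean criteria are simplifications of judgments that in practice require careful evidence (e.g., conservation of the regulatory domain, dosage sensitivity of candidate genes), and the class and function names (`SVAssessment`, `candidate_models`) are hypothetical. Because combining several models is often the best option, the function returns every applicable (model, control) pair rather than a single choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVAssessment:
    crisprable: bool               # can the SV be engineered efficiently with current genome-editing tools?
    domain_conserved: bool         # is the putatively disrupted regulatory domain conserved in the model organism?
    same_dosage_sensitivity: bool  # do the candidate genes show human-like dosage sensitivity in that organism?


PATIENT_HIPSC = ("patient-specific hiPSC -> NCC", "SV-repaired line or parental hiPSC")


def candidate_models(sv: SVAssessment) -> list[tuple[str, str]]:
    """Return (disease model, control) pairs following the decision logic of Figure 3."""
    if not sv.crisprable:
        # Complex, very large, or translocation SVs: patient-derived cells are the only route.
        return [PATIENT_HIPSC]
    options = []
    if sv.domain_conserved and sv.same_dosage_sensitivity:
        options.append(("engineered model organism (e.g., mouse)", "isogenic WT animals"))
    options.append(("engineered WT hESC/hiPSC -> NCC", "isogenic WT hESC/hiPSC"))
    options.append(PATIENT_HIPSC)
    return options


# Example: a CRISPRable SV whose candidate gene is dosage-insensitive in mice.
for model, control in candidate_models(SVAssessment(True, True, False)):
    print(f"{model}  (control: {control})")
```

Here the animal model is dropped because one of its prerequisites fails, leaving engineered WT hESCs/hiPSCs and patient-specific hiPSCs as complementary options.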

CONCLUSION
The study of the NC is an essential step toward advancing our understanding of human development and human congenital disease. Using already available genomic data and various experimental strategies, it should be possible to discover new pathological mechanisms that cause NCPs through alterations in long-range gene regulation. The proposed practical guidelines for investigating the pathological consequences of SVs can be applied beyond NCPs, with the ultimate goal of improving the diagnosis, counseling, and even treatment of human congenital disorders.

AUTHOR CONTRIBUTIONS
VS-G and AR-I wrote the manuscript. MM-F prepared the figures and corrected the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
VS-G is supported by a Ph.D. fellowship from the University of Cantabria (Spain). AR-I is supported by the "Programa STAR-Santander Universidades, Campus Cantabria Internacional de la convocatoria CEI 2015 de Campus de Excelencia Internacional" (Spain).