Comparative Genomics Identifies Putative Interspecies Mechanisms Underlying Crbn-Sall4-Linked Thalidomide Embryopathy

The identification of thalidomide–Cereblon-induced SALL4 degradation has brought new understanding for thalidomide embryopathy (TE) differences across species. Some questions, however, regarding species variability, still remain. The aim of this study was to detect sequence divergences between species, affected or not by TE, and to evaluate the regulated gene co-expression in a murine model. Here, we performed a comparative analysis of proteins experimentally established as affected by thalidomide exposure, evaluating 14 species. The comparative analysis, regarding synteny, neighborhood, and protein conservation, was performed in 42 selected genes. Differential co-expression analysis was performed, using a publicly available assay, GSE61306, which evaluated mouse embryonic stem cells (mESC) exposed to thalidomide. The comparative analyses evidenced 20 genes in the upstream neighborhood of NOS3, which are different between the species who develop, or not, the classic TE phenotype. Considering protein sequence alignments, RECQL4, SALL4, CDH5, KDR, and NOS2 proteins had the biggest number of variants reported in unaffected species. In co-expression analysis, Crbn was a gene identified as a driver of the co-expression of other genes implicated in genetic, non-teratogenic, limb reduction defects (LRD), such as Tbx5, Esco2, Recql4, and Sall4; Crbn and Sall4 were shown to have a moderate co-expression correlation, which is affected after thalidomide exposure. Hence, even though the classic TE phenotype is not identified in mice, a deregulatory Crbn-induced mechanism is suggested in this animal. Functional studies are necessary, especially evaluating the genes responsible for LRD syndromes and their interaction with thalidomide–Cereblon.


INTRODUCTION
Since the 1960s, when thalidomide teratogenicity was discovered, many attempts have been tried to identify its molecular mechanisms, aiming to use this knowledge to avoid new cases of the embryopathy. However, fully understanding how thalidomide causes malformations remains an urgent challenge (Asatsuma-Okumura et al., 2019).
One of the many aspects that intrigue scientists is the variability of thalidomide teratogenesis among species ( Table 1). The most commonly used animal models, rat and mouse, do not develop the classic thalidomide embryopathy (TE) phenotype, identified in a range of organisms including zebrafish (Siamwala et al., 2012), chicken (Therapontos et al., 2009), frogs (Fort et al., 2000), rabbits (Fabro et al., 1967), sea urchins (Reichard-Brown et al., 2009), and opossum (Sorensen et al., 2017). Mice were tested even in a dose of 4,000 mg/kg and did not develop TE, whereas humans are sensitive to its teratogenesis in a dose of 0.5 mg/kg (Cahen, 1966;. Other insensitive species are bushbabies, pigs, and cats . However, the original studies with bushbabies and pigs have a small sample (n = 4) (Jonsson, 1972;Butler, 1977), and experiments with pregnant cats treated with thalidomide resulted in fetal loss and a variety of cardiovascular anomalies in the offspring (Khera, 1975).
Molecular mechanisms previously identified in thalidomide are different across species, such as the induction of oxidative stress (Hansen and Harris, 2004), driven by several genes including NFKB1, NFKB2, DKK1, GSTP1, and GSS. Other mechanisms for thalidomide teratogenesis have been proposed, such as anti-angiogenesis, which led to several studies evaluating the role of the vascular endothelium mainly expressing genes (KDR, VEGFA, and CDH5) and nitric oxide synthase genes NOS2 and NOS3 (Therapontos et al., 2009;Siamwala et al., 2012). All the studies have aimed to understand how the molecular disruptions directed by thalidomide could impact in limb development (Hansen and Harris, 2004;Therapontos et al., 2009;Ito et al., 2010;Siamwala et al., 2012). However, only recently, Donovan et al. (2018) and Matyskiela et al. (2018) elucidated an important mechanism that has helped in the understanding of this mystery: Cereblon, a protein which is a thalidomide primary target (Ito et al., 2010), mediates SALL4 degradation (Donovan et al., 2018;Matyskiela et al., 2018).
SALL4 is a transcription factor (TF) involved in limb development; in humans, loss-of-function mutations in SALL4 cause Duane-radial ray syndrome (Kohlhase et al., 2002), an autosomal recessive condition characterized by limb anomalies, similar to the ones observed in TE (Kohlhase et al., 2003). Because of the pattern of anomalies, TE is considered a phenocopy of other genetic syndromes, mainly characterized by limb reduction defects (Cassina et al., 2017), caused by mutations in ESCO2, TBX5, RBM8A, and RECQL4 genes (Li et al., 1997;Vega et al., 2005;Van Maldergem et al., 2006).
Recently, it was proposed that, in mice, variants in Crbn and Sall4 genes interrupt the degradation of Sall4 protein mediated by Cereblon-thalidomide (Donovan et al., 2018;Matyskiela et al., 2018). In this scenario, genetic alterations in both SALL4 and CRBN would be pivotal for thalidomide teratogenic activity (Gao et al., 2020). However, only a few species were evaluated in the mentioned research. Besides SALL4, other neosubstrates have recently been proposed, including limb development genes such as MEIS2 (Fischer et al., 2014) and TP63 (Asatsuma-Okumura et al., 2019).
Before it was identified as a thalidomide primary target (Ito et al., 2010), a Cereblon stop-codon mutation was implicated in a mild mental retardation autosomal recessive disorder (Higgins et al., 2004). Notwithstanding, its canonical function is related to a E3-ubiquitin-ligase complex, which comprises DDB1, ROC1, and CUL4A proteins as well; Cereblon acts as a substrate receptor, ubiquitinating proteins to be degraded (Angers et al., 2006). In mouse brains, Cereblon mRNA was detected especially in serotoninergic and adrenergic neurons, as well as in hippocampal and cerebellum regions (Aizawa et al., 2011). Worthy of note is that Crbn-null mice are viable and do not present major malformations (Rajadhyaksha et al., 2012;Lee et al., 2013); however, memory and learning deficits are apparent in these animals (Bavley et al., 2018). Because of Cereblon binding and the outstanding immunomodulatory capacity of thalidomide, the drug has been implicated in the treatment of neurodegenerative and neuropsychiatric disorders; however, thalidomide neuroimmunomodulatory mechanisms do not appear to be Cereblon-dependent and probably occur through TNFa mRNA degradation (Jung et al., 2021).
Notwithstanding, TE in humans is considered heterogeneous, because the phenotypic presentation is variable, an effect that is not explained only by SALL4 or CRBN variants (Gomes et al., 2019;Kowalski et al., 2020). Considering TE intra-and interspecific variability, the aim of this research was to perform a comparative analysis in the main proteins already associated with thalidomide mechanisms of action, across different species, affected or not by classic TE. We have evaluated 42 genes, including SALL4, regarding synteny, neighborhood, and protein conservation, which might be relevant in the understanding of TE divergences. Moreover, we also performed differential gene expression and co-expression analyses, focused in the mice to better comprehend the thalidomide action mechanisms in this species; we combined these bioinformatics analyses with systems biology strategies, to evaluate the main targets of thalidomide exposure in mice and rats.

MATERIALS AND METHODS
A full description of the performed analyses is available in Supplementary File 1. Figure 1A presents a step-by-step scheme of the analysis.

Species and Gene Selection
Fourteen species were included according to previous reports regarding thalidomide exposure during embryo development ( Table 1). Forty-two candidate genes were selected from previous studies in animal models; hence, only genes previously accessed by in vitro or in vivo experiments were included (D'Amato et al., 1994;Hansen and Harris, 2004;Therapontos et al., 2009;  Thalidomide is teratogenic in sheep NA Ruckebusch, 1983 Rabbit (Oryctolagus cuniculus) Thalidomide induces limb anomalies and embryo lethality in in Chinchilla and New Zealand white rabbits.

Hamster (Mesocricetus auratus)
Thalidomide is teratogenic in certain inbred strains of hamsters, including Mesocricetus auratus NA Homburger et al., 1965 Rat (Rattus norvegicus) Thalidomide does not induce limb anomalies in rats. Lenalidomide had no effect in rat.

Comparative Genomics Analysis
Synteny, neighborhood reports, and comparison of paralog amount were done for all 42 genes based on genomic databases. The main comparative method used was searching for differences present in the genetic maps and sequences of 12 affected vs. two non-affected species (Table 1 and Figure 2). Thus, speciesspecific variations were not described. We retrieved transcript sequences from Ensembl release 100 and/or BLAST on NCBI. We performed alignments using MUSCLE (Edgar, 2004) in MEGA 7 (Kumar et al., 2016). gnomAD was assessed to verify variants in humans. Functional prediction was performed in VarSome. PubTator, Ensembl, and PharmGKB databases were used to evaluate clinical and/or pharmacogenetic associations.

Differential Co-expression Analysis
Differential co-expression analysis was performed in the GSE61306 study (Gao et al., 2015), obtained from the GEO database. RNA from mESC was collected 24, 48, and 72 h after thalidomide or saline exposure. Confirmatory analyses were performed in valproic acid, retinoic acid, or warfarinexposed mESC, available in the ArrayExpress database (accession numbers: E-TABM-903, E-TABM-1205, and E-MTAB-300). DCGL R package was used for differential co-expression analysis . Gene-gene co-expression was evaluated using Pearson correlation, including only Pearson's r of at least |± 0.5| in exposed or control samples (Mukaka, 2012). P-values of < 0.05 were included, after false discovery rate (FDR) adjustment. For the TF analysis, we incorporated the TRRUST database (Han et al., 2015) in the DCGL package, to obtain Mus musculus interactions. Co-expression networks were assembled Cytoscape v.3.7.2. Heatmaps were generated using diffcoexp and ggplot2 R packages.

Synteny, Neighborhood, and Gene Copy Number Differences Across Species
Twelve species who develop TE and two non-affected (mouse and rat) were compared as two groups, affected vs. unaffected. Except for gene copy number differences, the results are  Knobloch et al., 2007;7 D'Amato et al., 1994;8 Vargesson, 2009;9 Sampaio et al., 1991. related to differences of these groups. Hence, our synteny and neighborhood analysis either was conserved for all species or did not show a pattern of conservation between the groups. To illustrate, the vicinity of SHH in mouse and rat only has five downstream shared genes, while 20 upstream genes are divergent between these rodents. Consequently, we could not find any reasonable relation to thalidomide teratogenesis in such cases. In contrast, an outstanding difference between the upstream neighborhood of the NOS3 gene in sensitive vs. insensitive species was identified: at least 20 genes composing that region differ from other species, in the same region (Figure 2). The rodents, except the golden hamster, share a subset of genes (Supplementary Table 1). The rabbit, exclusively, has the following genes related to the TE phenotype at that region: WDR60, ESYT2, NCAPG2, UBE3C, LMBR1, RNF32, SHH, RBM33, and NOM1. Other sensitive species share GIMAPs (GTPase, IMAP Family Members), which may be associated with Behcet's syndrome, a condition treated with thalidomide (Shek and Lim, 2002), as well as cancer-related genes, such as RARRES2 and TMEM176A (Supplementary Table 2).
We found a great difference between the number of copies of some genes related to metabolism among the animals, being affected or not by TE (Supplementary Table 3). CYP2C19, the main gene responsible for thalidomide metabolism (Teo et al., 2000), has 10 tandem copies (7 paralogs > 70% id) in the mouse, while only five (>76% id) in humans. CYP3A4 and CYP3A5 also vary in number of copies; mice have 9 (>68% id), while humans have 5 (>75% id) (Supplementary Table 4). There is a core set of 103 and 77 CYPs (Cytochrome P450's-seed PF00067-pFam) in mouse and human genomes ( Supplementary  Table 5), respectively, enzymes involved directly in the oxidative metabolism.

Gene Alignments Revealed Exclusive Unaffected Species' Variants
Considering that thalidomide is not a selective pressure factor, positive selection sites were not searched. Thus, even a neutral substitution for intrinsic processes of the organisms could add some effect to molecular interactions of this medication into the cell. As a result, we found 299 amino acid substitutions exclusive in insensitive species in 34/42 analyzed genes. Proteins with a higher number of variants in the insensitive group were RECQL4 (65), SALL4 (35), CDH5 (22)

Human Genetic Variability Shares Exclusive Unaffected Species' Variants
For all the gene alignments, the reference genome sequence of each species was used. However, humans have great genetic variability; hence, missense alterations are known to exist in the human species, without being contemplated in the reference genome sequence. Based on this reasoning, all the variants exclusive for insensitive species, according to the comparative sequence alignment, were evaluated in the gnomAD database, aiming to identify similar alterations in humans.
Twenty-four variants, which led to the same missense alterations observed in mice and rats, were encountered in gnomAD registers, distributed across 13 genes (Supplementary Table 7). One missense substitution, p.P72A in Trp53 of mice and rats, has been registered to occur after two possible genomic alterations in humans: (I) rs587782769, changing the first nucleotide of the codon; and (II) rs 1042522, modifying the second nucleotide of the codon. In mice and rats, the variant occurs by the exchange of the first nucleotide and the last nucleotide of the codon, being CCC in humans and GCA in mice and rats.
Functional prediction was performed in the VarSome database according to the criteria established by the American College of Medical Genetics (Richards et al., 2015). Fourteen variants were predicted as "likely benign" and 10 as "variants of unknown significance." The variant rs587782769 of TP53 was the only one predicted as "likely pathogenic, " because of previous associations with Li-Fraumeni syndrome, a predisposing syndrome of hereditary cancer. The other TP53 variant that leads to p.P72A, rs 1042522, is of unknown significance in Li-Fraumeni syndrome. However, rs1042522 was the only variant identified with high frequency in human populations, having a global minor allele frequency of 0.6629. This was also the only alteration to be registered in the PharmGKB database, with a pharmacogenetic association for different drugs, all antineoplastic agents. The CRBN gene presented one variant of unknown significance, and SALL4 presented two likely benign variants.
Overall, some variants encountered in insensitive species are reported as exclusive for these animals; however, when evaluating the immense variability of the human genome, it was possible to identify the same alterations in humans.

Crbn Identified as a Deregulator of Gene-Pair Co-expression
The original results for the differential gene expression analysis are available in Gao et al. (2015). Here, we evaluated the co-expression of gene pairs through Pearson correlation, aiming to identify whether this correlation is disrupted in thalidomide exposure. Forty-two genes previously selected, and the genes identified to be exclusive of the insensitive species in the evolutionary analysis, were included, totalizing 71 genes. A heatmap comprising the Pearson correlation for all the FIGURE 3 | Pearson correlation for differentially co-expressed gene pairs in controls (A) and thalidomide (B) exposed cells. Networks of differentially co-expressed gene pairs for controls (C) and thalidomide (D) exposed cells. Legend: (A,B) positive correlations represented in red; negative correlations in blue; absence of correlation in white. (C,D) genes that drive co-expression are represented in pink nodes; the other gene that composes the pair is represented in blue nodes. Edges' thickness represents Pearson's r correlation coefficient; gray edges = switched opposites (r change > 1); orange edges = differentially signed (r change < 1, but altering correlation from positive to negative, or contrariwise); green edges = same signed (correlation was changed, but maintained as positive or negative). Figure 3A (for controls) and Figure 3B (for thalidomide exposure). Evaluating the red and blue areas, it is possible to visualize stronger expression correlations in controls, when compared to the white areas of the thalidomide-exposed sample heatmap. Nevertheless, aiming for a more robust statistical evaluation, a differential co-expression analysis was performed. This strategy aims to identify patterns of deregulation between genes, which could point to a dysfunctional pathway or mechanism and prioritize the main candidates in a list of hypothesized genes (de la Fuente, 2010; Savino et al., 2020).

gene pairs obtained is available in
Control vs. thalidomide-exposed mESC were evaluated, to determine the gene pairs with differential co-expression, and the gene of the pair that is probably driving this differential co-expression. As a result, 49 gene pairs differentially coexpressed were significant. The network of differential coexpression correlations can be visualized in Figure 3C (controls) and Figure 3D (thalidomide-exposed). Only Crbn, Cyp2c68, and Rbm8a genes were considered as significant drivers of the co-expression after FDR P-value adjustment (Supplementary Table 8). In 19 gene pairs, Crbn acted as the driver of this coexpression correlation; the other genes in the pairs are associated with limb reduction defects' syndromes.
In regard to Crbn-Sall4 Pearson correlation, it is set in r = 0.75 in control samples and r = -0.48, after thalidomide exposure. Hence, a positive correlation is altered to negative, with a correlation difference of 1.23. The biggest correlation difference encountered when Crbn was a co-expression gene driver was 1.37, for the Crbn-Recql4 pair.
To better comprehend the co-expression correlations obtained, the same thalidomide-affected genes were analyzed in mESC cells exposed to three different teratogens, separately: valproic acid, retinoic acid, or warfarin. No statistically significant association was identified for evaluated genes (Supplementary Table 9). Another differential co-expression analysis was performed in the thalidomide-exposed mESC array (GSE61306), using a different set of genes, from the NF-kB pathway, known to be affected by thalidomide (Hansen and Harris, 2004). Five genes were identified as co-expression drivers, providing 70 gene pairs differentially co-expressed in control vs. thalidomide-exposure samples (Supplementary Table 10). These results indicate that the gene selection performed was careful and relevant to evaluate thalidomide-induced mechanisms; it also suggests that the positive association between Crbn and the limb reduction defects' genes is not spurious.
In sequence, we evaluated which TFs could be regulators of the differentially co-expressed gene pairs. The results of this strategy are available in Supplementary Figure 1. Ikzf1, a C2H2 transcription factor degraded by thalidomide, is proposed as a driver of the co-expression, forming gene pairs with Dkk1, Vegfa, and Kdr. These genes have three other TFs in common: Bmi1, Ets1, and Runx2 (Supplementary Table 11), hence being suggested as potential regulators that may also alter the candidate gene expression.
In summary, differential co-expression analysis provided insights regarding thalidomide effects in mESC. First, Crbn drives the co-expression of other genes implicated in genetic, non-teratogenic, limb reduction defects, such as Tbx5, Esco2, Recql4, and Sall4; a fifth gene implicated in limb reduction defects, Rbm8a, was also considered a driver of the co-expression correlation. Second, Crbn and Sall4 have a moderate coexpression correlation, which is affected in thalidomide exposure, even though the typical TE phenotype is not identified in mice. Finally, Ikzf1 might be a regulator of two essential genes for angiogenesis: Vegfa and its receptor Kdr.

DISCUSSION
A comparative genomics analysis was associated with an evaluation of gene-gene co-expression to identify candidate genes for thalidomide teratogenesis variability across species. Twenty genes upstream NOS3 were different when comparing sensitive and insensitive species neighborhoods. Co-expression analysis suggests that Crbn drives the co-expression of limb reduction defects' genes, including Sall4, in mESC exposed to thalidomide. This might indicate a deregulation between Crbn-Sall4 even in a species known to be insensitive to thalidomide exposure.
During more than 50 years of research, properties of thalidomide and its analogs have been elucidated through animal model studies. Some mechanisms such as anti-angiogenesis (D'Amato et al., 1994;Therapontos et al., 2009) and oxidative stress (Hansen and Harris, 2004), evaluated in these studies, have an extremely relevant role in the whole understanding of the TE specificity. Mammals have an additional layer of complexity: the maternal-fetal interface, the mother metabolism genes affecting the drug availability and placental transfers. Recently, a new light was brought to the understanding of the TE scenario, due to the discovery of SALL4 degradation mediated by Cereblon-thalidomide (Donovan et al., 2018;Matyskiela et al., 2018). Mouse resistance to the TE development was justified by the presence of missense variants both in CRBN and in SALL4 (Donovan et al., 2018;Matyskiela et al., 2018). In the present study, we identified differences among sensitive and insensitive animals in all mechanisms studied associated with thalidomide teratogenic effects, which could explain the heterogeneous outcome in the embryos, as we reported in humans (Kowalski et al., 2020).
Anti-angiogenesis is one of the most studied properties in TE. Our analysis of differential regulation suggests a co-expression correlation between an angiogenesis gene, Vegfa, and its receptor, Kdr, driven by the Ikzf1 gene. Ikaros, the protein encoded by Ikzf1, degradation is demonstrated to be an important event in thalidomide therapeutics in multiple myeloma (Krönke et al., 2014). Furthermore, thalidomide anti-angiogenic property is suggested as independent of Cereblon binding Peach et al., 2020). Hence, the differential co-expression correlations and variants encountered in insensitive species might help to understand thalidomide anti-angiogenic effects in these animals.
Differences in anti-angiogenesis genes also have a relevant role in the whole understanding of TE specificity. In previous studies, evaluating individuals with TE, our group identified variants of susceptibility in NOS3 (Vianna et al., 2013;Kowalski et al., 2016). Our synteny and neighborhood analyses demonstrated nearby genes to NOS3 differ in unaffected rodent species, rabbit species, and other TE species. Eight genes close to NOS3 in the rabbit are involved in conditions similar to TE, including acheiropody and polydactyly. A hypothesis is that alterations in the pattern of expression of these genes during development could favor the establishment of TE characteristics, considering TE conserved regulatory mechanisms. Interestingly, van den Boogaard et al. (2019) identified, in mice, a transcribed distal enhancer involved in the KCNH2 gene regulation (the first gene upstream to NOS3, present in all species in Figure 2). In vitro knockdown of the ncRNA enhancer transcript results in reduced expression of Kcnh2b and two neighboring mRNAs, Nos3 and Abcb8. Also, multiple partially redundantly active enhancers interfere in the complex regulatory landscape of Kcnh2. However, this evidence of a connection between the regulation and expression of NOS3 immediately nearby genes refers to the syntenic region described in our results. Whether TE-tested species' variable NOS3 upstream regions share similar regulatory relationships according to their genes content remains to be explored.
Historical reports estimated that 20-50% of the human embryos exposed to thalidomide actually developed TE (Newman, 1986). Our studies have reported variants in the TE individuals when compared to Brazilians without congenital anomalies (Vianna et al., 2016;Gomes et al., 2019;Kowalski et al., 2020). The rare variants we identified in the canonical sequences of insensitive species were absent in that TE sample. The polymorphism rs1042522 of TP53 was genotyped in TE and unaffected individuals; however, its frequency was similar in both groups (Gomes et al., 2017). Despite that, rs1042522 is a functionally well-characterized polymorphism (Dumont et al., 2003) that could also help to explain interspecific variability to thalidomide-teratogenesis. p53 is implicated as a regulator of the embryo's susceptibility to teratogens (Torchinsky and Toder, 2010). Here, we demonstrate three variants in CRBN and 35 alterations in SALL4 which are exclusive in mice and rats. The implication of these variants in Cereblon-thalidomide-induced degradation must be further evaluated.
The gene-specific approach used here to evaluate potential regulators in the co-expression analysis is the most fine-grained method in this type of analysis, usable for identification of genes that more strongly change their connectivity with other genes (Savino et al., 2020). In cancers, this approach has been used to prioritize therapeutic targets (Savino et al., 2020). Global analysis could be useful for the selection of new gene candidates; however, the prioritization of candidate regulators by biological relevance improves the chance of identifying true associated factors . Here, this prioritization was provided by accessing only genes that had been previously, experimentally validated in thalidomide exposure. Hence, differential co-expression analysis proposed Crbn as a gene that drives co-expression when evaluating different gene pairs.
In humans, ESCO2, TBX5, and RECQL4 cause Roberts, Holt-Oram, and Baller-Gerold syndromes, respectively (Li et al., 1997;Vega et al., 2005;Van Maldergem et al., 2006). Besides, mutations in Rbm8a lead to thrombocytopenia-absent radius syndrome (Albers et al., 2012). All these genetic conditions are phenotypically very similar to TE (Cassina et al., 2017) and require differential diagnosis (Mansour et al., 2019). Despite being an exploratory result, this is the first suggestion of a thalidomide-driven effect in these genes, which might help to explain the phenotype similarities. It is known that Sall4 expression is not reduced in thalidomide exposure, because the degrading mechanism occurs posttranslationally (Donovan et al., 2018). Here we demonstrate Crbn-Sall4 gene co-expression being affected after thalidomide exposure in mESC. This result does not invalidate the idea of a genetic resistance driven by both Crbn and Sall4 variants; however, the altered co-expression in mice raises the hypothesis whether this differential co-expression is also present in TE-affected species. A positive result in the affected species would point to an associated pre-transcriptional mechanism that may help to understand the TE susceptibility, as we have reported in humans .
Concluding, we found several differences between sensitive and insensitive animals regarding known mechanisms of thalidomide teratogenicity: we identified gene copy number divergences among species; non-syntenic genes at the upstream vicinity of NOS3 in rodent, rabbit, and other TE species; and 299 variants in mice and rats within genes of interest. Some variants are in domains, and, until now, they are original data toward the thalidomide teratogenesis molecular context. Consequently, we could not predict if they would be neutral or functional for TE differential outcomes. However, 24 substitutions are rare missense alterations in humans. These results can inspire in vitro and in vivo additional investigations. We also performed a differential co-expression analysis in mESC exposed to thalidomide and identified Crbn as a gene that drives the expression correlation of a series of genes reported previously as affected by thalidomide, as well as genes related to limb reduction defects (of which TE is a phenocopy). These correlated gene pairs include Crbn-Sall4. Finally, differential regulation analysis suggests Ikzf1 as a driver gene for co-expression as well, a result that deserves to be explored in depth in other studies, especially in vivo.
The research here conducted complements the recent results regarding thalidomide comparative teratogenesis. Our study is limited by the lack of experimental data of thalidomide exposure in other affected and unaffected species. However, this is an exploratory analysis that provides new insights for this challenge. The outburst of babies born with TE led to many changes in legislation concerning animal model research and drug commercialization around the world (Daemmrich, 2002). In order to guarantee safe commercial liberation of a medicine, all drugs must be tested at least in two animal species, one non-rodent (Cook and Fairweather, 1968), and results in these models must not be completely extrapolated to human species, once teratogenesis is not a species-specific process (De Santis et al., 2004). Hence, studies of comparative teratogenesis like this are important to trace strategies for avoiding a new thalidomide tragedy.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s. Data studied was obtained from the public genomic databases Ensembl and GenBank (NCBI). Gene expression data is available in Gene Expression Omnibus (GEO) database, series number GSE61306. Confirmatory analyses were performed in valproic acid, retinoic acid or warfarin-exposed mESC, available in the ArrayExpress database (accession numbers: E-TABM-903, E-TABM-1205, and E-MTAB-300).

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because the present study uses only secondary data, available in GEO database (GSE61306) and ENA database (E-TABM-903, E-TABM-1205, and E-MTAB-300), as reported in the Data Availability Statement.

AUTHOR CONTRIBUTIONS
TK and GC-G contributed in devising the concept, designing and conducting the analyses, and writing the manuscript. JG contributed in devising the concept, designing the analysis, and performing the analyses. LS-F contributed in devising and supervising the analyses. LF, MR-M, VP-C, and FV contributed in devising the concept, designing the analyses, supervising the analyses, and correcting the manuscript. All authors discussed the results and contributed scientifically to the manuscript.

ACKNOWLEDGMENTS
We deeply thank L. Neto, A. Dupont, M. Furtado, and B. Rengel for the technical assistance, and all the members of the Laboratory of Genomic Medicine (HCPA), Laboratory of Medical Genetics and Evolution (UFRGS), and Bioinformatics Core (HCPA) for all the helpful discussions and scientific support.