Computational Detection of Stage-Specific Transcription Factor Clusters during Heart Development

Transcription factors (TFs) regulate gene expression in living organisms. In higher organisms, TFs often interact in non-random combinations with each other to control gene transcription. Understanding the interactions is key to decipher mechanisms underlying tissue development. The aim of this study was to analyze co-occurring transcription factor binding sites (TFBSs) in a time series dataset from a new cell-culture model of human heart muscle development in order to identify common as well as specific co-occurring TFBS pairs in the promoter regions of regulated genes which can be essential to enhance cardiac tissue developmental processes. To this end, we separated available RNAseq dataset into five temporally defined groups: (i) mesoderm induction stage; (ii) early cardiac specification stage; (iii) late cardiac specification stage; (iv) early cardiac maturation stage; (v) late cardiac maturation stage, where each of these stages is characterized by unique differentially expressed genes (DEGs). To identify TFBS pairs for each stage, we applied the MatrixCatch algorithm, which is a successful method to deduce experimentally described TFBS pairs in the promoters of the DEGs. Although DEGs in each stage are distinct, our results show that the TFBS pair networks predicted by MatrixCatch for all stages are quite similar. Thus, we extend the results of MatrixCatch utilizing a Markov clustering algorithm (MCL) to perform network analysis. Using our extended approach, we are able to separate the TFBS pair networks in several clusters to highlight stage-specific co-occurences between TFBSs. Our approach has revealed clusters that are either common (NFAT or HMGIY clusters) or specific (SMAD or AP-1 clusters) for the individual stages. Several of these clusters are likely to play an important role during the cardiomyogenesis. Further, we have shown that the related TFs of TFBSs in the clusters indicate potential synergistic or antagonistic interactions to switch between different stages. Additionally, our results suggest that cardiomyogenesis follows the hourglass model which was already proven for Arabidopsis and some vertebrates. This investigation helps us to get a better understanding of how each stage of cardiomyogenesis is affected by different combination of TFs. Such knowledge may help to understand basic principles of stem cell differentiation into cardiomyocytes.

Transcription factors (TFs) regulate gene expression in living organisms. In higher organisms, TFs often interact in non-random combinations with each other to control gene transcription. Understanding the interactions is key to decipher mechanisms underlying tissue development. The aim of this study was to analyze co-occurring transcription factor binding sites (TFBSs) in a time series dataset from a new cell-culture model of human heart muscle development in order to identify common as well as specific co-occurring TFBS pairs in the promoter regions of regulated genes which can be essential to enhance cardiac tissue developmental processes. To this end, we separated available RNAseq dataset into five temporally defined groups: (i) mesoderm induction stage; (ii) early cardiac specification stage; (iii) late cardiac specification stage; (iv) early cardiac maturation stage; (v) late cardiac maturation stage, where each of these stages is characterized by unique differentially expressed genes (DEGs). To identify TFBS pairs for each stage, we applied the MatrixCatch algorithm, which is a successful method to deduce experimentally described TFBS pairs in the promoters of the DEGs. Although DEGs in each stage are distinct, our results show that the TFBS pair networks predicted by MatrixCatch for all stages are quite similar. Thus, we extend the results of MatrixCatch utilizing a Markov clustering algorithm (MCL) to perform network analysis. Using our extended approach, we are able to separate the TFBS pair networks in several clusters to highlight stage-specific co-occurences between TFBSs. Our approach has revealed clusters that are either common (NFAT or HMGIY clusters) or specific (SMAD or AP-1 clusters) for the individual stages. Several of these clusters are likely to play an important role during the cardiomyogenesis. Further, we have shown that the related TFs of TFBSs in the clusters indicate potential synergistic or antagonistic interactions to switch between different stages. Additionally, our results suggest that cardiomyogenesis follows the hourglass model which was already proven for Arabidopsis and some vertebrates. This investigation helps us to get a better understanding of how each stage of cardiomyogenesis is affected by different combination of TFs. Such knowledge may help to understand basic principles of stem cell differentiation into cardiomyocytes.

INTRODUCTION
Transcription factors (TFs) regulate the expression of genes and genetic programs to maintain survival and adaption to the environment in adult organisms as well as in embryoand organogenesis. Most of them bind to recognized specific sequences in the DNA regulatory regions of genes and modify transcription, such as the assembly of the gene expression machinery. In mammalian tissues TFs often work in combinatorial interactions for precise regulation of specific programs (Boyer et al., 2005;Odom et al., 2006;Hu and Gallo, 2010;Neph et al., 2012). Such interactions can be positive, resulting in an enhanced expression of a gene or negative, resulting in reduced expression of a target gene. Thus, the identification of co-occurring transcription factor binding sites (TFBSs) in the promoter regions of regulated genes indicate potential combinatorial interactions between TFs that are important for understanding the molecular mechanisms, e.g., of tissue development during embryogenesis.
The human heart is the first organ formed during embryogenesis (Kirby, 2002;Brand, 2003;Buckingham et al., 2005;Brewer and Pizzey, 2006;Schleich et al., 2013), and it consists of different cell types, which develop simultaneously and are regulated by TFs as well as their combinatorial interactions. Until now, several groups analyzed TFs and their influence on cardiac development (Ryan and Chin, 2003;Pikkarainen et al., 2004;Peterkin et al., 2005;Brewer and Pizzey, 2006;Martin et al., 2010;Shi and Jin, 2010;Turbendian et al., 2013;Chaudhry et al., 2014;Takeuchi, 2014;Wang and Jauch, 2014). These studies mainly focus on individual TFs or their related families e.g., GATA family, TBX family, or NKX2 family (Ryan and Chin, 2003;Pikkarainen et al., 2004;Miura and Yelon, 2013;Turbendian et al., 2013). However, a detailed analysis of interactions between TFs and their role in cardiac development is limited to interactions between known cardiac TFs like NKX2-5 or MEF2 which are essential for the generation of cardiac tissues from stem cells (Martin et al., 2010;Sylva et al., 2014;Takeuchi, 2014). A complete survey of potential TF interactions by co-occurring TFBSs in the promoter regions of genes which regulate cardiac development is still missing, but needed to understand embryonic cardiac development, in particular of cardiomyocytes (CMs).
CMs comprise the most important functional cells in the human heart (Ye et al., 2013;Sylva et al., 2014). CMs show a limited potential to regenerate after myocardial infarction or other cardiovascular diseases (CVDs), which is at maximum 50% CM renewal per lifetime and less than 1% per year (Bergmann et al., 2009;Sylva et al., 2014;Takeuchi, 2014). Replacing CMs in elderly by for example enhanced cardiomyocyte proliferation may improve the quality of their life, but requires an understanding of how CMs develop and of how they can be replaced (Akhurst, 2012;Ye et al., 2013;Euler, 2015).
One approach is to apply tissue engineered myocardium to restore muscle mass and thus reintroduce contractility (Zimmermann et al., 2006). Such tissues can be generated from embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), or parthenogenetic stem cells (Soong et al., 2012;Didié et al., 2013;Ye et al., 2013;Tiburcy and Zimmermann, 2014). Controlling cardiomyogenesis in vitro requires insight into biological processes governing embryonic heart development. To understand cardiac development from a systems biology perspective, identification of the mechanisms controlling the expression of fate determining TFs and their regulation of transcription are of fundamental importance. Co-occurring TFBSs in the regulatory regions of genes which are specific for a particular developmental stage reveal potential TF interactions that are likely to regulate these stages. There are in fact plenty of TF-TF interactions known as implicated in organogenesis, but the specific time points when particular interactions occur, are difficult to obtain and mostly not annotated in public databases. Only intense literature surveys provide such information.
Recent studies identifying the co-occurrence of TF pairs focus either on combinatorial approaches where e.g., specific DNAsequences bound by different TFs simultaneously were selected from a library of random sequences (Jolma et al., 2015) or approaches that focus on data integration e.g., ChIP-seq, SELEX together with Hi-C to reveal long-range chromatin interactions (Jolma et al., 2013;Wong et al., 2016). Although the selection of interacting TF pairs from a library of random sequences underpins potential interactions of TFs, it does not give any hints on the actual interactions in particular cell types or tissues. Data integration and especially Hi-C technology is very promising for the future, but currently there is a lack in publicly available data sets that cover the time dependent organogenesis of the human heart.
In this study we analyze a time series dataset obtained from RNAseq at different time points of in vitro cardiomyogenesis (Hudson et al.; in revision) to identify co-occurring TFBSs which indicate potential interacting TFs that are crucial for understanding the gene regulatory mechanisms during the heart development. The dataset consists of six different time points (day: 0, 3, 8, 13, 29, and 60) where the gene expression in the tissue culture was measured by RNAseq. The data comprises early heart development in general and can be differentiated in the following major developmental stages: (i) mesoderm induction stage (day 0-day 3); (ii) cardiac specification stage (day 3-day 13; early 3-8, late 8-13); (iii) cardiac maturation stage (day 13-day 60; early 13-29, late 29-60). For each stage we determined the set of unique differentially expressed genes (DEGs) utilizing limma on the FPKM-values in the dataset (Smyth, 2004). To identify specific TF interactions in individual stages, we analyzed the promoter sequences of corresponding DEGs employing the MatrixCatch approach (Deyneko et al., 2013). As a result, we observed a set of co-occurring TFBSs for each stage whose corresponding TFs are likely to represent potential core regulators of a particular developmental stage. Although the analyzed DEGs are unique in each stage, the identified TFBS pairs are highly overlapping between stages. To overcome this problem in MatrixCatch results, we further applied Markov clustering algorithm (MCL; Dongen, 2000) for the detection of clusters which contain stage specific co-occurrences between TFBSs. In recent years, MCL has gained great attention in the bioinformatics community for the detection of highquality clusters in biological networks due to its highly effective and successful algorithm. Especially, for the clustering of proteinprotein interaction networks, several studies have shown that MCL is superior to conventional clustering approaches in terms of detection of high-quality and more accurate functional clusters (Brohée and van Helden, 2006;Vlasblom and Wodak, 2009;Shih and Parthasarathy, 2012). These articles encouraged us to utilize MCL for the elimination of negligible pairs at each stage and thus for the determination of remaining TFBS pairs, which may play crucial roles during cardiomyogenesis. To this end, we focused on clusters whose central binding site is present at almost all stages, but its partners differ stage-specifically. These clusters may regulate DEGs in each stage and are likely to be fundamentally implicated in cardiac muscle development.

MATERIALS AND METHODS
In this section we describe the differentially expressed genes analyzed and the methods applied and partly developed. Our analysis follows the structure of Figure 1.

Selection of Differentially Expressed Genes
The data, available as a FPKM normalized RNAseq time series, was mapped to corresponding gene symbols (hgncsymbols) and further analyzed using limma package from the Bioconductor project for R with standard procedures (Smyth, 2004;R Core Team, 2015). The time series data describe human cardiomyogenesis in vitro at time points day 0, 3, 8, 13, 29, and 60, whereas day 0 resembles blastocyst stage development and day 60 early fetal stages (Hudson et al.; in revision). We calculated DEGs between two time points which define a particular developmental stage where: (i) day 0-3 defines the mesoderm induction stage; (ii) day 3-8 early cardiac specification; (iii) day 8-13 late cardiac specification; (iv) day 13-29 early cardiac maturation and; (v) day 29-60 the late cardiac maturation stage (this stage describes the transition from an embryonic to a fetal cardiac maturation stage). We filtered the set of all DEGs for protein coding genes (excluding TFs) and their uniqueness in a stage by comparison to all other stages with p-value ≤ 0.05 and FDR ≤ 0.01 (see Supplementary File 1). A heatmap of stage-specific DEGs is given in Supplementary File 2.

Promoter Sequences
Using UCSC genome browser (Karolchik et al., 2004), we extracted for each protein coding gene (RefSeq gene) based on its annotated transcription start site (TSS) the -1 kb putative regulatory promoter region.
It is important to note that, according to TSS annotations, a RefSeq gene can have multiple overlapping promoter regions which results in overestimation of the importance of some transcription factor binding sites (TFBSs). Thus, following the line of PC-TraFF to remove the redundancy between sequences, we filtered them regarding their TSSs (Meckbach et al., 2015). Consequently, we used in our analysis only those sequences which have no overlap. In this study, the assembly of the hg19 release of the human genome was used and only UCSC track refGene annotations were considered which correspond to the chromosomes chr1-chr22, chrX, and chrY.

MatrixCatch Analysis
MatrixCatch is a novel method introduced by Deyneko et al. (2013) to recognize experimentally verified TF pairs based on the co-localization of their TFBSs, known as composite regulatory modulues (CRMs), in single promoters. To detect CRMs in the individual sequences under study, MatrixCatch scans each sequence and its reverse complement using a special library of position weight matrices (PWMs). This library has been specified by considering the TF binding scores, relative orientations and distances between TFs that are experimentally known to interact, as documented in the TRANSCompel database (Kel-Margoulis et al., 2002). Consequently, the usage of MatrixCatch yields an important practical advantage since this method provides a high number of known CRMs in sequences with their biological interpretation (for details, see Deyneko et al., 2013).
In our study, we applied MatrixCatch to the promoter sequences of the filtered DEGs of the different heart developmental stages. As we have recently suggested in PC-TraFF (Meckbach et al., 2015), we prefer in this study the usage of TFBS pairs instead of CRMs, since those pairs were detected in a set of sequences. This indicates the importance of potential collaborations between corresponding TFs in the gene set of interest.

Clustering of Co-Occurring TFBSs
Since MatrixCatch provides all detected TFBS pairs of experimentally verified TF interactions in promoters, the detected pairs are highly overlapping between developmental stages. To differentiate stage specific roles of TFBS pairs, we first determined the frequency of each pair in MatrixCatch results. After that, we applied the Markov clustering algorithm (MCL; Dongen, 2000) which is able to eliminate negligible TFBS pairs based on their frequencies at each stage. To this end, we constructed an interaction network based on the TFBS pairs for each heart developmental stage, where nodes are TFBSs and edges display the co-occurrences between them.
Let N : = V, E be an undirected interaction network of TFBS pairs where any two elements (v i , v j ∈ V) of N are connected by an edge e (v i ,v j ) belonging to E, if and only if the corresponding TFBS pair was identified by MatrixCatch. Further, w(v i , v j ) denotes the weight of an edge e (v i ,v j ) , which represents the observed frequency of the TFBS pair (v i , v j ) found by MatrixCatch in the promoter sequences of genes under study.
Based on the weights of edges, an adjacency matrix A n×n of each network was constructed as A n×n was then converted into a row stochastic "Markov" matrix M n×n , where m i×j represents the transition probability between nodes v i and v j in the network under study. The most common way to construct a row stochastic transition matrix M is the normalization of rows in A to sum to 1. This process can be simply given as: M = −1 · A, where is a n × n diagonal degree matrix and defined as: Based on matrix M, we employed MCL (Dongen, 2000) to detect densely connected TFBSs in each network. Briefly, the basic intuition of MCL was based on a simulation of stochastic flows on the underlying interaction network to separate highflow regions from low-flow regions. To this end, Expand and Inflate operations were applied on M until M reaches its steady state. While the Expand operation corresponds to matrix multiplication (M = M × M), the Inflate operation is used to increase the contrast between higher and lower probability transitions by taking each entry m i×j in M to the power of inflation parameter r > 1. Finally, M was re-normalized into a row stochastic matrix. The pseudo-code for MCL is given in Algorithm 1.

Algorithm 1 : Markov Clustering Algorithm
Input: M and r > 1 Output: C: A list of clusters Methode:

RESULTS
We analyzed a time course data set which covers heart muscle development in human embryonic stem cell derived tissue cultures at days 0, 3, 8, 13, 29, and 60 (Hudson et al., in revision). These time points cover the mesoderm induction stage (day 0-day 3), the cardiac specification stage (day 3-day 13), and the cardiac maturation stage (day 13-day 29). We further defined cardiac specification and cardiac maturation into two more stages, i.e.,: (i) early cardiac specification and maturation stage from days 3-8 and days 13-29, respectively; (ii) late cardiac specification and maturation with transition from embryonic to fetal stages defined by culture days 8-13 and days 29-60, respectively. By comparison of neighboring time points, for each stage, we determined the set of DEGs and filtered them according to their uniqueness in a particular stage. Afterwards, we utilized Due to underlying methodology of MatrixCatch, the detected TFBS pairs show a large overlap between different stages although they may play different roles in these stages. To reduce this drawback of MatrixCatch, we further applied Markov clustering algorithm that seeks to remove negligible TFBS pairs by emphasizing the roles of remaining pairs at each stage. Consequently, we obtained (i) 19 clusters for the mesoderm induction stage; (ii) 25 clusters for the early cardiac specification stage; (iii) 11 clusters for the late cardiac specification stage; (iv) 21 clusters for the early cardiac maturation stage, and (v) 24 clusters for the late cardiac maturation stage (see

Supplementary File 4).
We focused only on clusters with V$AP1_01, V$HMGIY_Q6, V$SMAD_Q6_01, and V$NFAT_Q6 binding sites in their center (see Figure 2), because these clusters contain at least three interactions and the changes in their constitution provide crucial information about different cardiac developmental stages. We analyzed the TFBS pairs in these clusters according to their potential role in cardiac development. We omitted clusters, when the expression values of TF genes are below a certain threshold or their importance in heart development is currently unknown. For our analysis, we applied a FPKM threshold value of 10, which discriminates robustly between expressed TF genes and low or not expressed TF genes.

AP-1-Cluster
The AP-1-cluster is an assembly of different TFBSs with the V$AP1_01 binding site in its center (see Figure 2A). As described in Table 1 and in Figure 3, V$AP1_01 binding site co-occurs with V$OCT_C binding site during mesoderm induction (< day 3) and early cardiac specification stage (day 3-day 8) and at late cardiac maturation stage (> day 29). Further, V$AP1_01 co-occurs with V$GATA_Q6 binding site at all stages except days 8-13. Interestingly, a co-occurring pair between V$AP1_01 and V$HNF4_Q6 binding site was detected only between day 3 and day 8. Additionally, Figure 3 shows for these TFBSs the related TF genes which are expressed in at least one time point.
AP-1 is a family of leucine zipper transcription factors (bZIP) which forms homo-or heterodimers composed of proteins belonging to JUN or FOS protein families (Shaulian and Karin, 2002;Hess et al., 2004;Shaulian, 2010). AP-1 plays a role in the regulation of general functions like proliferation, differentiation, and apoptosis. We identified that V$AP1_01 co-occurs with V$OCT_C binding sites which are bound by AP-1 and POUdomain factors like POU5F1, respectively. POU5F1 is also known as OCT-4, which is an important pluripotency maintenance factor (Schöler et al., 1990;Nichols et al., 1998;Pesce and Schöler, 2001;Guo et al., 2002). Regarding the expression values, POU5F1 shows higher expression in early stages (< day 8) and is absent after day 13 (see Figure 4B). This is in contrast to AP-1, where AP-1 components (FOS as well as JUN) are not present or only present at reduced levels during early stages, but they show increased expression values after day 13 (see Figure 4A). This suggests that AP-1 may not be formed during early stages, where POU5F1 controls the associated genes, and that during the late cardiac maturation stage (> day 29) the analyzed genes are under control of AP-1.
Our analysis identified a co-occurrence of V$AP1_01 with V$GATA_Q6 binding sites. GATA factors form a protein family of six zinc finger transcription factors that share a highly conserved DNA-binding sequence (Orkin, 1992;Ohneda and Yamamoto, 2002;Pikkarainen et al., 2004;Brewer and Pizzey, 2006). As suggested in Brewer and Pizzey (2006), the family can be dissected into two subfamilies (GATA-1,2,3 and GATA-4,5,6), based on their expression levels in different tissues, where only GATA -4, -5 and -6 are associated with cardio-and organogenesis (Pikkarainen et al., 2004;Peterkin et al., 2005;Brewer and Pizzey, 2006;Whitfield et al., 2012;Turbendian et al., 2013). We found only GATA4 and GATA6 to be expressed. Interactions between GATA-factors and AP-1 are well known, especially co-occurrence of AP-1 together with GATA-4 in several heart cell types and in Leydig cells (Herzig et al., 1997;Suzuki et al., 1999;Schröder et al., 2006;Linnemann et al., 2011;Martin et al., 2012). In our system, GATA6 was expressed in high amounts during the mesoderm induction (< day 3) and early cardiac specification Constitution of co-occurring pairs in the AP-1-cluster, a "+" indicates the presence of a pair; a "−" its absence. During the late stage of cardiac specification (Day8-Day13), the cluster is completely absent.
FIGURE 3 | Stage specific representation of TFBSs and the expression of associated TF genes, referring to Figure 2A. The encircled nodes represent the found TFBSs which are connected by color-coded round-edged rectangles which highlight stages where a TFBS pair was found. TF genes which are associated to TFBSs are linked by dashed lines. The TF genes are represented by color-coded rectangles representing the presence at a partiular time point. The absence of a TF gene during a particular time point or the absence of a pair during a particular stage is encoded in white. Both, the color-code for the stage specificity as well as for the gene expression of a TF gene is shown on the bottom right side. TF genes which are associated to a TFBS but are in all time points below the set threshold are omitted.
stage (day 3-day 8) but was not expressed or only at minor extent during cardiac maturation (> day 13, see Figure 4C). In contrast, GATA4 was expressed in high amounts during the late cardiac specification stage as well as during cardiac maturation (> day 8). The missing of AP-1 during mesoderm induction (< day 3) suggests that genes specific for mesoderm induction might be under control of GATA-6, whereas GATA-4 and AP-1 may regulate genes during cardiac maturation (> day 13), synergistically (see Pikkarainen et al., 2004 for the role of GATA-4 and GATA-6).
The role of the co-occurrence between V$AP1_01 and V$HNF4_Q6, which represents a binding site for HNF4A or HNF4G TFs, during cardiomyogenesis is uncertain. This TFBS pair was detected during early cardiac specification stage (days 3-8), but no expression of the related genes could be found. As mentioned before, the formation of AP-1 during this stage at relevant levels is uncertain (see Figure 4A), due to the low expression of the AP-1 components. Furthermore, the role of HNF4-genes, which where frequently reported to be associated with lipid metabolism in the liver (Watt et al., 2003;Chandra et al., 2013), during cardiac development is still unclear, but may point to changes in the metabolism at this stage.

HMGIY-Cluster
The HMGIY-cluster is assembled in a total of five TFBS pairs (see Figures 2B, 5) with the V$HMGIY_Q6 binding site in its center. Table 2 shows the co-occurring TFBS pairs of this cluster and Figure 5 shows for these TFBSs the related TF genes which are expressed in at least one time point. The TFBS pair V$HMGIY_Q6 -V$OCT_Q6 was found during all stages and the co-occurrence between V$HMGIY_Q6 and V$ATF3_Q6 binding sites was found at days 3-8, and after day 29. Interestingly, we found in this cluster three binding sites, namely V$NFKAPPAB_01, V$NFKB_Q6_01, and FIGURE 4 | (A) Expression of AP-1 factor genes. In orange FOS TF genes are shown, and in blue JUN TF genes. At early stages the expression levels of FOS and JUN genes which are AP-1 components is rather limited. It is likely that AP-1 cannot be formed due to the low expression of FOS genes. In later stages (> day 13), AP-1 and especially FOS increases its expression. (B) Expression of TF genes which contain a POU-domain, in blue (POU5F1) and orange (POU2F1) are the two genes which are above the threshold. POU5F1 which is more abundant than POU2F1 decreases during the time course and was absent after day 13. (C) Expression of GATA genes, in blue (GATA6) and orange (GATA4) are above the threshold. GATA6 is expressed during the mesoderm induction stage and decreases afterwards, while GATA4 becomes supreme in subsequent stages. The red lines show a FPKM value of 10 that we consider as threshold for sufficiently expressed genes which contribute to regulatory effects.

Day0-Day3
Day3 Constitution of co-occurring pairs within the HMGIY-cluster, a "+" indicates the presence of a matrix pair; a "−" its absence.
V$NFKB_Q6 which can be bound by the family of NF-κB-related factors. While the V$HMGIY_Q6 -V$NFKB_Q6 TFBS pair was detected only during the mesoderm induction stage (<day 3), the co-occurrence between V$HMGIY_Q6 and V$NFKB_Q6_01 binding sites was found at all stages. The TFBS pair V$HMGIY_Q6 -V$NFKAPPAB_01 was found at all stages except the late cardiac specification stage (day 8-day 13). To ensure the quality of these three NF-κB binding sites, we further investigated their position weight matrices (PWMs) as well as their binding motifs. Considering the PWMs, we observed that all PWMs have relatively high value of information content (see Table 3) which assess their quality. In addition, a comparison between motifs shows different binding behavior of NF-κBrelated factors which could be linked to specific members of this family. HMGA1 is a TF which is represented by the PWM V$HMGIY_Q6 and was recently described as a positive regulator of pluripotency in cellular reprogramming (Shah et al., 2012). The expression levels of HMGA1 in our system are in agreement with previous studies, which describe HMGA1 as highly abundant during embryogenesis, especially in embryonic stem cells; with intermediate expression levels in undifferentiated cancers and at low or at not detectable levels in adult differentiated cells and fibroblasts (Fusco and Fedele, 2007;Hillion et al., 2008Hillion et al., , 2009Resar, 2010;Chou et al., 2011;Schuldenfrei et al., 2011;Shah et al., 2012;Williams et al., 2015). The detected co-occurrence between V$HMGIY_Q6 and V$OCT_Q6 binding sites was found at all stages. The corresponding TF genes (HMGA1, HMGA2, and POU5F1) of this TFBS pair did not show such behavior (see Figures 4B, 6A). HMGA1 as well as POU5F1 are expressed at high levels during early cardiac development with their maximum expression levels at day 3 and declined afterwards. However, this pair was found at later stages indicating that the detected DEGs at these stages could be potentially regulated by this pair. POU5F1 is below the threshold after day 13, whereas HMGA1 is always above the threshold but stablized at low levels. After day 13, HMGA1, which is in its expression values always more abundant than HMGA2, could regulate the detected pairs alone.
The co-occurrence of V$HMGIY_Q6 and different NF-κB binding sites was detected at all time points (see Table 2). Interestingly, our findings show that this interaction could occur based on different NF-κB binding sites which are bound by the same TFs. It is known that the interaction between HMGA1 and NF-κB plays a pivotal role in formation of an enhancer complex which is essential to regulate interferon-β signaling on FIGURE 5 | Stage specific representation of TFBSs and the expression of associated TF genes, referring to Figure 2B. The encircled nodes represent the found TFBSs which are connected by color-coded round-edged rectangles which highlight stages where a TFBS pair was found. TF genes which are associated to TFBSs are linked by dashed lines. The TF genes are represented by color-coded rectangles representing the presence at a partiular time point. The absence of a TF gene during a particular time point or the absence of a pair during a particular stage is encoded in white. Both, the color-code for the stage specificity as well as for the gene expression of a TF gene is shown on the bottom right side. TF genes which are associated to a TFBS but are in all time points below the set threshold are omitted.
genomic level (Thanos and Maniatis, 1992;Lewis et al., 1994;Wood et al., 1995;Himes et al., 1996;Thanos and Maniatis, 1996;Mantovani et al., 1998;Perrella et al., 1999;Zhang and Verdine, 1999). Within this complex, NF-κB acts on the one hand as a key regulator in hypertrophy and, on the other hand it acts as cardioprotective factor during embryogenesis (Dewey et al., 2011;Gordon et al., 2011;Liu et al., 2012;Zhou et al., 2013). The expression levels of NF-κB genes may indicate an increasing importance of NFKB1 and especially of RELA during cardiac maturation (> day 13), where it is expressed at considerable levels (see Figure 6B).
The co-occurrence of V$HMGIY_Q6 with the V$ATF3_Q6 binding site, which is bound by ATF3, was detected during early cardiac development until day 8 and at the latest stage after day 29. ATF-3 is a FOS-related TF, which contains a basic leucine zipper as structural motif (Chen et al., 1994). ATF-3 acts as homo-or heterodimer to activate or to repress the expression of target genes, depending on its environment. Further, it is also involved in TGF-β signaling in several cell types and in cardiac development (Ishiguro et al., 2000;Mayr and Montminy, 2001;Yan et al., 2005;Gilchrist et al., 2006;Yin et al., 2010;Lin et al., 2014). While HMGA1 is expressed at high levels during early stages (days 0-3) and is declined afterwards, the ATF3 gene is close to the threshold before day 13 and increases its expression levels during subsequent stages (see Figure 6C). Our results suggest that the genes regulated by this pair are under control of HMGA1 in the early stages and ATF-3 afterwards. Gilchrist et al. The family of NF-κB-related factors can be represented by different PWMs each of which have relatively high information content and different binding motifs. (rc) : reverse complement demonstrate the co-occurrence of ATF-3 and NF-κB binding sites in regulated target genes (Gilchrist et al., 2006). According to their binding sites, our analysis suggests that together with ATF-3 and NF-κB factor, HMGA1 may play an important role in the regulation of target genes in cardiac development.

SMAD-Cluster
The SMAD-cluster is assembled in a total of three TFBS pairs with the V$SMAD_Q6_01 binding site in its center (see Figures 2C, 7) . Table 4 shows the co-occurrence of V$SMAD_Q6_01 and V$FOX_Q2 binding sites in the promoters of the regulated genes and was observed during all stages. The TFBS pair V$SMAD_Q6_01 -V$AP1FJ_Q2 was detected in our system at early stages until day 8 and at late stages after day Expression of corresponding TF genes which can be represented by the PWMs (V$NFKAPPAB_01, V$NFKB_Q6_01, V$NFKB_Q6) related to NF-κB binding sites. While NFKB2 never reaches the threshold, RELA and NFKB1 increase their expression levels in later stages (> day 13). (C) Expression of the corresponding ATF3 gene to its TF which can be represented by the PWM V$ATF3_Q6. ATF3 is present in the first stage, but in subsequent stages until day 13 it is quite close to the threshold. It changes its expression levels drastically during the cardiac maturation stage (> day 13). The red lines show a FPKM value of 10 that we consider as threshold for sufficiently expressed genes which contribute to regulatory effects.
13, but not during late cardiac specification stage (days 8-13). In contrast, the co-occurrence between V$SMAD_Q6_01 and V$LEF1TCF1_Q4 was detected only during cardiac specification (days 3-13). In addition, Figure 7 shows for these TFBSs the related TF genes which are expressed in at least one time point.
SMADs are members of a family of transcription factors that form a beta-hairpin structure which interacts with the major groove of the DNA (Burke et al., 1976;Macias et al., 2015). SMAD1-4 which can be represented by the PWM V$SMAD_Q6_01 act as TFs in the nucleus and as signaling molecules, where they are involved in numerous pathways like canonical and non-canonical SMAD-signaling pathways, TGFβ-as well as BMP-and WNT-signaling (Heldin et al., 1997;Leask and Abraham, 2004;Euler-Taimor and Heger, 2006;Pal and Khanna, 2006;Schröder et al., 2006;Leask, 2007;Ruiz-Ortega et al., 2007;Calvieri et al., 2012;Massagué, 2012;Dyer et al., 2014;Euler, 2015). Figure 8A shows that SMAD1, SMAD2, and SMAD4 genes are continuously expressed at all stages. The detected SMAD3 expression after day 3 exceeds the set threshold only slightly. SMAD2 and SMAD4 show the highest expression levels in our system, but the differences in their expression levels are rather small.
The co-occurrence of V$SMAD_Q6_01 and V$FOX_Q2 binding sites was detected at all stages (see Table 4). Recently, the cooperative regulatory interaction of FOX factors, which play an important role in cardiovascular development and in other organs (Yamagishi et al., 2003;Maeda et al., 2006;Seo and Kume, 2006;Fortin et al., 2015), with SMAD3 and SMAD4 has been shown by (Fortin et al., 2015). Although the SMAD-FOX pair can be detected during the whole time course, the expression of FOX-genes is limited to FOXH1, which seems to play a role in early heart development only (< day 13, see Figure 8C).
The co-occurrence between V$SMAD_Q6_01 and V$AP1FJ_Q2 binding sites were found in almost all stages except for the late cardiac specification stage (between day 8 and day 13). In adult CMs, AP-1 together with SMAD proteins modulates hypertrophic, apoptotic and fibrotic pathways. Additionally, AP-1 together with SMAD forces the shift toward apoptosis after stimulation of TGF-β-signaling (Schneiders et al., 2005;Schröder et al., 2006;Euler, 2015). In the embryonic hearts, the activation of TGF-β-pathways results in an induction of cardioprotective functions (Leask and Abraham, 2004;Pal and Khanna, 2006;Leask, 2007;Ruiz-Ortega et al., 2007;Calvieri et al., 2012;Euler, 2015). Although there is no known AP-1 SMAD interaction during cardiogenesis, Yuan et al., shows the interaction of these TFs by usage of AP-1 and SMAD decoy oligodeoxynucleotides, which reduces fibrosis in their study .
The detected TFBS pair V$SMAD_Q6_01 -V$LEF1TCF1_Q4 is limited to the cardiac specification stage (day 3-day 13). TCF-7 and LEF-1 transcription factors, which are represented by V$LEF1TCF1_Q4, can be activated by β-catenin and are involved in canonical WNT-signaling (Brade et al., 2006;Chen et al., 2006;Pal and Khanna, 2006;Kwon et al., 2007;Naito et al., 2010). The measured gene expression of TCF as well as LEF genes shows that during cardiac specification both groups are quite close to or below the set threshold (see Figure 8B). This indicates that no TCF or LEF binding occurs, which may result in the absence of canonical WNT-signaling during cardiac specification.

NFAT-Cluster
The NFAT-cluster consists in a total of six TFBS pairs with V$NFAT_Q6 binding site in its center (see Figures 2D, 9). As described in Table 5 and Figure 9, V$NFAT_Q6 co-occurs with V$PEBP6_Q6 and V$ETS1_B binding sites only during the mesoderm induction stage (days 0-3). Three TFBS pairs, namely V$NFAT_Q6 -V$AP1_C, V$NFAT_Q6 -V$CREBP1CJUN_01, FIGURE 7 | Stage specific representation of TFBSs and the expression of associated TF genes, referring to Figure 2C. The encircled nodes represent the found TFBSs which are connected by color-coded round-edged rectangles which highlight stages where a TFBS pair was found. TF genes which are associated to TFBSs are linked by dashed lines. The TF genes are represented by color-coded rectangles representing the presence at a partiular time point. The absence of a TF gene during a particular time point or the absence of a pair during a particular stage is encoded in white. Both, the color-code for the stage specificity as well as for the gene expression of a TF gene is shown on the bottom right side. TF genes which are associated to a TFBS but are in all time points below the set threshold are omitted.

Day0-Day3
Day3 Constitution of co-occurring pairs within the SMAD-cluster, a "+" indicates the presence of a pair; a "−" its absence.
and V$NFAT_Q6 -V$MAF_Q6_01, were found during the complete time course. The co-occurrence of V$NFAT_Q6 with V$CEBPB_01 binding sites in the promoter regions of the analyzed set of genes was found as present until day 8 and during the cardiac maturation stage after day 13. This TFBS pair was not present during the late cardiac specification stage (days 8-13). In addition, Figure 9 shows for these TFBSs the related TF genes which are expressed in at least one time point. Regulatory roles for NFAT factors, which can be represented by the PWM V$NFAT_Q6, have been discovered in diverse organs and cells, including the central nervous system, blood vessels, heart, skeletal muscle and haematopoietic stem cells (Macián, 2005). In general, an activation of factors of the NFAT family is calcium dependent and has been described to be of specific importance in development of the atrial myocardium and the morphogenesis of heart valves (Graef et al., 2001;Crabtree and Olson, 2002;Schubert et al., 2003;Schulz and Yutzey, 2004). In our system, only NFATC3 and NFATC4 showed expression levels above the threshold. Comparing the expression levels, NFATC4 is more abundant than NFATC3 at all time points, except for day 3, but both genes increase their expression levels at later stages and especially after day 29 (see Figure 10A).  Figure 4A. (A) Expression of corresponding genes to TFs which can be represented by the PWM V$SMAD_Q6_01. Each SMAD is expressed during the complete time course at similiar levels, while the expression levels of SMAD2/4 are higher than the expression levels of SMAD1/3. After beginning of the cardiac specification (> day 3) SMAD4 is slightly more abundant than SMAD2 and remains in this position. (B) Expression of corresponding TF genes which can be represented by the PWM V$LEF1TCF1_Q4, TCF7 is below the threshold set by us as a limit for robust transcription while LEF1 is clearly transcribed after the mesoderm induction stage (> day 3). The SMAD-TCFLEF-pair was found during the cardiac specification stage only (day 3-day 8). (C) Expression of corresponding TF genes which can be represented by the PWM V$FOX_Q2. FOXH1 is the only expressed gene and present until day 13. The red lines show a FPKM value of 10 that we consider as threshold for sufficiently expressed genes which contribute to regulatory effects.
FIGURE 9 | Stage specific representation of TFBSs and the expression of associated TF genes, referring to Figure 2D. The encircled nodes represent the found TFBSs which are connected by color-coded round-edged rectangles which highlight stages where a TFBS pair was found. TF genes which are associated to TFBSs are linked by dashed lines. The TF genes are represented by color-coded rectangles representing the presence at a partiular time point. The absence of a TF gene during a particular time point or the absence of a pair during a particular stage is encoded in white. Both, the color-code for the stage specificity as well as for the gene expression of a TF gene is shown on the bottom right side. TF genes which are associated to a TFBS but are in all time points below the set threshold are omitted.
Constitution of the NFAT-cluster, a "+" indicates the presence of a matrix pair; a "−" its absence.
The detected co-occurrence of TFBS pairs V$NFAT_Q6 -V$AP1_C and V$NFAT_Q6 -V$PEBP_Q6 refers either to NFAT-AP-1 or to NFAT-RUNX interactions which have been mainly observed in the immune system (Macián, 2005). Macián et al. have demonstrated that the interaction between NFAT and AP-1 can be linked to calcineurin dependent pathways as well as to regulation of MAP kinase pathways (Macián et al., 2001). Additionally, NFAT and AP-1 cooperate in naïve T-cells with RUNX TFs as well as with NF-κB in the promoter of IL-2 during T-cell activation (see Figures 10C,E) (Hermann-Kleiter and Baier, 2010). In our system, the low or absent expression of RUNX indicates no relevance for these factors. However, the corresponding binding site can be also occupied by CBFB, which is associated to congenital heart anomalies and is expressed during all time points (Khan et al., 2006).
We found the co-occurring TFBS pair V$NFAT_Q6 -V$MAF_Q6_01 at all stages. For the corresponding factors it has been shown by Hogan et al. that NFAT factors and MAF were able to activate IL-4 promoters (Hogan et al., 2003). Of all TFs linked to V$MAF_Q6_01, BACH1 is expressed at all stages and is always more abundant than the other genes shown in Figure 10B. This suggests a synergistic interaction in gene regulation between these factors during the complete time course. Furthermore, the interaction between NFAT and MAF factors was observed simultaneously at classical NFAT-AP-1 interaction sites (Hogan et al., 2003).
The co-occurrence between V$NFAT_Q6 and V$CEBPB_01 binding sites has been described in liver cell lines by Yang and Chow (2003). The corresponding factors to this pair seem to interact in a formation of a composite enhancer complex (Yang and Chow, 2003). In our system, genes that are linked to V$CEBPB_01 binding sites are not expressed (see Figure 10F). The observation of this pair and its potential role in heart development remains unclear.
The role of the TFBS pair V$NFAT_Q6 -V$ETS1_B, which was detected during the mesoderm induction stage, remains unclear. ETS1, a TF gene which can be linked to the PWM V$ETS1_B, is required for the differentiation of cardiac neural crest (Gao et al., 2010). Although ETS1 was expressed during the mesoderm induction stage (days 0-3), its expression is markedly reduced afterwards. DAXX is another gene that is linked to the PWM V$ETS1_B and is at all time points more abundant than ETS1 (see Figure 10D). The DAXX factor inhibits apoptosis in cardiac myocytes (Zobalova et al., 2008). An interaction between NFAT and DAXX was not found in literature, and thus the role of this pair remains unclear.

DISCUSSION
Today, it is known that in higher organisms transcription factors have to interact with each other to regulate gene expression which leads to a proper development of tissues and organs. So far, several studies have shown that the co-occurence of TF binding sites (TFBSs) on sequences is an essential indication for the identification of interactions between TFs. In this study, we identified co-occurring TFBS pairs by applying MatrixCatch algorithm to the promoter regions of five differentially expressed gene sets, which are based on a time course dataset of developing human myocardium, modeled in a tissue engineering approach (Hudson et al., in revision). MatrixCatch is a statistically affirmed computational method for the recognition of experimentally verified interactions between TFs according to their TFBS localizations in promoters. However, MatrixCatch recognizes based on its underlying algorithm all detectable TFBS pairs of known interacting TFs in promoter regions. This results in a huge overlap between recognized pairs at different stages, although these pairs can play different roles for each stage. To eliminate this drawback of MatrixCatch to some extent, we created an interaction network based on the TFBS pairs for each stage and then applied the MCL algorithm. MCL differentiates negligible TFBS pairs from densely connected TFBS pairs within these interaction networks and thus determines clusters of TFBSs. Such clusters are important to highlight stage specific co-occurrences of TFBS pairs which provide essential knowledge in the understanding of molecular mechanism of cardiac development.
Additionally, we applied our approach to different lengths of putative promoter regions ([from −500 bp to 0], [from −500 bp to +100 bp], [from −1000 bp to 0]) to determine the influence of promoter lengths on the composition of stage-specific clusters. The results denote that there is a considerably high overlap between stage-specific clusters derived from different putative promoter regions (data not shown). Thus, we considered the -1 kb putative regulatory promoter region for our analysis, which is consistent with our experience and provides the most reliable results.
Although, we filtered MatrixCatch outputs using MCL algorithm to reduce weak co-occurrence of TFBSs in each stage, we detected in our analysis several clusters as well as TFBS pairs whose potential role during cardiac development are unclear. One possible reason for the detection of such pairs could depend on the underlying methodology of MatrixCatch. It uses a computational prediction approach which scans promoter FIGURE 10 | Expression of TF genes corresponding to their factors and represented by the PWM V$AP1_C have been shown in Figure 4A. (A) Expression of TF genes which corresponds to NFAT factors represented by PWM V$NFAT_Q6. NFATC3 and NFATC4 are above the set threshold, whereas NFATC3 is more abundant than NFATC4. (B) Expression of TF genes which can be represented by the PWM V$MAF_Q6_01. MAFB shows expression levels slightly above the threshold set by us as a limit for robust transcription during the mesoderm induction stage while MAFF is expressed during cardiac maturation (> day 13). BACH1 is found to be expressed during the complete time course at considerable levels and is always more abundant than all other TF genes, which corresponds to V$MAF_Q6_01. Additionally, BACH1 increases its expression value after the mesoderm induction stage (> day 3). (C) Expression of TF genes which can be represented by the PWM V$CREBP1CJUN_01. ATF2 is expressed during the complete time course and increases its expression value in the latest stage. JUN is expressed at day 0 and after day 13 where it exceeds the expression levels of ATF2. (D) Expression of TF genes which can be represented by the PWM V$ETS1_B. DAXX is expressed during all time points, but its expression diminishes continuously. Nevertheless, it shows expression levels which are always above ETS1. (E) Expression of TF genes which can be represented by the PWM V$PEBP_Q6. Only CBFB shows expression above the threshold and was found as continuously expressed. (F) Expression of CEBPB which can be represented by the PWM V$CEBPB_01. CEBPB is during the complete time course below the set threshold and is considered to be low or not expressed. The red lines show a FPKM value of 10 that we consider as threshold for sufficiently expressed genes which contribute to regulatory effects. sequences and their reverse complements to identify TFBSs using PWMs. However, computational identifications of TFBSs generally suffer from high rates of false positive predictions. Another reason for the detection of those clusters or pairs could be due to genes which are expressed at high levels but play different roles in different tissues. As a result, we could identify such clusters or pairs that might play important roles in the regulation of those genes in other tissues but not in heart. For example, we identified the TFBS pair (V$NFAT_Q6 -V$CEBPB_01) in the NFAT-cluster whose importance has been shown by Yang and Chow in liver (Yang and Chow, 2003), but the potential role of this pair during the cardiac development is unclear. In this context, we also observed the ETS cluster with the V$ETS_Q6 binding site in its center (see Supplementary File 4). Only some individual components, like ETS factors, in this cluster are associated with potential cardiac functionalities. However, considering TFBS pairs in the ETS cluster, we cannot verify their potential role during the cardiac development.
Our results suggest that different types of co-occurring TFBS pairs can be assigned into two main categories: (i) TFBS pairs which are present in the beginning and in later stages but absent in at least one of the subsequent stages; (ii) TFBS pairs which are present during all stages. In our clusters presented in the Result section, there are different co-occurring TFBS pairs, like V$AP1_01 -V$OCT_C and V$HMGIY_Q6 -V$ATF3_Q6, which fall into the first category. Considering the expression values of TF genes for those pairs, we observed that one TF gene was highly expressed in the beginning stages while its partner is expressed at low levels. After the re-occurrence of such a pair in later stages, the measured expression values of TF genes are exactly the opposite. Consequently, the related TFs cannot act in a synergistic manner but rather in an antagonistic manner. Very drastically, we observed this situation in the expression of AP-1 components and POU5F1, which can be linked to V$AP1_01 -V$OCT_C TFBS pair (see Figures 4A,B). Due to this finding we hypothesize that further TFBS pairs, which fall into the first category, could be helpful to enhance our knowledge on the combinatorial code underlying transcriptional regulation of cardiomyogenesis.
This findings could be discussed in the perspective of the "embryonic hourglass" which describes high divergence in the embryonic shape of vertebrates, insects, like Drosophila, and plants, in early and late developmental stages, but minor divergence in mid-stages (Duboule, 1994;Raff and Wolpert, 1996;Kalinka et al., 2010;Quint et al., 2012). In our study, the number of DEGs as well as the number of identified clusters is high in early stages, converge to a minimum during the late cardiac specification stage (day 8-day 13) and increase afterwards again, which is consistent with the general structure of the hourglass model. Furthermore, the identified TFBS pairs, which fall into the first category, could be separated into two different subsets of genes, the one subset is up-regulated before the late cardiac specification stage, while the other subset is up-regulated afterwards and is supposed to regulated cardiac maturation processes. Our findings support the hourglass model derived by previous findings in Arabidopsis as well as several animals (Domazet-Lošo and Tautz, 2010;Kalinka et al., 2010;Quint et al., 2012).
In contrast to the TFBSs pairs in the first category, the co-occurrence of TFBS pairs that fall into the second category seems to indicate a synergistic cooperation between related TFs. In our presented clusters, we obtained several TFBS pairs like V$HMGIY_Q6 -V$OCT_Q6, V$SMAD_Q6_01 -V$FOX_Q2, and V$NFAT_Q6 -V$CREBP1CJUN_01 (for detail see Tables 2-4). Considering the expression values of corresponding TF genes for those pairs, we determined that these genes are regulated similarly. For instance, the TF genes HMGA1 and POU5F1, which are linked to V$HMGIY_Q6 and V$OCT_Q6, respectively, are highly expressed during first developmental stages and diminish their levels after day 3. This condition is also observed for the TFBS pair V$NFAT_Q6 -V$CREBP1CJUN_01 where the associated TF genes are expressed at low levels in the beginning and increase their expression levels in later stages.
Altogether, in our study we performed a systematic analysis of TFBS pairs to address the question of cooperation between TFs linked to TFBS pairs, which could play a crucial role through five different cardiac developmental stages. Addressing this question, our results show that some TFBS pairs can be detected at all developmental stages. Furthermore, we obtained the same TFBS pairs at very early and very late stages of the differentiation, although these stages are completely different in their functions. Especially considering expression values of related TF genes of these pairs, we determined that co-occurrence between TFBSs does not always indicate a synergistic regulation of target genes. This finding suggests that corresponding TFs of these pairs can be bound in a mutual exclusive manner, which is important during cardiac development to differentiate between stem cell programs and later embryogenic programs.

CONCLUSION
We identify transcription factor pairs that drive cardiac development from stem cells to mature cells in a 60 day time course dataset. Our approach is motivated by the importance of potentially interacting transcription factors represented by the co-occurrence of their TFBSs in the regulated stages specific genes and their mediated effects. We identified the relevant pairs employing MatrixCatch method with Markov clustering algorithm together to highlight stage specific clusters of cooccurring TFBS pairs. Furthermore, we analyzed the changes within these clusters to show the specificity of the gene regulation in cardiac development. Our results demonstrate that similar pairs potentially regulate different developmental stages depending on the expression values of the corresponding genes. This may define switches between embryonic and maturation programs and could contribute to a better understanding of embryonic cardiac development.

AUTHOR CONTRIBUTIONS
SZ, CM, and MG participated in the design of the study, conducted computational and statistical analyses. EW supervised the computational and statistical analyses. AR, FR prepared the time course data and the experiments. WZ supervised the experimental design and the experiments. SU prepared and processed the RNAseq data which are used in this study. SZ, CM, RT, and MG were involved in interpretation of the results and the literature survey. MG and SZ wrote the final version of the manuscript. MG conceived of and managed the project. All authors read and approved the final manuscript.