ORIGINAL RESEARCH article

Front. Genet., 22 November 2018

Sec. Computational Genomics

Volume 9 - 2018 | https://doi.org/10.3389/fgene.2018.00571

Borders of Cis-Regulatory DNA Sequences Preferentially Harbor the Divergent Transcription Factor Binding Motifs in the Human Genome

  • 1. Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan

  • 2. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States

Abstract

Changes in cis-regulatory DNA sequences and transcription factor (TF) repertoires provide major sources of phenotypic diversity that shape the evolution of gene regulation in eukaryotes. The DNA-binding specificities of TFs may be diversified or produce new variants in different eukaryotic species. However, it is currently unclear how various levels of divergence in TF DNA-binding specificities or motifs became introduced into the cis-regulatory DNA regions of the genome over evolutionary time. Here, we first estimated the evolutionary divergence levels of TF binding motifs and quantified their occurrence at DNase I-hypersensitive sites. Results from our in silico motif scan and experimentally derived chromatin immunoprecipitation (TF-ChIP) show that the divergent motifs tend to be introduced in the edges of cis-regulatory regions, which is probably accompanied by the expansion of the accessible core of promoter-associated regulatory elements during evolution. We also find that the genes neighboring the expanded cis-regulatory regions with the most divergent motifs are associated with functions like development and morphogenesis. Accordingly, we propose that the accumulation of divergent motifs in the edges of cis-regulatory regions provides a functional mechanism for the evolution of divergent regulatory circuits.

Introduction

Transcription factors (TFs) are primary regulators of gene expression that function by interacting with DNA in a sequence-specific manner. The capacity of a TF to recognize particular patterns of nucleotides (i.e., motifs) via DNA-binding domains is defined as the TF’s DNA-binding specificity (). Previous studies have reported that the DNA-binding specificities of TF orthologs between human and Drosophila are mostly conserved (). Nonetheless, TFs do evolve divergent binding specificities in different species through genetic variation, such as gene duplication and the expansion of gene families (; Weirauch et al., 2014; ). Divergence in TF binding specificities contributes significantly to differential gene regulation, and shapes eukaryotic evolution (Wittkopp and Kalay, 2012; ; ).

In eukaryotic cells, multiple TFs interact cooperatively with genomic DNA to temporally and spatially regulate gene expression. Most eukaryotic chromatin is packed into nucleosomes, whereas active cis-regulatory elements have functional TF binding sites in nucleosome-depleted regions, where DNA is hypersensitive to cleavage by DNase I. DNase I hypersensitive sites (DHSs) have been studied extensively and are found to overlap with most TF binding sites (TFBSs) in a wide range of organisms. Major advances in the ENCODE project have used DHSs to map active cis-regulatory elements in the human genome (; ). Integrative analyses using ENCODE data have identified hundreds of TF binding motifs (Wang et al., 2013; Yan et al., 2013) and extended the repertoire of TFs in the human genome (). However, there is high turnover in cis-regulatory sequences (Weirauch and Hughes, 2011) and over longer timescales, rapid and flexible transcription factor binding site (TFBS) gain and loss events occur between closely related species (; ; ).

From a functional genomics perspective, the interplay between TF binding events and cis-regulatory regions is a pivotal step that allows transcriptional regulation to be rewired through evolutionary time. Many general properties of regulatory genomes rely on the broad presence of clustered TFBSs in cis-regulatory regions (Wang et al., 2012; ). The divergence of cis-regulatory sequences harboring various TFBSs and alterations of TF DNA-binding specificities have been proposed as the major driving forces of phenotypic change (Zheng et al., 2011; ; ). However, the manner by which DNA sequence changes in cis-regulatory regions could arise as a result of harboring diversified TF binding motifs remains unclear. Since a given region of DNA sequences can harbor more than one TF binding motif, the evolvability within cis-regulatory DNA sequences of a range of TF binding motifs has not been systematically studied.

To address this knowledge gap, we developed a novel measure, the motif prevalence index (MPI), of the level of divergence of motifs among eukaryotes, based on the finding that TF binding motifs are generally conserved among diverse organisms. The method integrates the phylogenetic relationships between TF orthologs among animals with a comprehensive collection of TF binding motifs to compute the prevalence of human motifs across metazoan evolution, using the Cis-BP database (Weirauch et al., 2014), which provides stringent inferences of TF binding motifs in diverse organisms. By averaging the MPIs of all the motifs in a DNA region, we can study the evolution of DNA sequence preference across a range of TF DNA-binding motifs. Our results show that divergent motifs tend to be located at the borders of open-chromatin regions. Furthermore, an integrative analysis of DHS regions using TF chromatin immunoprecipitation sequencing (ChIP-seq) data from the ENCODE project confirmed our in silico results. Together, these results on the introduction of divergent motifs across evolutionary time highlight the co-evolution between TF binding specificities and the functional effects of cis-regulatory variants on gene expression, and therefore on phenotypic evolution.

Materials and Methods

Motif Prevalence Index

The primary TF binding motifs of humans and 73 other metazoan species were obtained from the Cis-BP database (Weirauch et al., 2014). For a given motif x, the n species S1, ..., Sn possessing its corresponding TF families were identified from the annotations in the Cis-BP database. We constructed a phylogenetic tree Ts of the 74 metazoan species, with branch lengths based on the times of divergence in the TimeTree database (), using the neighbor-joining method in the APE package of R (). Next, we used the species that had motif x, according to the Cis-BP annotation, to obtain the subtree Tx. Here, B(Ts) is the total length of the branches in Ts, and B(Tx) is the sum of the lengths of all the branches from the common ancestor node to the n species that have motif x. The motif prevalence index (MPI) was then calculated as the ratio B(Tx)/B(Ts), which gives a score between 0 and 1 (Supplementary Figure S1). To obtain a reliable TF set for the motif-scanning analysis, we selected 364 motifs that were well-curated TF models from the JASPAR 2018 database (). We used Tomtom () to group them into 93 clusters of nonredundant motifs with a threshold p-value of < 0.05, and retained the motif possessing the highest MPI in each cluster.
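As an illustration of the ratio B(Tx)/B(Ts), the following minimal sketch computes an MPI on a toy tree. The tree, its branch lengths, and all function names are hypothetical and are not taken from the study's code; the sketch assumes a rooted tree stored as child-to-parent links.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy tree: child -> (parent, branch length). The root has no entry.
TOY_TREE: Dict[str, Tuple[str, float]] = {
    "human": ("mammals", 90.0),
    "mouse": ("mammals", 90.0),
    "mammals": ("vertebrates", 345.0),
    "zebrafish": ("vertebrates", 435.0),
    "vertebrates": ("root", 362.0),
    "fly": ("root", 797.0),
}


def path_to_root(tree: Dict[str, Tuple[str, float]], node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in tree:
        path.append(tree[path[-1]][0])
    return path


def total_branch_length(tree: Dict[str, Tuple[str, float]]) -> float:
    """B(Ts): sum of all branch lengths in the species tree."""
    return sum(length for _, length in tree.values())


def subtree_branch_length(tree: Dict[str, Tuple[str, float]],
                          species: Iterable[str]) -> float:
    """B(Tx): branches linking the given species to their common ancestor."""
    paths = [path_to_root(tree, s) for s in species]
    if not paths:
        return 0.0
    # The most recent common ancestor is the first node on one leaf-to-root
    # path that also lies on every other path.
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    mrca = next(node for node in paths[0] if node in shared)
    branches = set()
    for path in paths:
        for node in path:
            if node == mrca:
                break
            branches.add(node)  # a non-root node identifies the branch above it
    return sum(tree[node][1] for node in branches)


def mpi(tree: Dict[str, Tuple[str, float]], species: Iterable[str]) -> float:
    """Motif prevalence index as the ratio B(Tx) / B(Ts)."""
    return subtree_branch_length(tree, species) / total_branch_length(tree)
```

Under these assumptions, a motif found only in human gives an MPI of 0, and a motif found in every species of the tree gives an MPI of 1, which matches the range described above.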

Identification of TF Binding Sites for Each Motif in Open and Closed Chromatin Regions

The human genome sequence and gene annotations were obtained from Ensembl (GRCh37, release 75; ). We identified the occurrences of TF binding sites in the promoter regions (-1 kb to +500 bp from the transcription start site) for each of the 93 nonredundant motifs by scanning TF sequence preferences in position-weight-matrix (PWM) format, using Matrix-scan from the RSAT (Regulatory Sequence Analysis Tools) toolbox (). Of note, we applied Matrix-scan with a threshold false discovery rate of < 10⁻⁴, which is a recommended stringent parameter for detecting putative cis-regulatory elements (). DNase I hypersensitive-site (DHS) cluster data were downloaded from the UCSC genome browser () for 125 cell types identified by the ENCODE project (). DHS peaks were defined as open chromatin regions, and chromatin regions without overlapping DHS peaks were defined as closed chromatin regions.
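The general idea of scanning a sequence with a PWM and labeling each hit as open or closed can be sketched as follows. This is not RSAT Matrix-scan, which uses its own background model and significance threshold; the uniform background, the raw score threshold, the forward-strand-only scan, and all names here are simplifying assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def log_odds_matrix(counts: List[Dict[str, int]],
                    pseudocount: float = 1.0,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(base, 0) for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def label_hit(start: int, width: int, dhs_peaks: List[Tuple[int, int]]) -> str:
    """Label a hit 'open' if it overlaps any DHS peak (half-open intervals)."""
    end = start + width
    overlaps = any(start < peak_end and peak_start < end
                   for peak_start, peak_end in dhs_peaks)
    return "open" if overlaps else "closed"


# Hypothetical toy motif (consensus TATA) and toy promoter sequence.
TOY_COUNTS = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
TOY_MATRIX = log_odds_matrix(TOY_COUNTS)
TOY_HITS = scan("GGTATAGGGGTATAGG", TOY_MATRIX, threshold=6.0)
TOY_LABELS = [label_hit(start, len(TOY_MATRIX), [(0, 8)]) for start, _ in TOY_HITS]
```

In this toy case the first hit falls inside the single DHS peak and the second falls outside it, so the same motif contributes one open and one closed occurrence.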

The Ages of Human Genes

The ages of human genes arising at different evolutionary times were identified by combining homolog clustering with phylogeny inference, as described in recent literature (Yin et al., 2016). Gene category 1 denoted Primates origin, i.e., the youngest genes; category 2 denoted Mammalia origin; category 3 denoted Vertebrata origin; category 4 denoted Metazoan origin; category 5 denoted Eukaryota origin; and category 6 denoted cellular-organism origin, i.e., the oldest genes.

Identification of Enriched Functions Associated With DHS Regions

To investigate the functional annotation of the DHS regions with many divergent motifs, we collected the longer DHS regions (300–400 bp in length) in the promoters of protein-coding genes that originated at or before the eukaryotic origin (categories 5 and 6) and computed their mean MPI scores in the DHS-edge regions. In assessing the proportions of divergent versus common motifs in the DHS regions, the DHS regions in the 10th percentile of the mean MPI scores for the DHS edges were considered divergent, while those in the 90th percentile were considered common. Functional enrichment analysis of the gene sets near the divergent or common DHS regions was performed using GREAT (Genomic Regions Enrichment of Annotations Tool; ), with the default parameters and all DHS regions of a similar length (300–400 bp) as the background. In particular, the GREAT web interface was used to automatically submit DHS regions and retrieve results for subsequent parsing.
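The percentile-based selection can be sketched as below. The sketch assumes that "divergent" means an edge mean MPI at or below the 10th percentile and "common" means at or above the 90th percentile, with linear interpolation between ranks; the exact cut-off convention and the function names are assumptions, not details stated in the text.

```python
from typing import Dict, List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_by_edge_mpi(edge_mpi: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (divergent, common) DHS identifiers from their edge mean MPI."""
    values = list(edge_mpi.values())
    low_cut = percentile(values, 10)
    high_cut = percentile(values, 90)
    divergent = sorted(dhs for dhs, score in edge_mpi.items() if score <= low_cut)
    common = sorted(dhs for dhs, score in edge_mpi.items() if score >= high_cut)
    return divergent, common


# Hypothetical toy input: eleven DHS regions with edge mean MPI 0.0, 0.1, ..., 1.0.
TOY_EDGE_MPI = {f"dhs_{i:02d}": i / 10 for i in range(11)}
TOY_DIVERGENT, TOY_COMMON = split_by_edge_mpi(TOY_EDGE_MPI)
```

The two resulting lists would then serve as the foreground sets submitted to the enrichment tool, with all regions of similar length as the background.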

TF ChIP-Seq and Enhancer Datasets

The ChIP-seq peaks of 243 TFs (Supplementary Table S1) in numerous cell lines were downloaded from the ENCODE Consortium () based on the hg19 genome assembly. For each TF, the tracks of the same cell lines were combined by retaining the base pairs that overlapped in at least half of the tracks. Since the average length of the ChIP-seq peaks (∼300 bp) was longer than that of the TF binding motifs, we defined TF binding sites as the 25 bp before and after the summits of the ChIP-seq peaks. Overlaps of genomic intervals with TF ChIP-seq peaks and human enhancer regions obtained from either FANTOM5 (Atlas of transcribed enhancers, ) or the VISTA Enhancer Browser (Visel et al., 2007) were calculated using Bedtools.
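The two interval operations described above, combining tracks by majority coverage and trimming peaks to a window around the summit, can be sketched as follows. The half-open coordinate convention, the assumption that intervals do not overlap within a single track, and the function names are illustrative choices rather than details of the original pipeline.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def summit_window(summit: int, flank: int = 25) -> Interval:
    """Window of `flank` bp on each side of a peak summit, clipped at 0."""
    return (max(0, summit - flank), summit + flank)


def consensus_intervals(tracks: List[List[Interval]],
                        min_fraction: float = 0.5) -> List[Interval]:
    """Intervals covered by at least `min_fraction` of the tracks.

    Intervals are assumed not to overlap within a single track.
    """
    needed = math.ceil(len(tracks) * min_fraction)
    events = []
    for track in tracks:
        for start, end in track:
            events.append((start, 1))   # coverage rises at an interval start
            events.append((end, -1))    # coverage falls at an interval end
    events.sort()  # at equal positions, ends are processed before starts
    result = []
    depth = 0
    open_start = None
    for position, delta in events:
        previous = depth
        depth += delta
        if previous < needed <= depth:
            open_start = position
        elif depth < needed <= previous:
            if position > open_start:
                result.append((open_start, position))
            open_start = None
    return result


# Hypothetical toy tracks for one TF in three replicate experiments.
TOY_TRACKS = [[(100, 200)], [(150, 250)], [(180, 300)]]
TOY_CONSENSUS = consensus_intervals(TOY_TRACKS)
```

With three tracks, a base pair must be covered by at least two of them to be retained, so only the region shared by a majority of tracks survives.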

Expression Data for TFs

The expression profiles of the human TFs were collected from the Human Protein Atlas (HPA; ). Since the HPA divides all human-expressed genes into five categories, we categorized the expression of TF genes in more general terms, as either ubiquitous expression or tissue-elevated expression. The categories “expressed in all tissues” and “mixed” from the HPA were grouped as ubiquitous expression. The categories “tissue-enhanced,” “group-enriched,” and “tissue-enriched” from the HPA were grouped as tissue-elevated expression.

Code Availability

The computer code that supports the findings of this study is available from GitHub, with the identifier doi: 10.5281/zenodo.1208608.

Results and Discussion

Motif Prevalence Index Estimates the Divergence Level of Motif Sequences

We propose a new measure, the MPI, to estimate the evolutionary divergence level of TF DNA-binding preferences (motifs) in humans, based on the finding that the primary DNA-binding specificities of TFs with similar amino acid sequences in their DNA-binding domains (DBDs) are generally conserved between distantly related species (; Weirauch et al., 2014; ). Based on phylogenetic distance and the presence of a given motif (i.e., homologous TFs with conserved amino acid sequences in their DBDs, based on the Cis-BP database) across metazoan species, the MPI represents the evolutionary divergence level of human motifs, with a score from 0 (human-specific) to 1 (common to all 74 metazoan species used in this study). Next, we selected the human motifs for which there is experimental evidence in the JASPAR database (). Most of the human motifs (72.8% of the 364 motifs shown in Supplementary Tables S1, S2) were common across the Metazoa and Bilateria taxa, whereas the divergent motifs (MPI < 0.1, 7.7%) in humans emerged approximately after the divergence of the Vertebrata lineage (Figure 1). The MPI was not biased by intrinsic motif properties such as motif length or information content (no significant correlation; Supplementary Figure S2), although the GC content was significantly lower in the more divergent motifs. Moreover, we found no significant correlation between the MPI and the gene ages of the corresponding TFs, which suggests that the evolutionary history of TF genes is independent of changes in their binding specificities.

FIGURE 1

Edges of DHS Regions Prefer Divergent Motifs

A theoretical study has suggested that the emergence of newly evolved binding sites occurs preferentially in the DNA sequences bordering pre-existing TFBSs (). Accordingly, we hypothesized that relatively common motifs are located around the centers of open-chromatin regions, whereas relatively divergent motifs are located in the border regions. To test this hypothesis, we conducted an in silico motif scan from 1 kb upstream to 500 bp downstream of the transcription start sites (TSSs) of protein-coding genes, and further filtered for the DNase I hypersensitive-site (DHS) clusters in 125 cell types, which correspond closely to TFBSs (). We then investigated the open-chromatin regions, defined as DHS peaks in the range 150–400 bp (i.e., one to two nucleosome-free regions), which theoretically contain several TFBSs, and computed the mean MPI of the motifs identified within them. To reduce the ambiguity of motif occurrences among similar motif patterns, we focused on the 93 nonredundant JASPAR motifs that were clustered by Tomtom (), with a threshold p-value of < 0.05. The MPIs of these motifs remained evenly distributed (Supplementary Figure S3). As expected, the mean MPI scores decreased significantly from the center to the borders of the DHS regions (Spearman’s correlation coefficient rho = -0.753, p < 2.2 × 10⁻¹⁶; Figure 2A). Specifically, the mean MPI scores in the DHS-edge zones (the decile regions of both DHS borders) were significantly lower than those in the DHS-center zones (the quintile regions of the center of the DHS; one-sided Wilcoxon rank-sum test, p = 4.76 × 10⁻³⁰; Figure 2A). In contrast, the closed-chromatin regions in the promoters showed a negligible decline in their mean MPI scores (Spearman’s correlation coefficient rho = -0.01; Figure 2A), and these were similar to the mean values obtained by randomly selecting a subset of the 93 nonredundant motifs 1000 times (Supplementary Figure S4). Additionally, we noted a significant negative correlation between motif MPIs and the occurrence ratios of open-to-closed chromatin regions (Supplementary Figure S5). In other words, one likely explanation for the lower mean MPI scores in the open-chromatin regions is that divergent motifs arise preferentially in these regions. Since the divergent motifs with lower MPIs belong to TFs that have evolved to recognize new DNA sequences across evolutionary time, the question immediately arose as to whether the DNA sequences in the DHS regions exhibit different conservation levels.
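The edge and center zones defined above can be made concrete with a short sketch that assigns motif occurrences to zones by their relative position within a DHS and averages their MPIs. It assumes that each occurrence is represented by its midpoint, that the edge zones are the first and last 10% of the DHS, and that the center zone is the middle 20%; the function names and toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def zone(relative_position: float) -> str:
    """Classify a position in [0, 1), where 0 is the left DHS border."""
    if relative_position < 0.1 or relative_position >= 0.9:
        return "edge"      # outer decile on either side
    if 0.4 <= relative_position < 0.6:
        return "center"    # central quintile
    return "other"


def mean_mpi_by_zone(dhs_start: int, dhs_end: int,
                     hits: List[Tuple[int, float]]) -> Dict[str, float]:
    """Mean MPI per zone; `hits` holds (motif midpoint, MPI) pairs."""
    length = dhs_end - dhs_start
    sums: Dict[str, Tuple[float, int]] = {}
    for midpoint, mpi in hits:
        if not dhs_start <= midpoint < dhs_end:
            continue  # ignore occurrences outside this DHS
        name = zone((midpoint - dhs_start) / length)
        total, count = sums.get(name, (0.0, 0))
        sums[name] = (total + mpi, count + 1)
    return {name: total / count for name, (total, count) in sums.items()}


# Hypothetical toy DHS of 200 bp with five motif occurrences.
TOY_ZONE_MEANS = mean_mpi_by_zone(
    1000, 1200,
    [(1005, 0.05), (1195, 0.15), (1100, 0.9), (1090, 1.0), (1050, 0.5)],
)
```

Pooling such per-zone means over many DHS regions would give the two distributions that are compared between edge and center zones.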

FIGURE 2

Thus, we sought to determine whether the decreasing trend in mean MPI as a function of position was systematically paralleled by changes in evolutionary conservation in open-chromatin regions. We used the PhastCons score () to calculate the levels of evolutionary conservation of DNA sequences from alignments of 99 vertebrate genomes (). As expected, the open-chromatin regions (DHSs) possessed higher conservation levels than the closed-chromatin ones, which have the highest background mutation rate (; Figure 2B). In fact, the flattened distribution of the mean MPI scores of the closed-chromatin regions, which lack evolutionary constraint, could be the result of motifs being introduced randomly across these regions. However, the PhastCons scores in the DHS-center zones of the open-chromatin regions were significantly higher than those in the DHS-edge zones (one-sided Wilcoxon rank-sum test, p = 1.70 × 10⁻²⁰; Figure 2B). Of note, there was no correlation between the MPI values of motifs and the mean PhastCons scores of their occurrences (Supplementary Figure S6), because the conservation of TF binding specificities and that of TFBS sequences are independent of each other. Therefore, the modest evolutionary constraint at the edges of the DHS regions most likely reflects rapid TFBS turnover, which would readily allow the introduction of divergent motifs.

DHS Regions With Many Divergent Motifs at the Edge Are Associated With Specific Functions

Previous studies indicate that regulatory complexity, such as the number of TFs regulating a gene, increases continuously over evolutionary time (Warnefors and Eyre-Walker, 2011; ). We thus examined whether the differences between the mean MPI scores for the DHS-center and -edge regions were constant across genes of different ages. We found a consistent and significant difference for the promoters of protein-coding genes of all ages (Figure 3A). However, there were larger numbers of longer DHSs in the older genes (Supplementary Figure S7). We then performed a further analysis (Figure 3B) incorporating DHS length as a variable, and found that the differences between the DHS-center and -edge regions were greater for the longer DHSs (> 200 bp). Intrigued by these results, we compared the fold enrichment of the motif occurrences between divergent (MPI < 0.1) and common motifs (MPI ≥ 0.9) across gene ages and DHS lengths. The divergent motifs were not enriched in the short DHS (150–199 bp) regions, but were enriched in the boundary regions of longer DHSs (Figure 3C). Similar results were found when applying different cut-offs for divergent (MPI < 0.2) and common motifs (MPI ≥ 0.8) (Supplementary Figure S8). Therefore, one feasible interpretation of our observations is that the introduction of divergent motifs is likely to accompany the elongation of cis-regulatory DNA regions, particularly at the boundaries.

FIGURE 3

We next explored whether longer DHS regions with many divergent motifs in their edge regions were associated with genes for specific biological functions. We analyzed the longer DHSs (300–400 bp) in the promoters of older genes (categories 5 and 6) and found larger numbers of DHSs displaying low mean MPI scores at their edge regions (Supplementary Figure S9). We used GREAT () to determine the functions associated with the gene sets found in the proximity of DHS regions with many divergent motifs (10th percentile of the mean MPI scores at the edges) or many common motifs (90th percentile). Unexpectedly, the genes neighboring DHS regions with many common motifs at the edges were not associated with any functions. However, the DHS regions with many divergent motifs at the edges were linked to genes showing significantly enriched functions in biological processes related to morphogenesis and development, such as heart morphogenesis (GO:0003007, q-val = 7.98 × 10⁻³) and placenta blood-vessel development (GO:0060674, q-val = 6.82 × 10⁻³; Figure 3D, and full results in Supplementary Table S3). Given the increased number of longer DHSs in the promoters of older genes, such expansion of cis-regulatory regions via the introduction of divergent motifs could therefore contribute to the regulatory complexity of genes related to tissue development across evolutionary time.

TF ChIP-Seq Reveals Similar Distribution of MPI Scores Within DHS Regions

To validate our discovery of the motif distribution within the cis-regulatory DNA regions independently of the motif-scanning approach, we overlapped DHSs with in vivo chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) data. We used data for 243 TFs (Supplementary Table S1) downloaded from the ENCODE project (Wang et al., 2012), and recalculated the mean MPI scores using the corresponding MPIs of the TFs. Remarkably, the empirical TF–ChIP-seq results revealed significantly lower mean MPI scores for the borders than for the central regions of DHSs on a genome-wide scale (Figure 4A, Spearman’s correlation coefficient rho = -0.940, p < 2.2 × 10⁻¹⁶). This result was highly consistent with the in silico motif-scanning results (Figure 2A). Additionally, the mean MPI scores of the DHS-center and -edge regions differed significantly in several types of cis-regulatory regions, such as gene promoters (protein-coding genes, non-coding genes, and pseudogenes) and enhancers, which were obtained from either FANTOM5 () or VISTA (Visel et al., 2007) (Supplementary Table S4). The TF–ChIP-seq results also confirmed that the significant differences in the mean MPI scores between DHS-center and -edge regions were consistent for DHSs of different lengths (Figure 4B).

FIGURE 4

In addition, we noticed that the motifs corresponding to the pioneer TFs reported to have chromatin-remodeling activity () had significantly higher MPIs than the others (Figure 4C, one-sided Wilcoxon rank-sum test, p = 2.52 × 10⁻³). Such high MPIs for most pioneer factors imply that their binding specificities are highly conserved throughout metazoan species. Pioneer TFs have been recognized to disrupt chromatin structure to create a nucleosome-free DNA region and, in turn, to allow other TFs to access the nearby DNA regions (Zaret and Carroll, 2011; ). Accordingly, we next sought to examine the hypothesis that pioneer TFs preferentially bind in the middle of open-chromatin regions, using a direct assessment of their TF–ChIP-seq data. We also performed a genome-wide comparison of the distribution of ChIP-seq signals in the DHS regions for pioneer TFs with common motifs (MPI = 1) and for the other TFs with the most divergent motifs (MPI = 0). We found that pioneer TFs were located mostly in the centers rather than at the edges of DHS regions (Figure 4D). In contrast, the TFs with divergent motifs showed a distinct distribution pattern, with more occurrences in the DHS-edge regions (Figure 4D). Hence, the binding preferences of pioneer TFs provide a feasible rationale to explain the higher mean MPI scores for the DHS-center regions.

In summary, our results for both the in silico motif scan and the experimentally derived TF–ChIP-seq analysis unveil a differential preference of TFBSs within cis-regulatory DNA regions, whereby the border regions tend to harbor motifs that are bound by TFs with divergent DNA-binding specificities.

TFs With Divergent Motifs Tend to Be Ubiquitously Expressed in Human Tissues

Based on expression profiles for 32 human tissues obtained from the Human Protein Atlas (HPA; ), we divided TFs into one group showing ubiquitous expression (i.e., expressed in most tissues) and another showing significantly elevated expression in at least one human tissue. Remarkably, the majority of TFs possessing divergent motifs are ubiquitously expressed in human tissues, whereas a larger number of the TFs possessing common motifs, i.e., those with higher MPIs, are more strongly expressed in specific human tissues (Figure 5A for the TFs with ChIP-seq data, Supplementary Figure S10 for all other TFs from the HPA). Notably, a recent study has reported that duplicate genes tend to diverge in their expression profiles in different tissues during the course of evolution (). According to our observations, a common motif is usually shared by several TF paralogs (Supplementary Table S1). The higher fraction of TFs showing tissue-specific expression is therefore most likely explained by the larger number of gene paralogs. We then computed the fold enrichment of the TF–ChIP-seq signals within the DHS regions by comparing the ubiquitously expressed TFs with divergent motifs (MPI < 0.1) with all TFs with common motifs (MPI ≥ 0.9). We found that the former were significantly enriched in the DHS-edge regions and represented a higher proportion of the total than the latter (Figure 5B and Supplementary Figure S11 for the enrichment analyses). In contrast, the tissue-specific TFs with common motifs represented the highest proportion of the total and were significantly enriched in the DHS-center regions (Figure 5B and Supplementary Figure S11). Taken together, these results suggest that DHS-center regions are bound by tissue-specific TF paralogs, which share similar motifs, while the DHS-edge regions are enriched in ubiquitously expressed TFs with divergent motifs. These results therefore imply that there is another level of transcriptional regulation dynamics affecting the interplay of DNA motifs and the distinct expression patterns of TFs.

FIGURE 5

Extensive studies indicate that alterations of genomic sequences in TFBSs are widespread in metazoan species, even between closely related species (). The patterns in our mean MPI scores, which correspond to different levels of divergence in TF binding specificity, indicate that the introduction of divergent motifs occurs preferentially at the borders of cis-regulatory regions (as opposed to their centers; Figures 2A, 4A). Our results are in line with theoretical studies, which show that sequences adjacent to ancestral TFBSs readily evolve, facilitating the emergence of new TFBSs (; ). Since common motifs (high MPIs) are prevalent among metazoan species, the central cis-regulatory regions are most likely to contain ancestral binding sites and to be constrained over evolutionary time, as indicated by their higher PhastCons scores (Figure 2B). Moreover, the clustering of TFBSs in genomic regions where multiple regulators interact cooperatively can be a consequence of the fast turnover of genetic sequences during TF binding evolution (; ).

Finally, we propose a model for the expansion of TFBSs with conserved motifs via the introduction of divergent motifs to adjacent sites in the cis-regulatory regions (Figure 6). Cis-regulatory evolution, such as changes in TFBSs over the evolutionary time scale, is an important source of diversity in the development of morphological traits via the gradual modification of transcription circuits (; ; ). Studies on the effect of genetic variation on TF binding from ChIP-seq experiments have provided direct evidence that TF binding divergence is often a result of sequence changes in the bound genetic sequences (; ; ). Furthermore, since TFs often bind cooperatively to sites adjacent to regulatory regions (Wray et al., 2003; ), the regulatory circuits, by coordinating alternative TFs, could diversify as the motifs in the TFBS-enriched border regions are replaced, allowing the expansion of new motifs. Since the rewiring of regulatory networks is crucial for the evolution of divergent expression patterns (; ), we suspect that an expansion mechanism that incorporates more divergent motifs at the boundaries of cis-regulatory regions serves as a common evolutionary intermediate in the rewiring process.

FIGURE 6

Statements

Author contributions

J-HH and H-KT conceived the idea, designed the study, and wrote the manuscript. J-HH, RK, T-CL, and ZT developed the computational algorithms and performed the bioinformatics analysis. ZT provided guidance in data analysis and interpretation of the results. All authors contributed to amending the manuscript and have read the submitted version.

Funding

This work was supported by the Institute of Information Science, Academia Sinica (AS-TP-107-ML06), and the Ministry of Science and Technology (MOST 106-2811-E-001-005 to J-HH and MOST 105-2221-E-001-029-MY3 to H-KT).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00571/full#supplementary-material

References

  • 1

    Andersson, R., Gebhard, C., Miguel-Escalada, I., Hoof, I., Bornholdt, J., Boyd, M., et al. (2014). An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461. doi: 10.1038/nature12787

  • 2

    Baker, C. R., Booth, L. N., Sorrells, T. R., and Johnson, A. D. (2012). Protein modularity, cooperative binding, and hybrid regulatory states underlie transcriptional network diversification. Cell 151, 80–95. doi: 10.1016/j.cell.2012.08.018

  • 3

    Berthelot, C., Villar, D., Horvath, J. E., Odom, D. T., and Flicek, P. (2018). Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression. Nat. Ecol. Evol. 2, 152–163. doi: 10.1038/s41559-017-0377-2

  • 4

    Chen, H., Li, H., Liu, F., Zheng, X., Wang, S., Bo, X., et al. (2015). An integrative analysis of TFBS-clustered regions reveals new transcriptional regulation models on the accessible chromatin landscape. Sci. Rep. 5:8465. doi: 10.1038/srep08465

  • 5

    De Mendoza, A., Sebé-Pedrós, A., Šestak, M. S., Matejčić, M., Torruella, G., Domazet-Lošo, T., et al. (2013). Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl. Acad. Sci. U.S.A. 110, E4858–E4866. doi: 10.1073/pnas.1311818110

  • 6

    Deplancke, B., Alpern, D., and Gardeux, V. (2016). The genetics of transcription factor DNA binding variation. Cell 166, 538–554. doi: 10.1016/j.cell.2016.07.012

  • 7

    Dowell, R. D. (2010). Transcription factor binding variation in the evolution of gene regulation. Trends Genet. 26, 468–475. doi: 10.1016/j.tig.2010.08.005

  • 8

    Flicek, P., Amode, M. R., Barrell, D., Beal, K., Billis, K., Brent, S., et al. (2014). Ensembl 2014. Nucleic Acids Res. 42, D749–D755. doi: 10.1093/nar/gkt1196

  • 9

    Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L., and Noble, W. S. (2007). Quantifying similarity between motifs. Genome Biol. 8:R24. doi: 10.1186/gb-2007-8-2-r24

  • 10

    Hedges, S. B., Dudley, J., and Kumar, S. (2006). TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22, 2971–2972. doi: 10.1093/bioinformatics/btl505

  • 11

    Jarvela, A. M. C., and Hinman, V. F. (2015). Evolution of transcription factor function as a mechanism for changing metazoan developmental gene regulatory networks. Evodevo 6, 1–11. doi: 10.1186/2041-9139-6-3

  • 12

    Jolma, A., and Taipale, J. (2011). “Methods for analysis of transcription factor DNA-binding specificity in vitro,” in A Handbook of Transcription Factors Subcellular Biochemistry, ed. Hughes, T. R. (Berlin: Springer), 155–173.

  • 13

    Jolma, A., Yan, J., Whitington, T., Toivonen, J., Nitta, K. R., Rastas, P., et al. (2013). DNA-binding specificities of human transcription factors. Cell 152, 327–339. doi: 10.1016/j.cell.2012.12.009

  • 14

    Karolchik, D., Hinrichs, A. S., Furey, T. S., Roskin, K. M., Sugnet, C. W., Haussler, D., et al. (2004). The UCSC table browser data retrieval tool. Nucleic Acids Res. 32, D493–D496. doi: 10.1093/nar/gkh103

  • 15

    Khan, A., Fornes, O., Stigliani, A., Gheorghe, M., Castro-Mondragon, J. A., van der Lee, R., et al. (2017). JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 46, D260–D266. doi: 10.1093/nar/gkx1126

  • 16

    Khoueiry, P., Girardot, C., Ciglar, L., Peng, P.-C., Gustafson, E. H., Sinha, S., et al. (2017). Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity. eLife 6:e28440. doi: 10.7554/eLife.28440

  • 17

    Kryuchkova-Mostacci, N., and Robinson-Rechavi, M. (2016). Tissue-specificity of gene expression diverges slowly between orthologs, and rapidly between paralogs. PLoS Comput. Biol. 12:e1005274. doi: 10.1371/journal.pcbi.1005274

  • 18

    Lambert, S. A., Jolma, A., Campitelli, L. F., Das, P. K., Yin, Y., Albu, M., et al. (2018). The human transcription factors. Cell 172, 650–665. doi: 10.1016/j.cell.2018.01.029

  • 19

    Levine, M., and Tjian, R. (2003). Transcription regulation and animal diversity. Nature 424, 147–151. doi: 10.1038/nature01763

  • 20

    Lynch, V. J., and Wagner, G. P. (2008). Resurrecting the role of transcription factor change in developmental evolution. Evolution 62, 2131–2154. doi: 10.1111/j.1558-5646.2008.00440.x

  • 21

    McLean, C. Y., Bristor, D., Hiller, M., Clarke, S. L., Schaar, B. T., Lowe, C. B., et al. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501. doi: 10.1038/nbt.1630

  • 22

    Müller, F., and Tora, L. (2014). Chromatin and DNA sequences in defining promoters for transcription initiation. Biochim. Biophys. Acta 1839, 118–128. doi: 10.1016/j.bbagrm.2013.11.003

  • 23

    Nitta, K. R., Jolma, A., Yin, Y., Morgunova, E., Kivioja, T., Akhtar, J., et al. (2015). Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 4:e04837. doi: 10.7554/eLife.04837

  • 24

    Nocedal, I., and Johnson, A. D. (2015). How transcription networks evolve and produce biological novelty. Cold Spring Harb. Symp. Quant. Biol. 80, 265–274. doi: 10.1101/sqb.2015.80.027557

  • 25

    Paradis, E., Claude, J., and Strimmer, K. (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290. doi: 10.1093/bioinformatics/btg412

  • 26

    Payne, J. L., and Wagner, A. (2014). The robustness and evolvability of transcription factor binding sites. Science 343, 875–877. doi: 10.1126/science.1249046

  • 27

    Prendergast, J. G., Campbell, H., Gilbert, N., Dunlop, M. G., Bickmore, W. A., and Semple, C. A. (2007). Chromatin structure and evolution in the human genome. BMC Evol. Biol. 7:72. doi: 10.1186/1471-2148-7-72

  • 28

    Reddy, T. E., Gertz, J., Pauli, F., Kucera, K. S., Varley, K. E., Newberry, K. M., et al. (2012). Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res. 22, 860–869. doi: 10.1101/gr.131201.111

  • 29

    Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330. doi: 10.1038/nature14248

  • 30

    Rosenbloom, K. R., Armstrong, J., Barber, G. P., Casper, J., Clawson, H., Diekhans, M., et al. (2015). The UCSC genome browser database: 2015 update. Nucleic Acids Res. 43, D670–D681. doi: 10.1093/nar/gku1177

  • 31

    Schmidt, D., Wilson, M. D., Ballester, B., Schwalie, P. C., Brown, G. D., Marshall, A., et al. (2010). Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040. doi: 10.1126/science.1186176

  • 32

    Schmitz, J. F., Zimmer, F., and Bornberg-Bauer, E. (2016). Mechanisms of transcription factor evolution in Metazoa. Nucleic Acids Res. 44, 6287–6297. doi: 10.1093/nar/gkw492

  • 33

    Shibata, Y., Sheffield, N. C., Fedrigo, O., Babbitt, C. C., Wortham, M., Tewari, A. K., et al. (2012). Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection. PLoS Genet. 8:e1002789. doi: 10.1371/journal.pgen.1002789

  • 34

    Siepel, A., Bejerano, G., Pedersen, J. S., Hinrichs, A. S., Hou, M., Rosenbloom, K., et al. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050. doi: 10.1101/gr.3715005

  • 35

    Stefflova, K., Thybert, D., Wilson, M. D., Streeter, I., Aleksic, J., Karagianni, P., et al. (2013). Cooperativity and rapid evolution of cobound transcription factors in closely related mammals. Cell 154, 530–540. doi: 10.1016/j.cell.2013.07.007

  • 36

    The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74. doi: 10.1038/nature11247

  • 37

    Thurman, R. E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M. T., Haugen, E., et al. (2012). The accessible chromatin landscape of the human genome. Nature 489, 75–82. doi: 10.1038/nature11232

  • 38

    Tuǧrul, M., Paixão, T., Barton, N. H., and Tkačik, G. (2015). Dynamics of transcription factor binding site evolution. PLoS Genet. 11:e1005639. doi: 10.1371/journal.pgen.1005639

  • 39

    Turatsinze, J.-V., Thomas-Chollier, M., Defrance, M., and van Helden, J. (2008). Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 3, 1578–1588. doi: 10.1038/nprot.2008.97

  • 40

    Uhlén, M., Fagerberg, L., Hallström, B. M., Lindskog, C., Oksvold, P., Mardinoglu, A., et al. (2015). Tissue-based map of the human proteome. Science 347:1260419. doi: 10.1126/science.1260419

  • 41

    Vernimmen, D., and Bickmore, W. A. (2015). The hierarchy of transcriptional activation: from enhancer to promoter. Trends Genet. 31, 696–708. doi: 10.1016/j.tig.2015.10.004

  • 42

    Villar, D., Flicek, P., and Odom, D. T. (2014). Evolution of transcription factor binding in metazoans — mechanisms and functional implications. Nat. Rev. Genet. 15, 221–233. doi: 10.1038/nrg3481

  • 43

    Visel, A., Minovitsky, S., Dubchak, I., and Pennacchio, L. A. (2007). VISTA enhancer browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92. doi: 10.1093/nar/gkl822

  • 44

    Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., et al. (2012). Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 22, 1798–1812. doi: 10.1101/gr.139105.112

  • 45

    Wang, J., Zhuang, J., Iyer, S., Lin, X.-Y., Greven, M. C., Kim, B.-H., et al. (2013). Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 41, D171–D176. doi: 10.1093/nar/gks1221

  • 46

    Warnefors, M., and Eyre-Walker, A. (2011). The accumulation of gene regulation through time. Genome Biol. Evol. 3, 667–673. doi: 10.1093/gbe/evr019

  • 47

    Weirauch, M. T., and Hughes, T. R. (2011). “A catalogue of eukaryotic transcription factor types, their evolutionary origin, and species distribution,” in A Handbook of Transcription Factors Subcellular Biochemistry, ed. T. R. Hughes (Berlin: Springer), 25–73.

  • 48

    Weirauch, M. T., Yang, A., Albu, M., Cote, A. G., Montenegro-Montero, A., Drewe, P., et al. (2014). Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443. doi: 10.1016/j.cell.2014.08.009

  • 49

    Wittkopp, P. J., and Kalay, G. (2012). Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69. doi: 10.1038/nrg3095

  • 50

    Wray, G. A., Hahn, M. W., Abouheif, E., Balhoff, J. P., Pizer, M., Rockman, M. V., et al. (2003). The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20, 1377–1419. doi: 10.1093/molbev/msg140

  • 51

    Yan, J., Enge, M., Whitington, T., Dave, K., Liu, J., Sur, I., et al. (2013). Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813. doi: 10.1016/j.cell.2013.07.034

  • 52

    Yin, H., Wang, G., Ma, L., Yi, S. V., and Zhang, Z. (2016). What signatures dominantly associate with gene age? Genome Biol. Evol. 8, 3083–3089. doi: 10.1093/gbe/evw216

  • 53

    Zaret, K. S., and Carroll, J. S. (2011). Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 25, 2227–2241. doi: 10.1101/gad.176826.111

  • 54

    Zheng, W., Gianoulis, T. A., Karczewski, K. J., Zhao, H., and Snyder, M. (2011). Regulatory variation within and between species. Annu. Rev. Genomics Hum. Genet. 12, 327–346. doi: 10.1146/annurev-genom-082908-150139

Summary

Keywords

transcription factor binding sites, motifs, cis-regulatory elements, TF binding specificities, open chromatins

Citation

Huang J-H, Kwan RS-Y, Tsai ZT-Y, Lin T-C and Tsai H-K (2018) Borders of Cis-Regulatory DNA Sequences Preferentially Harbor the Divergent Transcription Factor Binding Motifs in the Human Genome. Front. Genet. 9:571. doi: 10.3389/fgene.2018.00571

Received

06 September 2018

Accepted

06 November 2018

Published

22 November 2018

Volume

9 - 2018

Edited by

Alfredo Pulvirenti, Università degli Studi di Catania, Italy

Reviewed by

Iros Barozzi, Imperial College London, United Kingdom; Ka-Chun Wong, City University of Hong Kong, Hong Kong

*Correspondence: Huai-Kuang Tsai,

These authors are joint first authors

This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics

