Abstract
How pleiotropy influences evolution of protein sequence remains unclear. The male-specific lethal (MSL) complex in Drosophila mediates dosage compensation by 2-fold upregulation of the X chromosome in males. Nevertheless, several MSL proteins also bind autosomes and likely perform functions not related to dosage compensation. Here, we study the evolution of MOF, MSL1, and MSL2 biding sites in Drosophila melanogaster and its close relative Drosophila simulans. We found pervasive expansion of the MSL binding sites in D. melanogaster, particularly on autosomes. The majority of these newly-bound regions are unlikely to function in dosage compensation and associated with an increase in expression divergence between D. melanogaster and D. simulans. While dosage-compensation related sites show clear signatures of adaptive evolution, these signatures are even more marked among autosomal regions. Our study points to an intriguing avenue of investigation of pleiotropy as a mechanism promoting rapid protein sequence evolution.
Introduction
The rate and mechanism of protein sequence evolution are central questions in evolutionary biology. Empirical data have shown that essential genes do not evolve more slowly than non-essential genes (Greenberg et al., ; Zhang and Yang, ). This supports the view that the rate of protein sequence evolution depends primarily on the level of functional constraint (Zhang and Yang, ; Wollenberg Valero, ), rather than on the level of functional importance (Karp et al., ). Among many correlates of protein evolutionary rate (Zhang and Yang, ), pleiotropy has long been recognized as an important mechanism constraining protein evolution (He and Zhang, ). Amino acid sequences of highly pleiotropic (i.e., influencing many phenotypes) genes evolve relatively slowly (He and Zhang, ), due to the potential deleterious effects of mutations at these loci on additional traits. However, synergistic effects of some genes on multiple phenotypes can override the costs of complexity (McGee et al., ) and facilitate rapid adaptation (Archambeault et al., ). To better understand how pleiotropy shapes adaptation driven by rapidly evolving proteins, it is important to examine a variety of cases in depth. Sex chromosome evolution and genetic conflicts are fertile grounds to find examples of fast adaptation. Dosage compensation, the process whereby expression of sex-linked genes is equalized in both sexes, brings together evolutionarily labile sex determination and constrained fundamental transcriptional regulation. It is thus a promising place to look for fast-evolving pleiotropic genes.
The Male-Specific Lethal (MSL) or Dosage Compensation Complex (DCC) in Drosophila upregulates transcription from male X chromosomes to equal the two X chromosomes in females. MSL-DCC in this study is used specifically to refer to the MSL complex that functions in dosage compensation. It is composed of five proteins [MSL1, MSL2, MSL3, MOF (males absent on the first), and MLE (maleless)] and two long non-coding RNAs (roX1 and roX2). MSL1 and MSL2 are required for scaffolding the MSL complex and targeting it to the X chromosome (Lyman et al., ; Scott et al., ). This targeting enables MOF, a histone H4 lysine 16-specific acetyltransferase, to induce transcriptional up-regulation through histone modification (Larschan et al., ). Two models can explain how the MSL complex achieves X dosage compensation (Lee and Oliver, ). In one, the MSL complex directly boosts gene expression primarily via enhanced elongation of transcription (Larschan et al., ). Alternatively, MSL proteins indirectly mediate dosage compensation by triggering an inverse dosage effect through MOF sequestration to counteract the potential over-expression of X-linked genes (Sun et al., ,). In both models, the MSL complex binds to the male X at high-affinity (HAS) or chromosome entry sites (CES) and then spreads to the entire chromosome (Alekseyenko et al., ; Straub et al., ). Loss-of-function mutations in each of the five MSL protein-coding genes result in male-specific lethality (Belote and Lucchesi, ; Skripsky and Lucchesi, ; Hilfiker et al., ). Given their essential role in male function, MSL proteins are expected to be highly constrained and under purifying selection.
Despite their essential functions, all five genes encoding the MSL proteins evolved adaptively on the Drosophila melanogaster branch (Levine et al., ; Rodriguez et al., ). It is unclear whether selection acts on the dosage compensation function itself or on an independent function carried out by either an MSL protein or an interacting gene product (Levine et al., ). While its role in dosage compensation is well-documented, there is reason to believe that the MSL complex has additional functions, as suggested, for example, by the expression of all MSL protein subunits except MSL2 in females (Prestel et al., ). Interestingly, there are three modes of MSL protein binding depending on chromatin context (Straub et al., ). First, MSL2 and MLE establish the primary contact that defines high-affinity sites for the MSL-DCC whereas MSL1 and MOF associate more indirectly (Straub et al., ). Second, MSL3 mediates the association of the MSL-DCC with actively transcribed gene bodies in an HAS-dependent manner (Straub et al., ). Third, MOF and MSL1 bind to active promoter regions across the genome with no chromosomal preference (Straub et al., ). The MSL1-MOF binding at promoters is independent of MSL2, clearly indicating their function outside of dosage compensation. In addition, MOF is associated with autosomes as part of the non-specific lethal (NSL) complex (Cai et al., ; Raja et al., ), binds to many housekeeping genes in both sexes (Feller et al., ; Lam et al., ), and plays a role in transcriptional noise reduction (Lee et al., ).
These distinct, although perhaps mechanistically linked, MSL complex functions provide us with an opportunity to study how different selection pressures shape MSL protein evolution. This should help us understand how pleiotropy influences adaptive protein sequence evolution. At the molecular level, pleiotropy may represent the necessity for a protein to bind to multiple interacting partners. Binding sites of a protein can thus shed light on the impact of pleiotropy on its sequence and functional evolution. While evolution of the MSL complex binding sites has been documented in Drosophila species (Rodriguez et al., ; Bachtrog, ; Alekseyenko et al., ; Ellison and Bachtrog, , ; Quinn et al., ), little is known about the effect of pleiotropy on intensity of positive selection. Using MOF, MSL1, and MSL2 ChIP-seq data (Figueiredo et al., ; Chlamydas et al., ), we examined MSL-binding site evolution between D. melanogaster and D. simulans by classifying these sites as those involved in dosage compensation and those performing unrelated functions (hereafter referred to as DC and non-DC sites, respectively). We show that while both groups of sites have evolved rapidly on the D. melanogaster branch, non-DC sites harbor stronger signatures of positive selection. A substantial fraction of non-DC sites in D. melanogaster overlaps cis-regulatory elements and/or transposable elements (TEs), and is associated with increased expression divergence between D. melanogaster and D. simulans. These findings support the idea of co-evolution of DNA-protein interactions of the Drosophila MSL complex and suggest that selection for gene expression regulation, independent of dosage compensation, has contributed to adaptive evolution of MSL proteins in D. melanogaster.
Materials and Methods
Calling Binding Peaks of MSL Proteins
MOF, MSL1, and MSL2 ChIP-seq data collected from male third instar larval salivary glands of D. melanogaster and D. simulans were retrieved from Figueiredo et al. () and Chlamydas et al. () (Supplementary Table 1). Raw ChIP-seq reads were first trimmed for quality using Trimmomatic (version 0.36) (Bolger et al., ) with parameter “LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25,” and then mapped to whole genomes of D. melanogaster (Ensembl BDGP6) and D. simulans (FlyBase r2.02) using bowtie (version 1.1.2, -a -m 1) (Langmead et al., ). Binding peaks were called using macs2 (version 2.1.1.20160309, callpeak -B –nomodel –SPMR -g dm) (Zhang et al., ). Each peak was considered as a binding site as there is only one sample for ChIP-seq of each protein in each species. Cross-correlation analysis was performed using version 1.15.2 of SPP (Kharchenko et al., ) with the default parameter of “-s= -100:5:600.”
Comparative Genomic Analysis of MSL Binding Sites Between D. melanogaster and D. simulans
We estimated the gain or loss of the bound regions of each MSL protein following Bradley et al. () with some modifications. Briefly, we calculated binding signals as the linear scaled fold enrichments by macs2 (version 2.1.1.20160309, bdgcmp -c treat_pileup.bdg -t control_lambda.bdg -m FE) (Zhang et al., ). The output files of “macs2 callpeak -B –nomodel –SPMR -g dm” were used as the treat and control .bdg files. For each bound region in each species, we searched the highest binding signal in the region in the source species and in the orthologous region of the other species. We extended the orthologous region by its half length on each side to capture the highest binding signal in the other species. Peaks were called as absent if the binding signal was reduced 10-fold or more in its ortholog.
Furthermore, binding sites in one species were mapped onto the orthologous regions of the other. Binding sites were considered conserved if at least half of its binding region in one species overlapped the orthologous region that was also bound by the same MSL protein in the other species (Sundaram et al., ). Otherwise, the binding sites were considered unconserved. The reciprocal best chains of D. simulans to D. melanogaster (http://hgdownload.cse.ucsc.edu/goldenpath/dm3/vsDroSim2/reciprocalBest/) and bnMapper (https://github.com/bxlab/bx-python/blob/main/scripts/bnMapper.py, Denas et al., ) were used to do the one-to-one ortholog mapping.
Determining DC and Non-DC Sites
MSL protein binding sites were considered overlapping if the common regions between two sites were larger than half of the region covered by each site. MOF and MSL1 binding sites were considered DC sites if they overlapped MSL2 bound regions; otherwise, they were considered non-DC sites. Because of the existence of many-to-one or one-to-many relationship when determining overlapping across MSL proteins, individual binding peaks were sorted according to their genomic positions and merged if the common region between adjacent binding peaks was larger than half of the region of either peak. These merged regions are reflected in the binding site numbers presented in Figure 1A.
Figure 1
Annotations of TE Insertions and Identification of TE-Derived Binding Peaks
We adopted the pipeline described by Kofler et al. () to annotate TE insertions in the D. melanogaster (Ensembl BDGP6) and D. simulans (FlyBase r2.02) genomes. Briefly, consensus TE sequences (RepBase version 22.02) (Quesneville et al., ) were mapped against both reference genomes using RepeatMasker (version open-4.0.7) (Smit et al., ). TEs overlapping microsatellites, which were identified by SciRoKo (version 3.4) (Kofler et al., ), were identified using bedtools (version 2.25.0) (Quinlan and Hall, ) and removed from further analyses. All parameters and filter criteria were the same as described in Kofler et al. (). Overlapping TE insertions from the same TE family were merged, and those from different families were resolved by prioritizing the longest TE insertions and iteratively truncating common regions. TE insertions <100 bp were excluded. All TE insertions were classified according to the RepBase (version 22.02) (Quesneville et al., ). A binding peak was considered TE-derived if at least one half of the binding region overlapped a TE insertion.
De novo Prediction of Binding Motifs
To measure sequence variation in MSL binding sites, the MOF/MSL1 DC and non-DC sites and MSL2 DC sites in D. melanogaster and D. simulans were used to de novo predict all overrepresented binding motifs using MEME (version 4.12.0, -mod zoops -nmotifs 50 -evt 0.05 -minw 6 -maxw 50 -revcomp) (Bailey et al., ). TE-derived binding sequences were excluded from these predictions.
McDonald-Kreitman (MK) Test
An extended MK test framework (Mackay et al., ) was used to detect natural selection of MSL binding sites. We retrieved D. melanogaster population genomic data from Lack et al. (197 individuals, DPGP3) (Lack et al., ) and D. simulans data from Signor et al. (183 individuals, SRP075682) (Signor et al., ) (Supplementary Table 1). The D. melanogaster lines were from a single ancestral range population from Zambia (Lack et al., ) and the D. simulans lines were from a North America population (Signor et al., ). The raw fastq files were first mapped to the corresponding reference genome (Ensembl BDGP6 for D. melanogaster and FlyBase r2.02 for D. simulans) by bwa mem (version 0.7.12) (Li, ) and processed by samtools (version 1.6) (Li et al., ) with default parameters. Reads that were missed by bwa mem were remapped to the corresponding genome again using stampy.py (version 1.0.32) (Lunter and Goodson, ). PCR duplicates were removed using Picard MarkDuplicates (version 2.9.0, http://broadinstitute.github.io/picard/). Variant calling of single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) was performed by GATK (version 3.8) (McKenna et al., ) with default parameters. Polymorphism data were extracted from previously published Variant Call Format (VCF) files (McKenna et al., ). Singletons and structural variants were discarded. Using Drosophila yakuba as the outgroup, divergent sites were inferred from whole genome alignment of D. yakuba to D. melanogaster and D. yakuba to D. simulans. Whole genome alignment of D. yakuba to D. melanogaster was retrieved from https://hgdownload-test.gi.ucsc.edu/goldenPath/dm6/vsDroYak3/. The D. yakuba to D. simulans alignment was completed independently following the procedure described at http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto. Four-fold degenerate sites in the whole genome of D. melanogaster and D. simulans were used as the neutral control to make inferences about selection, respectively. Estimates of the fraction of sites that are adaptive fixations (α) were calculated according to Mackay et al. (). The MK test (Mackay et al., ) was also applied to individual protein-coding genes bearing MOF or MSL non-DC sites in their 5′UTR, CDS, intron, 3′UTR, and 2 kb upstream or 2 kb downstream regions according to annotations of the D. melanogaster genome (Ensembl BDGP6). A 2*2 contingency table was used to compare polymorphic synonymous and non-synonymous polymorphism, with synonymous and non-synonymous divergence. P-values were calculated using a Fisher′s exact test followed by Benjamini-Hochberg correction (Benjamini and Hochberg, ) for multiple testing. Only polymorphic sites with available outgroup were used. Software we used to perform site extraction can be found at https://github.com/tonymugen/polyDivExtract.
Determining Overlap Between Cis-Regulatory Elements and Non-DC Sites or TEs
To investigate overlaps between cis-regulatory elements and non-DC sites or TEs, three datasets of cis-regulatory element annotations in D. melanogaster were used (Supplementary Table 1). (i) High quality cis-element annotations described by the modENCODE cis-regulatory annotation project, including novel promoters, CBP only enhancers, and Class I and II insulators (Negre et al., ); (ii) An integrated promoter annotation obtained from Hoskins et al. (); (iii) Genome-wide enhancer activity profile of S2 and ovarian somatic cells (OSCs) by STARR-seq (Arnold et al., ). Since non-DC sites were determined using dm6 coordinates, the centers/summits of cis-elements based on dm3 were translated into dm6 using UCSC liftover tools (https://genome.ucsc.edu/cgi-bin/hgLiftOver). Non-DC sites or TE fragments were considered in common with cis-elements if the center/summit of a cis-element was located within the region.
Expression Divergence and Gene Ontology Analysis
Whole female and male adult RNA-seq data from D. melanogaster and D. simulans, each with two or four biological replicates, were retrieved from the GEO database (GSE28078) (Graveley et al., ; Chen et al., ) (Supplementary Table 1). Paired reads were mapped D. melanogaster (Ensembl BDGP6) and D. simulans (FlyBase r2.02) reference genomes using hisat2 (version 2.1.0) (Kim et al., ). Expression levels were measured as transcripts per million (TPM) using StringTie (version 1.3.3b, -e -A) (Pertea et al., ) and averaged across biological replicates for each species. We classified 11,063 one-to-one orthologous protein-coding genes between D. melanogaster and D. simulans (Flybase 2017_06, http://flybase.org) into three groups. Transcripts that have a non-DC MOF or MSL1 binding site <2 kb upstream or within the gene body are in a BS+ group, otherwise BS–. If a TE is present within this region, a transcript is marked as TE+. Thus, for example, a transcript with both a non-DC site and a transposon is marked BS+TE+, while one with only a TE is in the BS–TE+ group. Expression divergence between these genes was estimated by 1 – ρ (Spearman's correlation) for each group of genes (Coolon et al., ). The distributions of expression divergence per gene group were estimated by drawing 1,000 bootstrap values. To estimate changes in expression levels between D. melanogaster and D. simulans, TPMs of 11,063 one-to-one orthologous protein-coding genes were normalized to the normal distribution N(0,1) by calculating (x-mean)/sd within species and expression changes were estimated as the differences between these normalized expression values between D. melanogaster and D. simulans. All statistical analyses were conducted using R (version 3.1.3, https://www.R-project.org). DAVID (version 6.7, http://david.abcc.ncifcrf.gov/) was used to perform a Gene Ontology (GO) enrichment test for genes harboring novel MOF or MSL1 non-DC sites (447 MOF only genes, 3,685 MSL1 only, and 537 genes harboring both). Only the GO terms for “biological processes” were used for the enrichment test.
Enrichment of Specific TE Families in Non-DC Sites
To estimate whether TE-derived non-DC sites were enriched in specific TE families, a log2 odds-ratio method (Sundaram et al., ) was used to identify peak-enriched TE families:
We used a threshold of 1.5 for the log2 odds-ratio to identify TE families that are enriched in non-DC sites. Consensus sequences of TE families in D. melanogaster were obtained from Repbase (version 22.02) (Quesneville et al., ), while those in D. simulans were extracted using in-house perl scripts from a sequence alignment generated by MAFFT (version 7.273) (Katoh and Standley, ). Scanning TE consensus sequences for occurrence of three frequently detected MSL binding motifs (GAGA, CACA, and GCA) was conducted using FIMO (version 4.12.0, –verbosity 1 –thresh 1.0E-4) (Grant et al., ).
Comparison of TE-Derived Non-DC Binding Sites Between D. melanogaster and D. simulans
To compare TE-derived non-DC binding sites between D. melanogaster and D. simulans, the MOF/MSL1 ChIP-seq reads mapping to non-DC binding regions of D. melanogaster and D. simulans were extracted by samtools fastq (version 1.6) (Li et al., ) and mapped to TE consensus sequences (Repbase version 22.02) (Quesneville et al., ) using bowtie (version 1.1.2, -a –best –strata) (Langmead et al., ). Non-DC binding peaks in TE consensus sequences were called by macs2 (version 2.1.1.20160309, callpeak -B –nomodel –SPMR -g 684334) (Zhang et al., ).
Results
Expansion of MSL Complex Binding Sites in D. melanogaster
The first step in studying MSL binding site evolution is to identify loci occupied by these proteins in at least two species. We made use of publicly available ChIP-seq data that measure MOF, MSL1, and MSL2 chromatin association in salivary glands from third instar larvae of D. melanogaster and D. simulans (Figueiredo et al., ; Chlamydas et al., ). Following the ChIP-seq quality control (QC) guidelines (Landt et al., ), cross-correlation analysis revealed that the MSL protein ChIP-seq libraries had QC scores of 0–2 (Supplementary Table 2), indicating that they were of intermediate to very high quality. The normalized ratio between the fragment-length cross-correlation peak and the background cross-correlation (normalized strand coefficient, NSC) and the ratio between the fragment-length peak and the read-length peak (relative strand correlation, RSC) are strong metrics for assessing signal-to-noise ratios in a ChIP-seq experiment (Landt et al., ). There was no significant difference for NSC or RSC values between D. melanogaster and D. simulans (t-test, both P > 0.05, Supplementary Table 2), suggesting that the overall qualities of the MSL protein ChIP-seq data are comparable between the two species.
We called protein binding sites de novo from raw data using macs2 (Zhang et al., ) for each species independently (see Materials and Methods). We identified 4,436 MOF, 4,424 MSL1, and 413 MSL2-binding sites in D. melanogaster. The number of loci occupied by the MSL complex in D. simulans was much lower (MOF: 318, MSL1: 2,677, and MSL2: 82; Figure 1A). It is possible that more MSL binding sites in D. melanogaster might be caused by the lower binding affinity in D. simulans. To evaluate this, we estimated the gain or loss of bound regions in each species following a previous study on binding site turnover between Drosophila species (Bradley et al., ) with some modifications (see Materials and Methods). A higher proportion of peaks in D. melanogaster (8.2–46.8% depending on the MSL protein) had no ortholog in D. simulans than vice versa (6.5–10.1%; Supplementary Table 3). The gain or loss of bound regions for orthologous sequences between D. melanogaster and D. simulans was rare, with zero to 2.4% peaks found in one species clearly absent or displaced in the other (Supplementary Table 3). The gain/loss rate near genes that flank known HAS was similar to the genome-wide rate, and comparable between species (Supplementary Table 3). These results suggested that binding affinities are sufficiently strong to capture both highly- and poorly-bound regions that are ortholgous between species. Consistent with this, there was no correlation between NSC or RSC values of the ChIP-seq data and the total numbers of detected binding sites of the MSL proteins (Spearman's correlation, both P > 0.05, Supplementary Figure 1).
Interestingly, most of the MSL-binding sites specific to D. melanogaster are on autosomes (Figure 1B; Supplementary Table 4), suggesting they are unlikely to function in dosage compensation. There is a corresponding drop in the proportions of X-linked MOF- and MSL1-binding sites: 88.7 and 58.2% in D. simulans to 42.0 and 21.6% in D. melanogaster, respectively (Figure 1B). MSL2 is the only protein of the complex that preferentially targets the X chromosome in both D. melanogaster (97.6%) and D. simulans (90.2%, Figure 1B), consistent with the idea that it functions almost exclusively in dosage compensation. We then classified MOF and MSL1 binding sites into those involved in dosage compensation (we call them DC sites) and those that likely have additional or independent functions (non-DC sites) according to whether they overlap MSL2-associated loci. We considered two protein binding sites overlapping if at least half of the binding region of one site intersected the other (see Materials and Methods). Such classification may lead to underestimation of the number of non-DC sites, since MSL2 was recently found to target autosomal genes involved in patterning and morphogenesis (Valsecchi et al., ); however, the underestimation cannot be large since there are few autosomal MSL2 binding sites (10 in D. melanogaster and eight in D. simulans, Figure 1B). About 94.0% (3,502 out of 3,724) of MOF binding sites are MSL2-independent in D. melanogaster, whereas the fraction (75.6%) is appreciably smaller in D. simulans (167 out of 221, Figure 1A). By contrast, the vast majority (97%) of MSL1-associated loci are MSL2-independent in both species (4,285 out of 4,414 in D. melanogaster, 2,611 out of 2,677 in D. simulans, Figure 1B). The number of binding sites shared by MOF, MSL1, and MSL2, the putative HAS for the MSL-DCC, is two times larger in D. melanogaster (125) than in D. simulans (54, Figure 1A). In both species, non-DC MOF and MSL1 sites consistently show narrower binding peaks and lower log2 ChIP/Input ratios than DC sites (all Mann-Whitney U-test, all P < 0.001, Figures 1C,D), suggesting their involvement in distinct molecular functions.
We next sought to investigate how the two classes of binding sites have contributed to the observed expansion in D. melanogaster. To do so, we defined the occupancy conservation of MSL binding sites following a previously described procedure (Sundaram et al., ) (also see Materials and Methods). Briefly, we considered a binding site conserved if at least half of its binding region in one species overlapped the orthologous region that was also bound by the same MSL protein in the other species. Expansion in either species is then estimated by the number of non-conserved sites. The increase in D. melanogaster-specific DC sites is mostly attributable to MSL2-associated regions (D. melanogaster: 358 vs. D. simulans: 24), followed by MOF (D. melanogaster: 113 vs. D. simulans: 6) and MSL1 (D. melanogaster: 23 vs. D. sim: 19; Figure 1E). Much more non-DC sites were found unconserved in either species, particularly in D. melanogaster (Figure 1E). We identified 4,159 distinct MOF binding sites in D. melanogaster but only 46 in D. simulans. The number of non-conserved non-DC MSL1 sites in D. melanogaster (3,699) is twice as many as that in D. simulans (1,931).
Rapid Evolution of MSL Binding Sites in D. melanogaster
Given the expansion of MSL-binding sites in D. melanogaster, we wondered whether DNA sequence motifs targeting the complex to specific regions have diverged (Berg et al., ; Tugrul et al., ) between D. melanogaster and D. simulans. Using MEME (Bailey and Elkan, ), we identified sequences overrepresented in DC or non-DC sites in D. melanogaster and D. simulans. As expected, the GAGA motif is enriched in DC sites (Figure 2). This motif is known to associate with high-affinity sites and recruit the MSL complex to chromatin (Alekseyenko et al., ; Straub et al., ). As for MOF DC sites in D. melanogaster, the TATA-box (E-value = 1.5e−155) is more prevalent than GAGA (E-value > 0.05), consistent with the known function of MOF binding to active promoters (Raja et al., ; Feller et al., ; Straub et al., ). In contrast, while the GAGA motif still ranks first at non-DC sites in D. simulans, it is not predominant at non-DC loci in D. melanogaster (Figure 2). GCA and CACA motifs are overrepresented in these regions instead (Figure 2).
Figure 2
We wondered whether the observed motif turnover can be attributed to positive selection. To estimate the number of adaptive nucleotide substitutions, we applied the extended McDonald-Kreitman (MK) test (Mackay et al.,
Table 1
| Species | Proteins | Chra | Site type | Length | D | P | P-singleton | D/P-singleton | αb |
|---|---|---|---|---|---|---|---|---|---|
| D. mel | MOF | A+X | 4-fold | 3,455,307 | 555,391 | 374,706 | 299,171 | 1.856 | |
| Total | 4,199,054 | 164,454 | 80,282 | 59,102 | 2.783*** | 0.333 | |||
| A | 4-fold | 2,835,156 | 458,286 | 303,736 | 243,823 | 1.880 | |||
| Non-DC | 2,251,825 | 23,903 | 1,520 | 1,285 | 18.602*** | 0.899 | |||
| X | 4-fold | 620,151 | 97,105 | 70,970 | 55,348 | 1.754 | |||
| Non-DC | 1,263,250 | 77,324 | 43,833 | 32,152 | 2.405*** | 0.270 | |||
| DC | 683,979 | 63,227 | 34,929 | 25,665 | 2.464*** | 0.288 | |||
| MSL1 | A+X | 4-fold | 3,455,307 | 555,391 | 374,706 | 299,171 | 1.856 | ||
| Total | 4,139,270 | 240,542 | 91,477 | 72,545 | 3.316*** | 0.440 | |||
| A | 4-fold | 2,835,156 | 458,286 | 303,736 | 243,823 | 1.880 | |||
| Non-DC | 3,325,728 | 187,235 | 66,297 | 53,497 | 3.500*** | 0.463 | |||
| X | 4-fold | 620,151 | 97,105 | 70,970 | 55,348 | 1.754 | |||
| Non-DC | 689,191 | 41,493 | 18,484 | 14,096 | 2.944*** | 0.404 | |||
| DC | 124,351 | 11,814 | 6,696 | 4,952 | 2.386*** | 0.265 | |||
| MSL2 | X | 4-fold | 620,151 | 97,105 | 70,970 | 55,348 | 1.754 | ||
| DC | 177,823 | 17,469 | 9,191 | 7,329 | 2.384*** | 0.264 | |||
| D. sim | MOF | A+X | 4-fold | 3,396,349 | 580,321 | 161,318 | 115,418 | 5.028 | |
| Total | 123,565 | 13,057 | 5,007 | 3,119 | 4.186*** | −0.201 | |||
| A | 4-fold | 2,812,623 | 483,206 | 138,081 | 101,660 | 4.753 | |||
| Non-DC | 10,110 | 1,559 | 532 | 430 | 3.626*** | −0.311 | |||
| X | 4-fold | 583,726 | 97,115 | 23,237 | 13,758 | 7.059 | |||
| Non-DC | 66,219 | 6,675 | 2,740 | 1,623 | 4.113*** | −0.716 | |||
| DC | 47,236 | 4,823 | 1,735 | 1,066 | 4.524*** | −0.560 | |||
| MSL1 | A+X | 4-fold | 3,396,349 | 580,321 | 161,318 | 115,418 | 5.028 | ||
| Total | 1,433,280 | 152,204 | 61,039 | 39,435 | 3.860*** | −0.303 | |||
| A | 4-fold | 2,812,623 | 483,206 | 138,081 | 101,660 | 4.753 | |||
| Non-DC | 323,127 | 41,098 | 16,173 | 12,205 | 3.367*** | −0.412 | |||
| X | 4-fold | 583,726 | 97,115 | 23,237 | 13,758 | 7.059 | |||
| Non-DC | 975,417 | 99,200 | 40,092 | 24,221 | 4.096*** | −0.724 | |||
| DC | 134,736 | 11,906 | 4,774 | 3,009 | 3.957*** | −0.784 | |||
| MSL2 | X | 4-fold | 583,726 | 97,115 | 23,237 | 13,758 | 7.059 | ||
| DC | 22,543 | 2,442 | 751 | 470 | 5.196*** | −0.359 |
The McDonald Kreitman (MK) test of MOF and MSL1 DC and non-DC sites in D. melanogaster (D. mel) and D. simulans (D. sim).
Autosomal binding sites of MSL2 and autosomal DC sites of MOF or MSL1 were not used for the MK test due to their limited number (see Supplementary Table 4).
The 4th chromosome was excluded from autosomes.
Fisher's exact test, P < 0.001. To increase statistical power, we used polymorphisms excluding singletons in the MK test (Mackay et al.,
Fraction of adaptive fixations, calculated using the methods described by Mackay et al. (
D. melanogaster Non-DC Sites Overlap Known Cis-Regulatory Elements
D. melanogaster non-DC sites are enriched for distinct DNA motifs and appear to be under positive selection, suggesting a gain in functionality. Given that MOF and MSL1 have been shown to bind close to active promoters (Straub et al.,
Table 2
| # of Non-DC binding sites | Overlaps withcis-elements | |||||
|---|---|---|---|---|---|---|
| Promoter | Enhancer | Insulator | Totala | |||
| MOF | TE-derived | 935 | 8 | 172 | 1 | 177* |
| Non-TE-derived | 3,274 | 223 | 377 | 45 | 514 | |
| Sum | 4,209 | 231(33.4%) | 549(79.5%) | 46(6.7%) | 691 | |
| MSL1 | TE-derived | 921 | 9 | 196 | 6 | 208 |
| Non-TE-derived | 3,366 | 1,832 | 1,452 | 1,773 | 2,661*** | |
| Sum | 4,287 | 1,841(64.2%) | 1,648(57.4%) | 1,779(62.0%) | 2,869 | |
Overlap of non-DC MOF/MSL1 binding sites and known cis-regulatory elements in D. melanogaster.
Note that one binding site can span more than one cis-element. We counted non-DC sites that overlap multiple cis-elements redundantly and thus the sum of percentages was >100%.
*Fisher's exact test,
P < 0.05;
P < 0.001.
Non-DC Sites Associate With Expression Divergence Between D. melanogaster and D. simulans
To test whether non-DC site and regulatory element overlap has functional consequences, we examined the effect of autosomal MSL site presence on the D. melanogaster/D. simulans expression divergence of nearby genes (defined as loci within 2-kb downstream of or containing MSL regions). We obtained published RNA-seq data from whole female and male flies in both species (Graveley et al.,
Figure 3

Regulatory role of non-DC sites. (A) Changes in gene expression levels between D. melanogaster and D. simulans in according to presence or absence of non-DC sites. Genes were grouped by presence or absence of non-DC sites 2 kb upstream or in gene bodies of D. melanogaster. Expression levels were measured as Transcripts Per Million (TPM) and normalized to a standard normal distribution before between-species comparisons. BS–, without non-DC sites; conserved BS+, with conserved non-DC sites; unconserved BS+, with novel non-DC sites. (B) Non-DC sites are associated with increased expression divergence of nearby genes. Gene expression divergence estimated by Spearman's correlation coefficients (1 – ρ) between D. melanogaster and D. simulans. Genes were grouped as in (A). (C) Gene Ontology (GO) analyses of genes with novel non-DC sites. Only significantly enriched GO terms of “biological processes” are shown. MOF unique: genes only harboring MOF non-DC sites; overlaps: genes harboring both MOF/MSL1 non-DC sites; MSL1 unique: genes only harboring MSL1 non-DC sites. (D) TE-derived non-DC sites contribute to expression divergence of nearby genes between D. melanogaster and D. simulans. Expression divergence was calculated as in (B) and genes were grouped according to the presence of TE insertions or MOF/MSL1 binding sites close to genes as defined in (A). Significance was determined by the Mann-Whitney U-test: ***P < 0.001.
Non-DC Sites in D. melanogaster Overlap Transposons
We next wanted to determine the source of the novel autosomal MSL binding sites in D. melanogaster. A possible candidate is transposable elements that are known to play a key role in acquisition of novel MSL complex binding sites (Ellison and Bachtrog,
Figure 4

Contribution of TEs to non-DC sites. (A) Numbers of DC and non-DC sites located in or outside of TEs in D. melanogaster (D. mel) and D. simulans (D. sim). (B) FW transposon family provides MSL binding motifs in D. melanogaster but not in D. simulans. Sequences alignments show GAGA, GCA, and CACA motifs in the consensus sequences of the FW family in D. melanogaster and D. simulans. Sequences in red are in MSL1 non-DC binding peaks.
To determine whether specific TE families are overrepresented at non-DC sites, we calculated an enrichment score estimating the log-odds ratios between the observed number of binding peaks overlapping specific TE families and the number of binding peaks expected by chance (Sundaram et al.,
Table 3
| LOR* | TE family | TE class | Enriched non-DC BS | # of GAGA | # of GCA | # of CACA | # Promoter | # Enhancer | # Insulator | # cis-elements | TE CopyNumber in D.mel | TE CopyNumber in D.sim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7.851 | 5S_DM | Unclassified | MSL1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 1 |
| 6.169 | ROO_LTR | LTR | MSL1 | 0 | 0 | 0 | 4 | 28 | 0 | 31 | 299 | 41 |
| 5.771 | DM297_LTR | LTR | MSL1 | 0 | 0 | 0 | 0 | 32 | 0 | 32 | 162 | 56 |
| 5.462 | DM412_LTR | LTR | MSL1 | 0 | 0 | 0 | 0 | 22 | 0 | 22 | 33 | 3 |
| 5.312 | DM412B_LTR | LTR | MSL1 | 0 | 0 | 0 | 2 | 23 | 1 | 25 | 48 | 6 |
| 5.192 | DMHMR1 | Unclassified | MSL1 | 0 | 3 | 0 | 0 | 1 | 0 | 1 | 19 | 6 |
| 4.716 | PROTOP_A | DNA | MSL1 | 2 | 0 | 0 | 0 | 48 | 0 | 48 | 329 | 87 |
| 3.658 | FB4_DM | DNA | MSL1 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 212 | 165 |
| 3.155 | Jockey | Non-LTR | MSL1 | 1 | 2 | 1 | 0 | 7 | 0 | 7 | 128 | 77 |
| 3.026 | NTS_DM | Unclassified | MSL1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
| 2.896 | PROTOP | DNA | MSL1 | 3 | 0 | 1 | 0 | 9 | 0 | 9 | 305 | 211 |
| 2.804 | FW | Non-LTR | MSL1 | 2 | 2 | 5 | 0 | 68 | 0 | 68 | 232 | 67 |
| 2.354 | ROO_I | LTR | MSL1 | 0 | 29 | 1 | 3 | 27 | 0 | 30 | 240 | 49 |
| 2.279 | DOC | Non-LTR | MSL1 | 1 | 1 | 0 | 0 | 10 | 0 | 10 | 165 | 27 |
| 2.200 | MDG1_LTR | LTR | MSL1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 88 | 12 |
| 2.057 | S_DM | DNA | MSL1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 144 | 19 |
| 1.695 | STALKER4_LTR | LTR | MSL1 | 0 | 1 | 0 | 0 | 2 | 0 | 2 | 172 | 10 |
| 7.848 | 5S_DM | Unclassified | MOF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 1 |
| 6.186 | ROO_LTR | LTR | MOF | 0 | 0 | 0 | 3 | 18 | 0 | 20 | 299 | 41 |
| 5.967 | NOMAD_LTR | LTR | MOF | 0 | 0 | 1 | 0 | 8 | 0 | 8 | 61 | 4 |
| 5.935 | DM412_LTR | LTR | MOF | 0 | 0 | 0 | 0 | 28 | 0 | 28 | 33 | 3 |
| 5.812 | DM297_LTR | LTR | MOF | 0 | 0 | 0 | 0 | 22 | 0 | 22 | 162 | 56 |
| 5.603 | COPIA_DM_LTR | LTR | MOF | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 76 | 4 |
| 5.576 | DM412B_LTR | LTR | MOF | 0 | 0 | 0 | 0 | 27 | 0 | 27 | 48 | 6 |
| 5.516 | COPIA2LTR_DM | LTR | MOF | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 45 | 14 |
| 5.459 | MDG1_LTR | LTR | MOF | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 88 | 12 |
| 4.994 | LTRMDG3_DM | LTR | MOF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37 | 8 |
| 4.618 | QUASIMODO_LTR | LTR | MOF | 0 | 0 | 13 | 0 | 32 | 0 | 32 | 127 | 8 |
| 3.982 | INVADER1_LTR | LTR | MOF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 115 | 34 |
| 3.133 | HETRP_DM | Unclassified | MOF | 10 | 3 | 1 | 0 | 1 | 0 | 1 | 28 | 7 |
| 3.053 | NTS_DM | Unclassified | MOF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
| 2.455 | Jockey | Non-LTR | MOF | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 128 | 77 |
| 2.325 | FB4_DM | DNA | MOF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 212 | 165 |
| 2.245 | FW | Non-LTR | MOF | 2 | 2 | 5 | 0 | 15 | 0 | 15 | 232 | 67 |
| 2.091 | ROO_I | LTR | MOF | 0 | 29 | 1 | 1 | 8 | 0 | 9 | 240 | 49 |
| 1.838 | PROTOP_A | DNA | MOF | 2 | 0 | 0 | 0 | 5 | 0 | 5 | 329 | 87 |
| 1.754 | TART_DV | Non-LTR | MOF | 2 | 423 | 4 | 5 | 10 | 1 | 13 | 799 | 747 |
Summary of motifs and cis-regulatory elements specific TE family consensus sequences enriched in non-DC binding sites in D. melanogaster.
log2 Odds-ratio, defined as the ratio between (# of TE-derived BS/total TE length) and (# of non-DC sites/genome size).
MOF TE-derived non-DC sites are slightly more likely than non-TE associated regions to overlap cis-elements (Fisher's exact test, P < 0.05, Table 2), while transposon MSL1 loci are significantly less likely to do so (Fisher's exact test, P < 0.001, Table 2). We further looked for overlap between specific TE families enriched for non-DC sites and known cis-elements in D. melanogaster. The vast majority of these TE families not only overlaps enhancers but also harbors MSL-binding motifs in their consensus sequences (Table 3). We wondered whether TE-derived non-DC sites are associated with larger gene regulation effects than non-TE-derived non-DC sites. Genes with TE-derived non-DC MOF binding sites show significantly higher expression divergence between D. melanogaster and D. simulans than genes with non-TE-derived or no MOF binding sites in both female and male flies (Mann-Whitney U-test, all P < 0.005, Figure 3D). However, no consistent pattern was found for MSL1 non-DC sites (Figure 3D), probably because MSL1 only has a small number of TE-derived non-DC sites (Table 3) and exhibits an even distribution of cis-regulatory element types (Table 2). Taken together, our results strongly suggest that transposable elements are important drivers of non-DC site expansion in D. melanogaster and this expansion in turn affects gene expression divergence from D. simulans.
Discussion
Pleiotropic effects are observed if a gene product participates in multiple biological processes. This can be achieved, for example, if a protein engages in a variety of protein-protein or protein-DNA interactions (He and Zhang,
One caveat of this study is that the ChIP-seq data we used are from two studies: the MSL1 data in D. melanogaster are from Chlamydas et al. (
While both DC and non-DC sites appear to be evolving adaptively, selection strength is not the same across these groups of sites. Looking at patterns of excess divergence across site categories offers a way to reveal these discrepancies. Proportions of adaptively fixed sites are very similar (between 26 and 29%) among DC loci bound by MSL1, MOF, and MSL2. This is not surprising since dosage compensation requires all three genes, although overlap between their binding sites is not complete. Thus, there is evidence for significant concerted adaptive evolution of MSL complex proteins and their DC sites. In contrast, MOF and MSL1 non-DC sites show, to different extents, more dramatic signatures of positive selection. Almost all (90%) autosomal MOF non-DC sites have been fixed adaptively and the proportion is also high (46%) for MSL1 non-DC sites. This suggests that some dosage compensation unrelated functions of these proteins are under even more intense positive selection in D. melanogaster than the selection driven by their participation in dosage compensation. Moreover, these functions need not be common between MSL1 and MOF. This is consistent with the observation that only about a quarter of autosomal MSL1 and MOF sites overlap each other (960 common sites out 3,502 MOF and 4,285 MSL1 loci). Both proteins bind non-DC sites enriched for DNA sequences different from the GAGA motif found at DC sites and both are found at autosomal promoters (Straub et al.,
The most dramatic signatures of positive selection are localized to the N-terminal domains of MSL1 and MSL2. These domains are essential for these proteins' interaction with each other and their targeting to the X chromosome (Rodriguez et al.,
The autosomal binding and function of MOF appear to be independent of MSL1. Indeed, there is no evidence for positive selection on amino acids at the MSL1-MOF interaction surface (Rodriguez et al.,
The selective pressures that drive adaptive evolution of MSL proteins and their binding sites in D. melanogaster remain elusive. Genetic conflict with TEs is a possible force driving this adaptive evolution (Rodriguez et al.,
While molecular mechanisms and consequences of MSL protein action at their autosomal loci remain unclear, early embryogenesis, where functions associated with autosomal binding of MSL proteins can work together with dosage compensation to maintain fitness, is a possible candidate. Autosomal binding of MOF in both sexes can dampen transcriptional noise (Lee et al.,
Statements
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Author contributions
AD, YW, AG, ZL, and TT conceived and designed the study. AD and AG developed the methodology. AD, YW, AG, and TT analyzed and interpreted the data. AD, AG, and TT wrote, reviewed, and revised the manuscript. All authors contributed to the article and approved the submitted version.
Funding
This work was supported by the National Science Foundation of China (grant number 31801081, 31770246, and 31970245), the Fundamental Research Funds for the Central Universities, Sun Yat-sen University, and the China Postdoctoral Science Foundation (grant number 2019TQ0391 to YW).
Acknowledgments
We thank the authors who provided the public datasets and the reviewers for their instructive comments.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.675027/full#supplementary-material
References
1
AlekseyenkoA. A.EllisonC. E.GorchakovA. A.ZhouQ.KaiserV. B.TodaN.et al. (2013). Conservation and de novo acquisition of dosage compensation on newly evolved sex chromosomes in Drosophila. Genes Dev.27, 853–858. 10.1101/gad.215426.113
2
AlekseyenkoA. A.PengS.LarschanE.GorchakovA. A.LeeO. K.KharchenkoP.et al. (2008). A sequence motif within chromatin entry sites directs MSL establishment on the Drosophila X chromosome. Cell134, 599–609. 10.1016/j.cell.2008.06.033
3
ArchambeaultS. L.BartschiL. R.MerminodA. D.PeichelC. L. (2020). Adaptation via pleiotropy and linkage: association mapping reveals a complex genetic architecture within the stickleback Eda locus. Evol. Lett.4, 282–301. 10.1002/evl3.175
4
ArnoldC. D.GerlachD.SpiesD.MattsJ. A.SytnikovaY. A.PaganiM.et al. (2014). Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat Genet.46, 685–692. 10.1038/ng.3009
5
BachtrogD. (2008). Positive selection at the binding sites of the male-specific lethal complex involved in dosage compensation in Drosophila. Genetics180, 1123–1129. 10.1534/genetics.107.084244
6
BaileyT. L.ElkanC. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol.2, 28–36.
7
BaileyT. L.JohnsonJ.GrantC. E.NobleW. S. (2015). The MEME Suite. Nucleic Acids Res.43, W39–49. 10.1093/nar/gkv416
8
BeloteJ. M.LucchesiJ. C. (1980). Male-specific lethal mutations of Drosophila melanogaster. Genetics96, 165–186. 10.1093/genetics/96.1.165
9
BenjaminiY.HochbergY. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B57, 289–300. 10.1111/j.2517-6161.1995.tb02031.x
10
BergJ.WillmannS.LassigM. (2004). Adaptive evolution of transcription factor binding sites. BMC Evol Biol.4:42. 10.1186/1471-2148-4-42
11
BolgerA. M.LohseM.UsadelB. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics30, 2114–2120. 10.1093/bioinformatics/btu170
12
BradleyR. K.LiX. Y.TrapnellC.DavidsonS.PachterL.ChuH. C.et al. (2010). Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS Biol8:e1000343. 10.1371/journal.pbio.1000343
13
CaiY.JinJ.SwansonS. K.ColeM. D.ChoiS. H.FlorensL.et al. (2010). Subunit composition and substrate specificity of a MOF-containing histone acetyltransferase distinct from the male-specific lethal (MSL) complex. J. Biol. Chem.285, 4268–4272. 10.1074/jbc.C109.087981
14
CamposJ. L.ZengK.ParkerD. J.CharlesworthB.HaddrillP. R. (2013). Codon usage bias and effective population sizes on the X chromosome versus the autosomes in Drosophila melanogaster. Mol. Biol. Evol.30, 811–823. 10.1093/molbev/mss222
15
CastilloD. M.MellJ. C.BoxK. S.BlumenstielJ. P. (2011). Molecular evolution under increasing transposable element burden in Drosophila: a speed limit on the evolutionary arms race. BMC Evol. Biol.11:258. 10.1186/1471-2148-11-258
16
CharlesworthB.CamposJ. L.JacksonB. C. (2018). Faster-X evolution: theory and evidence from Drosophila. Mol. Ecol.27, 3753–3771. 10.1111/mec.14534
17
ChenZ. X.SturgillD.QuJ.JiangH.ParkS.BoleyN.et al. (2014). Comparative validation of the D. melanogaster modENCODE transcriptome annotation. Genome Res.24, 1209–1223. 10.1101/gr.159384.113
18
ChlamydasS.HolzH.SamataM.ChelmickiT.GeorgievP.PelechanoV.et al. (2016). Functional interplay between MSL1 and CDK7 controls RNA polymerase II Ser5 phosphorylation. Nat. Struct. Mol. Biol.23, 580–589. 10.1038/nsmb.3233
19
CoolonJ. D.McManusC. J.StevensonK. R.GraveleyB. R.WittkoppP. J. (2014). Tempo and mode of regulatory evolution in Drosophila. Genome Res.24, 797–808. 10.1101/gr.163014.113
20
CopurO.GorchakovA.FinklK.KurodaM. I.MullerJ. (2018). Sex-specific phenotypes of histone H4 point mutants establish dosage compensation as the critical function of H4K16 acetylation in Drosophila. Proc. Natl. Acad. Sci. U. S. A.115, 13336–13341. 10.1073/pnas.1817274115
21
DenasO.SandstromR.ChengY.BealK.HerreroJ.HardisonR. C.et al. (2015). Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution. BMC Genom.16:87. 10.1186/s12864-015-1245-6
22
EllisonC.BachtrogD. (2019). Contingency in the convergent evolution of a regulatory network: dosage compensation in Drosophila. PLoS Biol.17:e3000094. 10.1371/journal.pbio.3000094
23
EllisonC. E.BachtrogD. (2013). Dosage compensation via transposable element mediated rewiring of a regulatory network. Science342, 846–850. 10.1126/science.1239552
24
FellerC.PrestelM.HartmannH.StraubT.SodingJ.BeckerP. B. (2012). The MOF-containing NSL complex associates globally with housekeeping genes, but activates only a defined subset. Nucleic Acids Res.40, 1509–1522. 10.1093/nar/gkr869
25
FigueiredoM. L.KimM.PhilipP.AllgardssonA.StenbergP.LarssonJ. (2014). Non-coding roX RNAs prevent the binding of the MSL-complex to heterochromatic regions. PLoS Genet.10:e1004865. 10.1371/journal.pgen.1004865
26
GrantC. E.BaileyT. L.NobleW. S. (2011). FIMO: scanning for occurrences of a given motif. Bioinformatics27, 1017–1018. 10.1093/bioinformatics/btr064
27
GraveleyB. R.BrooksA. N.CarlsonJ. W.DuffM. O.LandolinJ. M.YangL.et al. (2011). The developmental transcriptome of Drosophila melanogaster. Nature471, 473–479. 10.1038/nature09715
28
GreenbergA. J.StockwellS. R.ClarkA. G. (2008). Evolutionary constraint and adaptation in the metabolic network of Drosophila. Mol. Biol. Evol.25, 2537–2546. 10.1093/molbev/msn205
29
HeX.ZhangJ. (2006). Toward a molecular understanding of pleiotropy. Genetics173, 1885–1891. 10.1534/genetics.106.060269
30
HilfikerA.Hilfiker-KleinerD.PannutiA.LucchesiJ. C. (1997). mof, a putative acetyl transferase gene related to the Tip60 and MOZ human genes and to the SAS genes of yeast, is required for dosage compensation in Drosophila. EMBO J.16, 2054–2060. 10.1093/emboj/16.8.2054
31
HoskinsR. A.LandolinJ. M.BrownJ. B.SandlerJ. E.TakahashiH.LassmannT.et al. (2011). Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res.21, 182–192. 10.1101/gr.112466.110
32
KarpG.IwasaJ.MarshallW. (2008). Karp's Cell and Molecular Biology: Concepts and Experiments.New York, NY: The Wiley Press.
33
KatohK.StandleyD. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol.30, 772–780. 10.1093/molbev/mst010
34
KharchenkoP. V.TolstorukovM. Y.ParkP. J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol.26, 1351–1359. 10.1038/nbt.1508
35
KimD.LangmeadB.SalzbergS. L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods.12, 357–360. 10.1038/nmeth.3317
36
KoflerR.NolteV.SchlottererC. (2015). Tempo and mode of transposable element activity in Drosophila. PLoS Genet.11:e1005406. 10.1371/journal.pgen.1005406
37
KoflerR.SchlottererC.LelleyT. (2007). SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics23, 1683–1685. 10.1093/bioinformatics/btm157
38
LackJ. B.CardenoC. M.CrepeauM. W.TaylorW.Corbett-DetigR. B.StevensK. A.et al. (2015). The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics199, 1229–1241. 10.1534/genetics.115.174664
39
LamK. C.MuhlpfordtF.VaquerizasJ. M.RajaS. J.HolzH.LuscombeN. M.et al. (2012). The NSL complex regulates housekeeping genes in Drosophila. PLoS Genet.8:e1002736. 10.1371/journal.pgen.1002736
40
LandtS. G.MarinovG. K.KundajeA.KheradpourP.PauliF.BatzoglouS.et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res.22, 1813–1831. 10.1101/gr.136184.111
41
LangmeadB.TrapnellC.PopM.SalzbergS. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol.10:R25. 10.1186/gb-2009-10-3-r25
42
LarschanE.BishopE. P.KharchenkoP. V.CoreL. J.LisJ. T.ParkP. J.et al. (2011). X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila. Nature471, 115–118. 10.1038/nature09757
43
LeeH.ChoD. Y.WojtowiczD.HarbisonS. T.RussellS.OliverB.et al. (2018). Dosage-dependent expression variation suppressed on the Drosophila Male X chromosome. G3 (Bethesda)8, 587–598. 10.1534/g3.117.300400
44
LeeH.OliverB. (2018). Non-canonical Drosophila X chromosome dosage compensation and repressive topologically associated domains. Epigenet. Chromatin11:62. 10.1186/s13072-018-0232-y
45
LevineM. T.HollowayA. K.ArshadU.BegunD. J. (2007). Pervasive and largely lineage-specific adaptive protein evolution in the dosage compensation complex of Drosophila melanogaster. Genetics177, 1959–1962. 10.1534/genetics.107.079459
46
LiH. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv.arXiv:1303.3997v2 [q-bio.GN]
47
LiH.HandsakerB.WysokerA.FennellT.RuanJ.HomerN.et al. (2009). The sequence alignment/map format and SAMtools. Bioinformatics25, 2078–2079. 10.1093/bioinformatics/btp352
48
LottS. E.KreitmanM.PalssonA.AlekseevaE.LudwigM. Z. (2007). Canalization of segmentation and its evolution in Drosophila. Proc. Natl. Acad. Sci. U. S. A.104, 10926–10931. 10.1073/pnas.0701359104
49
LunterG.GoodsonM. (2011). Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res.21, 936–939. 10.1101/gr.111120.110
50
LymanL. M.CoppsK.RastelliL.KelleyR. L.KurodaM. I. (1997). Drosophila male-specific lethal-2 protein: structure/function analysis and dependence on MSL-1 for chromosome association. Genetics147, 1743–1753. 10.1093/genetics/147.4.1743
51
MackayT. F.RichardsS.StoneE. A.BarbadillaA.AyrolesJ. F.ZhuD.et al. (2012). The Drosophila melanogaster genetic reference panel. Nature482, 173–178. 10.1038/nature10811
52
ManuSurkovaS.SpirovA. V.GurskyV. V.JanssensH.KimA. R.et al. (2009). Canalization of gene expression in the Drosophila blastoderm by gap gene cross regulation. PLoS Biol.7:e1000049. 10.1371/journal.pbio.1000049
53
MarkowT. A.BeallS.MatzkinL. M. (2009). Egg size, embryonic development time and ovoviviparity in Drosophila species. J. Evol. Biol.22, 430–434. 10.1111/j.1420-9101.2008.01649.x
54
McGeeL. W.SackmanA. M.MorrisonA. J.PierceJ.AnismanJ.RokytaD. R. (2016). Synergistic pleiotropy overrides the costs of complexity in viral adaptation. Genetics202, 285–295. 10.1534/genetics.115.181628
55
McKennaA.HannaM.BanksE.SivachenkoA.CibulskisK.KernytskyA.et al. (2010). The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res.20, 1297–1303. 10.1101/gr.107524.110
56
NegreN.BrownC. D.MaL.BristowC. A.MillerS. W.WagnerU.et al. (2011). A cis-regulatory map of the Drosophila genome. Nature471, 527–531. 10.1038/nature09990
57
PerteaM.PerteaG. M.AntonescuC. M.ChangT. C.MendellJ. T.SalzbergS. L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol.33, 290–295. 10.1038/nbt.3122
58
PrestelM.FellerC.BeckerP. B. (2010). Dosage compensation and the global re-balancing of aneuploid genomes. Genome Biol.11:216. 10.1186/gb-2010-11-8-216
59
QuesnevilleH.BergmanC. M.AndrieuO.AutardD.NouaudD.AshburnerM.et al. (2005). Combined evidence annotation of transposable elements in genome sequences. PLoS Comput. Biol.1, 166–175. 10.1371/journal.pcbi.0010022
60
QuinlanA. R.HallI. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics26, 841–842. 10.1093/bioinformatics/btq033
61
QuinnJ. J.ZhangQ. C.GeorgievP.IlikI. A.AkhtarA.ChangH. Y. (2016). Rapid evolutionary turnover underlies conserved lncRNA-genome interactions. Genes Dev.30, 191–207. 10.1101/gad.272187.115
62
RajaS. J.CharapitsaI.ConradT.VaquerizasJ. M.GebhardtP.HolzH.et al. (2010). The nonspecific lethal complex is a transcriptional regulator in Drosophila. Mol. Cell38, 827–841. 10.1016/j.molcel.2010.05.021
63
RavensS.FournierM.YeT.StierleM.DembeleD.ChavantV.et al. (2014). Mof-associated complexes have overlapping and unique roles in regulating pluripotency in embryonic stem cells and during differentiation. Elife3:e02104. 10.7554/eLife.02104
64
RodriguezM. A.VermaakD.BayesJ. J.MalikH. S. (2007). Species-specific positive selection of the male-specific lethal complex that participates in dosage compensation in Drosophila. Proc. Natl. Acad. Sci. U. S. A.104, 15412–15417. 10.1073/pnas.0707445104
65
ScottM. J.PanL. L.ClelandS. B.KnoxA. L.HeinrichJ. (2000). MSL1 plays a central role in assembly of the MSL complex, essential for dosage compensation in Drosophila. EMBO J.19, 144–155. 10.1093/emboj/19.1.144
66
SheikhB. N.GuhathakurtaS.AkhtarA. (2019). The non-specific lethal (NSL) complex at the crossroads of transcriptional control and cellular homeostasis. EMBO Rep.20:e47630. 10.15252/embr.201847630
67
SignorS. A.NewF. N.NuzhdinS. (2018). A large panel of Drosophila simulans reveals an abundance of common variants. Genome Biol. Evol.10, 189–206. 10.1093/gbe/evx262
68
SkripskyT.LucchesiJ. C. (1982). Intersexuality resulting from the interaction of sex-specific lethal mutations in Drosophila melanogaster. Dev. Biol.94, 153–162. 10.1016/0012-1606(82)90078-1
69
SmitA. F. A.HubleyR.GreenP. (2013–2015). RepeatMasker Open-4.0. Available online at: http://www.repeatmasker.org/ (accessed March 2, 2021).
70
StraubT.GrimaudC.GilfillanG. D.MitterwegerA.BeckerP. B. (2008). The chromosomal high-affinity binding sites for the Drosophila dosage compensation complex. PLoS Genet.4:e1000302. 10.1371/journal.pgen.1000302
71
StraubT.ZabelA.GilfillanG. D.FellerC.BeckerP. B. (2013). Different chromatin interfaces of the Drosophila dosage compensation complex revealed by high-shear ChIP-seq. Genome Res.23, 473–485. 10.1101/gr.146407.112
72
SunL.FernandezH. R.DonohueR. C.LiJ.ChengJ.BirchlerJ. A. (2013a). Male-specific lethal complex in Drosophila counteracts histone acetylation and does not mediate dosage compensation. Proc. Natl. Acad. Sci. U. S. A.110, E808–817. 10.1073/pnas.1222542110
73
SunL.JohnsonA. F.DonohueR. C.LiJ.ChengJ.BirchlerJ. A. (2013b). Dosage compensation and inverse effects in triple X metafemales of Drosophila. Proc. Natl. Acad. Sci. U. S. A.110, 7383–7388. 10.1073/pnas.1305638110
74
SundaramV.ChengY.MaZ.LiD.XingX.EdgeP.et al. (2014). Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res.24, 1963–1976. 10.1101/gr.168872.113
75
TugrulM.PaixaoT.BartonN. H.TkacikG. (2015). Dynamics of transcription factor binding site evolution. PLoS Genet.11:e1005639. 10.1371/journal.pgen.1005639
76
ValsecchiC. I. K.BasilicataM. F.SemplicioG.GeorgievP.GutierrezN. M.AkhtarA. (2018). Facultative dosage compensation of developmental genes on autosomes in Drosophila and mouse embryonic stem cells. Nat. Commun.9:3626. 10.1038/s41467-018-05642-2
77
Wollenberg ValeroK. C. (2020). Aligning functional network constraint to evolutionary outcomes. BMC Evol. Biol.20:58. 10.1186/s12862-020-01613-8
78
ZhangJ.YangJ. R. (2015). Determinants of the rate of protein sequence evolution. Nat. Rev. Genet.16, 409–420. 10.1038/nrg3950
79
ZhangY.LiuT.MeyerC. A.EeckhouteJ.JohnsonD. S.BernsteinB. E.et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol.9:R137. 10.1186/gb-2008-9-9-r137
Summary
Keywords
dosage compensation complex, positive selection, pleiotropy, transposable elements, Drosophila
Citation
Dai A, Wang Y, Greenberg A, Liufu Z and Tang T (2021) Rapid Evolution of Autosomal Binding Sites of the Dosage Compensation Complex in Drosophila melanogaster and Its Association With Transcription Divergence. Front. Genet. 12:675027. doi: 10.3389/fgene.2021.675027
Received
02 March 2021
Accepted
19 May 2021
Published
14 June 2021
Volume
12 - 2021
Edited by
Hao Zhu, Southern Medical University, China
Reviewed by
Zhen-Xia Chen, Huazhong Agricultural University, China; Lizandra Jaqueline Robe, Federal University of Santa Maria, Brazil
Updates

Check for updates
Copyright
© 2021 Dai, Wang, Greenberg, Liufu and Tang.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Tian Tang lsstt@mail.sysu.edu.cn
This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.