Protein–Protein and Protein–DNA Dosage Balance and Differential Paralog Transcription Factor Retention in Polyploids

A commentary on
Dose-sensitivity, conserved noncoding sequences and duplicate gene retention through multiple tetraploidies in the grasses by Schnable, J. C., Pedersen, B. S., Subramaniam, S., and Freeling, M. (2011). Front. Plant Sci. 2:2. doi: 10.3389/fpls.2011.00002

Most eukaryotes have an evolutionary history of repeated polyploidization followed by fractionation (or diploidization; Makino and McLysaght, 2010; Jiao et al., 2011). The progression to the near-diploid level is not random with regard to the classes of genes that are retained (Freeling et al., 2008; Freeling, 2009; Makino and McLysaght, 2010). Typically, the genes that are preferentially retained are involved with macromolecular machines or are heavily connected in the interactome. This differential retention is unlikely to be due only to changes in protein function of the different members of a duplicate pair, in the processes referred to as subfunctionalization (subdivision of function) and neofunctionalization (gain of a novel function; Freeling, 2009). Two arguments support this view. First, the same classes of genes that are preferentially retained following whole genome duplication are preferentially underrepresented in segmental duplications (Freeling et al., 2008; Makino and McLysaght, 2010). Both processes produce duplicate genes that are available for divergence, but the reciprocal distribution suggests that other factors are operative. Second, the duplicates that are retained for longer periods of evolutionary time very often eventually decay to the diploid state, indicating that there has been no bona fide subdivision of function that would maintain both copies. It should be noted, however, that subdivision or gain of function has certainly been documented for duplicate genes in evolution, and the retention of regulatory genes for longer periods of evolutionary time provides greater opportunity for these changes in function to accumulate.
The types of genes that are preferentially retained following whole genome duplications and depleted in segmental copy number changes are quite similar to those shown to exhibit dosage effects in aneuploids (Birchler, 1979; Birchler and Newton, 1981; Guo and Birchler, 1994; Birchler et al., 2001). An analogy can be made to the generalized lack of effects on gene expression of whole genome changes, in contrast to the regular and consistent set of modulations that occur in aneuploids (Birchler and Newton, 1981; Guo and Birchler, 1994; Guo et al., 1996). This set of observations led to the concept that the stoichiometry of members of regulatory macromolecular complexes involved in the control of transcription is important in affecting the expression of the target genes (Birchler and Newton, 1981; Birchler et al., 2001). These types of dosage effects can often be reduced to the action of single genes (Birchler et al., 2001), and indeed heterozygous mutations of transcription factors were recognized to produce human clinical conditions (Veitia, 2002, 2003, 2004). The stoichiometry of members of macromolecular complexes was postulated to explain this (semi-)dominance (Veitia, 2002). An issue pertinent to this discussion is the relationship of gene copy number to protein expression level. For instance, in a study in diploid yeast, knockouts of every gene were examined for protein concentration (Springer et al., 2010). Only 5% of genes showed no correlation, and 80% showed a strong correlation, i.e., 50% of normal expression.
The connection between gene dosage and the phenotype can be traced back to classical genetics, in which it was known that changes in whole ploidy would produce some level of morphological change, but alterations in the copy number of portions of the genome could be quite detrimental or indeed lethal (Birchler and Veitia, 2007). Thus, a change in the stoichiometry of dosage-balanced gene products would have negative fitness consequences manifested in the phenotype and be selected against (Papp et al., 2003; Birchler et al., 2005; Veitia et al., 2008).
Biophysical evidence suggests that the more interaction partners a particular protein has, the less likely it is to be involved in a duplication event, indicating further that macromolecular complexes require a balance of subunits to maintain good fitness (Liang et al., 2008). Examinations of protein databases also indicate that proteins with many interactions display lower expression noise and are underrepresented in copy number variants (Schuster-Bockler et al., 2010). Thus, from the biochemical level to the phenotype, there is evidence for a balance of gene products involved in such complexes, with implications for biophysics, evolution, gene expression, and quantitative trait analysis. This synthesis is referred to as the Gene Balance Hypothesis (Birchler and Veitia, 2007, 2010). To reiterate, the underlying theme of this synthesis is that the amounts of the different subunits and the mode of assembly of multisubunit complexes will affect the final yield, and that this will impact the phenotype. One of the tenets of this concept is that, during the assembly of multisubunit complexes, a relative excess of one subunit might lead to the production of potentially inactive subcomplexes. Such a circumstance will produce a different quantity of the whole complex under consideration and affect the functional output.
Schnable et al. (2011) highlight another aspect for the study of retained genes following ancient tetraploidy. These authors examined conserved non-coding sequences (CNS) associated with genes encoding transcription factors and found that they too can exhibit extended retention in duplicate over evolutionary time, beyond the standard deletion frequency. This observation suggests that there may be negative fitness consequences of deleting one member of a duplicate pair and, as such, a requirement for the proper balance of these sequences relative to other factors (namely, DNA-binding proteins) in the genome.
This concept is based on the idea that transcription factor genes encode proteins that very often function in multiprotein complexes in interaction with DNA. The typical example is the enhanceosome, a higher-order nucleoprotein "aggregate" that works as a transcriptional pre-initiation/stimulatory complex (Carey, 1998; Levine, 2010). Enhanceosomes are thought to ensure the formation of a specific activation surface that is "complementary" to other co-activators and the transcription machinery. These considerations led the authors to hypothesize that protein-DNA interactions should be sensitive to the "concentration" of the transcription factors and of the binding sites in the cis-regulatory regions of the genes encoding transcription factors. The concept of dosage-sensitive protein-DNA interactions would be an important confirmation and extension of the Gene Balance Hypothesis.
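The assembly argument behind the Gene Balance Hypothesis, in which a relative excess of one subunit diverts material into inactive subcomplexes, can be illustrated with a toy calculation. The sketch below is not taken from any of the cited studies. It assumes a hypothetical trimer A-B-C in which B bridges A and C through two independent sites, binding is tight and effectively irreversible, and only the complete trimer is active; the function name `trimer_yield` and the dosage values are illustrative.

```python
def trimer_yield(a: float, b: float, c: float) -> float:
    """Amount of complete A-B-C trimer in a toy bridging model.

    Assumptions (illustrative only): B carries one site for A and one for C,
    the two sites fill independently, and binding is tight, so the fraction
    of B molecules holding A is min(1, a / b), and likewise for C.
    """
    if b <= 0:
        return 0.0
    frac_with_a = min(1.0, a / b)
    frac_with_c = min(1.0, c / b)
    # A B molecule counts as a complete trimer only if both sites are filled.
    return b * frac_with_a * frac_with_c


# Dosage scenarios in arbitrary units (hypothetical values).
balanced = trimer_yield(1.0, 1.0, 1.0)       # baseline, all subunits equal
whole_genome = trimer_yield(2.0, 2.0, 2.0)   # every subunit doubled
bridge_only = trimer_yield(1.0, 2.0, 1.0)    # only the bridge doubled
print(balanced, whole_genome, bridge_only)
```

Under these assumptions, doubling every subunit doubles the yield while leaving the ratios intact, whereas doubling only the bridge halves the yield of complete trimer, because A and C end up on separate B molecules as AB and BC subcomplexes.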
To address whether the retention of CNS associated with transcription factor genes was simply coincidental, Schnable and colleagues asked whether CNS-rich genes were preferentially retained compared to CNS-poor genes, which indeed was the case. Consistent with this, their analysis showed that the less CNS-rich genes were significantly less likely to have both duplicate copies retained in a second round of whole genome duplication in the maize lineage. This finding of preferential retention of some CNS from whole genome duplications suggests that protein-DNA interaction is an important aspect of stoichiometric balance. In terms of complex assembly, the kinetics and stoichiometry of the binding of transcription factors to DNA could certainly influence the final amount of functional complexes and hence their biological activity. A change in the copy number of either the genes encoding transcription factors or their cognate binding sites might influence the dynamics and outcome of complex formation. Indeed, it would not be surprising if the concentration of the DNA-binding sites and the concentration of the relevant factors that recognize them had evolved preferred stoichiometries. In such a case, fractionation (deletion) of a copy of the gene encoding a transcription factor would be counter-selected because it would change the relative concentration of binding sites and binding factors. From the perspective of the DNA-binding sites, deletion of only one gene is unlikely to alter the protein/DNA stoichiometry much. However, one must note that a transcription factor can control hundreds or thousands of target genes, any of which can be undergoing fractionation.
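The asymmetry between losing a transcription factor gene and losing a single target can be put in rough numbers. The sketch below assumes, purely for illustration, that gene copy number is a proxy for concentration, that each target gene copy carries one cluster of binding sites, and that one factor regulates 1,000 targets; the function `site_to_factor_ratio` and all values are hypothetical.

```python
def site_to_factor_ratio(n_targets: int,
                         target_copies_lost: int,
                         tf_copies_retained: int,
                         copies_after_wgd: int = 2) -> float:
    """Binding-site dose divided by factor dose, relative to the state
    immediately after whole genome duplication (1.0 means balanced)."""
    if tf_copies_retained <= 0:
        raise ValueError("at least one factor gene copy must be retained")
    total_sites = n_targets * copies_after_wgd
    sites_rel = (total_sites - target_copies_lost) / total_sites
    tf_rel = tf_copies_retained / copies_after_wgd
    return sites_rel / tf_rel


# Hypothetical scenarios for a factor with 1,000 duplicated targets.
one_target_lost = site_to_factor_ratio(1000, 1, 2)
tf_copy_lost = site_to_factor_ratio(1000, 0, 1)
both_fully_fractionated = site_to_factor_ratio(1000, 1000, 1)
print(one_target_lost, tf_copy_lost, both_fully_fractionated)
```

In this toy accounting, losing one target copy shifts the ratio by 0.05%, losing one factor copy doubles it, and the ratio returns to 1.0 once the targets have fractionated to the same extent as the factor.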
In this discussion we cannot overlook the fact that DNA-binding proteins may also establish non-specific interactions with DNA. Given the size of plant genomes, there may be a substantial amount of non-specific interaction. A transcription factor normally recognizes many fewer specific, high-affinity binding sites than non-specific ones. Mathematical simulations show, for instance, that increasing the concentration of a transcription factor relative to a reduced concentration of non-specific binding sites (due to DNA deletion) can lead to a non-linear increase in the concentration of specific transcription factor-DNA complexes. As previously suggested, strategies to maintain non-specific interactions at optimal levels include pseudogenization without deletion, or replacement of deleted DNA by repetitive DNA (Veitia and Bottani, 2009).
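The competition between specific and non-specific sites can be sketched with a simple mass-action equilibrium. This is a minimal illustration under assumed parameters, not a reproduction of the simulations of Veitia and Bottani (2009): one transcription factor binds independently to specific sites (dissociation constant `kd_specific`) and to a much larger pool of non-specific sites (`kd_nonspecific`), and the free factor concentration is found by bisection. All names and values are hypothetical.

```python
def specific_complex(tf_total, specific_sites, nonspecific_sites,
                     kd_specific=1.0, kd_nonspecific=1000.0):
    """Equilibrium concentration of factor bound to specific sites."""

    def total_factor(free):
        # Mass balance: free factor plus factor bound to each class of site.
        return (free
                + specific_sites * free / (kd_specific + free)
                + nonspecific_sites * free / (kd_nonspecific + free))

    # total_factor rises monotonically with free, so bisect on [0, tf_total].
    lo, hi = 0.0, tf_total
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if total_factor(mid) > tf_total:
            hi = mid
        else:
            lo = mid
    free = (lo + hi) / 2.0
    return specific_sites * free / (kd_specific + free)


# Arbitrary concentration units; values chosen only for illustration.
for tf in (5.0, 10.0, 20.0):
    full = specific_complex(tf, 10.0, 10000.0)
    deleted = specific_complex(tf, 10.0, 5000.0)  # half the non-specific DNA
    print(f"TF={tf:5.1f}  full genome: {full:.3f}  after deletion: {deleted:.3f}")
```

With these assumed constants, removing non-specific DNA frees factor that was previously titrated away, so the amount of specific complex rises without any change in the factor gene itself.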
One corollary of the authors' proposition is that CNS-rich genes should be less represented in segmental copy number changes than CNS-poor genes, an issue that has yet to be examined. Also, the rich collection of data about the classes of genes that are preferentially retained in whole genome duplications and depleted in segmental changes has yet to inspire molecular biological experiments that will clarify how the dynamics of protein-protein and protein-DNA interactions produce these ultimate balance consequences. If the findings of Schnable and colleagues are confirmed with further genomic and biochemical evidence, the gene dosage balance concept should be broadened to include DNA-protein interactions.