Is central dogma a global property of cellular information flow?
- ¹Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
- ²Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
The central dogma of molecular biology has come under scrutiny in recent years. Here, we review high-throughput mRNA and protein expression data of Escherichia coli, Saccharomyces cerevisiae, and several mammalian cells. At both single-cell and population scales, statistical comparisons between entire transcriptomes and proteomes show clear correlation structures. In contrast, pair-wise comparisons of single transcripts with their proteins show no correlation. These data suggest that the organizing structure guiding cellular processes is observed at the omics-wide scale, and not at the single-molecule level. The central dogma, thus, emerges globally as an average, integrated flow of cellular information.
Information processing is essential in all fields of science. In molecular biology, the central dogma, first proposed by Francis Crick (Crick, 1958, 1970), is a classical backbone describing how living cells execute processes from cell division to death through the DNA, RNA, and protein information pathways. More specifically, the central dogma describes the transfer of sequence information during DNA replication, transcription into RNA, and translation into the amino-acid chains that form proteins. It also states that sequence information, once passed into protein, cannot flow back out, either from protein to protein or from protein to nucleic acid.
Since the advent of systems-level and high-throughput approaches over the last two decades, these broad steps, which omit complex regulatory details, have come under intense scrutiny. The missing regulatory features, such as DNA proofreading/repair mechanisms and alternative splicing of pre-mRNA, introduce several intermediary steps. These additional steps interfere with the key steps of the dogma and likely alter the information dynamics. In addition, epigenetics, that is, the roles played by chromatin structure, DNA methylation, and histone modifications, also seems to go against the simple pathways of the dogma (Shapiro, 2009; Luco et al., 2011). Protein splicing, the self-excision of protein segments (inteins) that alters a protein's own sequence (Volkmann and Mootz, 2012), and prions, which propagate their conformation to other protein molecules (Prusiner, 1998), bypass the information transfer pathway of the dogma. Other investigations have reported errors or mismatches between RNA sequences and their coding DNA (Hayden, 2011; Li et al., 2011). Taken together, these data cast doubt on the validity of the central dogma in the context of present-day science and, therefore, question the simplicity of linear information flow (DNA to RNA, and RNA to protein).
To put things into perspective, we require analytical tools that can address the concerns and discrepancies regarding this long-standing theory. One simple yet highly useful technique for searching for global properties in high-throughput datasets is statistical correlation analysis, which has been widely and successfully used to observe patterns in complex systems such as weather (Stewart, 1990), stock markets (Lo and MacKinlay, 1988), and cosmology (Amati et al., 2008). Several kinds of correlation analysis evaluate either linear (e.g., Pearson product-moment) or non-linear (e.g., Spearman's rank for monotonic relationships, mutual information for general dependencies) associations (Steuer et al., 2002; Rosner, 2011). In particular, Pearson product-moment correlation analysis has become the most popular because it reveals organizational structure in its simplest form.
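To make the linear versus non-linear distinction concrete, here is a minimal sketch (NumPy only; the data are synthetic and the exponential relationship is an arbitrary assumption) comparing Pearson's r with Spearman's rank correlation, computed here as the Pearson correlation of the ranks, on a monotonic but strongly non-linear dependence.

```python
import numpy as np

def rank(v):
    """Ranks 0..n-1 (assumes no ties, which holds for continuous random data)."""
    return np.argsort(np.argsort(v))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=500)
y = np.exp(x)  # monotonic but strongly non-linear dependence (assumed for illustration)

pearson_r = np.corrcoef(x, y)[0, 1]                 # linear association only
spearman_rho = np.corrcoef(rank(x), rank(y))[0, 1]  # association of ranks
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Spearman's rho equals 1 because the two rankings agree perfectly, whereas Pearson's r stays below 1 because the points do not lie on a straight line.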
In biology, numerous works have studied correlations in mRNA and protein expression data (see below and Table 1). In principle, when two samples containing high-dimensional data (such as microarray or proteomic measurements) are compared, the deviation of their correlation from unity provides a measure of the difference between the samples. Briefly, two samples carrying identical information will show unit correlation (R² = 1), whereas two samples carrying completely unrelated information will show null correlation (R² = 0).
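The limiting cases above, and the intermediate case of a noisy replicate, can be checked with a small synthetic example. This is a sketch under assumed inputs: the log-normal "expression profiles" and the noise level are hypothetical and do not come from any dataset cited here.

```python
import numpy as np

def r_squared(x, y):
    """Square of the Pearson product-moment correlation coefficient."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r

rng = np.random.default_rng(1)
# Hypothetical expression profiles of 5000 genes (log-normal, assumed).
sample_a = np.log10(rng.lognormal(mean=5.0, sigma=2.0, size=5000))
sample_b = np.log10(rng.lognormal(mean=5.0, sigma=2.0, size=5000))  # unrelated profile
noisy_replicate = sample_a + rng.normal(0.0, 0.3, size=5000)         # same profile plus noise

r2_identical = r_squared(sample_a, sample_a)      # identical information
r2_noisy = r_squared(sample_a, noisy_replicate)   # identical information, noisy measurement
r2_unrelated = r_squared(sample_a, sample_b)      # unrelated information
```

Identical samples give R² = 1, unrelated samples give R² close to 0, and adding noise to an otherwise identical sample places R² in between.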
Perfect correlation (R² = 1) is an idealized situation far from reality, as technical or experimental noise alone reduces correlation. Moreover, recent years have highlighted the existence of biological noise: studies on individual cells and molecules have shown stochasticity in gene expression dynamics due to the combined effect of low molecular copy numbers and the quantal nature of promoter dynamics (Raj and van Oudenaarden, 2009; Eldar and Elowitz, 2010). In addition, clonal populations of cells display heterogeneity in the level of a given protein per cell at any measured time (Chang et al., 2008). Together, stochasticity and heterogeneity are essential for producing cell fate diversification, phenotypic variation, and amplification of intracellular signals (Locke et al., 2011; Selvarajoo, 2012).
Stochastic fluctuations, or intrinsic noise, cause the expression of a molecular species to vary over time and between cells, leading to uncorrelated responses (Elowitz et al., 2002). This is especially prominent for mRNAs and proteins with low copy numbers. Thus, the correlation between samples (cells) can be lowered by intrinsic noise (Figure 1A). Other sources of biological noise, due to extrinsic factors, include variability in cell size, molecular copy numbers, and environmental fluctuations between individual cells. These factors perturb the deterministic picture of the central dogma and likely turn strong correlations into weaker ones (Figure 1B).
Figure 1. Biological and non-biological noise reduce the correlation structure between samples. (A) Stochastic fluctuations reduce correlations, especially for molecular species with low copy numbers (R² ~0.15 for log(X) < 2). The green dotted lines represent the intrinsic noise region generated by a Poisson process (Raj and van Oudenaarden, 2009). Inset: the correlation structure disappears when zooming in to a smaller, single-molecule scale. (B) Stochastic (intrinsic) fluctuations on top of variable (extrinsic) noise further reduce the overall correlation structure. Variable noise is represented by a Gamma distribution (Taniguchi et al., 2010). R² is obtained by squaring the Pearson product-moment correlation coefficient, $R = \frac{\sum_{i=1}^{N}(x_i-\mu_X)(y_i-\mu_Y)}{\sqrt{\sum_{i=1}^{N}(x_i-\mu_X)^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\mu_Y)^2}}$, where X = (x_1, …, x_i, …, x_N) and Y = (y_1, …, y_i, …, y_N) are two N-dimensional variables, x_i and y_i are the ith observations (i = 1, …, N) of X and Y, respectively, and μ_X and μ_Y are the statistical means of the two variables. (C) Stochastic and (D) total (stochastic and variable) noise are reduced when single samples are averaged into a population. (E) and (F) show the noise, η² = σ²_XY/μ²_XY, versus ⟨log(X_i)⟩ for (C) and (D), respectively, where σ²_XY and μ_XY are the variance and mean estimated from the paired expressions X_i and Y_i (Jones and Payne, 1997), and the jth elements of the vectors X_i = (x_{i,1}, …, x_{i,j}, …, x_{i,P}) and Y_i = (y_{i,1}, …, y_{i,j}, …, y_{i,P}) are the expression of the ith gene in the jth sample for P = 100 pairs of samples. In (F), at higher expression levels for single cells, the remaining noise represents the extrinsic or variable noise. At the averaged population scale, this noise is significantly reduced by random noise cancellation.
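The qualitative behavior of panels (A) to (C) can be mimicked with a short simulation. The sketch below assumes Poisson sampling for intrinsic noise and a mean-one Gamma factor (shape 2, an arbitrary choice) for extrinsic noise; the gene means, the low-copy cut-off, and the 100 cells per population are illustrative assumptions, not the parameters used to draw Figure 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes = 4000
# Hypothetical true mean copy numbers spanning ~4 orders of magnitude.
true_mean = 10 ** rng.uniform(-0.5, 3.5, size=n_genes)

def sample_cells(n_cells, gamma_shape=None):
    """Counts for n_cells cells: optional mean-one Gamma (extrinsic) factor, then Poisson (intrinsic) noise."""
    lam = np.tile(true_mean, (n_cells, 1))
    if gamma_shape is not None:
        lam = lam * rng.gamma(gamma_shape, 1.0 / gamma_shape, size=lam.shape)
    return rng.poisson(lam)

def r2_log(x, y):
    """R^2 of log10(x + 1) versus log10(y + 1); the +1 keeps zero counts finite."""
    r = np.corrcoef(np.log10(x + 1), np.log10(y + 1))[0, 1]
    return r * r

low = true_mean < 10.0  # "low copy number" cut-off (assumed)

# Two single cells with intrinsic noise only.
cell_a, cell_b = sample_cells(1)[0], sample_cells(1)[0]
r2_single_all = r2_log(cell_a, cell_b)
r2_single_low = r2_log(cell_a[low], cell_b[low])

# Two populations, each the average of 100 cells.
pop_a, pop_b = sample_cells(100).mean(axis=0), sample_cells(100).mean(axis=0)
r2_pop_low = r2_log(pop_a[low], pop_b[low])

# Two single cells with extrinsic (Gamma) plus intrinsic (Poisson) noise.
ext_a, ext_b = sample_cells(1, 2.0)[0], sample_cells(1, 2.0)[0]
r2_single_ext = r2_log(ext_a, ext_b)
```

In this toy setting, low-copy genes correlate worse than the whole transcriptome, extrinsic noise lowers the overall R² further, and averaging cells into populations restores a high R² even for low-copy genes.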
One recent study compared Escherichia coli mRNA and protein expression between individual cells with single-molecule sensitivity and provided a scenario that deeply questions the central dogma. Taniguchi et al. (2010) revealed that there is no correlation (R² ~ 0) between individual tufA mRNA and protein levels in single cells. Notably, they concluded that the lack of correlation is likely due to differences between mRNA and protein lifetimes. Although this is a plausible explanation, Taniguchi et al. were careful not to reject the long-held hypothesis, proposing that time averages of mRNA levels should correlate with protein levels. However, no evidence was shown that this is actually the case, and when we evaluated non-linear dependencies in the Taniguchi et al. dataset using mutual information (Steuer et al., 2002; Tsuchiya et al., 2010), we found no dependency, i.e., I ~ 0. This confirms that mRNA and protein expression between individual cells at the single-molecule level are clearly unrelated. Furthermore, when zooming in to the single-molecule level in a correlation plot, it is evident that pair-wise correlations are weak (see the inset of Figure 1A for an illustration).
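Mutual information can be estimated in several ways. The authors cite Steuer et al. (2002); the sketch below instead uses a simpler plug-in (histogram) estimator, which is not necessarily the estimator behind the result above. On synthetic data it shows why mutual information complements Pearson correlation: a quadratic dependence gives r close to 0 but a clearly positive I, whereas independent variables give I close to 0. The bin count of 16 is an arbitrary assumption, and plug-in estimates carry a small positive bias.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information, in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    nz = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)
y_indep = rng.normal(size=n)                 # no dependence at all
y_dep = x ** 2 + 0.1 * rng.normal(size=n)    # strong, but non-linear and non-monotonic

r_dep = np.corrcoef(x, y_dep)[0, 1]          # Pearson misses this dependence
mi_indep = mutual_information(x, y_indep)
mi_dep = mutual_information(x, y_dep)
```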
Notably, at the cell population level, Taniguchi et al. were able to show a relatively high correlation between mRNA and protein expression, with R² = 0.29 (Figure 2A). In fact, an independent study of E. coli populations by Lu et al. (2007) also showed a relatively high correlation (R² = 0.47). Similar analyses performed on Saccharomyces cerevisiae (Futcher et al., 1999), murine NIH/3T3 fibroblasts (Schwanhäusser et al., 2011), and several other cell populations (Nie et al., 2006; Schmidt et al., 2007; Jayapal et al., 2008; de Sousa Abreu et al., 2009) all showed correlated structures between transcriptome-wide and proteome-wide expression (Table 1). So why is there no correlation between individual mRNA and protein expression in single cells, while collective relationships are observed between large-scale mRNA and protein expression at the population level?
Figure 2. Omics-wide expression correlations. Cell populations: mRNA–protein correlations in (A) E. coli (Taniguchi et al., 2010) and (B) S. cerevisiae (Fournier et al., 2010), the latter between mRNA expression at t = 60 min and protein expression at t = 360 min. Inset: the correlation matrix between all time points shows a delayed increase in correlation between mRNAs and proteins. (C) mRNA and (D) protein expression between two samples of murine NIH/3T3 cells (Schwanhäusser et al., 2011). Single cells: (E) mRNA expression between two oocytes (Tang et al., 2009). The red dotted lines indicate the region of low mRNA expression (log(mRNA) < 5). (F) Noise (η²) versus log(mRNA expression) for a cell population (NIH/3T3, black dots; Schwanhäusser et al., 2011) and single cells (oocytes, green triangles; Tang et al., 2009). Each dot represents the value for a group of P = 100 mRNAs. η² is near zero for the cell population at all mRNA expression levels. For single cells, η² is highest for mRNAs with the lowest copy numbers and approaches zero for higher copy numbers.
We believe there are two major reasons for this difference. Firstly, as noted earlier, noise, whether biological or non-biological in nature, reduces correlation. Since single-cell analyses have shown the importance of stochasticity and variability, these effects are crucial in reducing single-cell correlations. At the ensemble level, when cells are pooled into a population, the total (intrinsic + extrinsic) noise is reduced, as random noise cancels out across the entire range of molecular expression (Figures 1C–F), revealing the average response and self-organization (Karsenti, 2008; Selvarajoo, 2011; Hekstra and Leibler, 2012; Selvarajoo and Giuliani, 2012). Hence, a good degree of mRNA–protein expression correlation emerges. Secondly, in the single-cell study (Taniguchi et al., 2010), the expression of an individual mRNA and its protein was compared across numerous cells. In cell population studies, however, the comparison is made in entirety, across thousands of mRNAs and proteins whose expression spans several orders of magnitude, a range far greater than that of a single molecular species between cells. This leads to higher correlations at the population level, as the effect of variation in any single molecular species becomes negligible.
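The second reason is essentially a range effect. A minimal sketch, assuming the same additive noise (in log10 units) in both comparisons, shows that a comparison spanning about four orders of magnitude retains a high R², while one confined to a narrow range is dominated by the noise. The ranges and the noise level are assumptions chosen only for illustration.

```python
import numpy as np

def r2(x, y):
    r = np.corrcoef(x, y)[0, 1]
    return r * r

rng = np.random.default_rng(4)
noise_sd = 0.3  # identical noise in both comparisons (log10 units, assumed)

# Many species spanning ~4 orders of magnitude (population-style, omics-wide comparison).
wide = rng.uniform(0.0, 4.0, size=3000)
r2_wide = r2(wide, wide + rng.normal(0.0, noise_sd, size=wide.size))

# One species whose level varies by only ~0.2 decades between cells (single-molecule-style comparison).
narrow = rng.uniform(2.0, 2.2, size=3000)
r2_narrow = r2(narrow, narrow + rng.normal(0.0, noise_sd, size=narrow.size))
```

With the same noise, the wide comparison keeps R² above 0.9, while the narrow one drops to a few percent.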
Despite the correlated structures observed for cell populations, there are tangible reasons for the large deviation from perfect correlation. As noted earlier, one key point is that mRNAs and proteins are separated by several intermediate processes that are not represented in the central dogma. Adding the missing intermediates along a biochemical pathway incurs a noticeable delay in information flow (Selvarajoo, 2006, 2011; Piras et al., 2011), and the correlation between mRNA and protein could suffer as a result. This delay may also be related to the difference in mRNA and protein lifetimes noted by Taniguchi et al. Notably, this postulation is supported by a recent study on S. cerevisiae treated with rapamycin, which showed that the temporal correlation of mRNA–protein expression was initially low (R² = 0.01 at 40 min) but increased over the 360 min following perturbation (R² = 0.36) (Fournier et al., 2010; Figure 2B). The data indicate that, upon chemical perturbation, the initial mRNA and protein responses diverge due to the time delay and different kinetic mechanisms between them, as well as secondary effects such as autocrine or paracrine signaling interference (Shvartsman et al., 2002; Isalan et al., 2008). As the effects of the perturbation attenuated over time, the correlation recovered.
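The effect of delay can be illustrated with a deliberately simple model, which is an assumption and not a fit to the Fournier et al. (2010) data: after a perturbation, each gene's mRNA switches instantly to a new level unrelated to its old one, and its protein relaxes toward the new steady state by first-order kinetics, dp/dt = k·m − d·p, with gene-specific degradation rates d. The hypothetical rate ranges below were chosen only so that relaxation takes tens to hundreds of minutes.

```python
import numpy as np

rng = np.random.default_rng(5)
n_genes = 3000
k_tl = 1.0                                       # translation rate constant (assumed equal for all genes)
m_before = 10 ** rng.normal(1.0, 0.5, n_genes)   # mRNA levels before perturbation
m_after = 10 ** rng.normal(1.0, 0.5, n_genes)    # mRNA levels after perturbation (assumed instant switch)
d_p = 10 ** rng.uniform(-2.5, -1.0, n_genes)     # protein degradation rates per min (assumed range)
p_before = k_tl * m_before / d_p                 # protein steady state before perturbation

def protein_at(t):
    """Exact solution of dp/dt = k_tl * m_after - d_p * p with p(0) = p_before."""
    p_new = k_tl * m_after / d_p
    return p_new + (p_before - p_new) * np.exp(-d_p * t)

def r2_log(x, y):
    r = np.corrcoef(np.log10(x), np.log10(y))[0, 1]
    return r * r

# mRNA-protein R^2 across genes at several times (min) after the perturbation.
r2_by_time = {t: r2_log(m_after, protein_at(t)) for t in (0, 40, 360, 3000)}
```

In this toy model, the mRNA–protein correlation starts near zero and rises as proteins catch up with the new mRNA levels. It saturates below 1 because the protein degradation rates differ between genes.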
To further test the postulation that sequential delay processes or different lifetimes are crucial for decreasing mRNA–protein correlations, we compared R² between identical molecular species of the central dogma (e.g., mRNA vs. mRNA) in cell populations and single cells. The transcriptome-wide mRNA–mRNA expression correlations between replicate cell population samples of NIH/3T3 (Schwanhäusser et al., 2011) (Figure 2C) and Mycobacterium tuberculosis (Ward et al., 2008) are both very high, with R² > 0.9 (Table 1). Similarly strong correlations are observed between population samples for protein–protein expression in NIH/3T3 cells (Schwanhäusser et al., 2011) (Figure 2D), Porphyromonas gingivalis (Xia et al., 2007), and Glycine max (Brandão et al., 2010) (Table 1). Since these comparisons of identical species yield very high correlations, it is conceivable that sequential delay processes or different lifetimes are responsible for lowering the population-level correlation between mRNA and protein expression.
In single murine oocytes (Tang et al., 2009), comparing entire mRNA–mRNA expression profiles reveals a highly correlated structure (R² = 0.92, Figure 2E). However, when focusing only on lowly expressed mRNAs (logarithmic expression < 5), stochastic noise lowers the pair-wise correlation quite dramatically (R² < 0.54). To probe this result, we evaluated the noise, η² = σ²_XY/μ²_XY, across the entire range of mRNA expression (Figure 2F). We noted that η² is highest for the lowest expression levels, where stochastic fluctuations are large relative to the expression itself, and approaches zero for higher expression levels, where such noise becomes less significant (Piras et al., 2012). For the cell population, as expected, near-zero noise is observed across the entire expression range owing to the cancellation of random noise (Figures 1E,F).
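The estimator behind σ²_XY and μ_XY is only referenced, not written out. The sketch below therefore assumes the duplicate-measurement form: for a group of P genes with paired values x_j and y_j, σ²_XY = Σ(x_j − y_j)²/(2P) and μ_XY = Σ(x_j + y_j)/(2P). Under pure Poisson noise, this gives η² ≈ 1/λ, so noise is highest for low-copy transcripts, and averaging 100 cells into a population lowers it about 100-fold. The synthetic data and the group size P = 100 are illustrative.

```python
import numpy as np

def noise_by_group(x, y, group_size=100):
    """eta^2 = sigma^2_XY / mu^2_XY for consecutive groups of genes ranked by mean expression.

    ASSUMPTION: duplicate-measurement estimators, sigma^2 = sum((x - y)^2) / (2P)
    and mu = sum(x + y) / (2P); the estimator used for Figure 2F may differ.
    """
    order = np.argsort((x + y) / 2.0)
    x, y = x[order], y[order]
    mean_expr, eta2 = [], []
    for start in range(0, len(x) - group_size + 1, group_size):
        xs, ys = x[start:start + group_size], y[start:start + group_size]
        sigma2 = np.sum((xs - ys) ** 2) / (2 * group_size)
        mu = np.sum(xs + ys) / (2 * group_size)
        mean_expr.append(mu)
        eta2.append(sigma2 / mu ** 2)
    return np.array(mean_expr), np.array(eta2)

rng = np.random.default_rng(7)
true_mean = 10 ** rng.uniform(0.0, 4.0, size=5000)   # hypothetical gene means

# Two single cells (Poisson noise only) and two populations of 100 averaged cells.
single_x = rng.poisson(true_mean).astype(float)
single_y = rng.poisson(true_mean).astype(float)
pop_x = rng.poisson(true_mean, size=(100, true_mean.size)).mean(axis=0)
pop_y = rng.poisson(true_mean, size=(100, true_mean.size)).mean(axis=0)

expr_single, eta2_single = noise_by_group(single_x, single_y)
expr_pop, eta2_pop = noise_by_group(pop_x, pop_y)
```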
Highly correlated structures for entire mRNA–mRNA expression profiles were also reported for single cancer cells (Fan et al., 2012), albeit weaker, with R² ~ 0.7 (Table 1). Furthermore, protein–protein expression comparisons in LPS-stimulated human macrophages also showed high correlations, R² ~ 0.72 (Shin et al., 2011) (Table 1). Although there is no correlation between individual mRNA and protein expression in single cells, the large-scale, or omics-wide, correlation between identical molecular species in single cells is high.
Thus, in both single cells and cell populations, omics-wide data indicate that correlations between identical molecular species (mRNA vs. mRNA, and protein vs. protein) are noticeably higher than those between different species (mRNA vs. protein). This suggests that although time-delay processes and differing lifetimes are key to reducing correlations, these mechanisms are not sufficient to explain the complete lack of correlation observed between individual transcript and protein expression in single cells.
So far, by investigating large-scale mRNA and protein expression in various cellular systems, we have shown that correlation structures emerge at a global scale. However, correlation analyses reveal only the association between two tested samples and do not show the direction of information flow. For the central dogma to be valid on a global scale, the overall flow of information should be from DNA to proteins. Such flow of information has been demonstrated by myriad other studies that perturb the receptors of cell populations and monitor the resulting dynamics of transcription factors binding to DNA and the induction of large-scale gene expression (Figure 3A). For example, in LPS-stimulated immune cells, activation of the transcription factor NF-κB occurs at around 15 min (Liu et al., 1999), induction of its downstream genes at about 30 min (Liu et al., 1999; Xaus et al., 2000; Selvarajoo et al., 2008), and translation of the corresponding proteins at around 60–90 min (Kawai et al., 1999; Xaus et al., 2000) (Figure 3B). Such a sequential, transcription-to-translation direction of the overall information flow is also observed in bacterial systems, such as E. coli, at the cell population level (Golding et al., 2005).
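Correlation is symmetric, but the timing of responses along a linear chain carries directional information. The minimal sketch below assumes a transient stimulus feeding three first-order stages (NF-κB activity, mRNA, protein) with arbitrary decay rates. It integrates the chain with a simple Euler scheme and reports each stage's peak time. The rate constants are hypothetical and are not fitted to the LPS data.

```python
import numpy as np

dt, t_end = 0.01, 300.0                 # time step and horizon (min)
t = np.arange(0.0, t_end, dt)
stimulus = np.exp(-t / 10.0)            # hypothetical transient receptor input

d_tf, d_mrna, d_prot = 0.2, 0.1, 0.05   # assumed first-order decay rates (1/min)
tf = np.zeros_like(t)
mrna = np.zeros_like(t)
prot = np.zeros_like(t)
for i in range(1, len(t)):
    # Each stage is produced in proportion to the stage upstream and decays linearly.
    tf[i] = tf[i - 1] + dt * (stimulus[i - 1] - d_tf * tf[i - 1])
    mrna[i] = mrna[i - 1] + dt * (tf[i - 1] - d_mrna * mrna[i - 1])
    prot[i] = prot[i - 1] + dt * (mrna[i - 1] - d_prot * prot[i - 1])

peak_times = {name: float(t[np.argmax(profile)])
              for name, profile in (("NF-kB activity", tf), ("mRNA", mrna), ("protein", prot))}
print(peak_times)
```

The downstream stages peak successively later, which qualitatively matches the ordering in Figure 3B.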
Figure 3. The information flow of central dogma. (A) Schematic of LPS/TLR4-induced TNF expression, via transcription factor NF-κB and tnf gene, following linear information flow. (B) Experimental temporal profiles of promoter binding activity of NF-κB (upper panels), tnf (middle panels), and TNF (lower panels) expressions at cell population level. (C) Schematic temporal profiles of promoter dynamics, mRNA, and protein expressions at single-cell level (Raj and van Oudenaarden, 2009).
In contrast, investigations at single-cell resolution reveal random fluctuations superimposed on the linear information flow: the binding of transcription factors to DNA promoter regions is quantal, resulting in bursts of mRNA transcription that, in turn, induce variability in protein translation, even between identical cells (Figure 3C) (Raj and van Oudenaarden, 2009; Eldar and Elowitz, 2010; Locke et al., 2011; Hekstra and Leibler, 2012; Selvarajoo, 2012). As a result, at any particular time point, the individual molecular response of single cells is rather noisy compared with the population average (Selvarajoo, 2011).
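Transcriptional bursting of this kind is commonly described by the two-state ("telegraph") promoter model. The sketch below is a plain Gillespie simulation of that model with arbitrary, assumed rates. It compares a bursty promoter, which has rare, brief "on" periods with fast transcription, against a constitutively active one. The Fano factor (variance/mean) of mRNA copy number is close to 1 for the constitutive promoter, as expected for Poisson noise, and well above 1 for the bursty one.

```python
import random

def telegraph_ssa(k_on, k_off, k_tx, gamma, t_end, sample_dt, seed=0):
    """Gillespie simulation of an on/off promoter with mRNA synthesis (when on) and first-order decay.

    Returns the mRNA copy number sampled every sample_dt time units.
    """
    rng = random.Random(seed)
    t, promoter_on, mrna = 0.0, False, 0
    samples, next_sample = [], sample_dt
    while next_sample < t_end:
        rates = (k_off if promoter_on else k_on,   # promoter switching
                 k_tx if promoter_on else 0.0,     # transcription, only while on
                 gamma * mrna)                     # mRNA degradation
        total = sum(rates)
        t_next = t + rng.expovariate(total)
        # The state is constant on [t, t_next): record it at every sampling time in that interval.
        while next_sample < t_next and next_sample < t_end:
            samples.append(mrna)
            next_sample += sample_dt
        t = t_next
        r = rng.random() * total
        if r < rates[0]:
            promoter_on = not promoter_on
        elif r < rates[0] + rates[1]:
            mrna += 1
        elif mrna > 0:
            mrna -= 1
    return samples

def fano_factor(xs):
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    return var / mean

bursty = telegraph_ssa(k_on=0.05, k_off=0.5, k_tx=20.0, gamma=1.0, t_end=20000.0, sample_dt=1.0, seed=1)
constitutive = telegraph_ssa(k_on=100.0, k_off=0.0, k_tx=2.0, gamma=1.0, t_end=20000.0, sample_dt=1.0, seed=2)
fano_bursty = fano_factor(bursty)
fano_const = fano_factor(constitutive)
```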
The examples shown in this paper highlight differences in the magnitude of correlations observed between species of the central dogma in cell populations and single cells. The statistical analyses of cell populations paint a picture in which expression correlations between identical molecular species are very high, and those between different species are moderately high. Although single-cell correlations between identical species are comparable to those of cell populations, single cells show a wider scatter in their expression plots due to the pronounced effect of biological noise, especially for transcripts with low copy numbers. Notably, pair-wise correlation in single cells drops to zero for individual molecules (Taniguchi et al., 2010). In fact, stochastic fluctuations and variability in molecular expression are known to be functional in generating cell fate decisions and tipping cellular states (Losick and Desplan, 2008; Eldar and Elowitz, 2010; Kuwahara and Schwartz, 2012). We believe that the strong omics-wide correlations arise from tight gene and protein regulatory networks spanning thousands of molecules (Barabási and Oltvai, 2004; Karsenti, 2008), resulting in emergent average responses. When only a small number of molecules, or individual molecules, are analyzed, the correlation structure cannot be observed.
Overall, it is conceivable that viewing information flow at the level of a single gene and its protein will call the central dogma into question, as the response of each molecule at any single time is unlikely to correlate. Globally, however, the observed average deterministic response suggests that the net equilibrium of genetic information lies far toward the right-hand (protein) end of the pathway. Therefore, the central dogma should be viewed as a macroscopic cellular information flow on an omics-wide scale, and not at the single gene-to-protein level. As such, we believe its simplicity will continue to make it one of the most influential theoretical pillars of living systems.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Kentaro Hayashi for comments. We gratefully acknowledge the research fund of Tsuruoka City and Yamagata Prefecture for their support.
Amati, L., Guidorzi, C., Frontera, F., Della Valle, K., Finelli, F., Landi, R., et al. (2008). Measuring the cosmological parameters with the Ep, i-Eiso correlation of Gamma-Ray Bursts. Mon. Not. R. Astron. Soc. 391, 577–584.
Brandão, A. R., Barbosa, H. S., and Arruda, M. A. (2010). Image analysis of two-dimensional gel electrophoresis for comparative proteomics of transgenic and non-transgenic soybean seeds. J. Proteomics 73, 1433–1440.
Fan, J. B., Chen, J., April, C. S., Fisher, J. S., Klotzle, B., Bibikova, M., et al. (2012). Highly parallel genome-wide expression analysis of single mammalian cells. PLoS ONE 7:e30794. doi: 10.1371/journal.pone.0030794
Fournier, M. L., Paulson, A., Pavelka, N., Mosley, A. L., Gaudenz, K., Bradford, W. D., et al. (2010). Delayed correlation of mRNA and protein expression in rapamycin-treated cells and a role for Ggc1 in cellular sensitivity to rapamycin. Mol. Cell. Proteomics 9, 271–284.
Jayapal, K. P., Philp, R. J., Kok, Y. J., Yap, M. G., Sherman, D. H., Griffin, T. J., et al. (2008). Uncovering genes with divergent mRNA-protein dynamics in Streptomyces coelicolor. PLoS ONE 3:e2097. doi: 10.1371/journal.pone.0002097
Lu, P., Vogel, C., Wang, R., Yao, X., and Marcotte, E. M. (2007). Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat. Biotechnol. 25, 117–124.
Nie, L., Wu, G., and Zhang, W. (2006). Correlation of mRNA expression and protein abundance affected by multiple sequence features related to translational efficiency in Desulfovibrio vulgaris: a quantitative analysis. Genetics 174, 2229–2243.
Selvarajoo, K., Takada, Y., Gohda, J., Helmy, M., Akira, S., Tomita, M., et al. (2008). Signaling flux redistribution at toll-like receptor pathway junctions. PLoS ONE 3:e3430. doi: 10.1371/journal.pone.0003430
Shvartsman, S. Y., Hagan, M. P., Yacoub, A., Dent, P., Wiley, H. S., and Lauffenburger, D. A. (2002). Autocrine loops with positive feedback enable context-dependent cell signaling. Am. J. Physiol. Cell Physiol. 282, C545–C559.
Taniguchi, Y., Choi, P. J., Li, G. W., Chen, H., Babu, M., Hearn, J., et al. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538.
Tsuchiya, M., Piras, V., Giuliani, A., Tomita, M., and Selvarajoo, K. (2010). Collective dynamics of specific gene ensembles crucial for neutrophil differentiation: the existence of genome vehicles revealed. PLoS ONE 5:e12116. doi: 10.1371/journal.pone.0012116
Volkmann, G., and Mootz, H. D. (2012). Recent progress in intein research: from mechanism to directed evolution and applications. Cell. Mol. Life Sci. doi: 10.1007/s00018-012-1120-4. [Epub ahead of print].
Xaus, J., Comalada, M., Valledor, A. F., Lloberas, J., López-Soriano, F., Argilés, J. M., et al. (2000). LPS induces apoptosis in macrophages mostly through the autocrine production of TNF-alpha. Blood 95, 3823–3831.
Xia, Q., Wang, T., Park, Y., Lamont, R. J., and Hackett, M. (2007). Differential quantitative proteomics of Porphyromonas gingivalis by linear ion trap mass spectrometry: non-label methods comparison, q-values and LOWESS curve fitting. Int. J. Mass Spectrom. 259, 105–116.
Keywords: gene expression, central dogma, biological noise, correlation analysis, emergent behavior
Citation: Piras V, Tomita M and Selvarajoo K (2012) Is central dogma a global property of cellular information flow? Front. Physio. 3:439. doi: 10.3389/fphys.2012.00439
Received: 05 October 2012; Paper pending published: 30 October 2012;
Accepted: 02 November 2012; Published online: 23 November 2012.
Edited by: Xiaogang Wu, Indiana University-Purdue University Indianapolis, USA
Reviewed by: Steven G. Gray, St. James Hospital/Trinity College Dublin, Ireland
Katja Wegner, Cooperative State University Baden-Wuerttemberg, Germany
Tianhua Niu, Tulane University School of Public Health and Tropical Medicine, USA
Copyright © 2012 Piras, Tomita and Selvarajoo. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
*Correspondence: Kumar Selvarajoo, Institute for Advanced Biosciences, Keio University, 14-1 Baba-cho, Tsuruoka, Yamagata, Japan.