Edited by: Ping Ma, University of Georgia, United States
Reviewed by: Xiaowei Wu, Virginia Tech, United States; Wenxuan Zhong, University of Georgia, United States
*Correspondence: Zoran Nikoloski
This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Plant Science
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Recent advances in metabolomics technologies have resulted in high-quality (time-resolved) metabolic profiles with an increasing coverage of metabolic pathways. These data profiles represent read-outs from often non-linear dynamics of metabolic networks. Yet, metabolic profiles have largely been explored with regression-based approaches that only capture linear relationships, rendering it difficult to determine the extent to which the data reflect the underlying reaction rates and their couplings. Here we propose an approach termed Stoichiometric Correlation Analysis (SCA) based on correlation between positive linear combinations of log-transformed metabolic profiles. The log-transformation is due to the evidence that metabolic networks can be modeled by mass action law and kinetics derived from it. Unlike the existing approaches which establish a relation between pairs of metabolites, SCA facilitates the discovery of higher-order dependence between more than two metabolites. By using a paradigmatic model of the tricarboxylic acid cycle we show that the higher-order dependence reflects the coupling of concentration of reactant complexes, capturing the subtle difference between the employed enzyme kinetics. Using time-resolved metabolic profiles from
Metabolomics profiling technologies are increasingly used for phenotyping of biological systems to understand the contribution of metabolism to complex phenotypes, including growth and diseases (Sumner et al.,
Regulation of reaction rates is necessary to ensure that the activities attributed to different parts of the system are coordinated. The simplest way to capture the coordination of reaction rates is through their coupling, whereby the ratio of the reaction rates is maintained in a narrow range (Millard et al.,
Despite the apparent non-linearites due to the metabolic structure and regulation, metabolic data profiles are usually analyzed by regression-based approaches that can only capture linear relationships. Ever since the seminal work of Vance et al. (
Assuming random fluctuations around a given steady state, metabolic correlations have been related to the Jacobian of the system of ODEs that describe the change in metabolite concentrations (van Kampen,
Here we take a principally different approach motivated by biochemically reasonable assumptions which often hold in realistic biological scenarios. Since biological systems sense and respond to environmental perturbations, they achieve normal functionality in face of these perturbations. To this end, various feedbacks and mechanisms based on network structure have evolved to maintain coupling of reaction rates. Based on this idea and under the assumption that elementary biochemical reactions can be modeled via mass action kinetics (without neglecting the effect of enzymes), here we propose a novel means to analyze metabolic profiles based on the concept of constrained maximal correlation coefficient. We use this approach to analyze and characterize the role of metabolites in a network that captures the reaction rate coupling. First, by using a paradigmatic model of the tricarboxylic acid (TCA) cycle, we investigate the effect from departures of the assumption of mass action on the identified reaction coupling and couplings of reactant complexes. We then show that Stoichiometric Correlation Analysis (SCA) can be employed to perform cross-species comparison of the TCA cycle and amino acid synthesis pathways. In addition, we demonstrate that the proposed approach can be used to mechanistically understand the agronomicaly important process of domestication, here, in the case of wheat as well as in tomato and strawberry.
Modern applications, particularly in computational biology, often consider a large number of variables involved in nonlinear (pairwise) relationships. The maximal correlation coefficient ρ, between a pair of random variables
where the supremum is taken over all functions
Representation of the Maximal Correlation.
There exist efficient algorithms to compute maximal correlation for both discrete (Breiman and Friedman,
Here we define a constrained version of the maximal correlation coefficient which is motivated by modeling of metabolic networks and the principles of their operation. A metabolic network is a collection of metabolites and biochemical reactions through which they are transformed and/or exchanged with the environment. For instance, the network on Figure
Illustration of reaction couplings.
The change in the levels of
Let
To arrive at our approach termed Stoichiometric Correlation Analysis (SCA) we rely on the observation that metabolic networks, as part of inter-related cellular systems (e.g. transcription, translation, and signaling), operate toward providing robust functionality (Kitano,
For instance, at any steady state for the network in Figure
Since the non-zero stoichiometric coefficients α
with
and
If
If there exist multiple vectors β and η, yielding the same value of the stoichiometric correlation, we consider the one of smallest magnitude ||β+η2||. Therefore, stoichiometric correlation can be regarded as constrained maximal correlation, where the constraints pertain to the limited set of values that the entries of β and η are allowed to take following the stoichiometry of reactants. The transformation used in the constrained maximal correlation is explicitly non-linear, since the function
Clearly, the reverse direction also holds and can be used to draw hypotheses about the couplings in reaction rates and substrate complexes in a given metabolic network. To this end, we focus on the statistically significant stoichiometric correlations larger than a threshold value of 0.8 (to account for effects of noise and small deviations from coupling of reaction rates, per Definition 1). Note that since
Given a data set of
The code and one example can be found on GitHub:
Metabolite levels were simulated with three different models using Michaelis-Menten kinetics (Singh and Ghosh,
The change of concentration was simulated with each model over a time course of 1,280 min. The initial concentration of the metabolites, metabolite-enzyme complexes and enzymes was randomly assigned for each of the 10 repetitions from the range of the minimum and maximum metabolite concentration reported in Khodayari et al. (
We applied SCA to several publicly available metabolomics data sets, including metabolic profiles from
We also used the metabolomics data from a recent evolutionary metabolomics study (Beleggia et al.,
Moreover, we included metabolomics data from six different tomato species, namely
To have a more comprehensive comparative analysis pertaining to domestication, we included further data of wild strawberry accessions (
From the derivation of our SCA, it follows that the findings based on the constrained correlation of metabolic data profiles reflect the apparent couplings of elementary reaction rates, assumed to obey mass action kinetic. In addition, the findings reflect the additional couplings which cannot be directly related to reaction rates but are direct consequence of them (e.g., components
To investigate the effects of the departure from the mass action kinetic for the considered reactions (with and without accounting for enzyme action), we considered three models of the tricarboxylic acid (TCA) cycle. All three models include the same metabolites, and differ only with respect to whether or not they include the effect of enzyme action and if they use mass action kinetic or the more involved functional forms of the Michaelis-Menthen kinetic. All reactions are considered reversible, and they are split into irreversible reactions in the cases in which mass action kinetic was employed. We used the TCA cycle model embedded in the kinetic model of
To conduct the comparative analysis, we simulated the models metabolite concentrations with physiologically relevant randomly chosen initial values. The simulation time ranged from 0 to 1,280 min and metabolite concentrations were obtained at 21 time points identical to those used in the study of Caldana et al. (
Distribution of the number of stoichiometric correlations for three models of the TCA cycle. Shown are the distributions of the total number of stoichiometric correlations at four thresholds 0.8, 0.85, 0.9, and 0.95. The distributions for the mass action simulation are shown in red, the distributions for the substrate-enzyme complex mass action simulation are shown in blue, whereas the Michaelis-Menten simulation of the TCA cycle is shown in green.
We found that the total number of stoichiometric correlations between the models with mass action kinetic was more similar with the increase in the considered threshold. In fact, at a threshold of 0.95, the distributions of the total number of stoichiometric correlations between the mass action models with and without the consideration of enzyme action largely overlapped. However, the consideration of reversible Michaelis-Menten kinetic results, on average, in at least three-fold increase in the total number of stoichiometric correlations (see Figure
Empirical cumulative distribution function of stoichiometric correlations for three models of the TCA cycle. Shown is the Empirical cumulative distribution of the total number of stoichiometric correlations of simulations of the TCA cycle. The distribution for the mass action simulation is shown in red, the distribution for the substrate-enzyme complex mass action simulation is shown in blue, whereas the Michaelis-Menten simulation is shown in green.
Therefore, in the case of the TCA cycle models, we concluded that the findings from the assumption that the network is composed of elementary reactions modeled with mass action do not differ upon consideration of enzyme action. In these cases, the couplings corresponding to the stoichiometric correlations reflect the underlying reaction couplings. In contrast, the usage of Michaelis-Menten kinetic results in a considerably larger number of stoichiometric correlations, which cannot be brought in direct correspondence to the coupling of reaction rates and are challenging to mechanistically explain.
The stringent response is one of the most important regulatory systems used by bacteria to adapt to environmental stresses. Upon sensing the environmental change, like nutrient limitation, the organism starts a series of reactions to redirect its metabolic fluxes. The stringent response is mediated by guanosine 3′,5′-bis(pyrophosphate) (ppGpp) whose level is controlled by two enzymes, RelA and SpoT (Traxler et al.,
To help answer this question, we analyzed the set of metabolites from the TCA cycle and the amino acid synthesis pathways from the two model organisms. We used these metabolites since ppGpp controls transcription and translation, which is ultimately reflected in the levels of amino acids. Moreover, a comparative analysis between the two organisms is only meaningful for the same set of metabolites. Altogether, we used the publicly available data profiles of three metabolites from the TCA cycle (i.e., malate, succinate, and fumarate) as well as 16 amino acid measured over seven and five conditions in
The degree of coupling for metabolite
Overview of the number of significant stoichiometric correlations at the considered thresholds for metabolic profiles of the stringent response in
0.80 | 3,419 | 13 | 579 | 2,827 | 24 | |
3,301 | 9 | 517 | 2,775 | 10 | ||
0.85 | 2,500 | 8 | 398 | 2,094 | 18 | |
1,921 | 6 | 285 | 1,630 | 7 | ||
0.90 | 1,821 | 6 | 285 | 1,530 | 15 | |
597 | 1 | 76 | 520 | 2 | ||
0.95 | 1,137 | 6 | 188 | 943 | 7 | |
2 | 0 | 0 | 2 | 0 |
For the purpose of comparison, at all threshold values and in both species, we observed a decrease on the number of significant stoichiometric correlations for pairs of metabolites, compared to Pearson correlation (i.e.,|
However, SCA allows the analysis of stoichiometric correlations due to triples and quadruples of metabolites, which provides information about the presence of non-linear relationships via the couplings of reaction rates. For all considered thresholds, applying SCA with the
Additionally, we can investigate overlapping pairs, triple and quadruple for each threshold. The small similarity of SCA findings was reflected in 65 and 442 stoichiometric correlations due to triple and quadruple, respectively, shared between the two species at a threshold value of 0.8 (see Supplemental Table
Domestication of tetraploid wheats,
Since important domestication-associated traits (e.g., the increase in seed size, the loss of dormancy Gepts and Papa,
We applied SCA to contents of 22 metabolites from four compound classes (i.e., amino acids, sugars, organic acids, and alcohols) within each population at four threshold values (see Materials and Methods). These metabolites were selected based on their presence in every of the analyzed accessions to allow comparative analysis of the populations without the need of imputation as well as assumptions about the reasons for absence of detected metabolite. The number of stoichiometric correlations due to triples shared between emmer and wild emmer was the highest, followed by that between durum and wild emmer (at threshold of 0.8). At a threshold value of 0.85, both emmer and wild emmer had one overlapping triple with durum. However, at a threshold value of 0.8, durum wheat shared more stoichiometric correlations due to quadruples with wild emmer than emmer. At thresholds of 0.85 and 0.9, durum shared the same number of quadruples with emmer and wild emmer. In all cases, only stoichiometric correlations due to quadruples were shared between all three populations (e.g., stoichiometric correlations at threshold of 0.85, Supplemental Table
Number of Stoichiometric Correlations of Wheat taxa. Shown is the number of stoichiometric correlations at the four thresholds 0.8, 0.85, 0.9, and 0.95. The bars represent the total number of stoichiometric correlations, pairs, triplets and quadruples for all three wheat taxa. The exact values are shown above the bars.
This finding implied that the loss of traits due to domestication and increase in seed size were associated with an overall loss of reaction couplings reflected in the smaller number of stoichiometric correlations in durum wheat in comparison to (wild) emmer (Supplemental Tables
To further validate our results from the three wheat taxa, we included data of six different tomato species into the analysis. We compared the domesticated
Overview of number of significant stoichiometric correlations at different thresholds for the considered tomato and strawberry species.
0.80 | Tomato wildtype | 19,519 | 8 | 1,245 | 18,266 | 27 |
M82 | 23,571 | 15 | 1,824 | 21,732 | 12 | |
1,346 | 6 | 204 | 1,136 | 6 | ||
2,374 | 12 | 433 | 1,929 | 5 | ||
0.85 | Tomato wildtype | 9,291 | 5 | 588 | 8,698 | 20 |
M82 | 9,539 | 5 | 688 | 8,846 | 4 | |
504 | 2 | 73 | 429 | 3 | ||
2,075 | 10 | 366 | 1,699 | 5 | ||
0.90 | Tomato wildtype | 3,741 | 3 | 255 | 3,483 | 11 |
M82 | 1,493 | 1 | 112 | 1,380 | 0 | |
135 | 1 | 22 | 112 | 1 | ||
1,153 | 2 | 185 | 966 | 1 | ||
0.95 | Tomato wildtype | 818 | 1 | 76 | 741 | 3 |
M82 | 21 | 0 | 0 | 21 | 0 | |
1 | 0 | 0 | 1 | 0 | ||
423 | 1 | 56 | 366 | 1 |
A very similar scenario was considered with the strawberry accessions
The application of SCA to metabolomics data from domestication implies a new principle which underlies this agronomically and evolutionary important process; namely, optimizing a given trait could be accomplished by breaking the existing regulatory mechanisms, reflected in the coupling of the biochemical reaction rates, which in turn provides a greater space of possibilities on which selection can operate.
Here we proposed a constrained extension to the concept of maximal correlation, based on the concept of reaction rate coupling in networks of metabolic reactions. The concept of reaction couplings forms the core of the stoichiometric correlation analysis. The constraints in the maximal correlation are due to the values which the linear combinations of log-transformed metabolic profiles are allowed to take. SCA facilitates the comparison of data sets on the same metabolites between two scenarios with the idea of comparing and contrasting the degree of coupling. By determining the stoichiometric correlations of metabolic profiles from the TCA cycle and amino acid synthesis, we showed that
ZN conceived the project and wrote the article with contribution of all authors. KS performed most of the analysis and analyzed the data. RB provided data and helped interpret the findings related to domestication. NO designed parts of the algorithm and helped with its implementation.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer WZ and handling Editor declared their shared affiliation.
The authors would like to thank Prof. Roberto Papa at the Universita Politecnica delle Marche, Ancona, Italy, for kindly providing the metabolomics data from the three wheat taxa. The authors would also like to thank Dr. Detlef Ulrich from the Julius Kühn-Institut (JKI) - Bun- desforschungsinstitut für Kulturpflanzen, Institut für ökologische Chemie, Pflanzenanalytik und Vorratsschutz, for providing the strawberry data.
The Supplementary Material for this article can be found online at: