
Edited by: Jorge L. M. Rodrigues, University of California, Davis, United States

Reviewed by: Md Abdul Wadud Khan, University of Texas MD Anderson Cancer Center, United States; Christopher Blackwood, Kent State University, United States

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Understanding the interactions between microbial communities and their environment well enough to predict diversity on the basis of physicochemical parameters is a fundamental pursuit of microbial ecology that still eludes us. However, modeling microbial communities is problematic, because (i) communities are complex, (ii) most descriptions are qualitative, and (iii) quantitative understanding of the way communities interact with their surroundings remains incomplete. One approach to overcoming such complications is the integration of partial qualitative and quantitative descriptions into more complex networks. Here we outline the development of a probabilistic framework, based on Event Transition Graph (ETG) theory, to predict microbial community structure across observed chemical data. Using reverse engineering, we derive probabilities from the ETG that accurately represent observations from experiments and predict putative constraints on communities within dynamic environments. These predictions can feed back into the design of future field experiments by emphasizing the most important functional reactions, and the associated microbial strains, required to characterize microbial ecosystems.

Recent advances in molecular biology and computational biology have transformed approaches to characterize microbial communities (Segata et al.,

Among the techniques that integrate uncertainties, the Bayesian network is a probabilistic graphical model that represents interactions between biological compounds via a directed acyclic graph (Friedman et al.,

ETG was originally developed to model multi-scale systems and Bourdon et al. (

Network of the nitrogen cycle and its probabilistic simulation.

Herein, we briefly describe the ETG modeling approach and the associated requirements for running the programs. We then demonstrate the application of ETG within the context of microbial ecology for the first time. We focus here on the nitrogen cycle. Beyond the intrinsic importance of nitrogen for biological systems, its cycling results from versatile redox chemical reactions. Combined, these reactions promote complex biogeochemical transformations and structure microbial communities. From a modeling viewpoint, the nitrogen cycle presents three features that make it a promising candidate for new quantitative modeling approaches. First, and despite recent studies uncovering new reactions and pathways (Kuypers et al.,

ETG modeling requires that expert biological knowledge be formalized as a graph. Experimental knowledge is then incorporated into the model via a learning procedure that weights the edges of this graph.

The first input into ETG modeling is a list of biological events as well as the consequences of these events. For the sake of illustration, when representing the nitrogen cycle, the events are reactions (e.g., nitrification, denitrification, etc.) and their consequences are the respective production and consumption of metabolites (e.g., NH_{4}^{+}, NO_{3}^{−}). This knowledge is a mechanistic description of an event and is necessary to estimate the “

Concomitantly, as an additional modeling input, interactions between events take the form of a graph that links reactions (i.e., nodes of the graph) when the product of one reaction becomes the substrate for another reaction (directed edge). Thus, the above 14 reactions result in a graph of 14 nodes and 32 edges (see
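For illustration, the graph construction described above can be sketched as follows. The reactions and metabolites below are a simplified, illustrative subset, not the model's actual set of 14 reactions:

```python
# Illustrative sketch of the event graph: reactions are nodes, and a directed
# edge links r1 to r2 whenever a product of r1 is a substrate of r2.
# These four reactions are assumptions for the example, not the paper's list.
reactions = {
    "ammonia_oxidation": {"consumes": ["NH4+"], "produces": ["NH2OH"]},
    "hydroxylamine_oxidation": {"consumes": ["NH2OH"], "produces": ["NO2-"]},
    "nitrite_oxidation": {"consumes": ["NO2-"], "produces": ["NO3-"]},
    "nitrate_reduction": {"consumes": ["NO3-"], "produces": ["NO2-"]},
}

def build_event_graph(reactions):
    """Return the directed edges of the event graph."""
    edges = []
    for r1, d1 in reactions.items():
        for r2, d2 in reactions.items():
            if set(d1["produces"]) & set(d2["consumes"]):
                edges.append((r1, r2))
    return edges

edges = build_event_graph(reactions)
```

With these four reactions the procedure yields four edges, including the two-way link between nitrite oxidation and nitrate reduction that shares NO_{2}^{−}/NO_{3}^{−}.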

In addition to the overall definition of an event (i.e., reactions and product/substrate definitions) and the description of interactions between events (through the construction of a graph), the cost of considering one event over another must also be defined. As a mechanistic description, each event consumes and produces compounds, which determines the cost of using that event. For instance, each reaction within the nitrogen cycle can be described by its stoichiometry (i.e., −1 for consumption of a metabolite and +1 for its production). However, when traversed at random, the graph could promote an artificial increase or decrease of a given compound solely because of the graph topology and chemical stoichiometry. Such a result would not represent a correct output of the modeling approach, but rather a flaw. To avoid this, one must compute a cost (denoted the initial cost) for all compounds for each event, that is not the stoichiometry
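The raw stoichiometric part of this computation can be sketched as below. Note that this shows only the −1/+1 bookkeeping; the corrected initial cost discussed above, which compensates for the graph-topology bias, is not reproduced here:

```python
# Raw stoichiometric cost vector of an event: -1 per consumed metabolite,
# +1 per produced metabolite. This is only the starting point; the paper's
# corrected "initial cost" (topology-bias compensation) differs.
def event_cost(reaction, compounds):
    cost = {c: 0 for c in compounds}
    for c in reaction["consumes"]:
        cost[c] -= 1
    for c in reaction["produces"]:
        cost[c] += 1
    return cost

# Example: a reaction consuming ammonium and producing nitrite.
cost = event_cost({"consumes": ["NH4+"], "produces": ["NO2-"]},
                  ["NH4+", "NO2-", "NO3-"])
```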

ETG modeling estimates probabilities associated with interactions between events (herein reactions) such that the succession of events reproduce quantitative experimental data. For illustration, we use chemical variables from Bouskill et al. (

Dissolved inorganic nitrogen concentrations (μM) over the time-course of dataset from sampling station CB100 surface as presented in Bouskill et al. (

Date | NH_{4}^{+} | NO_{2}^{−} | NO_{3}^{−}
April 2001 | 8.4 | 0.8 | 88.7
August 2001 | 4.2 | 0.4 | 19.9
October 2001 | 9.3 | 7.9 | 24.2
April 2002 | 8.1 | 0.1 | 59.1
August 2002 | 6.2 | 0.4 | 11
October 2002 | 2.2 | 5.7 | 19.3
April 2003 | 6.3 | 0.5 | 76.8
October 2003 | 1.8 | 1.1 | 101.9
April 2004 | 3.4 | 0.6 | 94.7
August 2004 | 9.7 | 2 | 86.2
October 2004 | 5.8 | 1.1 | 63.7

Experimental variation in rates for each season (from April to August, from August to October, and from October to April) for the years 2001, 2002, 2003, and 2004, and for each nutrient was thus estimated from
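The seasonal rate computation can be sketched for one nutrient as follows, taking the first data column of the table above as ammonia. The month gaps (April→August = 4, August→October = 2, October→April = 6) are our assumption; the exact normalization used in the study may differ:

```python
# Seasonal variation rates (here in uM per month) between consecutive
# samplings, for the first year of the assumed ammonia series.
samples = [("April 2001", 8.4), ("August 2001", 4.2),
           ("October 2001", 9.3), ("April 2002", 8.1)]
gaps = [4, 2, 6]  # assumed months between successive samplings

rates = [(samples[i + 1][1] - samples[i][1]) / gaps[i % 3]
         for i in range(len(samples) - 1)]
```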

Once the ETG model includes (i) a set of events and their putative interactions (section 2.1.1), (ii) a cost for each event (section 2.1.2), and (iii) a quantitative rate depicting an experimentally observed variation affected by at least one event (section 2.1.3), one then seeks to learn probabilities that prioritize interactions between events so as to reproduce the rates computed above, since these rates summarize the environmental conditions to be reproduced. The parameterized model will thus reproduce variations of ammonia, nitrite, and nitrate by weighting the succession of metabolic reactions (e.g., via the cost of consuming or producing a given compound in each reaction). An optimization process (see technical details in Bourdon et al.,
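The spirit of this learning step can be illustrated with a deliberately minimal sketch: choose the transition probability that makes the expected per-step change of a compound match an observed rate. The one-parameter setup and the target rate of 0.4 are illustrative assumptions, not the study's actual optimization:

```python
# Toy learning step: probability p of taking a producing event (+1 per step)
# versus a consuming one (-1 per step); fit p to an observed rate by a
# simple grid search. All values here are illustrative.
def expected_rate(p):
    return p * (+1) + (1 - p) * (-1)

observed_rate = 0.4
best_p = min((abs(expected_rate(p / 100) - observed_rate), p / 100)
             for p in range(101))[1]
```

Solving 2p − 1 = 0.4 gives p = 0.7, which the grid search recovers; the actual ETG procedure optimizes a full matrix of transition probabilities against several rates simultaneously.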

Summary of the ETG model Probabilities and Sensitivities trained on ammonia and nitrite concentrations. Panel

Along with probability estimates for transitions between events, a sensitivity score (

Following the training protocol, a Markov chain simulation algorithm allows the variation of quantities to be simulated over time. As input, the simulation considers (i) initial quantities (i.e., red dots in

Summary of the randomized ETG model Probabilities and Sensitivities trained on ammonia and nitrite concentrations.

However, there is no notion of atoms in the ETG simulation. Indeed, our simulation process differs from Gillespie algorithms in that the probabilities of reaching an event are assumed constant. In the Gillespie algorithm, there is a significant trade-off between the number of molecules of a particular species and the volume of the cell. In our simulation method, the trade-off is between the costs, which describe how the molecules evolve for a given event, and the duration of an event (time-step), which we fixed at 2 h in this study.
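A minimal sketch of this simulation loop is given below, assuming constant transition probabilities and per-event cost vectors; the events, costs, and probabilities are illustrative, not the trained nitrogen-cycle model:

```python
import random

# Markov-chain simulation sketch: at each fixed time step (2 h in the study),
# the next event is drawn from constant transition probabilities and its
# cost vector is applied to the quantities.
def simulate(start_event, probs, costs, init_quantities, n_steps, seed=0):
    rng = random.Random(seed)
    quantities = dict(init_quantities)
    event = start_event
    for _ in range(n_steps):
        successors, weights = zip(*probs[event].items())
        event = rng.choices(successors, weights=weights)[0]
        for compound, delta in costs[event].items():
            quantities[compound] += delta
    return quantities

# Degenerate two-event chain (all probabilities 1.0), so the outcome is
# deterministic: "produce" adds one unit of NH4+ every other step.
result = simulate("produce",
                  {"produce": {"pause": 1.0}, "pause": {"produce": 1.0}},
                  {"produce": {"NH4+": 1}, "pause": {"NH4+": 0}},
                  {"NH4+": 0}, n_steps=4)
```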

The Event Transition Graph (ETG) modeling was performed via a Python package called POGG. This package is the first Python implementation of ETG modeling, as proposed in Bourdon et al. (

Ammonia-oxidizing organisms (AOO) mediate the rate-limiting step of nitrification (i.e., NH_{3} → NO_{2}), which is also rate-limiting for the nitrogen cycle as a whole (Ward et al.,

To estimate probabilities between reactions and train the ETG, we used an existing environmental dataset representing variations in Chesapeake Bay ammonia, nitrite, and nitrate concentrations (μM) between 2001 and 2004 (Bouskill et al.,

Beyond the probabilistic simulations, the analysis of probabilities between reactions (

Concomitantly, the transition between R00148 and R00143 indicates low efficiency in transforming ammonia into nitrite. Combined, both results emphasize the need to constrain the fluxes from ammonia to hydroxylamine and back in order to replicate the variation of quantities; fluxes in which, among others, AOO could be involved by carrying the

NH_{3} and NO_{2} transformations that are required to reproduce training conditions in

Additionally, the model shows interactions of these reactions with others that are also of interest. Dissolved inorganic nitrogen concentration variations (see

A general criticism of probabilistic models concerns their use as statistical protocols that reproduce observed data with no biological specificity. Unlike other probabilistic models, ETG incorporates a mechanistic interpretation of the system via a graph of events. This description of events makes it possible to specify the model to perform a given (biological) behavior and to test it against experimental data. To illustrate the value of this specificity, we built a counterexample by randomizing the model and training it on the same dataset.

The randomized model is built from a graph similar to the nitrogen cycle graph in which all edges have been shuffled by permutation. The randomized model is therefore comparable to the ETG nitrogen cycle model in its numbers of nodes and edges. We then applied a modeling and training procedure similar to that described above. As pictured in
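The edge permutation can be sketched as follows; the edge list is illustrative, and only the shuffling scheme (preserving the node set and edge count) reflects the procedure described above:

```python
import random

# Randomize a graph by permuting the target ends of its edges, keeping the
# same nodes and the same number of edges, as in the counterexample model.
def randomize_edges(edges, seed=0):
    rng = random.Random(seed)
    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    rng.shuffle(targets)
    return list(zip(sources, targets))

# Hypothetical edge list standing in for the 32-edge nitrogen cycle graph.
edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R4", "R2")]
shuffled = randomize_edges(edges)
```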

The goal of this study is to demonstrate the value of the ETG modeling framework. To this end, we used a reduced version of the nitrogen metabolic network. From a biological viewpoint, despite promising partial outcomes, several modeling results do not reproduce the experiments. First, the probabilistic model does not accurately simulate the variation of nitrate, even while reproducing ammonia and nitrite quantity variations. Second, and not presented in this study, the model is not able to simulate ammonia and nitrite quantities taken from anaerobic samples (Bouskill et al.,

Unlike other probabilistic modeling approaches, ETG modeling is less plastic. It requires a qualitative description of the biological events that take place in order to reproduce quantitative biological data. The qualitative specification constrains the model by describing all putative biological behaviors (i.e., the succession of events and their effects). Once learned, the probabilities select a few of these behaviors to reproduce a given quantitative behavior. Compared to general Bayesian modeling, this combination of qualitative and quantitative knowledge makes our probabilistic modeling sensitive to mechanistic descriptions, which take the form of precise accumulations or consumptions of quantities over time (Picard et al.,

This ETG framework is ideal for investigating the dynamic and transient nature of microbial ecosystems when little quantitative knowledge is available. ETG does not begin by assuming a community at steady state, unlike Flux Balance Analysis techniques for modeling metabolic networks (see Perez-Garcia et al.,

Applied to metabolic network modeling, ETG emphasizes the biochemical constraints (i.e., the transitions between reactions) that must be satisfied to reproduce variations of quantities emerging from the biological system. We argue that these constraints could likewise affect the microbial communities that provide the constrained metabolic reactions. To validate this assumption, one must consider further biological knowledge, such as a systematic description of the microbial ecosystem over time. For instance, via 16S rRNA sequencing, one could associate patterns of microbial diversity with the metabolic constraints highlighted by the ETG (i.e., sensitivities) and further compare them to co-occurrence patterns (Cram et al.,

Similarly, the same dynamical property should help decipher the subsets of metabolites that are of interest in a given ecosystem. For this purpose, one must combine this result with genomic descriptions of prokaryotic organisms, for instance via metatranscriptomic or metagenomic studies. Such an association between modeling outcomes and meta-omics knowledge could drive future definitions of keystone species (i.e.,

Finally, this study considered a hypothetical metabolic network as its qualitative description, but ETG is a modeling paradigm that could accommodate other qualitative descriptions of the microbial ecosystem, such as co-occurrence networks (Patel et al.,

DE, NJB, BW, and JB designed the study. DE, DV, JG, and JB performed the study. DE and NJB analyzed the data. DE, NJB, DV, BW, and JB wrote the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: