The CoLoMoTo Interactive Notebook: Accessible and Reproducible Computational Analyses for Qualitative Biological Networks

Naldi, Aurélien; Hernandez, Céline; Levy, Nicolas; Stoll, Gautier; Monteiro, Pedro T.; Chaouiya, Claudine; Helikar, Tomáš; Zinovyev, Andrei; Calzone, Laurence; Cohen-Boulakia, Sarah; Thieffry, Denis; Paulevé, Loïc

doi:10.3389/fphys.2018.00680

METHODS article

Front. Physiol., 19 June 2018

Sec. Systems Biology Archive

Volume 9 - 2018 | https://doi.org/10.3389/fphys.2018.00680

This article is part of the Research TopicLogical Modeling of Cellular Processes: From Software Development to Network DynamicsView all 23 articles

The CoLoMoTo Interactive Notebook: Accessible and Reproducible Computational Analyses for Qualitative Biological Networks

Aurélien Naldi¹

Céline Hernandez¹

Nicolas Levy^2,3

Gautier Stoll^4,5,6,7,8

Pedro T. Monteiro⁹

Claudine Chaouiya¹⁰

Tomáš Helikar¹¹

Andrei Zinovyev^12,13,14,15

Laurence Calzone^12,13,14

Sarah Cohen-Boulakia²

Denis Thieffry¹^*

Loïc Paulevé²^*

¹Computational Systems Biology Team, Institut de Biologie de I'Ecole Normale Supérieure, Centre National de la Recherche Scientifique UMR8197, Institut National de la Santé et de la Recherche Médicale U1024, École Normale Supérieure, PSL Université, Paris, France
²Laboratoire de Recherche en Informatique UMR8623, Université Paris-Sud, Centre National de la Recherche Scientifique, Université Paris-Saclay, Orsay, France
³École Normale Supérieure de Lyon, Lyon, France
⁴Université Paris Descartes/Paris V, Sorbonne Paris Cité, Paris, France
⁵Équipe 11 Labellisée Ligue Nationale Contre le Cancer, Centre de Recherche des Cordeliers, Paris, France
⁶Institut National de la Santé et de la Recherche Médicale, U1138, Paris, France
⁷Université Pierre et Marie Curie, Paris, France
⁸Metabolomics and Cell Biology Platforms, Gustave Roussy Cancer, Villejuif, France
⁹INESC-ID/Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
¹⁰Instituto Gulbenkian de Ciência, Oeiras, Portugal
¹¹Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, United States
¹²Institut Curie, PSL Research University, Paris, France
¹³Institut National de la Santé et de la Recherche Médicale, U900, Paris, France
¹⁴MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
¹⁵Lobachevsky University, Nizhni Novgorod, Russia

Analysing models of biological networks typically relies on workflows in which different software tools with sensitive parameters are chained together, many times with additional manual steps. The accessibility and reproducibility of such workflows is challenging, as publications often overlook analysis details, and because some of these tools may be difficult to install, and/or have a steep learning curve. The CoLoMoTo Interactive Notebook provides a unified environment to edit, execute, share, and reproduce analyses of qualitative models of biological networks. This framework combines the power of different technologies to ensure repeatability and to reduce users' learning curve of these technologies. The framework is distributed as a Docker image with the tools ready to be run without any installation step besides Docker, and is available on Linux, macOS, and Microsoft Windows. The embedded computational workflows are edited with a Jupyter web interface, enabling the inclusion of textual annotations, along with the explicit code to execute, as well as the visualization of the results. The resulting notebook files can then be shared and re-executed in the same environment. To date, the CoLoMoTo Interactive Notebook provides access to the software tools GINsim, BioLQM, Pint, MaBoSS, and Cell Collective, for the modeling and analysis of Boolean and multi-valued networks. More tools will be included in the future. We developed a Python interface for each of these tools to offer a seamless integration in the Jupyter web interface and ease the chaining of complementary analyses.

1. Introduction

Recently, the scientific community has been increasingly concerned about difficulties in reproducing already published results. In the context of preclinical studies, observed difficulties to reproduce important findings have raised controversy (see e.g., Richter et al., 2010; Begley and Ellis, 2012; Smith and Houghton, 2013; Errington et al., 2014; and Begley and Ioannidis, 2015 for a review on this topic). Although not invalidating the findings, these observations have shaken the community. In 2016, a Nature survey pointed to the multi-factorial origin of this “reproducibility crisis” (Baker, 2016). Factors related to computational analyses were highlighted, in particular the unavailability of code and methods, along with the technical expertise required to reproduce the computations. The scientific community is progressively addressing this problem. Prestigious conferences (such as two major conferences from the database community, namely, VLDB¹ and SIGMOD²) and journals (such as PNAS, Biostatistics (Peng, 2009), Nature (Santori, 2016), and Science (Yaffe, 2015), to name only a few) now encourage or even require published results to be accompanied by all the information necessary to reproduce them.

While the reproducibility challenges have first been observed in domains where deluge of data were quickly becoming available (e.g., Next Generation Sequencing data analyses), the problem is now present in many (if not all) communities where computational analyses and simulations are performed. In particular, the Systems Biology community is facing a proliferation of approaches to perform a large variety of tasks, including the development of dynamical models, complex simulations, and multiple comparisons between varying conditions of model variants. Consequently, reproducing results from systems biology studies becomes increasingly difficult. Furthermore, although the combination of different tools would provide various new scientific opportunities, this is currently hindered by technical issues.

Several initiatives have been launched by the community to address reproducibility issues for computational modeling of biochemical networks. These include guidelines for model annotations (MIRIAM, Le Novère et al., 2005) and simulation descriptions (MIASE, Waltemath et al., 2011a), as well as standards for model exchange (SBML, Hucka et al., 2003) and simulation parametrizations (SED-ML, Waltemath et al., 2011b). This collective effort is coordinated by the COmputational Modeling in BIology NEtwork (COMBINE³).

The Consortium for Logical Models and Tools (CoLoMoTo⁴) has been organized to bring together computational modeling researchers and address the aforementioned reproducibility and reusability issues within the sub-domain of logical models and software tools (Naldi et al., 2015). As a first outcome to foster model exchange and software interoperability, the SBML L3 package qual was developed (Chaouiya et al., 2013, 2015). In this manuscript, we report the next phase of the CoLoMoTo efforts in the area of reproducibility in computational systems biology: The CoLoMoTo Interactive Notebook, which provides an easy-to-use environment to edit, execute, share, and reproduce analyses of qualitative models of biological networks by seamlessly integrating various logical modeling software tools.

The teams involved in CoLoMoTo, gathering around 50 researchers within 20 groups and laboratories, have produced various software tools for the qualitative modeling and analysis of biological networks. They are also involved in the development of novel computational methods and models. This method article presents a collective effort to provide the community with a reproducibility-oriented framework combining software tools related to logical modeling. This framework combines the power of different approaches to ensure repeatability and to reduce the requirement of technical knowledge from users. The provided Docker image facilitates the stability of a contained environment needed for repeatable computational modeling and analyses. The framework includes a set of pre-installed tools from the CoLoMoTo community. On the other hand, specific binding and interfaces integrated in a Jupyter environment reduce the learning curve and improve accessibility. The use of this framework is demonstrated by a case study in a companion protocol article, which consists in a thoroughly annotated Jupiter notebook (Levy et al., 2018)⁵.

The method article is structured as follows. Section 2 provides a brief introduction to qualitative models of biological networks and to their analyses. Section 3 describes the main components (Docker image, Python programming interface, Jupyter interactive web interface) of our framework to facilitate the access to CoLoMoTo software tools, a prime prerequisite for the reproducibility of the computational analyses. Section 4 illustrates how our framework can address several challenges related to the reproducibility of computational analyses, ranging from the repeat of a sequence of analyses in the exact same software environment, to the use of alternate methods to reproduce a result. Finally, section 5 provides an introductory guide on how to use the new framework, and section 6 discusses possible extensions.

2. Background on Qualitative Dynamical Models and Their Computational Analysis

Since the pioneering work of Kauffman (1969), Thomas (1973), and others, logical (e.g., Boolean) models have emerged as a framework of choice to model complex biological networks, focusing for example on the roles of transcriptional regulatory circuits in cell differentiation and development, of signaling pathways in cell fate decisions, etc. (for a review, see e.g., Abou-Jaoudé et al., 2016).

2.1. Qualitative Modeling

The definition of a qualitative logical model, such as a Boolean model usually relies first on the delineation of a regulatory graph, where each node denotes a regulatory component (e.g., a protein or a gene), while (positive or negative) arcs represent interactions (activation or inhibition) between their source and target nodes. Each node is modeled as a discrete variable, having a finite number of possible values, typically Boolean, i.e., only two values, 0 or 1, denoting e.g., protein absence/inactivity or presence/activity. A Boolean function or rule is then defined for each node to specify how its value may change depending on the values of its regulators.

The state of a network is modeled as a vector encompassing the (Boolean or multi-valued) values of all the nodes of the regulatory graph, with a prescribed ordering. The state of the network can be updated according to the logical functions defined for each node, triggering a transition toward a successor state. When at a given state, several nodes are called for an update, different updating modes can be considered. The synchronous updating mode updates all nodes simultaneously, thus leading to a unique successor state. Hence, the dynamical behavior is fully deterministic. In contrast, the asynchronous updating mode updates only one node, choosen non-deterministically, thus leading to different possible successor states. Several variants and extensions of these updating modes have been defined, for instance assigning pre-determined priorities or assigning probabilities to node updates, or considering simultaneous updates of sub-groups of nodes.

2.2. Dynamical Analysis

The dynamical behavior of the model can be represented as a state transition graph, where vertices correspond to different states of the network, and directed edges represent transitions between states, following a selected updating mode. Dynamical analyses consist then in characterizing different properties of this state transition graph.

Attractors are one of the most prominent features studied in Boolean and multi-valued networks. Attractors model the asymptotic behaviors of the system, and correspond to the terminal strongly connected components of the state transition graph. Attractors can be of different nature, either reduced to a single stable state (or fixed point), from which no transition is possible, or cyclic sequences of states, modeling sustained oscillations. From a biological point of view, computing attractors is generally particularly relevant. The presence of multiple attractors can represent alternative cell fates (such as cell differentiation states), while cyclic attractors further represent periodic behaviors (such as cell cycle or circadian rhythms). The computation of attractors is addressed by different software tools, such as BIOLQM (Naldi, in review⁶), GINSIM (Naldi et al., 2018), PINT (Paulevé, 2017), BOOLSIM (Garg et al., 2008), BOOLEANNET (Albert et al., 2008), PYBOOLNET (Klarner et al., 2017), and BOOLNET (Müssel et al., 2010).

Simulations allow capturing the states reachable from a given (set of) initial state(s). They can consist of random walks in the complete state transition graph, take into account updating priority schemes to distinguish fast versus slow processes and thereby obtain a simpler state transition graph (Fauré et al., 2006), as implemented in the software tool BIOLQM, or rely on user-defined transition probabilities and timing, as implemented into the software tools MABOSS (Stoll et al., 2012, 2017) and CELLCOLLECTIVE (Helikar et al., 2012; Todd and Helikar, 2012).

Model checking techniques developed for software verification in computer science allow verifying formally dynamical properties on state transition graphs and are regularly employed for analysing biological systems (Batt et al., 2005; Abou-Jaoudé et al., 2015; Bartocci and Lió, 2016; Traynard et al., 2016). The properties are specified using so-called temporal logics, which enable the formulation of queries regarding asymptotic or transient dynamical properties, taking into account all the state transitions of the model. The accordance of a Boolean/multi-valued model with such properties is verified using a general purpose model checker such as NUSMV (Cimatti et al., 2002) to which GINSIM and PINT provide access.

It is worth noticing that the number of states of a Boolean or multi-valued network grows exponentially with the number of nodes. The above mentioned methods typically suffer from this complexity, and hence face limitations regarding network size (currently, this limit is of the order of fifty to a hundred of nodes, depending on the analysis and the complexity of the dynamics). Nevertheless, different approaches enable the analysis of large scale qualitative networks by means of structural analyses, model reductions or abstractions. The CoLoMoTo Interactive Notebook provides access to methods for model reductions, such as by Naldi et al. (2011), implemented in BIOLQM, which preserves stable states, while cyclic attractors and reachability can be affected in predictable ways, or by using formal approximations of the dynamical behavior, as implemented in PINT, which allow tackling networks with several thousands of nodes (Paulevé, 2017, in press). Other approaches include, for instance, Petri net model reduction for trajectories in signaling pathways (Talcott and Dill, 2006), subnetwork analysis (Siebert, 2009), computational algebra (Veliz-Cuba et al., 2014), and motif-based abstractions for attractors (Gan and Albert, 2018).

Figure 1 gives an overview of a range of software tools for the analysis of qualitative models, specifying their main features along with the main underlying technologies.

FIGURE 1

Figure 1. Feature matrix and characteristics of a range of software tools devoted to the qualitative modeling and analysis of biological networks: CELLCOLLECTIVE (Helikar et al., 2012); GINSIM (Naldi et al., 2018); BIOLQM (Naldi, in review⁶); MABOSS (Stoll et al., 2017); PINT (Paulevé, 2017); NUSMV (Cimatti et al., 2002); BOOLSIM (Garg et al., 2008); BOOLEANNET (Albert et al., 2008); PYBOOLNET (Klarner et al., 2017); BOOLNET (Müssel et al., 2010); CELLNOPT (Terfve et al., 2012); CASPOTS (Ostrowski et al., 2016). The current CoLoMoTo Docker (2018-03-31) ships the software indicated with a bold font and a light blue background. “Model repository” refers to searchable databases of models; “Model edition” refers to the features related to creating and modifying a qualitative model, where “ab initio” refers to the interactive model building from scratch, and “transformations” refers to operations such as mutations, Booleanization, model reduction, etc. “Topological analysis” refers to the extraction of features from the regulatory graph, such as the different feedback cycles, graph theory measures, etc. “Dynamical analysis” refers to properties related to the state transition graph of Boolean/multi-valued networks, where “Attractor analysis” refers to the identification of stable states, cyclic attractors, and related features; “Simulation” refers to the sampling of trajectories within the state transition graph, possibly parameterized with stochastic rates and mutations; “Formal verification and control” refers to exhaustive analyses for assessing strictly temporal properties, such as reachability, and deducing mutations for controlling the system. Finally, “Model inference” refers to the derivation of Boolean/multi-valued network which are compatible with given properties and observation data.

3. Accessibility of CoLoMoTo Software Tools

The CoLoMoTo Interactive Notebook aims at offering a unified environment for accessing a range of complementary software tools for the analysis of qualitative models of biological networks. To achieve such a goal, our framework relies on three complementary technologies.

First, we use the Docker system to provide images of pre-installed selected CoLoMoTo software tools, thus reducing significantly the burden of installing individually each software tool. The software installed within Docker images can be executed on GNU/Linux, macOS, and Microsoft Windows, and can be accessed by standard workflow systems, such as SNAKEMAKE (Köster and Rahmann, 2012).

Then, we developed a collection of Python modules to provide a unified interface to the features of the selected software tools. The Python modules allow to parameterize and execute the different analyses, and fetch their results, which can then be further processed, including by a different tool through its respective Python module. This uniform Python interface is particularly relevant in the Jupyter web interface (Ragan-Kelley et al., 2014), where it allows editing executable notebooks on qualitative biological networks by seamlessly combining different software tools.

3.1. The CoLoMoTo Docker Image

Overall, we witness a growing ecosystem of software tools based on different technologies and offering a wide range of complementary features. Noteworthy, these tools typically rely on tailored formalisms and settings, which enable specific methods but at the same time affect the results. One obvious example is the consideration of a specific updating mode, as synchronous and asynchronous dynamics may differ extensively. Furthermore, to address increasingly large networks, many tools rely on advanced data-structures and resolution methods, which are implemented in dedicated software libraries. The distribution of these tools then become challenging, as they rely on numerous dependencies, often difficult to install or available only for a specific operating systems (most of the time GNU/Linux).

The Docker container technology allows to circumvent such distribution issues by providing a mean to supply pre-installed and fully configured software environments in so-called Docker images. On GNU/Linux, the execution of a Docker image consists mainly in executing the software in an isolated environment, requiring no operating system virtualization. Therefore, the overhead of using Docker on GNU/Linux is close to zero. A Docker image can also be executed on macOS or Microsoft Windows without any modification. On these operating systems, Docker relies on virtualization technologies, which are relatatively lightweight and result in limited performance loss on recent hardware.

The current CoLoMoTo Docker image colomoto/colomoto-docker:2018-03-31 contains the following pre-installed software for the logical modeling and analsyis of biological networks: GINSIM (Naldi et al., 2018), BIOLQM (Naldi, in review)⁶, CELLCOLLECTIVE (Helikar et al., 2012), MABOSS (Stoll et al., 2017), PINT (Paulevé, 2017), and NUSMV (Cimatti et al., 2002). The CoLoMoTo Docker image then provides access to these tools without requiring any installation step beside installing Docker⁷. For instance, the Docker image can be used in association with a workflow manager to chain and run a series of software functionalities. Supplementary File “SnakeMake” provides an example of SNAKEMAKE workflow relying on GINSIM and NUSMV.

An important challenge is the maintenance and extendibility of such Docker images to reduce the complexity of upgrading or adding software tools with their respective dependencies. To that aim, we require that each software tool is independently packaged for GNU/Linux using the Conda package manager⁸. We then rely on the dependency management system of Conda to ensure that the correct pre-requisites are installed in the Docker image⁹. A beneficial side effect of this technical choice is that the aforementioned software tools can be installed on GNU/Linux platforms using Conda, without using Docker.

3.2. A Unified Interface for Calling and Chaining Tools With Python

The software tools considered for the CoLoMoTo Docker image present different interfaces: CELLCOLLECTIVE is a web application, GINSIM has a graphical user interface along with a scripting interface, BIOLQM has a command line and a scripting interface, PINT has a command line and a Python interface, MABOSS has a command line interface. GINSIM, CELLCOLLECTIVE and BIOLQM support the SBML-qual format, while BIOLQM provides the conversion of a standard SBML-qual model into PINT or MABOSS model formats, thereby enabling the exchange of models between all these tools.

The recourse to different interfaces complicates the design of a model analysis combining multiple tools. To address this issue, we have developed a Python interface for each of the tools embedded in the CoLoMoTo Docker image, which greatly ease the execution of different tool functionalities, fetch the results, and use these as input for other executions.

Each tool comes with a dedicated Python module, providing a set of functions to invoke the underlying software tool appropriately. Therefore, from a single Python shell, one can invoke and chain analyses performed by different tools. This can be seen as an improved command line interface, greatly enhanced by the use of intermediate Python objects. Such an approach also promotes the use of standard Python data-structures to store objects such as model states or graphs, which can then be processed by common Python libraries, e.g., PANDAS¹⁰ or NETWORKX¹¹.

Hereafter, we give an overview of the resulting Python programming interface, focusing on the general model input mechanism and the main features implemented for each of the software tools.

3.2.1. Model Input and Tool Conversions

Despite their very different features, all the tools considered here take as input a logical model, in an adequate format. All the related Python modules provide a load function, which takes as input the location of the model, being a local file, for instance:

m = biolqm.load(“path/to/localfile.sbml”)

a web link to a file, as obtained on GINSIM repository¹² for instance:

m = biolqm.load(“http://ginsim.org/sites/default/files/Traynard_Boolean_MamCC_Apr2016.sbml”)

or a web link to the model on CELLCOLLECTIVE, for instance:

m = biolqm.load(“http://cellcollective.org/#5128/lac-operon”)

In each case, the returned object (identified by m in the above examples) is a Python object representing the loaded model and defined specifically for the corresponding tool (Python module). Table 1 lists the supported input format for each software tool.

TABLE 1

Table 1. Model input formats for the software tools included in the CoLoMoTo Docker image.

TABLE 2

Table 2. List of notebook files in supplemental data “Notebooks” (Data Sheet 2) demonstrating some features of the CoLoMoTo Interactive Notebook.

When possible, Python modules provide functions to convert a model for a compatible tool. These functions are of the form moduleA.to_moduleB(modelA). Figure 2 lists the currently supported model conversions. The following Python code shows an example of usage:

FIGURE 2

Figure 2. Supported model conversions between Python modules.

lrg = ginsim.load(“http://ginsim.org/sites/default/files/Traynard_Boolean_MamCC_Apr2016.sbml”)
lqm = ginsim.to_biolqm(lrg)
an = biolqm.to_pint(lqm)

Here, lrg is a Python object representing a GINSIM model, lqm is a Python object representing a BIOLQM model, and an is a Python object representing a PINT model.

3.2.2. CELLCOLLECTIVE – Modeling Platform, Repository, and Knowledge Base

The cellcollective Python module allows connecting to the CELLCOLLECTIVE (Helikar et al., 2012) web application (https://www.cellcollective.org), in order to download the models in SBML-qual format and extract network node meta-data (e.g., UnitProt identifiers) when available. The Supplementary File “Notebooks/demo-cellcollective”¹³ provides a brief demonstration of the Python module usage.

3.2.3. GINSIM – Regulatory Network Modeling

The ginsim Python module provides direct access to the Java programming interface of GINSIM (Naldi et al., 2018). GINSIM is available and documented at http://www.ginsim.org. In particular, besides the export of a GINsim model into various file formats, the Python module allows to visualize the network regulatory graph, with the activation and inhibition relationships between the nodes. The visualization function (ginsim.show) optionally takes as argument a Python dictionary associating a level with each node; then, the nodes of the network are colored according to these levels. This is illustrated in the Supplementary File “Notebooks/demo-ginsim”¹⁴.

3.2.4. BIOLQM – Qualitative Model Toolbox

The biolqm Python module provides direct access to the Java programming interface of BIOLQM (Naldi, in review⁶). BIOLQM is available and documented at http://colomoto.org/biolqm. BIOLQM supports the conversion of SBML-qual files, GINML files, as well as simple textual files specifying the raw logical functions into the formats associated with the different software tools. Besides the file format features, BIOLQM implements model modifications, such as mutations forcing the value of given nodes, iterative model reduction (see above), model reversal, the conversion of multi-valued model into Boolean ones, as well as the computation of stable states, trap spaces, and simulations. Part of these features are illustrated in the Supplementary File “Notebooks/demo-biolqm”¹⁵.

3.2.5. PINT – Formal Predictions for Controlling Trajectories

The pypint Python module provides complete access to features documented at https://loicpauleve.name/pint. The software PINT is devoted to the analysis of trajectories in very large-scale asynchronous Boolean and multi-valued networks (Paulevé, 2017). Its main features include the verification of the existence of a trajectory reaching a state of interest (reachability), the identification of common points between all the trajectories leading to a state of interest (cut sets), and the formal prediction of mutations preventing the existence of any trajectory to the given state. These features are illustrated in the Supplementary File “Notebooks/demo-pint”¹⁶.

3.2.6. NuSMV – Model Verification

The nusmv Python module provides a simple interface to the NUSMV model checker for verifying LTL (trace) and CTL (computation tree) temporal logic properties (Cimatti et al., 2002) . The specification of LTL and CTL properties can be facilitated using the colomoto.temporal_logics Python module, which takes advantage of Python objects for the different logical operators and ease their combination.

Let us consider the following example using the CTL operators from the aforementioned Python module:

p1 = AG (S(a = 1))
p2 = EF (p1)

Here, the variable p1 is a CTL formula specifying that the node a is active (S(a=1)) in all the reachable states (AG operator). The variable p2 is a CTL formula specifying that there exists a trajectory leading to a state (EF operator) from which the property p1 is verified.

In the above example, S specifies a property on a state, by giving the values of some nodes of the network. The conversion of a network model into NUSMV format depends on the tool used, sometimes introducing different variable names for the nodes of the original biological network. But this technical point is transparent for the user: the nusmv Python module will automatically translate the node names into the correct NUSMV variable names.

The Supplementary File “Notebooks/demo-nusmv”¹⁷ gives a simple example of usage of the nusmv Python module to verify properties of a GINSIM model.

3.2.7. MaBoSS – Stochastic Simulations

The maboss Python module provides an interface to MABOSS, available at https://maboss.curie.fr, as well as basic plotting functionalities (Stoll et al., 2017). The purpose of MABOSS is to perform stochastic simulations of a Boolean network, where the propensity of transitions (probabilistic rates) are explicitly specified. The Python module allows to fully define and parameterize a model, as well as to parse an existing MaBoSS model and modify it programmatically. The object returned after the simulations can then be used to plot the probability of node activation over time, and the proportion of states in which the simulations ended, in order to estimate the probability of reaching different attractors. The Supplementary File “Notebooks/demo-maboss”¹⁸ provides a brief tutorial to the main features of the maboss Python module.

3.2.8. Advanced Combinations of Tools

These Python modules provide a unified interface to chain different tools and process their results. The small tutorials referenced above show simple chaining of tools, most of the time using a tool to import a model (e.g., from CELLCOLLECTIVE or GINSIM) and convert it (using BIOLQM) for specific analysis by another tool. As the Python functions of the different modules rely on standard Python data-structures, such as lists and dictionaries, it is possible to easily re-use the result from a tool function as input to the function of a different tool. A simple example is provided in Supplementary File “Notebooks/demo-ginsim”¹⁹, where we use BIOLQM to compute the stable states of a GINSIM model, and then give one of the resulting state as input to GINSIM show function to display it over the regulatory graph.

Moreover, one can use the programmatic features of Python to implement advanced algorithms for executing multiple analyses and process their results. For instance, one can program loops to iterate over a list of results of a preceding analysis from one tool to perform a subsequent analysis on each result with another tool. This is illustrated in the Supplementary File “Notebooks/demo-pint+maboss”²⁰, where we use PINT to formally predict combinations of mutations controlling the existence of trajectories toward a specified state; then, we quantify with MABOSS the efficiency of applying only partially the predicted combinations, by evaluating each related double-mutants. The example involves Python for loops and a function to enumerate all possible subsets provided by the standard Python library. The notebook also relies on CELLCOLLECTIVE to fetch the model, and on BIOLQM to perform the adequate model conversions.

3.3. CoLoMoTo Jupyter Interactive Notebook

Jupyter²¹ is a software providing an interactive web interface for creating documents, called notebooks, mixing code, equations, and formatted texts. A notebook typically describes a full analysis workflow, combining textual explanations, the code itself, along with parameters to reproduce the results. A notebook is a single file, which can be easily modified, shared, re-executed, and visualized online. The short tutorials of the previous section provided in the Supplementary File “Notebooks” are actually Jupyter notebooks (files with the extension .ipynb) and can be re-executed using Jupyter.

A Jupyter notebook is made of a sequence of so-called cells, which can contain formatted text, including sections, links, images, tables, etc., or which can contain code in a specified programming language, typically Python. A code cell can be executed (by pressing Shift-Enter) and the value returned by the code is displayed below the cell. The display format is selected according to the type of the returned value (image, graph, list, table, …) to offer an adequate visualization.

Having a unified Python interface to invoke the CoLoMoTo software tools, one can directly create Jupyter notebooks for the analysis of qualitative biological networks using these tools, as shown in Supplementary File “Notebooks” and in the companion publication providing a complete model analysis workflow (Levy et al., 2018).

We added several features in the CoLoMoTo Python modules to increase interactivity and improve the user experience for editing Jupyter notebooks. First, menus provide pre-defined Python code for accessing to the main features of the tools. Figure 3 shows a screenshot during the edition of a Jupyter notebook with its graphical interface. Next, we added the possibility to interactively upload a model file. This feature is particularly useful when used in combination with Docker, or on a remote server with no direct access to the user file system. Finally, some Python modules, in particular the maboss Python module, provide JavaScript widgets to generate Python code interactively.

FIGURE 3

Figure 3. Screenshot during a Jupyter notebook edition showing the menu of the BIOLQM tool.

The Jupyter notebook server is included in the CoLoMoTo Docker image (see the Discussion section for a quick usage guide), while a public demonstration web instance is available at http://tmpnb.colomoto.org.

4. Reproducibility of Computational Analyses

4.1. From Repeatability to Reproducibility

The literature provides a range of definitions for the reproducibility of in silico experiments by analogy to wet lab experiments (Drummond, 2009; Freire et al., 2012; Stodden et al., 2013; Freire et al., 2016; Goodman et al., 2016; Lewis et al., 2016; Cohen-Boulakia et al., 2017). Four levels of reproducibility are commonly distinguished.

An in silico experiment is said to be repeated when it is performed using the same computational set-up as the original experiment. The major goal of the repeat task is to check whether the initial experimental result was correct and can be obtained again. The difficulty lies in recording as much information as possible to repeat the experiment so that the same conclusion can be drawn. Interestingly, Freire et al. (2012) discusses the granularity at which information (experiments, data sets, parameters, environment) should or could be recorded, and underlines the fact that the key point is to determine the right balance between the effort required to record information and the capability of obtaining identical results.

An in silico experiment is said to be replicated when it is performed in a new setting and computational environment, although similar to the original ones. When it can be successfully replicated, a result has a high level of robustness: it remained valid when using a similar (although different) protocol. A continuum of situations can be considered between repeated and replicated experiments.

A result is then defined as reproduced, in the broadest possible sense of the term, by denoting the situation where an experiment is performed within a different environment, with the aim to validate the same scientific hypothesis. In other words, what matters here is the conclusion obtained and not the methodology considered reaching it. Completely different approaches can be designed, different data sets can be used, as long as the experiments support the same scientific conclusion. A reproducible result is thus a high-quality result, confirmed in various ways.

A last important concept related to reproducibility is that of reuse, which denotes the case where a different experiment is performed, with similarities with an original experiment. A specific kind of reuse occurs when a single experiment is reused in a new context (and thus adapted to new needs), the experiment is then said to be repurposed.

It is worth noticing that repeating and replicating may appear to be technical challenges compared to reproducing and reusing, which are the most important scientific objectives. However, before investigating alternative ways of obtaining a result (to reach reproducibility), or before reusing a given methodology in a new context (to reach reuse), the original experiment has to be carefully tested, especially by reviewers or any peer, demonstrating its ability to be at least repeated and hopefully replicated (Freire et al., 2012; Stodden et al., 2014).

4.2. Repeat Analysis in the Same Software Environment

Ensuring that a sequence of computational analyses can be repeated by other scientists several months or years after its publication is difficult. Indeed, besides software availability, the version of the tools can be crucial: a new version of a tool can change the default parameters, and even some features, so that the published instructions become obsolete. Whereas a Docker image addresses efficiently the issue of making software available, providing a safe way for repeating a notebook content years after its creation requires additional technical procedures.

First, CoLoMoTo Docker images are constructed by specifying explicitly the version of each software. Furthermore, an automatic validation procedure is performed by checking that a set of notebooks still execute without error. Once validated, the Docker image is then tagged with a time-stamp, typically the date of the image validation (of the form YYYY-MM-DD, e.g., 2018-03-31). These tagged images are then stored in the public Docker image registry, and can be retrieved any time later. The list of existing tags of colomoto/colomoto-docker Docker images can be viewed at https://hub.docker.com/r/colomoto/colomoto-docker/tags/.

When sharing a notebook, and notably when attaching it to a publication, it is highly recommended to specify the time-stamp of the Docker image in which the notebook has been executed. Then, by downloading the image with this specific tag, other users are ensured to repeat the execution in the exact same software environment. To help following this recommendation, we took two technical decisions. First, we do not use the default non-persistent tag for Docker images (latest). It means that the user has always to specify explicitly the time-stamp of the CoLoMoTo Docker image. To remove the burden of actively checking the list of existing time-stamps, we provide a script which, by default, fetches the most recent Docker image (see section 5). Second, when loading a CoLoMoTo-related Python module within a Docker container, a textual message indicating the time-stamp of the Docker image is displayed. Therefore, when created within a CoLoMoTo Docker image, notebooks always contain the required information to repeat their execution.

Because a Jupyter notebook is a single file containing everything to execute it, one can easily check if it can be replicated in a different software environment, e.g., using a more recent CoLoMoTo Docker image. Moreover, a notebook can be easily repurposed by modifying some arguments of the Python function calls, for instance changing the input model or analysis parameters. One can even define interactive notebooks describing a common model analysis, so that the user only needs to provide the input model and execute the Jupyter code cells, as shown in the Supplementary File “Notebooks/demo-interactive-fixpoints”²² for the computation and visualization of the stable states of a bioLQM model.

4.3. Reproduce Analysis With a Different Method

Reproducing the same analysis with two different methods is a good mean to increase confidence in the results, as it reduces the chance of software misuse or that the results are affected by a software bug.

The subset of software tools selected for this first CoLoMoTo Docker image presentation already provides redundant implementations of equivalent model analyses, in particular for the identification of stable states and for the verification of temporal properties with NUSMV. To help switch between two tools for performing the same task, we harmonized the usage of Python module functions to ensure that the same functions with the same arguments generate equivalent results with different tools.

4.3.1. Stable States

There exists several methods to compute the full set of stable states (or fixed points) of a logical model, relying on different data-structures and different algorithms. The software BIOLQM implements the computation of stable states for Boolean and multi-valued logical models using a Java implementation of decision diagrams. In contrast, the software PINT implements the computation of stable states of Automata networks (a generalization of logical networks) using Boolean satisfaction constraints. As BIOLQM provides a conversion of Boolean/multi-valued network into equivalent Automata networks, it is possible to compute the stable states of a model with both software tools.

Both biolqm and pypint Python modules provide a fixpoint function taking as input the model instance of the corresponding tool and returning a list of Python dictionaries describing the stable states. Provided lqm is a BIOLQM model, the following Python code compute its stable states with both tools:

fps_biolqm = biolqm.fixpoints(lqm)
fps_pint = pypint.fixpoints(biolqm.to_pint(lqm))

The Supplementary File “Notebooks/demo-reproducibility-fixpoints”²³ shows a complete example of reproduction of stable state computation using BIOLQM and PINT.

4.3.2. Temporal Property Verification (Model-Checking)

Both GINSIM and PINT allow to export their respective model into NUSMV format, where temporal properties can be specified using LTL or CTL (see section 2.2). However, the generated NUSMV models have different features as the input formalisms of these tools rely on different paradigms: the specification is centered on logical rules in the case of Boolean/multi-valued networks in GINSIM, and on transitions (à la Petri nets) in the case of Automata networks in PINT. Nevertheless, in the appropriate settings, the verification of an equivalent CTL or LTL property should give the same result. Hence, the functions ginsim.to_nusmv and pypint.to_nusmv are implemented in such ways that, when using their default options, the resulting NUSMV models, albeit different, should produce identical results for identical temporal logic properties. Note, however, that each tool provides specific options for the NUSMV export, which can lead to incomparable results.

The following Python code uses operators defined in the Python module colomoto.temporal_logics to specify a property p, meaning that from any state, there always exists a trajectory leading to a cyclic attractor where the level of node a can always oscillate. Then, assuming lrg is a GINSIM model, the code uses GINSIM and PINT conversions to NUSMV to perform model verification.

p = EF (AG (EF (S(a = 0)) & EF (S(a=1))))

nusmv_ginsim = ginsim.to_nusmv(lrg)
nusmv_ginsim.add_ctl(p)
nusmv_ginsim.verify()

nusmv_pint = pypint.to_nusmv(ginsim.to_pint(lrg))
nusmv_pint.add_ctl(p)
nusmv_pint.verify()

Note that the Python object p represents the CTL property to be tested, whatever the origin of the model (GINSIM or PINT).

The Supplementary File “Notebooks/demo-reproducibility-modelchecking”²⁴ provides a more detailed example of the reproduction of model-checking results using GINSIM and PINT.

5. Quick-Usage Guide

On GNU/Linux, macOS, or Microsoft Windows, provided that Docker and Python are installed, a helper script to run the CoLoMoTo Docker image and the embedded Jupyter notebook can be installed and upgraded from a terminal using the following command²⁵:

pip install -U colomoto-docker

The Docker image and the Jupyter notebook interface can be started by executing the following command in a terminal²⁶:

colomoto-docker

Without any argument, the command will use the most recent CoLoMoTo Docker image. To use the image with a specific tag, append the -V option (e.g., colomoto-docker -V 2018-03-31).

The execution of this command will open a web page with the Jupyter notebook interface, enabling loading and execution of notebooks. A new notebook can be created by using the “New/Python3” menu. In this environment, the user has access to all CoLoMoTo Python modules. A code cell is executed by typing “Shift+Enter.” The menu and tool bar allow quick access to the main Jupyter functionalities.

Warning: by default, the files within the Docker container are isolated from the running host computer, and are deleted when stopping the container. To have access to the files of the current directory of the host computer, the option --bind can be used:

colomoto-docker --bind .

The container can later be stopped by pressing Ctrl+C keys in the terminal. See colomoto-docker --help for other options. Additional documentation for running the CoLoMoto Docker image can be found at http://colomoto.org/notebook.

6. Discussion

6.1. Academic Use Cases

The prime aim of the CoLoMoTo Interactive Notebook is to foster the production of accessible and reproducible computational analysis of biological models, with a focus on qualitative models, including Boolean and multi-valued networks. As demonstrated in the Supplementary File “SnakeMake”, the CoLoMoTo Docker image can also be used in standard workflow systems, such as SnakeMake, to lighten the burden of installing the different software tools and make them accessible on different operating systems.

A notebook issued from the CoLoMoTo Docker image gives some guarantees of repeatability, as it contains references to the persistent Docker image to re-execute code in the same software environment. Therefore, the notebook file (with .ipynb extension) can be distributed as a Supplementary File of the related scientific article, along with instruction to run the Docker image. The Jupyter interface further allows to export the notebook in a static HTML file, which could also be joined as a Supplementary File to provide a quick visualization. A notebook can also be distributed independently, for instance by publishing it on Gist²⁷ or myExperiment²⁸ (Goble et al., 2010), to follow download and potential updates. For instance, the tutorial notebook presented by Levy et al. (2018) is hosted at https://gist.github.com/pauleve/a86717b0ae8750440dd589f778db428f. Services like Zenodo²⁹ further provide persistent DOI links to notebook files.

The CoLoMoTo Interactive Notebook is also relevant for teaching purposes. With Jupyter, students can straightforwardly execute, modify, and extend a template notebook to learn methods for analysing models of biological networks. Docker is a standard technology often supported by local cloud infrastructures, which can therefore provide dedicated resources to execute remotely and privately the CoLoMoTo Jupyter web interface.

6.2. Extending the CoLoMoTo Interactive Notebook

The CoLoMoTo Docker image can be easily extended to include additional tools. The Docker architecture allows inheriting from an existing container, adding a new layer with additional executables. Contributions are welcome through GitHub³⁰. Each software tool must be usable from the Jupyter interface and should be able to connect with at least one other tool already included. Furthermore, a demonstration notebook should be provided to illustrate the tool usage and how it can be combined with other tools.

Currently, all the embedded tools require an already defined model. Nevertheless, once loaded, a model can be subsequently modified from the Python interface (see tool feature matrix in Figure 1). We are currently considering the development of a programmatic interface for model definition ab initio. One of the main challenge is to provide a decent visualization of the programmatically-created model. A potential direction is to include a visual edition module in the Jupyter interface, which represents a substantial development effort.

The support for standard exchange formats is key to enable reproducibility of analyses with different tools. In that sense, BIOLQM plays an important role for the CoLoMoTo Interactive Notebook as it provides bridges between SBML-qual standard specifications and numerous software tools (Figure 2). The Tellurium Notebook system by Sauro et al. (in review)³¹ offers support for SED-ML to help reproduce quantitative simulation of biological networks. Future work should consider bringing this feature for qualitative models as well, in order to better meet FAIR (Findability, Accessibility, Interoperability, and Reusability) recommendations (Wittig et al., 2017).

Author Contributions

AN, CH, DT, and LP designed the main principles of the CoLoMoTo Interactive notebook and its distribution. AN, CH, NL, and LP implemented the necessary Python modules, their integration in the Jupyter interface, and the Docker image. AN and LP edited the notebook tutorials, while CH edited the SnakeMake workflow example. All authors contributed to the writing of the article under the supervision of DT and LP.

All authors reviewed the content of this article and agreed to endorse it.

Funding

DT and CH acknowledge support from the French Plan Cancer, in the context of the projects CoMET (2014–2017) and SYSTAIM (2015–2019). DT and AN acknowledge support from the French Agence Nationale pour la Recherche (ANR), in the context of the project SCAPIN [ANR-15-CE15-0006-01]. CC and PM acknowledge support from the Fundação para a Ciência e a Tecnologia, through grants PTDC/BEX-BCB/0772/2014 and PTDC/EEI-CTP/2914/2014. TH acknowledges support from the National Institutes of Health (#5R35GM119770-02). SC-B acknowledges support from CNRS (défi Mastodons). AZ and LC acknowledge support from COLOSYS project in EU ERACoSysMed programme. AZ acknowledges support by the Ministry of education and science of Russia (Project No. 14.Y26.31.0022). AZ, LC, and LP acknowledge support from ANR in the context of the project ANR-FNR project AlgoReCell [ANR-16-CE12-0034]. LP and SC-B acknowledge support from Paris Ile-de-France Region (DIM RFSI) and Labex DigiCosme [ANR-11-LABEX-0045-DIGICOSME] operated by ANR as part of the program Investissement d'Avenir Idex Paris-Saclay [ANR-11-IDEX-0003-02].

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors thank the attendees of the fourth CoLoMoTo meeting in Paris, July 2017, for the insightful discussions which led to designing the CoLoMoTo Interactive notebook. The authors thank Laurent Darré and the technical staff at LRI, Université Paris-Sud, France, for providing the computing resources for hosting http://tmpnb.colomoto.org.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphys.2018.00680/full#supplementary-material

Data Sheet 1. The supplemental file “SnakeMake” contains an example of SnakeMake workflow that uses the CoLoMoTo Docker image to execute complementary analyses.

Data Sheet 2. The supplemental data “Notebooks” contains several short Jupyter notebooks which demonstrate different usage of the CoLoMoTo interactive notebook, listed in Table 2. The .ipynb files can be imported and executed within the Jupyter interface of the CoLoMoTo notebook, using the Docker image colomoto/colomoto-docker:2018-03-31. For each of these notebooks, a static HTML file previews the Jupyter rendering of the notebook, without any requirement. These notebooks can also be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/tree/2018-03-31/tutorials.

Footnotes

1. ^International conference on Very Large Data Bases.

2. ^ACM's Special Interest Group on Management Of Data.

3. ^http://co.mbine.org

4. ^http://colomoto.org

5. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/usecases/Usecase%20-%20Mutations%20enabling%20tumour%20invasion.ipynb

6. ^Preprint on bioRxiv https://doi.org/10.1101/287011

7. ^See https://docker.com for installation instructions.

8. ^https://conda.io

9. ^CoLoMoTo-related conda packages are available in the colomoto conda channel. See https://anaconda.org/colomoto

10. ^https://pandas.pydata.org

11. ^http://networkx.github.io

12. ^http://ginsim.org/models_repository

13. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/CellCollective/CellCollective%20-%20Knowledge%20Base.ipynb

14. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/GINsim/GINsim%20-%20visualization.ipynb

15. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/bioLQM/bioLQM_tutorial.ipynb

16. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/Pint/quick-tutorial.ipynb

17. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/NuSMV/NuSMV%20with%20GINsim.ipynb

18. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/MaBoSS/MaBoSS%20-%20Quick%20tutorial.ipynb

19. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/GINsim/GINsim%20-%20visualization.ipynb

20. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/MaBoSS/Predict%20mutations%20with%20Pint,%20refine%20with%20MaBoSS.ipynb

21. ^http://jupyter.org

22. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/bioLQM/Fixpoints%20(interactive).ipynb

23. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/Reproducibility%20-%20fixpoints.ipynb

24. ^The notebook can be previewed and downloaded at https://nbviewer.jupyter.org/github/colomoto/colomoto-docker/blob/2018-03-31/tutorials/Reproducibility%20-%20model%20checking.ipynb

25. ^You may have to use pip3 instead of pip depending on your configuration.

26. ^If using Docker Toolbox, the command should be executed within the Docker Terminal.

27. ^https://gist.github.com

28. ^https://www.myexperiment.org

29. ^https://zenodo.org

30. ^Guidelines available at https://github.com/colomoto/colomoto-docker/blob/master/CONTRIBUTING.md

31. ^Preprint on bioRxiv: https://doi.org/10.1101/239004

References

Abou-Jaoudé, W., Monteiro, P. T., Naldi, A., Grandclaudon, M., Soumelis, V., Chaouiya, C., et al. (2015). Model checking to assess t-helper cell plasticity. Front. Bioeng. Biotechnol. 2:86. doi: 10.3389/fbioe.2014.00086

PubMed Abstract | CrossRef Full Text | Google Scholar

Abou-Jaoudé, W., Traynard, P., Monteiro, P. T., Saez-Rodriguez, J., Helikar, T., Thieffry, D., et al. (2016). Logical modeling and dynamical analysis of cellular networks. Front. Genet. 7:94. doi: 10.3389/fgene.2016.00094

PubMed Abstract | CrossRef Full Text | Google Scholar

Albert, I., Thakar, J., Li, S., Zhang, R., and Albert, R. (2008). Boolean network simulations for life scientists. Source Code Biol. Med. 3:16. doi: 10.1186/1751-0473-3-16

PubMed Abstract | CrossRef Full Text | Google Scholar

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nat. News 533:452. doi: 10.1038/533452a

PubMed Abstract | CrossRef Full Text | Google Scholar

Bartocci, E. and Lió, P. (2016). Computational modeling, formal analysis, and tools for systems biology. PLOS Comput. Biol. 12:e1004591. doi: 10.1371/journal.pcbi.1004591

PubMed Abstract | CrossRef Full Text | Google Scholar

Batt, G., Ropers, D., de Jong, H., Geiselmann, J., Mateescu, R., Page, M., et al. (2005). Validation of qualitative models of genetic regulatory networks by model checking: analysis of the nutritional stress response in Escherichia coli. Bioinformatics 21(Suppl. 1), i19–i28. doi: 10.1093/bioinformatics/bti1048

PubMed Abstract | CrossRef Full Text | Google Scholar

Begley, C. G., and Ellis, L. M. (2012). Drug development: raise standards for preclinical cancer research. Nature 483, 531–533. doi: 10.1038/483531a

PubMed Abstract | CrossRef Full Text | Google Scholar

Begley, C. G., and Ioannidis, J. P. (2015). Reproducibility in science improving the standard for basic and preclinical research. Circ. Res. 116, 116–126. doi: 10.1161/CIRCRESAHA.114.303819

PubMed Abstract | CrossRef Full Text | Google Scholar

Chaouiya, C., Bérenguier, D., Keating, S. M., Naldi, A., van Iersel, M. P., Rodriguez, N., et al. (2013). SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst. Biol. 7:135. doi: 10.1186/1752-0509-7-135

PubMed Abstract | CrossRef Full Text | Google Scholar

Chaouiya, C., Keating, S. M., Berenguier, D., Naldi, A., Thieffry, D., van Iersel, M. P., et al. (2015). The Systems Biology Markup Language (SBML) level 3 package: qualitative models, version 1, release 1. J. Integr. Bioinform. 12:270. doi: 10.1515/jib-2015-270

PubMed Abstract | CrossRef Full Text | Google Scholar

Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M., et al. (2002). “NuSMV Version 2: an OpenSource Tool for Symbolic Model Checking,” in Proceedings of International Conference on Computer-Aided Verification (CAV 2002), Vol. 2404 of LNCS (Copenhagen: Springer). doi: 10.1007/3-540-45657-0_29

CrossRef Full Text | Google Scholar

Cohen-Boulakia, S., Belhajjame, K., Collin, O., Chopard, J., Froidevaux, C., Gaignard, A., et al. (2017). Scientific workflows for computational reproducibility in the life sciences: status, challenges and opportunities. Fut. Gen. Comput. Syst. 75, 284–298. doi: 10.1016/j.future.2017.01.012

CrossRef Full Text | Google Scholar

Drummond, C. (2009). “Replicability is not reproducibility: nor is it good science,” in Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML (Montreal, QC).

Google Scholar

Errington, T. M., Iorns, E., Gunn, W., Tan, F. E., Lomax, J., and Nosek, B. A. (2014). An open investigation of the reproducibility of cancer biology research. Elife 3:e04333. doi: 10.7554/eLife.04333

PubMed Abstract | CrossRef Full Text | Google Scholar

Fauré, A., Naldi, A., Chaouiya, C., and Thieffry, D. (2006). Dynamical analysis of a generic boolean model for the control of the mammalian cell cycle. Bioinformatics 22, 124–31. doi: 10.1093/bioinformatics/btl210

PubMed Abstract | CrossRef Full Text | Google Scholar

Freire, J., Bonnet, P., and Shasha, D. (2012). “Computational reproducibility: state-of-the-art, challenges, and database research opportunities,” in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (Scottsdale, AZ), 593–596. doi: 10.4230/DagRep.6.1.108

CrossRef Full Text | Google Scholar

Freire, J., Fuhr, N., and Rauber, A. (2016). “Reproducibility of data-oriented experiments in e-science,” in Dagstuhl Seminar 16041 (Dagstuhl), 108–159.

Google Scholar

Gan, X., and Albert, R. (2018). General method to find the attractors of discrete dynamic models of biological systems. Phys. Rev. E 97:042308. doi: 10.1103/PhysRevE.97.042308

PubMed Abstract | CrossRef Full Text | Google Scholar

Garg, A., Di Cara, A., Xenarios, I., Mendoza, L., and De Micheli, G. (2008). Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics 24, 1917–1925. doi: 10.1093/bioinformatics/btn336

PubMed Abstract | CrossRef Full Text | Google Scholar

Goble, C. A., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., et al. (2010). myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 38 (Suppl. 2), W677–W682. doi: 10.1093/nar/gkq429

PubMed Abstract | CrossRef Full Text | Google Scholar

Goodman, S. N., Fanelli, D., and Ioannidis, J. P. (2016). What does research reproducibility mean? Sci. Transl. Med. 8:341ps12. doi: 10.1126/scitranslmed.aaf5027

PubMed Abstract | CrossRef Full Text

Helikar, T., Kowal, B., McClenathan, S., Bruckner, M., Rowley, T., Madrahimov, A., et al. (2012). The Cell Collective: toward an open and collaborative approach to systems biology. BMC Syst. Biol. 6:96. doi: 10.1186/1752-0509-6-96

PubMed Abstract | CrossRef Full Text | Google Scholar

Hucka, M., Finney, A., Sauro, H. M., Bolouri, H., Doyle, J. C., Kitano, H., et al. (2003). The Systems Biology Markup Language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531. doi: 10.1093/bioinformatics/btg015

PubMed Abstract | CrossRef Full Text | Google Scholar

Kauffman, S. A. (1969). Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 22, 437–467. doi: 10.1016/0022-5193(69)9001

PubMed Abstract | CrossRef Full Text | Google Scholar

Klarner, H., Streck, A., and Siebert, H. (2017). PyBoolNet: a python package for the generation, analysis and visualization of Boolean networks. Bioinformatics 33, 770–772. doi: 10.1093/bioinformatics/btw682

PubMed Abstract | CrossRef Full Text | Google Scholar

Köster, J., and Rahmann, S. (2012). Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522. doi: 10.1093/bioinformatics/bts480

PubMed Abstract | CrossRef Full Text | Google Scholar

Le Novère, N., Finney, A., Hucka, M., Bhalla, U. S., Campagne, F., Collado-Vides, J., et al. (2005). Minimum information requested in the annotation of biochemical models (MIRIAM). Nat. Biotechnol. 23, 1509–1515. doi: 10.1038/nbt1156

PubMed Abstract | CrossRef Full Text | Google Scholar

Levy, N., Naldi, A., Hernandez, C., Stoll, G., Thieffry, D., Zinovyev, A., et al. (2018). Prediction of mutations to control pathways enabling tumour cell invasion with the CoLoMoTo interactive notebook (tutorial). Front. Physiol. 9:787. doi: 10.3389/fphys.2018.00787

CrossRef Full Text | Google Scholar

Lewis, J., Breeze, C. E., Charlesworth, J., MacLaren, O. J., and Cooper, J. (2016). Where next for the reproducibility agenda in computational biology? BMC Syst. Biol. 10:52. doi: 10.1186/s12918-016-0288-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Müssel, C., Hopfensitz, M., and Kestler, H. (2010). BoolNet–an R package for generation, reconstruction and analysis of Boolean networks. Bioinformatics 26, 1378–1380. doi: 10.1093/bioinformatics/btq124

PubMed Abstract | CrossRef Full Text | Google Scholar

Naldi, A., Hernandez, C., Abou-Jaoudé, W., Monteiro, P. T., Chaouiya, C., and Thieffry, D. (2018). Logical modelling and analysis of cellular regulatory networks with GINsim 3.0. Front. Physiol. 9:646. doi: 10.3389/fphys.2018.00646

CrossRef Full Text | Google Scholar

Naldi, A., Monteiro, P. T., Müssel, C., Kestler, H. A., Thieffry, D., Xenarios, I., et al. (2015). Cooperative development of logical modelling standards and tools with CoLoMoTo. Bioinformatics 31, 1154–1159. doi: 10.1093/bioinformatics/btv013

PubMed Abstract | CrossRef Full Text | Google Scholar

Naldi, A., Remy, E., Thieffry, D., and Chaouiya, C. (2011). Dynamically consistent reduction of logical regulatory graphs. Theor. Comput. Sci. 412, 2207–2218. doi: 10.1016/j.tcs.2010.10.021

CrossRef Full Text | Google Scholar

Ostrowski, M., Paulevé, L., Schaub, T., Siegel, A., and Guziolowski, C. (2016). Boolean network identification from perturbation time series data combining dynamics abstraction and logic programming. Biosystems 149, 139–153. doi: 10.1016/j.biosystems.2016.07.009

PubMed Abstract | CrossRef Full Text | Google Scholar

Paulevé, L. (2017). “Pint: a static analyzer for transient dynamics of qualitative networks with IPython interface,” in CMSB 2017 - 15th Conference on Computational Methods for Systems Biology Volume 10545 of Lecture Notes in Computer Science (Darmstadt: Springer), 370–316. doi: 10.1007/978-3-319-67471-1_20

CrossRef Full Text | Google Scholar

Paulevé, L (in press). Reduction of qualitative models of biological networks for transient dynamics analysis. IEEE/ACM Trans. Comput. Biol. Bioinform. doi: 10.1109/TCBB.2017.2749225.

PubMed Abstract | CrossRef Full Text

Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics 10, 405–408. doi: 10.1093/biostatistics/kxp014

PubMed Abstract | CrossRef Full Text | Google Scholar

Ragan-Kelley, M., Perez, F., Granger, B., Kluyver, T., Ivanov, P., Frederic, J., et al. (2014). “The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication,” in AGU Fall Meeting Abstracts (San Francisco, CA).

Google Scholar

Richter, S. H., Garner, J. P., Auer, C., Kunert, J., and Würbel, H. (2010). Systematic variation improves reproducibility of animal experiments. Nat. Methods 7, 167–168. doi: 10.1038/nmeth0310-167

PubMed Abstract | CrossRef Full Text | Google Scholar

Santori, G. (2016). Journals should drive data reproducibility. Nature 535, 355–355. doi: 10.1038/535355b

PubMed Abstract | CrossRef Full Text | Google Scholar

Siebert, H. (2009). Deriving behavior of boolean bioregulatory networks from subnetwork dynamics. Math. Comput. Sci. 2, 421–442. doi: 10.1007/s11786-008-0064-4

CrossRef Full Text | Google Scholar

Smith, M. A., and Houghton, P. (2013). A proposal regarding reporting of in vitro testing results. Clin. Cancer Res. 19, 2828–2833. doi: 10.1158/1078-0432.CCR-13-0043

PubMed Abstract | CrossRef Full Text | Google Scholar

Stodden, V., Guo, P., and Ma, Z. (2013). Toward reproducible computational research: an empirical analysis of data and code policy adoption by journals. PLoS ONE 8:e67111. doi: 10.1371/journal.pone.0067111

PubMed Abstract | CrossRef Full Text | Google Scholar

Stodden, V., Leisch, F., and Peng, R. D. (2014). Implementing Reproducible Research. CRC Press.

Google Scholar

Stoll, G., Caron, B., Viara, E., Dugourd, A., Zinovyev, A., Naldi, A., et al. (2017). MaBoSS 2.0: an environment for stochastic Boolean modeling. Bioinformatics 33, 2226–2228. doi: 10.1093/bioinformatics/btx123

PubMed Abstract | CrossRef Full Text | Google Scholar

Stoll, G., Viara, E., Barillot, E., and Calzone, L. (2012). Continuous time boolean modeling for biological signaling: application of gillespie algorithm. BMC Systems Biology 6:116. doi: 10.1186/1752-0509-6-116

PubMed Abstract | CrossRef Full Text | Google Scholar

Talcott, C., and Dill, D. L. (2006). “Multiple representations of biological processes,” in Transactions on Computational Systems Biology VI, eds C. Priami and G. Plotkin (Berlin; Heidelberg: Springer Science Business Media), 221–245. doi: 10.1007/11880646_10

CrossRef Full Text | Google Scholar

Terfve, C., Cokelaer, T., Henriques, D., MacNamara, A., Goncalves, E., Morris, M. K., et al. (2012). CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Syst. Biol. 6:133. doi: 10.1186/1752-0509-6-133

PubMed Abstract | CrossRef Full Text | Google Scholar

Thomas, R. (1973). Boolean formalization of genetic control circuits. J. Theor. Biol. 42, 563–585. doi: 10.1016/0022-5193(73)90247-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Todd, R. G., and Helikar, T. (2012). Ergodic sets as cell phenotype of budding yeast cell cycle. PLoS ONE 7:e45780. doi: 10.1371/journal.pone.0045780

PubMed Abstract | CrossRef Full Text | Google Scholar

Traynard, P., Fauré, A., Fages, F., and Thieffry, D. (2016). Logical model specification aided by model-checking techniques: application to the mammalian cell cycle regulation. Bioinformatics 32, i772–i780. doi: 10.1093/bioinformatics/btw457

PubMed Abstract | CrossRef Full Text | Google Scholar

Veliz-Cuba, A., Aguilar, B., Hinkelmann, F., and Laubenbacher, R. (2014). Steady state analysis of boolean molecular network models via model reduction and computational algebra. BMC Bioinformatics 15:221. doi: 10.1186/1471-2105-15-221

PubMed Abstract | CrossRef Full Text | Google Scholar

Waltemath, D., Adams, R., Beard, D. A., Bergmann, F. T., Bhalla, U. S., Britten, R., et al. (2011a). Minimum Information About a Simulation Experiment (MIASE). PLoS Comput. Biol. 7:e1001122. doi: 10.1371/journal.pcbi.1001122

PubMed Abstract | CrossRef Full Text | Google Scholar

Waltemath, D., Adams, R., Bergmann, F. T., Hucka, M., Kolpakov, F., Miller, A. K., et al. (2011b). Reproducible computational biology experiments with SED-ML – the Simulation Experiment Description Markup Language. BMC Syst. Biol. 5:198. doi: 10.1186/1752-0509-5-198

PubMed Abstract | CrossRef Full Text | Google Scholar

Wittig, U., Rey, M., Weidemann, A., and Müller, W. (2017). Data management and data enrichment for systems biology projects. J. Biotechnol. 261, 229–237. doi: 10.1016/j.jbiotec.2017.06.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Yaffe, M. B. (2015). Reproducibility in science. Sci. Signal. 8:eg5. doi: 10.1126/scisignal.aaa5764

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: computational systems biology, reproducibility, model analysis, Boolean networks, Python programming language

Citation: Naldi A, Hernandez C, Levy N, Stoll G, Monteiro PT, Chaouiya C, Helikar T, Zinovyev A, Calzone L, Cohen-Boulakia S, Thieffry D and Paulevé L (2018) The CoLoMoTo Interactive Notebook: Accessible and Reproducible Computational Analyses for Qualitative Biological Networks. Front. Physiol. 9:680. doi: 10.3389/fphys.2018.00680

Received: 05 April 2018; Accepted: 15 May 2018;
Published: 19 June 2018.

Edited by:

Pierre De Meyts, de Duve Institute, Belgium

Reviewed by:

Oksana Sorokina, University of Edinburgh, United Kingdom
Kyle B. Gustafson, Naval Surface Warfare Center Carderock Division (NSWCCD), United States

Copyright © 2018 Naldi, Hernandez, Levy, Stoll, Monteiro, Chaouiya, Helikar, Zinovyev, Calzone, Cohen-Boulakia, Thieffry and Paulevé. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Denis Thieffry, dGhpZWZmcnlAZW5zLmZy
Loïc Paulevé, bG9pYy5wYXVsZXZlQGxyaS5mcg==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.