Control in Boolean Networks With Model Checking

Understanding control mechanisms in biological systems plays a crucial role in important applications, for instance in cell reprogramming. Boolean modeling allows the identification of possible efficient strategies, helping to reduce the usually high and time-consuming experimental efforts. Available approaches to control strategy identification usually focus either on attractor or phenotype control, and are unable to deal with more complex control problems, for instance phenotype avoidance. They also fail to capture, in many situations, all possible minimal strategies, finding instead only sub-optimal solutions. In order to fill these gaps, we present a novel approach to control strategy identification in Boolean networks based on model checking. The method is guaranteed to identify all minimal control strategies, and provides maximal flexibility in the definition of the control target. We investigate the applicability of the approach by considering a range of control problems for different biological systems, comparing the results, where possible, to those obtained by alternative control methods.


Introduction
The study of control of cellular systems has opened multiple possibilities for application in bioengineering and medicine.It also provides the possibility to make predictions on model behaviour, for instance about the reachability of phenotypes under certain mutations, that could be verified experimentally and used for model validation.Experimental approaches for the identification of effective control targets are usually costly and time consuming.To help reducing these efforts, mathematical modelling can be used to identify, in silico, potentially useful interventions that could lead to the reduction of experimental trials [9].
Boolean modeling is often used to model biological systems, since it is able to capture qualitative behaviours by describing the activating or inhibiting interactions between different species using logical functions.The species are represented by binary-valued nodes, whose two activity levels might indicate for instance in a gene-regulatory network whether a certain gene is expressed or not.The simplicity of the Boolean formalism helps coping with the usual problem of lack of parameter information when modeling biological processes while capturing the relevant dynamics of biological systems [3,10,22].
In the context of control for drug target identification or cell reprogramming, the main goal is the identification of controls that require a minimal number of system interventions.Providing multiple alternatives for minimal control interventions is also desirable, so that suitable interventions for experimental implementation can be found.Furthermore, there are many different scenarios and goals to which control might be applied, for instance, to enforce or avoid a specific behaviour in a biological system.An example of such a scenario could be a cell differentiation system where a particular cell type is to be avoided since it can be linked to the development of cancer or another pathology [8].
Many approaches have been developed for control of biological systems, covering different contexts and goals.Some of them focus on leading the system to an attractor of interest, starting from a specific initial state [14] or from any possible initial state [21].This control problem is known as attractor control.However, in some cases, a small number of observable and measurable components is sufficient to capture the relevant features of the system attractors, for example the set of biomarkers defining a phenotype.In such cases, it might be useful to aim the control towards the phenotype defined by these biomarkers rather than a specific attractor, since fewer interventions might be sufficient.This approach, which targets a set of relevant variables instead of a specific attractor, is known as target control.Several methods have been developed for such control problems [16,2] using different computation techniques.
A basic approach to control is value percolation, which is also a core step in many more sophisticated methods [16,20].Approaches based on value percolation can be implemented efficiently [11].However, they are quite restrictive and might miss many possible control strategies.A step towards the identification of some of these missed control strategies using trap spaces was presented in [7].Although this approach is more flexible than just value percolation, it also does not identify all the possible control strategies.In the last years, multiple methods have been developed for control strategy identification, looking for instance at the stable motifs of the system [21] or exploiting computational algebra methods [15].These approaches are usually focused on targeting an attractor or subspace and they also do not generally uncover all possible minimal control strategies.In order to bridge this gap, recent works have tackled the problem of attractor control by using basins of attraction, sets of states from which only a specific attractor is reachable [18].Such approaches increase in many cases the amount of strategies identified.However, they are still limited to control for attractors and lack flexibility to deal with groups of attractors or phenotypes as well as with attractor avoidance.To the best of our knowledge, there is no method that can identify all the optimal control strategies for a general set of states or attractors.
In this work, we introduce a new approach for control strategy identification that provides a complete solution set of minimal controls and allows full flexibility in the control target.Identifying all the minimal control strategies for a general set of states is a complex problem.It might require the full exploration of the state space, which grows exponentially with the size of the network.To deal with this computational explosion, we explore model checking techniques.Model checking is a verification method that allows to determine whether a transition system satisfies a specific property.Although originating in the field of computer science, model checking has been successfully applied to analyse biological networks and a wide variety of tools have been developed [5].Model checking presents many advantages, for instance the use of symbolic representation, which allows to deal with systems with a large number of states and other problems that could not be handled otherwise.Yet, tackling a wider and more complex control problem naturally entails higher computational costs, since many shortcuts and reduction methods do not apply.Therefore, we investigate efficient preprocessing techniques that can be used to significantly reduce the computational cost and make it suitable for application.
As mentioned above, this work presents a model checking-based method to identify optimal control strategies for any target subset.We start with a general overview about Boolean modeling and model checking (Section 2), focusing on the main concepts used in this work.Then we introduce the formal definition of control strategy, present some properties of value percolation that are used in our approach and establish the basis for the control strategy computation with model checking (Section 3).The implementation of our approach is detailed in Section 4, with ideas to reduce the search space size and improve the performance of the method.Finally, in Section 5 we show the applicability of our method to different biological networks and compare our results with existing control approaches.

Boolean networks and dynamics
We define a Boolean network as a function f : B n → B n , with B = {0, 1}.The set of variables or components of f is denoted by V = {1, . . ., n}.Given a Boolean function different dynamics can be defined depending on the way components are updated.A dynamics is usually represented by the state transition graph (STG), a graph whose set of vertices is the state space B n and whose edges represent the transitions between them.The synchronous dynamics SD(f ) defines transitions that update at the same time all the components that can be updated.Thus, the synchronous state transition graph has an edge from x ∈ B n to y ∈ B n if and only if x = y and y = f (x).In order to better capture the different times scales that might coexist in a biological system, the asynchronous dynamics AD(f ) is often used.It defines transitions that update only one component at a time.Therefore, its state transition graph has an edge from x ∈ B n to y ∈ B n if there exists i ∈ V such that y i = f i (x) = x i and y j = x j for all j = i.The general asynchronous dynamics GD(f ) generalises the two previous ones by defining transitions that update a non-empty subset of components.Thus, given x, y ∈ B n there is a transition from x to y if there exists a subset ∅ = I ⊆ V such that y i = f i (x) = x i for all i ∈ I and y j = x j for all j / ∈ I. To simplify the notation, we use D(f ) to refer to any of these three dynamics.
A path in an STG is defined as a sequence of nodes x 0 , x 1 , . . .such that there exists an edge from x i−1 to x i for all i ≥ 1.We denote the set of paths starting in a state x as Paths(x).Given a state An attractor is a minimal trap set under inclusion.Attractors correspond to the minimal strongly connected components of the STG and they might vary in different updates.In biological systems, attractors consisting of only one state (steady states) might correspond to different cell fates, while attractors consisting of more than one state (cyclic attractors) might be associated with sustained oscillation.Figure 1 shows an example of asynchronous dynamics with a steady state and a cyclic attractor.
Given I ⊆ V and c ∈ B n we define the subspace induced by I and c as the set Σ(I, c) = {x ∈ B n | x i = c i ∀i ∈ I}.We denote a subspace by writing the value 0 or 1 for variables that are fixed and * for the free variables.For example, 0 * * 1 denotes the subspace fixing the first variable to 0 and the fourth to 1, that is, S = {x ∈ B n | x 1 = 0, x 4 = 1}.Thus, a subspace can be seen as an element of {0, 1, * } n .Given S ∈ {0, 1, * } n , S i denotes the value of the component i.A trap space is a subspace that is also a trap set.Trap spaces, contrary to attractors or trap sets, are the same in any update.
In this work we consider interventions that fix certain components to specific values.Note that a set of such interventions can be seen as a subspace and the effect that these interventions have on the dynamics can be described by restricting the original Boolean function to the intervention subspace.Given a Boolean function f and a subspace Θ = Σ(I, c) we define the restriction of f to the subspace Θ as: Note that all the definitions above can be applied to the restriction by identifying f Θ with a Boolean function on B n−|I| .An example of the dynamics of a Boolean function restricted to a subspace is shown in Figure 2. , where φ i (x) := (x i = 1), while EFφ 3 is satisfied by all the states except 010 and 110.The path π that starts at 000 and then oscillates between 010 and 110 (in red) satisfies for instance Fφ 2 and G¬φ 3 but not Gφ 1 .

Model checking
This section provides a practical introduction to the model checking concepts required for the description of our approach.For a more extensive and detailed explanation of model checking we refer the reader to [1].Model checking is a formal method used in computer science to solve verification problems.Its application to the control strategy problem presents many advantages, for instance the use of symbolic representation, which allows to deal with systems with a large number of states, like STGs of Boolean networks.Moreover, many efficient algorithms have been developed and are available for running model checking queries.An overview of existing model checking tools in the context of biochemical networks analysis can be found in [5].
Model checking allows to verify whether a given transition system satisfies a specific property.A transition system is defined as a set of states and a set of transitions, which represent changes from one state to another.Formally, a labeled transition system (LTS) is defined by a tuple (S, T, L) where S is a finite set of states, T ⊆ S × S is a transition relation such that (x 1 , x 2 ) ∈ T if there exists a possible transition from state x 1 to state x 2 and L : S → 2 AP is a labeling function with AP a finite set of atomic propositions.In the following, a transition (x 1 , x 2 ) will also be denoted by x 1 → x 2 .The labeling function L gives a set L(x) ∈ 2 AP of atomic propositions for each state x which includes exactly the atomic propositions satisfied by x.In the Boolean context, an STG defines an LTS, where the set of states is B n and the transitions are defined by the Boolean function and the type of update that is chosen.For our purposes, we need a deadlock-free transition system, so we add extra transitions (x, x) ∈ T for every steady state x ∈ B n .We use the atomic propositions There are different ways to express properties of a transition system.In our case, we use Computational Tree Logic (CTL).CTL is based on a branching notion of time, where the behavior of the system is represented by a tree of states.In the case of Boolean networks, one can imagine that every path starting in a state x ∈ B n is represented as a branch in a tree rooted in x.In the following we introduce the main concepts of CTL.
We distinguish between state properties and path properties.In this context, a path is an infinite sequence x 0 , x 1 , . . .∈ S such that (x i−1 , x i ) ∈ T for all i ≥ 1.A statement about a state or a path can be made using a CTL formula.A CTL state formula φ over the set of atomic propositions AP is of the form where a ∈ AP is an atomic proposition, E is the exists operator, A is the for all operator, φ, φ where F is the future operator, G the global operator and ψ a CTL state formula.See Table 1 for the satisfaction relation |= for transition systems and CTL formulas.We say that a CTL state formula φ is satisfied by a state x if and only if x |= φ, that is, φ(x) = true.Analogously, a CTL path formula ϕ is satisfied by a path π if and only if π |= ϕ. Figure 1 shows some examples of state and path formulas which are satisfied in an STG.

Control strategies
A control strategy is defined as a set of interventions that lead the controlled system to a target subset.This target subset usually represents a specific stable behaviour, for instance an attractor or a subspace representing a phenotype.The formal definition of a control strategy is given below.Definition 3.1.Given a Boolean network f and a subset P ⊆ B n , a control strategy for the target P in D(f ) is a subspace Θ ⊆ B n such that, for any attractor A of D(f Θ ), A ⊆ P .
In other words, a subspace Θ = Σ(I, c) is a control strategy for a subset P if fixing the variables in I to their corresponding values in c forces the system to evolve to the desired target P .The size of a control strategy Θ = Σ(I, c) is defined as the size of I, and is therefore the number of interventions.Optimal control strategies are the ones with the lowest number of interventions, that is, the maximal subspaces with respect to inclusion.An example of a control strategy is shown in Figure 2, where fixing the variable x 3 = 0 is enough to guarantee that the system will evolve to the target steady state 110.
Note that Definition 3.1 considers a subset as the control target, encompassing both attractor control and target control.Moreover, it provides the flexibility to deal with further control problems, such as control to union of attractors (P = i A i ) or attractor avoidance (P = B n \A).
) with two steady states 110 and 011.(b) Asynchronous dynamics of the restriction of . Ω is a control strategy for P = 110 in AD(f ).Ω does not percolate to P nor to any non-trivial trap space.T = 110 ⊆ P is the only minimal trap space of f Ω and is complete in D(f Ω ).

Percolation
In this subsection we introduce formally the concept of percolation, used in many approaches to control.We also deduce properties of percolated subspaces that are useful for control strategy identification and that are used later in our approach.
Definition 3.2.Given a Boolean function f , the percolation function with respect to f is the function Definition 3.3.Let f be a Boolean function and S, S ⊆ B n two subspaces.We say that the subspace S percolates to S under f if and only if there exists k ≥ 0 such that F (f ) k (S) = S .
For any trap space T and its image T = F (f )(T ), we have T ⊆ T , since by definition F preserves the fixed components of T .The free components in T might get fixed or remain free depending on the Boolean function f .In fact, T is a trap space of f , since for any fixed variable Note that the paths in the dynamics of F (f ) starting at a trap space T cannot have cycles and, consequently, all the reachable attractors from T in these dynamics are fixed points.When considering the synchronous dynamics of F (f ), each initial trap space T leads to a unique fixed point.Definition 3.5.Given a Boolean function f and a trap space T , we call the unique fixed point T of the synchronous dynamics of F (f ) reachable from T the percolated subspace from T with respect to f , that is, In order to relate value percolation to control strategies, we derive some dynamical properties of percolated subspaces.Lemma 3.6.Let f be a Boolean function, T ⊆ B n a trap space.Let k ≥ 0 and T k = F (f ) k (T ).Then for every x ∈ T there exists a path in D(f ) from x to some y ∈ T k .
Proof.It is enough to show that if T is a trap space, then for every x ∈ T there exists a path in D(f ) from x to some y ∈ F (f )(T ).Set T = F (f )(T ), with I ⊆ V being the set of fixed variables in T .Since T ⊆ T , for all i ∈ I , f i (x) = T i by definition of F .Now let us look at every update separately.If D = AD, for every i ∈ I , x admits a successor y in AD(f ) with y i = T i and y j = x j for j = i; therefore there exists a path from any state in T to T .If D = SD, f i (x) = T i for all i ∈ I and so x admits a successor y ∈ T .The case D = GD follows from the other cases, since all the paths in SD(f ) or AD(f ) are also paths in GD(f ).Proof.As noted in Remark 3.4, S k is a trap space of f S .Then f S (x) = f S k (x) for all x ∈ S k .Therefore, any attractor of f S k is also an attractor of f S and if A is an attractor of f S and A ⊆ S k , then A is also an attractor of f S k .Let A be an attractor of f S .Then A = Reach f S (A).By Corollary 3.6.1,for every x ∈ S there exists a path in f S from x to some y ∈ S k .Therefore, Reach f S (A) ∩ S k = ∅ and so, S k ∩ A = ∅.Since S k is a trap space of f S , A ⊆ S k .Therefore all the attractors of f S are contained in S k and, consequently, are also attractors of f S k .Corollary 3.7.1.Let f be a Boolean function and S ⊆ B n a subspace and P ⊆ B n a subset.Let k ≥ 0 and S k = F (f S ) k (S).S is a control strategy for P if and only if S k is a control strategy for P .Corollary 3.7.1 provides a way to identify control strategies or discard candidate subspaces by using value percolation.Moreover, checking whether the percolated subspace satisfies the conditions of Definition 3.1 instead of the original subspace allows to reduce the dimension of the restricted network and, consequently, to simplify the verification problem.Percolation-only methods select candidate subspaces and percolate them.If the resulting subspace is contained in the target subspace, the candidate subspace is identified as a control strategy [16].This type of control strategies that can be identified efficiently [11].However, additional control strategies might exist, that do not directly percolate to the target subspace, as shown in [7], or even to non-trivial intermediate trap spaces (see Figure 2).With our approach, value percolation can also be exploited as a first step towards control strategy identification, as a means to achieve dimensionality reduction, as described by Corollary 3.7.1.

Completeness
To improve control detection, we propose to define sufficient conditions on the system restricted to a candidate subspace to identify this candidate as a control strategy.Moreover, conditions on minimal trap spaces can be defined in order to deduce properties of the system attractors, in particular, their belonging to a target subset.Definition 3.8.A set of trap spaces T is complete in D(f ) if and only if for every attractor A of D(f ) there exists T ∈ T such that A ⊆ T .A Boolean function f is complete if its minimal trap spaces are complete.
Completeness of the minimal trap spaces has been used for attractor approximation and it can be detected using model checking as described in [12].The following proposition presents sufficient conditions for a subspace to be a control strategy.Given a candidate subspace, if the set of minimal .Ω is a control strategy for P = 00 * * in AD(f ).Ω is not a trap space nor percolates to any trap space.There is no non-trivial trap space in f Ω and therefore, Ω would not be identified as control strategy by the completeness approach.
trap spaces of the restricted system is complete and contained in the target subset, then the candidate subspace is a control strategy for that subset.
Proposition 3.9.Let f be a Boolean function, P, Θ ⊆ B n subspaces and T a set of trap spaces of f Θ .If all the trap spaces of T are contained in P and T is complete for D(f Θ ), then Θ is a control strategy of P .
Proof.Let A be an attractor of D(f Θ ).Since T is complete for D(f Θ ), there exists a minimal trap space T ∈ T such that A ⊆ T .Therefore, A ⊆ T ⊆ P .Proposition 3.9 provides sufficient conditions that allow to identify new control strategies missed by percolation-based approaches (see example in Figure 2).We refer to this approach for control strategy identification as the completeness approach.However, it still does not characterize all the possible control strategies satisfying Definition 3.1, as can be seen in Figure 3.To obtain the full solution set, we formulate a model checking approach, as shown in the next section.

Control with model checking
In this section, we present the basis of our new approach for the identification of all the minimal control strategies, based on model checking.To do so, we express the definition of control strategy in terms of CTL formulas.We start by rewriting it in terms of paths.(ii) For every x ∈ Θ there exists y ∈ P such that there exists a path in D(f Θ ) from x to y and there does not exist any path in D(f Θ ) from y to any state outside P (that is, all paths starting at y are contained in P ).
Proof.(⇒) Let x ∈ Θ and let A be an attractor of D(f Θ ) that can be reached from x. Since Θ is a control strategy, A ⊆ P .Take y ∈ A. Since A is reached from x, there exists a path from x to y ∈ A ⊆ P and there are no paths from y leaving P .
(⇐) Let A be an attractor of f Θ .Let x ∈ A. Since x ∈ A ⊆ Θ, there exists y ∈ P such that there exists a path in D(f Θ ) from x to y and there does not exist any path in D(f Θ ) from y to any state outside P .Since A is an attractor, y ∈ A and Reach D(f Θ ) (y) = A.Then, A ∩ B n \P = ∅, that is, A ⊆ P , and Θ is a control strategy for P .
Before expressing Lemma 3.10 in terms of CTL formulas, we introduce a state formula ψ S that is satisfied if and only if a state x belongs to a subspace S = Σ(I, c): This formulation can be extended to subsets as well.Let P ⊆ B n be a subset.We define φ P to be satisfied if and only if x ∈ P : where S is a subspace cover of P .Now we can express Lemma 3.10 in terms of CTL formulas, using φ P (x) as defined above.
Lemma 3.11.Let f be a Boolean function, Θ ⊆ B n a subspace and P ⊆ B n a subset.The following are equivalent: (i) Θ is a control strategy for P in D(f ).
Proof.x satisfies Φ P if and only if there exists a path x = x 0 , x 1 , ... such that x i satisfies AGφ P for some i ≥ 1.Let y = x i .y satisfies AGφ P if and only if for all paths y = y 0 , y 1 , ..., for all i ≥ 0, y i satisfies φ P , that is, y i ∈ P .Thus, by Lemma 3.10, Φ P is satisfied for all x ∈ Θ if and only if Θ is a control strategy for P in D(f ).
The CTL formula Φ P defined in Lemma 3.11 provides a way to determine whether a candidate subspace is a control strategy for P .The next section presents the implementation of this idea for control strategy identification.

Computation of control strategies
Building on the model checking formulas derived in the previous section, we develop a method for control strategy identification.The formula derived in Lemma 3.11 can be used to define a CTL query that can determine whether a candidate subspace is a control strategy for a target subset.In addition, in order to improve the performance of the method, preliminary checks on the candidate subspace and the restricted network can be conducted to possibly discard it without exhaustive exploration.Moreover, the dimension of the problem, that is, the free variables of the Boolean function, can be reduced by restricting the function to the percolated subspace instead of the candidate subspace, as for S in S do: if (S ⊆ S' for all S' in CS) then: S * ← percolate(f, S) if S * ⊆ P then: 13: add S to CS else: if (T ∩ P = ∅ for all T in minTS) then: valid ← true 21: for T in halfTS do: 22: f * * ← reduce(f * , T)

24:
if not CTLModelChecking(f * * , D, Φ P ) then: return CS stated in Section 3.1.The complete implementation of the control strategy identification is detailed in Algorithm 1 and explained in the following.
The main algorithm takes as inputs a Boolean function f , a target subset P and the type of update D and returns the minimal control strategies for P in D(f ).
For each candidate subspace S, its percolated subspace for f S , S * , is computed (line 9), as defined in Definition 3.5.By Corollary 3.7.1,S * is a control strategy if and only if S is a control strategy, so we perform all the checks on S * .If S * is contained in the target subset P , then S is a control strategy (lines 12-13).If not, the algorithm continues to compute the restriction to S * , f * , and its minimal trap spaces (lines [16][17].If there exists a minimal trap space disjoint from the target, the candidate subspace is discarded, since each minimal trap space contains at least one attractor (line 18).Trap spaces that are partially contained in the target subset are analyzed first (line 19).Since f (x) = f T (x) for all x ∈ T , we can reduce the function to T and run the model checking query for the restriction to T , f * * , (lines 22-24).If the formula is not satisfied in one of these trap spaces, the candidate subspace is discarded.Otherwise, the algorithm concludes by checking the CTL formula Φ P for the restricted function f * and deciding whether S is a control strategy (lines 29-30).
Since the aim is to identify optimal control strategies, the candidate subspaces S are taken randomly fixing an increasing number of variables, so that supersets of sets already defining a successful intervention are not considered (lines 5-8).Furthermore, an upper bound m for the size of the control strategies can be set.Moreover, the decisions made for each percolated subspace are stored in two variables ST (for positively checked subspaces) and SF (for negatively checked subspaces) to avoid repeating the same verification query.
The algorithm presented above is implemented using PyBoolNet [13], a Python package that allows generation and analysis of Boolean networks, in the module control strategies.py.The source code is available at https://github.com/Lauracf/PyBoolNet/blob/develop/pyboolnet/control_strategies.py.PyBoolNet uses NuSMV to decide model checking queries for Boolean networks.It also provides an efficient computation of trap spaces for relatively large networks.

Results
In this section we study the applicability of our method to different biological networks.We start by applying our method to a network modeling the epithelial-to-mesenchymal transition, considering different control targets: attractor, subspace and subset avoidance.In addition, we compare our approach to current control methods in different Boolean networks for attractor and target control.We show that our method is able to identify all the minimal control strategies identified by other approaches, uncovering in some cases minimal control strategies missed by them.All the results presented here were obtained with a regular desktop 8-processor computer, Intel ® Core TM i7-2600 CPU at 3.40GHz, 16GB memory.The running times vary significantly from scenario to scenario, depending on the type of target considered and the upper bound set on the size of the control strategies.They can range from seconds or minutes, for small control strategies or simple networks, to hours or days of computation for complex target subsets or large control strategies.

Case Study: EMT network
The network considered in this case study was recently introduced by Selvaggio et al. [17] to model how microenvironmental signals influence cancer-related phenotypes along the epithelial-to-mesenchymal  [17].Further information about the model can be found in [17].transition (EMT).The original network consists of 56 components, ten of them being inputs and two readouts or outputs (see Figure 4).Since the original model is multivalued, we work with its booleanised version obtained with GINsim [6].This booleanisation maps a multivalued component of maximum value m to m Boolean components.For instance, a component taking the values 0, 1, 2, 3 is encoded using 3 Boolean variables that would take values 000, 100, 110, 111 respectively (see Table 2).Although this method introduces states that do not correspond to any value of the multivalued variable (non-admissible states), these cannot be part of any attractor since they always have at least one path to an admissible state and do not have incoming transitions from admissible states.Therefore, the asymptotic behaviour generated strictly replicates the original model.The booleanised network of this case study consists of 60 Boolean variables, whose regulatory functions can be found in the PyBoolNet repository [13].
The asynchronous dynamics has 1452 attractors, all of them steady states.They are classified according to the values of the readout components AJ and FA, which represent the different degrees of cell adhesions by adherens junctions and focal adhesions respectively [17].The eight resulting biological phenotypes are divided in four groups: epithelial (E1), mesenchymal (M1, M2, M3), hybrid (H1, H2, H3) and unknown (UN) (see Table 2).
To show the flexibility of our method, we analyse different control targets.We start by targeting single steady states (attractor control).Then, we target the subspaces corresponding to each phenotype (target control).Finally, we study the avoidance of the hybrid phenotype, setting as target the complement of the general hybrid phenotype.

Attractor control: steady states
Here we consider the problem of attractor control.Since the control targets are steady states, the minimum number of interventions in each control strategy is at least the number of inputs (each input Table 2: Relation of the phenotypes of the EMT network, the values of the multivalued readouts (AJ, FA) and the values of the equivalent Boolean components (AJ1, AJ2, FA1, FA2, FA3).The number of steady states belonging to each phenotype is also shown.
Mesenchymal phenotypes component needs to be fixed to the corresponding value in the attractor).
For each of the 1452 steady states, the control strategies up to size 13 were identified.788 steady states have minimal control strategies of size 10, meaning that the dynamics can be controlled to the steady state only by fixing the values of the input components to their values in the attractor.Of the remaining ones, 396 need an extra component to be fixed, 212 require fixing at least two more components and the last 56 steady states require fixing three extra components.
Figure 5 shows the number of control strategies identified for each size (10)(11)(12)(13) with steady states grouped by phenotype, distinguishing between control strategies identified by direct percolation or only by the model checking approach.In most of the cases, our approach is able to identify many control strategies that are missed by direct percolation.In addition, we observe that the mesenchymal phenotypes are the ones with the highest amount of control strategies, which is to be expected since they are also the ones containing more attractors.The number of steady states per phenotype can be found in Table 2. Interestingly, no control strategies consisting of only input variables lead to hybrid steady states.

Target control
The minimal control strategies up to size 3 are identified for each of the phenotypes, taking as target the subspace defined by the corresponding values of the phenotypic components in each case (see Table 2).The five phenotypic components are excluded from the candidate interventions, since they represent the readouts of the model that we want to control.Table 3 shows the number of control strategies identified per phenotype and size.Similarly to the case of attractor control, we observe that the phenotypes with higher number of control strategies are the mesenchymal phenotypes (over a hundred), while the epithelial and the hybrid phenotypes have fewer or no control strategies up to size 3.This is consistent with the bias of the model towards the mesenchymal phenotypes in terms of attractors.2.

Avoidance of hybrid phenotypes
According to [17], hybrid phenotypes may provide advantageous abilities to cancer cells such as drug resistance or tumor-initiating potential.Therefore, interventions avoiding these phenotypes might be good candidates for drug targets in therapeutic treatment against cancer cells presenting these traits.
The authors of [17] define the hybrid phenotype as the one containing steady states with both components AJ and FA activated, that is, AJ ≥ 1 and FA ≥ 1.Therefore, the subset defining the avoidance of the hybrid phenotype is As in the previous case, the five phenotypic components are excluded from the candidate interventions.Setting the upper bound on the size of the control strategies to 1, 12 control strategies are obtained: The first seven control strategies can be identified using direct percolation to the maximal subspaces of the target subset.The last five are not identified by using only percolation but are captured by the model checking approach.Two of the obtained interventions correspond to input variables (ROS and TGFB) while the other ten correspond to internal components.Looking at the regulatory functions, we observe that TGFBR is uniquely regulated by TGFB, so setting TGFB to 1 implies that TGFBR is also set to 1 and, therefore, these interventions are equivalent in terms of their effect on phenotypic components.Moreover, since there are no control strategies of size 1 for individual phenotypes, we deduce that these interventions lead to systems where multiple phenotypes coexist, none of them being hybrid.
The components involved in the control strategies identified include the epithelial markers (ECad and miR200) and the mesenchymal ones (BCat, SNAIL, SLUG, TCF-LEF and ZEB) as described in [17].In addition, the authors of [17] performed a systematic analysis of the effect of single mutants on the attractor landscape, excluding the input variables.All the single mutants corresponding to the non-input interventions found by our approach were identified as having only attractors in non-hybrid phenotypes.Moreover, there was no other single mutation that produced this result.In other words, the results obtained by our approach are in complete correspondence to the ones presented in [17].

Comparison with other methods
In this section, we compare the model checking approach to other control methods currently available.We show that our method is able to capture all the minimal control strategies identified by other methods, uncovering in some cases minimal control strategies that might be missed by them.
In order to be able to compare different approaches, certain common features need to be chosen.Here, we consider control for any possible initial state.We separately compare to methods tackling attractor control and target control.Although approaches for target control can also be used for attractor control when the target attractor is a steady state or minimal trap space, they are usually aimed at targeting larger subspaces, determined for example by a phenotype, which often include several attractors.For this reason, we consider two different scenarios: one for attractor control and one for target control.The case of an arbitrary subset as target could not be considered for comparison, since no other method, to our knowledge, allows this possibility.
The comparison presented here encompasses each of the main approaches for control strategy identification discussed in previous sections (an overview of the main features of the control methods is shown in Table 4): • For attractor control: -Stable-motifs approach (SM), attractor control method based on the identification of stable motifs as described in [21].
-Basins approach (BA), attractor control method that uses the basin of attraction of the target attractor to identify control strategies as implemented in [18].
-Model checking approach (MC), as presented in Section 3.3.
• For target control: -Percolation-only approach (PO), target control method based on percolation into the target subspace as implemented in [19].
-Trap-spaces approach (TS), target control method based on percolation into selected trap spaces introduced in [7].
-Model checking approach (MC), presented in Section 3.3.Since the methods for attractor control considered here (BA and SM) only work for asynchronous update, the comparison is only made for this dynamics.In the case of target control, control strategies are identified for both synchronous and asynchronous dynamics.
In view of the different nature of each method, we do not compare their running times.Some approaches require the computation of the system attractors to identify the control strategies (BA, SM).Others do not allow to choose a certain attractor as target and identify control strategies for all the attractors simultaneously (SM).Therefore, a fair comparison with respect to the running times is hard to achieve.For this reason, we focus on the amount and size of the control strategies identified, provided that the program terminates within a few hours.
In order to capture different control scenarios, several biological networks of different sizes with different type and number of attractors are considered.A short description of each network is provided below.See Table 5 for an overview of the networks and their features.The Boolean rules for each biological network can be found in the PyBoolNet repository [13].
(a) T-LGL network, introduced by Zhang et al. (2008) [22] to model the T cell large granular lymphocyte (T-LGL) survival signaling network.In order for SM to terminate the processing of the network within a few hours, it has been adapted as in [21], removing the outgoing interactions of Apoptosis and setting Stimuli and IL15 to 1 and the remaining inputs to 0. The simplified network consists of 60 Boolean variables and its asynchronous dynamics has 3 cyclic attractors, versus the 156 (steady states and cyclic) of the original.For sake of simplicity, the same modified network is used for attractor control and target control.

Attractor control
We compare our model checking approach (MC) to two methods for attractor control: stable motifs (SM) [21] and basins of attraction (BA) [18].SM works for steady states and the complex attractors captured by the stable motifs (in some cases, complex attractors are not identified and the method cannot be applied).BA works for any kind of attractors and computes their basins of attraction to identify minimal control strategies.Table 7: Minimal control strategies up to size 4 identified by each method (SM, BA and MC) for the selected attractor of each biological network.and − denote whether the control strategy is obtained by the method or not, respectively.For simplicity, only the control strategies of size 2 are included for the Cell-Fate network.
The control problems selected for each network are described below.
(a) T-LGL network.The three attractors of this network can be classified in two types (Survival and Apoptosis) according to the values of the output components.For this comparison, the apoptotic attractor is chosen, that is, the one with Apoptosis = 1.Similar results would be obtained for another choice of attractor.
(b) MAPK network.The eighteen attractors of this network can also be classified in two types (Survival and Apoptosis) according to the values of the output components.In order to apply BA to this network, we set some input components to fixed values.We consider all the input combinations that allow the two phenotypes to coexist.Three input-value combinations satisfy this condition.For sake of space, we only show the results for one of the combinations, the one fixing EGFR-stimulus = 0 and FGFR3-stimulus = 0 and targeting an apoptotic attractor.Similar results for BA and MC would be obtained for the other input-value combinations and different choices of attractor.Results for SM might vary depending on the target attractor that is chosen.
(c) Cell-Fate network.The 28 attractors of this network can be classified in four different phenotypes (Apoptosis, Survival, Non-Apoptotic Cell Death and Naive) according to the values of the output components.The control strategies up to size 4 obtained by each method for each of the attractors of the network are the same, except for five attractors, where the SM approach missed some of the minimal control strategies obtained by BA and MC.In order to gain more insight, we also compare the results when fixing the input components.We analyse the five input-value combinations that allow the three relevant phenotypes (Apoptosis, Survival and NonACD) to coexist.For sake of space, we only show the results for one of the combinations, the one fixing FADD = 0, FASL = 1 and TNF = 1 and targeting the apoptotic attractor.Similar results would be obtained for the other input-value combinations and different choices of attractor.
Table 6 contains the size and number of the control strategies up to size 4 computed by each approach for each network.MC is able to identify all the minimal control strategies for every network.SM obtains some non-minimal control strategies of larger size, which are supersets of minimal ones.BA usually identifies all the minimal control strategies of minimum size, as done by MC, but it misses two minimal control strategies of size 2 for the Cell-Fate network.The minimal control strategies for each network and the methods that are able to identify them are shown in Table 7.While BA does not identify any control strategy larger than the minimum size, MC computes all the strategies minimal with respect to inclusion.

Target control
We compare our model checking approach (MC) in asynchronous and synchronous dynamics to several methods for target control: percolation to target (PO) [19], percolation via trap spaces (TS) [7] and the completeness approach (CN), developed in Section 3.2.An overview of the features of each control method is shown in Table 4.
The target subspaces chosen for each network are the ones corresponding to the apoptotic phenotype, a common target in drug identification studies for cancer therapeutic treatments [4].They are defined in terms of the output components of each network:  Table 8 contains the size and number of the control strategies computed by each approach for the asynchronous and synchronous dynamics.It is important to note that all the minimal control strategies obtained by PO are included in the ones identified by TS, since TS is built on top of PO.Moreover, direct percolation is a pre-check for CN and MC methods and, therefore, all minimal control strategies found by PO are obtained by CN and MS as well.PO is able to identify a high number of minimal control strategies, as is to be expected since regulatory functions modeling biological systems usually induce a lot of percolation.Nonetheless, in some networks this number is still far from the number of minimal control strategies identified by MC.
Control strategies are update-dependent by definition.However, all the control strategies identified by PO are valid in all the dynamics considered here.The number of these control strategies that are minimal might vary from one update to another (see results for T-LGL and MAPK networks).On the other hand, methods TS, CN and MC are sensitive to the update.In the case of TS, since none of the networks is complete in the synchronous dynamics, attractors cannot be approximated by minimal trap spaces and, therefore, no additional control strategies compared to PO are obtained.Interestingly, the methods CN and MC obtain the same number of control strategies for the asynchronous dynamics.This is not the case for the synchronous dynamics, where additional control strategies are obtained by MC for the T-LGL and MAPK networks.This is caused by the fact that the CN method cannot classify a subspace as a control strategy if the restricted network is not complete.The CN method is likely to obtain better results when the original network is complete, which is usually the case for the asynchronous dynamics of biological networks [12].These case studies illustrate how an exhaustive approach like the one provided by the model checking method introduced in this paper has the capacity to identify, in networks of practical relevance, simple sets of interventions that might otherwise not be identified, and that can furnish additional insights into the model, widening the possibility for potential applications.

Discussion
This work presents a novel method to identify all the minimal control strategies for an arbitrary target subset.It is able to deal not only with the problems of attractor control and target control, already tackled by existing methods, but also with subset control, providing maximal flexibility on the definition of the control goal and allowing for instance the possibility of dealing with attractor avoidance problems.
The comparison performed in Section 5.2 shows that our approach is able to identify all the minimal control strategies obtained by the methods analysed and, in many cases, to uncover new control strategies.It also provides flexibility to study different control problems, which can lead to additional insights into the network.For these reasons, the method presented here can be a good option when a deep analysis of the control strategies of the model is required.
Even though running times are not compared in this work, the model checking approach is likely to entail more computational time than other approaches due to its exhaustiveness, since in some cases the full exploration of the state space might be required.Although the use of symbolic states allows to deal with relatively large networks, the computational time required to explore their state spaces might still be too high.For this reason, several steps have been developed to reduce the candidate space and the dimensionality of the problem, improving the overall performance.The case studies of Section 5 show the applicability of the method to real models of interest.In cases where this reduction might still be insufficient, our approach could be used to complement faster methods for the particular scenarios in which a more exhaustive analysis is needed.This new approach is able to adapt to different types of targets and updates.This flexibility, provided by the versatility of model checking, results in the potential to be extended to other control problems.Possible future works could include its extension to tackle control from a set of initial states or the addition of other types of interventions.

Figure 2 :
Figure 2: (a) Asynchronous dynamics of the Boolean function f (x) = (x 2 ∨ x 1 x3 , x 1 x3 ∨ x 2 x 3 , x1 x2 ∨ x 2 x 3 ) with two steady states 110 and 011.(b) Asynchronous dynamics of the restriction of f toΩ = * * 0, f Ω (x) = (x 2 ∨ x 1 , x 1 , 0).Ω is a control strategy for P = 110 in AD(f ).Ω does not percolate to P nor to any non-trivial trap space.T = 110 ⊆ P is the only minimal trap space of f Ω and is complete in D(f Ω ).

Corollary 3 . 6 . 1 .
Let f be a Boolean function, S ⊆ B n a subspace.Let k ≥ 0 and S k = F (f S ) k (S).Then for every x ∈ S there exists a path in D(f S ) from x to some y ∈ S k .Lemma 3.7.Let f be a Boolean function and S ⊆ B n a subspace.Let k ≥ 0 and S k = F (f S ) k (S).Then A is an attractor of f S if and only if A ⊆ S k and A is an attractor of f S k .

Figure 3 :
Figure 3: The asynchronous dynamics of the Boolean function f and f Ω , with Ω = 0 * * * , are represented in (a) and (b) respectively.Ω is a control strategy for P = 00 * * in AD(f ).Ω is not a trap space nor percolates to any trap space.There is no non-trivial trap space in f Ω and therefore, Ω would not be identified as control strategy by the completeness approach.

Lemma 3 . 10 .
Let f be a Boolean function, Θ ⊆ B n a subspace and P ⊆ B n a subset.The following are equivalent:(i) Θ is a control strategy for P in D(f ).

Algorithm 1 1 : 5 : 6 :
Control strategies for a target subset P Input: f Boolean function, P target subset, D type of update, m size limit Output: control strategies for P function ControlStrategies(f , P , D, m) for i in {1, . . ., min(m, n)} do: n: number of variables of f S ← {S subspace : |fixed(S)| = i} 7:

Figure 4 :
Figure 4: EMT multivalued network.Boolean nodes are represented by ellipses and multivalued nodes by rectangles.Input nodes are colored in gray.Image obtained from[17].Further information about the model can be found in[17].

Figure 5 :
Figure 5: Number of control strategies identified for the steady states grouped by phenotype and size.The control strategies obtained by direct percolation are represented in red and the additional control strategies identified by model checking in green.The number of steady states per phenotype can be found in Table2.
(b) MAPK, introduced by Grieco et al. (2013) [10] to model the effect of the Mitogen-Activated Protein Kinase (MAPK) pathway on cell fate decisions taken in pathological cells.The network consists of 53 Boolean variables and it has 18 attractors in the asynchronous dynamics, 12 steady states and 8 cyclic attractors.(c) Cell-Fate network, introduced by Calzone et al. (2010) [3] to model the cell fate decision process.The network uses 28 Boolean variables and its asynchronous dynamics has 27 attractors, all of them steady states.These are classified in four different phenotypes (Apoptosis, Survival, Non-Apoptotic Cell Death and Naive) according to the values of the output components of the network.

Table 1 :
Satisfaction relation and semantics for CTL formulas, with a ∈ AP an atomic proposition, x ∈ S a state, π a path in the transition system, ϕ a path formula and φ, φ 1 and φ 2 state formulas.
and φ 2 are CTL state formulas and ϕ is a CTL path formula, which in our work will be of the form:

Table 3 :
Number of minimal control strategies identified per size for each phenotype.All the control strategies are obtained by direct percolation except three control strategies of size 3 for the phenotype M3 which are only identified by model checking.

Table 4 :
Overview of the versatility of the different control methods in terms of the types of targets and update schemes.

Table 5 :
Main features of the biological networks used in the comparison.Input components are fixed in the T-LGL network and free in the Cell-Fate and MAPK networks, unless specified otherwise.

Table 6 :
Number and size of the control strategies up to size 4 identified by each method (SM, BA and MC) for the corresponding attractor of each biological network, with the input components fixed as mentioned in the main text.When a method obtains non-minimal control strategies, the number of minimal control strategies identified is indicated in parenthesis.Note that BA does not look for larger control strategies once a minimal one (with respect to size) is obtained.

Table 8 :
Number and size of the control strategies identified by each method (PO, TS, CN and MC) for the corresponding attractor of each biological network in the asynchronous and synchronous dynamics.When a method obtains non-minimal control strategies, the number of minimal control strategies identified is indicated in parenthesis.