Exact Probability Landscapes of Stochastic Phenotype Switching in Feed-Forward Loops: Phase Diagrams of Multimodality

Feed-forward loops (FFLs) are among the most ubiquitously found motifs of reaction networks in nature. However, little is known about their stochastic behavior and the variety of network phenotypes they can exhibit. In this study, we provide full characterizations of the properties of stochastic multimodality of FFLs, and how switching between different network phenotypes are controlled. We have computed the exact steady-state probability landscapes of all eight types of coherent and incoherent FFLs using the finite-butter Accurate Chemical Master Equation (ACME) algorithm, and quantified the exact topological features of their high-dimensional probability landscapes using persistent homology. Through analysis of the degree of multimodality for each of a set of 10,812 probability landscapes, where each landscape resides over 105–106 microstates, we have constructed comprehensive phase diagrams of all relevant behavior of FFL multimodality over broad ranges of input and regulation intensities, as well as different regimes of promoter binding dynamics. In addition, we have quantified the topological sensitivity of the multimodality of the landscapes to regulation intensities. Our results show that with slow binding and unbinding dynamics of transcription factor to promoter, FFLs exhibit strong stochastic behavior that is very different from what would be inferred from deterministic models. In addition, input intensity play major roles in the phenotypes of FFLs: At weak input intensity, FFL exhibit monomodality, but strong input intensity may result in up to 6 stable phenotypes. Furthermore, we found that gene duplication can enlarge stable regions of specific multimodalities and enrich the phenotypic diversity of FFL networks, providing means for cells toward better adaptation to changing environment. Our results are directly applicable to analysis of behavior of FFLs in biological processes such as stem cell differentiation and for design of synthetic networks when certain phenotypic behavior is desired.

Feed-forward loops (FFLs) are among the most ubiquitously found motifs of reaction networks in nature. However, little is known about their stochastic behavior and the variety of network phenotypes they can exhibit. In this study, we provide full characterizations of the properties of stochastic multimodality of FFLs, and how switching between different network phenotypes are controlled. We have computed the exact steady-state probability landscapes of all eight types of coherent and incoherent FFLs using the finite-butter Accurate Chemical Master Equation (ACME) algorithm, and quantified the exact topological features of their high-dimensional probability landscapes using persistent homology. Through analysis of the degree of multimodality for each of a set of 10,812 probability landscapes, where each landscape resides over 10 5 -10 6 microstates, we have constructed comprehensive phase diagrams of all relevant behavior of FFL multimodality over broad ranges of input and regulation intensities, as well as different regimes of promoter binding dynamics. In addition, we have quantified the topological sensitivity of the multimodality of the landscapes to regulation intensities. Our results show that with slow binding and unbinding dynamics of transcription factor to promoter, FFLs exhibit strong stochastic behavior that is very different from what would be inferred from deterministic models. In addition, input intensity play major roles in the phenotypes of FFLs: At weak input intensity, FFL exhibit monomodality, but strong input intensity may result in up to 6 stable phenotypes. Furthermore, we found that gene duplication can enlarge stable regions of specific multimodalities and enrich the phenotypic diversity of FFL networks, providing means for cells toward better adaptation to changing environment. Our results are directly applicable to analysis of behavior of FFLs in biological processes such as stem cell differentiation and for design of synthetic networks when certain phenotypic behavior is desired.

INTRODUCTION
Cells with the same genetic make-ups can exhibit a variety of different behavior. They can also switch between these different phenotypes stochastically. This phenomenon has been observed in bacteria, yeast, and mammals such as neural cells (Acar et al., 2005;Choi et al., 2008;Guo and Li, 2009;Gupta et al., 2011). The ability to exhibit multiple phenotypes and switching between them is the foundation of cellular fate decision (Schultz et al., 2007;Cao et al., 2010;Ye et al., 2019), stem cell differentiation (Feng and Wang, 2012;Papatsenko et al., 2015;Zhang et al., 2019), and tumor formation (Huang et al., 2009;Shiraishi et al., 2010).
Cells exhibiting different phenotypes have different patterns of gene expression. Single-cell studies demonstrated that isogenic cells can exhibit different modes of gene expression (Shalek et al., 2013), indicating that distinct phenotypes are encoded in the wiring of the genetic regulatory networks (Liang and Qian, 2010). This phenomenon of epigenetic control of bimodality in gene expression by network architecture is well-known and has been extensively studied in earlier works of phage-lambda (Arkin et al., 1998;Ptashne, 2004;Zhu et al., 2004a,b;Cao et al., 2010).
Understanding multimodality in gene regulatory networks and its control mechanism can provide valuable insight into how different cellular phenotypes arises and how cellular programming and reprogramming proceed (Lu et al., 2007). Much of current knowledge of multimodality is derived from analysis of networks with feedback loops or cooperative interactions (Siegal-Gaskins et al., 2009). However, recent studies suggest that multimodality and phenotype switching can also arise from slow promoter binding, which may result in distinct protein expression levels of long durations (Feng and Wang, 2012;Thomas et al., 2014;Chen et al., 2015;Duncan et al., 2015;Terebus et al., 2019). Nevertheless, the nature and extent of this type of bimodality is not well-understood.
In this work, we study the network modules of feedforward loops (FFLs) and characterize the stochastic nature of their multimodalities. FFLs are one of the most prevalent three-node network motifs in nature (Alon, 2006) and play important regulatory roles (Lee et al., 2002;Shen-Orr et al., 2002;Boyer et al., 2005;Mangan et al., 2006;Tsang et al., 2007;Ma et al., 2009;Sorrells and Johnson, 2015). They appear in stem cell pluripotency networks (Boyer et al., 2005;Papatsenko et al., 2015;Sorrells and Johnson, 2015), microRNA regulation networks (Tsang et al., 2007;Re et al., 2009;Ivey and Srivastava, 2010), and cancer networks (Re et al., 2009). The behavior of FFLs has been studied extensively using deterministic ODE models. These studies revealed important functions of FFLs in signal processing, including sign-sensitive acceleration and delay pulse generation functions, and increased cooperativity Ma et al., 2009). FFLs are also found to be capable of maintaining robust adaptation (François and Siggia, 2008;Ma et al., 2009) and detecting "fold-changes" (Goentoro et al., 2009).
However, analysis based on ODEs is limited in its ability to characterize probabilistic events, as they do not capture bimodality in gene expression that arises from slow promoter binding (Vellela and Qian, 2009). The stochastic behavior of FFLs is not well-characterized: Basic properties such as the number of different phenotypes FFLs are capable of exhibiting, the conditions required for their emergency, their relative prominence, and the sensitivity of different phenotypes to perturbations are not known.
Our stochastic FFL models are based on processes of Stochastic Chemical Kinetics (SCK), which provides a general framework for understanding the stochastic behavior of reaction networks. Quantitative SCK modeling can uncover different network phenotypes, the conditions for their occurrence, and the nature of the prominence of the stability peaks. However, analysis of stochastic networks is challenging. First, models based on stochastic differential equations such as Fokker-Planck and Lagenvin models may be inadequate due to their Gaussian approximations. This is further compounded by the limited number of simulation trajectories that can be generated. These difficulties are reflected in the reported failure of a Fokker-Planck model in accounting for multimodality in the simple network model of single self-regulating gene at certain reaction rates (Duncan et al., 2015). Second, the widely used Stochastic Simulation Algorithm (Gillespie simulations) can generate SCK trajectories (Gillespie, 1977), but are challenged in capturing rare events and in computing efficiency. There are also difficulties in assessing convergency and in estimating computational errors (Cao and Liang, 2013). Third, even if the probabilistic landscape can be accurately reconstructed with acceptable accuracy, detecting topological features such as peaks in high-dimensional probability landscapes and assessing their objectively prominence at large-scale remains an unsolved problem.
To characterize the stochastic behavior of FFLs using models based on SCK processes, our approach is to solve the underlying discrete Chemical Master Equation (dCME) using the ACME (Accurate Chemical Master Equation) algorithm (Cao et al., 2016a,b), and to obtain the exact probability landscapes of all 8 varieties of FFLs.
Aided by the computational efficiency of ACME, we are able to explore the behavior FFLs under broad conditions of synthesis, degradation, binding, and unbinding rates of transcription factors genes binding. Furthermore, we analyze the topological features of the exactly constructed high-dimensional probability landscapes using persistent homology, so the number of probability peaks and the prominence measured by their persistence are quantified objectively. These techniques allow us to examine details of the number of possible phenotypic states at different conditions, as well as the ranges of conditions enabling phenotypic switching. With broad exploration of the model parameter space, we are able to construct detailed phase diagrams of multimodalities under different conditions.
Our results reveal how FFL network behaves differently under varying strengths of regulations intensities and the input. In addition, we characterize quantitatively the effects of duplication of genes in the FFL network modules. We show gene duplication can significantly affect the diversity of multimodality, and can enlarge monomodal regions so FFLs can have robust phenotypes. The results we obtained can be useful for analysis of phenotypic switching in biological networks containing the FFL modules. They can also be used for construction of synthetic networks with the goal of generating certain desired phenotypic behavior.

Overview
FFLs consists of three nodes representing three genes, each expresses a different protein product ( Figure 1A). An FFL regulates the network output from the left input node toward the right output node via two paths; the direct path from the left node to the right node, and the indirect path from the left to the right node via an intermediate buffer node. As each of the three regulations can be either up-or downregulation, there are altogether 2 3 = 8 types of FFL.

Network Architecture
Specifically, we denote the three genes of an FFL module as a, b, and c, which expresses protein products A, B, and C at constant synthesis rate of s A , s B , and s C , respectively ( Figure 1A). Proteins A, B, and C are degraded at rate d A , d B , and d C , respectively. Both proteins A and B function as transcription factors and can bind competitively to the promoter of gene c and regulates its expression. As the promoter of gene c can bind to either protein A or B, but not both, this type of regulation is known as the "OR" gate. In addition, protein A can bind to the promoter of gene b and regulate its expression. Specifically, protein A can bind to the promoter of gene c at rate r A c to form complex cA, which dissociates at rate f A c . cA expresses protein C at a rate k 3 -fold over the basal rate of s C . Similarly, protein B can bind to the promoter of gene c at rate r B c to form complex cB, which dissociates at rate f B c . cB expresses protein C at a rate k 2 -fold over the basal rate of s C . Furthermore, protein A binds to the promoter of gene b at rate r A b to form gene-protein complex bA, which dissociate at rate f A b .
Upon binding protein A, bA expresses protein B at a rate k 1 -fold over the basal rate of s B . The biochemical reactions of our FFL model are summarized below: and s A = s B = s C = 10 s −1 . All reaction rate constants are of the unit s −1 , while coefficients k 1 , k 2 , and k 3 are ratio of reaction rates and therefore unitless. The ratios k 1 , k 2 , and k 3 can take different values so the network represents different types of FFLs.

Types of FFL Modules
Depending on the nature of the regulations, namely, whether each of regulation intensities k 1 , k 2 , and k 3 is ≥ 1 (activating) or < 1 (inhibiting), there are 2 3 = 8 types of FFLs. These FFLs are classified into two classes, the coherent FFLs and the incoherent FFLs ( Figure 1B) (Alon, 2006). A FFL is termed coherent (C 1 , C 2 , C 3 , C 4 in Figure 1B), if the direct effect of protein A on the gene c has the same sign (positive or negative)  as its net indirect effect through protein B. Taking the FFL model C 1 (Figure 1B) as an example, protein A activates gene b, and protein B activates gene c, with an overall effect of "activation." At the same time, the direct effect of product of gene a protein A is also activation of gene c. Therefore, C 1 is a coherent FFL. When the sign of the indirect path of the regulation is opposite to that of the direct path, we have incoherent FFLs (I 1 , I 2 , I 3 , I 4 in Figure 1B). Taking the FFL model I 1 as an example, the effect of the direct path is positive, but the overall effect of the indirect path is negative. As can be seen from Figure 1B, all incoherent FFLs have an odd number of edges of inhibition.

Model Parameters
In order to explore broadly the behavior of all types of FFLs, we construct FFL models over the parameter space of a wide range of possible combinations of k 1 , k 2 , and k 3 , representing all 8 types of FFLs. The regulation intensity is set to values based on values reported in (Bu et al., 2016;Tej et al., 2019). We then altered the regulation intensities by about 10-fold to study the general behavior of different types of FFLs at the steady state. We take parameter values of k 1 ∈ {0.025, 0.1, 0.4, 0.8, 1.5, 2.1, 2.4, 3.0}, k 2 ∈ [0.025, 5.0] with step size of 0.25, k 3 ∈ [0.025, 5.0] with step size of 0.25. In addition, for the input intensity, the values are selected based on the analysis of abundance pattern reported in (Momin and Biswas, 2020). We take s A ∈ {3.0, 10.0}s −1 , r A c and r B c ∈ {0.5, 2, 8, 16}s −1 for one and two copies of genes b and c. Details of the relationship of FFL types with k 1 , k 2 , and k 3 are listed in Table 1. Over this parameter space, we study the behavior of all 8 types of FFLs. Overall, we constructed a total of 10,812 examples of FFLs and computed the steady-state probability landscape for each of them.

Computing Probability Landscape
Using ACME

Exact Computation of Probability Landscape of FFLs
Consider a well mixed system of reaction with constant volume and temperature. This system has n species X i , i = 1, 2, · · · , n, in which each particle can participate in m reactions R k , k = 1, 2, · · · , m. A microstate of the system at time t, x(t) is a column vector representing the copy number of species: x(t) = (x 1 (t), x 2 (t), · · · , x n (t)) T , where the values of copy numbers are non-negative integers. The state space of the system includes all the possible microstate of the system from t = 0 to infinity, = {x(t)|t ∈ [0, ∞)}. In this study, the size of the state space is | | = 657, 900 when genes b and c are single-copy, and | | = 686, 052 and 1, 289, 656 when there are two copies of gene b and c, respectively.
The reaction R k of the system takes the form of which brings the system from a microstate x to a new microstate x + s k , where s k is the stoichiometry vector and is defined as In a well mixed system, the propensity function of reaction k, A k (x) is given by the product of the intrinsic reaction rate constant r k and possible combinations of the relevant reactants in the current state x.
x l c l k With the above definitions, the dCME of a network model of the SCK processes consists of a set of linear ordinary differential equations defining the changes in the probability landscape over time at each microstate x. Denote the probability of the system at a specific microstate x at time t as p(x, t) ∈ R [0,1] , the probability landscape of the system over the whole state space as p(t) = {p(x(t))|x(t) ∈ }, the dCME of the system can be written as the general form of where x and x − s k ∈ . The steady-state probability landscapes is obtained by solving the dCME directly. The exact solution is made possible by using the the ACME algorithm (Cao et al., 2016a,b). The ACME algorithm eliminates potential problems due to inadequate sampling, where rare events of very low probability is difficult to estimate using techniques such as the stochastic simulation algorithm (SSA) (Gillespie, 1977;Kuwahara and Mura, 2008;Daigle et al., 2011;Cao and Liang, 2013).

Identification of Multimodality by Persistent Homology
Despite its simple architecture, FFLs have a 9-dimensional probability landscape: There are three genes (a, b, and c), three proteins (A, B, and C), and three bound genes bA, cA, and cB (i.e., gene b bound to protein A, gene c bound to either protein A or protein B). Because of the high dimensionality, it is challenging to characterize the topological structures of their probability landscapes; restricting networks to only "on" and "of " state separately makes it difficult to gain insight into the overall behavior of the network.
There have been studies that analyze d-dimensional probability landscape by examining its projection onto 1-d or 2-d subspaces (e.g., 2-d heatmaps or contour plots) (Bu et al., 2016;Dey and Barik, 2021). However, projected probability surface on lower dimensional space often no longer reflects the topology of the original space, with results and interpretations likely erroneous or misleading (Manuchehrfar et al., 2021). Finding peak states by examining distinct local maxima is equivalent to locating hypercubes that are critical points of Morse index of d in the d-dimension state space. While, local maxima may be identified by comparing its probability value with those of all of its neighbors, all peaks regardless their prominence will be identified. As numerical calculation may introduce small errors, peaks of tiny magnitude will be included. It is non-trivial to decide on a proper threshold to filter them out.
Persistent homology provides a powerful method that can characterize topological features of high-dimensional probability landscapes (Edelsbrunner et al., 2002;Carlsson, 2009). Here, we use newly developed cubic complex algorithm to compute homology groups 1 and quantitatively assess the exact topology of the 9-dimensional probability landscape.

Homology Groups
We use homology groups from algebraic topology to characterize the probability landscape. Homology group provides an unambiguous and quantitative description on how a space is connected. It returns a set of algebraic groups describing topological features of holes of various dimensions in the space. The rank of each i-th groups counts the number of linearly independent holes in the corresponding ith dimension. For example, Rank(H 0 ) counts the number of connected components (0th dimensional holes).

Persistent Homology
Persistent homology measures the importance of these topological features (Edelsbrunner et al., 2002), and has been applied in studies of chemical compounds and biomolecules Wei, 2014, 2015;. Here, we focus on the topological features of probability peaks, including their appearance and disappearance. They are measured by persistent homology of the 0-th homology group. Specifically, we take the probability p(x) as a height function, and construct a sequence of topological spaces using thresholds {r i } for p(x): The superlevel sets which corresponds to the threshold r i . The sequence {X i } gives a sequence of subspaces, which is called filtration: As the threshold changes, the peak of a probability landscape emerges from the sea-level at a specific threshold, which is the birth time of the corresponding 0-homology group in the filtration. It disappears as an independent component when merged with a prior peak at a particular threshold, which is called the death time. When the sea-level recedes to the ground level at p(x) = 0, only the first peak remains.

Persistent Diagram of Multimodality in Probability Landscape
We keep track of the probability peaks by recording the birth and death times of their corresponding 0-homology groups throughout the filtration. This relationship is depicted by the two-dimensional persistent diagram.
For the ith probability peak, when the threshold r reaches the value r b (i), the probability peak appears. We call this value the birth probability p b (i) = r b (i) of peak i. When the threshold r is lowered to a value r d (i), this peak is merged to an existing peak. We call this value the death probability p d (i) = r d (i) of peak i. The persistence of peak i is defined as: (3) The persistent diagram plots peak i using the birth probability p b (i) as the y-coordinate and the death probability p d (i) as the x-coordinate. The number of dots on the persistent diagram corresponds to the number of probability peaks. Those that are further off the diagonals are the more prominent probability peaks as their persistences are larger.

Multimodality and Persistent Homology of FFLs
For each FFL network, we first compute its probability landscapes , and x C are copy numbers of proteins A, B, and C, respectively; x a , x b , and x c are copy numbers of genes a, b, and c, respectively; x bA and x cA are copy numbers of genes b and c bound by protein A; x cB is the copy number of gene c bound by protein B.
Our results show that the 8 types of FFLs can exhibit up to six different phenotypes of mono-and multimodality at different conditions in the parameter spaces we investigated. An illustration of these six different types of multimodality is shown in Figure 2.
We further computed their 0-th homology groups at varying sea level of probability. The number of peaks, the birth, and death probability associated with each peak in Figure 2 are shown in the persistent diagrams of Figure 3.

Behavior of FFLs From Stochastic Models Differ From Deterministic ODE Models
The behavior of FFL network modules revealed from our stochastic models are fundamentally different from that of deterministic models of ordinary differential equations (ODEs). ODE models are based on kinetics of law of mass action and are used to calculate the mean concentrations of A, B, and C at equilibrium state. However, they do not provide accurate pictures on the degree of multimodality. For example, the steadystate ODE solutions with respect to different gene occupancy for mass action kinetics show that there are at most six phenotypic states (see Supplementary Material for more details). However, as there are no probabilistic considerations, conclusions drawn from ODE models can be problematic. The illustrative examples are as follows: 1 peak (red), coherent FFL of type C1 when k 1 = 1.2, k 2 = 1.2, and k 3 = 1.2; 2 peaks (yellow), either in protein B with coherent FFL of type C1, where k 1 = 3.0, k 2 = 1.2, and k 3 = 1.2, or in protein C with coherent FFL of type C1, where k 1 = 1.2, k 2 = 6.0, and k 3 = 6.0; 3 peaks (green), coherent FFL of type C1, where k 1 = 1.2, k 2 = 6.0, and k 3 = 3.6; 4 peaks (light-blue), coherent FFL of type C1 exhibits two peaks for protein B and two peaks for protein C, where k 1 = 3.0, k 2 = 6.0, and k 3 = 6.0; and 6 peaks (purple), coherent FFL of type C1 exhibit two peaks for B and three peaks for C, where k 1 = 3.0, k 2 = 6.0, and k 3 = 3.6. Figure 2 exhibiting different multimodalities. Red: The probability landscape with monomodality. Yellow: These two PDs depict the two steady-state landscapes exhibiting bimodality. Green, light blue, and purple: These three PDs depict the landscape exhibiting tri-modality, 4-modality, and 6-modality, respectively.

FIGURE 3 | Persistent diagrams (PDs) of feed-forward loop (FFL) network modules of
An example of the diverging results between ODE and stochastic models is shown in Figure 4A for an FFL of C1 type. The mean values of C obtained from the ODE model (vertical blue line) and the expectation computed from the probability landscape (vertical purple line) diverge from each other ( Figure 4A). There are three different phenotypic states by the ODE model (green lines, Figure 4A), which are different from the bimodal probability distribution obtained from the SCK model ( Figure 4A).
A further example is provided by the FFL of type I1. Here, the ODE model predicts the existence of three phenotypes at k 1 = 2.7, k 2 = 0.4, and k 3 = 1.8 (Figure 4B, green vertical lines). However, the stochastic model shows that there is only one stability peak. Although the mean value of C obtained from the ODE model and the expected C value computed from the probability landscape largely overlap, the ODE model provides no information on phenotypical variability. Overall, stochastic models provide FIGURE 4 | Comparing feed-forward loop (FFL) behavior by Accurate Chemical Master Equation (ACME) and by deterministic ordinary differential equation (ODE) models. (A) shows the results of FFL of C1 type for (k 1 , k 2 , k 3 ) = (2.4, 4.5, 1.8). The exact results obtained using ACME exhibit bimodality in protein C (red curve), while trimodality is predicted by the deterministic ODE model (green vertical lines). The mean copy number from ACME (purple vertical line) is also different from the that from ODE (blue vertical line). (B) shows the results of FFL of I1 type for (k 1 , k 2 , k 3 ) = (2.4, 0.4, 1.8). The exact results obtained using ACME exhibit monomodality in protein C (red curve), while deterministic ODE model predicts trimodality (green vertical lines), even though the mean copy number of protein C are the same between ACME and ODE models (purple and blue vertical lines, respectively).
accurate and rich information that are not possible with ODE models.

Behavior of FFLs From Exact Solution to dCME by ACME Can Be Differ From That by Stochastic Simulation Algorithm
Results from simulations using SSA may differ from the exact solution to dCME obtained using ACME. We illustrate this using two incoherent FFLs, one at (k 1 , k 2 , k 3 ) = (3.0, 0.5, 5.0) of I1-FFL (Figures 5A-C) and another at (k 1 , k 2 , k 3 ) = (0.1, 2.75, 5.0) (Figures 5D-F) of the I4-type FFL. The exact steady-state probability landscape of the I1-FFL network computed using ACME is multimodal, exhibiting two peaks in protein B and two peaks in protein C ( Figure 5A). However, these peaks are not definitive when 30,000 reaction trajectories up to 2,500 s are simulated using SSA (upper plots, Figures 5B,C). Bimodality in proteins B and C becomes only definitive when simulation time is extended to 5,000 s (lower plots, Figures 5B,C).
The exact steady-state probability landscape of the I4-FFL network computed using ACME exhibits tri-modality in protein C and bimodality in protein B (Figure 5D). However, trimodality is not clearly captured when the reaction trajectories are < 2, 500 s (upper plot, Figure 5E), and becomes definitive only after 5,000 s (lower plot, Figure 5E). In addition, bimodality in protein B is not captured, even when the reaction trajectories are at 5, 000 s (upper and lower plot, Figure 5F).

Phase Diagrams of Multimodality in FFLs
Current studies of stochastic networks are limited to their behavior under a few selected conditions. Here, we explore the multimodality of all eight types of FFLs under broad conditions of synthesis, degradation, binding, and unbinding as outlined in Table 1. This is made possible by the efficiency of the multi-finite buffer ACME algorithm. The analysis using persistent homology further allows us to quantitatively characterize the exact topology of the landscape. Together, we are able to obtain the full phase diagrams on the phenotype of multimodality of FFLs at different combinations of parameter values (Figure 6).
Altogether, we compute 10,812 probability landscapes of the 8-types of FFL modules. Depending on the values of k 1 , k 2 , and k 3 , each phase diagram shown depicts the behavior of four types of FFLs, one for each of the four quadrants formed by the two straight lines of k 2 = 1 and k 3 = 1 (Figure 6), with the type of FFL labeled accordingly. The specific types also depend on k 1 , which is listed at the top of each plot (Figure 6). As a result, we have gained comprehensive and accurate characterization of the multimodality phenotypes of this type of important network modules. Figure 2, the steady-state probability landscape of the FFL at k 1 = k 2 = k 3 = 1.2 exhibits one probability peak. At this condition, it is a coherent FFL of type C1. The projected distributions of B and C exhibit monomodality and has only one peak (Figure 2, red) when the values of intensities k 1 , k 2 , and k 3 are close to 1.0 (Figure 6). Overall, there is only one phenotypic state when the regulations intensities in FFL are weak.

Bimodality
The steady-state probability landscape of FFLs can exhibit two types of bimodality (colored yellow in Figure 2). The first type occurs when k 1 < 0.4 or k 1 2.4, with bimodality in protein B while monomodality in protein C. This is illustrated as green regions in Figure 6 shown at the two top-left and the two bottom right phase diagrams where k 1 ∈ {0.025, 0.1, 2.4, 3.0}. That is, if the regulation intensities of k 1 and k 2 are about two-fold different either way, bimodality in B arises. The reaction trajectories computed from SSA corresponding to condition in (A) for proteins C and B, respectively. The upper plots are for 2,500 s and lower plots are for 5,000 s. SSA does not capture the bimodality of proteins B and C until 2,500 s. (D) The probability surface projected onto (B−C) plain for FFL with (k 1 , k 2 , k 3 ) = (0.1, 2.75, 5.0). There is tri-modality in protein C and bimodality in protein B. (E,F) Corresponding reaction trajectories in proteins C and B, respectively. Upper plots are for the results for 2,500 s and lower plots are for 5,000 s. SSA does not capture tri-modality of protein C until 2,500 s. In addition, SSA fails to capture bimodality in protein B.
The second type of bimodality occurs when 0.4 ≤ k 1 < 2.4, where protein C exhibit bimodality while monomodality is maintained in B. This is illustrated as green regions in the remaining phase diagrams of Figure 6, where k 1 ∈ {0.4, 0.8, 1.5, 2.1}.

Tri-modality
The steady-state probabilistic landscape of FFL can exhibit trimodality (green, Figure 2). There are three possible phenotypes in protein C while monomodality in protein B is maintained. Trimodal regions are colored red in the phase diagrams of Figure 6. They arise when the difference in rates k 2 and k 3 is at least about two-fold and 0.4 ≤ k 1 ≤ 2.1.

Multimodality
The steady-state probability landscape of the FFL can exhibit 4 to 6 probability peaks (orange, purple, and green, respectively, in Figure 2). Landscapes with 4 modes have bimodality in both protein B and protein C. Those with 5 modes has bimodality in B and tri-modality in C. Landscapes with 6 modes exhibit bimodality in B and tri-modality in C. Inspection on the conditions indicates that when the regulations are strong; i.e., when k 1 , k 2 , and k 3 ≥ 2.1, FFLs exhibit very well-defined multimodality peaks. However, when the regulation intensity k 1 is weak, the steady-state probability landscape exhibits multimodality only when the other two regulation intensities, namely, k 2 and k 3 are strong. As shown in Figure 6, there are two groups of FFLs based on the characteristics of the multimodality they exhibit: One group consists of FFLs of C 2 , C 4 , I 1 , and I 3 types, where tri-modality of output protein C always exists, as long as k 2 and k 3 are at least about two-fold different. The other group consists of FFLs of C 1 , C 3 , I 2 , and I 4 types where the signs of the regulations that the output node C receives from B and A are the same (both activation or both inhibition). Tri-modality occurs when the regulations k 2 and k 3 have very distinct values.
Overall, protein B can exhibit either mono-or bimodality, and protein C can exhibit mono-, bi-, or tri-modality on the probability landscape.

Increasing Input Intensity Amplifies Multimodality in FFL
To understand how input intensity affect the response of FFL networks, we examine their behavior under different input conditions. Specifically, we examine how different synthesis rate s A of protein A affects the number of modes in proteins B and C. Monomodality occurs when 0.4 ≤ k 1 ≤ 2.1 and k 2 , k 3 intensities are moderate, i.e., 0.4 ≤ k 1 ≤ 3 (blue region when k 1 = 0.4, 0.8, 1.5, and 2.1). Bimodality may occur for different combinations of regulation intensities. When k 1 intensity is either very high (2.4 ≤ k 1 ) or very low (k 1 ≤ 0.1), bimodality occurs when k 2 , k 3 intensities are moderate, i.e., 0.4 ≤ k 1 ≤ 3. When k 1 intensity is moderate (0.4 ≤ k 1 ≤ 2.1), bimodality occurs when at least one of the other regulation intensities k 2 or k 3 is high. Tri-modality occurs when k 1 is moderate (0.4 ≤ k 1 ≤ 2.1) and either k 2 or k 3 is moderate. Multimodality occurs when k 1 is low or high (k 1 ≤ 0.4 or k 1 ≥ 2.1), and at least either k 2 or k 3 is high. Color scheme (vertical bar): Blue, green, red, orange, purple, and brow represent regions with one, two three, four, five, and six peaks, respectively.
We first carry out computations and broadly survey the behavior of FFLs at strong input intensity, where s A is set to 10.0. The values of k 2 and k 3 are sampled broadly, and k 1 is tested for three different values of k 1 = 0.8, 2.1, and 2.4. The results are summarized in Figure 7 (top row). We then similarly survey the behavior of FFLs at decreased synthesis intensity of protein A, with s A = 3.0 (Figure 7, bottom row).
There are clear changes in the mode of multimodality of FFLs. At k 1 = 0.8 and k 1 = 2.1 (Figure 7, left and center columns), when protein A synthesis rate s A is reduced from 10.0 (top) to 3.0 (bottom), regions with one (blue) and three (red) peaks are reduced. In addition, certain areas of the tri-stable (red) regions become bimodal (green).
At larger k 1 = 2.4 (Figure 7, right column), the FFLs exhibits dramatic changes in the modes of multimodality when synthesis rate s A of protein A is reduced from 10.0 (top) to 3.0 (bottom). In many regions, one or more stability peaks disappear. There are regions with two peaks at s A = 10.0 that become monomodal. There are also regions of six peaks that become those of four peaks. This is due to the loss of one stability peak from three in protein C. In addition, large regions with four peaks (orange) disappear and become either regions with two peaks (green) or with three peaks (red). Overall, we can conclude that high-input intensity represented by high s A rate for protein A induces changed phenotypes of multimodality in FFLs.

Binding and Unbinding Dynamics Are Critical for Multiple Phenotypic Behavior
Results obtained so far are based on the assumption of slow binding (r A b = r A c = r B c = 0.005) and unbinding (f A b = f A c = f B c = 0.1) reactions, which we call the generic case. When the FFL network slowly switches between phenotypic states, the process of synthesis degradation of protein C has sufficient time to converge to equilibrium at each phenotypic state of gene c. An important questions is how slow the promoter dynamics need to be for FFLs to exhibit multiple phenotypes, without feedback loops or cooperatively.
To answer this question, we explore the behavior of FFLs under different binding and unbinding dynamics of gene c for an FFL of type I1. In this case, protein A activates protein B and protein C, while protein B inhibits protein C (see Figure 1B). With slow binding kinetics as described above, the output C of this FFL exhibits three stability peaks. These are at the expression level of protein C of (1) C = 0, corresponding to the condition when gene c is inhibited by B, (2) C = 9, corresponding to the basal level of C expression, and (3) C = 49, when C expression is activated by A. We then fix the regulation intensities at k 1 = 3.0, k 2 = 0.025, and k 3 = 5.1, and examine how the number of phenotypic states is affected by gene c binding dynamics (Figure 8).
We first set the binding affinities between gene c and protein A and between gene c and protein B to the same FIGURE 7 | Effects of input intensity on multimodality of Feed-forward loops (FFLs). The phase diagrams of the number of stability peaks in the steady-state probability landscapes at strong input intensity s A = 10.0 (top row) and weak input intensity s A = 3.0 (bottom row) for different k 2 and k 3 at three different conditions of k 1 = 0.8, 2.1, and 2.4. Color scheme (vertical bar): Blue, green, red, orange, purple, and brown represent regions with one, two, three, four, five, and six peaks, respectively.
values, and change them together to n-fold of the generic case, where n ∈ {0.5, 2, 8, 16}. For slower binding and unbinding dynamics (yellow line for n = 0.5, Figure 8A), the modes of the distribution of the output of protein C are even better distinguished. However, when both binding and unbinding rates are increased to n = 8 fold (green line), the probability peak at C = 9, which corresponds to basal level of C expression, merges with the probability peak at C = 0. At n = 16, the distribution of C is bimodal.
We then keep the biding affinity between gene c and protein A unchanged and alter only the binding affinity between gene c and protein B by n-fold, where n ∈ {0.5, 2, 8, 16}. When the binding affinity increases (e.g., n = 8), the probability peak at C = 9 disappears, while the probability peak at high copy number of C = 49 robustly remains, although with less magnitude (Figure 8B).
When only the biding affinity between gene c and protein A is altered while that between gene c and protein B is held constant (Figure 8C), the probability peak at the basal level of C expression (C = 9) diminishes when the binding affinity increases (e.g., n = 8). However, the probability peak at C = 49 becomes more prominent. At n = 8, the distribution of C is tri-modal. At n = 16, it becomes bimodal. This indicates that multiple phenotypes arise in FFLs when the unbinding rate is about an order of magnitude smaller than the expression rate of the protein.

Gene Duplication Can Enrich Phenotypic Diversity and Enlarge Stable Regions of Specific Multimodality of FFLs
Gene duplication provides a basic route of evolution (Lynch and Conery, 2000) and is an important driver of phenotypical diversity in organisms (Conrad and Antonarakis, 2007). Here, we study how gene duplication affects the phenotypes of FFLs.
We examine how duplication of gene c and separately duplication of gene b affect the behavior of the FFL network modules. With two copies of gene c, there can be six possible states of gene c activation. Depending on whether the promoter sites of both copies of gene c are free or occupied by either protein A or protein B, we have for both c genes to have unoccupied, protein A bound, or protein B bound promoter site. This can be denoted as a triplet (c, cA, cB), which can take any of the possible values of (2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), and (0, 1, 1). For the case when there are two copy number of gene B, there are three possible states of gene b activation, depending on whether the promoter site of both copies of gene b are free or occupied by protein A. This can be denoted as a duplicate (b, bA), which can take any of the possible values of (2, 0), (1, 1), or (0, 2).
The phase diagrams of the number of modes of stability peaks are shown in Figure 9, when there is only one copy of both gene b and gene c (first row), when there are two copies of gene c but one copy of gene b (second row), and when there are two copy number of gene b but one copy of gene c (third row). The  Figure 9 consists of 400 steady-state probability landscapes with a total 12 × 400 = 4, 800 landscapes. This broad range of parameters allow us to study all 8 different modules of FFL network and the effects of gene c and gene b duplications.
We examine the behavior of FFL in three different regimes of k 1 : (1) When k 1 ≪ 1.0 (Figure 9, first column), the bimodal regions (green) expands when there are two copies of gene c (second row), but there are no significant changes when there are two copies of gene b (third row). The overall size of multimodal regions increases in both cases. (2) When k 1 ≈ 1.0 (Figure 9, second and third columns), the duplication of gene c (second row) expands the regions with three stability peaks and reduces regions with two peaks. In contrast, the duplication of gene b (third row) has no significant effects on multimodality. (3) When k 1 = 2.4 (fourth column), duplication of gene c (second row) expands regions with two and six stability peaks. Duplication of gene b (third row) reduces the region with four peaks and expands the region with five peaks.
These results show that introducing additional copy of gene b or gene c not only can enrich different phenotypic behavior but can also increase the stability of specific phenotypic states, namely, enlarge regions of particular phenotypes by uniting previously different phenotypic regions together. Overall, gene duplication can increase phenotypic diversity, and enlarge stability regions of specific multimodal states.
Bacterial cells have fast binding and unbinding dynamics (Ali Al-Radhawi et al., 2019), and it is unlikely that the occurrence of multiple copies of the same gene in FFLs plays significant roles in stochastic multimodality. In contrast, mammalian cells have slower promoter dynamics (Forger and Peskin, 2003). Gene duplication in FFLs may provide a natural mechanism for enriched multimodality with enhanced stochastic phenotypic switching. This is reflected in reduced monomodal regions, and enlarged multimodal regions where there are 4 (orange), 5 (purple), and 6 (brown) phenotypic states of the output C (second and third row in Figure 9).
Assuming that initially both copies of the gene were functioning, but subsequently one gene copy lost its biochemical function due to mutations, we can expect two opposite types of scenarios to occur: If regulation intensities are strong (k 2 and k 3 are large), one of the phenotypic states becomes lost (e.g., green region becomes light blue, and orange region becomes red, Figure 9). If regulation intensities are weak, the duplication of gene c or gene b can lead to enlargement of the region of monomodality. It can also lead to the appearance of new regimes where there are a larger number of multimodality modes (orange, purple, and green regions in Figure 9). That is, gene duplication can create new stable states, leading to an enlarged number of high probability states. This, however, occurs only in FFL modules with strong regulations intensities. FFL modules with low regulation intensities instead lose phenotypical diversity and become more robust in monomodality with enlarged region in the parameter space.

DISCUSSION
Gene regulatory networks (GRNs) play critical roles in defining cellular phenotypes but it is challenging to characterize the behavior of GRNs. Although GRNs may consist of dozens or more of genes and proteins, their functions often can be defined by smaller sub-networks called network motifs. How small network motifs are responsible for complex properties such as the maintenance of multi-phenotypic behavior or modules is poorly understood. Current widely practiced approach is studying network motifs using deterministic models. However, this approach imposes restrictions on the types of network motifs FIGURE 9 | Phase diagram of the effects of gene duplication on multimodality of feed-forward loops (FFLs). First row: Phase diagrams of the modality of stability peaks when there are one copy of gene c and one copy of gene b. Second row: Phase diagrams when there are one copy of gene b and two copies of gene c. Third row: Phase diagram, when there are two copy of gene b and one copy of gene c. The first, second, and third columns are for k 1 = 0.025, 1.5, and 2.4, respectively. Color scheme (vertical bar): Blue, green, red, orange, purple, and brown represent regions with one, two, three, four, five, and six peaks, respectively. capable of exhibiting multimodal phenotype to mostly feedback networks.
In this study, we examined the FFL network motifs, one of the most ubiquitous three-node network motifs. Although their deterministic behavior is well-studied, with great understanding of their functions such as signal processing and adaptations gained, their stochastic behavior remains poorly characterized.
Here, we showed the direct regulation path from the input node to the output node and the indirect path through the intermediate buffer node provide the necessary architecture for distinct multiple modalities. Phase diagrams of FFL in Figure 6 show that FFLs of various types can exhibit different multimodality. At large copy numbers and large volume, our model of stochastic reaction kinetics are the same as those based on mass action kinetics (Kurtz, 1971(Kurtz, , 1972Vellela and Qian, 2007), where ordinary differential equation (ODE) models are appropriate. When ODE models are applied to enzyme-substrate reactions, they can be further approximated by Michaelis-Menten kinetics, with the additional assumption that the substrate is in instantaneous chemical equilibrium with the enzyme-substrate complex. When ODE models are applied to the reaction of one receptor and n identical simultaneously binding ligands, we arrive at the Hill equation, with the coefficient n phenomenologically characterizing cooperativity. These kinetic models based on ODE approximations, however, are not applicable to the current study, as we are examining strong stochasticity arising at low copy number of molecules, where ODE models are not valid.
FFLs play important roles in gene regulatory networks. For example, it is shown that several I1-FFL sub-networks control the process of Bacillus subtilis sporulation (Eichenberger et al., 2004;Mangan et al., 2006). In addition, C1-FFL network is found to be present in the L-arabinose (ara) utilization system of E. coli, where araBAD is the target (gene c) activated by the intermediate gene araC and the input gene CRP. Gene araC is also activated by CRP. Therefore, they form a 3-node C1 type FFL . Results in this work can help to gain understanding of the behavior of these different types of FFLs found in gene regulatory networks.
In addition, we have shown that input intensity affects the multimodal behavior of various types of FFLs. Examples shown in Figure 7 demonstrate that at high k 1 values, input intensity dramatically changes the multimodality as shown in the phase diagrams. Our results are consistent with previous findings that input intensity is an important factor in determining output intensity of FFLs Goentoro et al., 2009;Lin et al., 2018). Here, we further demonstrated that input intensity is also important in determining the modality of the steady-state behavior of FFLs.
In mammalian cells, slow dynamics of transcription factor binding to promoter is often observed (Dermitzakis and Clark, 2002;Hager et al., 2009;Lickwar et al., 2012;Tuǧrul et al., 2015;Hasegawa and Struhl, 2019). This is likely due to the complex process of chromatin regions opening up so they become accessible and the slow nature of events such as promoter, enhancer, and mediator binding. These physical processes result in highly stochastic behavior of networks. Stochastic models have demonstrated that complex multimodality phenotypes can naturally arise from stochastic fluctuations when genes have distinct expression levels, a phenomenon widely observed in mammalian cells (Cao et al., 2018). We showed that binding and unbinding dynamics are critical for multi-phenotypic behavior. For an I1-FFL with (k 1 , k 2 , k 3 ) = (3.0, 0.025, 5.1), Figure 8 highlighted that binding and unbinding rates affect multiple peaks in protein C.
Results of this study indeed showed that once stochastic fluctuations between distinct expression levels due to slow promoter dynamics are considered, FFLs can exhibit complex multimodal phenotypes. When the expression levels of the output gene (gene c) at the inhibited, basal, and activated states are well-separated, three distinct phenotypes arise. Combined with two additional possible phenotypes of different levels of gene b expression, we can have up to six modalities for FFLs. Furthermore, high intensity of input amplifies multimodality in FFLs, suggesting that the FFL architecture are favored for maintaining multiple phenotypic states. In addition, we find that regulation intensities are key determinants of specific stochastic behavior of FFLs, which could be tuned in order to obtain any desired phenotypic behavior between 1 and 6 stability modes.
Our study also revealed the roles of gene duplication. When there are two copies of gene c, while one in principle could expect 2 × 6 = 12 different phenotypes for the output protein C. This is, however, not observed, as the regulation intensities or reaction rates are not well-separated. In contrast, instead of further increase in multimodality beyond six, we observe the expansion of the area of monomodality, resulting from the connectedness of regions of expression with different rates that are merged together. Our results showed that duplication of gene b and gene c not only can enrich different phenotypic behavior but can also increase the stability of certain phenotypic states, while decreasing others (Figure 9). We showed that in general, gene duplication can enrich phenotypic diversity. The presence and functional roles of gene duplication are well-known (Hurles, 2004). For example, in human-induced pluripotent stem cells (HiPSCs), chromosome 12 duplication leads to significant enrichment of cell cycle related genes (Mayshar et al., 2010), in which FFL sub-networks play important roles. This abnormality results in increase in the tumorigenicity of HiPSCs. Our findings may also shed light on how gene duplication affects cellular adaptation to changing environment (Kondrashov, 2012): As the support regions of monomodality are enlarged, smaller fluctuations in regulation intensities will not switch cells with duplicated genes to a different phenotypic state. Thus, gene duplication may help to stabilize the behavior of the network, so cells are better adapted to a changing environment.
Analysis of stochastic behavior of FFLs reported here have implications in a variety of biological problems. For example, the stem cell regulation network consisting of pluripotency transcription factors Oct4 and Nanog maintain pluripotency against differentiation (Boyer et al., 2005;Chickarmane et al., 2006;Papatsenko et al., 2015;Lin et al., 2018). A component of this network can be abstracted as an FFL: Nanog participates as the intermediate node (gene b, which is activated by Oct4 (gene a), and both regulate the expression of genes associated with the onset of differentiation or pluripotency (gene cs). In addition, regulation networks in hematopoietic stem cells are formed by two FFL networks involving β globin, GATA-1, EKLF, and FOG-1. In each network, FOG-1 and EKLF function as the intermediate genes (gene b) and are activated by GATA-1 (gene a), while all of them activate β globin (gene c) (Swiers et al., 2006). Moreover, in other stem cell differentiation networks, there are several subnetworks that exhibit behaviors of different types of FFLs. For example, Klf4 (gene a) activates Pou5f1 (gene b) and inhibits Sox2 (gene c), while Pou5f1 activated Sox2 (Onichtchouk et al., 2010;Okawa et al., 2016), as in the C3-type FFL (Figure 1).
In summary, we have constructed and analyzed the exact highdimensional steady-state probability landscapes of FFLs under broad conditions and have constructed their phase diagrams in multimodality. These results are based on 10,812 exactly computed probability landscapes and their topological features as measured by persistent homology. With slow binding and unbinding dynamics of transcription factor binding to promoter, FFLs exhibit strong stochastic behavior that is very different from deterministic models, and can exhibit from 1 up to 6 stability peaks. In addition, input intensity play major roles in the phenotypes of FFLs: At weak input intensity, FFLs exhibit monomodality, but strong input intensity may result in up to 6 stable phenotypes. Furthermore, we found that gene duplication can enrich the diversity of FFL network phenotypes and enlarge stable regions of specific multimodalities.
Results reported here can be useful for constructing synthetic networks, and for selecting model parameters, so a particular desirable phenotypic behavior can materialize (Jones et al., 2020). Our results can also be used for analysis of behavior of FFLs in biological processes such as stem cell differentiation and for design of synthetic networks with desired phenotype behavior. We hope results reported here for different types of FFL can be tested experimentally.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
AT and JL conceived and designed the study. AT designed and carried out analysis of the ODE model, multimodality, phase diagrams, slow dynamics, and gene duplication. FM designed and carried out analysis of persistent homology, and assisted in multimodality and phase diagram computation. YC participates in design and data analysis. AT and JL wrote the manuscript with significant input from FM. All authors have read and approved the final manuscript.

FUNDING
This work was supported by NIH grant R35 GM127084.