Time Series Analysis of the Bacillus subtilis Sporulation Network Reveals Low Dimensional Chaotic Dynamics

Chaotic behavior refers to a behavior which, albeit irregular, is generated by an underlying deterministic process. Therefore, a chaotic behavior is potentially controllable. This possibility becomes practically amenable especially when chaos is shown to be low-dimensional, i.e., to be attributable to a small fraction of the total systems components. In this case, indeed, including the major drivers of chaos in a system into the modeling approach allows us to improve predictability of the systems dynamics. Here, we analyzed the numerical simulations of an accurate ordinary differential equation model of the gene network regulating sporulation initiation in Bacillus subtilis to explore whether the non-linearity underlying time series data is due to low-dimensional chaos. Low-dimensional chaos is expectedly common in systems with few degrees of freedom, but rare in systems with many degrees of freedom such as the B. subtilis sporulation network. The estimation of a number of indices, which reflect the chaotic nature of a system, indicates that the dynamics of this network is affected by deterministic chaos. The neat separation between the indices obtained from the time series simulated from the model and those obtained from time series generated by Gaussian white and colored noise confirmed that the B. subtilis sporulation network dynamics is affected by low dimensional chaos rather than by noise. Furthermore, our analysis identifies the principal driver of the networks chaotic dynamics to be sporulation initiation phosphotransferase B (Spo0B). We then analyzed the parameters and the phase space of the system to characterize the instability points of the network dynamics, and, in turn, to identify the ranges of values of Spo0B and of the other drivers of the chaotic dynamics, for which the whole system is highly sensitive to minimal perturbation. In summary, we described an unappreciated source of complexity in the B. subtilis sporulation network by gathering evidence for the chaotic behavior of the system, and by suggesting candidate molecules driving chaos in the system. The results of our chaos analysis can increase our understanding of the intricacies of the regulatory network under analysis, and suggest experimental work to refine our behavior of the mechanisms underlying B. subtilis sporulation initiation control.


ORDINARY DIFFERENTIAL EQUATIONS AND PARAMETERS OF THE NETWORK MODEL
We report here the ordinary differential equations (ODEs) and the parameters of the model of the B. subtilis sporulation initiation network published in Ihekwaba et al. (2014). All state variables are in normal text, all parameters in italic. Parameter names are formed by a sequence of tokens separated by underscore characters (" "). They begin with letter k for reaction rates, or Kk for Hill function parameters, or n for Hill function exponents, followed by a token that indicates the biological process, followed by a token that indicates the species the process is active upon, and possibly followed by a token that identifies a regulatory species (when it exists), or a partner species in the reaction (e.g., phosphotransfers). The only exception to this notation is for the mRNA and the protein degradation rates, which are denoted by degm and degp, respectively, and which do not change depending on the species. As the concentration of the IPTG and SS species does not change over time, no differential equations exist in the model for these two species. The Tables S1, S2, and S3 collect all the parameters, explain their their biological meaning and report their value. The ODEs are given in Table S4. The time is measured in seconds, and the abundance of the species in nM. The model has been simulated on a time interval from 0 to 15,000 seconds with a time resolution of 10 seconds. Numerical solutions are shown in Figure S1.
Figure S1. Numerical solutions of the ODEs system. Most molecular species show a steep monotonic behaviour, which reflects the stiffness of system's dynamics.

SUPPLEMENTARY ANALYSES
This section collects the results of (i) the sensitivity analysis performed on a smaller range of parameter variation ( Figure S2); (ii) the comparison between complexity indices estimated from the time series and for those estimated from coloured and power-spectrum noise ( Figures S4, S5, and S6), that confirm that the complex behavior is due to chaos rather than to noise, and (iii) the recurrence plots ( Figure S7). Table S5 reports the results of the the recurrence quantification analysis (RQA). RQA quantifies the number and duration of recurrences of the state space trajectories of a dynamical system . Figure S2. Heatmap summarizing the results of sensitivity analysis. The size of the interval of parameter variation is defined by q = 2 in Eq. 1. The most sensitive molecular species are aa, ga, spo0b, ab, laci d, gb, spo0a. They are sensitive to the 50% of the parameters.

Analysis of monotonicity
The (non-decreasing) monotonicity is the property for which ∂s i ∂t ≥ 0, ∀i|Θ ∈ R N P , where s i is the solution of the equation for the i-th molecular species, and Θ is the set of parameter values.
In an interval of parameter variation defined as in Eq. 13, we samples 10,000 parameter configurations, and for each solution s i we calculated the percentage of time points at which ∂s i ∂t ≥ 0. Formally, the Proportion of Positive Derivatives values (PPD) for the species i is.
where, N is the length of the time series (here it is equal for all the species), and The derivative has been calculated with the Stineman method (Johannesson and Bjornsson, 2012). The results are shown in Figure S3 A. For 20 species out of 25 the derivative is invariably positive. The negative time derivatives of the solutions are due to (i) small fluctuations/irregularities of the curves (for spo0b and spo0f ); (ii) a slow slight decreasing of the curve before reaching a plateau (for spo0a); (iii) a rapid increment/decrement on a short time interval and/or of small magnitude (for dimkinap, and spo0bp).
The plots in Figures S3 B, C, D, E, and F show that the standard deviation of the simulation curves is approximately null and that their overall trend is monotone.

Steep monotonicity and global sensitivity analysis
A steep monotonic behaviour reflects the stiffness of the system's equations, which is due to the presence of substantially different orders of magnitude of the parameters (see Table S3). The stiffness of the system warns about the accuracy of the estimates of Sobol indices (Sumner et al., 2012) that quantify the global parameter sensitivity of the model. It is known that numerical methods for integrating stiff equations on large integration domain tend to be inaccurate. Since Sobol indices computation requires to integrate the solutions of the equations in the parameters space (Sumner et al., 2012), the occurrence of stiff equations would imply a severe loss of accuracy of the Sobol indices for our systems. The integrals in the definition of Sobol indices are usually evaluated using Monte Carlo integration methods that use random or quasirandom sampling of the model parameters. To obtain reliable accurate estimates of the Sobol indices, especially for stiff systems, the sample size should be taken large (i.e. the step size of the integration should be taken extremely small). However, the complexity of the method scales as N S (N P + 2), where N S is the sample size and N P is the number of parameters. As reported in (Morio, 2011) ten thousands samples are often necessary for the estimation of one Sobol index with a relative error of 10%. In order to have smaller relative errors, we should increase N S . However, if the system is highly stiff in the parameters space, as it is in our case, N S should be extremely high, so that the use of Sobol method becomes impractical (Sumner et al., 2012). Therefore, we set out to preferentially avoid global sensitivity analysis in our study. Figure S3. Analysis of monotonicity. A. Proportion of time points at which the time derivative of the numerical solution of the dynamical of a species is positive or null. B -F Simulation curves for the species whose time derivative assumes also negative values. All the parameters have been changed over the range defined in Eq. (13), and the effect on these variables has been assessed. The values of the parameters have been sampled from a uniform distribution positively defined on this range. The model has been ran 5,000 times. An upper (max) and a lower (min) bound simulation curve, along with the mean simulation curve and its standard deviation have been calculated (with the literature algorithms presented in (Soetaert et al., 2016a(Soetaert et al., ,b, 2010). The plots show that these curves overlap, as the standard deviation is very close to zero. Therefore, these simulation prove that the model is monotonic and that it does not respond to a global sensitivity analysis approach to parameter perturbation. Finally, note that although the behavior of spo0bp shows vlear deviations from monotonicity, the magnitude of the spo0bp variation amounts to only few nMs. Indeed, spo0b rapidly increases from 0 to only 4 nM and then it decays to 1 nM from which it starts to monotonically increase, whereas the range on y-axis covered by the other species in the same simulation time is tens/thousand times larger.

Recurrence plots (RPs)
A d-dimensional phase space trajectory (d > 2) can be visualized through a two-dimensional representation of its recurrences.
The recurrence plot (RP), proposed by Eckerman et al. (Eckmann et al., 1987), employs a two-dimensional squared matrix of zero values, where both axes are time axes. The recurrence of a state, observed at time i, at a time j different from i is marked with one (black dot in the plot) in the matrix. In formulas, an RP can be expressed as follows: where N is the length of the time series x, ν i is a threshold distance, || · || is a norm, and Θ(·) is the Heaviside step function The RPs structures are indicative of the time evolution of phase space trajectories. A comprehensive introduction to RPs and the interpretation of their structure is given in Abraham et al., 1989). Here, we summarize the main definitions and concepts as reported in (Marwan et al., , 2002. The RPs structure is characterized by large scale (typologies) and small scale patterns (textures).
The typologies in a RP can be (i) a homogeneous, (ii) a periodic, and (iii) a disrupted distribution of recurrent points. (Eckmann et al., 1987). Homogeneous RPs are common to chaotic and to stochasitc dynamics. Diagonally oriented, periodic recurrent structures (diagonal lines, checkerboard structures), represent oscillating systems. Fading to the upper left and lower right corners indicate non-stationarity (i.e. the presence of a drift or trend). Finally, white areas or bands in RP are typical in presence of abrupt changes in the dynamics as well as of rare events.
The textures can be single dots, diagonal lines as well as vertical and horizontal lines.
• Single, isolated recurrent points can occur if states are rare, if they do not persist for any time or if they fluctuate heavily.
• A diagonal line R i+k,j+k = 1 (for k = 1, . . . , l, where l is the length of the diagonal line) occurs when a segment of the trajectory runs parallel to another segment, i.e. the trajectory visits the same region of the phase space at different times. The length of this diagonal line is determined by the duration of such similar local evolution of the trajectory segments. In presence of diagonal lines, the process coudl be deterministic with no chaos; if these diagonal lines occur beside single isolated points, the process could be affected by deterministic chaos (if these diagonal lines are periodic, unstable periodic orbits can be retrieved).
where v is the length of the vertical line) marks a time length in which a state does not change or changes very slowly. It seems, that the state is trapped for some time. This is a typical behaviour of laminar states (intermittency).
The textures of a RP are the base of the definition of measure for a quantitative analysis of the RPs, called Recurrence Quantification Analysis (RQA) (Webber et al., 2016;Marwan et al., 2002;Trulla et al., 1996).
The list of the RQA measures, a short explanation of their meaning, and their values we found for the variables of the B. subtilis sporulation initiation network, is given in Table S5. DET measures the proportion of recurrent points forming diagonal line structures parallel to the main diagonal. Lmax, i.e. the length of the longest diagonal line, inversely scales with the maximal Lyapunov exponent (Eckmann et al., 1987;Trulla et al., 1996). Positive Lyapunov exponents gauge the rate at which trajectories diverge, and are the hallmark for dynamic chaos. Thus, the shorter the Lmax, the more chaotic (less stable) the signal. ENTR is the Shannon entropy of the distribution of the length of line segments parallel to the main diagonal. ENTR is a measure of signal complexity and is calibrated in units of bits/bin to quantify how much information one needs in order to recover the system. The entropy is small when the length of the longest segment parallel to the diagonal is short and does not vary much. This has to be associated with information on determinism. A high entropy is typical of periodic behavior while low entropy indicates chaotic behavior (Fabretti and Ausloos, 2004;Blackledge et al., 2002). Table S5 shows that DET is equal to 100% for all the molecular species, that means that for all the molecular species all the recurrent points lie on diagonal segments parallel to the main diagonal. This is a typical characteristic of deterministic systems (with and without chaos). However, we also found a significant variability in the mean length (Lmean) of these segments, and more than 50% of the molecular species with a Lmean smaller than the means value. These molecular species are marked in bold in Table  S6. The same set of species exhibits values of Vmean and ENTR below the mean. Of the species indicated in bold in the table, spo0b, spo0bp, spo0a, spo0ap, aa, and ab, also report a value of Lmax below that the mean. There results confirm again spo0b as a species affected with chaotic dynamics. Table S5. Summary of the recurrence quantification analysis (RQA). RQA quantifies the number and duration of recurrences of a dynamical system presented by its state space trajectory (Zbilut and Webber, 2006 Figure S7. Global aspect of the RPs for the variables of the B. subtilis sporulation initiation network.

Kaplan-Yorke ratio and Kolmogorov-Sinai entropy
By virtue of their definition including the Lyapunov exponents, Kaplan-Yorke ratio (i.e. the second term of the Kaplan-Yorke dimension (Nichols et al., 2004;Hilborn, 2000)) and Kolmogorov-Sinai entropy (Hilborn, 2000), also known as metric entropy, are the indices expressing topological complexity as function of sensitivity of the systems to the initial conditions. Kaplan-Yorke ratio is defined as: where λ i is the i-th Lyapunov exponent, and, ordering in decreasing order the Lyapunov exponents λ 1 > λ 2 > · · · > λ i > · · · > λ d , j is the largest integer such that j i=1 λ i > 0. Kolmogorov-Sinai entropy is defined as We calculated the values of KY R and KS E corresponding to the values of perturbations ∆ = {10 −5 , 10 −4 , 10 −3 , 10 −2 , 10 −1 , 1, 10} applied to one variable at a time. Figure S8 shows the boxplots of the distributions of the values KY R and KS E . The name of perturbed variable is indicated on the horizontal axis. The heatmaps in Figures S9 (S10) report respectively, the value of the difference between the mean values of KY R (KS E ) distributions and the p-values of the multiple t-test performed to asses the statistical significance of this difference.