Bayesian inference for stochastic individual-based models of ecological systems: a pest control simulation study

Parise, Francesca; Lygeros, John; Ruess, Jakob

doi:10.3389/fenvs.2015.00042

METHODS article

Front. Environ. Sci., 10 June 2015

Sec. Environmental Informatics and Remote Sensing

Volume 3 - 2015 | https://doi.org/10.3389/fenvs.2015.00042

This article is part of the Research Topic: Hybrid Solutions for the Modelling of Complex Environmental Systems

Francesca Parise¹†

John Lygeros¹

Jakob Ruess²*†

¹Automatic Control Laboratory, Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
²Institute of Science and Technology Austria, Klosterneuburg, Austria

Mathematical models are of fundamental importance in the understanding of complex population dynamics. For instance, they can be used to predict the population evolution starting from different initial conditions or to test how a system responds to external perturbations. For this analysis to be meaningful in real applications, however, it is of paramount importance to choose an appropriate model structure and to infer the model parameters from measured data. While many parameter inference methods are available for models based on deterministic ordinary differential equations, the same does not hold for more detailed individual-based models. Here we consider, in particular, stochastic models in which the time evolution of the species abundances is described by a continuous-time Markov chain. These models are governed by a master equation that is typically difficult to solve. Consequently, traditional inference methods that rely on iterative evaluation of parameter likelihoods are computationally intractable. The aim of this paper is to present recent advances in parameter inference for continuous-time Markov chain models, based on a moment closure approximation of the parameter likelihood, and to investigate how these results can help in understanding, and ultimately controlling, complex systems in ecology. Specifically, we illustrate through an agricultural pest case study how parameters of a stochastic individual-based model can be identified from measured data and how the resulting model can be used to solve an optimal control problem in a stochastic setting. In particular, we show how the matter of determining the optimal combination of two different pest control methods can be formulated as a chance constrained optimization problem where the control action is modeled as a state reset, leading to a hybrid system formulation.

1. Introduction

The use of mathematical models in population ecology and epidemiology has a long history (Murray, 2002). Among the wide range of available models, two major categories can be distinguished: population-level and individual-based models (Black and McKane, 2012). Population-level models (PLMs) implicitly assume an infinite population size and provide a phenomenological description of the overall population behavior. The main advantage of PLMs is that they involve ordinary differential equations that can be analyzed using dynamical systems theory; their main drawback is that they neglect any effect that may be caused by finite population sizes or by the inherently random nature of interactions between individuals. Individual-based models (IBMs), on the other hand, are discrete models that represent an ecological system as a collection of a finite number of individuals that are modeled explicitly (DeAngelis and Mooij, 2005; Railsback and Grimm, 2012). Depending on how many details are included for each individual, IBMs can be further distinguished into agent-based models, which provide great detail but are limited to algorithmic and numerical analysis, and stochastic process models that typically distinguish a limited number of different types of individuals but are more amenable to analytical investigation (Black and McKane, 2012). Among IBMs we restrict our attention to stochastic process models where the possible interactions of species occur with a probability that is proportional to the number of individuals present in the system. Mathematically, these models can be represented by continuous-time discrete-state Markov chains (CTMCs).

Discrete stochastic models naturally arise in many applications both in ecology (Marion et al., 1998; Ovaskainen and Cornell, 2006; Ovaskainen and Meerson, 2010) and in epidemiology (Isham, 1991; Nåsell, 2002). While stochastic IBMs, and CTMCs in particular, provide an intuitive description of these systems, their analysis tends to be difficult. Consequently, the use of these models has for a long time remained limited to systems containing at most a handful of different interacting species. Recent years have seen, on the one hand, an immense increase in computational resources, and on the other hand, a surge of studies in which IBMs have been used to study biological systems at the cellular level. Specifically, in these applications interacting molecules play the role of interacting individuals (Balazsi et al., 2011; Goutsias and Jenkinson, 2013; Neuert et al., 2013). These developments have stimulated research on new methods for analyzing IBMs (see, e.g., Munsky and Khammash, 2006; Wolf et al., 2010; Ruess et al., 2011, 2013), which opened the path for using larger and more complicated models that are more suitable to represent the complex systems encountered in applications. As a consequence of these new modeling capabilities, problems that were for a long time solvable only using PLMs can now be analyzed (and hence receive renewed interest) in the context of IBMs. In the case study of this paper, for example, we consider the use of IBMs to address optimal control problems for pest management. Many studies on optimal control problems in ecology were performed using PLMs during the 1970s (see Wickwire, 1977 and references therein). In these pioneering works, the control actions were usually included by changing the model parameters (e.g., birth or death rates) according to the control action or by developing a new model for the controlled system, in which the control is included as a continuous input.
In the case study presented here, we adopt a different modeling approach, similar to the idea presented in Jaquette (1970) for a simple birth-death process. Specifically, we assume that the control operator can reset the state of the system (for example reduce the number of pest individuals by applying pesticide) at certain given intervention times, while the dynamics of the system between the reset times follow the uncontrolled IBM. The resulting model is thus a stochastic hybrid system with controlled state reset (Branicky et al., 1998; Bensoussan and Menaldi, 2000).

In the 1970s, the main difficulty in translating the control theoretical studies into real world applications was the lack of efficient and accurate procedures to estimate the required model parameters (Wickwire, 1977). While for PLMs these difficulties have been to a large extent overcome, for the more detailed IBMs the problem of reverse engineering parameter values from measured data is still a major challenge (Poovathingal and Gunawan, 2010; Stumpf, 2014). Most of the available approaches, in fact, require iterative evaluation of parameter likelihoods, which are usually not available analytically and are computationally very expensive to evaluate numerically. To circumvent this problem, likelihood-free approaches for parameter inference, such as approximate Bayesian computation, have found applications in ecology (McKinley et al., 2009; Lagarrigues et al., 2014) and other fields (Toni et al., 2009).

In this paper, we suggest a different approach for parameter inference based on an expression that allows for fast (but approximate) evaluation of the likelihood. To obtain this expression, we move from a full description of the stochastic model to ordinary differential equations that describe the time evolution of only means and (co)variances (and possibly higher order moments) of the interacting species. We then use moment closure techniques to approximate the solution of these equations and show that the results can be used to approximate the likelihood. Moment closure methods have a long history in population biology (Whittle, 1957; Nåsell, 2003; Krishnarajah et al., 2005; Singh and Hespanha, 2006; Hespanha, 2008), but only recently have first attempts been made to use these methods for parameter inference (Kügler, 2012; Zechner et al., 2012; Milner et al., 2013). The approach we take here is motivated by recent developments in the modeling of biochemical reaction networks (Zechner et al., 2012) and differs from typical methods in ecology (Ross et al., 2009; Gillespie and Golightly, 2010) in that it is intended to be used with data that are obtained from many independent observations of a system, instead of being tailored to a single observation of an ecological system over time. Accordingly, it is most suitable for applications in which many replicates of the same experiment can be performed, as typically done in microcosm and mesocosm experiments (Srivastava et al., 2004; Hekstra and Leibler, 2012; Altermatt et al., 2014), but only one measurement per replicate is feasible. This might be the case when measurements are costly or time-consuming, resulting in a trade-off between the number of measurements and the number of replicates (e.g., Carrara et al., 2014), or if they perturb the system so that further measurements of the same replicate are not possible or meaningful.
This is the typical scenario for destructive measurements as, for instance, in the study of bacterial colonization dynamics where measurements may require sequencing the bacterial genomes of the whole community (Cordero et al., 2012). At the subcellular scale, this type of data is naturally obtained when the experimental replicates are individual cells and the intracellular dynamics of chemical species are measured (e.g., Zechner et al., 2012).

2. Materials and Methods

2.1. Individual-Based Modeling

Consider an ecological system comprising m species, X₁, …, X_m that can interact, in a given habitat, according to K different types of interactions

\begin{matrix} ν'_{1k} X_{1} + \dots + ν'_{mk} X_{m} \overset{θ_{k}}{\to} ν''_{1k} X_{1} + \dots + ν''_{mk} X_{m}, \quad k = 1, \dots, K . & (1) \end{matrix}

The expression on the left hand side of the arrow denotes the amount ν′_ik of individuals for each species X_i needed for interaction k to happen. The expression on the right hand side describes the result of the interaction. In other words, the effect of interaction k is to update the number of individuals of each species X_i by the net amount ν_ik: = ν″_ik − ν′_ik. Let $X (t) = {[X_{1} (t) \dots X_{m} (t)]}^{⊤}$ denote the amount of individuals of each species present in the system at time t. In the following, we assume that each interaction is a stochastic event whose probability to occur depends on the probability that the required amounts of individuals meet in some location of the habitat and on a parameter θ_k that determines the probability that the individuals successfully interact when they meet. Since the interactions are stochastic events, X(t) is a stochastic process that takes values x = [x₁ … x_m]^⊤ ∈ ℕ^m₀. Under the assumption of random movement of the individuals, the probability that a given interaction takes place in the infinitesimal time interval [t, t + dt], given the current population state X(t) = x, can be determined by the law of mass action as

\begin{array}{l} a_{k} (x, θ) \, dt := θ_{k} h_{k} (x) \, dt, \quad \text{where} \quad h_{k} (x) = \prod_{i = 1}^{m} \binom{x_{i}}{ν'_{ik}}, \\ k = 1, \dots, K \quad \text{and} \quad θ = \{θ_{1}, \dots, θ_{K}\} . \end{array}

Note that we assumed that the habitat is homogeneous, that is, the probabilities a_k(x, θ)dt do not depend on the spatial location, but only on the total amount x of individuals present in the system.
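The mass-action rule above can be made concrete with a short sketch. The function name `mass_action_propensities` and the three-interaction example network are hypothetical and serve only to illustrate how h_k(x) is assembled from binomial coefficients; they are not part of the model studied in this paper.

```python
from math import comb, prod

def mass_action_propensities(x, nu_in, theta):
    """Return a_k(x, theta) = theta_k * prod_i binom(x_i, nu'_ik) for every k.

    x      : current abundances (x_1, ..., x_m), non-negative integers
    nu_in  : nu_in[k][i] is nu'_ik, the number of individuals of species i
             required for interaction k
    theta  : interaction success parameters (theta_1, ..., theta_K)
    """
    return [
        theta_k * prod(comb(x_i, v) for x_i, v in zip(x, nu_k))
        for theta_k, nu_k in zip(theta, nu_in)
    ]

# Hypothetical two-species example: X1 -> 2 X1, X1 + X2 -> 2 X2, X2 -> 0.
example_rates = mass_action_propensities(
    x=(10, 4),
    nu_in=[(1, 0), (1, 1), (0, 1)],
    theta=(0.5, 0.01, 0.2),
)
```

Because binom(x_i, ν′_ik) is zero whenever x_i < ν′_ik, an interaction that lacks the required individuals automatically has zero probability of occurring.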

2.2. Data Description

Since the model introduced in the previous section is stochastic, for a given initial population X(0) = x₀, many different evolutions of the system, corresponding to different realizations of the stochastic process X(t), are possible. In the following, we assume that we can monitor many of these different replicates of the system, but only one measurement per replicate can be taken (possibly at different times). Therefore, the collected data consist of several measurements of the number of individuals of one species¹ X_j at different time points, each coming from a different replicate. This means that we assume that the collected data contain information about the dynamics but not about the correlation of the species abundance between different time points. Let t₁, …, t_S denote the measurement times and suppose that for each measurement time we have measured n different replicates². The data set is then of the form $D = \{X_{j}^{1} (t_{s}), \dots, X_{j}^{n} (t_{s})\}_{s = 1}^{S}$ , where all the measurements Xⁱ_j(t_s), i = 1, …, n, s = 1, …, S are statistically independent.

A feature of such data, which is at the same time a strength and a serious complication, is that the measurements described above correspond to observations from n · S (supposedly) identical replicates of the system. In a realistic situation, these replicates might be performed on different days or under slightly different ambient conditions. Not taking into account this source of variability can have deleterious effects on model predictions and accordingly also on any strategy for optimal interventions and population control. Consequently, the variability observed in different replicates is an asset that should be used in order to identify a model that is not tailored to one particular experimental condition but can describe all of them. To allow this type of flexibility, we assume that in different repetitions of an experiment some of the parameters θ_k, k = 1, …, K may be slightly different. Without loss of generality, let us denote by k = 1, …, r the interactions for which the rate θ_k varies between different repetitions of the experiment, and by k = r + 1, …, K the remaining ones that are the same for all repetitions. We describe the experimental variability by assuming that the success of the first r interactions is given by θ_k · Z_k, k = 1, …, r, where Z = [Z₁ … Z_r]^⊤ is a random vector with unknown distribution P_Z. Moreover, we assume that the marginal means of P_Z are all equal to one, so that θ_k, k = 1, …, r are the average interaction success rates over different replicates of the experiment. This gives rise to a model akin to mixed-effects models in the statistics literature (Lavielle, 2014).

2.3. The Conditional Master Equation and Moment Dynamics

Under the assumption of a homogeneous environment, for fixed success parameters θ_k, the time evolution of the number of individuals X(t), as described in Section 2.1, follows a continuous-time Markov chain. Consequently, the time evolution of its probability distribution p(x, t) := P[X(t) = x] can be described by a master equation (Black and McKane, 2012). The same theory holds in the case of random success rates θ_k · Z_k, k = 1, …, r, if we fix a specific realization z of the random vector Z. Mathematically, the conditional process X(t) | Z = z is Markovian and can be described by the master equation

\begin{matrix} \begin{array}{l} \dot{p} (x, t | z) = - p (x, t | z) \sum_{k = 1}^{K} a_{k} (x, θ, z) \\ + \sum_{k = 1}^{K} p (x - ν_{k}, t | z) a_{k} (x - ν_{k}, θ, z), \end{array} & (2) \end{matrix}

where³ p(x, t|z) := P[X(t) = x|Z = z]. Typically, Equation (2) cannot be solved explicitly, and for complex systems numerical approaches may fail as well. However, if one is not interested in the whole probability distribution, Equation (2) can be used to derive evolution equations for some of the moments of the distribution, such as the mean and variance. Specifically, denoting by $\tilde{ψ}$ the l-dimensional vector of the (uncentered) moments up to some order L of the joint process $\tilde{X}$ (t) = [X(t)^⊤ Z^⊤]^⊤ and by $\bar{\tilde{ψ}}$ the vector containing the higher order moments of $\tilde{X}$ (t), one obtains

\begin{matrix} \frac{d}{d t} \tilde{ψ} (t) = \tilde{A} (θ) \tilde{ψ} (t) + \tilde{B} (θ) \bar{\tilde{ψ}} (t), & (3) \end{matrix}

where $\tilde{A}$ (θ) ∈ ℝ^{l × l} and $\tilde{B}$ (θ) ∈ ℝ^{l × ∞} are matrices defined by the reaction network and by the parameters θ [a detailed derivation is given in Zechner et al. (2012)]. For mass action kinetics with at most pairwise interactions of the species, $\bar{\tilde{ψ}}$ contains moments of order at most L + 2, so that $\tilde{B}$ (θ) is a finite dimensional matrix (Ruess and Lygeros, 2015). Nonetheless, the system in Equation (3) is not solvable because it depends on the unknown quantities $\bar{\tilde{ψ}}$ (t) (which act as an external input, that is, they are not part of the state vector). To overcome this issue, one can use moment closure techniques to approximate the unknown higher order moments $\bar{\tilde{ψ}}$ (t) by non-linear functions of the lower order moments, that is, $\bar{\tilde{ψ}} (t) ≅ \tilde{f} (\tilde{ψ} (t))$ . As a consequence, the right hand side of Equation (3) can be approximated with an expression that depends only on the state variables leading to the solvable closed system

\begin{matrix} \frac{d}{d t} \tilde{ν} (t) = \tilde{A} (θ) \tilde{ν} (t) + \tilde{B} (θ) \tilde{f} (\tilde{ν} (t)) . & (4) \end{matrix}

Note that in Equation (4) we used a different symbol for the state vector to stress the fact that $\tilde{ν}$ (t) are approximations of the true moments $\tilde{ψ}$ (t), since they are obtained as solution of the approximated dynamics.

Given that the marginal moments corresponding to Z of the joint process $\tilde{X}$ (t) are constant, we can write Equation (4) as

\begin{matrix} \frac{d}{d t} ν (t) = A (θ) ν (t) + C (θ) μ^{Z} + B (θ) f (ν (t), μ^{Z}), & (5) \end{matrix}

where ν(t) are the moments of $\tilde{X}$ (t) excluding the marginal moments of Z, μ^Z are the moments of Z up to order L and the matrices A(θ), B(θ), C(θ) are sub-matrices of $\tilde{A}$ (θ), $\tilde{B}$ (θ). This form of the equations is convenient because we can now regard the moments μ^Z as additional parameters and the system of moment equations for ν(t) as being parameterized by $γ = \{θ, μ^{Z}\}$ . Consequently, for a specified parameter vector γ, the system of Equation (5) can be solved numerically, allowing one to compute any desired moment of X(t) up to order L.

2.4. Bayesian Inference with Population Data

In real applications, usually neither the rates θ nor the variability μ^Z between repetitions of the experiments are known. Hence, to obtain a model that is useful in practice, we need to estimate the parameters γ from measured data. This task can be posed as a Bayesian parameter inference problem, where any available knowledge about γ can be specified as an a priori parameter distribution p(γ). The result of the Bayesian inference procedure is a parameter posterior distribution p(γ | d) that reflects the updated belief about γ, given the observed realization d of the data D. According to Bayes' rule, this posterior distribution can be obtained as

p (γ | d) = \frac{p (d | γ) \cdot p (γ)}{p (d)},

where p(d | γ) is the likelihood of γ for the observed realization d, p(γ) is the prior distribution, and p(d) = ∫ p(d|γ) p(γ) dγ is the marginal likelihood of the data. Since computing the posterior distribution analytically is usually impossible, Monte Carlo schemes are typically used to draw samples γ₁, …, γ_M, from p(γ | d), allowing one to construct an empirical estimate of the posterior distribution. The iterative evaluation of the likelihood p(d | γ), needed in these schemes, can however be computationally very expensive or even impossible for complex high dimensional systems. For the data considered in this paper, for example, evaluating the parameter likelihood requires computing the distribution of the measured species at all the measurement time points. Specifically, the likelihood is given by

p (d | γ) = \prod_{s = 1}^{S} \prod_{i = 1}^{n} P [X_{j}^{i} (t_{s}) = x_{j}^{i} (t_{s}) | γ] = \prod_{s = 1}^{S} \prod_{i = 1}^{n} p_{j} (x_{j}^{i} (t_{s}), t_{s} | γ),

where p_j(·, t|γ) is the distribution of species X_j at time t given that γ are the model parameters, and xⁱ_j(t_s), i = 1, …, n, s = 1, …, S are the measured abundances of species X_j. The factorization of the joint distribution over time points and samples stems from the assumption that all the measurements are statistically independent. It is evident that evaluating this likelihood requires computing p_j(·, t|γ). This cannot be done (except in some special cases) without first computing the entire joint distribution of $\tilde{X}$ (t) at all the measurement time points, hence solving Equation (2). For these reasons, exact Bayesian inference is very difficult, if not impossible.

A naive idea to overcome these issues would be to approximate p_j(·, t|γ) by using the moments computed according to Equation (5) together with the assumption that p_j(·, t|γ) belongs to a certain family (e.g., that it is a Gaussian distribution). Such assumptions are, however, in general not satisfied and, as detailed in the following, they are not really necessary. By using as data the first L moments of the measured samples only, it is in fact possible to derive a different likelihood function that is correct for any distribution p_j(·, t|γ), in the limit of n → ∞. Specifically, set L = 2 and let $\hat{μ}$ ₁(t_s) and $\hat{μ}$ ₂(t_s) be the sample mean and variance of the random samples $D (t_{s}) = \{X_{j}^{1} (t_{s}), \dots, X_{j}^{n} (t_{s})\}$ , which represent the measured species abundances at time t_s, s = 1, …, S. Furthermore, denote by μ₁(t) the mean and by μ_i(t), i = 2, …, 4, the centered moments up to order four of p_j(·, t|γ). By the central limit theorem, the probability density function $p_{\hat{μ}} (\cdot | γ)$ of $\hat{μ} : = {[\hat{μ} {(t_{1})}^{⊤} \dots \hat{μ} {(t_{S})}^{⊤}]}^{⊤}$ , where $\hat{μ} (t_{s}) : = {[{\hat{μ}}_{1} (t_{s}) {\hat{μ}}_{2} (t_{s})]}^{⊤}$ is the vector of sample moments up to order 2 at time t_s, is for n large enough given by
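A form of this density that is consistent with the definitions above is sketched here. It is reconstructed from the standard large-sample covariance of the sample mean and the unbiased sample variance; the symbol M(t_s) and the exact entries of Σ(t_s) are assumptions made for this sketch rather than text taken from the source.

\begin{matrix} p_{\hat{μ}} (\hat{μ} | γ) = \prod_{s = 1}^{S} \mathcal{N} (\hat{μ} (t_{s}) | M (t_{s}), Σ (t_{s})), & (6) \end{matrix}

\begin{array}{l} M (t_{s}) = \begin{bmatrix} μ_{1} (t_{s}) \\ μ_{2} (t_{s}) \end{bmatrix}, \quad Σ (t_{s}) = \frac{1}{n} \begin{bmatrix} μ_{2} (t_{s}) & μ_{3} (t_{s}) \\ μ_{3} (t_{s}) & μ_{4} (t_{s}) - \frac{n - 3}{n - 1} μ_{2} (t_{s})^{2} \end{bmatrix}, \end{array}

where $\mathcal{N} (\cdot | M, Σ)$ denotes the bivariate Gaussian density with mean M and covariance Σ. The product over s reflects the independence of the measurements at different time points.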

Equation (6) allows one to evaluate the likelihood for the collection of sample moments $\hat{μ}$ up to order two, given the first four moments of p_j(·, t|γ). These moments can be obtained efficiently by numerically solving Equation (5). Consequently, using this approach it becomes feasible to draw samples from p(γ | $\hat{μ}$ ) using a Markov chain Monte Carlo scheme. The downside is that, by transitioning from the full data to the sample moments up to order L = 2, all the information about the parameters γ that might have been provided by higher order statistics of the data is discarded (Ruess and Lygeros, 2013). However, formulas for the likelihood of sample moments up to any desired order L can be obtained in exactly the same way, and thus, this approach is not limited to sample means and variances only. Evaluating the likelihood for sample moments up to order L requires computing moments of p_j(·, t|γ) up to order 2L, which becomes computationally expensive for large L. The choice of how many sample moments to include in the parameter inference is therefore a trade-off between computational cost and neglected information.
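A minimal sketch of how such a sample-moment likelihood could be evaluated is given below. It assumes that the likelihood is a product of bivariate Gaussians whose mean is [μ₁, μ₂] and whose covariance follows the standard formulas for the sample mean and the unbiased sample variance; the function names are hypothetical, and in practice the model moments would come from solving Equation (5).

```python
import numpy as np

def sample_moment_cov(mu2, mu3, mu4, n):
    """Covariance of (sample mean, unbiased sample variance) for n i.i.d. samples.

    mu2, mu3, mu4 are the centered moments of order two to four.
    """
    var_of_sample_var = mu4 - (n - 3) / (n - 1) * mu2**2
    return np.array([[mu2, mu3], [mu3, var_of_sample_var]]) / n

def log_likelihood(mu_hat, moments, n):
    """Gaussian log-likelihood of the sample moments, summed over time points.

    mu_hat  : array of shape (S, 2), rows [sample mean, sample variance]
    moments : array of shape (S, 4), rows [mu1, mu2, mu3, mu4] from the model
    n       : number of replicates per time point
    """
    total = 0.0
    for m_hat, (mu1, mu2, mu3, mu4) in zip(mu_hat, moments):
        cov = sample_moment_cov(mu2, mu3, mu4, n)
        resid = np.asarray(m_hat, dtype=float) - np.array([mu1, mu2])
        _, logdet = np.linalg.slogdet(cov)
        quad = resid @ np.linalg.solve(cov, resid)
        total += -0.5 * (quad + logdet + 2.0 * np.log(2.0 * np.pi))
    return total
```

Only moments up to order four enter the computation, which is why the moment equations must be solved up to order 2L = 4 when sample means and variances are used as data.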

3. Case Study: Optimal Pest Control

3.1. The Model

As a case study we consider the problem of modeling, and eventually controlling, the evolution of an agricultural pest. To this end, we consider an extension of the model of cotton aphids proposed in Matis et al. (2007) and Gillespie and Golightly (2010). Specifically, we introduce an additional immigration term to the original model and we include a recovery process of the habitat. In more detail, our model consists of a discrete state stochastic process N(t) that describes the size of the current pest population and a discrete state stochastic process C(t) that is used as an indicator of how much the environment has been deteriorated, up to time t, by the infestation. In the following, we assume that these two processes are updated according to the occurrence of the following stochastic events:

\begin{matrix} \begin{matrix} \emptyset & \overset{α}{\to} & N + C \\ N & \overset{λ Z}{\to} & 2 N + C \\ N + C & \overset{η}{\to} & C \\ C & \overset{r}{\to} & \emptyset . \end{matrix} & (7) \end{matrix}

Specifically, we suppose that new pest individuals arise in the system due to immigration, with rate α, or birth events, with a rate that is proportional to the current population size. To capture variability between different replicates, stemming for instance from different ambient conditions, we assume that the birth rate is given by λN(t)Z, where Z is a one-dimensional random variable distributed according to a log-normal distribution P_Z with mean one and unknown variance. The death rate of the pest is given by ηN(t)C(t), i.e., it depends on the current population size, but also on the damage to the environment.

Furthermore, we assume that the process describing the state of the environment, C(t), is increased by one unit whenever a new pest individual is added in the system (either via immigration or due to a birth event). Since pest individuals deteriorate the environment for a time period that may exceed their own life span, we assume that the death of pest individuals leaves C(t) unchanged. However, we model the fact that the environment may eventually recover by assuming that C(t) decreases with rate rC(t).

This model induces a conditional master equation, see Equation (2), with state x = [n c]^⊤ and parameters θ = {α, λ, η, r}, in which a_k(x, θ, z) and ν_k, k = 1, …, 4 are given by

\begin{array}{l} a_{1} (x, θ, z) = α, ν_{1} = [\begin{matrix} 1 \\ 1 \end{matrix}], \\ a_{2} (x, θ, z) = λ \cdot n \cdot z, ν_{2} = [\begin{matrix} 1 \\ 1 \end{matrix}], \\ a_{3} (x, θ, z) = η \cdot n \cdot c, ν_{3} = [\begin{matrix} - 1 \\ 0 \end{matrix}], \\ a_{4} (x, θ, z) = r \cdot c, ν_{4} = [\begin{matrix} 0 \\ - 1 \end{matrix}] . \end{array}
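The propensities and net changes above are all that is needed to simulate one replicate of the system with a Gillespie-type stochastic simulation algorithm. The following is a minimal sketch under stated assumptions: the function names are hypothetical, Z is drawn once per replicate from a log-normal distribution with mean one and a user-supplied variance, and the initial state is N(0) = C(0) = 0.

```python
import numpy as np

# Net changes [dN, dC] of the four events: immigration, birth, death, recovery.
NU = np.array([[1, 1], [1, 1], [-1, 0], [0, -1]])

def propensities(n, c, theta, z):
    """Propensities a_1, ..., a_4 for state (n, c), parameters theta and variability z."""
    alpha, lam, eta, r = theta
    return np.array([alpha, lam * n * z, eta * n * c, r * c])

def draw_z(var_z, rng):
    """Draw Z from a log-normal distribution with mean one and variance var_z."""
    sigma2 = np.log(1.0 + var_z)
    return rng.lognormal(mean=-0.5 * sigma2, sigma=np.sqrt(sigma2))

def simulate_replicate(theta, var_z, t_end, rng):
    """Simulate one replicate from N(0) = C(0) = 0 and return [N(t_end), C(t_end)]."""
    z = draw_z(var_z, rng)
    t, state = 0.0, np.array([0, 0])
    while True:
        a = propensities(state[0], state[1], theta, z)
        a0 = a.sum()  # strictly positive as long as alpha > 0
        t += rng.exponential(1.0 / a0)  # waiting time to the next event
        if t > t_end:
            return state
        k = rng.choice(4, p=a / a0)  # index of the event that fires
        state = state + NU[k]
```

Calling `simulate_replicate` repeatedly with independent draws of Z, once per replicate and measurement time, mimics the one-measurement-per-replicate data described in Section 2.2.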

From this master equation we can derive moment equations and use moment closure to obtain a closed system in the form of Equation (5). In the following, we use equations for the moments up to order four and a fifth-order derivative matching closure, as described in Singh and Hespanha (2006). Solving the resulting approximate systems, for given parameter values θ and given first two moments of Z, enables us to approximately compute the moments of X(t) = [N(t)C(t)]^⊤ up to order four.
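To see why a closure is needed, it helps to write out the lowest-order equations. Using the generic identity $\frac{d}{d t} E [X] = \sum_{k} ν_{k} E [a_{k} (X, θ, Z)]$ with the propensities and net changes listed above, the means satisfy the following equations. This is a worked sketch of the first-order equations only, not the full system used in the case study.

\begin{array}{l} \frac{d}{d t} E [N] = α + λ E [N Z] - η E [N C], \\ \frac{d}{d t} E [C] = α + λ E [N Z] - r E [C] . \end{array}

The right hand sides contain the second-order moments E[NZ] and E[NC], which are not determined by the means alone. Their own equations involve third-order moments, and so on, so that the equations for the moments up to order four depend on fifth-order moments that the closure has to approximate.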

3.2. Inference Results

Since we assumed that Z has a log-normal distribution with mean one, it is sufficient to include only the variance of Z as an unknown parameter, so that γ = {θ, Var[Z]}. For the in silico case study, we assume that N(0) = C(0) = 0, that is, initially no pests are present in the system, and that the true values of the parameters are given by

\begin{matrix} \begin{array}{l} α = 0.03, \quad λ = 0.012, \quad η = 0.25 \cdot 10^{- 4}, \\ r = 0.003 \quad \text{and} \quad \mathrm{Var} [Z] = 0.05, \end{array} & (8) \end{matrix}

where we used hours as time units. These parameters produce pest outbreaks that are on the timescale of realistic profiles of aphid pest infestations. As hypothetical data set $D = \{N^{1} (t_{s}), \dots, N^{n} (t_{s})\}_{s = 1}^{S}$ , we consider a case study in which n = 100 different replicates of the system are measured once a week (t_s = 7 · 24 · s hours) for a total of S = 5 weeks. We note that the chosen measurement times are consistent with the dynamics generated by the parameters given in Equation (8). If, as in real scenarios, the parameters and hence the timescale of the system's dynamics are not known, one may use a sequential experiment design approach to produce informative datasets. To generate the dataset considered in this in silico case study we used the stochastic simulation algorithm (SSA), described in Gillespie (1976), with randomly drawn values z from P_Z. The sample means and variances of the data are denoted by $\hat{μ} = {[\hat{μ} {(t_{1})}^{⊤} \dots \hat{μ} {(t_{5})}^{⊤}]}^{⊤}$ , where $\hat{μ} (t_{s}) : = {[{\hat{μ}}_{1} (t_{s}) {\hat{μ}}_{2} (t_{s})]}^{⊤}$ . In principle, this gives us all the ingredients to evaluate the likelihood using Equation (6). However, the fact that we approximate the moments μ_i, i = 1, …, 4 using moment closure means that we only have an approximation of the true covariance matrices Σ(t_s), s = 1, …, 5. Since these approximations are not guaranteed to be positive semi-definite, a further step may be required in which the approximated symmetric matrices are projected onto the cone of positive semi-definite matrices. Another possibility, which we follow here, is to construct empirical estimates $\hat{Σ}$ (t_s) of Σ(t_s), s = 1, …, 5, from the measured data and use these in Equation (6). This procedure is reasonable whenever sufficient data is available to estimate moments up to fourth order to acceptable precision.
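The projection step that is mentioned as an alternative can be sketched as follows. The sketch assumes the projection is taken in the Frobenius norm, which amounts to setting negative eigenvalues to zero; the function name `project_psd` is hypothetical and the source does not specify which projection would be used.

```python
import numpy as np

def project_psd(mat):
    """Frobenius-norm projection of a square matrix onto the PSD cone."""
    sym = 0.5 * (mat + mat.T)              # enforce exact symmetry first
    eigval, eigvec = np.linalg.eigh(sym)   # real eigen-decomposition
    clipped = np.clip(eigval, 0.0, None)   # drop negative eigenvalues
    return (eigvec * clipped) @ eigvec.T
```

A matrix that is already positive semi-definite is returned unchanged up to rounding error, so the step can be applied unconditionally to each approximated covariance matrix.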

We assume that no prior information about γ is available and accordingly choose flat prior distributions for all the parameters. To draw samples from the posterior distribution p(γ | $\hat{μ}$ ) we used a Metropolis-Hastings Markov chain Monte Carlo algorithm with randomly chosen initial parameter guesses, log-normal proposal distributions and a chain length of 10000 (for more information on this algorithm see, for instance, Zechner et al., 2012). The first 3000 iterations of the chain were discarded as a burn-in period and an empirical estimate of the posterior distribution was obtained from the remaining 7000 iterations of the chain. The results are shown in Figure 1. It can be seen that the posterior distributions are relatively tight with mode close to the true parameters. The small deviations between posterior mode and true parameter value visible in some of the panels stem from a combination of approximation error due to moment closure and errors coming from the fact that the moments used as inference data were estimated from a finite (n = 100) number of replicates and are hence affected by the noise described in Equation (6). The obtained maximum a posteriori estimates $\hat{γ}$ _MAP are
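A Metropolis-Hastings sampler with log-normal proposals can be sketched in a few lines. This is an illustrative sketch, not the implementation used for the results reported here: the function name, the single step-size parameter `step`, and the generic `log_post` callable (which, for flat priors on positive parameters, would simply return the log-likelihood of the sample moments) are assumptions.

```python
import numpy as np

def metropolis_hastings(log_post, gamma0, n_iter, step, rng):
    """Sample positive parameters with multiplicative log-normal proposals.

    log_post : callable returning the unnormalized log-posterior of gamma
    gamma0   : initial guess, all entries strictly positive
    step     : standard deviation of the proposal on the log scale
    """
    gamma = np.asarray(gamma0, dtype=float)
    lp = log_post(gamma)
    chain = np.empty((n_iter, gamma.size))
    for it in range(n_iter):
        prop = gamma * np.exp(step * rng.standard_normal(gamma.size))
        lp_prop = log_post(prop)
        # The log-normal proposal is asymmetric: q(gamma | prop) / q(prop | gamma)
        # equals prod(prop / gamma), which enters the acceptance ratio.
        log_ratio = lp_prop - lp + np.sum(np.log(prop)) - np.sum(np.log(gamma))
        if np.log(rng.uniform()) < log_ratio:
            gamma, lp = prop, lp_prop
        chain[it] = gamma
    return chain
```

With a chain of length 10000, discarding a burn-in corresponds to slicing the returned array, for example `chain[3000:]`. The multiplicative proposal keeps all parameters positive, at the price of the correction term in the acceptance ratio.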

\begin{matrix} \begin{array}{l} \hat{α} = 0.0307, \quad \hat{λ} = 0.0115, \quad \hat{η} = 0.247 \cdot 10^{- 4}, \\ \hat{r} = 0.0026 \quad \text{and} \quad \widehat{\mathrm{Var}} [Z] = 0.0501. \end{array} & (9) \end{matrix}

Figure 1. Parameter posterior distribution. The panels show different marginals of the posterior distribution computed using the Bayesian inference MCMC approach described in Section 3.2. For each panel, the x-axis has been rescaled to show ratios of inferred to true parameter value. The red line highlights the 1/1 ratio, which corresponds to perfectly inferred parameters. The maximum a posteriori estimates $\hat{γ}$ _MAP are those maximizing the posterior distribution. The parameters are λ = birth, η = death, α = immigration, r = recovery rate.

In Figure 2, the mean and variance, computed from the model using Equation (5) and γ = $\hat{γ}$ _MAP, are compared to the sample means and variances $\hat{μ}$ (t_s) of the considered data set.

Figure 2. Inference data. Comparison of the sample means (A) and variances (B) of the pest population N(t) used to infer the parameters (dots) and the predictions given by the moment Equation (5) using the maximum a posteriori estimates $\hat{γ}$ _MAP given in Equation (9).

To assess the advantages of the proposed stochastic approach with respect to more standard methods based on measurements of the average species density only, we repeated the previously described inference process using only the means { $\hat{μ}$ ₁(t_s)}^S_{s = 1} as data. By comparing the parameter posterior distribution obtained in this case (Figure 3) with the one obtained when the variances are also used as inference data (Figure 1), one can immediately see that higher order statistics may contain valuable information regarding the parameters. As a consequence, the proposed approach could help solve identifiability problems of standard inference approaches based on deterministic models.


Figure 3. Parameter posterior distribution. The panels show different marginals of the posterior distribution computed using the Bayesian inference MCMC approach if only the means of the dataset D are used. For each panel, the x-axis has been rescaled to show ratios of inferred to true parameter value. The red line highlights the 1/1 ratio, which corresponds to perfectly inferred parameters. The parameters are λ = birth, η = death, α = immigration, r = recovery rate.

The previous results were obtained assuming a dataset that contains n = 100 replicates of the system for each measurement time. As encoded in the mathematical description of the noise given in Equation (6), the variance of the moment estimates is inversely proportional to the number of samples n. Consequently, if the number of replicates is very low, the variance of the data used in the inference may be large, resulting in inconclusive (i.e., widely spread) parameter posterior distributions. This is because many different parameter values give rise to model predictions that are consistent with the high level of noise described by Equation (6). To test the performance of the proposed approach for different levels of noise, we performed a case study in which we estimated the true parameter values given in Equation (8) using means and variances estimated from different numbers of replicates. In particular, we considered four scenarios with n = 10, n = 25, n = 50 and n = 100 replicates and simulated 10 different datasets for each scenario to reduce the influence of the particular realization of the data on the results. We then performed the parameter inference for all scenarios and all datasets (i.e., 40 times) and computed the relative error in percent (e.g., 100 · $\frac{| \hat{λ} - λ |}{λ}$ ) of the MAP estimates with respect to the true values. The results are reported in Table 1. The precision of the estimates improves as the number of replicates increases. In particular, n = 10 replicates lead to very imprecise results, whereas the estimates obtained from both n = 50 and n = 100 replicates attain reasonable precision. The scenario with n = 25 replicates is an intermediate case and may or may not be sufficiently accurate, depending on what errors are tolerable for the application.


Table 1. Average error of the MAP estimates.
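The 1/n scaling of the estimation noise and the relative error measure used for Table 1 can be illustrated with a small numerical experiment. This is only a sketch: the Poisson population with mean 50 is a hypothetical stand-in for the pest model, and the function names are illustrative.

```python
import numpy as np

def relative_error_percent(estimate, true_value):
    """Relative error 100 * |estimate - true| / true, in percent."""
    return 100.0 * abs(estimate - true_value) / true_value

def variance_of_sample_mean(n, n_datasets=20000, true_mean=50.0, seed=0):
    """Empirical variance of the sample mean over many datasets of n replicates."""
    rng = np.random.default_rng(seed)
    data = rng.poisson(true_mean, size=(n_datasets, n))
    return data.mean(axis=1).var()

for n in (10, 25, 50, 100):
    v = variance_of_sample_mean(n)
    # For a Poisson population the theoretical value is true_mean / n.
    print(f"n = {n:3d}: var of sample mean = {v:.3f}, theory = {50.0 / n:.3f}")
```

Going from n = 10 to n = 100 replicates reduces the variance of the sample mean by a factor of ten, which is the mechanism behind the tighter posteriors obtained with larger datasets.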

3.3. Optimal Control

The identified model can be used to derive optimal control strategies for pest control. Specifically, we suppose in the following that we can influence the system in Equation (7) by means of two different control strategies: pesticides and release of sterile pest individuals. While pesticides are probably the most widely used strategy for pest control, they have some disadvantages, such as progressive loss of efficacy, negative impact on beneficial insect populations (such as natural enemies of the pest), and chemical residues in crops and in the ecosystem (Rafikov and Balthazar, 2005). For these reasons, the use of complementary or alternative biological control approaches has been suggested (Bhattacharyya and Bhattacharya, 2006; Greenman and Norman, 2007; Vreysen et al., 2007). The release of sterile insects, in particular, is a biological control method that aims at reducing the pest population size by introducing into the ecosystem sterile insects, usually male, that compete with the wild type for reproduction (Dyck et al., 2005). The desired effect is achieved because females mating with sterile males have no offspring. In other words, sterile releases reduce the number of pest individuals available for reproduction and hence the birth rate. This approach has been successfully employed, for example, to eradicate screwworm flies, melon flies, the codling moth and the pink bollworm, among others; see Barclay and Li (1991) and references therein. Pesticide and sterile release are also very different strategies from an economic perspective: while the cost of pesticide is proportional to the area that has to be treated, the cost of steriles depends on the amount released. We note that, in order to prevent a given fraction of the population from reproducing, the amount of released steriles should be roughly proportional to the number of pest individuals present in the system.
Finally, the two approaches differ in the effect that they have on the ecosystem. In the following, we model the effect of pesticide as an instantaneous state reset of the pest population to N(t⁺) = (1 − u_p(t))N(t⁻), where u_p(t) ∈ [0, 1] is the fraction of the field treated with pesticide at time t. Note that since X(t) is a stochastic process, this reset influences all the moments and cross-moments of X(t) involving the random process N(t). For example, for the ith moment of N(t), that is μ^N_i(t) := 𝔼 [Nⁱ(t)], we get μ^N_i(t⁺) = (1 − u_p(t))ⁱ μ^N_i(t⁻) in Equation (5). To model the effect of the release of sterile individuals we need to include them as a new species S(t). In particular, we model the interaction between healthy and sterile individuals by

\begin{matrix} \begin{matrix} N + S & \overset{κ_{1}}{\to} & B \\ B & \overset{κ_{2}}{\to} & N + S \\ B + C & \overset{η}{\to} & C \\ S + C & \overset{η}{\to} & C . \end{matrix} & (10) \end{matrix}

Note that we assume that the steriles have the same death rate as the healthy individuals, but they cannot reproduce. The interaction between the two species is modeled by assuming that each sterile individual can prevent one healthy individual from reproducing for a random time period during which both individuals can die. Accordingly, the introduction of steriles effectively reduces the birth rate of the population. This is captured by B(t), which quantifies how many of the healthy individuals cannot reproduce at a certain time. If we denote by u_s(t) ∈ [0, 1] the fraction of steriles introduced at time t, with respect to an assumed maximal number of steriles $\bar{S}$ that can be introduced, we can again model this control action as a state reset of the extended model. Specifically, let Q(t) := u_s(t) $\bar{S}$ be the deterministic amount of steriles added at time t, then S(t⁺) = S(t⁻) + Q(t) and C(t⁺) = C(t⁻) + Q(t). Note that C(t) is updated as well since we assume that sterile individuals also damage the field. Again, the state reset action leads to an update of the moment equations: for example, for the moments involving only S(t), we get $μ_{i}^{S} (t^{+}) = \mathbb{E} [S {(t^{+})}^{i}] = \mathbb{E} [{(S (t^{-}) + Q)}^{i}] = \sum_{h = 0}^{i} \binom{i}{h} \cdot μ_{h}^{S} (t^{-}) \cdot Q^{i - h}$ , with $μ_{0}^{S} = 1$ . Overall, the effect of the two control actions is to reset the state of the extended stochastic system, leading to a hybrid model.
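The two moment resets can be written down in a few lines. The sketch below is illustrative only: the function names and the convention that a list `m` stores the raw moments with `m[i]` = 𝔼[Xⁱ] and `m[0] = 1` are assumptions made for this example, and cross-moments are not handled.

```python
from math import comb

def pesticide_reset(moments_n, u_p):
    """Moments of N after treating a fraction u_p: mu_i(t+) = (1 - u_p)^i * mu_i(t-)."""
    return [(1.0 - u_p) ** i * m for i, m in enumerate(moments_n)]

def sterile_reset(moments_s, q):
    """Moments of S after adding a deterministic amount q (binomial expansion)."""
    order = len(moments_s)
    return [
        sum(comb(i, h) * moments_s[h] * q ** (i - h) for h in range(i + 1))
        for i in range(order)
    ]

# Illustrative numbers (hypothetical): E[N] = 10, E[N^2] = 120, half the field treated.
print(pesticide_reset([1.0, 10.0, 120.0], u_p=0.5))   # [1.0, 5.0, 30.0]
# A deterministic S = 3 has moments [1, 3, 9]; adding q = 2 gives the moments of 5.
print(sterile_reset([1.0, 3.0, 9.0], q=2.0))          # [1.0, 5.0, 25.0]
```

Both updates express the moment of order i after the reset through moments of order at most i before the reset, so the reset maps themselves require no additional closure approximation.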

One of the main problems in pest management is to determine which combinations of the available treatments are most effective for maintaining the infesting population below a given economic threshold, as described in Barclay and Li (1991), or possibly eradicating the invasion, while minimizing the economic cost. Specifically, consider a given time horizon [0, T] and suppose that control actions can be taken at the discrete times t_h = hΔ_{T_ac}, where h = 1, …, H with H := ⌊T/Δ_{T_ac}⌋. The optimal pest management problem can be stated as the following optimization problem

\begin{matrix} \begin{array}{l} \min_{\begin{matrix} u_{p} (t_{h}), u_{s} (t_{h}) \\ h = 1, \dots, H \end{matrix}} \sum_{h = 1}^{H} [ρ^{P} u_{p} (t_{h}) A + ρ^{S} u_{s} (t_{h}) \bar{S}] + ρ^{C} μ_{1}^{C} (T) \\ \text{s.t.} \quad P [N (t) > ξ] \leq δ, \quad \forall t \in [0, T], \end{array} & (11) \end{matrix}

where ρ^P > 0 models the cost of pesticide per area (possibly including a disincentive to penalize the use of pesticide with respect to biological control), A is the total area of the habitat, ρ^S > 0 is the cost per sterile and ρ^C > 0 is a factor that translates the expected value of the process C at the end of the period into an economic cost due to damage to the field and hence reduced productivity. The parameter ξ represents the economic threshold below which the pest should be contained. Note that since the considered model is stochastic, it is not possible to guarantee that, for a given control strategy, all the realizations of the process will be below the economic threshold. We can however impose that the constraint be satisfied with a given probability 1 − δ, that is, in 100(1 − δ)% of the realizations. In the following we assume that the average population starts below the economic threshold ξ. If this were not the case, for example due to ongoing infestations, one could replace the constraint in Equation (11) by a time-varying decreasing threshold ξ(t), which is higher at the beginning (to guarantee the feasibility of the optimization problem) and eventually reaches the desired threshold ξ.

The problem in Equation (11) is a chance constrained optimal control problem and is in general very difficult to solve (Prékopa, 1995). Some possible approaches are based on sampling techniques (Vapnik and Chervonenkis, 1971; Tempo et al., 2012; Grammatico et al., in press) or convex relaxations (see Nemirovski and Shapiro, 2006 and references therein). Here, we decided to solve a simplified version of the problem in Equation (11) by assuming that the distribution of the pest N(t) is approximately Gaussian. If this is the case, we can rewrite the constraint $P [N (t) > ξ] \leq δ$ in terms of mean and variance of the stochastic process N(t) as follows

\begin{matrix} \begin{array}{l} \min_{\begin{matrix} u_{p} (t_{h}), u_{s} (t_{h}) \\ h = 1, \dots, H \end{matrix}} \sum_{h = 1}^{H} [ρ^{P} u_{p} (t_{h}) A + ρ^{S} u_{s} (t_{h}) \bar{S}] + ρ^{C} μ_{1}^{C} (T) \\ \text{s.t.} \quad μ_{1}^{N} (t) + Φ^{- 1} (1 - δ) \cdot σ^{N} (t) \leq ξ, \quad \forall t \in [0, T], \end{array} & (12) \end{matrix}

where Φ(·) is the cumulative distribution function of a normalized Gaussian random variable⁴ and $σ^{N} (t) : = \sqrt{μ_{2}^{N} (t) - μ_{1}^{N} {(t)}^{2}}$ is the standard deviation of the process N(t) (Boyd and Vandenberghe, 2004, p. 157; Nemirovski and Shapiro, 2006). Note that the problem in Equation (12) depends on the moments of the stochastic processes N(t) and C(t) only. Hence it can be solved using the approximated moment equations derived in Equation (5). Given the fact that we modeled the control actions as state resets, the controlled system in Equation (5) can be thought of as a deterministic continuous-time system with discrete-time controlled jumps, leading to a hybrid optimal control problem (Branicky et al., 1998; Bensoussan and Menaldi, 2000; Shaikh and Caines, 2007).
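As a small worked example of the constraint in Equation (12), the following sketch evaluates the Gaussian reformulation and, for comparison, the distribution-free Chebyshev bound of footnote 4 from the first two moments of N(t). The function names and the numerical values (mean 100, standard deviation 30) are hypothetical and chosen only for illustration; δ = 0.1 and ξ = 150 are the values used in this section.

```python
from math import sqrt
from statistics import NormalDist

def std_from_moments(mu1, mu2):
    """Standard deviation from the first two raw moments."""
    return sqrt(max(mu2 - mu1 ** 2, 0.0))

def gaussian_margin(mu1, mu2, xi, delta):
    """xi - (mean + Phi^{-1}(1 - delta) * std); nonnegative means the constraint holds."""
    quantile = NormalDist().inv_cdf(1.0 - delta)
    return xi - (mu1 + quantile * std_from_moments(mu1, mu2))

def chebyshev_margin(mu1, mu2, xi, delta):
    """Same check with the more conservative factor sqrt(1 / delta)."""
    return xi - (mu1 + sqrt(1.0 / delta) * std_from_moments(mu1, mu2))

# Hypothetical moments: mean 100 and standard deviation 30.
mu1, mu2 = 100.0, 100.0 ** 2 + 30.0 ** 2
print(gaussian_margin(mu1, mu2, xi=150.0, delta=0.1))   # positive: constraint satisfied
print(chebyshev_margin(mu1, mu2, xi=150.0, delta=0.1))  # negative: bound violated
```

For δ = 0.1 the Gaussian factor Φ⁻¹(0.9) is about 1.28, whereas the Chebyshev factor is about 3.16, which shows how much more restrictive the distribution-free bound is.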

We assume in the following that ρ^C = ρ^P = 1, ρ^S = 4, $\bar{S}$ = 200, and A = 100. Figure 4 reports the result of the optimization problem, solved using the function fmincon of Matlab, for δ = 0.1, a horizon of 6 weeks, Δ_{T_ac} = 1 week and deterministic initial state N(0) = C(0) = 10. The lowest possible economic threshold that guarantees feasibility of the problem in Equation (12), given the chosen initial condition and the fact that the first intervention is after 1 week, is 130. For the results of Figure 4, we fixed ξ = 150. Since the problem in Equation (12) is non-convex, we restarted the optimization from 10 different random initial control vectors and then selected the strategy with minimum cost. For solving the moment equations in Equation (12), we used the identified parameters $\hat{γ}$ _MAP given in Equation (9) and we set the unknown parameters κ₁ = κ₂ = 0.01. The resulting optimal control laws, u^⋆_p, u^⋆_s, consist of applying pesticide to almost all the crop field at the first possible control time t₁ = 168 h and subsequently controlling the population with consecutive sterile releases. This result is consistent with the analysis reported in Barclay and Li (1991), where it was shown that releasing steriles is economically preferable when the infesting population is low, while pesticide is preferable when the population is high.
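The multi-start strategy can be sketched in Python as follows. This is not the computation behind Figure 4: SciPy's SLSQP solver is used here in place of fmincon, and the moment equations are replaced by a hypothetical deterministic surrogate in which the population doubles every week and only pesticide resets are available. The growth factor `G`, the function names and the surrogate itself are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical surrogate: weekly growth factor G, pesticide-only control.
G, N0, XI, H = 2.0, 10.0, 150.0, 6
RHO_P, RHO_C, AREA = 1.0, 1.0, 100.0

def simulate(u):
    """Population just before each reset and after the last reset."""
    n_post = N0
    n_pre = np.empty(H)
    for h in range(H):
        n_pre[h] = G * n_post                 # growth over one week
        n_post = (1.0 - u[h]) * n_pre[h]      # state reset by pesticide
    return n_pre, n_post

def cost(u):
    _, n_final = simulate(u)
    return RHO_P * AREA * np.sum(u) + RHO_C * n_final

def margin(u):
    """Constraint in the form margin(u) >= 0: population stays below XI."""
    n_pre, _ = simulate(u)
    return XI - n_pre

def multistart(n_starts=10, seed=0):
    """Run a local solver from random initial controls and keep the cheapest feasible result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        u0 = rng.uniform(0.0, 1.0, H)
        res = minimize(cost, u0, method="SLSQP", bounds=[(0.0, 1.0)] * H,
                       constraints=[{"type": "ineq", "fun": margin}])
        feasible = res.success and np.all(margin(res.x) >= -1e-4)
        if feasible and (best is None or res.fun < best.fun):
            best = res
    return best

best = multistart()
```

Restarting from several initial points does not guarantee a global minimum, but it reduces the risk of reporting a poor local solution.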


Figure 4. Optimal control strategy. The two panels at the top visualize the optimal control strategy u^⋆_p, u^⋆_s, solution of Equation (12). For all the other panels, the solid blue line represents the time evolution of the mean of the corresponding process [N(t), S(t), B(t), and C(t), respectively] under the optimal control laws. To obtain these results we solved the approximated moment Equations (5) using the identified parameters $\hat{γ}$ _MAP, given in Equation (9), together with κ₁ = κ₂ = 0.01. In the plot of the healthy population N(t), the dashed line represents the quantity μ^N₁(t) + Φ⁻¹(0.9) · σ^N(t). The fact that this line is below the economic threshold ξ = 150 (red line) guarantees that the constraint of the problem in Equation (12) is satisfied. Consequently, under the Gaussian approximation, the stochastic realizations of the pest population are contained below the economic threshold, that is N(t) < ξ, with probability at least 1 − δ = 0.9 (i.e., in 90% of the realizations).

In order to test the performance of the derived control law, we simulated the behavior of the model in Equations (7) and (10), according to the real parameters given in Equation (8), using the stochastic simulation algorithm. Specifically, we performed 500 simulations and reported in Figure 5 the median (blue line) and the probability distribution of N(t) and C(t). Figure 5A illustrates the behavior of the system if no control action is taken. We see that in this case the pest population exceeds the economic threshold ξ = 150. In Figure 5B, on the other hand, we see that the computed optimal control laws u^⋆_p, u^⋆_s successfully regulate the simulated realizations, maintaining the population well under the given threshold. This result is obtained by applying pesticide at time t₁ = 168 h and then using only sterile release. To show that a single application of pesticide at the beginning of the horizon would not suffice to control the population, we show in Figure 5C the behavior of the system if only the optimal control law u^⋆_p is applied. The result indicates that the use of steriles is essential to complement the effect obtained by applying u^⋆_p.


Figure 5. Control performance. For each column, the performance of the control strategy shown at the top is illustrated. Specifically, the middle and bottom panels visualize the probability distribution of N(t) and C(t), obtained from 500 stochastic simulations, using the real parameters given in Equation (8) and κ₁ = κ₂ = 0.01. The blue line denotes the median of the distribution (i.e., 50% of the realizations are below this line), the nested colored regions represent the cumulative distribution of N(t) and C(t), with steps of 10% (i.e., 10% of the realizations are inside the dark green region and 90% are inside the yellow one). The red line represents the economic threshold ξ = 150. (A) refers to no control action, (B) to the optimal strategy u^⋆_p, u^⋆_s, and (C) to the use of u^⋆_p only.

4. Discussion

One of the major future challenges for ecologists is to find strategies for organizing human interactions with the environment in a long-term sustainable way. We believe that dynamical models, inferred from measured data, together with optimal control theory have the potential to be of substantial help in achieving this task. While it is usually straightforward to incorporate the effect of human actions in a model and to formulate related optimal control problems, solving these problems may be a challenge. This has however not prevented the advancement of control theory across all engineering disciplines. So why is it that optimal control has not found more applications in ecology? A reason for this is the intrinsic complexity of ecological systems. Contrary to engineering disciplines, where the systems usually have known structure and parameters, ecological systems have been shaped by evolution in ways that we do not yet fully understand. Therefore, before we can attempt to control an ecological system with the help of a mathematical model, we first need to identify an appropriate model and its parameters from measured field data. This task is complicated by the fact that ecological systems are inherently driven by random interactions between individuals that take place in spatially structured habitats and may be influenced by different environmental conditions. To be applicable in practice, a model should take into account all these factors.

In this paper, we took a first step in this direction by proposing an approach for dealing with stochasticity and varying environmental conditions, neglecting the spatial aspect of the problem. Our approach requires data from many different replicates of the system. In the current paper we used a simulated dataset; as future work it is important to test this method on real data. Such data may not be straightforward to obtain in some applications, such as the pest control application considered here. However, one could envision dividing a habitat into many small and clearly separated patches. Each patch could then provide a replicate of the system. Another possibility would be to “zoom out” and regard the model not as a model for one specific habitat (i.e., one specific crop), but as a model for all habitats of this kind (i.e., all cotton crops), for instance in an entire country. Based on the identified model, we formulated an optimal control problem in which two different control strategies can be used to reduce the pest population, both resulting in a state reset. This led to a hybrid system in which the control operator can reset the state at certain given intervention times, while the dynamics of the system between the reset times are given by a continuous-time stochastic process.

All of the results of this paper rely on the possibility of obtaining good approximations of the moments of the species abundances, that is, on the existence of an adequate moment closure method for the studied system. In some applications, it can happen that none of the available methods performs adequately. To address this issue, further work is required on the development of new moment closure methods. From a hybrid systems perspective, an appealing approach is to group the interacting species into highly and lowly abundant species and to model them using continuous deterministic and discrete stochastic dynamics, respectively. Methods to analyze such hybrid models have been developed recently (Jahnke, 2011), but their use for parameter inference or optimal control has so far not been documented.

Author Contributions

FP and JR designed and performed the research. All authors wrote the paper.

Funding

The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007-2013) under REA grant agreement No. [291734] and from SystemsX under the project SignalX.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to acknowledge contributions from Baptiste Mottet who performed preliminary analysis regarding parameter inference for the considered case study in a student project (Mottet, 2014/2015).

Footnotes

1. Extension to multiple measured species is straightforward.

2. The assumption that n is the same for each time point is only for notational convenience and is by no means necessary.

3. For ease of notation we omit the dependence of this probability on the initial condition.

4. If no information is known about the distribution, Chebyshev's inequality can be used to enforce the constraint P[N(t) > ξ] ≤ δ, leading to the more restrictive bound $μ_{1}^{N} (t) + \sqrt{\frac{1}{δ}} \cdot σ^{N} (t) \leq ξ$ .

References

Altermatt, F., Fronhofer, E. A., Garnier, A., Giometto, A., Hammes, F., Klecka, J., et al. (2014). Big answers from small worlds: a user's guide for protist microcosms as a model system in ecology and evolution. Methods Ecol. Evol. 6, 218–231. doi: 10.1111/2041-210X.12312