Probabilistic enhancement of the Failure Forecast Method using a stochastic differential equation and application to volcanic eruption forecasts

We introduce a doubly stochastic method for performing material failure theory based forecasts of volcanic eruptions. The method enhances the well-known Failure Forecast Method equation, introducing a new formulation similar to the Hull-White model in financial mathematics. In particular, we incorporate a stochastic noise term in the original equation, and systematically characterize the uncertainty. The model is a stochastic differential equation with mean-reverting paths, where the traditional ordinary differential equation defines the mean solution. Our implementation allows the model to make excursions from the classical solutions, by including uncertainty in the estimation. The doubly stochastic formulation is particularly powerful, in that it provides a complete posterior probability distribution, allowing users to determine a worst-case scenario with a specified level of confidence. We apply the new method to historical datasets of precursory signals, across a wide range of possible values of convexity in the solutions and amounts of scattering in the observations. The results show the increased forecasting skill of the doubly stochastic formulation of the equations compared to statistical regression.


Introduction
The Failure Forecast Method (FFM) for volcanic eruptions is a classical tool in the interpretation of monitoring data as potential precursors, providing quantitative predictions of the eruption onset. The basis of FFM is a fundamental law for failing materials:

dX/dt = A X^α,

where X is the rate of the precursor signal, and α, A are model parameters. The solution X is a power law of exponent 1/(1 − α), diverging at a time t_f called failure time. The model represents the potential cascading of precursory signals leading to the final rupture of materials, with t_f a good approximation to the eruption onset time t_e.
Seismic data are the type of signal most extensively studied with the FFM in volcanology. Volcanic tremor has been related to multi-scale rock cracking (Kilburn and Voight, 1998; Ortiz et al., 2003; Kilburn, 2003; Smith et al., 2009), and volcano-tectonic earthquakes can be forecast by applying the FFM to their characteristics (Tárraga et al., 2006). Rheological experiments on lava domes revealed that magma seismicity is also consistent with the FFM theory (Lavallée et al., 2008). In general, retrospective analysis of pre-eruptive seismic data produced good results in several case studies (e.g. Smith and Kilburn, 2010; Budi-Santoso et al., 2013; Chardot et al., 2013). Finally, the FFM has been successfully tested on Synthetic Aperture Radar acquisitions, opening the path to new forecasting applications based on satellite data (Moretto et al., 2016).
The reliability of FFM forecasts is known to be affected by several factors. When applied to seismic data, the performance of the method is usually higher on eruptions preceded by a single phase of seismic acceleration (Boué et al., 2015). The preliminary separation of signals originating from different sources can improve the results (Salvage and Neuberg, 2016; Salvage et al., 2017). Technically, nonlinear (power law) regression or non-Gaussian maximum likelihood methods can also enhance the accuracy of the forecasts, compared to linear models (Bell et al., 2011, 2013). In general, the forecasting accuracy of FFM has been related to the heterogeneity in the breaking material (Vasseur et al., 2015). However, sometimes the method fails to predict the time of material failure, and some volcanic systems are often characterized by prolonged unrest and ambiguous signals (Chiodini et al., 2016). A full probability assessment, including uncertainty quantification (UQ), is required, and it is the purpose of our investigation.
In this study, we generalize the classical FFM approach by incorporating a stochastic noise in the original ordinary differential equation (ODE), converting it into a stochastic differential equation (SDE), and systematically characterizing the uncertainty. Embedding noise in the model can enable the FFM equation to have greater forecasting skill by focusing on averages and moments. Sudden changes in the power law properties are made possible. In our model, the prediction is thus perturbed inside a range that can be tuned, producing probabilistic forecasts.
In more detail, in the original equation the change of variables η = X^(1−α) implies:

dη/dt = (1 − α)A,

i.e. a straight line which hits zero at t_f. The most efficient graphical and computational methods indeed rely on the regression analysis of inverse rate plots. We re-define η with:

dη_t = [(1 − α)A − γ(η_t − η̄(t))] dt + σ dW_t,

where η̄ is the unperturbed linear solution, also called the Hull-White model in financial mathematics (Hull and White, 1990). The parameter σ defines the strength of the noise, and γ the rapidity of the mean-reversion property. We validate the new method on historical datasets of precursory signals already studied with the classical FFM, including line-length and fault movement at Mt. St. Helens, 1981-82, seismic signals registered at Bezymyanny, 1960, and surface movement of Mt. Toc, 1963 (Voight, 1988a). A fundamental aspect of our formulation is the possibility of a doubly stochastic UQ. Doubly stochastic models describe the effect of uncertainty in the formulation of aleatory processes, and have been successfully applied in volcanology (Sparks and Aspinall, 2004; Marzocchi and Bebbington, 2012; Bevilacqua, 2016). Thus, doubly stochastic probability density functions (pdf) and estimates are themselves affected by uncertainty. This approach has been applied in spatial problems concerning eruptive vent/fissure mapping (Selva et al., 2012; Bevilacqua et al., 2015; Tadini et al., 2017b,a; Bevilacqua et al., 2017a), long-term temporal problems based on the past eruption record (Bebbington, 2013; Bevilacqua et al., 2016; Richardson et al., 2017; Bevilacqua et al., 2018), and hazard assessments (e.g. Bevilacqua et al., 2017b). In this study, we use a doubly stochastic model to develop a short-term eruption forecasting method based on precursory signals.
The first part of this article defines the mathematical model adopted. In section 2 we present the equations of the FFM, in section 3 we define their enhancement with a mean-reverting SDE, and section 4 details the properties of the mean reversion. The second part of the article tests the model on historical datasets. In section 5 we define the fitting algorithm and compare retrospective analyses based on three different formulations of FFM. Section 6 tests the model on forecasting problems, and section 7 discusses the performance of the methods, showing the increased forecasting skill of the doubly stochastic formulation.

The Failure Forecast Method ODE

If α > 1, the FFM equation reads:

dX/dt = A X^α.   (1)

Simplifying the notation with the change of variables η = X^(1−α), the chain rule gives dη/dt = (1 − α) X^(−α) dX/dt, and the FFM reads:

dη/dt = (1 − α)A.

We can solve this equation by immediate integration:

η(t) = η(t0) + (1 − α)A (t − t0),   (2)

and equivalently:

X(t) = [η(t0) + (1 − α)A (t − t0)]^(1/(1−α)).   (3)

The original method required fitting the two parameters α and A on the monitoring data, and then estimating the time of failure t_f, such that X(t_f) = +∞, or equivalently η(t_f) = 0. It follows that η(t0) = (α − 1)A (t_f − t0), and so:

t_f = t0 + η(t0) / [(α − 1)A].

We note that an estimate of η(t) is thus necessary to make forecasts, a non-trivial process if noise is assumed to be present. The effect of varying the parameters α and A in equation 3 is displayed in Figure 1a,b. Our purpose is to forecast the failure time t_f, and hence it is more practical to examine the plot of X^(−1) = η^(1/(α−1)), shown in Fig. 1b. The parameter α defines the convexity of that function: for α ≤ 2 it is convex, for α ≥ 2 it is concave. The value α = 2 produces a straight line. We call α the convexity parameter. In equation 2 the parameter A changes the time scale of the equation, and so defines the constant slope of η, that is (1 − α)A. Hence we call A the slope parameter.
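For concreteness, the classical relations above, the linear η(t) and the failure time t_f = t0 + η(t0)/((α − 1)A), can be collected in a short numerical sketch. Function names and parameter values are ours, for illustration only:

```python
import numpy as np

def ffm_failure_time(alpha, A, eta0, t0=0.0):
    """Failure time of the classical FFM ODE: t_f = t0 + eta0 / ((alpha - 1) * A)."""
    assert alpha > 1 and A > 0 and eta0 > 0   # assumptions of the classical model
    return t0 + eta0 / ((alpha - 1.0) * A)

def eta(t, alpha, A, eta0, t0=0.0):
    """Linear solution eta(t) = eta0 + (1 - alpha) * A * (t - t0)."""
    return eta0 + (1.0 - alpha) * A * (t - t0)

def rate(t, alpha, A, eta0, t0=0.0):
    """Precursor rate X(t) = eta(t)**(1 / (1 - alpha)), diverging at t_f."""
    return eta(t, alpha, A, eta0, t0) ** (1.0 / (1.0 - alpha))

# alpha = 2 gives a straight inverse-rate line of slope -A hitting zero at t_f
t_f = ffm_failure_time(alpha=2.0, A=0.1, eta0=3.0)   # ~30 days for these values
```

With α = 2 the inverse-rate plot is the straight line 1/X(t) = η(t), which is the graphical construction used in the classical regression analysis.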

The Failure Forecast Method SDE
We assume that the equation is not exactly satisfied, but there is a transient difference, which however decreases exponentially through time. The equation becomes:

η(t) = η̄(t) + β e^(−γt),   (4)

where η̄ is the unperturbed ODE solution, β is the value of the difference at t = 0, and γ is the rate of decay of this error term. This allows a reformulation of the equation. Given that:

dη̄/dt = (1 − α)A,

we can take the derivative of equation 4, and obtain:

dη/dt = (1 − α)A − γβ e^(−γt),

and so:

dη/dt = (1 − α)A − γ [η(t) − η̄(t)].

In addition, we want to allow for an additive noise affecting the new equation, and the final formulation is:

dη_t = [(1 − α)A − γ (η_t − η̄(t))] dt + σ dW_t,   (5)

or equivalently:

X_t = (η_t)^(1/(1−α)),   (6)

for each t < t_f. This is also called a Hull-White model in financial mathematics (Hull and White, 1990).
The effect of varying the parameters σ and γ on the SDE solution X is displayed in Figure 1c-f. In equation 5, σ defines the time scale of the additive noise, and so we call σ the noise parameter. We remark that X is nonlinearly affected by this random noise in equation 6. The SDE defining η is elevated to the exponent 1/(1−α), and even a relatively small noise can significantly change the failure time (see Fig. 1c,e). The parameter γ defines the time scale of the exponential decay of perturbations with respect to the mean solution. It controls the reversion of the solution paths towards the mean curve (see Fig. 1d,f). We call γ the mean-reversion parameter.
The new formulation allows the SDE solution to make random excursions from the classical ODE solution. Figure 2 displays three different solutions of X^(−1), assuming convexity parameter α = 1.7, 2, or 2.3. The slope parameter is fixed at A = 0.1. Plots 2a,c,e show an example of solutions assuming mean-reversion parameter γ = 0, or γ = 0.25. The noise is additive in 2a, and weakly nonlinear in 2c,e. We note that, while σ and γ define the noise affecting η, the same (σ, γ) can produce significantly different noise effects on X^(−1) depending on the exponent 1/(1−α). A very important consequence of our stochastic formulation is that the time of failure becomes a random variable:

t_f(ω) := inf{t > t0 : η_t(ω) ≤ 0},

for almost every ω ∈ Ω, where (F_t)_{t>t0} is the filtration generated by the noise, and P is a probability measure over it (Karatzas and Shreve, 1991). Plots 2b,d,f display pdfs of t_f calculated by Monte Carlo simulation (2,000 samples). The pdf becomes more peaked and symmetric when γ > 0.
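A minimal Monte Carlo sketch of this construction, under the assumed SDE form dη_t = [(1 − α)A − γ(η_t − η̄(t))] dt + σ dW_t with an Euler-Maruyama discretization; all names and parameter values are illustrative, not from the original study:

```python
import numpy as np

def simulate_tf(alpha, A, gamma, sigma, eta0, t_max=100.0, dt=0.01,
                n_paths=2000, rng=None):
    """Monte Carlo samples of the random failure time t_f = inf{t : eta_t <= 0}.

    Euler-Maruyama scheme for the assumed SDE
    d(eta) = [(1 - alpha)*A - gamma*(eta - eta_bar)] dt + sigma dW,
    with eta_bar the classical linear ODE solution. Paths still above zero
    at t_max are returned as np.inf (no failure within the horizon).
    """
    rng = np.random.default_rng(rng)
    n_steps = int(t_max / dt)
    eta = np.full(n_paths, float(eta0))
    t_f = np.full(n_paths, np.inf)
    alive = np.ones(n_paths, dtype=bool)
    for k in range(n_steps):
        t = k * dt
        eta_bar = eta0 + (1.0 - alpha) * A * t            # mean (ODE) solution
        drift = (1.0 - alpha) * A - gamma * (eta - eta_bar)
        eta = eta + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        hit = alive & (eta <= 0.0)                         # first passage through zero
        t_f[hit] = t + dt
        alive &= ~hit
    return t_f

samples = simulate_tf(alpha=2.0, A=0.1, gamma=0.25, sigma=0.05, eta0=3.0, rng=1)
# the sample mean clusters around the ODE failure time eta0/((alpha-1)*A) = 30
```

A histogram or kernel density estimate of `samples` then approximates the pdf of t_f, as in plots 2b,d,f.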

The mean-reversion properties
Let η̄ be the ODE solution with data η(t0) at time t0. If σ > 0 and γ = 0, the law of Brownian motion and the linearity of the ODE imply that:

η(t) − η̄(t) ∼ N(0, σ²(t − t0)).

If γ > 0 then |η(t) − η̄(t)| is reduced to zero exponentially. If σ = 0 and the equation starts with δ(t0) := |η(t0) − η̄(t0)| > 0 we have:

δ(t) = δ(t0) e^(−γ(t−t0)).

Figure 3a shows this example, and 3γ^(−1) provides the time interval required to have δ(t) ≤ δ(t0)/20. If both σ > 0 and γ > 0, the combined effect of the noise and the mean reversion defines the Ornstein-Uhlenbeck process (from equation 5, with A = 0 and η_{t0} = 0):

dη_t = −γ η_t dt + σ dW_t,

whose solution is:

η_t ∼ N(0, σ²/(2γ)),

when γ|t_f − t0| ≫ 1. The constant K := σ²/(2γ) uniquely defines the probability distribution of the solution of this SDE. Different realizations of this process are displayed in Figure 3b. If σ² and γ increase while K stays constant, then the perturbations are more frequent, but reverted faster. This may have some effect on the estimate of t_f, but discrete data cannot provide any information on perturbations occurring at a frequency higher than that of the measurements. In most of our examples we define γ^(−1) = 15 days. That is, any perturbation decays by 63% within 15 days, and by 95% within 45 days, which is close to the total length of the time interval considered. Sensitivity analysis on this parameter is performed in Appendix A.
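The stationary standard deviation sqrt(K) = σ/sqrt(2γ) of the Ornstein-Uhlenbeck process can be checked empirically with a small simulation sketch (parameter values are illustrative):

```python
import numpy as np

def ou_stationary_std(gamma, sigma, t_end=150.0, dt=0.01, n_paths=2000, rng=0):
    """Simulate the Ornstein-Uhlenbeck SDE d(eta) = -gamma*eta dt + sigma dW
    from eta_0 = 0 and return the empirical standard deviation at t_end,
    to compare with sqrt(K) = sigma / sqrt(2*gamma)."""
    rng = np.random.default_rng(rng)
    eta = np.zeros(n_paths)
    for _ in range(int(t_end / dt)):
        eta += -gamma * eta * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return eta.std()

gamma, sigma = 1.0 / 15.0, 0.05
emp = ou_stationary_std(gamma, sigma)
# emp is close to sigma / sqrt(2*gamma) ≈ 0.137 once gamma * t_end >> 1
```

With γ^(−1) = 15 days and t_end = 150 days, γ t_end = 10 ≫ 1, so the process is effectively stationary at the end of the simulation.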

Parameter fitting and UQ
The application of our method requires the estimation of five parameters:
• the convexity parameter α,
• the slope parameter A,
• the noise parameter σ,
• the mean-reversion parameter γ,
• an unperturbed initial value η̄(t0).
We assume all these parameters to be positive, and α > 1. In particular, the case α = 1 is trivial, and the cases α < 1 or A ≤ 0 imply t_f = +∞. We note that η̄(t0) cannot be defined as equal to the first observation, because of the perturbations. Several methods have been adopted for the determination of the parameters in the ODE problem (Voight, 1988a; Cornelius and Voight, 1995). The Log-rate versus Log-acceleration Technique (LLT) and the Hindsight Technique (HT) can both provide estimates of α. They are described in Appendix B. The first technique is generally less accurate because it needs an estimate of the time derivative of the observations. The second technique requires that we know t_e and hence can only be used in retrospective analysis. We take advantage of these classical methods to estimate α in our examples, approximating the curvature of the mean solution from the available observations. We remark that the time derivatives are always based on the discrete data in Voight (1988a), and are not affected by the roughness of the paths of the new SDE formulation.
If α is given, then a linearized least squares method can be used to fit the parameters A and η̄(t0) on the inverse plot 1/X. This is the main method classically adopted as a forecasting technique in the ODE problem. In particular, we apply a linear regressive model to eq. 2:

η(t_i) = η̄(t0) + (1 − α)A (t_i − t0) + ε_i,

producing estimates of (1 − α)A and η̄(t0).
Finally, we fit the noise parameter σ on the residuals of this linearized problem, by imposing the constant K = σ²/(2γ) to be equal to their variance and assuming γ^(−1) = 15 days, as explained in section 4. In summary, we plug in α from the classical LLT or HT, then we obtain (A, η̄(t0), K), and thus σ once γ is given. The numerical solution of the SDE is performed by the Euler-Maruyama method, which is equivalent to the Milstein method in our case.
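The fitting pipeline described above can be sketched as follows, under the stated assumptions (plug-in α, linear least squares on η, residual variance matched to K = σ²/(2γ)). The synthetic data and helper names are ours, not from the original study:

```python
import numpy as np

def fit_ffm_sde(t, X, alpha, gamma=1.0 / 15.0):
    """Sketch of the fitting pipeline: for a plug-in alpha, linear least squares
    on eta = X**(1 - alpha) gives the slope (1 - alpha)*A and the intercept
    eta_bar(t0); the noise sigma is then set so that K = sigma**2 / (2*gamma)
    equals the variance of the regression residuals."""
    eta = X ** (1.0 - alpha)
    slope, intercept = np.polyfit(t - t[0], eta, 1)      # eta ~ intercept + slope*(t - t0)
    A = slope / (1.0 - alpha)
    eta0 = intercept
    resid = eta - (intercept + slope * (t - t[0]))
    sigma = np.sqrt(2.0 * gamma * resid.var())
    t_f = t[0] + eta0 / ((alpha - 1.0) * A)              # failure time of the mean solution
    return A, eta0, sigma, t_f

# synthetic example: alpha = 2, A = 0.1, eta0 = 3, light scatter on the inverse rate
rng = np.random.default_rng(0)
t = np.linspace(0.0, 25.0, 50)
X = 1.0 / (3.0 - 0.1 * t + 0.02 * rng.standard_normal(t.size))
A, eta0, sigma, t_f = fit_ffm_sde(t, X, alpha=2.0)
# t_f is close to the true value of 30
```

The fitted (A, η̄(t0), σ) can then be passed to an Euler-Maruyama simulation of the SDE to produce the Monte Carlo samples of t_f.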
In the following we apply three different forecast methods to the datasets in Voight (1988a), and we test t_f as an estimator of t_e. In all our methods t_f is assumed to be a random variable, and its pdf g_{t_f} is estimated following a classical Gaussian kernel density estimation (kde) method. Parameter fitting is based on Monte Carlo simulations with different numbers of samples depending on the method. This number has been tuned to obtain a robust estimate of g_{t_f}, that is not sensitive to the inclusion of additional samples.
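A minimal Gaussian kde sketch of the kind used to estimate g_{t_f} from Monte Carlo samples; Silverman's rule-of-thumb bandwidth is our assumption, and the samples are synthetic:

```python
import numpy as np

def gaussian_kde_pdf(samples, grid):
    """Minimal Gaussian kernel density estimate with Silverman's rule-of-thumb
    bandwidth, approximating the pdf g_tf of the failure time from Monte Carlo
    samples of t_f."""
    samples = np.asarray(samples)
    n = samples.size
    h = 1.06 * samples.std() * n ** (-0.2)               # Silverman's bandwidth
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

# illustrative failure-time samples, roughly centered on day 30
samples = np.random.default_rng(2).normal(loc=30.0, scale=2.0, size=5000)
t_grid = np.linspace(20.0, 40.0, 201)
pdf = gaussian_kde_pdf(samples, t_grid)
# pdf integrates to ~1 over the grid and peaks near the sample mean
```

The likelihood g_{t_f}(t_e) reported in the Discussion is then simply this pdf evaluated at the observed onset time t_e.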
• Method 1 is based on the classical ODE, and the corresponding forecasts are displayed in Figure 4. This is consistent with the original formulation in Voight (1988a). Here g_{t_f} depends on the uncertainty affecting the pair (A, η̄(t0)) in the regression method. We implement this model uncertainty as a bivariate Gaussian in a Monte Carlo simulation of 5,000 samples.
Methods 2 and 3 are both based on the new SDE.
• In Method 2, the least-squares curve is assumed to be the mean solution, and g_{t_f} is defined by the noise. The forecasts are displayed in Figure 5. We implement this aleatory uncertainty in a Monte Carlo simulation of 5,000 sample paths of the stochastic noise.
• Method 3 is doubly stochastic (e.g. Bevilacqua (2016)). The mean solution is affected first by the uncertainty in the regression method, and then perturbed by the stochastic noise defined above.
The values of g_{t_f} are thus reported as 5th percentile, mean, and 95th percentile curves. We remark that the two uncertainties are not independent, because the properties of the noise are related to the residuals in the linearized problem. The forecasts are displayed in Figure 6. In this case, the mean pdf is based on a Monte Carlo simulation of 10,000 samples. However, the percentile values are based on a hierarchical Monte Carlo simulation of 60,000 samples, that is the product of 200 parameter samples and 300 paths of the SDE solution. In general, the mean path is consistent across the three methods, but the UQ is significantly different, as well as the values of g_{t_f}. In particular: (a) Mt. St. Helens, 1982, line length: data values are initially scattered, until t = t_e − 20, and then become more aligned; α ≈ 2, and E[t_f] overestimates t_e by 1-3 days in all the methods. The uncertainty range is two times larger in Methods 2 and 3 compared to Method 1.

Figure 4: Estimators of t_f based on Method 1. Blue lines assume α as from LLT, red as from HT. The bold line is g_{t_f}. Thin dashed lines bound the 90% confidence interval of the ODE paths, and a thin continuous line is the mean path. A dashed black line marks t_e. Method 1 generally provides a good estimator of t_e, but often only the HT method allows these robust estimates.

Figure 5: Method 2 reduces the overestimation issues of Method 1, but model uncertainty is neglected.
In summary, when α ≈ 2 Method 1 generally provides a good estimator of t_e, as do Methods 2 and 3, which generally have larger uncertainty ranges. Sometimes, when α ≤ 1.6, Method 1 tends to overestimate t_e. Method 2 reduces this issue, but model uncertainty is neglected and the estimate still misses t_e. Method 3 enhances Method 2, and its doubly stochastic nature allows the production of either mean probability values or more conservative 95th percentile values, with a significantly high probability of eruption at time t_e, even when the mean estimate fails the forecast.

Examples of probability forecasts
The estimators defined in the previous section are informed by the entire sequence of data, up to the eruption onset or landslide initiation t_e. This provides useful insight on the validity of the model, but it is not a forecast (Boué et al., 2015). Indeed, in any forecasting problem the sequence of data is available up to a time t1 < t_e, which represents the current time of the forecast. All the data collected after time t1 cannot be considered. In the following figures we display forecasts of t_e based on the FFM methods, obtained from the data collected in a limited time window T = [t2, t1]. We focus on the two examples of Mt. St. Helens, 1982, line length (α ≈ 1.98), and Bezymyanny, 1960, seismic strain (α ≈ 1.65). Figure 7 adopts Method 1, Figure 8 Method 2, and Figure 9 Method 3. Method 1 and the mean pdf in Method 3 both implement a Monte Carlo simulation of 20,000 samples, Method 2 a Monte Carlo simulation of 5,000 samples. The percentile values in Method 3 are based on a hierarchical Monte Carlo simulation of 150,000 samples, that is the product of 300 parameter samples and 500 paths of the SDE solution.
Compared with the estimators in section 5, forecasting results can be significantly uncertain, because they are inherently extrapolations based on fewer data. In Methods 1 and 3, sometimes P{t_f = ∞} > 0, and there is a non-negligible chance that the solution path never hits the real axis. In contrast, if P{t_f < t1} > 0 there is a chance that η(t1) < 0 and the equation is not well defined. The probability of both these events is quantified.
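Quantifying these two events from Monte Carlo output can be sketched as follows, assuming that non-hitting paths are encoded as infinite failure times (an encoding choice of ours, not from the original study):

```python
import numpy as np

def censoring_probabilities(tf_samples, t1):
    """Estimate P{t_f = inf} (the path never hits zero) and P{t_f < t1}
    (failure before the current forecast time, i.e. eta(t1) < 0) from
    Monte Carlo samples, with non-hitting samples encoded as np.inf."""
    tf_samples = np.asarray(tf_samples)
    p_never = np.mean(np.isinf(tf_samples))
    p_past = np.mean(tf_samples < t1)
    return p_never, p_past

# five illustrative samples: two paths never fail, one fails before t1 = 20
p_never, p_past = censoring_probabilities([12.0, np.inf, 35.0, np.inf, 50.0], t1=20.0)
# -> (0.4, 0.2)
```

The remaining probability mass, 1 − p_never − p_past, corresponds to well-defined forecasts with t1 ≤ t_f < ∞.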
In our examples we consider three time windows progressively moving towards t_e, including new observations and neglecting the old data. In particular, (a,d) rely on the initial scattered data, (b,e) include both scattered and aligned observations, and (c,f) generally leave out the scattered data. We remark that α remains fixed for simplicity.
In general, uncertainty is always reduced while getting closer to t_e. In summary, the forecasting results of Method 1 and Method 3 are similar, but the more complex UQ related to Method 3 improves its performance. In particular, when the forecast is not well constrained, Method 3 generally reduces the uncertainty range of the estimates compared to Method 1. Method 2 tends to give a correct forecast only when the eruption is close. The doubly stochastic formulation of Method 3 appears to have an impact, and the 95th percentile of the eruption probability is significant at time t_e.

Discussion
We described three different methods for estimating t_f: the ODE-based Method 1, the new SDE-based Method 2, and its doubly stochastic formulation, Method 3. We tested the methods in four case studies, and in two of them we also performed forecasts on moving time windows. Figure 10 summarizes the likelihood g_{t_f}(t_e), reported as a probability percentage. Plot (a) compares Method 1 (black bars) and Method 2 (colored bars). Method 1 always outperforms Method 2 when α is based on the more accurate HT (red bars), and provides likelihoods above 15%. In contrast, when α is based on LLT (blue bars), the two methods provide lower likelihoods, below 1% in some cases. Plot (b) displays the likelihood provided by the doubly stochastic Method 3. Full colored bars report the mean likelihood, shaded bars the 95th percentiles of the likelihood. Mean likelihoods are very similar to or above those provided by Method 2. The 95th percentile values are significantly higher. In particular, when α is based on LLT (blue bars), the Method 3 percentiles are all higher than in Method 1. Figure 11 summarizes the corresponding g_{t_f}(t_e) for the moving-window forecasts. Plot (a) compares Method 1 (black bars) and Method 2 (colored bars). In Mt. St. Helens, 1982, line length (blue), Method 2 outperforms Method 1 only in the third time window, with the only likelihood above 10%. In Bezymyanny, 1960, seismic strain (red), Method 2 outperforms Method 1 in the first two time windows, with likelihoods above 5%. Plot (b) concerns the doubly stochastic Method 3. Full colored bars report the mean likelihood, shaded bars its 95th percentile values. In this case, mean likelihoods are very similar to those provided by Method 1. The 95th percentile values are again significantly higher, from 5% to 10% in the first and second time windows, and above 15% in the third.

Conclusions
In this study, we have introduced a new doubly stochastic method for performing material failure forecasts. The method enhances the well-known FFM equation, introducing a new formulation similar to the Hull-White model. The model is a mean-reverting SDE, which assumes the traditional ODE as the mean solution. The new parameters include the noise standard deviation σ and the mean-reversion rapidity γ. They are estimated based on the properties of the residuals in the original linearized problem. The implementation allows the model to make excursions from the classical solutions, including the possibility of some degree of aleatory uncertainty in the estimation. This provides probability forecasts instead of deterministic predictions.
We compared the new method and the forecasting method based on the classical formulation. We also compared a Hull-White model that does not consider model uncertainty, and its doubly stochastic formulation. The comparison is performed on four historical datasets of precursory signals already studied with the classical FFM, including line-length and fault movement at Mt. St. Helens, 1981-82, seismic signals registered at Bezymyanny, 1960, and surface movement of Mt. Toc, 1963. We also considered forecasting problems over moving time windows, based on the data in the case studies of Mt. St. Helens, 1982 and Bezymyanny, 1960. The data show the performance of the methods across a range of possible values of the convexity α and amounts of scattering in the observations, and the increased forecasting skill of the doubly stochastic formulation in Method 3. The doubly stochastic formulation has a particular impact on the forecasts because it enables the calculation of the 95th percentiles of the probability of failure. These values are generally higher than the mean estimates, and represent the worst-case scenario with a probability of occurrence above 5%. This was not possible in the classical formulation. This approach is the subject of ongoing and future work, with the purpose of further enhancing the robustness of short-term eruption forecasting.

A Sensitivity analysis on the noise properties
Discrete observations provide us with information on K = σ²/(2γ), which is the variance of the solution of the Ornstein-Uhlenbeck process associated with our SDE. However, solutions with the same K can look significantly different, as shown in Figure 3b.

Figure 12: Forecasts of t_f based on Method 2. The solutions assume equal K = σ²/(2γ), but different (σ, γ). In plots (a,c) γ^(−1) = 7.5, and in plots (b,d) γ^(−1) = 30. The bold line is the pdf of t_f. Thin dashed lines bound the 90% confidence interval of the SDE paths, and a thin continuous line is the mean path. The thin dotted lines are random paths. A thin dashed line marks 1/x0, and a dashed black line marks t_e.
The estimators in all our case studies assume γ = 1/15. This choice is based on the empirical observation that the total length of the temporal sequence is at the scale of 45 days, and the duration of well-aligned observations is at the scale of 15 days. In Figure 12 we show examples of solutions with doubled or halved γ. There is an apparent effect on the 90% confidence interval of the SDE paths, which is enlarged by increasing γ, and terminally bent down towards the real axis. This is increased in (c,d), where α = 1.65. However, even in that case the effect on g_{t_f} is minor, and increasing γ fourfold reduces E[t_f] by about 5 days.

B Classical statistical analysis of FFM
In our study we apply a linearized least-squares approach, based on a preliminary estimate of α. Nonlinear regression methods have also been applied to the ODE problem, but in this study we relied on the linearized method for simplicity (Bell et al., 2011). Linear regressive models (LRM) based on different formulations of the differential equation can provide estimates of α. Even if these formulations are algebraically equivalent, the result of the regression can change significantly. The two different methods, LLT and HT, are reported in Voight (1988a) and then further detailed in Cornelius and Voight (1995). In the Log-rate versus Log-acceleration Technique (LLT), a LRM applied to the logarithm of the FFM equation:

log(dX/dt) = α log(X(t)) + log(A),

can produce estimates of α and log(A). It requires an approximation of the rate change, which typically suffers from data scattering. Then, A is not robustly constrained by its logarithm. Moreover, the equation may not be well posed in the case of negative rates, requiring some values to be neglected, or the equation to be applied to X + c > 0.
In the Hindsight Technique (HT), a LRM is applied to the equation (from eq. 3, with t0 = t_f):

log(X(t)) = [1/(1 − α)] log(t_f − t) + log[A(α − 1)]/(1 − α),

producing estimates of 1/(1 − α) and log[A(α − 1)]/(1 − α). It does not rely on the rate change, but requires knowing the failure time t_f in advance. This is the reason for its name. Thus, it is not a method producing forecasts, and can solely be used in retrospective analysis. Moreover, while the value of α is well constrained, the value of A is not. The uncertainty range affecting A is increased by the uncertainty affecting α, and the estimate is made in logarithmic scale. Figure 13 shows the results of the LLT and HT applied to the Mt. St. Helens (a,b), and to the Bezymyanny and Mt. Toc datasets (c,d). We note that the accuracy of HT is generally higher. In our examples we implemented seven datasets already processed in Voight (1988a), discarding four of them. These would require a more detailed UQ of the unprocessed data source. In detail, the Mt. St. Helens tilt datasets show significantly discordant results between LLT and HT, and both datasets are excluded. The uncertainty affecting α in the Bezymyanny dataset according to LLT is very large and is partially supported below 1. The LLT results of the Mt. Toc, 1963 dataset are characterized by α ≈ 1 and a very low scattering, insufficient to define a significant noise.
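As an illustration, the HT regression can be reproduced on exact synthetic data generated from the power-law solution; the helper name and parameter values are ours:

```python
import numpy as np

def hindsight_alpha(t, X, t_f):
    """Hindsight Technique sketch: regress log X(t) on log(t_f - t);
    the slope equals 1 / (1 - alpha), so alpha = 1 - 1 / slope."""
    slope, _ = np.polyfit(np.log(t_f - t), np.log(X), 1)
    return 1.0 - 1.0 / slope

# exact synthetic data from the power-law solution with alpha = 2, A = 0.1:
# X(t) = [A * (alpha - 1) * (t_f - t)]**(1 / (1 - alpha))
t = np.linspace(0.0, 25.0, 40)
X = (0.1 * (30.0 - t)) ** -1.0
# hindsight_alpha(t, X, 30.0) recovers alpha = 2 up to floating point
```

On real data, t_f is unknown and must be replaced by the observed onset t_e, which is why the technique is restricted to retrospective analysis.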