
Edited by: Jiancang Zhuang, Institute of Statistical Mathematics, Japan

Reviewed by: Andrea Llenos, United States Geological Survey (USGS), United States; Robert Shcherbakov, Western University, Canada

This article was submitted to Statistical and Computational Physics, a section of the journal Frontiers in Applied Mathematics and Statistics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The epidemic type aftershock sequence (ETAS) model is widely used to model seismic sequences and underpins operational earthquake forecasting (OEF). However, it remains challenging to assess the reliability of inverted ETAS parameters, for several reasons. For example, the most common algorithms return only point estimates with little quantification of uncertainty. At the same time, Bayesian Markov chain Monte Carlo implementations remain slow to run, do not scale well, and few have been extended to include spatial structure. This makes it difficult to explore the effects of stochastic uncertainty. Here, we present a new approach to ETAS modeling using an alternative Bayesian method, the integrated nested Laplace approximation (INLA). We have implemented this model in a new R-Package called

The epidemic type aftershock sequence (ETAS) model [

There are many different implementations of the ETAS model. The most common approach for determining ETAS parameters is the maximum-likelihood method which returns a point estimate on the ETAS parameters using an optimization algorithm (e.g., [

The integrated nested Laplace approximation (INLA [

To date, the main limitation of the application of

Here, we present a broad analysis of how the

In this section, we introduce our

The temporal ETAS model is a marked Hawkes process model, where the marking variable is the magnitude of the events. The ETAS model is composed of three parts: a background rate term, a triggered events rate representing the rate of events induced by past seismicity, and a magnitude distribution independent of space and time. Given the independence between the magnitude distribution and the occurrence times of the events, the ETAS conditional intensity is usually the product of a Hawkes process model describing only the occurrence times and a magnitude distribution π(

More formally, the ETAS conditional intensity function is evaluated at a generic time point t ∈ (T_1, T_2), with T_1, T_2 ≥ 0 and T_1 < T_2, having observed the preceding events. Here, M_0 is the minimum magnitude in the catalog, which needs to be completely sampled, and λ_Hawkes is the conditional intensity of a temporal Hawkes process describing the occurrence times only. In our ETAS implementation, this is given by the following:

In seismology, the magnitude distribution, π(m), is typically taken to be the Gutenberg–Richter distribution. Hereafter, we refer to λ_Hawkes as simply λ.
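For readers who prefer code to notation, the conditional intensity can be sketched in a few lines. The following is an illustrative Python translation, not the package code; it assumes the commonly used ETAS triggering form K e^{α(m_i − M_0)}((t − t_i)/c + 1)^{−p}, and all function and argument names are ours:

```python
import math

def etas_intensity(t, history, mu, K, alpha, c, p, M0=2.5):
    """Conditional intensity of the temporal ETAS (Hawkes) model.

    history: list of (t_i, m_i) pairs of past events.
    Assumes the common parametrization
        lambda(t) = mu + sum_{t_i < t} K * exp(alpha*(m_i - M0))
                                         * ((t - t_i)/c + 1)**(-p)
    """
    rate = mu  # background rate
    for t_i, m_i in history:
        if t_i < t:  # only past events contribute
            rate += K * math.exp(alpha * (m_i - M0)) * ((t - t_i) / c + 1.0) ** (-p)
    return rate
```

With an empty history the intensity reduces to the background rate μ, as expected from the formula.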

The Hawkes process is implemented in

Having observed a catalog of events t_i, the log-likelihood contains the expected number of background events, Λ_0(T_1, T_2), and the remaining summation, which is referred to as the sum of the number of events triggered by each event t_i, namely Λ_i(T_1, T_2). We approximate the integral by linearizing the functions Λ_0(T_1, T_2) and Λ_i(T_1, T_2). This means that the resulting approximate integral is the sum of

However, we concluded that this approximation alone is not sufficiently accurate. To increase the accuracy of the approximation, for each integral Λ_i(T_1, T_2), we further consider a partition of the integration interval [max(T_1, t_i), T_2] into n_i bins,

Substituting Equations (5) into (3), the Hawkes process log-likelihood can be written as follows:
θ*. Other choices led to a non-convergent model [. For a generic function of the parameters, the linearized version with respect to a point θ* is given by a truncated Taylor expansion:

For each event, time binning is used in Equation (5) to improve the accuracy of the integration of the term describing the sum of the number of triggered events. The binning strategy is fundamental because the number of bins determines, up to a certain limit, the accuracy of this component of the approximation. Considering more bins enhances the accuracy of the approximation but increases the computational time because it increases the number of quantities to be approximated. Moreover, we cannot reduce the approximation error to zero: as the number of bins increases, the numerical value of the integral in each bin goes to zero, which can be problematic in terms of numerical stability. We found that, for the ETAS model considered here, approximately 10 bins for each observed point is usually enough, and that it is best to use higher-resolution bins close to the triggering event. In fact, the function varies most rapidly close to t_i and becomes almost constant moving away from t_i. This means that we need shorter bins close to t_i, to capture the variation, and wider bins far from t_i, where the rate changes more slowly.

We choose a binning strategy defined by three parameters Δ, δ > 0, and n_max. The bins relative to an event t_i are given by the following:

where n_i ≤ n_max is the number of bins for event t_i, and n_max regulates the maximum number of bins.

This strategy presents two advantages. The first is that we have shorter bins close to the point t_i and wider bins as we move away from that point. The second is that the first (or second, or third, or any) bin has the same length for all points. This is useful because the integral in a bin is a function of the bin length and not of the absolute position of the bin. This means that we need to calculate the value of the integral in the first (second, third, or any) bin only once and reuse the same result for all events. This significantly reduces the computational burden.
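The geometric binning described above can be sketched as follows. This is an illustrative Python version under the assumption that breakpoints take the form t_i + Δ(1 + δ)^k, which reproduces the two properties just described (a common first-bin length Δ for all events, and widths growing with distance); the exact breakpoints used by the package may differ:

```python
def time_bins(t_i, T2, delta_t, coef_t, n_max):
    """Geometric time bins for integrating the triggering function.

    Assumed breakpoints: t_i + delta_t * (1 + coef_t)**k, so bins are
    short near t_i and widen with distance; n_max caps the number of
    breakpoints, and the last bin always ends at T2.
    """
    pts = [t_i]
    k = 0
    while k <= n_max:
        b = t_i + delta_t * (1.0 + coef_t) ** k
        if b >= T2:
            break
        pts.append(b)
        k += 1
    pts.append(T2)  # close the final (possibly wide) bin at T2
    return list(zip(pts[:-1], pts[1:]))
```

Note that the first bin is [t_i, t_i + Δ] regardless of t_i, which is what allows the per-bin integrals to be computed once and reused across events.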

This section and the next one illustrate what we need to provide to

To build an ETAS model, we need to provide functions for each of the components of the likelihood function (Equation 6). The linearization and the finding of the mode ^{*} are managed automatically by the

and

For full details, refer to Serafini et al. [

This subsection is helpful for those wanting to understand the source code; it describes a computational trick that might appear unusual at first. It can be skipped without loss of understanding of the mathematics.

Our implementation in

More formally, INLA has the special feature of allowing the user to work with Poisson counts models with exposures equal to zero (which would otherwise be improper). A generic Poisson model for counts y_i and exposures e_i, i = 1, …, n, has log-intensity log λ_P. The table below lists the components, with N_h the number of bins in the approximation of the expected number of induced events by observation

Hawkes process log-likelihood components approximation.

| Part | log λ_P | Counts and exposures |
|---|---|---|
| Part I | log Λ_0(T_1, T_2) | y_i = 0, e_i = 1 |
| Part II | log Λ_h(t_i, b_{i,h}) | y_i = 0, e_i = 1 |
| Part III | log λ(t_i) | y_i = 1, e_i = 0 |

Furthermore, we need to provide the counts y_i, the exposures e_i, and the information on the events t_i representing the different log-likelihood components, and to provide the functions
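The zero-exposure construction in the table above can be checked numerically. Below is a small Python sketch of our own (with the normalizing log y! term dropped, since it does not affect inference): rows with y = 0 and e = 1 reproduce the negative integral terms −Λ, while rows with y = 1 and e = 0 reproduce the +log λ(t_i) event terms, so the surrogate Poisson likelihood sums to the approximate Hawkes log-likelihood:

```python
import math

def poisson_term(y, e, log_lam):
    """Poisson log-likelihood contribution (dropping log y!) for count y
    with exposure e and log-intensity log_lam: y*log_lam - e*exp(log_lam)."""
    return y * log_lam - e * math.exp(log_lam)

def surrogate_loglik(integral_terms, event_log_intensities):
    """Sum of the surrogate Poisson contributions.

    integral_terms: the Lambda values of Parts I and II (y=0, e=1 rows),
    each contributing -Lambda.
    event_log_intensities: log lambda(t_i) of Part III (y=1, e=0 rows),
    each contributing +log lambda(t_i).
    """
    ll = sum(poisson_term(0, 1, math.log(L)) for L in integral_terms)
    ll += sum(poisson_term(1, 0, gl) for gl in event_log_intensities)
    return ll
```

Summing the three parts therefore recovers the familiar −Λ_0 − Σ_i Λ_i + Σ_i log λ(t_i) structure of the Hawkes log-likelihood.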

More detail on how to build the functions in the

We have to set the priors for the parameters. The INLA method is designed for latent Gaussian models, which means that all the unobservable parameters have to be Gaussian. This seems in contrast with the positivity constraint of the ETAS parameters μ,

Our idea is to use an internal scale where the parameters have a Gaussian distribution and to transform them before using them in log-likelihood components calculations. We refer to the internal scale as INLA scale and to the parameters in the INLA scale as _{Y}(·).
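As an illustration of such a transformation, the probability integral transform η = F⁻¹(Φ(θ)) maps a standard Gaussian internal parameter θ to a parameter η with any target prior CDF F. For log-Normal and uniform targets, the inverse CDF has a closed form, so the map can be written with the standard library alone (a Gamma target, as used for μ, additionally needs a Gamma quantile function, e.g., from scipy). The function names below are ours:

```python
import math

def std_norm_cdf(theta):
    """Standard Gaussian CDF, Phi(theta), via the error function."""
    return 0.5 * (1.0 + math.erf(theta / math.sqrt(2.0)))

def lognormal_transform(theta, mean_log, sd_log):
    """F^{-1}(Phi(theta)) for a log-Normal target simplifies to a
    direct map: exp(mean_log + sd_log * theta)."""
    return math.exp(mean_log + sd_log * theta)

def uniform_transform(theta, lo, hi):
    """F^{-1}(Phi(theta)) for a Uniform(lo, hi) target."""
    return lo + (hi - lo) * std_norm_cdf(theta)
```

For example, an internal value θ = 0 maps to the prior median in both cases, since Φ(0) = 0.5.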

The

The package

The prior for μ is the one that will most commonly need to be modified, as it changes with the size of the domain being modeled. We choose a Gamma(shape = a_μ, rate = b_μ) prior for μ. The mean of the distribution is given by a_μ/b_μ = 1 event/day; the variance, a_μ/b_μ², is likewise determined by a_μ and b_μ. There is then some trade-off in the variance and skewness parameters.

Samples drawn from the priors used in this article are shown in

The ETAS parameters themselves are not easy to interpret, given that it is their combination in the Omori decay and triggering functions that we are most interested in. We draw 1,000 samples from the prior to generate samples of the Omori decay, the triggering function for an M4 event, and the triggering function of an M6.7 event (

The function

Description of the model definition contained in list input.

Catalog of event times and magnitudes | ||

catalog | The input catalog as it is provided, with at least a set of times and magnitudes | |

catalogue.bru list([ts, magnitude, idx.p]) | The input catalog in the format needed for inlabru. For each event, we have [time, magnitude, id] | |

Time domain varied in Sections 3.3, 3.4 | ||

time.int | The provided start and end date in string format | |

T12 double [T1, T2] | The start and end date as number of days from the provided starting date | |

lat.int double | [–90,90] | Min and max latitude bounds for filtering the catalog |

lon.int double | [–180,180] | Min and max longitude bounds for filtering the catalog |

M0 double | 2.5 | Minimum magnitude for the model domain |

Varied in Section 3.1 | ||

mu.init double | 0.3 | Initial guess for the background rate, μ |

K.init double | 0.1 | Initial guess for K |

alpha.init double | 1 | Initial guess for α |

c.init double | 0.2 | Initial guess for c |

p.init double | 1.1 | Initial guess for p |

A list of functions used to transform the parameters from the internal scale to the ETAS scale | ||

See Section 2.6 for definition | ||

a_mu double | 0.5 | Gamma distribution shape parameter |

b_mu double | 0.5 | Gamma distribution rate parameter |

a_K double | -1 | log-Normal distribution mean |

b_K double | 0.5 | log-Normal distribution standard deviation |

a_alpha double | 0 | Min of a uniform distribution |

b_alpha double | 10 | Max of a uniform distribution |

a_c double | 0 | Min of a uniform distribution |

b_c double | 1 | Max of a uniform distribution |

a_p double | 1 | Min of a uniform distribution |

b_p double | 2 | Max of a uniform distribution |

See Section 2.3 | ||

Nmax int | 8 | Value of the parameter _{max} in Equation (10) |

coef.t double | 1.0 | Value of the parameter δ in Equation (10) |

delta.t double | 0.1 | Value of the parameter Δ in Equation (10) |

See bru documentation | ||

bru.verbose int | 3 | Type of visual output from |

bru_max_iter int | 100 | Maximum number of |

num.threads int | 5 | Number of cores used in each |

inla.mode string | “experimental” | Type of approximation used by INLA |

bru.initial: th.mu, th.K, th.alpha, th.c, th.p | list[double[5]] | Initial trial parameters on the internal scale. These are calculated using the inverse of the copula transformation functions in

max_iter int | 100 | Maximum number of iterations for the inlabru algorithm. The number of iterations will be less than this number if the algorithm has converged

max_step | NULL | This parameter controls how far the parameter values can jump from one iteration to the next. The greater the value, the greater the potential jump. Setting a value different from NULL prevents the

This information will be passed to

In the Section 3, we vary the catalog, start times, and the initial set of trial parameters. To achieve this, we create a default

Once we call

The iterative process is illustrated in

Schematic diagram showing the

We start with a set of trial ETAS parameters θ_0 = (μ_0, K_0, α_0, c_0, p_0), which will be used as the linearization point for the linear approximation. These initial values should lie within their respective priors. They could be sampled from the priors, but then a very unrealistic parameter combination might be chosen. These parameters will be updated in each loop of the algorithm. Extreme values on the internal scale (e.g., < 10^{−5} or > 20) should be avoided. Usually, setting all the parameters to 1 (except

From the R-INLA output, we extract the modes of the approximated posteriors, which will generally differ from the trial values θ_0 used as the starting point. The approximate posterior mode tends to the true one as the iterations run.

At this point, we have the initial set of trial ETAS parameters θ_0 that were used as the linearization point, and the posterior modes derived from R-INLA ^{1}

Convergence is evaluated by comparing θ* and θ_0. By default, convergence is established when the difference between each parameter pair is less than 1% of the parameter's posterior standard deviation. The value of 1% can be modified by the user. If convergence has not been achieved and the maximum number of iterations has not been reached, we set
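The stopping rule can be expressed compactly. The sketch below (our own naming, not the package's internals) compares each parameter's change against a fraction of its posterior standard deviation:

```python
def converged(theta_star, theta_0, sd, tol_frac=0.01):
    """Stopping rule sketch: declare convergence when every parameter's
    change between iterations is below tol_frac (default 1%) of its
    posterior standard deviation."""
    return all(abs(a - b) < tol_frac * s
               for a, b, s in zip(theta_star, theta_0, sd))
```

Scaling the tolerance by the posterior standard deviation makes the criterion insensitive to how well each individual parameter is resolved.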

The final component of this article is the production of synthetic catalogs to be analyzed. The synthetics are constructed by leveraging the branching structure of the ETAS model. Specifically, for temporal models, background events are placed randomly in the provided time window with a rate equal to μ. Then, the offspring of each background event are sampled from an inhomogeneous Poisson process with intensity given by the triggering function. This process is repeated until no offspring are generated in the time frame chosen for the simulation.
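A minimal version of this branching simulation can be written directly from the description above. The sketch below is ours, not the package's simulator: it assumes the triggering form K e^{α(m−M_0)}((t−t_i)/c + 1)^{−p} with p > 1, a Gutenberg–Richter (exponential) magnitude law with rate β above M_0, and draws each event's offspring count from a Poisson distribution with mean equal to the integral of its triggering function over the remaining window:

```python
import math
import random

def simulate_etas(mu, K, alpha, c, p, beta, M0, T, seed=1):
    """Branching ETAS simulation sketch on [0, T] (times in days)."""
    rng = random.Random(seed)

    def gr_mag():
        # Gutenberg-Richter magnitudes: exponential with rate beta above M0
        return M0 + rng.expovariate(beta)

    def n_poisson(lam):
        # Knuth's method; adequate for the modest expected counts used here
        L, k, prod = math.exp(-lam), 0, rng.random()
        while prod > L:
            k += 1
            prod *= rng.random()
        return k

    # generation 0: homogeneous Poisson background with rate mu
    gen = []
    t = rng.expovariate(mu)
    while t < T:
        gen.append((t, gr_mag()))
        t += rng.expovariate(mu)
    catalog = list(gen)

    # breed offspring generation by generation until none land in [0, T]
    while gen:
        nxt = []
        for t_i, m_i in gen:
            # expected offspring in (t_i, T] under the assumed Omori kernel
            A = 1.0 - ((T - t_i) / c + 1.0) ** (1.0 - p)
            lam = K * math.exp(alpha * (m_i - M0)) * c / (p - 1.0) * A
            for _ in range(n_poisson(lam)):
                # inverse-transform sample of the truncated Omori waiting time
                u = rng.random()
                s = c * ((1.0 - u * A) ** (1.0 / (1.0 - p)) - 1.0)
                nxt.append((t_i + s, gr_mag()))
        catalog.extend(nxt)
        gen = nxt
    return sorted(catalog)
```

By construction, every offspring time lies inside the window, and the recursion terminates once a generation produces no events, mirroring the procedure described in the text.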

We use M_0 = 2.5, which is motivated by catalogs such as those in the Apennines of Italy or for Ridgecrest in California.

We also use two different scenarios, a seeded version of these catalogs where we impose an extra M6.7 event on day 500, and an unseeded catalog where the events are purely generated by the ETAS model. This leads to catalogs which are relatively active in the former case and relatively quiet in the latter case. Using these scenarios, we can generate different stochastic samples of events to produce a range of catalogs consistent with these parameterizations.

The associated R Markdown notebook in the GitHub repository^{2}

We present the performance of the

All of the analyses start with one or more catalogs of 1,000 days in length, generated using a constant background rate of μ = 0.1 events/day above a constant magnitude threshold of M_0 = 2.5; the true ETAS parameters are listed in the top row of

The true ETAS parameters and the four sets of different initial conditions used in analyzing the catalogs in

| Parameter set | μ | K | α | c | p |
|---|---|---|---|---|---|
| True parameters | 0.1 | 0.089 | 2.29 | 0.11 | 1.08 |
| Trial parameter set 1 | 0.05 | 0.01 | 1.0 | 0.05 | 1.01 |
| Trial parameter set 2 | 5.0 | 1.0 | 5.0 | 0.3 | 1.5 |
| Trial parameter set 3 | 0.1 | 0.089 | 2.29 | 0.11 | 1.08 |
| Trial parameter set 4 | 0.3 | 0.1 | 1.0 | 0.2 | 1.01 |

A consequence of choosing the 1,000-day window is that there will not be a fixed number of events when comparing different samples as some samples contain large events while others are relatively quiet. We make this choice because we believe that it represents the closest analogy to the data challenge faced by practitioners. However, we will be explicit in exploring the implications of this choice.

Within the sequences, there are three different timescales, or frequencies, that inter-relate: the duration of the synthetic catalogs, the background rate, and the rate at which aftershocks decay (e.g., Touati et al. [

Where algorithms require an initial set of trial starting parameters, it is important to test whether the results are robust irrespective of the choice of starting conditions. We explore the influence of the initial conditions by generating two catalogs (refer to

Two catalogs we use when varying the starting point for the ETAS parameters.

The first catalog is relatively quiet and has only 217 events (

Posteriors for the inversion of the catalogs in

In the second catalog, we have seeded an M6.7 event on day 500 (

All of these models find similar posteriors irrespective of the initial set of trial ETAS parameters. It is important that the priors are set broad enough to allow the posteriors to resolve the true value. This is particularly evident for the quieter model, where the posteriors on the triggering parameters rely on more information from the priors.

In real catalogs, the prior for the background rate needs to be set with care because, when considering a purely temporal model, it will vary depending upon the spatial extent being considered; i.e., the background rate of a temporal model considering a global dataset would be significantly higher than one for California alone. Furthermore, changing the lower magnitude limit significantly changes the total number of background events.

It is difficult to interpret the ETAS posteriors directly; thus, we explore the triggering function by sampling the parameter posteriors 100 times, calculating the triggering functions for these posterior samples, and plotting the ensemble of triggering functions (
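This ensemble construction is straightforward to reproduce. The sketch below uses our own naming and assumes the common ETAS triggering parametrization K e^{α(m−M_0)}(t/c + 1)^{−p}; it evaluates the triggering function for each posterior sample and reports the pointwise spread:

```python
import math

def trig_rate(t, K, alpha, c, p, m, M0=2.5):
    """Triggering-function value at lag t (days) for a magnitude-m parent,
    assuming the common Omori-type form K*exp(alpha*(m-M0))*(t/c + 1)**(-p)."""
    return K * math.exp(alpha * (m - M0)) * (t / c + 1.0) ** (-p)

def ensemble_envelope(samples, t, m):
    """Pointwise min/max of the triggering function over posterior samples;
    each sample is a (K, alpha, c, p) tuple."""
    vals = [trig_rate(t, *s, m=m) for s in samples]
    return min(vals), max(vals)
```

Evaluating the envelope on a grid of lags for, say, an M4 and an M6.7 parent reproduces the kind of uncertainty bands shown in the figures.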

Propagation of ETAS parameter uncertainty on the triggering functions. We take 100 samples of the ETAS posteriors for the 1,000-day quiescent baseline

When incorporating the magnitude dependence, any bias or uncertainty in α becomes important. The figures show the triggering functions for magnitude 4.0 and 6.7 events over 24 hours. While the triggering functions for the M4 events are nested similarly to the Omori sequences, the triggering functions for the M6.7 event are systematically different. The posteriors from the training catalog that was seeded with an M6.7 event result in an initial event rate 50% higher, and the two distributions barely overlap.

We conclude that the choice of training data could have a significant effect on the forecasts of seismicity rate after large events. In the next section, we explore the robustness of these results to stochastic uncertainty in the training catalogs.

We extend the analysis of the previous section to explore the impact of stochastic variability. We produce 10 synthetic catalogs for both the unseeded and M6.7-seeded scenarios and compare the posterior distributions of the parameters.

In the family of catalogs where we did not seed a large event (

A total of 10 synthetic catalogs based on the baseline model of 1,000 days with background events but no large seeded event. All parameters are the same between the runs, and these runs capture only the stochastic uncertainty. These are all inverted using

Considering the 10 seeded catalogs (

A total of 10 synthetic catalogs based on the baseline model of 1,000 days with background events and an M6.7 event on day 500. All parameters are the same between the runs, and these runs capture only the stochastic uncertainty. These are all inverted using

Posteriors that explore the impact of stochastic variability on the inverted ETAS parameters using

There is always a trade-off between α and

Studies which are seeking to assign a physical cause to spatial and/or temporal variability of the background and triggering parameters should ensure that the variability cannot be explained by the stochastic nature of finite earthquake catalogs. The methods presented here provide one possible tool for doing this.

The runtime for a model with 2,000 events is 6 min on a laptop. We find that not only is our

The inversions of synthetic data presented here show that the stochastic variability in the training catalogs produces understandable variability in the posteriors. More data and sequences containing both triggered sequences and background allow us to resolve all parameters well. Better resolution of α and

The motivation for applying the ETAS model is sometimes the presence of an "interesting" feature, such as an evolving or complex aftershock sequence following a notable event. In this section, we explore whether it is important to have both quiet periods and the aftershock sequence itself for accurately recovering the true parameterization. This motivates defining what a representative sample looks like; evaluating this in practice is non-trivial, but we can outline what is insufficient.

We start with the 1,000-day catalog including an M6.7 event seeded on day 500. We then generate catalog subsets by eliminating the first 250, 400, 500, and 501 days of the catalog (start dates of the subcatalogs shown as vertical dashed lines in

First, we consider what happens to the posterior of the background rate, μ, as the length of the subcatalogs is shortened. With 500 days of background before the seeded event, we resolve μ accurately. As the quiet background is progressively removed, the model estimate of μ systematically rises. When there are between 100 and 250 days of background data, the mode overestimates the data-generating parameter by approximately 30%, but the true value still lies within the posterior distributions (turquoise and brown curves for μ in

All of the models, apart from the one starting on day 501, contain the M6.7 event and 500 days of its aftershocks (

The results already presented in

The model where the mainshock was not part of the sub-catalog was particularly biased (pink curve in

In the previous section, we explicitly cropped out subcatalogs and ran the analysis on that subset of the data, effectively throwing the rest away. We demonstrated the consequences of removing the M6.7 mainshock from the sub-catalog being analyzed: the posteriors on the triggering parameters and the background rate became significantly biased (

In

In practice, this complicates the implementation of the time binning because the contribution of events occurring prior to the start of the model domain only needs to be evaluated from "T1" onward. Breaking the similarity of the time bins incurs a penalty in the speed of the implementation.

The results of conditioning the inversion using the historic events can be seen in

In the analysis of real catalogs, this effect will be particularly relevant when there have been very large past earthquakes, which are still influencing today's rates.

Finally, we explore the effect of short-term incompleteness after large mainshocks on the inverted parameters. This rate-dependent incompleteness occurs because it is hard to resolve the waveforms of small earthquakes when they are overprinted by many larger events, yet the effect is short-lived.

We take a 1,000-day catalog with an M6.7 event seeded on day 500 and then introduce a temporary increase in the completeness threshold after the M6.7 event using the functional form suggested by Helmstetter et al. [, where m_i and t_i are the magnitude and occurrence time of the event for which we are modeling the incompleteness,
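To make the construction concrete, here is a Python sketch of this thinning step. It is our own illustration, assuming a Helmstetter-style form M_c(t) = m_i − G − H·log₁₀(t − t_i) with the commonly quoted constants G = 4.5 and H = 0.75 (times in days), floored at M_0; the constants and helper names are assumptions, not necessarily the exact values used in the article:

```python
import math

def mc_t(m_main, dt_days, G=4.5, H=0.75, M0=2.5):
    """Time-dependent completeness threshold after a mainshock:
    Mc(t) = max(M0, m_main - G - H*log10(t - t_main))."""
    if dt_days <= 0:
        return float("inf")  # guard: no threshold before the mainshock
    return max(M0, m_main - G - H * math.log10(dt_days))

def apply_incompleteness(catalog, M0=2.5):
    """Drop events below the running completeness threshold implied by
    all earlier retained events; catalog is a list of (t, m), t in days."""
    kept = []
    for t, m in catalog:
        thresh = max([M0] + [mc_t(mj, t - tj, M0=M0)
                             for tj, mj in kept if tj < t])
        if m >= thresh:
            kept.append((t, m))
    return kept
```

Immediately after a large event the threshold is well above M_0, so small aftershocks in the first hours to days are removed, which is exactly the deficit visible in the incomplete catalog.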

We perform the inversion on the original catalog and the one where short-term incompleteness has removed a number of events (

Plots of the complete baseline catalog and the catalog with incompleteness artificially introduced using the functional form suggested by Helmstetter et al. [

Plot comparing the posteriors of the complete and incomplete catalogs presented in

The complete catalog contains 1,832 events, and the incomplete catalog contains 1,469 events (

All of the parameter estimates in the incomplete catalog are now notably more biased, and their standard deviation has not increased to compensate for this, so the true values lie significantly outwith the posteriors (

Propagation of ETAS parameter uncertainty on the triggering functions. We take 100 samples of the ETAS posteriors for the complete 1,000-day catalog with an M6.7 on day 500

The bias in the predicted triggering functions arising from short-term incompleteness is significant and cannot be ignored within an OEF context. Solving this problem within

Having analyzed a range of synthetic datasets, we now emphasize the lessons learned we should carry forward for the analysis of real sequences.

Reliable inversions can only result from data that are representative of the processes the model is trying to capture. This means that datasets need to contain both productive sequences and periods that resolve the background without being overprinted by triggered events. The main difference between α and

Interpreting the results of an ETAS inversion is non-trivial. We advocate the routine analysis of synthetics to understand what it is possible to resolve in principle.

Preconditioning models using large historic events can be significant. By considering samples of the triggering functions, one can pre-determine the magnitude of events that need to be included as a function of time.

The use of synthetic modeling should be particularly important if time-varying background rates are being inferred from the inversion of catalog subsets in moving windows using the ETAS model.

The impact of short-term incompleteness following large events is very significant and needs to be addressed either by raising the magnitude of completeness or by formally modeling the incompleteness. Resolving this for

These are some of the considerations we explored here, but different use cases will present other modeling challenges that can be effectively explored through similar suites of synthetic modeling.

The exploratory approach illustrated here can be used to identify and understand sources of uncertainty and bias in the ETAS parameter posteriors that are derived from the structural and stochastic nature of the training data. We identify the need for a representative sample to contain periods of relative quiescence, as well as sequences with clear triggering behavior, if all parameters are to be well resolved.

Where studies focus solely on active sequences, the background rate can be erroneously estimated to be several times larger than the real background rate, and the triggering parameters erroneously imply more rapidly decaying sequences than the true underlying parameterization would. This implies that caution is needed in studies that allow the parameters to vary in time using windowing methods. While one cannot conclusively rule out that background rates and triggering behaviors may vary, we advocate that a stationary model with constant parameters should be adopted unless there is compelling evidence independent of the ETAS inversion, Colfiorito being a case in point (e.g., [

Rate-dependent incompleteness severely degrades the accuracy of the ETAS inversion and needs to be addressed directly.

The use of synthetic modeling, as presented here, provides a basis for discriminating when variations in the posteriors of ETAS parameters can be explained by deficiencies in the training data, and when there is likely a robust and potentially useful signal. Such exploration requires a fast method for performing the inversion. The interpretation is easier when full posteriors can be compared, as opposed to just having point estimates.

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below:

The writing of this article and the analyses presented in it were led by MN. The ETAS model was implemented within

This study was funded by the Real-Time Earthquake Risk Reduction for a Resilient Europe RISE project, which has received funding from the European Union's Horizon 2020 Research and Innovation Program under grant Agreement 821115. MN and IM were additionally funded by the NSFGEO-NERC grant NE/R000794/1.

All the code to produce the present results is written in the R programming language. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) license to any author accepted manuscript version arising from this submission.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

The Supplementary Material for this article can be found online at:
