How many strong earthquakes will there be tomorrow?

In this note, we study the distribution of earthquake numbers in both worldwide and regional catalogs: in the Global Centroid Moment Tensor catalog, from 1980 to 2019 for magnitudes Mw 5. 5+ and 6.5+ in the first case, and in the Italian instrumental catalog from 1960 to 2021 for magnitudes Mw 4.0+ and 5.5+ in the second case. A subset of the global catalog is also used to study the Japanese region. We will focus our attention on short-term time windows of 1, 7, and 30 days, which have been poorly explored in previous studies. We model the earthquake numbers using two discrete probability distributions, i.e., Poisson and Negative Binomial. Using the classical chi-squared statistical test, we found that the Poisson distribution, widely used in seismological studies, is always rejected when tested against observations, while the Negative Binomial distribution cannot be disproved for magnitudes Mw 6.5+ in all time windows of the global catalog. However, if we consider the Japanese or the Italian regions, it cannot be proven that the Negative Binomial distribution performs better than the Poisson distribution using the chi-squared test. When instead we compared the performances of the two distributions using the Akaike Information Criterion, we found that the Negative Binomial distribution always performs better than the Poisson one. The results of this study suggest that the Negative Binomial distribution, largely ignored in seismological studies, should replace the Poisson distribution in modeling the number of earthquakes.


Introduction
One of the goals of statistical seismology is to forecast the number of events in a future space-time window. To properly determine the probability of the number of events in the selected space-time window, seismologists use discrete probability distributions, such as the Poisson distribution and (seldom) the Negative Binomial (NB) distribution [1]. In particular, besides long-term forecasting applications (usually known as probabilistic seismic hazard analyses (e.g., Meletti et al. [2], Danciu et al. [3]), the Poisson distribution is also widely used in short-term forecasting (time windows from 1 day to 1 month), both for making earthquake forecasts and for testing models based on independent observations [4][5][6]. In these cases, the time variations of the seismic rates are described by the Omori-Utsu law [7,8], but the number of events in a selected future time window is modeled by a Poisson distribution [4,5]. Another approach involves the epidemic space-time models (e.g., the ETAS model, Ogata [9]); in this case, the number of events in a selected future time window is modeled by computing a large number of simulations [10].
The use of Poisson distribution in short-term forecasting is criticized, since the clustering of seismicity leads to an overdispersed distribution of the number of events [11]. This conclusion is also consistent with recent studies showing that the temporal distribution of seismicity is governed by long-term correlation [12][13][14], a property of earthquake clustering.
The key to the success of the Poisson model for earthquake occurrence mostly relies on its simplicity. However, on the one hand, a Poisson distribution with only one parameter is too simple to capture the variability of the number of seismic events. On the other hand, a distribution based on seismic sequences modeling (e.g., using the ETAS model, [9]), albeit more suitable, is difficult to apply in real-time due to the large number of simulations needed [15]. The NB distribution can be a good compromise: it is more flexible than the Poisson distribution, since it uses two parameters instead of one, and is less complicated than an ETASbased simulation approach.
In this paper, we aim to test whether the NB distribution, which has already been successfully used for modelingthe number of earthquakes on a global scale (worldwide catalog with a time window of 1 year, [16]), can be used for shorter time windows, and whether this distribution has better characteristics compared to the Poisson distribution.

Data
In this study, we use both global and regional catalogs. The first catalog we consider is the Global Centroid Moment Tensor (GCMT) [17,18], from 1980 to 2019 ( Figure 1A), which has already been used in some important studies on earthquake statistics (e.g., 19,20). We selected events with a maximum depth of 50 km and a moment magnitude (Mw) above 5.5 [19]. Other studies suggest that Mw 5.5 may be too optimistic in the first years of the catalog [20] or just after the strongest events [21]. Thus, we also assume Mw 6.5 as a second lower threshold to be used: from this value on, the catalog can be considered complete. These two thresholds are also associated with earthquake and tsunami risk, since events with Mw 5.5+ can cause significant damage to buildings [22], and events with Mw 6.5+ can generate a tsunami wave that cannot be neglected [23]. We also consider a subset of the GCMT catalog, focused on the Japan region (see Figure 1A), to better investigate the characteristics of the Poisson and NB distributions in this very active seismic region. The second catalog we consider is the Italian instrumental catalog with homogenized magnitudes (HORUS catalog, [24]), from 1960 to 2021. Here we have removed the offshore events to achieve reliable completeness ( Figure 1B). We started our tests from a magnitude Mw 4.0, as suggested by the authors of the catalog [24], but also used Mw 5.5 as the second threshold because these two values are used in the Italian operational earthquake forecasting model [5].

Methods
As suggested in the Introduction, to model the number of earthquakes we consider here two discrete probability distributions: Poisson and Negative Binomial.
According to the Poisson distribution, events occur independently of each other with a known constant rate λ. The corresponding probability density function (PDF) is given by: This distribution is widely used in probabilistic seismic hazard analysis [25], but it does not take into account the well-.
/fams. .  known property of earthquakes to cluster in space and time. Indeed, the Poisson variance is too small to reflect the actual distribution of the number of earthquakes [11]. An overdispersed discrete distribution that better reflects the branching nature of the seismic events is the Negative Binomial distribution. The branching nature of earthquakes reflects their triggering property: when an earthquake occurs, the probability of a repeated earthquake in a close space-time window increases [11]. Therefore, the branching nature of earthquakes leads to the observed spatiotemporal clustering of seismicity. Kagan [11] provides some theoretical arguments relating seismicity clustering to NB distribution. Due to its simple formulation, NB is also the preferred choice for practical applications over other overdispersed distributions, such as the generalized Poisson distribution [26,27]. The PDF of the NB distribution is given by: Unlike the Poisson distribution, which is just based on the rate parameter (λ), the NB distribution depends on two parameters (r, p), the second of which can be used to characterize overdispersion in the seismic process (i.e., earthquake clustering). In this paper, we estimate the value of the parameters of the two distributions under study using the classical Maximum-Likelihood Estimation (MLE) technique: where

Goodness-of-fit test and Akaike Information Criterion
To test the theoretical distributions against observations (from the seismic catalog), we use the well-known chi-squared goodness of fit test [28,29]. This test is suitable for comparing observations with discrete probability distributions (as in our case for the number of seismic events in a selected time window), whose parameters are estimated from the same dataset of observations [30]. A classic p-value threshold used to reject a hypothesis is 0.05; since here we performed) 6 tests for each model and for each of the three datasets (world, Japan, Italy, we must use a correction to avoid the "multiple testing problem" [31]. Therefore, using the Bonferroni correction we consider p-value 0.05/6 = 0.0083 as the lower limit of the threshold 0.05/6 = 0.0083 [32]. The Akaike Information Criterion (AIC, [33]) is a classical method used to compare the performances of two or more models   with different number of parameters: where k is the number of model parameters, andL is the maximum value of the likelihood. The smaller the AIC, the better the model. Here we use AIC to compare the performances of the Poisson and the Negative Binomial distributions.

Result and discussion
Our results are presented in Tables 1-3. Table 1 shows that, in the short-term, the Negative Binomial distribution fits well for events with Mw 6.5+ in all considered time windows of 1, 7, and 30 days in the case of the GCMT catalog. In fact, the pvalues of the chi-squared test (Table 1) here increasingly exceed 0.05/6 = 0.0083 (two out of three are very large, i.e., >0.3), which demonstrates the best performance of the NB distribution for Mw 6.5+ events. This good fit can also be seen in the AIC values, which are always lower for the NB distribution than for the Poisson distribution. The results just discussed are valid for Mw 5.5+ events and the time window of 1 month in the global catalog: the Poisson distribution should be rejected, while the NB distribution is not. As for the chi-squared test for the 1 and 7 days cases and Mw 5.5+ GCMT events, the p-values associated with the NB distribution are significantly greater than or the Poisson distribution (several orders of magnitudes higher); however, both distributions fail the goodness-of-fit test (small absolute p-values). However, the results of the AIC speak in favor of the NB distribution for all the considered time windows (1, 7, and 30 days). Now let's look at the results for the Italian and Japanese catalogs. Due to the low degrees of freedom, the chi-squared test applied to these spatial shorter-scale catalogs provides no explanation for almost any of the considered time intervals. We get multiple Not Available p-values, as shown in Tables 2, 3. However, the cases in which the p-value is calculated are in complete agreement with the results referring to the global catalog. In particular, we observe that for Japanese events Mw 5.5+ within 1 month, the results of the chisquared test favor the NB distribution. Also, similarly to the global catalog and despite the fruitless chi-squared test, the AIC values for the Poisson distribution are always higher than for NB. This allows us to vote again in favor of the NB distribution.
In general, we can conclude that, in the short term, the NB distribution should be preferred over the Poisson distribution, especially when considering significant seismicity on a global scale. This is consistent with Kagan [11] study, which states that the NB distribution "is clearly a better approximation" to the distribution of the number of annual earthquakes. In Figures 2-4 we show the empirical distribution of the observations (i.e., histograms) along with the estimated Poisson and Negative Binomial distributions for all the time windows. Looking at these figures, we can appreciate the differences between the two distributions (e.g., Figure 2 Mw 5.5+ 30 days, or Figure 4 Mw 4.0+ 30 days); in some cases, the distributions seem very similar because in the linear scale of these figures is not possible to appreciate the differences in bins with a low number of events (e.g., Figure 2 Mw 5.5+ 7 days, for more than 15 events).

Conclusion
In this study, we tested two different probability distributions for the number of earthquakes in short-time windows (1 day, 7 days, 30 days) using worldwide seismic data (Mw 5.5+ and 6.5+ from 1980 to 2019), data from the same catalog but with a focus on Japan, and regional seismic data from Italy (Mw 4.0+ and 5.5+ from 1960 to 2021). As already shown in previous studies, we found that the Poisson distribution cannot properly describe the number of events. Conversely, the Negative Binomial distribution performed better, especially for large magnitude events (Mw 6.5+) of the global catalog, for all considered time windows. However, in the Japanese and Italian regional catalogs, the Negative Binomial distribution fails to describe the number of events, especially in Italy for magnitudes Mw 4.0+ and 5.5+, although with better performances compared to the Poisson distribution. These new results demonstrate the power of the Negative Binomial distribution in forecasting the number of earthquakes in short-time windows, in particular if we compare its performance with that of the Poisson distribution. Our findings for time windows of 1 and 7 days could be especially useful for short-term earthquake forecasting. Both models based on the Omori-Utsu law and epidemic models (like ETAS) are widely used for short-term earthquake forecasting, along with the Poisson distribution to model the number of events in a future time window; our results suggest also considering the Negative Binomial distribution to model the number of events.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author. The data and the code used in this paper are available in the repository https://github.com/ MatteoTaroniINGV/NegativeBinomialDistribution (Last access: June 2023).