Statistical evaluation of earthquake forecast efficiency using earthquake-catalog and fault slip rate in the Sichuan-Yunnan region, China

Epicenter locations are generally adjacent to active faults and past seismicity regions. Past earthquake catalogs and the geometry of the active faults can provide key prior knowledge concerning earthquake forecasts. In this study, we first introduce two straightforward dedicated models, the proximity-to-past-earthquakes (PPE) and proximity-to-mapped-faults (PMF) models, to fit the seismicity in the Sichuan-Yunnan region, China. The hybrid proximity-to-known-sources (PKS) model with the optimized model parameters is then used to estimate the probability of earthquake occurrence. Second, to compare the PKS forecast efficiency to those of different models, retrospective tests are applied to a dataset located in the Sichuan-Yunnan region. The results show that the probability maps derived from PPE, PMF, and PKS have non-uniform Poisson distribution features and that there is forecasting significance for past cases of moderate earthquakes in the test region. Finally, using Molchan error diagram tests, we find that the hybrid PKS model performs better than the other models in the testing region. The unsatisfactory performance of the PMF model for earthquake forecasting may lie both in the incompleteness of the fault database and the lack of consideration of co-seismic ruptures. Therefore, one of the three models can be used as a base model for comparing and evaluating earthquake forecast strategies.


Introduction
More than one-third of the global population experiences earthquakes, with frequent slight or greater damage to lives and property from such events (Marti et al., 2019). However, at present, earthquake forecasting remains a major unsolved scientific problem and is not sufficiently reliable to rapidly predict earthquakes. Seismologists can also contribute to society by performing long-term probabilistic seismic hazard assessments. As a basic input for building codes, such assessments are crucial to ensure good building practices to save lives (Hiemer et al., 2013).
Mapped faults and past earthquakes have historically been the two main types of data used by seismic hazard models to estimate the likelihood of earthquake occurrence . Some short-term models have been built using only cataloged earthquakes, including the short-term earthquake probability (STEP) (Gerstenberger et al., 2005) and epidemic-type aftershock sequence (ETAS) (Ogata, 1988;Ogata, 1998;Gerstenberger and Rhoades, 2010) models, which are based on the decay of aftershock rates and the Omori-Utsu law (Utsu, 1961). Other useful medium-or long-term models, such as the proximity-to-pastearthquakes (PPE) (Kagan and Jackson, 1994;Jackson and Kagan, 1999) and the every-earthquake-a-precursor-according-to-scale (EEPAS) (Rhoades and Evison, 2004;Rhoades and Evison, 2005;Rhoades, 2007) models, rely only on past earthquakes. Meanwhile, long-term models such as the proximity-to-mapped-faults (PMF) model (Rhoades and Stirling, 2012) often use mapped faults with their slip rates. In addition, the increasing availability of crustal movement observational data, such as Global Positioning System (GPS) network data and derived models of crustal strain rate (Bird et al., 2015;Rhoades et al., 2017) are important to estimate the earthquake occurrence rate (Rhoades et al., 2015).
A parallel development is that of hybrid methods, which combine two or more different models to gain predictive skills from a single idea or data source. The results of retrospective and pseudo-prospective analyses have shown that some multiplicative ensembles provide statistically better forecasts than their constituent forecasting models (Rhoades and Stirling, 2012;Rhoades 2013;Gerstenberger et al., 2014;Rhoades et al., 2015;Rhoades et al., 2016). Therefore, the international Collaboratory for the Study of Earthquake Predictability (CSEP) supports methods to evaluate combinations of two or more individual models or to assimilate new gridded covariates into existing models (Schorlemmer et al., 2018).
The robustness of various earthquake forecast models cannot be fully described without first being assessed in diverse regions with distinct seismic features. The Sichuan-Yunnan region provides a natural earthquake forecast experiment because, following the M S 8.0 Wenchuan earthquake, it became more seismically active. For example, two earthquakes of magnitude ≥7.0, the M S 7.0 Lushan and M S 7.0 Jiuzhaigou earthquakes, have occurred in this region in the last 10 years since the M S 8.0 Wenchuan earthquake; however, no earthquakes had occurred over the same time period before the M S 8.0 Wenchuan earthquake. Moreover, substantial infrastructure, such as reservoirs, railway networks, and electricity grids, is spread throughout the Sichuan-Yunnan region. While earthquakes of great intensity occurring due to natural processes cannot be avoided, forewarning can help minimize their often catastrophic and damaging impacts (Herrera et al., 2022). Much work is required regarding probabilistic seismic hazard assessments in this area. Therefore, we introduce individual probabilistic models based on past earthquakes or mapped faults and a hybrid model based on both to estimate the probability of earthquake occurrence; we then take a statistical approach to evaluate the efficiency of earthquake forecasting in the Sichuan-Yunnan region.
Section 2 includes a full description of the mapped faults and cataloged earthquakes that were used in this study. The two individual models and the hybrid model are introduced in Section 3. Section 4 and Section 5 present, successively, the results of the estimated seismicity rate for each model, along with an evaluation of their predictive effectiveness.

Data
We chose the Sichuan-Yunnan region for testing because of the good monitoring capability in this region with respect to seismicity and fault movement. In the test area of the Sichuan-Yunnan region, which is marked in Figure 1, 1,234 earthquakes with magnitudes ≥4.0 and depth ≤50 km were recorded in the China Earthquake Networks Center (CENC) catalog from January 1, 1970, to September 10, 2022, including the M S 8.0 Wenchuan earthquake, the largest earthquake in the test region. Of these, 14 target earthquakes in the test region with M ≥ 6.0 and depth ≤ 50 km that occurred during the three periods of May 20, 2008-May 20, 2011January 1, 2013-January 1, 2016andSeptember 10, 2019-September 10, 2022, listed in Table 1, were used to evaluate the forecast efficiency of the three different models. Figure 1 shows the epicenters of the earthquakes that occurred in the test region.
The Chinese Earthquake Administration (CEA) and other groups launched the Project Crustal Movement Observation Network of China (CMONC) to monitor the crust dynamics using various techniques including GPS. Based on the GPS measurements and the fault locations in the test region, the Second Monitoring and Application Center of CEA calculated the fault rates in the Sichuan-Yunnan region and provided the fault-model datasets for this study ( Figure 2). Figure 2 shows the higher slip rates in the Ganzi-Yushu fault (F5), the fault intersection segment of the Xianshuihe fault (F13), the Longmenshan fault (F10), the Jinpingshan fault (F15), the Anninghe fault (F14), and the Daliangshan fault (F12). The Qinghai-Tibet Plateau Block moves eastward continuously, blocked by the relatively stable Sichuan Basin, and the South China Block in the Sichuan-Yunnan region moves from east to south along these faults at a relatively high slip rate. In Figure 2, four aftershocks of the M S 8.0 Wenchuan earthquake with magnitudes of M S 6+ occurred near the northern segment (latitude >32°N) of the Longmenshan fault (F14), however, the segment of the fault does not have the expected relatively high slip rate (Shen et al., 2009;Xiong et al., 2021).

Proximity-to-past-earthquakes model
According to the assumption that future earthquakes are more likely to occur near past earthquakes, Jackson and Kagan (1999) proposed the PPE forecasting model, in which the contribution from every earthquake is inversely proportional to the epicentral distance and is directly dependent on the earthquake magnitude.
Under the PPE model, the earthquake occurrence-rate density λ at time t and location (x, y) can be calculated from the earthquakes (t i , m i , x i , y i ), where i = 1, . . ., n eq , in the catalog such that where a is a normalization parameter, d is the smoothing distance, and s is a small spatially uniform background rate of earthquake occurrence per day per kilometer squared to allow for earthquakes far from past earthquakes. The seismicity rate λ(t, x, y) is higher near past earthquakes of greater magnitude. By maximizing the likelihood function, the three parameters (a e , d e , s e ) can be computed by fitting the model to the past earthquake catalog. We used earthquakes with magnitudes of M S > 4.0 to compute the conditional density, using January 1, 1970, to January 1, 1990, as the learning period. The likelihood was only calculated for earthquakes with magnitudes of M S > 6.0 over the learning period. The three parameters were optimized for target earthquakes with M S > 6.0 over the forecast period from January 1, 1990, to May 20, 2008. Using the same parameters, the seismicity rates for January 1, 2013, and September 10, 2019, can also be estimated.

Proximity-to-mapped-faults model
The PMF model assumes that the mapped fault sources are frequently affected by earthquakes having the same characteristic magnitudes (Rhoades and Stirling, 2012). Under the PMF model, the mapped fault sources are composed of planar segments and each source has its own slip rate. Some long faults are divided into multiple segments, which are assigned their own slip rates. Because of earthquake depth estimation issues or because they are assigned a fixed depth, every fault is treated as if its dip angle were 90°; therefore, the distance between the faults and earthquakes has no relevance to the earthquake depths reported in the earthquake reports. The separated segments of the mapped faults and the individual associated slip rates  Frontiers in Earth Science frontiersin.org are used to fit the PMF model. The seismicity rate λ PMF at time t and location (x, y) is of the form where x i and y i are the locations of the individual point fault sources, n f is the total number of points with associated slip rates r i , and i = 1, ..., n f . d f is the space smoothing distance, which is not sensitive to the actual spacing between the individual points if it is much greater than d f . s f is a small spatially uniform background rate of earthquake occurrence per day per kilometer squared, which allows for earthquakes far from the mapped faults. Therefore, the seismicity rate λ PMF (t, x, y) is greater near mapped faults and lower far from mapped faults. In addition, the seismicity rate increases with the fault slip rate of the surrounding identified faults. The test area of the Sichuan-Yunnan region has 25 mapped faults with known slip rates, as shown in Figure 2. Each mapped fault is separated into multiple point sources closely spaced at 0.01°intervals along the fault. To compare the earthquake forecast efficiency of the PMF model to that of the PPE model, we used the same learning and forecast periods as used in the PPE model. The three parameters were optimized for target earthquakes with magnitudes of M S > 6.0 over the three periods of January 1, 1990-May 20, 2008; January 1, 1990-January 1, 2013; and January 1, 1990-September 10, 2019.

Proximity-to-known-sources model
Mapped faults and cataloged earthquakes can sometimes each contribute differently to the creation of conventional probabilistic seismic-hazard models. For short periods, the fault data may not play a worthwhile role in estimating the earthquake likelihood. Accordingly, Rhoades and Stirling (2012) proposed an optimal model for earthquake occurrence, called the proximity-to-known-sources (PKS) model, that combines the fault location and estimated slip rates with the cataloged earthquake locations and magnitudes. This model consists of a convex linear combination of the PPE and PMF models. The seismicity rate λ PKS at time t and location (x, y) is of the form where μ is an additional parameter to be estimated in the range of 0 ≤ μ ≤ 1, such that seven parameters are needed for the model. Here, we provide three methods to combine the two models: 1) Based on the optimized parameters (a e , d e , s e ) of the PPE model and (a f , d f , s f ) of the PMF model, we calculated the seismicity rate λ PKS on May 20, 2008; January 1, 2013; and September 10, 2019, using an empirically specified μ; i.e., μ = 0.1 and 0.9, and 2) μ is a liner dependent on magnitude, as proposed by Hiemer et al. (2013) and called PKS weighted model (PKSW) in this study. The form is where m c is the upper corner magnitude of the Gutenberg-Richter relationship (Gutenberg and Richter, 1944). All m c values over the three time intervals are M S 7.0. 3) All seven parameters of the hybrid model were optimized by maximizing the likelihood function over the same three time intervals. Finally, we compared the earthquake forecasting performances of the various parameter combinations. Figure 3 shows the results of the fitted seismicity rates for the three different times: May 20, 2008; January 1, 2013; and September 10, 2019. The black stars represent earthquakes of M S 6+ that occurred over the 3 years following the above three dates. The spatial inconsistency of the PPE model is due to its strong event location density concentration.

Results of the PPE model
Some obvious differences along the Longmenshan fault between the exceptions of the earthquake occurrence are shown in Figures 3A,  B. The M S 8.0 Wenchuan earthquake and its aftershocks may be the dominant earthquake source locations contributing to the spatial variation differences in the rate density. The seismicity rate of the Lushan region in Figure 3C is higher than that in Figure 3B because of the M S 7.0 Lushan earthquake. The PPE model gets its strength from its ability to use data from these huge earthquakes.
As shown in Figure 3, the M S 6.1 Panzhihua, M S 6.0 Maerkang, and M S 6.8 Luding earthquakes are not located at locations of relatively higher seismicity rate, which indicates that no earthquake clustering occurred during the last several decades. This adds to the difficulty of forecasting earthquakes using the PPE model.  Figure 4, the location probability decreases with distance from the mapped faults. The spatial variations do not differ obviously in Figures 4A-C because the map of the seismicity rate is related to the locations of the mapped faults and their slip rates, which are static on decadal timescales.

Result of the PMF model
As shown in Figure 4B, the seismicity rate at the location where the M S 6.6 Jinggu earthquake occurred is low because the epicenter of the earthquake was far from the mapped faults. However, three of the M S 8.0 Wenchuan aftershocks and the M S 6.1 Panzhihua earthquake ( Figure 4A) are not located in relatively high seismicity rate areas, even though they are near mapped fault sources. One possible explanation may be that the fault slip rate at these locations is small and that the seismicity rate did not benefit sufficiently from the fault slip rates, as seen in Figure 2.

Results of the PKS model
Based on the various μ values, the varying seismicity rates at the three dates, May 20, 2008; January 1, 2013; and September 10, 2019, are illustrated in Figure 5. All the graphs indicate that the rate density is highest near the greatest concentration of large earthquakes during the entire history of the catalog, as well as the mapped fault sources to different extents. The value of μ is related to the proportion of the contribution of the PPE model used to construct the PKS model.

Frontiers in Earth Science
frontiersin.org Figure 5 illustrates that the smaller the μ value, the more "hot spots" with high seismicity rates will appear. The estimated seismicity rate of the PKS model looks like that for the μ value calculated using Eq. 4 at the three dates.

Discussion
A hybrid model may underperform in a separate testing period if the seismicity rates that contribute to it are not well correlated with the

Frontiers in Earth Science
frontiersin.org earthquake locations or if the model is overfitted to a training period in which the seismicity rates are more closely correlated with the earthquake locations than those in the testing period (Rastin et al., 2022). Unaffected by any model fitting, an error diagram is a valuable tool to illustrate the relationship between the seismicity rate and the earthquake occurrence during various time periods and is also Frontiers in Earth Science frontiersin.org 06 commonly used to evaluate earthquake forecast strategies (Molchan, 1991;Zechar and Jordan, 2008).
In this study, we defined the following earthquake forecast strategy: there will likely be earthquakes of M S ≥ 6.0 occurring in the subsequent 3-year period at locations with relatively high seismicity rates based on the PPE, PMF, and PKS models.
The testing region was divided into multiple cells 0.5°× 0.5°in size. The seismicity rate of each cell was assigned the average of all the seismicity rates in the cells. A threshold was then specified and an alarm was generated if the seismicity rate of the cells was higher than the threshold. The fraction of the space-time occupied by the alarm (τ) of the Poisson is defined such that The fraction of earthquakes missed (]) is defined such that where S(i) is the seismicity rate of the ith cell, slevel is the threshold value of the seismicity rate, A(i) is the area of the ith cell, and E(i) is the number of the targeted earthquakes at the ith alarm. We used the area skill score (ASS) and the probability gain to summarize the potential performance of the earthquake forecast strategy. The region above the error diagram inside the unit square, called ASS, serves as a summary of the prospective performance of the seismicity rate in alarm-based earthquake occurrence forecasting (Zechar and Jordan, 2008;Zechar and Jordan, 2010). The diagonal line indicates ASS = 0.5. A positive connection between the seismicity rate and the frequency of earthquakes is denoted by ASS >0.5, while ASS <0.5 denotes a negative correlation . The probability gain (Gain) is defined such that The sample value of Gain corresponds to the slope of the line connecting (0,1) to (τ, ν), as shown in Figure 6. The diagonal line corresponds to a completely random guess, and Gain = 1. A higher Gain value indicates that fewer cells are needed for the same "hits," which means better performance of the earthquake forecast strategy.
In Section 4, the spatial variation of the estimated seismicity rate under the PPE, PMF, and PKS models was calculated at three dates, May 20, 2008; January 1, 2013; and September 10, 2019. By adjusting the threshold of the seismicity rate from the highest to the lowest, the fraction of the space-time occupied by alarm and alarmed targeted earthquakes was obtained and their correlations are illustrated in a Molchan error diagram, as seen in Figure 6. All 14 targeted earthquakes during the three periods of May 20, 2008-May 20, 2011, January 1, 2013-January 1, 2016, and September 10, 2019-September 10, 2022, are included.
With respect to the individual models, the PPE model slightly outperforms the PMF model in the efficiency of earthquake forecast because the ASS value of the PPE model is higher than that of the PMF Molchan error diagram for the PPE (red curves), proximity-to-mapped-faults (PMF) (blue curves), and proximity-to-known-sources (PKS) models with different weighting schemes (green curves: μ value optimized by maximizing the likelihood function, pink curves: μ value calculated using Eq. 4; short purple and orange dashed curves: μ values manually specified at 0.1 and 0.9) at three different times: May 20, 2008; January 1, 2013; and September 10, 2019. The area skill score (ASS); that is, the area above the error diagram inside the unit square, is used to evaluate the efficiency of the earthquake forecast.
Frontiers in Earth Science frontiersin.org model. However, the PMF model has a more informative probability gain than the PPE model because when τ < 0.3 and the number of "alarmed" earthquakes is < 6, which indicates that less than six targeted earthquakes occur near the mapped faults having a relatively high slip rate, as seen in Figure 4. Conversely, when τ ≥ 0.3, the efficiency of PPE is better than that of PMF. The target catalogs include surprises; i.e., the M S 8.0 Wenchuan aftershocks ( Figure 4A) and the M S 6.1 Panzhihua ( Figure 4A) and M S 6.6 Jinggu ( Figure 4B) earthquakes, which did not correspond to previously mapped faults. As shown in Figure 6, ASS PMF < ASS PPE ; therefore, the PPE model is more informative than the PMF model. Meanwhile, ASS PMF < ASS PPE ≈ ASS PKS ≈ ASS PKSW ≈ ASS PKS(μ=0.1) , which implies that, when we specify a low value of μ close to zero, the hybrid PKS model outperforms the PMF model in any combination but does not obtain a better forecast efficiency than the PPE model.
In addition, we obtained the performance for the two methods of combining the PMF model with the PPE model. A comparison of the results reveals that, according to the ASS value, there is no obvious difference between the two ensemble methods. However, the combination method in which μ is manually specified has a more informative gain when τ is in an appropriate range, as shown in Figure 6.

Conclusion
Predicting the time, location, and magnitude of future seismic events is possible using information from earthquakes in historical and instrumental catalogs and geologically mapped active faults, as proven in multiple studies (Rhoades and Stirling, 2012;Rhoades et al., 2015). In the present study, we applied the PPE and PMF models to fit the seismicity in the Sichuan-Yunnan region in China. Using the optimized parameters, we estimated the seismicity rates of the PPE and PMF models at three different times: May 20, 2008; January 1, 2013; and September 10, 2019. Retrospective synthetic testing has consistently shown that both independent and hybrid models can provide informative forecasts (Rhoades and Gerstenberger, 2009;Bayona et al., 2022). Next, we introduced the PKS model proposed by Rhoades and Stirling (2012) to fit the seismicity and to calculate the seismicity rate for the same dates using the PPE and PMF models. Finally, we collected the earthquakes with M S 6.0+ in the test region during three time periods, May 20, 2008-May 20, 2011, January 1, 2013-January 1, 2016, and September 10, 2019-September 10, 2022, as target events for prospective testing. The Molchan error diagram method was used to evaluate the model performances with respect to earthquake forecasts. Based on our results, our main conclusions are as follows.
(1) The PPE, PMF, and PKS models show better forecast efficiency with respect to moderate earthquakes compared to a homogeneous Poisson distribution in the Sichuan-Yunnan test region. While the PPE model is straightforward, its earthquake forecast ability is significant. In some cases, the PMF model has a more informative probability gain than the PPE model when τ is below 0.3, which indicates that, in the Sichuan-Yunnan region, the earthquake forecast is more effective when assigning a relatively higher threshold. We obtained a poor performance with the PMF model when τ was high.
(2) The forecast efficiencies differed due to discrepancies in the prior information or precursors related to the earthquakes and active tectonic setting. The PMF model, which is derived from the timeindependent slip rate of the active faults, represents features that are expected to affect the earthquake occurrence over a long-time frame. The results of the PMF model are also affected by the intrinsic incompleteness of the fault database (e.g., Basili et al., 2013), especially any unknown faults in the test region. However, it is impossible to obtain all of the existing faults. The missed earthquakes; e.g., the M S 6.6 Jinggu earthquake, may have occurred in the vicinity of an unknown fault. Moreover, some large earthquakes can produce higher scaling co-seismic ruptures equal to thousands of times the slip displacement in a single year. The highest slip displacement, that of the M S 8.0 Wenchuan earthquake, was 12-15 m (Li et al., 2009), located at Beichuan, where three aftershocks occurred that were missed in the forecast by the PMF model. The unsatisfactory performance of the PMF model may lie both in the incompleteness of the fault database and in the lack of consideration of co-seismic ruptures.
(3) The hybrid PKS model can incorporate information from the PPE and PKS models and consequently maintained the best performance over the three testing epochs. This is consistent with previous studies of hybrid short-, medium-, and long-term components (Rhoades et al., 2016;Christophersen et al., 2017;Rhoades et al., 2017). The key point may be that the hybrid model obtains a variety of strengths from the component in ways that other components do not. However, the PKS model, given any method of combination, did not outperform the PPE model, possibly due to the poor efficiency of the PMF model. (4) Comparisons of the methods of hyper-parameter optimization in which the seven parameters are fit by maximizing the likelihood function, calculating the value of μ based on the linear function of magnitude, or by specifying the value of μ showed an efficiency difference in the PKS model performance. In particular, the optimization method with a manually specified μ potentially obtained a good informative gain for some particular seismic events, which implies that the earthquake forecast strategy could be adjusted using an appropriate specified threshold in the test region.
The retrospective tests in our study also had limitations; for instance, the earthquake forecast strategy was originally designed for a specified 3-year interval. The performance may change somewhat when applied to other periods because of the degree of time dependence in the data used to generate the models. In addition, we did not account for location uncertainties when estimating the seismicity rates of the three models. Because we calculated their seismicity rates on a rather coarse grid of 0.1°× 0.1°c ells, location errors can naturally be accommodated without significantly affecting the correlation with the earthquake locations (Rastin et al., 2022).
The Sichuan-Yunnan region is one of the most active seismic regions on the Chinese mainland, and various spatial parameters carrying different types of information have been used to analyze the seismic activity. The PPE model, which is straightforward, has good forecast efficiency and can be used as a base model to evaluate the effectiveness of various earthquake forecast methods. Moreover, the results of our study indicate that an approach to earthquake forecasting that is model-driven and hyper-parameter controlled could be a promising means to implement operational earthquake forecasting.
Frontiers in Earth Science frontiersin.org