Original Research Article
A Source Clustering Approach for Efficient Inundation Modeling and Regional Scale Probabilistic Tsunami Hazard Assessment
- 1 Department of Earth Sciences, University of Oregon, Eugene, OR, United States
- 2 Courant Institute, New York University, New York, NY, United States
- 3 Department of Applied Mathematics, University of Washington, Seattle, WA, United States
- 4 Department of Earth and Space Sciences, University of Washington, Seattle, WA, United States
For coastal regions on the margin of a subduction zone, near-field megathrust earthquakes are the source of the most extreme tsunami hazards, and are important to handle properly as one aspect of any Probabilistic Tsunami Hazard Assessment. Typically, great variability in inundation depth at any point is possible due to the extreme variation in extent and pattern of slip over the fault surface. In this context, we present an approach to estimating inundation depth probabilities (in the form of hazard curves at a set of coastal locations) that consists of two components. The first component uses a Karhunen-Loève expansion to express the probability density function (PDF) for all possible events, with PDF parameters that are geophysically reasonable for the Cascadia Subduction Zone. It is then easy and computationally cheap to generate a large number N of samples from this PDF; doing so and performing a full tsunami inundation simulation for each provides a brute force approach to estimating probabilities of inundation. However, to obtain reasonable results, particularly for extreme flooding due to rare events, N would have to be so large as to make the tsunami simulations prohibitively expensive. The second component tackles this difficulty by using importance sampling techniques to adequately sample the tails of the distribution and properly re-weight the probability assigned to the resulting realizations, and by grouping the realizations into a small number of clusters that we believe will give similar inundation patterns in the region of interest. In this approach, only one fine-grid tsunami simulation need be computed from a representative member of each cluster. We discuss clustering based on proxy quantities that are cheap to compute over a large number of realizations, but that can identify a smaller number of clusters of realizations that will have similar inundation depths.
The fine-grid simulations for each cluster representative can also be used to develop an improved strategy, in which these are combined with cheaper coarse-grid simulations of other members of the cluster. We illustrate the methodology by considering two coastal locations: Crescent City, CA and Westport, WA.
The primary goal of this work is to present a general methodology for developing the hazard curve for a quantity of interest (e.g., maximum water depth) at a coastal location that may be inundated by tsunamis. An inundation hazard curve shows the annual probability that the flooding depth will exceed each value in a range of specified exceedance values. The same techniques could be applied to other quantities of interest (e.g., maximum flow speed or momentum flux) but here we concentrate on water depth h for illustration, and use
A full probabilistic tsunami hazard assessment (PTHA) would have to include all potential sources of tsunamis, far-field as well as near-field, and also possibly tsunamis induced by landslides or other processes; a complete review of this can be found in Grezio and Babeyko (2017). Here we concentrate on one aspect of PTHA, assessing the probabilities in a coastal region due to a megathrust event on a nearby subduction zone. This is a difficult aspect of PTHA because variations in the spatial distribution of the slip can have a significant effect on the resulting tsunami (Goda et al., 2016; Melgar et al., 2019). In addition, some events may cause substantial subsidence or uplift of the coast around the location of interest, which can also greatly affect the inundation extent and depth of the resulting tsunami.
To perform PTHA it is necessary to first have some model for the probability density function (PDF) of all possible events. It is impossible to know the correct distribution due to the high degree of epistemic uncertainty in subduction zones with infrequent past megathrust events. However, recent studies have suggested ways to generate a geophysically reasonable distribution that can be easily sampled to generate large numbers of hypothetical events (LeVeque et al., 2016); accordingly, the approach we use is based on a Karhunen-Loève (K-L) expansion to generate slip patterns with correlation lengths that are thought to be “reasonable” from studies of past events (e.g., Mai and Beroza, 2002; Goda et al., 2016; Melgar and Hayes, 2019; Crempien et al., 2020), as discussed further in Section 2. We stress, however, that we do not claim to have the “correct” distribution, or even the best possible based on available science, and so our focus is on a methodology that could also easily be applied to other choices of the PDF. We also suggest that any PTHA study intended as guidance for decision makers should include a sensitivity study that considers how robust the results are to changes in the assumed earthquake distribution, along with other approaches for assessing the effects of the epistemic uncertainty inherent in this problem; see, for example, Davies and Griffin (2020). In addition, there is a need for further testing of random tsunami models by comparing their statistical properties with historical tsunamis, as is done in Davies (2019) for 18 recent events in the Pacific and Indian Oceans. Of course, too few data points are available from the Cascadia Subduction Zone (CSZ) itself to allow a local validation of any PDF.
In this paper we focus on how best to handle the aleatoric uncertainty, i.e., assuming that we have a probability distribution to use for the PTHA, how do we efficiently create hazard curves based on this distribution? The brute force approach would be to choose a very large number N of samples from the distribution, perform a numerical tsunami simulation with each, and then (for each location of interest and each exceedance value) determine the number
The primary difficulty we address is that N may need to be very large in order to get meaningful statistics, particularly for the relatively unlikely but most dangerous higher values of
A fundamental problem already arises when we ask for a reasonable value of
We can address this by a simple application of importance sampling. We first split the space of all possible events of interest into a small number of classes. For illustration in this paper we use four classes based on magnitude: Mw 7.5, 8.0, 8.5, and 9.0, but this could easily be expanded. We then assign an annual probability to each class, call this
TABLE 1. Table of the four magnitude classes used in this work, with the annual probability
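To make the re-weighting concrete, the following Python sketch shows how each realization inherits the weight p_c/n_c from its magnitude class, and how a weighted hazard curve is then assembled. The class probabilities and depth values below are placeholders for illustration, not the values of Table 1 or any simulation output:

```python
import numpy as np

def hazard_curve(h_max, weights, exceedance_values):
    """Weighted exceedance curve: for each threshold zeta, sum the
    annual weights of realizations whose maximum depth exceeds zeta."""
    h_max = np.asarray(h_max)
    weights = np.asarray(weights)
    return np.array([weights[h_max > z].sum() for z in exceedance_values])

# Placeholder annual probabilities per magnitude class (NOT Table 1's
# values), with 500 sampled realizations per class as in the text:
p_class = {7.5: 4.0e-2, 8.0: 1.3e-2, 8.5: 4.0e-3, 9.0: 1.9e-3}
n_per_class = 500

rng = np.random.default_rng(0)
h_all, w_all = [], []
for mw, p in p_class.items():
    # synthetic stand-in for simulated maximum depths at one coastal point
    h_all.append(rng.gamma(2.0, 0.5 * (mw - 7.0), size=n_per_class))
    # importance-sampling weight: class probability split evenly over samples
    w_all.append(np.full(n_per_class, p / n_per_class))
h_all, w_all = np.concatenate(h_all), np.concatenate(w_all)

zetas = np.linspace(0.0, 10.0, 21)
curve = hazard_curve(h_all, w_all, zetas)   # non-increasing in zeta
```

Because rare classes are sampled as densely as common ones but carry smaller weights, the tails of the distribution are resolved without inflating their probability.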
Next we tackle the problem that even 2,000 tsunami simulations may be excessively demanding, particularly when we expect that many of these events will give very similar inundation patterns and depths as other events, and so in principle it should be possible to estimate the hazard curve with fewer simulations of judiciously chosen representative tsunamis. We develop an approach for clustering that can be applied to the 2,000 events before doing any fine-scale tsunami modeling, in order to identify clusters of events that we expect to give very similar tsunami impact in the location of interest. We then perform a fine-scale tsunami simulation of only one realization from each cluster (which we call the “cluster representative”) and assign it a weight that is based on the collection of events in that cluster. Based on this we can estimate the contribution that this cluster should make to each hazard curve. This clustering is explained in much more detail in Section 5. Other studies have used clustering to achieve scenario reduction: see Lorito et al. (2015) for hazard assessment, Gusman et al. (2014) for early warning, and Volpe et al. (2019) for a study more closely related to this paper.
The clustering approach we illustrate in this paper is based on doing a coarse-grid tsunami simulation for each of the 2,000 realizations, with a grid resolution that allows much faster simulation, but is too coarse to properly represent the tsunami inundation over the communities of interest. However, we show that these coarse grid simulations give information in the form of proxy variables that can be used to very effectively cluster the events.
Moreover, the coarse-grid simulations can be greatly enhanced to provide “pseudo-fine” results that are at the resolution of the desired fine grid and that agree very well with the actual fine-grid simulations of the same realization, but are much cheaper to compute. This enhancement is performed in part using information about the difference between the coarse and fine grid simulations performed for the few realizations where both are available (the cluster representatives). This procedure is described in more detail in Section 4.
For illustration we consider two sample communities: Crescent City, CA, which is near the southern extent of the CSZ, and Westport, WA, roughly 570 km north of Crescent City. These communities are both at high risk from CSZ tsunamis and have been the subject of past studies. They also have quite different topographic features, as discussed further in Section 3. The same set of 2,000 CSZ realizations was used for each site, although the clustering algorithm is applied separately to each, since the set of realizations that give similar inundation patterns at one site may not form a suitable cluster at the second site. For illustration we show that selecting only 18 clusters (and hence performing only 18 fine-grid simulations for each site) gives hazard curves and maps that compare very well with those obtained if all 2,000 realizations are simulated on the fine grid, particularly after adding in additional information obtained from coarse-grid simulations of each realization. These results are presented in Section 7.
Some of the techniques presented in this paper were first developed as part of a project funded by Federal Emergency Management Agency Region IX, and presented in the project Final Report by Adams et al. (2017). Subsequently we have improved some of these techniques. We are also now using a probability distribution that is potentially more realistic than the original choice, and we consider two different target communities with quite different topography in order to better test the general applicability of these ideas. The original report and associated webpages (Adams et al., 2017) contain more discussion of some of these ideas, along with illustrations of some related approaches that are not reported in this paper. Research on PTHA using stochastic collections of sources goes back many years, see, for example, the early review Geist and Parsons (2006) and the more recent ones of Geist and Lynett (2014) and Grezio and Babeyko (2017) for many more references.
Recently, several researchers have adopted the use of a K-L expansion to generate large suites of realizations for PTHA studies of particular regions and/or to study sensitivities and uncertainty. For example, Gonzalez et al. (2020) generated 400 realizations for a hybrid deterministic/PTHA study of Iquique, Chile, and Crempien et al. (2020) generated 10,000 realizations on an idealized fault and performed GeoClaw tsunami simulations of each on idealized topography to study the effect of spatial slip correlation on tsunami intensity. The techniques developed in this paper could help to accelerate such studies.
Research on reducing the work required to handle large sets of realizations has also been done by others. For example, Sepúlveda et al. (2017) used the K-L expansion together with a stochastic reduced order model to obtain better results than with a brute force Monte Carlo simulation, and Sepúlveda et al. (2019) used these techniques to do a PTHA analysis including a sensitivity study for Hong Kong and Taiwan locations due to earthquakes on the Manila Subduction Zone. These techniques reduced the number of simulations needed from 10,000 to 200 for each of 11 sets of earthquakes, followed by fine-grid simulations of the resulting 2,200 realizations. The reduction was based on seafloor deformation statistics at the earthquake sources. That paper also has a strong emphasis on using sensitivity analysis to quantify errors in PTHA.
Even closer to the methodology presented in this paper is the source filtering approach developed recently by Volpe et al. (2019). They started with a suite of more than 1.7 million scenarios that affect their study region, and used a clustering algorithm based on cheaply obtained proxies to reduce this to a smaller set of 1,154. They then performed fine-grid simulations for one representative from each cluster. One significant difference in our approach is that we use coarse-grid simulations (using the full nonlinear tsunami model, including onshore inundation over the study region) and the associated pseudo-fine results, which allowed us to obtain PTHA results with fewer fine-grid simulations. On the other hand, we performed 2,000 coarse-grid simulations to obtain these, whereas Volpe et al. (2019) performed the clustering based on proxy data that was more cheaply obtained. A hybrid approach might be to apply our methodology to the 1,154 cluster representatives identified in Volpe et al. (2019), performing only coarse-grid inundation simulations of these, and then further clustering into a much smaller set for the fine-grid simulations.
The approach we use to create pseudo-fine results is also similar to the idea of multilevel or multifidelity Monte Carlo methods (Giles, 2015; Peherstorfer et al., 2018), in which results from two or more different resolution simulations are combined to reduce the computational load. This is often done in the context of creating a surrogate model or emulator that can be very cheaply evaluated for new parameter choices in order to do a more extensive Monte Carlo simulation. This approach has been used in connection with tsunami modeling by de Baar and Roberts (2017), and by Salmanidou et al. (2017) for underwater landslide and tsunami modeling. For a review of these types of statistical approaches, see Viana et al. (2017). Our approach is somewhat different in the way we use cluster representatives and the differences in the local topography at different resolutions in defining the corrections.
Earthquake Probability Density and Realizations
Probability distributions proposed for CSZ earthquake magnitudes have included both characteristic and Gutenberg–Richter (G-R) types. More generally, Parsons et al. (2012) noted that this is a long-standing controversy for many other fault zones. Consequently, they developed both types of distribution models for the Nankai Trough, based on data from many past events. The characteristic earthquake model was based on fixed rupture geometries and historical/palaeoseismic recurrence times, and the G-R model was based on fault-slip rates and an estimated distribution slope (b-value). They found that the G-R distribution, constrained with a spatially variable long-term slip rate, replicated much of the spatial distribution of observed segmented rupture rates along the Nankai, Tonankai, and Tokai subduction zones, although with some rate differences between the two methods in the Tokai zone. Thus, where supporting information exists (e.g., palaeoseismic and historical recurrence data), and fault segmentation observations are absent, they suggested that very simple earthquake rupture simulations based on empirical data and fundamental earthquake laws could be useful forecast tools in settings with sparse data from past events. Models using a G-R distribution but without the explicit guidance of a varying long-term slip rate have also been employed, both globally and specifically along the Cascadia margin (Rong et al., 2014). We thus view a G-R distribution of magnitudes as adequate for this study.
We generated 2,000 slip realizations over four magnitude classes: Mw 7.5, 8.0, 8.5, and 9.0 (with 500 of each). To determine the annual probabilities of earthquakes in each of the magnitude classes, we follow a G-R law using a b-value of 1, indicating “normal” seismic behavior. We also assume a yearly rate of occurrence of a Mw 9.0 along the CSZ as once every 526 years based on paleotsunami records from Goldfinger et al. (2012). This implies an a-value of 6.279 in the G-R relation and gives us annual probabilities
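These numbers can be checked directly from the G-R relation log10 N(≥Mw) = a − b·Mw. A short Python sketch using only the values quoted above (b = 1, one Mw ≥ 9.0 event per 526 years):

```python
import math

b = 1.0                 # assumed b-value ("normal" seismic behavior)
rate_M9 = 1.0 / 526.0   # annual rate of Mw >= 9.0 (Goldfinger et al., 2012)

# Solve log10 N(>=Mw) = a - b*Mw for a using the Mw 9.0 constraint:
a = math.log10(rate_M9) + b * 9.0   # ~6.279, matching the value in the text

def annual_rate(mw):
    """Annual rate of events with magnitude >= mw under this G-R law."""
    return 10.0 ** (a - b * mw)
```

Here `annual_rate(9.0)` recovers one event per 526 years, and a rate for each magnitude class can then be obtained by differencing adjacent cumulative rates.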
We limit our earthquake realizations to imitate a series of thrust events located on the megathrust interface along the CSZ. To introduce variability to each realization, we allow for geophysically reasonable variations in slip distribution, location, and rupture dimension. An example of a rupture from each magnitude class is shown in Figure 1. We employ a regional fault geometry that approximates the CSZ from McCrory et al. (2012). This is then discretized into triangular subfaults using the three-dimensional finite element mesh generator GMSH (Geuzaine and Remacle, 2009). A triangular mesh allows a variable strike and dip that can better approximate the McCrory et al. (2012) geometry than a rectangular discretization. Our area of interest extends along the entire CSZ margin and down to a depth of 30 km, beyond which slip is not expected to continue (Frankel et al., 2015).
FIGURE 1. (A–D) Example earthquake realizations for each magnitude class. (E) Variability in correlation length and width for each magnitude class. (F) Variability in total fault length and width per magnitude class. (G) Variability in mean and maximum slip, in meters, for realizations in each magnitude class.
In order to introduce variability in ruptures of the same magnitude, as is observed from past earthquakes, the length and width of each realization is obtained from a probabilistic source dimension scaling law (Blaser et al., 2010). For each individual rupture we sample from a lognormal PDF such that
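This sampling step can be sketched in Python as below; note the scaling-law coefficients are illustrative placeholders only, not the fitted thrust-event values of Blaser et al. (2010):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_rupture_dims(mw, n=1):
    """Draw rupture length and width (km) from a lognormal scaling law of
    the form log10(dim) = a + b*Mw + Normal(0, sigma).  The coefficients
    below are illustrative placeholders, NOT the fitted values of
    Blaser et al. (2010)."""
    aL, bL, sL = -2.37, 0.57, 0.18   # placeholder length coefficients
    aW, bW, sW = -1.86, 0.46, 0.17   # placeholder width coefficients
    length = 10.0 ** (aL + bL * mw + rng.normal(0.0, sL, size=n))
    width = 10.0 ** (aW + bW * mw + rng.normal(0.0, sW, size=n))
    return length, width

L9, W9 = sample_rupture_dims(9.0, n=500)    # Mw 9.0 class dimensions
L75, W75 = sample_rupture_dims(7.5, n=500)  # Mw 7.5 class dimensions
```

The lognormal scatter is what lets two ruptures of the same magnitude occupy different areas, and hence have different mean slip.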
Once the bounds of the rupture area are established, we generate a stochastic slip distribution using an application of the Karhunen-Loève (K-L) expansion following LeVeque et al. (2016) and Melgar et al. (2016). Here, we assign slip over participating subfaults using a von Karman correlation function,
H is the Hurst exponent (set in this study as 0.75),
The correlation length and width for each realization governs the size of asperities and uses a magnitude dependent scaling law from Melgar and Hayes (2019) where
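The construction can be sketched in Python for a toy one-dimensional fault. This is a simplified Gaussian K-L version (the study follows the lognormal variant of LeVeque et al. (2016) on the triangulated CSZ mesh with separate along-strike and down-dip correlation lengths); the grid, correlation length, and mean slip are placeholders, and the 60 m rejection cap described below is included:

```python
import numpy as np
from scipy.special import gamma, kv

def von_karman_corr(r, H=0.75):
    """Normalized von Karman correlation: C(0) = 1, decaying with the
    scaled separation r (distance divided by correlation length)."""
    r = np.asarray(r, dtype=float)
    c = np.ones_like(r)
    nz = r > 0
    c[nz] = (r[nz] ** H) * kv(H, r[nz]) / (2.0 ** (H - 1.0) * gamma(H))
    return c

# Toy 1-D chain of subfault centers (placeholder for the CSZ mesh):
x = np.linspace(0.0, 200.0, 60)      # km along strike
corr_len = 40.0                      # km (magnitude dependent in the study)
C = von_karman_corr(np.abs(x[:, None] - x[None, :]) / corr_len)

# K-L expansion: eigen-decompose the correlation and sum random modes.
eigvals, eigvecs = np.linalg.eigh(C)
eigvals = np.clip(eigvals, 0.0, None)    # guard tiny negative round-off
rng = np.random.default_rng(2)
mean_slip = 5.0                          # m (placeholder)

def draw_slip():
    z = rng.standard_normal(eigvals.size)
    return mean_slip + (eigvecs * np.sqrt(eigvals)) @ z

slip = draw_slip()
while slip.max() > 60.0:   # rejection step enforcing the 60 m slip cap
    slip = draw_slip()
```

Truncating the sum to the modes with the largest eigenvalues gives a low-dimensional parameterization of the slip, which is what makes sampling thousands of realizations cheap.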
We cap the upper level of slip possible for any realization in this study to 60 m, as was recommended in Melgar et al. (2016) and based on plate convergence rates from McCaffrey et al. (2007). We achieve this by rejecting and re-running any realization where any subfault in our mesh has an assigned slip that is greater than our maximum slip threshold. This cap is in place in order to limit the possibility of unrealistically large amounts of slip in any earthquake realization. It should be noted that this cap creates an upper limit in tsunami intensity that may be reflected in any final PTHA analysis. We do not enforce the target magnitudes in a strict sense. The resulting magnitude after the stochastic process can be slightly higher or lower than the requested values. We do not re-scale the slip in any way to force the rupture to have the target magnitude exactly. This, and the maximum slip requirement, can introduce departures from the desired PDF (Sepúlveda et al., 2017), but this is generally an effect that is much smaller than the epistemic uncertainty.
We calculate the total seafloor deformation of each earthquake realization using angular dislocations for triangular subfaults in an elastic half space (Comninou and Dundurs, 1975). This method can be seen as a variant of the Okada equations, which focus on rectangular subfaults (Okada, 1985). We obtain the deformation over the entire CSZ study region at a 30” spatial resolution. This is a fine enough spacing to ensure that we recover slip features that may be present at our smallest magnitude class, including rupturing asperities at the smallest reported slip correlation length (see examples in Figure 2).
FIGURE 2. Slip on our fault geometry and associated seafloor deformation for two sample realizations, numbered 1,665 and 1,999, shown in Figures 5–8. WP marks the location of Westport, WA. CC marks the location of Crescent City, CA.
Seafloor deformation is directly translated to a disturbance at sea level by assuming an incompressible water column. While some large magnitude earthquakes can have rupture durations extending multiple minutes, this kinematic effect on the tsunami in the near-field is minimal (Satake, 1987; Williamson et al., 2019). Here, we simplify the rupture process by treating all seafloor deformation as instantaneous and occurring at the initial time step of our tsunami model. It is this initial disturbance that initializes the tsunami model, as discussed further in Section 3.
Figure 2 shows both slip on the fault and the resulting seafloor deformation for two of the magnitude 9.0 realizations. We use
Communities of Interest and Tsunami Modeling
We consider two sample communities as shown in Figure 3. Crescent City, CA was used in previous work on this topic (Adams et al., 2017), and was the subject of a previous PTHA analysis by González et al. (2014) and Adams et al. (2015). Tsunamis tend to focus in Crescent City due to the offshore bathymetry and harbor (Horrillo et al., 2008), and the central business district is bounded by the harbor, the low-lying Elk River valley to the east, and higher hills to the north.
FIGURE 3. Communities of interest in this study. (A) Regional and inset view of Westport, WA and two representative cross sections shown in Section 7. (B) Regional and inset view of Crescent City, CA with two representative cross sections. Both regional views have a bathymetric (dashed) and topographic (solid) contour interval of 10 m. Inset figures use a contour interval of 5 m. The coastline is differentiated with a bolded brown line.
Westport, WA lies on a peninsula at the entrance to Grays Harbor. The topography is below roughly 10 m everywhere on the peninsula, and a number of north-south running ridges protect some areas from the direct waves arriving from the west, though these areas may still be flooded from the east after the tsunami enters Grays Harbor. Westport is the site of the Ocosta Elementary School, recently rebuilt to include the first tsunami vertical evacuation structure constructed in the United States, for which tsunami modeling was presented by González et al. (2013). We selected these two communities to showcase the versatility of the clustering PTHA methodology. However, this methodology can be extended to any coastal community.
Tsunami simulations are performed using GeoClaw Version 5.7.0, distributed as part of the open source Clawpack software (Clawpack Development Team, 2020). This solves the two-dimensional depth-averaged non-linear shallow water equations using adaptive mesh refinement on rectangular grid patches (in longitude-latitude coordinates). GeoClaw allows each cell to be wet or dry and to change dynamically, so that the wet/dry boundary of the coastline evolves as the tsunami inundates the coastal site of interest.
For this study we simulated the tsunami from each of the 2,000 realizations in two separate simulations. The first set were the “fine-grid runs,” where refinement down to 1 arcsecond (roughly 30 m in latitude, less in longitude) was enforced over both study sites. This provided the “ground truth model” hazard curves and maps to use for comparison purposes, i.e., we assume that our goal is to produce good approximations to these curves and maps with much less work than was required to run 2,000 fine-grid simulations. The second set of simulations were the “coarse-grid runs,” in which the refinement only went down to 9”, giving a factor of 81 fewer grid points on the finest level than in the fine-grid runs. Moreover, on these coarser grids it is also possible to take larger time steps [while still respecting the Courant-Friedrichs-Lewy condition required by the explicit finite volume method used in GeoClaw], potentially giving another factor of 9. However, since some of the computation takes place on coarser grids over the entire computational domain, the coarse-grid simulations are on average 5 times faster than the fine-grid simulations; see below. We also note that for a real PTHA we might want to use even finer grids, e.g., 1/3” resolution is now often used for hazard studies, and 1/9” topography is becoming available in many locations. In this case the relative speedup for coarse-grid simulations could be much more dramatic.
We use adaptive mesh refinement to optimize the computational cost of each tsunami simulation. All simulations used three levels of refinement in the open ocean, with grid resolutions of 1°, 6′, and 3′, and with regridding every few time steps to follow the propagating waves (based on a tolerance on the sea surface elevation). On the continental shelf, refinement is allowed to a fourth level at 90”. An additional refinement level of 9” is enforced around both study sites. For the coarse-grid simulations only these five levels of refinement are used. For the fine-grid runs, two additional levels are introduced at 3” and 1” resolutions, and the study areas are forced to be resolved at the finest 1” level. The ETOPO1 topography Digital Elevation Model at 1 arcminute resolution (Amante and Eakins, 2009) was used over the full computational domain. A subset of the Astoria, Oregon 1/3” Digital Elevation Model (NOAA NCEI, 2017) was used around Westport. Around Crescent City, we used a version of the Crescent City, California 1/3” Digital Elevation Model (NOAA NCEI, 2012) that had been modified for an earlier PTHA study of this region (González et al., 2014) to remove the pier in the harbor, since water flows under the pier.
In each simulation we monitor the maximum water depth h over a grid of points covering the study area (at the finest resolution of the simulation) over the duration of the simulation. For this study we ran each tsunami simulation to 4 h of simulated time after the instantaneous seafloor displacement. Examining the results, we found that in a few cases there were still significant edge waves trapped along the coast that could have led to slightly larger values of the maximum at some points, so a realistic PTHA should run some realizations out to later times. For the purposes of this study our reference solution uses the maximum h over the same time period as our approximations, and so comparisons are still valid.
At each point where h is monitored, these maximum values (denoted by
The reference hazard curve is affected by the spatial distribution and properties of the rupture realizations, which act as our ground truth. Therefore, it is important to have enough realizations at each magnitude class to capture all tsunamigenic behavior that is possible given the seismic constraints we presented in Section 2. The total number of realizations per magnitude bin is based on the variability in the likelihood of exceeding a set of tsunami amplitudes in the harbors of both Crescent City and Westport, as illustrated in Figure 4. Here, we calculate the likelihood that h exceeds a set of tsunami thresholds, ranging from 1 cm to 10 m, at one particular point in each study region. As more realizations in a magnitude class are added, the variability in the probability of exceeding each tsunami threshold reduces. We can estimate that we have enough realizations to act as our ground truth when each probability curve has flattened out. Here, this occurs at about 400 realizations per magnitude.
FIGURE 4. The probability of exceeding a specified tsunami threshold as a function of the number of cases included for each magnitude class at particular points in (A) Crescent City and (B) Westport. Thresholds tested range from 1 cm to 10 m, with a separate curve for each, color-coded as indicated by the color bar.
The tsunami simulations were performed using the OpenMP feature of GeoClaw with 30 threads on a Linux server. The total CPU time varied for each realization, depending on whether the initial deformation came from a small localized slip patch (requiring small regions of refinement in the ocean and possibly resulting in a negligible tsunami) or a larger rupture. Total CPU time (summed over all threads and over all 2,000 simulations) was 49.2 h for the coarse-grid runs and 255.6 h for the fine-grid runs.
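The cost factors quoted above can be tallied explicitly; the gap between the theoretical finest-level factor and the observed end-to-end factor reflects the coarser refinement levels that both run types share:

```python
# Finest-level cost factors for the 9" coarse vs. 1" fine runs:
cell_factor = (9 / 1) ** 2              # 81x fewer finest-level grid points
step_factor = 9                         # CFL condition allows ~9x larger steps
finest_level_factor = cell_factor * step_factor   # 729x on the finest level

# The observed end-to-end speedup is far smaller because the coarser
# levels covering the whole domain are common to both run types:
cpu_coarse_h, cpu_fine_h = 49.2, 255.6  # total CPU hours over all 2,000 runs
observed_speedup = cpu_fine_h / cpu_coarse_h      # roughly 5.2x on average
```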
Before proceeding, we first show
FIGURE 5. Sample results for realization 1,665 at Westport, where the tsunami was small. (A, D, G)
FIGURE 6. Sample results for realization 1,665 at Crescent City, where the tsunami was large. Panels as described in Figure 5. Purple is above 8 m and Green is land not inundated.
FIGURE 7. Sample results for realization 1,999 at Westport, where the tsunami was large. Panels as described in Figure 5. Purple is above 8 m and Green is land not inundated.
FIGURE 8. Sample results for realization 1,999 at Crescent City, where the tsunami was small. Panels as described in Figure 5. Purple is above 8 m and Green is land not inundated.
The remaining panels of each figure show the result of enhancing the coarse-grid results using techniques developed in the next section, where these will be discussed in more detail.
Coarse-Mod and Pseudo-Fine Enhancements
Our PTHA approach starts by sampling N realizations, which we denote by
One approach to approximating the hazard curves for N realizations using K clusters is to perform only one fine-grid simulation for each cluster (for a “cluster representative” realization selected from the cluster), and assign a weight to each that is the sum of the weights of all realizations in the cluster. We show results of this approach in Section 7. However, using only K events will give a hazard curve with only K jump discontinuities and cannot well approximate the true hazard curve if K is much smaller than N, particularly at the lower probabilities. In our example application,
Much better results are obtained if we also make use of the remaining
Modified Coarse Grid Corrections
Each of the
The coarse-mod corrections are based on the observation that the maximum water surface
In other words, we simply adjust
However, there are a few special cases where we cannot use (7). Clearly, if
The last three lines in Eq. 9 refer to the special case when water does not reach the coarse bathymetry level
Based on the above discussion the modified coarse grid value is given by:
The second row of Figures 5-8 shows examples of the effect of this. Panel D is again the fine grid
An earlier version of this coarse-mod strategy was used in (Adams et al., 2017), and more examples are given in the Appendix of that report showing how well the modified coarse data can compare to the original coarse data and the fine data. We have since improved this strategy by looking only locally for the threshold value
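As a minimal illustration of this surface-transfer idea, the following Python sketch moves a coarse run's maximum water-surface elevation onto fine-grid topography; it omits the special cases handled by Eqs. 7-9, and all values are hypothetical:

```python
import numpy as np

def coarse_mod_depth(eta_coarse, B_fine):
    """Transfer the coarse run's maximum water-surface elevation
    eta_coarse onto fine-grid topography B_fine, so the depth estimate
    reflects fine-scale terrain; negative depths are clipped to dry.
    The special cases of Eqs. 7-9 (e.g., points never reached by the
    coarse flow) are omitted from this sketch."""
    return np.maximum(eta_coarse - B_fine, 0.0)

# Hypothetical values (meters) at five fine-grid points with a ridge:
B_fine = np.array([1.0, 2.5, 5.0, 3.0, 0.5])   # fine-grid topography
eta_coarse = np.full(5, 4.0)                   # coarse maximum surface
h_mod = coarse_mod_depth(eta_coarse, B_fine)   # the 5 m ridge point stays dry
```

The point of the correction is visible even in this toy example: a coarse cell that averages away the ridge would flood it, while the modified depth keeps it dry.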
Pseudo-Fine Grid Corrections
We now present an approach to further improve each of the coarse-mod results defined by (9) by also using the fine-grid runs performed for each cluster representative. We begin by clustering the N modified coarse grid runs (or the original N coarse grid runs) into a small number of non-overlapping clusters. For this paper, the clustering was done using the original
After the clustering has been done, more information is available that can be used to further improve the coarse-mod approximations. Since we assume that the cluster representative
Note in particular that the pseudo-fine result for the cluster representative itself (i.e., for
We also believe the modified-coarse results provide an excellent start in building the pseudo-fine results because the coarse grid GeoClaw simulations already contain information about the nonlinear flow dynamics on land. This is in contrast to cheaper coarse results that could have been obtained in deep water using the linearized shallow water equations, such as the thousands used by Li et al. (2016), but this is an approximation we wished to avoid since our focus is the inundation on land.
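One plausible form of such a correction can be sketched as follows. This is our own illustrative additive version, not the paper's exact formula: each cluster member's modified-coarse depth field is shifted by the discrepancy observed for its cluster representative, the one realization in the cluster for which both a modified-coarse and a fine-grid result are available.

```python
import numpy as np

def pseudo_fine(h_coarse_mod, h_coarse_mod_rep, h_fine_rep):
    """Illustrative additive pseudo-fine correction (an assumption,
    not the paper's formula).

    h_coarse_mod     : modified-coarse depths for one cluster member
    h_coarse_mod_rep : modified-coarse depths for its cluster representative
    h_fine_rep       : fine-grid depths for that same representative
    """
    # Fine-vs-coarse discrepancy observed for the representative,
    # applied pointwise to every member of the cluster.
    correction = h_fine_rep - h_coarse_mod_rep
    return np.maximum(h_coarse_mod + correction, 0.0)
```

Note that with any correction of this type, the pseudo-fine result for the representative itself reduces exactly to its fine-grid result, consistent with the remark above.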
The third row of Figures 5-8 shows examples of the effect of this. Panel G is again the fine grid
In this section we discuss how one can subdivide the individual events into a small number of clusters that are likely to have similar inundation patterns. The clustering will be based on proxy quantities for each event that can be computed solely from the coarse-grid (low-resolution) simulations, whose runtime is orders of magnitude smaller than the fine-grid (high-resolution) tsunami inundation simulations.
In our previous work (Adams et al., 2017), proxy variables based only on the seafloor deformation of each realization were also considered. These are much cheaper to compute than the coarse-grid simulation proxies, but did not do as good a job of clustering, even though in that work we only considered ruptures on the southern margin of CSZ and only one near-field study region, Crescent City. Given the wider range of events now being considered, where many events are localized far away from a study region, we believe that it may be harder to develop robust proxies based only on the seafloor deformations. However, due to the greater efficiency of that approach, this could be a fruitful area for future research. We also note that Volpe et al. (2019) used seafloor deformation near the study region, both to classify realizations into near-field and far-field, and also in clustering the near-field realizations.
In this work we use three proxy variables computed from the coarse-grid simulations for each realization. Each realization thus corresponds to a point in a three-dimensional space and various clustering methods can then be used to identify clusters. We consider variables that attempt to capture aspects of the spatial variation of the inundation patterns. For each realization, and each coastal location, we compute the value
As the proxy variables, we will use the following statistics of
The first variable
Utilizing these quantities, the coarse-grid inundation from each individual realization can be mapped to a point in the three-dimensional space of proxy variables, which we will call “proxy-space,” as shown in Figure 9. To perform clustering it is also necessary to define a metric that measures the distance between two points in this space, and here we simply use the Euclidean distance (the square root of the sum of squares of differences in each of the three proxy variables). We then use K-means clustering (Lloyd, 1982), as implemented in scikit-learn (Pedregosa et al., 2011), to cluster the 2,000 points in proxy-space into K clusters, with the property that each point belongs to the cluster with the closest centroid (as measured in the specified metric). Note that because
FIGURE 9. Clustering results for the two study regions. In each case all 2,000 realizations are represented as a point in the three-dimensional proxy-space and colored by cluster, with 18 clusters in each case. The crosses indicate the cluster representative, i.e., the realization closest to the centroid of each cluster.
This clustering is done independently for the two study regions, and the results are shown in the scatter plots in Figure 9. The plot also highlights which realization is closest to the centroid of each cluster, which we refer to as the cluster representative
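The clustering pipeline, including the selection of cluster representatives, can be sketched as below. The three statistics used here (mean, maximum, and standard deviation of the coarse-grid depths over the coastal points) are placeholders standing in for the paper's proxy variables; the function and parameter names are our own.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_realizations(depths, K=18, seed=0):
    """Cluster N realizations by cheap proxy statistics of their
    coarse-grid inundation depths, then pick one representative per
    cluster (the realization closest to the cluster centroid).

    depths : (N, M) array of coarse-grid depths, N realizations
             evaluated at M coastal locations
    """
    # Each realization becomes a point in 3-D "proxy-space".
    # (These statistics are illustrative placeholders.)
    proxies = np.column_stack([
        depths.mean(axis=1),
        depths.max(axis=1),
        depths.std(axis=1),
    ])

    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(proxies)

    # Cluster representative: in-cluster point with the smallest
    # Euclidean distance to its centroid.
    reps = []
    for k in range(K):
        members = np.where(km.labels_ == k)[0]
        dist = np.linalg.norm(proxies[members] - km.cluster_centers_[k],
                              axis=1)
        reps.append(members[np.argmin(dist)])
    return km.labels_, np.array(reps)
```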
From Figure 9 one observes that there is a general monotonic behavior with respect to all three variables: higher-magnitude events tend to have higher values for all three. But also note that the scatter plots for the two coastal communities show qualitatively different patterns. While the points in proxy-space for Westport show a more predictable behavior, with higher-magnitude events more concentrated along a smooth, monotone increasing curve, there is much more variation in the proxy variable
Hazard Curves and Maps
Finally, we combine the techniques developed in the previous sections to produce approximate hazard curves with much less work than would be required to perform all N fine-grid simulations. We first summarize our notation and the definition of the hazard curves and these approximations. For other discussions of hazard curves see, for example, González et al. (2014) and Adams et al. (2015), the review paper of Grezio et al. (2017), and the associated Jupyter notebooks that illustrate these concepts interactively.
We assume that we have split all possible events into J classes indexed by
As long as the probabilities are small, the final line of (13) is a good approximation to the true value. For the probabilities listed in Table 1,
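A quick numerical check illustrates why this approximation is safe at these magnitudes. The return times used below are illustrative round numbers, not the values from Table 1:

```python
# For small annual probabilities p_j of the J event classes, the exact
# probability that at least one event occurs in a given year,
#     1 - prod_j (1 - p_j),
# is well approximated by the simple sum  sum_j p_j.
p = [1/250, 1/500, 1/1000, 1/2500]  # illustrative return times only

exact = 1.0
for pj in p:
    exact *= (1.0 - pj)
exact = 1.0 - exact          # exact combined annual probability

approx = sum(p)              # first-order approximation
print(exact, approx)         # the two agree to well within 1% here
```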
We next chose
Now consider a fixed location in the study region where we have computed
FIGURE 10. Sample hazard curves at two locations in Westport (top) and two locations in Crescent City (bottom), in each case indicated by the dots in the inset maps. The curves show the reference all-fine result
Note that summing the weights
In Section 5 we discussed an approach to clustering the
One simple strategy for approximating the hazard curve is then to assign a weight
and then approximate
In other words, we assume that if the tsunami from the cluster representative
Another approximation, which requires no clustering, would be to approximate
Much better results can be obtained by using all of the coarse-grid results after enhancing them using the techniques presented in Section 4. Using only the coarse-mod corrections to each coarse-grid result would lead to the approximation
while also using the clustering to produce the pseudo-fine corrections gives
In general, using Eq. 19 as an approximation to the all-fine grid (ground truth model) hazard curve defined by
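All of the hazard-curve variants discussed above share the same structure at a fixed location: a weighted sum of indicator functions over the computed maximum depths. A minimal sketch in our own notation (not a transcription of Eqs. 14-19):

```python
import numpy as np

def hazard_curve(h_values, weights, h_grid):
    """Weighted empirical exceedance curve at one location.

    For each exceedance level h in h_grid, sum the (annual-probability)
    weights of all realizations whose maximum depth exceeds h.  The same
    routine yields the all-fine, coarse-mod, pseudo-fine, or
    K-representative curves, depending on which depths and weights are
    supplied: e.g., for the K-representative curve, pass the K
    representative depths with each weight set to the summed weights of
    that cluster's members, giving a curve with only K jumps.
    """
    h_values = np.asarray(h_values)
    weights = np.asarray(weights)
    return np.array([weights[h_values > h].sum() for h in h_grid])
```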
Probabilistic Tsunami Hazard Assessment Results
We now explore the results of performing PTHA using the clustering strategies developed in Section 5, either alone or in conjunction with additional coarse-mod or pseudo-fine results as developed in Section 4.
Recall that we have sampled
Recall from Section 6 that a hazard curve is defined at each point in the study region where
Note that the all-fine, coarse-mod, and pseudo-fine hazard curves obtained using 2,000 realizations typically have 2,000 jump discontinuities, one at the location of
The hazard curve
Using the coarse-mod enhancement of each coarse-grid result gives the hazard curve
In evaluating the results shown in Figure 10, it is important to remember that we cannot expect very good agreement at the smallest annual probabilities, where the results depend entirely on the most extreme tsunamis out of the 2,000 selected. Of most interest in this study is the portion of the hazard curve above say
Hazard Maps and Transects
The hazard curves
FIGURE 11. Sample hazard maps and transects for select locations in Westport. The top panels show plan view plots for return times
FIGURE 12. Sample hazard maps and transects for Crescent City. As described in the caption to Figure 11.
Note that in these maps we show offshore points as well as onshore points, since
It is hard to quantitatively compare these hazard maps, and impossible to present results for more than one probability
Finally, in Table 2, we give the maximum and mean differences between
TABLE 2. The magnitude of
We have presented a general approach to performing PTHA when given a) a set of classes of possible events with an annual probability or return time for each class, and b) a probability density within each class that can be sampled to obtain a sufficiently large number of sample realizations that hazard curves can be accurately approximated by performing fine grid tsunami simulations for each realization. The problem we considered is that the number of realizations needed may be too large to perform fine grid simulations of each, particularly if many coastal locations are of interest, and so the goal is to obtain good approximations to the hazard curves that would be generated by all the fine-grid simulations with much less work, employing only coarse-grid simulations, clustering, and correction procedures.
We considered a model problem where 2,000 fine-grid inundation simulations were performed in order to obtain a reference hazard curve to test our methodology, which we again summarize. We first performed coarse-grid inundation simulations for each realization, using a set of four magnitude classes and a K-L expansion to define the probability density within each class for illustrative purposes. We then clustered them into only 18 clusters and performed one fine-grid simulation for a single representative from each cluster. We also used these 18 simulations, with very little additional work, to enhance the remaining coarse-grid results. The resulting 2,000 pseudo-fine results were then used to produce approximate hazard curves that are much more accurate than those obtained using the cluster representatives alone.
Although we chose geophysically reasonable parameters, we do not claim that the results of our fine-grid hazard curves are correct, only that they are reasonable reference solutions against which to compare cheaper strategies. We have also ignored other magnitude events on the CSZ, other fault mechanisms such as splay faults, along with distant earthquakes and other tsunami sources, so the results in this paper should not be interpreted as providing realistic estimates of hazards in either Crescent City or Westport. Certainly any realistic PTHA meant to inform decision-making should also include sensitivity studies, particularly in light of the large epistemic uncertainty in the parameters that go into the probability densities (whether generated by K-L expansion or by any other technique). The techniques of this paper can be useful for such sensitivity studies since they can accelerate the PTHA for any set of realizations and thus allow testing more sets of realizations in order to investigate the resulting variation in hazard curves. Sets of realizations might be generated with different density parameters, or be of different sizes. Different random sets of realizations of the same size can also help to better explore the aleatoric uncertainty.
More work is needed to better understand and optimize the clustering method used in this paper. In particular there is a need to better quantify the number of clusters that should be used, and the best set of proxy variables to use in performing the clustering.
However, we note that we were generally able to achieve nearly the same hazard curves with our pseudo-fine results as when using all the fine-grid results, down to an annual probability of
Data Availability Statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: http://depts.washington.edu/ptha/frontiers2020a/.
All authors contributed to the initial development and project goals of this study. AW and DM developed the earthquake realizations and performed the tsunami simulations. FG, RL, and DR developed the clustering methodology, and LA developed the coarse-mod and pseudo-fine techniques. All authors contributed to writing the text and producing the figures presented.
This research was supported in part by FEMA Region IX, Grant EMW-2014-CA-00218, and by NASA DISASTERS grant 80NSSC19K1104, AFOSR CoE FA9550-17-1-0195, AFOSR MURI FA9550-15-1-0038, NSF Hazard SEES grant EAR-1331412, and NSF CoPe-EAGER ICER-1940024.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Adams, L. M., LeVeque, R. J., and González, F. I. (2015). The Pattern Method for incorporating tidal uncertainty into probabilistic tsunami hazard assessment (PTHA). Nat. Hazards. 76, 19–39. doi:10.1007/s11069-014-1482-z
Adams, L. M., LeVeque, R. J., Rim, D., and González, F. I. (2017). Probabilistic source selection for the Cascadia Subduction Zone. Results from a study supported by FEMA region IX. Available at: http://depts.washington.edu/ptha/FEMA (Accessed March, 2017).
Amante, C., and Eakins, B. W. (2009). “ETOPO1 1 Arc-minute global relief model: procedures, data sources and analysis,” in NOAA Technical Memorandum NESDIS NGDC-24. Boulder, CO. March 2009. Available at: http://www.ngdc.noaa.gov/mgg/global/global.html (Accessed March, 2009).
Blaser, L., Krüger, F., Ohrnberger, M., and Scherbaum, F. (2010). Scaling relations of earthquake source parameter estimates with special focus on subduction environment. Bull. Seismol. Soc. Am. 100, 2914–2926. doi:10.1785/0120100111
Crempien, J. G., Urrutia, A., Benavente, R., and Cienfuegos, R. (2020). Effects of earthquake spatial slip correlation on variability of tsunami potential energy and intensities. Sci. Rep. 10, 1–10. doi:10.1038/s41598-020-65412-3
Davies, G., and Griffin, J. (2020). Sensitivity of probabilistic tsunami hazard assessment to far-field earthquake slip complexity and rigidity depth-dependence: case study of Australia. Pure Appl. Geophys. 177, 1521–1548. doi:10.1007/s00024-019-02299-w
Frankel, A., Chen, R., Petersen, M., Moschetti, M., and Sherrod, B. (2015). 2014 update of the Pacific Northwest portion of the U.S. National seismic hazard maps. Earthq. Spectra. 31, S131–S148. doi:10.1193/111314eqs193m
Goda, K., Yasuda, T., Mori, N., and Maruyama, T. (2016). New scaling relationships of earthquake source parameters for stochastic tsunami simulation. Coast Eng. J. 58, 1650010–1650011. doi:10.1142/s0578563416500108
Goldberg, D. E., and Melgar, D. (2020). Generation and validation of broadband synthetic P waves in semistochastic models of large earthquakes. Bull. Seismol. Soc. Am. 110, 1982–1995. doi:10.1785/0120200049
Goldfinger, C., Nelson, C. H., Morey, A. E., Johnson, J. E., Patton, J. R., Karabanov, E. B., et al. (2012). Turbidite event history—methods and implications for Holocene paleoseismicity of the Cascadia subduction zone. Tech. rep., US Geological Survey, 170 pp. doi:10.3133/pp1661f
González, F., LeVeque, R., Adams, L., Goldfinger, C., Priest, G., and Wang, K. (2014). Probabilistic tsunami hazard assessment (PTHA) for Crescent City, CA final report September 11, 2014. Seattle, WA: University of Washington ResearchWorks Archive. Available at: http://hdl.handle.net/1773/25916 (Accessed September 11, 2014).
González, F. I., LeVeque, R. J., and Adams, L. M. (2013). Tsunami hazard assessment of the Ocosta School site in Westport, WA. Seattle, WA: University of Washington ResearchWorks Archive. Available at: http://hdl.handle.net/1773/24054 (Accessed September 11, 2013).
González, J., González, G., Aránguiz, R., Melgar, D., Zamora, N., Shrivastava, M. N., et al. (2020). A hybrid deterministic and stochastic approach for tsunami hazard assessment in Iquique, Chile. Nat. Hazards. 100, 231–254. doi:10.1007/s11069-019-03809-8
Grezio, A., Babeyko, A., Baptista, M. A., Behrens, J., Costa, A., Davies, G., et al. (2017). Probabilistic tsunami hazard analysis: multiple sources and global applications. Rev. Geophys. 55, 1158–1198. doi:10.1002/2017rg000579
Gusman, A. R., Tanioka, Y., Macinnes, B. T., and Tsushima, H. (2014). A methodology for near-field tsunami inundation forecasting: application to the 2011 Tohoku tsunami. J. Geophys. Res. Solid Earth. 119, 8186–8206. doi:10.1002/2014JB010958
Horrillo, J., Knight, W., and Kowalik, Z. (2008). Kuril Islands tsunami of November 2006: 2. Impact at Crescent City by local enhancement. J. Geophys. Res. Oceans. 113, C01021. doi:10.1029/2007jc004404
LeVeque, R. J., Waagan, K., Gonzalez, F. I., Rim, D., and Lin, G. (2016). Generating random earthquake events for probabilistic tsunami hazard assessment. Pure Appl. Geophys. 173, 3671–3692. doi:10.1007/s00024-016-1357-1
Li, L., Switzer, A. D., Chan, C.-H., Wang, Y., Weiss, R., and Qiu, Q. (2016). How heterogeneous coseismic slip affects regional probabilistic tsunami hazard assessment: a case study in the south China Sea. J. Geophys. Res. Solid Earth. 121, 6250–6272. doi:10.1002/2016jb013111
Lorito, S., Selva, J., Basili, R., Romano, F., Tiberti, M. M., and Piatanesi, A. (2015). Probabilistic hazard for seismically induced tsunamis: accuracy and feasibility of inundation maps. Geophys. J. Int. 200, 574–588. doi:10.1093/gji/ggu408
McCaffrey, R., Qamar, A. I., King, R. W., Wells, R., Khazaradze, G., Williams, C. A., et al. (2007). Fault locking, block rotation and crustal deformation in the Pacific Northwest. Geophys. J. Int. 169, 1315–1340. doi:10.1111/j.1365-246x.2007.03371.x
McCrory, P. A., Blair, J. L., Waldhauser, F., and Oppenheimer, D. H. (2012). Juan de Fuca slab geometry and its relation to Wadati-Benioff zone seismicity. J. Geophys. Res. 117, B09306. doi:10.1029/2012jb009407
Melgar, D., LeVeque, R. J., Dreger, D. S., and Allen, R. M. (2016). Kinematic rupture scenarios and synthetic displacement data: an example application to the Cascadia Subduction Zone. J. Geophys. Res. Solid Earth. 121, 6658–6674. doi:10.1002/2016jb013314
Melgar, D., Williamson, A. L., and Salazar-Monroy, E. F. (2019). Differences between heterogenous and homogenous slip in regional tsunami hazards modelling. Geophys. J. Int. 219, 553–562. doi:10.1093/gji/ggz299
NOAA NCEI (2012). Crescent City, California 1/3 Arc-second MHW coastal digital elevation model. Available at: https://www.ngdc.noaa.gov/metaview/page?xml=NOAA/NESDIS/NGDC/MGG/DEM/iso/xml/693.xml&view=getDataView&header=none (Accessed July 14, 2010).
NOAA NCEI (2017). Astoria, Oregon 1/3 Arc-second MHW Coastal Digital Elevation Model. Available at: https://www.ngdc.noaa.gov/metaview/page?xml=NOAA/NESDIS/NGDC/MGG/DEM/iso/xml/5490.xml&view=getDataView&header=none (Accessed June 18, 2016).
Parsons, T., Console, R., Falcone, G., Murru, M., and Yamashina, K. I. (2012). Comparison of characteristic and Gutenberg-Richter models for time-dependent M ≥ 7.9 earthquake probability in the Nankai-Tokai subduction zone, Japan. Geophys. J. Int. 190, 1673–1688. doi:10.1111/j.1365-246x.2012.05595.x
Salmanidou, D. M., Guillas, S., Georgiopoulou, A., and Dias, F. (2017). Statistical emulation of landslide-induced tsunamis at the rockall bank, NE Atlantic. Proc. R. Soc. A. 473, 20170026. doi:10.1098/rspa.2017.0026
Sepúlveda, I., Liu, P. L. F., and Grigoriu, M. (2019). Probabilistic tsunami hazard assessment in South China Sea with consideration of uncertain earthquake characteristics. J. Geophys. Res. Solid Earth. 124, 658–688. doi:10.1029/2018jb016620
Sepúlveda, I., Liu, P. L.-F., Grigoriu, M., and Pritchard, M. (2017). Tsunami hazard assessments with consideration of uncertain earthquake slip distribution and location. J. Geophys. Res. Solid Earth. 122, 7252–7271. doi:10.1002/2017jb014430
Volpe, M., Lorito, S., Selva, J., Tonini, R., Romano, F., and Brizuela, B. (2019). From regional to local SPTHA: efficient computation of probabilistic tsunami inundation maps addressing near-field sources. Nat. Hazards Earth Syst. Sci. 19, 455–469. doi:10.5194/nhess-19-455-2019
Williamson, A. L., Rim, D., Adams, L. M., LeVeque, R. J., Melgar, D., and Gonzaléz, F. I. (2020). A Source Clustering Approach for Efficient Inundation Modeling and Regional Scale PTHA. Frontiers in Earth Science, 2020 [Preprint]. Available at: http://depts.washington.edu/ptha/frontiers2020a/ (Accessed 2020).
Keywords: probabilistic tsunami hazard assessment, clustering, stochastic earthquakes, Karhunen-Loève expansion, GeoClaw
Citation: Williamson AL, Rim D, Adams LM, LeVeque RJ, Melgar D and González FI (2020) A Source Clustering Approach for Efficient Inundation Modeling and Regional Scale Probabilistic Tsunami Hazard Assessment. Front. Earth Sci. 8:591663. doi: 10.3389/feart.2020.591663
Received: 05 August 2020; Accepted: 11 September 2020;
Published: 21 October 2020.
Edited by: Finn Løvholt, Norwegian Geotechnical Institute, Norway
Reviewed by: Gareth Davies, Geoscience Australia, Australia
Ignacio Sepulveda, University of California, San Diego, United States
Copyright © 2020 Williamson, Rim, Adams, LeVeque, Melgar and Gonzalez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Amy L. Williamson, firstname.lastname@example.org