Conceptual Design of Extreme Sea-Level Early Warning Systems Based on Uncertainty Quantification and Engineering Optimization Methods

Coastal hazards linked to extreme sea-level events are projected to directly affect, through flooding, 630 million people by the year 2100. Numerous operational forecasts already provide coastal hazard assessments around the world. However, they are largely based on either deterministic tools (e.g., numerical ocean and atmospheric models) or ensemble approaches, both of which are highly demanding in terms of high-performance computing (HPC) resources. Through a robust learning process, we propose the conceptual design of an innovative architecture for extreme sea-level early warning systems based on uncertainty quantification/reduction and optimization methods. This approach might be cost-effective in terms of real-time computational needs while maintaining reliability and trustworthiness of the hazard assessments. The proposed architecture relies on three main tools aligning numerical forecasts with observations: (1) surrogate models of extreme sea-levels using polynomial chaos expansion, Gaussian processes or machine learning, (2) fast data assimilation via Bayesian inference, and (3) optimal experimental design of the observational network. A surrogate model developed for meteotsunami events (i.e., atmospherically induced long ocean waves in a tsunami frequency band) has already been shown to greatly improve the reliability of extreme sea-level hazard assessments. Such an approach might be promising for several coastal hazards known to destructively impact the world's coasts, like hurricanes or typhoons and seismic tsunamis.


INTRODUCTION
The size and number of global coastal communities have increased dramatically in the past century. Today more than 40% of the world's population resides within 100 km of the coast, and 10% in nearshore areas less than 10 m above sea level (Nicholls and Cazenave, 2010; Neumann et al., 2015). Coupled with more frequent and more energetic weather phenomena due to global climate change (e.g., Emanuel, 2017; Romera et al., 2017), these coastal communities are at high risk and extremely vulnerable to catastrophic events such as hurricanes, tropical storms, tsunamis, as well as flooding associated with smaller, sometimes less intense events (e.g., meteotsunamis, wave storms, and medicanes). With the ultimate aim to ensure public safety and to manage resources along the coastal zones, early warning systems for extreme sea-level events, based on both computer models and monitoring networks, have been developed worldwide.
Early warning systems have been broadly defined by the United Nations International Strategy for Disaster Reduction (UNISDR, 2009) as "the provision of timely and effective information, through identified institutions, that allows individuals exposed to a hazard to take action to avoid or reduce their risk and prepare for effective response." In many countries and regions of the world, these systems have been implemented for extremely destructive sea-level hazards, such as tsunamis and hurricanes or typhoons (Franklin et al., 2003; Basher, 2006; Chatfield et al., 2013; Hettiarachchi, 2018). However, other localized and lesser-known sea-level coastal hazards can also cause major structural damage and loss of life. For these specific types of events, very few early warning systems have been developed worldwide, and many of those were implemented with very limited human resources and funding (i.e., most often within academic research projects). This is the case, for example, of the prototypes for meteorological tsunamis (long ocean waves in a tsunami frequency band generated by atmospheric gravity waves, pressure jumps, frontal passages, squalls, etc.; e.g., Pattiaratchi and Wijeratne, 2015; Rabinovich, 2020) in the Adriatic and Balearic Islands (Renault et al., 2011; Vilibić et al., 2016).
Although constantly developed to provide reliable and timely forecasts of extreme events (Swail et al., 2019), extreme sea-level early warning systems may still fail to produce accurate predictions, in particular concerning the intensity of the hazard. For hurricanes, no substantial improvement in intensity forecasting has been achieved since the 1990s (DeMaria et al., 2014; Emanuel, 2017), largely due to limitations in the development (of both physics and resolution) of ocean and atmospheric models (Rotunno et al., 2009; Andreas et al., 2015). For tsunamis, a failure in coastal hazard forecast may be triggered by improper parameterization of the source (initial conditions for tsunami models), which may lead to substantial overestimation or underestimation of the hazard (Titov et al., 2016). For meteotsunamis, improper reproduction of the atmospheric forcing and poorly represented coastal bathymetry often result in the underestimation of the coastal hazard (Vilibić et al., 2016).
Additionally, as lead-time and robustness/stability are the main controlling factors of real-time sea-level forecasting, fidelity of scale, resolution, and geographic domain are often sacrificed so that extreme events can be simulated at higher speeds (i.e., fast enough to implement a response) and higher frequency (e.g., recalculated every time new information becomes available). This is well illustrated for atmospherically driven extreme sea-levels, for which three main types of storm surge predictions can be implemented. The first type is a deterministic forecast based on a single simulation such as in Suh et al. (2015), which forces a relatively lightweight ocean mesh covering the North Western Pacific Ocean with real-time meteorological forecast advisories. The main advantage of such an approach is that, as the storm surge forecast at a given time is based on a unique simulation, high-resolution domains which accurately describe the geomorphology of the coastal areas and fully coupled wave-current models can be used, even though they require longer simulation times. However, the major drawback of such a deterministic surge prediction is that it does not offer any quantified uncertainty or confidence in its computations, and is thus of limited use for risk assessment. The prediction validity then heavily relies on the quality and availability of the forecasted meteorological input, which can carry extremely high uncertainties in terms of track, intensity, speed, etc. Generally, the deterministic approach is thus used in research to analyze past storms (i.e., with known atmospheric forcing derived from reanalysis) and not in real-time evacuation decisions. The second and third types are the statistical forecast and the ensemble/composite approach, which are most often implemented within early warning systems since they can propagate the atmospheric forcing uncertainties to the surge results.
In the former, a statistical error derived from past forecasts is applied to the atmospheric forecast during the extreme event in order to create probable storm forcing for the ocean model (e.g., the P-surge model from the National Oceanic and Atmospheric Administration, NOAA). In the latter, the ocean model is run multiple times forced by hypothetical storm conditions in order to determine the storm surge vulnerability for a given area. For example, at the NOAA National Hurricane Centre, this approach forms the basis for developing evacuation zones in the United States (Taylor and Glahn, 2008). However, running statistical and/or ensemble/composite approaches can be prohibitively expensive as they require extensive numerical resources and long simulation times (i.e., hundreds or thousands of simulations per forecast or assessment). Therefore, the ocean model resolutions and domain sizes, and thus the simulation accuracy, are generally greatly sacrificed to keep these costs low. Additionally, the wave-current dynamics, which can drive up to 20% of the extreme sea-levels for certain events (e.g., Murty et al., 2020), is also often ignored. As suggested in Veeramony et al. (2012), a good balance between ocean model fidelity (via high-resolution meshes and detailed physics) and atmospheric uncertainty (via probabilistic approaches) has yet to be properly achieved in real-time forecasts.
Therefore, other avenues should be explored to improve coastal sea-level hazard forecasts within early warning systems. One recently demonstrated successful approach relies on a surrogate stochastic model created within the early warning system prototype for meteotsunamis in the Adriatic Sea (Denamiel et al., 2019a, 2020). Meteotsunamis are an interesting example as specialized atmospheric forecasts (i.e., AdriSC deterministic model, Denamiel et al., 2019b) and observational networks (i.e., barograph measurements) are required to capture the highly variable (in time and space) air-pressure disturbances driving the tsunami-like waves (Vilibić and Šepić, 2009). Within this application, the surrogate model, covering all potential meteotsunamis in the middle Adriatic Sea, achieved execution speeds 1000 times faster than the deterministic ocean model in operational use, whilst largely increasing the accuracy of the overall extreme sea-level hazard assessments (i.e., by thoroughly quantifying the uncertainty in the atmospheric forcing and propagating it to the ocean forecast).
The objective of this perspective paper is thus to provide a new concept for a generalized framework for designing reliable early warning systems for extreme sea-levels. This new framework aims to provide computationally inexpensive forecasts compared with present systems, while keeping reliability and trustworthiness of the hazard assessments, thus seeking to break the barrier that computational resource limitations pose to near real-time forecasts.

CONCEPTUAL DESIGN AND MAJOR POSTULATES
We postulate that fast and reliable stochastic extreme sea-level hazard assessments, using uncertainty quantification (UQ, Najm, 2009; Yildirim and Karniadakis, 2015) and optimization engineering methods (Marler and Arora, 2004), can be implemented within real-time early warning systems in place of expensive state-of-the-art physical ocean model forecasts. This postulate relies on the innovative concept illustrated in Figure 1. The framework integrates advanced stochastic methods such as surrogate models based on forward UQ, Bayesian inference and optimal experimental design within an efficient operational extreme sea-level forecast system built around four main hypotheses presented below.

Hypothesis 1: Uncertainty of Extreme Sea-Level Forecasts Can Be Captured With Stochastic Forcing
Theory
The core component of the postulated early warning system is the ocean model, which requires an atmospheric or other source input derived either directly from measurements or indirectly from observation-driven models. The uncertainty associated with this input propagates to the extreme sea-level forecasts, and therefore needs to be properly captured in order to produce reliable hazard assessments. In the extreme sea-level community, idealized stochastic forcing is already used to provide hazard assessments with, for example, Monte Carlo (MC) sampling (ensemble/composite approach, Gallagher et al., 2009; Stanford et al., 2011) or perturbation methods (statistical forecast, Cubasch et al., 1994), but is not yet fully implemented in near real-time and forecast modes. We hypothesize that extreme sea-level forecasts can be produced with UQ to reflect the uncertainty of the atmospheric or other forcing. We thus propose to produce a large ensemble of ocean simulations forced by idealized stochastic forcing (hereafter referred to as synthetic forcing), which is a simplified representation of the atmospheric or other source input depending on uncertain parameters (i.e., stochastic variables) with prescribed prior distributions (e.g., wind speed and air pressure for hurricanes, pre-failure slope angle for seismic tsunamis, etc., Figure 2 steps 1 & 2). The prescribed prior distributions (e.g., Uniform, Gaussian, Gamma) should cover all potential realizations of the studied hazard in a specific geographical location based on the previously acquired knowledge from historical events.
Practical Implementation and Cost
At NOAA, this approach is considered the best way to assess the vulnerability of the coastline to storm surges during hurricanes (i.e., to capture the worst-case high-water value at a particular location for hurricane evacuation planning), as it accounts for uncertainties linked to forward speed, storm trajectory, landfall location, maximum wind speed, etc. In terms of cost, to predict the worst-case surges along the Gulf of Mexico and the US East Coast, several thousand simulations were run with hypothetical hurricanes under different storm conditions.
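As an illustration of Hypothesis 1, the following minimal sketch draws stochastic forcing parameters from prescribed priors and propagates them through a model to obtain a distribution of peak sea levels. It is only a sketch under stated assumptions: the function names, the prior bounds and moments, and the algebraic `toy_ocean_model` are hypothetical placeholders standing in for a real synthetic forcing and ocean simulation.

```python
import numpy as np

def sample_synthetic_forcing(n_samples, rng):
    """Draw stochastic forcing parameters from prescribed priors (placeholder values)."""
    return {
        "wind_speed": rng.uniform(30.0, 70.0, n_samples),            # m/s, Uniform prior
        "pressure_drop": rng.normal(50.0, 10.0, n_samples),          # hPa, Gaussian prior
        "radius": rng.gamma(shape=4.0, scale=10.0, size=n_samples),  # km, Gamma prior
    }

def toy_ocean_model(params):
    """Hypothetical stand-in for one ocean simulation: peak sea level in metres."""
    wind_setup = 4.0e-4 * params["wind_speed"] ** 2      # assumed quadratic wind setup
    inverse_barometer = 0.01 * params["pressure_drop"]   # about 1 cm per hPa
    size_factor = 1.0 + params["radius"] / 100.0         # assumed storm-size effect
    return (wind_setup + inverse_barometer) * size_factor

rng = np.random.default_rng(42)
params = sample_synthetic_forcing(5000, rng)
peak_level = toy_ocean_model(params)

# Summaries a hazard assessment could report
p95 = np.percentile(peak_level, 95)
prob_exceed = np.mean(peak_level > 2.0)
print(f"95th percentile: {p95:.2f} m, P(level > 2 m): {prob_exceed:.2f}")
```

In a real system each call to the model would be a full ocean simulation, which is why the ensemble size drives the cost discussed above.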
Hypothesis 2: Surrogate Models Can Shift Computational Needs From "Online" to "Offline" and Achieve Fast and Accurate Predictions
Theory
High-fidelity physics-based models are very computationally demanding, with each simulation taking minutes to hours to run on supercomputers depending on the geographic domain size, the grid resolution (which defines the level of accuracy of the coastal geomorphology representation) and the model physics (e.g., if including wave-current dynamics). Direct attempts to accelerate these computations generally involve coarsening the grid, simplifying the physics, and/or reducing the number of models if in an ensemble setting, none of which is desirable for coastal early warning systems where accuracy and reliability are critical. We hypothesize that real-time ensemble ocean forecasts can be replaced by surrogate models (e.g., Polynomial Chaos Expansion, PCE, Le Maître and Knio, 2010; Gaussian Process, GP, Rasmussen and Williams, 2006; Deep Neural Network, DNN, Goodfellow et al., 2006) and can achieve fast and accurate UQ analysis, thus providing rigorous accounting of the forcing uncertainty to the extreme sea-levels (Figure 2, steps 7 & 8). Surrogate models can be developed and updated offline ahead of any emergency situation, and run online at very high speeds and with nearly no computational cost when severe events are taking place or about to take place.
FIGURE 1 | Extreme sea-level hazard assessments based on uncertainty quantification and optimization engineering methods: (1) uncertain input parameters with prior distribution are used to create stochastic ocean model forcing which are both (2) and (3) used to optimize the observational network with optimal experimental design strategies and (4) modified with the assimilation of observational data via Bayesian inference in order to (5) create the posterior distributions of the input parameters. Finally, (6) new stochastic ocean forcing based on these parameters are used to force (7) the surrogate models and produce (8) extreme sea-level hazard assessments. Drawing of the flooded city adapted from Frits Ahlefeldt: https://fritsahlefeldt.com/2019/01/24/not-ready-city-facing-flooding.
Practical Implementation and Cost
Within the coastal hazard community, PCEs have already been used for the propagation of uncertainty in an earthquake ocean floor displacement model to tsunami wave parameters (Giraldi et al., 2017) and the reproduction of Hurricane Gustav (2008) by probing its track and intensity (Sochala et al., 2020). In terms of cost, to implement the meteotsunami surrogate model in the Adriatic Sea, using 6 uniformly distributed stochastic variables to describe the synthetic atmospheric forcing, 4161 ocean simulations derived with a mesh of 513 340 triangular elements were needed to reach the 5th order of the polynomial chaos method (Denamiel et al., 2020). Until now, the meteotsunami surrogate model has proven to be extremely reliable in terms of providing extreme sea-level assessments during real events (Denamiel et al., 2019b; Tojčić et al., 2021) at nearly zero additional computational cost (i.e., less than 5 min to produce maximum sea-level distributions at all sensitive locations from 20 000 samples).
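The offline/online split can be illustrated with a one-variable polynomial chaos surrogate. This is a minimal sketch, not the published meteotsunami implementation: it assumes a single uniformly distributed variable on [-1, 1], a least-squares fit of Legendre polynomials, and a hypothetical `expensive_model` standing in for the ocean simulations.

```python
import numpy as np
from numpy.polynomial import legendre

def expensive_model(xi):
    """Hypothetical stand-in for an ocean simulation driven by one uniform variable."""
    return np.exp(0.8 * xi) + 0.3 * xi**2

def fit_pce(xi, y, order):
    """Offline step: least-squares fit of Legendre coefficients up to `order`."""
    basis = legendre.legvander(xi, order)          # columns are P_0 ... P_order
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coeffs

def eval_pce(coeffs, xi):
    """Online step: evaluating the polynomial is almost free."""
    return legendre.legval(xi, coeffs)

rng = np.random.default_rng(0)

# Offline: a limited number of expensive runs
xi_train = rng.uniform(-1.0, 1.0, 50)
coeffs = fit_pce(xi_train, expensive_model(xi_train), order=5)

# Online: 20 000 surrogate evaluations instead of 20 000 simulations
xi_online = rng.uniform(-1.0, 1.0, 20000)
y_online = eval_pce(coeffs, xi_online)

# Check against the toy model (only possible here because it is cheap)
max_error = np.max(np.abs(y_online - expensive_model(xi_online)))
pce_mean = coeffs[0]   # for a uniform variable, the mean is the P_0 coefficient
```

With several stochastic variables the basis becomes a product of such polynomials, and the number of training simulations grows accordingly.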

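Hypothesis 3: Fast Data Assimilation via Bayesian Inference Can Reduce the Uncertainty of Extreme Sea-Level Forecasts
Theory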
The uncertainty of extreme event forecasts in the postulated early warning system can be reduced through the assimilation of observational data and/or operational atmospheric forecasts into the model. However, classical filtering-based data assimilation methods (Evensen, 1994) (e.g., Kalman filters, ensemble Kalman filters, and particle filters) cannot be used for our problem for two main reasons. First, whereas classical data assimilation typically targets the direct forecast of state variables (e.g., atmospheric pressure for meteotsunami events) over time, we are interested in reducing the uncertainty of indirectly observed "hidden" model parameters (i.e., stochastic variables of the synthetic forcing). Second, our approach forgoes the physics-based dynamical systems in favor of surrogate models, which offer advantages in mapping directly to the prediction quantities of interest. In order to reduce the uncertainty associated with the forcing, we propose to use Bayesian inference (e.g., Berger, 1985; Sivia and Skilling, 2006; Von Toussaint, 2011), which involves updating a prior uncertainty of model parameters (e.g., wind speed, air pressure or pre-failure slope angle), estimated before receiving the data, to the posterior uncertainty after assimilating the new observations and/or operational atmospheric forecasts (Figure 2, steps 4 & 5).
FIGURE 2 | Proposed conceptual design for early warning systems illustrated for hurricanes in the Gulf of Mexico and the US East Coast. The three blue wheels represent the uncertainty quantification and optimization tools including the surrogate models, the fast data assimilation and the optimal experimental design. The green ellipses show the input needed to build (bottom right, the synthetic forcing and the ocean model) and run (top, real-time observations and operational atmospheric forecast) the early warning system.
Practical Implementation and Cost
Bayesian inference provides a probabilistic solution to an inverse problem, which generally requires repeated forward model simulations under different parameter settings. The primary class of Bayesian inference algorithms is Markov chain Monte Carlo (MCMC; Andrieu et al., 2003; Robert and Casella, 2004; Brooks et al., 2011). In practice, MCMC often requires thousands or more model evaluations to obtain good chain mixing and convergence to the posterior distribution. Coupling a physics-based model with MCMC is often prohibitive, and a fast surrogate model (Hypothesis 2) would be needed to render the computation tractable. Once a surrogate is available, the forward model expense is greatly mitigated, and the computational cost then primarily resides with the performance of MCMC. Advanced MCMC variants are widely available in implemented packages, such as delayed rejection adaptive Metropolis (Haario et al., 2006) and Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 2011).
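The role of the surrogate inside MCMC can be sketched with the simplest variant, a random-walk Metropolis sampler (not the delayed rejection adaptive Metropolis or Hamiltonian Monte Carlo methods cited above). Everything here is an illustrative assumption: the one-parameter `surrogate`, the Uniform(0, 1) prior, the Gaussian observation noise, and the synthetic observations.

```python
import numpy as np

def surrogate(theta):
    """Hypothetical fast surrogate mapping a forcing parameter to an observable."""
    return 2.0 * theta + 0.5 * theta**2

def log_posterior(theta, obs, noise_std):
    """Uniform(0, 1) prior plus Gaussian likelihood, up to an additive constant."""
    if not (0.0 <= theta <= 1.0):
        return -np.inf
    resid = obs - surrogate(theta)
    return -0.5 * np.sum((resid / noise_std) ** 2)

def metropolis(obs, noise_std, n_steps, step, rng):
    """Random-walk Metropolis: each step costs one surrogate evaluation."""
    theta = 0.5
    logp = log_posterior(theta, obs, noise_std)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.normal()
        logp_prop = log_posterior(proposal, obs, noise_std)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        chain[i] = theta
    return chain

rng = np.random.default_rng(1)
true_theta, noise_std = 0.7, 0.05
obs = surrogate(true_theta) + noise_std * rng.normal(size=10)   # synthetic observations

chain = metropolis(obs, noise_std, n_steps=5000, step=0.02, rng=rng)
posterior = chain[1000:]             # discard burn-in
prior_std = 1.0 / np.sqrt(12.0)      # standard deviation of Uniform(0, 1)
print(f"posterior mean {posterior.mean():.3f}, std {posterior.std():.4f}")
```

The 5000 forward evaluations in this chain are trivial for a surrogate, whereas each would be a full simulation with a physics-based model.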

Hypothesis 4: Optimal Experimental Design (OED) Can Find the Most Informative Data in Observational Networks That Best Reduce the Forecast Uncertainty
Theory
While Hypothesis 3 describes the use of observational data to reduce uncertainty, which in turn improves the forecast reliability of early warning systems, the cost of acquiring such data is extremely high, often requiring the installation or deployment of aerial, nautical, and land-based sensors and probes. Moreover, not all data are equally useful. Therefore, a careful design of these valuable data-acquisition opportunities can provide substantial savings of operational costs. We propose to develop and utilize model-based statistical OED methods (Chaloner and Verdinelli, 1995; Müller et al., 2007; Huan and Marzouk, 2013) that can also leverage our knowledge of the physical system and model predictive capabilities, in order to answer questions such as: where should the sensors be placed? how frequently should the observations be taken? what quantities should be measured? We further propose to conduct sensitivity studies with synthetic forcing on both (i) present observational networks and (ii) optimized observational networks (varying observational networks in space, changing temporal resolution, choosing parameters to be measured), to also understand the robustness of these networks that are aimed to minimize uncertainty as described in Hypothesis 3.
Practical Implementation and Cost
Conceptually, OED involves simulating different possible experimental outcomes and their corresponding Bayesian inference results at a design, and then optimizing for the best design that maximizes uncertainty reduction (information gain). As a result, the cost of OED is equivalent to many repeated Bayesian inference solutions. For example, an approach described in Huan and Marzouk (2013) involves a double-nested Monte Carlo estimator of the expected utility (objective function) wrapped within an iterative optimization routine. Thus, the total number of forward model evaluations in OED can easily reach millions, and cannot be achieved without a surrogate model except for very simple (e.g., algebraic or analytical) physical models. The computational cost is further compounded if we are interested in assessing the uncertainty reduction to specific quantities of interest (e.g., the maximum wave height at a harbor) induced by the parameter posterior, requiring additional model evaluations. Further algorithmic advances in addition to surrogate modeling are thus required to quell the intensive computational needs for OED.
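A double-nested Monte Carlo estimator of the expected information gain can be sketched on a toy problem. This is an illustrative sketch only, with a grid search in place of an iterative optimizer: the scalar parameter with a standard normal prior, the `sensitivity` function describing how a sensor at a given position responds, and the noise level are all hypothetical assumptions.

```python
import numpy as np

def sensitivity(design):
    """Hypothetical sensor response to the parameter, strongest at position 0.6."""
    return np.exp(-((design - 0.6) ** 2) / 0.05)

def log_likelihood(y, theta, design, noise_std):
    """Gaussian log-likelihood of observation y given parameter theta."""
    resid = y - sensitivity(design) * theta
    return -0.5 * (resid / noise_std) ** 2 - np.log(noise_std * np.sqrt(2.0 * np.pi))

def expected_information_gain(design, n_outer, n_inner, noise_std, rng):
    """Nested Monte Carlo estimate of E[log p(y|theta,d) - log p(y|d)]."""
    theta_outer = rng.normal(size=n_outer)                       # prior samples
    y = sensitivity(design) * theta_outer + noise_std * rng.normal(size=n_outer)
    theta_inner = rng.normal(size=n_inner)                       # samples for the evidence
    log_lik = log_likelihood(y, theta_outer, design, noise_std)
    inner = log_likelihood(y[:, None], theta_inner[None, :], design, noise_std)
    # log of the inner-loop average likelihood, computed stably
    log_evidence = np.logaddexp.reduce(inner, axis=1) - np.log(n_inner)
    return np.mean(log_lik - log_evidence)

rng = np.random.default_rng(2)
designs = np.linspace(0.0, 1.0, 11)          # candidate sensor positions
eig = np.array([expected_information_gain(d, 2000, 1000, 0.2, rng) for d in designs])
best_design = designs[np.argmax(eig)]
print(f"most informative position: {best_design:.1f}")
```

Even this toy search uses 2000 x 1000 likelihood evaluations per candidate design, which hints at why forward evaluations can reach millions in realistic settings.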

PRACTICAL FEASIBILITY FOR A HURRICANE EARLY WARNING SYSTEM
The practical feasibility of the postulated conceptual design is illustrated in Figure 2 for a hurricane early warning system in the Gulf of Mexico and the US East Coast.
In this application, the first task is to develop the extreme sea-level surrogate models for hurricanes (as described in Hypothesis 2) with idealized stochastic forcing (as described in Hypothesis 1). The main advantage of this application is that both analytical synthetic forcing (e.g., Holland, 1980; DeMaria and Kaplan, 1994; Knaff et al., 2007; Wood et al., 2013) and a high-resolution wave-current ocean model (i.e., ADCIRC + SWAN; Dietrich et al., 2011) have already been developed and used to study hurricanes in the Gulf of Mexico and the US East Coast and are publicly available (bottom green ellipse, Figure 2). This means that the principal work prior to building the surrogate model is reduced to defining the most accurate possible distributions of the stochastic input parameters used, for example, in the Holland synthetic forcing (i.e., track, central and environmental surface pressures, maximum winds, radius of maximum winds, etc.). In this case, again, our work is simplified thanks to the publicly available revised Atlantic hurricane database (HURDAT2), compiled by the National Hurricane Centre (NHC) and containing the best track information of all historical tropical and sub-tropical cyclones of the Atlantic basin. Additionally, the future of tropical cyclones in a warmer environment can also be derived from Emanuel (2006, 2013). The construction of hurricane surrogate models along the coast of the Gulf of Mexico and the US East Coast is thus feasible and benefits from the past experience of many researchers and institutions. Practically, if the Holland (1980) model is used with the hurricane aspect ratio defined by Levinson et al. (2010), the number of stochastic variables is reduced to 7: landfall location, track direction, translational speed, central pressure, radius of maximum winds, maximum wind speed and density of the air.
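For readers unfamiliar with analytical synthetic forcing, the Holland (1980) radial profile can be sketched as follows. This is a simplified sketch under stated assumptions: the Coriolis term is neglected (cyclostrophic balance), the shape parameter is derived from the maximum wind speed, and all numerical values are placeholders rather than HURDAT2-derived statistics.

```python
import numpy as np

def holland_profile(r, p_central, p_env, r_max, v_max, rho_air=1.15):
    """Surface pressure (Pa) and cyclostrophic wind speed (m/s) at radius r (m)."""
    r = np.asarray(r, dtype=float)
    delta_p = p_env - p_central
    # Shape parameter B chosen so that the peak wind equals v_max
    b = rho_air * np.e * v_max**2 / delta_p
    decay = np.exp(-((r_max / r) ** b))
    pressure = p_central + delta_p * decay
    wind = np.sqrt((b / rho_air) * (r_max / r) ** b * delta_p * decay)
    return pressure, wind

# Placeholder storm: 950 hPa centre, 1010 hPa environment, 40 km radius, 50 m/s winds
radii = np.linspace(5.0e3, 300.0e3, 600)
pressure, wind = holland_profile(
    radii, p_central=95000.0, p_env=101000.0, r_max=40.0e3, v_max=50.0
)
r_peak = radii[np.argmax(wind)]
print(f"peak wind {wind.max():.1f} m/s at {r_peak / 1e3:.1f} km")
```

Four of the seven stochastic variables listed above (central pressure, radius of maximum winds, maximum wind speed and air density) enter this profile directly, while the remaining three position and move the storm.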
The Florida Commission on Hurricane Loss Projection Methodology published the distributions derived from HURDAT2 for these parameters and found that translational speed, central pressure and radius of maximum wind can be described with well-known Lognormal distributions, while landfall locations, track direction, maximum wind speed and density of the air are best described with a maximum likelihood estimation kernel smoothing. As some of the stochastic parameters do not follow a well-known distribution that can be easily described with polynomials (e.g., Legendre, Hermite, Jacobi, Laguerre and others), the most efficient design for the surrogate model would probably require the use of a Deep Neural Network (DNN). However, generalized PCE (gPCE) could also be implemented with, for example, Uniform distributions for the landfall location, the track direction, the density of the air and the maximum wind speed, while Lognormal distributions would be kept for the other parameters. Comparison of the efficiency of the two surrogate models could be done following Laloy and Jacques (2019). Finally, in terms of cost, as for the assessment of the coastline vulnerability to storm surges during hurricanes done by NOAA, it is expected that the implementation of both surrogate models would require a unique set of a few thousand ocean simulations. However, once the surrogate models are built, the extreme sea-level hazard assessments would only take a few minutes to be derived (similarly to the meteotsunami example).
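To show why a Lognormal parameter fits naturally into a polynomial expansion, the sketch below writes such a parameter as a function of a standard Gaussian variable and expands it in probabilists' Hermite polynomials, using the generating-function identity for these polynomials. The log-mean and log-standard deviation are placeholder values, not the published Florida Commission distributions.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e

# Placeholder Lognormal parameter: X = exp(mu + sigma * xi), with xi ~ N(0, 1)
mu, sigma = 3.5, 0.4
order = 5

# Hermite coefficients: exp(mu + sigma*xi) = exp(mu + sigma^2/2) * sum_n sigma^n/n! He_n(xi)
coeffs = np.array(
    [np.exp(mu + 0.5 * sigma**2) * sigma**n / factorial(n) for n in range(order + 1)]
)

xi = np.linspace(-3.0, 3.0, 61)            # Gaussian variable over +/- 3 standard deviations
exact = np.exp(mu + sigma * xi)
approx = hermite_e.hermeval(xi, coeffs)    # truncated order-5 expansion
max_rel_error = np.max(np.abs(approx - exact) / exact)
lognormal_mean = coeffs[0]                 # He_0 coefficient equals the Lognormal mean
```

Parameters described by kernel smoothing have no such closed-form link to a classical polynomial family, which is the difficulty noted above.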
The second task of the implementation of the hurricane early warning system, based on the proposed conceptual design, is to optimize the observational network using experimental design (as described in Hypothesis 4). Both historical and synthetic forcing can be used to design and test different theoretical observational networks in order to minimize the uncertainty of the extreme sea-level forecast. Based on the outcome, the winning network(s) may be proposed to environmental agencies running the real networks (NOAA in this case) for implementation within the monitoring system.
Finally, available real-time observations chosen from the optimal observational network designed in the previous task and the available operational atmospheric forecasts would be collected in forecast and/or near real-time modes (top green ellipses, Figure 2). As for the first task, this work is greatly simplified by the existence of the NOAA National Hurricane Centre, which already produces both atmospheric analyses and forecasts of tropical cyclones based on a dense observational network (e.g., automated surface observing systems, radiosondes, airport weather observing systems, satellites, meteorological buoys, marine observations, radars, etc.). Then, the best available forecast and real-time data would be used to run the fast data assimilation via Bayesian inference (as described in Hypothesis 3). To test the methodology, the surrogate models and Bayesian inference would first be applied to historical storms, using saved historical atmospheric forecasts as well as real observational networks and their data, in order to assess the capacity of the newly developed early warning system to produce accurate extreme sea-level hazard assessments. If the new system provides satisfactory results, it could then be used in parallel and compared to the more traditional approaches in forecast and near real-time modes.

DISCUSSION AND PERSPECTIVES
While surrogate models, Bayesian inference and optimal experimental design are mathematical tools widely used in statistics and computational engineering, they remain mostly unknown and marginally explored within the extreme sea-level and geosciences communities. In this article we describe how they can be applied to early warning systems for extreme sea-levels driven by hurricanes, tsunamis, meteotsunamis, and others. These hazards are known to have substantial impacts on coastal regions around the world, and thus any improvement of early warning system reliability and performance might be of great societal benefit. However, many of these systems are constrained by the development of numerical tools providing accurate and timely forecasts and consequently are largely restricted by the available computational resources. For that reason, we postulated a new and innovative conceptual design that might be a leap forward in the development of more accurate and more efficient extreme sea-level hazard assessments in early warning systems. The potential impact of the proposed conceptual design for extreme sea-level early warning systems is far-reaching: (1) it provides a low-cost approach to early warning systems that would be highly valuable for local communities that may have inadequate computational resources for running high-fidelity forecasts in real-time, (2) in hindcast (research) mode, it might improve the coastal hazard estimates, thus providing a valuable input for improving coastal planning and mitigation plans, and (3) it can improve the reliability of warning systems, thus having the capacity to adequately quantify the incoming coastal disaster and to support appropriate decisions. The real advantages as well as the unforeseen shortcomings of the presented conceptual design can only be assessed via the development of the surrogate model, Bayesian inference and optimal experimental design methods within different early warning systems around the world.
To encourage potential system developers, we have discussed in detail the practical feasibility of this approach within the Gulf of Mexico and the US East Coast hurricane warning system. More broadly, the postulated conceptual design, if proven effective, can also be adapted for destructive geohazards (e.g., Navarro et al., 2018; Chandra et al., 2020) other than extreme sea-levels.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
CD and IV wrote the initial version of the manuscript. CD and XH developed the conceptual design for extreme sea-level hazard assessments within early warning systems. CD prepared the figures. All authors revised the manuscript and developed the idea and concept of the manuscript.

FUNDING
The presented research was funded through the Croatian Science Foundation (project ADIOS, Grant IP-2016-06-1955) and the European Centre for Medium-Range Weather Forecasts (ECMWF) special project "Using stochastic surrogate methods for advancing toward reliable meteotsunami early warning systems."