Over the next century, coastal regions are under threat from projected rising sea levels and the potential emergence of groundwater at the land surface (groundwater inundation). The potential economic and social damages of this largely unseen and often poorly characterised natural hazard are substantial. To support risk-based decision making in response to this emerging hazard, we present a Bayesian modelling framework (or workflow) that maps the spatial distribution of groundwater level uncertainty and inundation under Intergovernmental Panel on Climate Change (IPCC) projections of Sea Level Rise (SLR). Such probabilistic mapping assessments, which explicitly acknowledge the spatial uncertainty of groundwater flow model predictions and the deep uncertainty of the IPCC-SLR projections themselves, remain challenging for coastal groundwater systems. Our study therefore presents a generalisable workflow to support decision makers, which we demonstrate for a case study of a low-lying coastal region in Aotearoa New Zealand. Our results provide posterior predictive distributions of groundwater levels to map susceptibility to the groundwater inundation hazard, according to exceedance of specified model top elevations. We also explore the value of history matching (model calibration) in reducing predictive uncertainty, and the benefits of predicting changes (rather than absolute values) in relation to a decision threshold. The latter may have profound implications for the many at-risk coastal communities and ecosystems, which are typically data poor. We conclude that history matching can indeed increase the spatial confidence of posterior groundwater inundation predictions for the 2030–2050 timeframe.
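The exceedance logic at the heart of such a susceptibility map is simple to sketch. The following is a minimal illustration, assuming a posterior ensemble of simulated water-table elevations and a model-top (land-surface) raster; all arrays and the 5% threshold are synthetic stand-ins of ours, not the study's data or workflow:

```python
import numpy as np

# Minimal sketch: given a posterior ensemble of simulated water-table
# elevations under one IPCC SLR scenario, map the probability that
# groundwater emerges above the model top (land surface).
rng = np.random.default_rng(0)
n_reals, ny, nx = 500, 50, 50

# Stand-ins for real model output: posterior water-table realizations (m)
# and the specified model-top elevation (m).
heads_ensemble = rng.normal(loc=1.0, scale=0.5, size=(n_reals, ny, nx))
land_surface = np.full((ny, nx), 1.5)

# Per-cell exceedance probability: the fraction of realizations in which
# the simulated head exceeds the model-top elevation.
p_inundation = (heads_ensemble > land_surface).mean(axis=0)

# Cells classed as susceptible at, say, a 5% probability threshold.
susceptible = p_inundation > 0.05
print(f"{susceptible.mean():.1%} of cells exceed the 5% threshold")
```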
Watershed models such as the Soil and Water Assessment Tool (SWAT) involve high-dimensional sets of physical and empirical parameters. These parameters often need to be estimated or calibrated through inverse modeling to produce reliable predictions of hydrological fluxes and states. Existing parameter estimation methods can be time consuming, inefficient, and computationally expensive for high-dimensional problems. In this paper, we present an accurate and robust method to calibrate the SWAT model (here, 20 parameters) using scalable deep learning (DL). We developed inverse models based on convolutional neural networks (CNNs) to assimilate observed streamflow data and estimate the SWAT model parameters. Scalable hyperparameter tuning is performed using high-performance computing resources to identify the top 50 optimal neural network architectures. We used ensemble SWAT simulations to train, validate, and test the CNN models. We estimated the parameters of the SWAT model using observed streamflow data and assessed the impact of measurement errors on SWAT model calibration. We tested and validated the proposed scalable DL methodology on the American River Watershed, part of the Yakima River basin in the Pacific Northwest. Our results show that the CNN-based calibration is better than two popular parameter estimation methods: generalized likelihood uncertainty estimation (GLUE) and the dynamically dimensioned search (DDS) global optimization algorithm. For the set of parameters that are sensitive to the observations, our proposed method yields narrower ranges than GLUE but broader ranges than DDS within the sampling range, even under high relative observational errors. The SWAT model calibration performance of the CNN, GLUE, and DDS methods is compared using R² and a set of efficiency metrics, including Nash-Sutcliffe, logarithmic Nash-Sutcliffe, Kling-Gupta, modified Kling-Gupta, and non-parametric Kling-Gupta scores, computed on the observed and simulated watershed responses. For these six metrics, respectively, the best CNN-based calibrated set has scores of 0.71, 0.75, 0.85, 0.85, 0.86, and 0.91; the best DDS-based calibrated set has scores of 0.62, 0.69, 0.8, 0.77, 0.79, and 0.82; and the best GLUE-based calibrated set has scores of 0.56, 0.58, 0.71, 0.7, 0.71, and 0.8. These scores show that the CNN-based calibration leads to more accurate low and high streamflow predictions than the GLUE and DDS sets. Our research demonstrates that the proposed method has high potential to improve current practice in calibrating large-scale integrated hydrologic models.
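The inverse-modeling idea reduces to supervised learning on ensemble simulations: the network sees a streamflow series and predicts the parameter set that produced it. The sketch below is a minimal Keras illustration of that framing; the architecture, layer sizes, and data are ours (random stand-ins), not the tuned networks from the study:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Minimal sketch of a CNN inverse model: training pairs come from ensemble
# SWAT runs, X = simulated daily streamflow (one year per run), y = the 20
# (scaled) parameters that generated each run. Data here are random.
n_runs, n_days, n_params = 1000, 365, 20
X = np.random.rand(n_runs, n_days, 1).astype("float32")
y = np.random.rand(n_runs, n_params).astype("float32")

model = tf.keras.Sequential([
    layers.Conv1D(32, 7, activation="relu", input_shape=(n_days, 1)),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, 5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_params, activation="sigmoid"),  # parameters scaled to [0, 1]
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)

# At application time, the observed streamflow record is fed through the
# trained network to estimate a (scaled) SWAT parameter set.
theta_hat = model.predict(X[:1])
```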
In an age of both big data and increasing strain on water resources, sound management decisions often rely on numerical models. Numerical models provide a physics-based framework for assimilating and making sense of information that by itself provides only a limited description of the hydrologic system. Often, numerical models are the best option for quantifying even intuitively obvious connections between human activities and water resource impacts. However, despite many recent advances in model data assimilation and uncertainty quantification, the process of constructing numerical models remains laborious, expensive, and opaque, often precluding their use in decision making. Modflow-setup aims to provide rapid and consistent construction of MODFLOW groundwater models through robust and repeatable automation. Common model construction tasks are distilled into an open-source, online code base that is tested and extensible through collaborative version control. Input to Modflow-setup consists of a single configuration file that summarizes the workflow for building a model, including source data, construction options, and output packages. Source data providing model structure and parameter information, including shapefiles, rasters, NetCDF files, tables, and other (geolocated) sources, are read in and mapped to the model discretization using Flopy and other general open-source scientific Python libraries. In a few minutes, an external array-based MODFLOW model amenable to parameter estimation and uncertainty quantification is produced. This paper describes the core functionality of Modflow-setup, including a worked example of a MODFLOW 6 model for evaluating pumping impacts on a lake in central Wisconsin, United States.
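In practice this pattern reduces to a configuration file plus a one-line build call. The sketch below shows the general shape using Modflow-setup's documented MF6model entry point; the YAML keys are an abbreviated, unverified illustration of ours, and a real configuration would specify the grid, source data, and packages in full:

```python
from mfsetup import MF6model  # Modflow-setup's MODFLOW 6 interface

# Abbreviated, illustrative configuration; the keys sketch the general
# layout only and are not a complete or verified Modflow-setup input.
yaml_text = """\
simulation:
  sim_name: lake_example
  sim_ws: ./model
model:
  simulation: lake_example
  modelname: lake_model
  packages: [dis, ic, npf, rch, wel, oc]
"""
with open("config.yaml", "w") as f:
    f.write(yaml_text)

# One call reads the configuration, ingests the source data, and builds
# the model; a second writes external array-based MODFLOW 6 input files.
m = MF6model.setup_from_yaml("config.yaml")
m.write_input()
```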
Evaluating whether hydrological models are right for the right reasons demands reproducible model benchmarking and diagnostics that evaluate not only statistical predictive model performance but also internal processes. Such model benchmarking and diagnostic efforts will benefit from standardized methods and ready-to-use toolkits. Using the Jupyter platform, this work presents HydroBench, a model-agnostic benchmarking tool consisting of three sets of metrics: 1) common statistical predictive measures, 2) hydrological signature-based process metrics, including a new time-linked flow duration curve, and 3) information-theoretic diagnostics that measure the flow of information among model variables. As a test case, HydroBench was applied to compare two model products (calibrated and uncalibrated) of the National Hydrologic Model - Precipitation Runoff Modeling System (NHM-PRMS) at the Cedar River watershed, Washington, United States. Although the uncalibrated model has the higher predictive performance, particularly for high flows, the signature-based diagnostics showed that the model overestimates low flows and poorly represents the recession processes. Elucidating why low flows may have been overestimated, the information-theoretic diagnostics indicated a higher flow of information from precipitation to snowmelt to streamflow in the uncalibrated model compared to the calibrated model, where information flowed more directly from precipitation to streamflow. This test case demonstrated the capability of HydroBench in process diagnostics and in evaluating model predictive and functional performance, along with their tradeoffs. Such a benchmarking tool not only provides modelers with a comprehensive model evaluation system but also offers an open-source code base that the hydrological community can develop further.
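The first of these metric sets is straightforward to reproduce. Below is a minimal, library-free sketch of three standard predictive measures of the kind such a toolkit computes; this is not HydroBench's actual API, and the function names and toy data are ours:

```python
import numpy as np

# Nash-Sutcliffe efficiency: 1 minus the ratio of squared model error to
# the variance of the observations (1 is a perfect fit).
def nse(obs, sim):
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Logarithmic NSE: the same statistic on log-transformed flows, which
# weights low-flow performance more heavily.
def log_nse(obs, sim, eps=1e-6):
    return nse(np.log(obs + eps), np.log(sim + eps))

# Kling-Gupta efficiency: combines correlation, variability ratio, and
# bias ratio into one score (1 is a perfect fit).
def kge(obs, sim):
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([2.1, 3.4, 8.9, 4.2, 2.8, 2.2])  # toy daily flows (m³/s)
sim = np.array([2.4, 3.1, 7.5, 4.8, 3.0, 2.1])
print(nse(obs, sim), log_nse(obs, sim), kge(obs, sim))
```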
Following advances in high-performance computing and sensor technology and the increased availability of large climate and land-use data sets, hydrologic models have become more sophisticated. Instead of simple boundary conditions, these data sets are incorporated with the aim of providing more accurate insights into hydrologic processes. Integrated surface-water and groundwater models are developed to represent the most important processes that affect the distribution of water in hydrologic systems. GSFLOW is an integrated hydrologic modeling software package that couples surface-water processes from PRMS and groundwater processes from MODFLOW and simulates feedbacks between both components of the hydrologic system. Development of GSFLOW models has previously required multiple tools to separately create surface-water and groundwater input files. The use of these multiple tools, custom workflows, and manual processing complicates reproducibility and confidence in model results. Based on a need for rapid, reproducible, and robust methods, we present two example problems that showcase the latest updates to pyGSFLOW. pyGSFLOW is an end-to-end data processing tool, built from open-source Python libraries, that enables the user to edit and write input files, run models, and postprocess model output. The first example showcases pyGSFLOW's capabilities by developing a streamflow network in the Russian River watershed, a 3,850 km² basin on the coast of northern California. A second example examines the effects of model discretization on hydrologic prediction for the 28 km² Sagehen Creek watershed near Lake Tahoe, California, in the northern Sierra Nevada.
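The end-to-end pattern the package enables can be sketched in a few lines. In the sketch below, the control-file path and parameter name are hypothetical, the exact call signatures should be checked against the pyGSFLOW documentation, and a GSFLOW executable is assumed to be available:

```python
import gsflow

# Load an existing GSFLOW model from its control file (path hypothetical);
# this parses the coupled PRMS and MODFLOW inputs into a single object.
gsf = gsflow.GsflowModel.load_from_file("gsflow_rr.control")

# Inspect and edit inputs in memory, e.g., a PRMS parameter array
# (parameter name and the 10% reduction are illustrative only).
vals = gsf.prms.parameters.get_values("ssr2gw_rate")
gsf.prms.parameters.set_values("ssr2gw_rate", vals * 0.9)

gsf.write_input(workspace="updated_model")  # write the edited input files
gsf.run_model()                             # run GSFLOW
```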
In 2018–2020, meteorological droughts over Northwestern Europe caused severe declines in groundwater heads, with significant damage to groundwater-dependent ecosystems and agriculture. The response of the groundwater system to different hydrological stresses is valuable information for decision-makers. In this paper, a reproducible, data-driven approach using open-source software is proposed to quantify the effects of different hydrological stresses on heads. A scripted workflow was developed using the open-source Pastas software for time series modeling of heads. For each head time series, the best model structure and relevant hydrological stresses (rainfall, evaporation, river stages, and pumping at one or more well fields) were selected iteratively. A new method was applied to model multiple well fields with a single response function, where the response is scaled by the distances between the pumping and observation wells. Selection of the best model structure was performed through reliability checking based on four criteria. The time series model of each observation well represents an independent estimate of the contribution of different hydrological stresses to the head and is based exclusively on observed data. The approach was applied to 250 observed head time series measured at 122 locations in the eastern part of the Netherlands to estimate the drawdown caused by nearby well fields; although the country is better known for problems with too much water, summer droughts can cause problems there as well. Reliable models were obtained for 126 head time series, of which 78 contain one or more well fields as a contributing stress. The spatial variation of the modeled responses to pumping at the well fields shows the expected decline with distance from the well field, even though all responses were modeled independently. An example application at one well field showed how the head response to pumping varies per aquifer. Time series analysis was used to determine the feasibility of reducing pumping rates to mitigate large drawdowns during droughts, which depends on the magnitude and timing of the groundwater system's response to changes in pumping. This is salient information for decision-makers. This article is part of the special issue “Rapid, Reproducible, and Robust Environmental Modeling for Decision Support: Worked Examples and Open-Source Software Tools”.
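Each head series follows the same scripted pattern. The sketch below uses Pastas' documented building blocks (ps.Model, ps.RechargeModel, ps.WellModel) on synthetic stand-in series; the series, distances, and solver defaults are ours, and the WellModel arguments should be checked against the Pastas version in use:

```python
import numpy as np
import pandas as pd
import pastas as ps

# Synthetic daily stand-ins; real inputs would be measured heads, rainfall,
# evaporation, and pumping rates at nearby well fields.
idx = pd.date_range("2010-01-01", "2020-12-31", freq="D")
rng = np.random.default_rng(0)
prec = pd.Series(rng.exponential(2.0, len(idx)), index=idx, name="prec")
evap = pd.Series(np.clip(rng.normal(1.5, 0.5, len(idx)), 0, None),
                 index=idx, name="evap")
q1 = pd.Series(rng.normal(2000.0, 100.0, len(idx)), index=idx, name="well1")
q2 = pd.Series(rng.normal(1500.0, 100.0, len(idx)), index=idx, name="well2")
head = pd.Series(5 + 0.01 * (prec - evap).cumsum()
                 + rng.normal(0, 0.05, len(idx)), index=idx)

ml = ps.Model(head, name="obs_well")
ml.add_stressmodel(ps.RechargeModel(prec, evap, name="recharge"))

# One response function shared by both well fields, scaled by the distance
# from each pumping well to the observation well (the paper's new method).
wm = ps.WellModel([q1, q2], rfunc=ps.HantushWellModel(), name="wells",
                  distances=[250.0, 800.0], up=False)
ml.add_stressmodel(wm)

ml.solve(report=False)
drawdown = ml.get_contribution("wells")  # modeled head change from pumping
```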
When modeling groundwater systems in Quaternary formations, one of the first steps is to construct a geological and petrophysical model. This is often cumbersome because it requires multiple manual steps, including geophysical interpretation, construction of a structural model, and identification of geostatistical model parameters, facies, and property simulations. Those steps are often carried out using different software packages, which makes automation very difficult or intractable. A non-automated approach is time-consuming and makes model updating difficult when new data become available or when some geological interpretations are modified. Furthermore, conducting a cross-validation procedure to assess the overall quality of the models and quantifying the joint structural and parametric uncertainty are tedious. To address these issues, we propose a new approach and a Python module, ArchPy, to automatically generate realistic geological and parameter models. One of its main features is that the modeling operates in a hierarchical manner. The input data consist of a set of borehole data and a stratigraphic pile. The stratigraphic pile describes formally and compactly how the model should be constructed. It contains the list of the different stratigraphic units and their order in the pile, their conformability (eroded or onlap), the surface interpolation method (e.g., kriging, sequential Gaussian simulation (SGS), and multiple-point statistics (MPS)), the filling method for the lithologies (e.g., MPS and sequential indicator simulation (SIS)), and the simulation method for the petrophysical properties (e.g., MPS and SGS). The procedure is then automatic. In the first step, the stratigraphic unit boundaries are simulated; second, the units are filled with lithologies; and finally, the petrophysical properties are simulated inside the lithologies. All these steps are straightforward and automated once the stratigraphic pile and its related parameters have been defined, which makes the approach extremely flexible. The automation provides a framework to generate end-to-end stochastic models, and the proposed method therefore allows for uncertainty quantification at any level and may be used for full inversion. In this work, ArchPy is illustrated using data from an alpine Quaternary aquifer in the upper Aare plain (southeast of Bern, Switzerland).
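The hierarchical logic is easy to see in schematic form. The sketch below is a deliberately simplified, library-free illustration of a stratigraphic pile driving the three simulation steps; none of the names reflect ArchPy's actual API, and the simulators are trivial stand-ins for kriging/SGS/SIS/MPS:

```python
import numpy as np

# A "stratigraphic pile" as an ordered list of units, each carrying its
# contact type and the simulation method to use at each hierarchical level.
pile = [
    {"unit": "gravel",  "contact": "erode", "surface": "kriging", "facies": "SIS", "property": "SGS"},
    {"unit": "sand",    "contact": "onlap", "surface": "SGS",     "facies": "MPS", "property": "MPS"},
    {"unit": "bedrock", "contact": "onlap", "surface": "kriging", "facies": None,  "property": "SGS"},
]

rng = np.random.default_rng(7)
nx, ny = 40, 30

def simulate_surface(method):            # step 1: unit boundary elevations
    return rng.normal(size=(ny, nx)).cumsum(axis=1) * 0.1

def simulate_facies(method, shape):      # step 2: lithologies within a unit
    return rng.integers(0, 3, size=shape)

def simulate_property(method, facies):   # step 3: petrophysics per facies
    return 10.0 ** rng.normal(-4 + facies, 0.5)

model = {}
for layer in pile:                       # the loop order *is* the pile order
    top = simulate_surface(layer["surface"])
    if layer["facies"]:
        facies = simulate_facies(layer["facies"], top.shape)
    else:
        facies = np.zeros_like(top, dtype=int)
    model[layer["unit"]] = {"top": top, "K": simulate_property(layer["property"], facies)}
```

Because every step is stochastic, rerunning the loop yields a new joint realization of structure, lithology, and properties, which is what enables end-to-end uncertainty quantification.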
The primary tasks of decision-support modelling are to quantify and reduce the uncertainties of decision-critical model predictions. Reduction of predictive uncertainty requires assimilation of information. Generally, this information resides in two places: 1) expert knowledge emerging from site characterization, and 2) field measurements of present and historical system behavior. The former is uncertain and should therefore be expressed stochastically in a model. The range of parameter and predictive possibilities can then be constrained through history matching. Implementation of these Bayesian principles places conflicting demands on the level of model structural complexity. A high level of structural complexity can facilitate expression of expert knowledge by establishing model details that are recognizable by site experts and by supporting model parameters that bear a close relationship to real-world hydraulic properties. However, such models often run slowly and are numerically delicate; history matching therefore becomes difficult or impossible. In contrast, if endowed with enough parameters, structurally simple models can readily achieve a good fit between model outputs and field measurements. However, the values with which their parameters are endowed may bear a looser relationship to real-world properties and are therefore less receptive to information born of expert knowledge. The model design process is therefore one of compromise. In this paper we describe a methodology that reduces the cost of compromise by allowing expert knowledge of system properties to inform the parameters of a structurally simple model. The methodology requires the use of a complementary model of strategic, but not excessive, structural complexity that is stochastic, fast-running, and requires no history matching. We demonstrate the approach using a real-world case in which modelling is used to support management of a stressed coastal aquifer, and we empirically validate the approach using a synthetic model.
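One way to picture the core idea is as an ensemble projection: the fast complementary model expresses expert knowledge stochastically, and each realization is upscaled to the simple model's parameterization, so that ensemble statistics become the simple model's prior. The sketch below is our schematic reading of that workflow (all names, dimensions, and the zonal upscaling operator are hypothetical, not the paper's method in detail):

```python
import numpy as np

# 1) Stochastic fields of real-world hydraulic properties (log10 K),
#    expressing expert knowledge from site characterization; the
#    complementary model is fast and needs no history matching.
rng = np.random.default_rng(42)
n_reals, n_cells, n_simple_params = 1000, 500, 10
logk_fields = rng.normal(loc=-3.0, scale=0.8, size=(n_reals, n_cells))

# 2) Upscale each realization to the simple model's parameterization;
#    here a stand-in operator that averages cells into 10 zones.
zones = np.array_split(np.arange(n_cells), n_simple_params)
simple_params = np.stack([[f[z].mean() for z in zones] for f in logk_fields])

# 3) The ensemble mean and covariance become the expert-knowledge prior on
#    the simple model's parameters for subsequent history matching.
prior_mean = simple_params.mean(axis=0)
prior_cov = np.cov(simple_params, rowvar=False)
```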