# PARAMETER ESTIMATION AND UNCERTAINTY QUANTIFICATION IN WATER RESOURCES MODELING

EDITED BY: Philippe Renard, Frederick Delay, Daniel M. Tartakovsky and Velimir V. Vesselinov

PUBLISHED IN: Frontiers in Environmental Science and Frontiers in Earth Science

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-674-7 DOI 10.3389/978-2-88963-674-7



Topic Editors:

Philippe Renard, Université de Neuchâtel, Switzerland; Frederick Delay, Université de Strasbourg, France; Daniel M. Tartakovsky, Stanford University, United States; Velimir V. Vesselinov, Los Alamos National Laboratory (DOE), United States

Numerical models of flow and transport processes are heavily employed in the fields of surface, soil, and groundwater hydrology. They are used to interpret field observations, analyze complex and coupled processes, or to support decision making related to large societal issues such as the water-energy nexus or sustainable water management and food production. Parameter estimation and uncertainty quantification are two key features of modern science-based predictions. When applied to water resources, these tasks must cope with many degrees of freedom and large datasets. Both are challenging and require novel theoretical and computational approaches to handle complex models with large number of unknown parameters.

Citation: Renard, P., Delay, F., Tartakovsky, D. M., Vesselinov, V. V., eds. (2020). Parameter Estimation and Uncertainty Quantification in Water Resources Modeling. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-674-7

# Table of Contents

*What We Talk About When We Talk About Uncertainty. Toward a Unified, Data-Driven Framework for Uncertainty Characterization in Hydrogeology*

Falk Heße, Alessandro Comunian and Sabine Attinger


# What We Talk About When We Talk About Uncertainty. Toward a Unified, Data-Driven Framework for Uncertainty Characterization in Hydrogeology

#### Falk Heße<sup>1,2</sup>\*, Alessandro Comunian<sup>3</sup> and Sabine Attinger<sup>1,2</sup>

<sup>1</sup> Institut für Erd- und Umweltwissenschaften, Universität Potsdam, Potsdam, Germany, <sup>2</sup> Department of Computational Hydrosystems, UFZ–Helmholtz Centre for Environmental Research, Leipzig, Germany, <sup>3</sup> Dipartimento di Scienze della Terra "A.Desio", Università degli Studi di Milano, Milan, Italy

Keywords: Bayesianism, uncertainty analysis, hydrogeology, data science, opinion, prior derivation

A habit of basing convictions upon evidence and giving them only that degree of certainty which the evidence warrants, would, if it became general, cure most of the ills from which the world is suffering. -Bertrand Russell

#### Edited by:

Frederick Delay, Université de Strasbourg, France

#### Reviewed by:

Abderrahim Jardani, Université de Rouen, France; Bart Rogiers, Belgian Nuclear Research Centre, Belgium

> \*Correspondence: Falk Heße falk.hesse@ufz.de

#### Specialty section:

This article was submitted to Hydrosphere, a section of the journal Frontiers in Earth Science

Received: 30 November 2018; Accepted: 06 May 2019; Published: 18 June 2019

#### Citation:

Heße F, Comunian A and Attinger S (2019) What We Talk About When We Talk About Uncertainty. Toward a Unified, Data-Driven Framework for Uncertainty Characterization in Hydrogeology. Front. Earth Sci. 7:118. doi: 10.3389/feart.2019.00118

# 1. INTRODUCTION

Uncertainty is a central and unavoidable feature of decision problems that people face in everyday life as well as in virtually every field of science. Hydrogeology is no exception. In fact, the relevance of uncertainty to hydrogeology is particularly high, due to the high variability of many subsurface properties combined with a general scarcity of data. These factors have led to the development of stochastic hydrogeology, where the subsurface properties are modeled as random variables (Gelhar, 1993; Rubin, 2003). Despite the wealth of research on stochastic modeling, systematic investigation into the quantification of uncertainty and its impact on decision problems has remained limited. For example, Dagan (2002) noted that uncertainty "is a topic that has received little attention" and, more recently, Kitanidis (2015) again pointed out that "it is somewhat surprising that this topic has not received more attention". Instead, most discussions revolve around the specific topic of stochastic concepts, which is closely related but ultimately independent. For example, in a discussion forum, Zhang and Zhang (2004) solicited contributions to discuss the perceived lack of applications that stochastic concepts have seen in hydrogeology. Of these contributions, only Ginn (2004) and Rubin (2004) discussed uncertainty briefly in the context of probability assessments. This lack may be explained by the limited scope of that discussion forum, where only a fixed number of questions were to be answered by the contributors. However, more recently, Sanchez-Vila and Fernandez-Garcia (2016) organized a Special Issue that covered the same question but allowed for more space and personal involvement from the participants. While the topic of uncertainty was touched upon in all solicited contributions, none of them discussed its nature or aims. This neglect contrasts with the closely related fields of hydrology (Mantovan and Todini, 2006; Vrugt et al., 2009; Clark et al., 2011) as well as geology (Wellmann and Regenauer-Lieb, 2012; Bond, 2015; de la Varga and Wellmann, 2016), which have had and continue to have a broad and deep discussion about the nature, sources, aims and direction of uncertainty analysis. One of the main impediments to further progress in the field of uncertainty characterization that has been identified is the lack of a coherent terminology and framework (Montanari, 2007; Montanari et al., 2009).

A significant challenge for the establishment of a consistent framework of uncertainty analysis in hydrogeology is caused by the most prevalent workflow of inverse theory, or inversion for short (Neuman, 2004; Tarantola, 2005; Carrera et al., 2005; Franssen Hendricks et al., 2009; Biegler et al., 2010; Menke, 2012; Linde et al., 2015; Zhdanov, 2015). This workflow, known as point estimation in inference, is based on a (regularized) data fitting or optimization approach which tries to identify a unique solution to the inverse problem. Typical examples in hydrogeology are goodness-of-fit criteria, i.e., the task of finding a single set of parameters such that the distance between predictions and observations is minimized (Indelman et al., 1996; Sánchez-Vila et al., 1999; Firmani et al., 2006; Schneider and Attinger, 2008; Riva et al., 2009; Copty et al., 2011; Pechstein et al., 2016; Zech et al., 2016). Even when point estimators are not applied during calibration and a probabilistic sampling of the parameter distribution is advised instead, the analysis is often reduced again to single solutions through, e.g., averaging (Neuman, 2014). There seems to be a deep distrust of the generally non-unique nature of scientific inference. Strangely, no rationale for this distrust is ever asked for or provided.

Since this workflow of inversion was not designed to account for uncertainties in the inference, it consequently exhibits a number of problems from that perspective. First, aiming for a single best estimate neglects all other parameter sets that are possible but may be less plausible. Neglecting possible states can be problematic for inference alone (Good, 2009a). For uncertainty analysis, point estimation is even more problematic since the use of a single best estimate implies absolute certainty. Second, these studies usually use the observed data only, without any reference to available background data and are therefore liable to the base rate fallacy (Barhillel, 1980; Kahneman et al., 1982).
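The contrast between a single best estimate and a full distribution over parameter values can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it comes from the studies cited above: the four observations, the normal likelihood with spread `NOISE_SD`, the normal prior standing in for background data from similar sites, and the grid of candidate values. The sketch evaluates a posterior on a grid, once with and once without the prior, and compares the best-fit value with a 95% credible interval.

```python
import math

# Hypothetical log10-conductivity observations (log10 of m/s); illustrative only.
observations = [-3.4, -2.9, -3.8, -3.1]
NOISE_SD = 0.5                       # assumed spread of observations around the mean
PRIOR_MEAN, PRIOR_SD = -4.0, 1.0     # assumed background knowledge from similar sites


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(grid, data, use_prior=True):
    """Normalized posterior weights over candidate mean values on a grid."""
    weights = []
    for mu in grid:
        likelihood = math.prod(normal_pdf(y, mu, NOISE_SD) for y in data)
        prior = normal_pdf(mu, PRIOR_MEAN, PRIOR_SD) if use_prior else 1.0
        weights.append(likelihood * prior)
    total = sum(weights)
    return [w / total for w in weights]


def credible_interval(grid, weights, level=0.95):
    """Central interval read off the cumulative posterior weights."""
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    cumulative, lower, upper = 0.0, None, None
    for value, weight in zip(grid, weights):
        cumulative += weight
        if lower is None and cumulative >= lower_q:
            lower = value
        if upper is None and cumulative >= upper_q:
            upper = value
    return lower, upper


grid = [-6.0 + 0.01 * i for i in range(401)]   # candidate means from -6 to -2

posterior = grid_posterior(grid, observations)                   # data plus prior
data_only = grid_posterior(grid, observations, use_prior=False)  # data alone

best_fit = grid[max(range(len(grid)), key=data_only.__getitem__)]
posterior_mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
interval = credible_interval(grid, posterior)

print(f"best fit (data only): {best_fit:.2f}")
print(f"posterior mode:       {posterior_mode:.2f}")
print(f"95% interval:         {interval[0]:.2f} to {interval[1]:.2f}")
print(f"posterior weight on the modal grid cell: {max(posterior):.3f}")
```

Under these assumptions the most plausible grid cell carries only a small share of the total posterior weight, which is the sense in which reporting it alone overstates certainty. The shift between the data-only best fit and the posterior mode shows the influence of the assumed background knowledge.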

Responses to this have been mixed and often inconsistent (Nearing et al., 2016). Arguably, the most obvious concern is centered around the so-called problem of equifinality (Beven, 2006), i.e., the observation that many, often diverging, parameter sets may provide nearly identical goodness-of-fit values. One response to this insight has been the development and application of the generalized likelihood uncertainty estimation (GLUE) framework (Beven and Binley, 1992). Although often described as an (informal) Bayesian method, it has been strongly criticized in the literature (Mantovan and Todini, 2006; Stedinger et al., 2008). Another seminal response has been the development of the DiffeRential Evolution Adaptive Metropolis framework (DREAM, Vrugt et al., 2009; Laloy and Vrugt, 2012; Laloy et al., 2013). Unlike GLUE, DREAM is a fully Bayesian inference method and uncertainty estimator and has seen many applications in hydrology since its publication. Although DREAM has seen applications in hydrogeology too (Mariethoz et al., 2010; Hansen et al., 2012; Shi et al., 2014; Xu et al., 2017; Laloy et al., 2016; Hayek et al., 2018), no universally accepted Bayesian framework for uncertainty analysis currently exists. A rising number of Bayesian inversion methods for inference have been published over the years (see e.g., Cardiff and Kitanidis, 2009; Rubin et al., 2010; Shi et al., 2014; Elsheikh et al., 2014; Saley et al., 2016), yet the overall adoption of such methods has remained limited. This lackluster adoption rate is evident, e.g., in the almost total absence of Bayesian methods in the aforementioned special issue on stochastic hydrogeology. Only Cirpka and Valocchi (2016) even mention this topic, and only in the context of Bayesian model selection.

One problem that all the above-mentioned uncertainty frameworks share is the lack of formalized prior derivation. Hydrogeology itself has only produced a small number of studies on this topic (Kitanidis, 2012; Li et al., 2017; Cucchi et al., 2019), which is sadly in line with the situation in many other fields. Since specifying the prior is the first step in any Bayesian analysis, high emphasis should be put on this step. However, coming up with universal and objective guidelines for every conceivable situation has so far proven elusive (Earman, 1992; Scales and Tenorio, 2001). In fact, detractors of Bayesian inference often criticize the need to define a prior, claiming this step to be necessarily subjective and arbitrary (Kass and Wasserman, 1996; Ulrych et al., 2001; Kass, 2011). On the other hand, proponents of Bayesian inference cite the ability to include background knowledge in the form of prior probabilities as one of its major strengths (Jaynes, 1968).

To date, the single most used inference framework in hydrogeology is arguably the Model-Independent Parameter Estimation and Uncertainty Analysis framework (PEST, Doherty, 2004). PEST itself constitutes a diverse set of optimization and estimation tools to calibrate a wide range of environmental models. The uncertainty framework allows users to estimate the predictive uncertainty of the model output using a somewhat inconsistent mixture of Bayesian and calibration techniques (Doherty, 2010). Such an eclectic approach to inference and uncertainty characterization may look intriguing. However, as Kitanidis (2015) pointed out, combining elements from two internally consistent systems is very risky.

In this manuscript, we will instead make the case for a single coherent data-driven framework for uncertainty characterization in hydrogeology. As we will argue below, this framework ought to be the Bayesian interpretation of probability, i.e., the system of probabilistic reasoning as developed by Ramsey (1931), Finetti (1975), Savage (1954), and others. When making this case, we will follow the reasoning of Pearl (1988b), who, while talking about Bayesian probability in the context of artificial intelligence, said the following:

Obviously, there are applications where strict adherence to the dictates of probability theory would be computationally infeasible, and there compromises will have to be made. Still, we find it more comfortable to compromise an ideal theory that is well-understood than to search for a surrogate theory, with only gut feeling for guidance. The merits of a theory-based approach are threefold:


3. Compromised theories facilitate scientific communication; one need specify only the compromises made, treating the rest of the theory as common knowledge.

To follow the advice of Pearl (1988b), we will start by describing such an ideal theory, outline the present challenges for its application and explain how to address them. In section 2, we will make the case for Bayesian probability by contrasting it with potential competitors. Next, in section 3, we will conceptualize the different forms of uncertainty using the Bayesian framework and identify the most relevant forms for hydrogeology. Finally, in section 4, we will make a number of practical propositions by outlining what is currently missing, what compromises need to be made to make the Bayesian paradigm viable and what the relevant steps are that may help to reduce or even eliminate some of these compromises.

# 2. REASONING WITH UNCERTAINTY

A discussion about uncertainty should begin with a clear understanding of what is meant by this term. In a slightly ironic twist, the term uncertainty is far from being well-defined, both in everyday use and in the sciences. In fact, a scan of the hydrogeological literature reveals a wide range of, often conflicting, definitions and conceptualizations (Hoffman and Hammonds, 1994; Hofer, 1996; Walker et al., 2003; Brown, 2004; Carrera et al., 2005; Refsgaard et al., 2007, 2012; Tartakovsky, 2013; Bond, 2015; Enemark et al., 2019). We will therefore start by providing an overview of different models that have been developed to conceptualize this term and facilitate both qualitative and quantitative reasoning. To avoid the often ad hoc or gut-driven nature of uncertainty analysis that is found in the literature, we will start by presenting frameworks that were developed in the field of epistemology, i.e., the field of philosophy concerned with the character of knowledge.

In general, uncertainty should be understood as a measure that describes the distance or gap between a current state of an agent and the one representing absolute certainty. The latter is formalized by the True and False statements found in classic logic, whereas the former extends this concept. Consequently, we will begin by describing models for such partial certainty as found in modern epistemic logic. As argued above, we will make the case for Bayesianism, i.e., the idea that certainty equals probability and vice versa. Only in the next step will we discuss a concept of uncertainty and make the case for the Kullback-Leibler divergence as the distance measure between current and absolute certainty.
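For reference, the Kullback-Leibler divergence has a standard definition for discrete distributions. The notation below is an assumption introduced for illustration and is not taken from the source: $\Omega$ is a finite set of possible states, $p$ is the agent's current certainty distribution over $\Omega$, and $q$ is the reference distribution.

$$
D_{\mathrm{KL}}(q \,\|\, p) = \sum_{\omega \in \Omega} q(\omega) \, \log \frac{q(\omega)}{p(\omega)}
$$

If $q$ represents absolute certainty, i.e., it places all of its mass on a single state $\omega^{*}$, the sum collapses (with the convention $0 \log 0 = 0$) to

$$
D_{\mathrm{KL}}(q \,\|\, p) = -\log p(\omega^{*})
$$

so the distance is zero when the agent is already certain of $\omega^{*}$ and grows without bound as $p(\omega^{*})$ approaches zero. Which direction of the divergence is ultimately adopted is not stated at this point in the text, so this should be read as one plausible reading rather than as the authors' definition.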

## 2.1. Models for Reasoning With Certainty and the Case for Probabilities

In modern epistemology, certainty is defined over possible states of reality, collectively known as the set of possible worlds (Halpern, 2003; Fagin et al., 2004). In Bayesianism, the equivalent term would be possibility space (Kruschke, 2010), whereas the equally labeled sample space from probability theory may or may not have the same meaning depending on its interpretation (see below). This possibility space is now meant to contain, next to the true state of affairs, all other possibilities that are compatible with a given set of constraints and data available to an epistemic agent. Such an epistemic agent may be a human, but with the advent of artificial intelligence, computational agents have increasingly become the focus of modern research (Russell and Norvig, 2009). The epistemic state of such an agent is then defined by a function that assigns weights to each possible world ω. This function is known as a certainty or credence distribution.

Despite its relatively short history, the field of modern epistemology has already developed a number of different measures to describe that certainty distribution. These measures differ both mathematically as well as conceptually. The latter difference can best be understood by viewing these different measures as extensions of classical logic. In logic, a simple True or False relationship exists between a statement and the reality that this statement is trying to describe. In these extensions, discussed below, this simple relationship becomes more flexible and allows for different kinds of degrees of certainty (Darwiche and Pearl, 1997).

The most common approach is to conceptualize such a gradual degree of certainty as stemming from uncertain or incomplete knowledge. This means that, similar to classical logic, a given statement is objectively either true or false in reality. However, due to the agent's limited knowledge, she cannot fully determine its veracity and has to assign a limited certainty to it. This is known as a degree of belief and it is described by a number between 0 and 1, corresponding to False and True in classic logic. It can be shown that such degrees of belief follow the rules of probability theory, i.e., the certainty of an agent is to be described by a probability measure (Savage, 1954; Lindley, 1987). To illustrate this model, consider a statement about the mean conductivity of a given sample being above a given threshold, say, 10<sup>−3</sup> m/s. The certainty of this statement could then be ascertained if we had access to a well-calibrated histogram of conductivity values of samples from the aquifer the sample was taken from.
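As a minimal sketch of this illustration, the degree of belief can be approximated by the relative frequency of values above the threshold. The conductivity values below are hypothetical and stand in for the well-calibrated histogram mentioned above; the function name is likewise an illustrative choice.

```python
# Hypothetical conductivity values (m/s) from the same aquifer; illustrative only.
aquifer_samples = [2e-4, 5e-4, 8e-4, 1.2e-3, 1.5e-3, 3e-3, 4e-4, 9e-4, 2.5e-3, 6e-4]
THRESHOLD = 1e-3  # m/s, the threshold named in the statement


def degree_of_belief_above(samples, threshold):
    """Fraction of reference values exceeding the threshold, used as a probability."""
    exceeding = sum(1 for value in samples if value > threshold)
    return exceeding / len(samples)


belief_above = degree_of_belief_above(aquifer_samples, THRESHOLD)
belief_below = 1.0 - belief_above   # degrees of belief obey the rules of probability

print(f"P(conductivity > {THRESHOLD:g} m/s) = {belief_above:.2f}")
print(f"P(conductivity <= {THRESHOLD:g} m/s) = {belief_below:.2f}")
```

With these ten made-up values, four exceed the threshold, so the statement receives a degree of belief of 0.4 and its negation 0.6.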

An extension of this model is derived by the additional inclusion of uncertainty stemming from ignorance. Using the now (in)famous epistemology of the former US Secretary of Defense Donald Rumsfeld (2002), probability describes the known unknowns whereas ignorance is about the unknown unknowns. To illustrate this concept, let us consider a revised version of the above problem: Suppose that our knowledge base is now less certain, such that the sample comes from the aforementioned aquifer with only 90% probability and is of unknown provenance with 10% probability. This leaves a 10% certainty gap in our reasoning system. The seminal work of Dempster (1968) demonstrated a way of handling this gap, whereas the later extensions of Shafer (1976) made this calculus into a full reasoning and inference system. The resulting reasoning framework is known as Dempster-Shafer theory, evidence theory or the theory of belief functions. As in the example above, Dempster-Shafer theory is often praised for being able to combine evidence from different sources with different kinds of uncertainty attached to it. Its applicability has, however, been limited due to a number of criticisms (Pearl, 1988a,b, 1990). Note that other fields use similar concepts, with sometimes very different notations.

In economics, for instance, uncertainties from lack of knowledge are called risk, whereas uncertainties from ignorance are called Knightian uncertainties or simply uncertainties (Knight, 1921). Political science and decision theory, on the other hand, often describe uncertainties from lack of knowledge and from ignorance as shallow and deep uncertainty, respectively (Walker et al., 2013).
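Returning to the sample of partly unknown provenance, the 10% certainty gap can be made concrete with a small sketch of belief and plausibility in the Dempster-Shafer sense. The sketch assumes a two-element frame (conductivity above or below the threshold), a hypothetical probability of 0.4 that a sample from the known aquifer exceeds the threshold, and the common construction in which the unexplained 10% is assigned to the whole frame. These choices are illustrative and are not taken from the cited works.

```python
ABOVE, BELOW = "above", "below"
FRAME = frozenset({ABOVE, BELOW})   # the two mutually exclusive outcomes


def discounted_mass(p_above, reliability):
    """Mass function: probabilities scaled by reliability, remainder left uncommitted."""
    return {
        frozenset({ABOVE}): reliability * p_above,
        frozenset({BELOW}): reliability * (1.0 - p_above),
        FRAME: 1.0 - reliability,   # the certainty gap, committed to neither outcome
    }


def belief(mass, hypothesis):
    """Total mass of focal sets that are contained in the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Total mass of focal sets that overlap with the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


# Hypothetical inputs: P(above | known aquifer) = 0.4, provenance known with 90%.
mass = discounted_mass(p_above=0.4, reliability=0.9)
hypothesis = frozenset({ABOVE})

lower = belief(mass, hypothesis)
upper = plausibility(mass, hypothesis)
print(f"belief = {lower:.2f}, plausibility = {upper:.2f}, gap = {upper - lower:.2f}")
```

Under these assumptions the statement is bracketed by a belief of about 0.36 and a plausibility of about 0.46. The width of that interval is exactly the 10% that a single probability value would have to distribute somewhere.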

An alternative approach is to model uncertainty stemming from uncertain truth. This means that the veracity of a given statement may never be fully determined, even in cases of complete knowledge. Using the above example, this could be the case if the statement is altered such that the conductivity of the sample is said to be large. Even if all relevant data on this sample are gathered, e.g., some laboratory testing may determine the conductivity to be 10<sup>−3</sup> m/s, no definitive certainty can be determined. A statement that the conductivity of the sample is large given that the conductivity is 10<sup>−3</sup> m/s may be considered as sort-of-true. The point is that our limited degree of certainty is not caused by limited knowledge but by some vagueness or fuzziness in the statement itself. Measures that are able to describe such situations are confusingly called possibility measures, although no particular connection to the above possibility space exists, and the related mathematical framework is, more appropriately, called fuzzy logic (Zadeh, 1978; Dubois and Prade, 1988). Reasoning systems that employ fuzzy logic have seen wide-ranging applications in engineering, modern logic, artificial intelligence systems etc., which testifies to their versatility and usefulness.
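A membership function makes the notion of a sort-of-true statement concrete. The sketch below is only an illustration: the two anchor values and the choice of a ramp that is linear in the logarithm of conductivity are assumptions, not something prescribed by fuzzy logic or by the source.

```python
import math

# Hypothetical anchors for the vague statement "the conductivity is large".
NOT_LARGE_BELOW = 1e-5     # m/s: at or below this value the statement is plainly false
FULLY_LARGE_ABOVE = 1e-2   # m/s: at or above this value the statement is plainly true


def membership_large(conductivity):
    """Degree of truth in [0, 1], linear in log10(conductivity) between the anchors."""
    low = math.log10(NOT_LARGE_BELOW)
    high = math.log10(FULLY_LARGE_ABOVE)
    position = (math.log10(conductivity) - low) / (high - low)
    return min(1.0, max(0.0, position))


measured = 1e-3   # m/s, known exactly from laboratory testing
degree = membership_large(measured)
print(f"'conductivity is large' holds to degree {degree:.2f}")
```

The measured value is known without error, yet the statement still receives only a partial degree of truth (about 0.67 with these anchors), which is the distinction between fuzziness and limited knowledge drawn above.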

Despite the existence of these comparably sophisticated systems for reasoning under uncertainty, the simplest approach of probabilistic reasoning has seen a strong resurgence in the last decades with the introduction of Bayesian networks (Pearl, 1988b; Neapolitan, 1990). These networks are probabilistic graphical models that represent the, typically causal, relationship between different physical processes.

Based on the above, we would assert that probabilistic reasoning is the best-suited framework for reasoning under uncertainty in hydrogeology. Probability represents a simple, yet very flexible tool that is able to capture most of the problems encountered in this field. Fuzzy logic, although often employed in engineering, does not offer much additional benefit since most evidence being used is numerical in nature and therefore has virtually no fuzziness associated with it. In contrast, Dempster-Shafer theory does offer relevant benefits as an uncertainty framework, which do, however, need to be considered in context. The first benefit is simply due to the fact that many situations of subsurface analysis do include an element of deep uncertainty, in particular the topic of structural uncertainty. We will deal with this problem exhaustively below and describe a way to turn this deep uncertainty into shallow uncertainty. In this way, we make the topic of structural uncertainty fully amenable to probabilistic analysis. Second, as pointed out by Rubin et al. (2018), the Dempster-Shafer theory can be helpful in accounting for unknown unknowns that result from the interaction of hydrogeological problems with societal developments in general (Walker et al., 2013; Maier et al., 2016). This second benefit is very important but does not necessarily conflict with our notion of probability as a default system for uncertain reasoning. Since Dempster-Shafer theory is a full generalization of probability theory, it is easy to embed a fully probabilistic analysis within a larger framework. Alongside these benefits, the Dempster-Shafer theory has a number of drawbacks that make reasoning with it counterintuitive and hamper its applicability to real-world problems. In particular, unlike probability theory or even fuzzy logic, it cannot be employed on top of existing techniques, and its applicability to decision theory remains controversial. Although the number of applications in earth sciences is rising, it is still a niche theory with only few practical applications (Malpica et al., 2007). In summary, we propose to use probabilistic reasoning as the main tool of uncertainty analysis, while being aware of its limitations, and being prepared to account for other types of uncertainty by embedding probability measures within a larger analysis, possibly using the Dempster-Shafer theory.

## 2.2. On the Interpretation of Probability

Owing to the seminal work of Kolmogorov (1933), modern probability theory is fully grounded in set theory and as such, it is as well founded and defined as any other field of mathematics (Kallenberg, 2002). Yet, unlike many other mathematical disciplines, there is no clear consensus about where to locate probability in real-world situations.

Roughly speaking, two different interpretations of probability can be distinguished: physical and epistemic probability (**Figure 1**). The first interpretation regards probability as an actual property of physical systems comparable to, e.g., mass, energy and momentum (**Figure 1**, upper right corner). This is best captured in its most widely applied form, frequentism, where the probability of an event is equated with the relative frequency of this event in an often-repeated random experiment (Neyman and Pearson, 1928, 1933). This definition has garnered wide support in the sciences, due to its clear and lucid formulation (von Mises, 1982). On the other side, the epistemic interpretation regards probability as an intrinsic property of epistemic agents (**Figure 1**, upper left corner). This means that, unlike mass, energy or momentum, probability is not a property of a physical system but of the epistemic state of an agent that is trying to reason about said system (Finetti, 1975; Savage, 1954; Jeffrey, 1992).

#### 2.2.1. The Case for Bayesianism

The discussion about the best or most appropriate interpretation for any given situation is still ongoing and we do not want to uncritically favor any side. For our topic, however, the epistemic interpretation of probability, also called the Bayesian interpretation, seems to be the only appropriate one. The main rationales for its use are discussed in the following.

First, the epistemic interpretation is simply the more comprehensive interpretation of the two. In fact, it is a full generalization of the physical interpretation, since it is able to cover all cases described by the latter and then some. This is not the case for the frequentist interpretation since its application is constrained to cases where physical frequencies are available. This inclusiveness of Bayesianism is often obfuscated by calling it the subjective interpretation of probability, as opposed to the objective interpretation of frequentism. But such a characterization is misguided since epistemic probabilities can be subjective, objective and everything in between (Berger, 2006; Williamson, 2010).

Second, the concept of a long-running sequence of random experiments is ill-defined in the context of hydrogeology. This is not to deny the bearing that relative frequencies of, say, conductivity values from other sites have on the characterization of a given site. As described above, most Bayesians would agree that such frequencies should always be used when available (Rubin, 1984). The difference from frequentism is not the importance of observed frequencies but the role they play in defining probability. Within the context of hydrogeological site characterization, we would argue that equating those frequencies with probability, and therefore also the use of frequentist methods, is rather contrived (Renard, 2007).

Third, uncertainty is a property of knowledge and therefore fundamentally epistemic. This fact is often obfuscated by the practice of separating uncertainty into so-called aleatoric and epistemic uncertainty, somewhat mimicking the above distinction between physical and epistemic probability (Hoffman and Hammonds, 1994; Helton and Burmaster, 1996; O'Hagan et al., 2006; Gong et al., 2013). Generally speaking, epistemic uncertainty is said to be reducible by collecting more data whereas aleatoric uncertainty is caused by intrinsic randomness which cannot be further reduced by data. Typically, the latter is illustrated by referring to activities like throwing a die or tossing a coin. The problem with such examples is that both these activities are demonstrably deterministic without any intrinsic randomness (Diaconis et al., 2007). It may be argued that the physical world itself exhibits pure randomness on the quantum level of reality. This notion, associated with the Copenhagen interpretation of quantum mechanics, was dominant for the better part of the 20th century but has become marginalized more recently in favor of the Everett (Deutsch, 1999; Sebens and Carroll, 2016) and Bayesian (Schack et al., 2001; Fuchs and Schack, 2013) interpretations. Whatever the case may be, within the context of the macroscopic physical laws relevant to hydrogeophysics, these debates are completely immaterial. On the macroscopic level, the laws of classical physics exhibit no randomness, which can therefore not suddenly manifest in situations that are fully determined by these laws.

The last major reason why frequentism does not provide an adequate framework for uncertainty analysis concerns the nature of frequentist inference itself. Following Royall (1997), any statistical inference, as well as any other form of evidential assessment, can be thought of as addressing a series of three questions: (i) What does the evidence say? (ii) What should I believe? and (iii) What should I do? Uncertainty itself concerns the second question, i.e., the question of belief, but frequentist inference does not actually address this question at all (see **Table 1**). To explain why, let us start at question (i), the question of evidence. Evidence is central to the field of inference but surprisingly difficult to pin down (Feldman and Conee, 1985; Achinstein, 2003; Dougherty, 2011). In general, the evidence of some observations consists of those aspects of them which justify or lend credence to a hypothesis under question. The most important theoretical advance for the quantification of this notion came in the form of the Likelihood Principle (LP, Barnard et al., 1962), stating that all the evidence of the data is contained in their likelihood. Although some criticism exists, the LP is broadly accepted in the field of epistemology, due to being derived from extremely simple axioms (Barnett, 1999; Good, 2009b; Bandyopadhyay and Forster, 2011; Grossman, 2011). This first step alone, therefore, puts some pressure on frequentist hypothesis testing since it does not meet this criterion. In contrast, frequentist estimation (calibration, parameterization, regression, etc.) does not necessarily conflict with this principle. Here, the problem comes in the second step, i.e., using the evidence to justify belief. Frequentism does not deal with belief but uses the evidence, or some proxy thereof, to jump directly to decisions. 
As outlined above, decisions are made by deriving point estimates, for instance by applying a significance criterion to a p-value or an optimality criterion to some estimation procedure (**Table 1**). Some optimality criteria can be derived on evidential grounds, like the Maximum Likelihood (ML) estimator. While ML estimators are widely used, many other estimators exist, which typically diverge from ML by trying to reproduce only certain features of the data or by containing some application-specific reasoning (Krause et al., 2005; Pushpalatha et al., 2012; Bennett et al., 2013; Moriasi et al., 2015). This workflow, which forms the blueprint of most inference techniques in hydrogeology, stands in contrast to the principles of Bayesian inference. First, Bayesian inference meets the LP by using only the likelihood to assess the evidential support of the data. As mentioned above, adherence to the LP is not unique to Bayesianism, but a necessary prerequisite. The most important difference comes in the next step, when Bayesian inference uses Bayes' theorem to compute the belief that follows from the evidence. This step is simply a conclusion from the axioms of probability theory, and a number of more recent studies have demonstrated how updating through Bayes' theorem does indeed maximize the epistemic accuracy of an agent (Greaves and Wallace, 2006; Leitgeb and Pettigrew, 2010; Easwaran, 2013). However, the application of Bayes' theorem requires the derivation of the prior probability, which is regularly criticized. We will deal with this question in more detail below and continue for now with the above schematic. Having determined the belief given the evidence concludes the inferential part of the statistical analysis. This, however, leaves open the last step of the analysis, which is to make an informed decision. In the above terms, this can mean deciding which parameter θ to use or which hypothesis to accept. In Bayesianism, decision making is done by maximizing the expected utility. Like the two other steps, decision making through maximizing the expected utility is derived from simple axioms, making it the most widely subscribed paradigm in decision making. What is important for us is that this last step is independent of the other two and the specification of the utility function is therefore left to the decision maker. This clear separation of inference and decision makes Bayesianism so relevant to uncertainty analysis.

TABLE 1 | Comparison of the different concepts used in frequentism and Bayesianism.
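The separation between belief and decision described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the two states, the two actions, the utility values, and the helper names `expected_utility` and `best_action` are hypothetical and do not come from the studies cited here.

```python
def expected_utility(action, states, beliefs, utility):
    """Expected utility of one action under the current beliefs p(state)."""
    return sum(p * utility(action, s) for s, p in zip(states, beliefs))


def best_action(actions, states, beliefs, utility):
    """Return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, states, beliefs, utility))


# Hypothetical example: should a well be remediated?
STATES = ["contaminated", "clean"]
ACTIONS = ["remediate", "do_nothing"]


def utility(action, state):
    # Made-up utilities (arbitrary units), chosen by the decision maker
    if action == "remediate":
        return -10.0  # fixed cost, regardless of the true state
    return -100.0 if state == "contaminated" else 0.0


posterior = [0.2, 0.8]  # assumed belief after inference, p(state | z)
choice = best_action(ACTIONS, STATES, posterior, utility)
```

Note that the posterior enters only as a list of probabilities. Changing the utility function changes the decision without touching the inference, which is the independence the text refers to.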

Combined, reasons like these have led to the strong rise Bayesian methods have seen in many fields like physics (von Toussaint, 2011), biology (Huelsenbeck et al., 2001), environmental science (Clark, 2005), clinical research (Berry, 2006), genetics (Beaumont and Rannala, 2004), psychology (Wagenmakers, 2007), cognitive science (Clark, 2015), and many more. In addition, these reasons have made Bayesianism the leading paradigm in the field of philosophy of science (Howson and Urbach, 2005; Bandyopadhyay and Forster, 2011; Easwaran, 2011a).

#### 2.3. Bayesian Uncertainty Analysis

Having established Bayesianism as the most appropriate framework for uncertainty analysis, we will quickly restate the basic properties of Bayesian inference and prediction. In addition to that, we will demonstrate how the Bayesian framework provides an axiomatically based definition of uncertainty and therefore allows a quantitative assessment of uncertainty reduction as provided by the inference.

#### 2.3.1. Inference

Inference is defined as the process of characterizing a probability function using data. In Bayesian inference, this probability is defined over the possibility space (Kruschke, 2010). As the name implies, this space is supposed to contain all states that are possible for a given situation, i.e., states whose probability cannot be set a priori to zero. To avoid the curse of dimensionality, this space is typically approximated by a parsimonious parametric model, which is fully determined by the specification of its parameters θ. The initial certainty for these parameters θ is the aforementioned prior p(θ). The likelihood of each parameter set θ is determined by the data generating process, which in hydrogeology is usually defined through partial differential equations. Combining prior and likelihood, Bayes' theorem can now be used to determine the probability conditioned on the data z, called the posterior

$$p(\theta|z) = \frac{p(z|\theta)}{p(z)} p(\theta). \tag{1}$$

The only missing element in Equation (1) is the probability of the data p(z) often called the marginal likelihood or the evidence. However, for most scenarios, this probability is only a normalization constant. It can therefore be omitted and Equation (1) can be computed by normalizing p(z|θ)p(θ). This latter form is often used in the literature, since modern sampling methods are versions of the Markov-Chain-Monte-Carlo method, which guarantees this normalization by design.
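For a single parameter, the normalization of p(z|θ)p(θ) can be carried out by brute force on a grid. The following is a minimal sketch under stated assumptions: a Gaussian likelihood with known error standard deviation stands in for the forward model, the prior is flat on an assumed range, and the data values and function names are hypothetical.

```python
import math


def gaussian_likelihood(z, theta, sigma):
    """p(z | theta) for independent Gaussian errors around a constant mean theta."""
    return math.prod(
        math.exp(-0.5 * ((zi - theta) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        for zi in z
    )


def grid_posterior(z, grid, prior, sigma):
    """Normalize p(z|theta) p(theta) on a uniform grid; returns (posterior, evidence)."""
    dtheta = grid[1] - grid[0]
    unnorm = [gaussian_likelihood(z, t, sigma) * p for t, p in zip(grid, prior)]
    evidence = sum(unnorm) * dtheta  # Riemann-sum approximation of p(z)
    return [u / evidence for u in unnorm], evidence


# Hypothetical data: three log-conductivity measurements
z = [-4.2, -3.9, -4.4]
grid = [-8.0 + 0.01 * i for i in range(801)]  # theta in [-8, 0]
prior = [1.0 / 8.0] * len(grid)               # flat prior density on [-8, 0]
posterior, evidence = grid_posterior(z, grid, prior, sigma=0.5)

# Grid point with the highest posterior density
theta_map = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

Such a grid evaluation is only feasible for very few parameters, since the number of grid points grows exponentially with the dimension of θ.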

Looking at this workflow, it becomes clear how Bayesian inference is the general probabilistic framework for the inverse problem with the likelihood being the Bayesian representation of the relevant forward problem.

#### 2.3.2. Prediction

Using the conditioned, i.e., posterior, probability p(θ|z), the predictive probability for new unobserved data, i.e., predictions, z<sup>∗</sup> is given by

$$p(z^*|z) = \int_{\Theta} p(z^*|\theta) \, p(\theta|z) \, d\theta. \tag{2}$$

Bayesian prediction of new data z<sup>∗</sup> given the old data z is therefore achieved by marginalizing the posterior probability p(θ|z) times the predictive probability p(z<sup>∗</sup>|θ) of the model as defined by θ.
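The marginalization in Equation (2) can be approximated by Monte Carlo sampling: draw θ from the posterior, then draw z<sup>∗</sup> from the model given that θ. The sketch below assumes a conjugate Gaussian setting (Gaussian prior on a mean, known error standard deviation) so that the posterior can be sampled directly. The data, the prior parameters `mu0` and `tau0`, and the function names are hypothetical.

```python
import random
import statistics


def normal_posterior(z, sigma, mu0, tau0):
    """Conjugate update for a Gaussian mean with known error std sigma."""
    n = len(z)
    precision = 1.0 / tau0**2 + n / sigma**2
    mean = (mu0 / tau0**2 + sum(z) / sigma**2) / precision
    return mean, (1.0 / precision) ** 0.5


def sample_predictive(z, sigma, mu0, tau0, n_draws, rng):
    """Draw from p(z* | z) by sampling theta first, then z* given theta."""
    mu_n, tau_n = normal_posterior(z, sigma, mu0, tau0)
    draws = []
    for _ in range(n_draws):
        theta = rng.gauss(mu_n, tau_n)         # theta ~ p(theta | z)
        draws.append(rng.gauss(theta, sigma))  # z* ~ p(z* | theta)
    return draws


rng = random.Random(0)
z = [-4.2, -3.9, -4.4]  # hypothetical measurements
draws = sample_predictive(z, sigma=0.5, mu0=-5.0, tau0=2.0, n_draws=20000, rng=rng)
mu_n, tau_n = normal_posterior(z, 0.5, -5.0, 2.0)
```

In this special case the predictive distribution is itself Gaussian with mean `mu_n` and variance `tau_n**2 + sigma**2`, so the sample moments of `draws` can be checked against the closed form. The predictive variance exceeds the measurement variance because the remaining parameter uncertainty is carried into the prediction.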

This influence of the parameterized space of possible worlds Θ on both Bayesian inference and prediction also provides a formal description of two closely related epistemological problems in science, namely the Theory-ladenness of Science and the Duhem-Quine Hypothesis. The former roughly states that scientific inference is always affected by, usually implicitly held, beliefs of the investigator and is strongly associated with the works of Kuhn (1962) and Feyerabend (1975). The latter is more general and states that scientific inference is always underdetermined by our observations and additional assumptions are needed to make sensible conclusions. Next to Duhem (1906) and Quine (1951), important contributions to the development of this notion came, e.g., from Van Fraassen (1980), Laudan (1990), and Stanford (2001). Note that the Bayesian framework does not solve this problem but does make the impact of Θ on the findings transparent. This means that all theoretical presuppositions are contained in the definition of Θ and only influence inference and prediction by virtue of its choice and through the aforementioned equations.

#### 2.3.3. Uncertainty

In addition to properly describing the change of beliefs due to new evidence, Bayes' theorem provides a mathematically rigorous way to characterize the uncertainty represented in a probability distribution as well as the uncertainty reduction achieved during the inference. This is possible due to the intricate relationship between the concept of information in information theory and the way probabilities are updated in Bayesian inference (Ebrahimi et al., 2010). According to Shannon (1948), the information of a particular value of θ<sub>i</sub> is given by I(θ<sub>i</sub>) = − log<sub>2</sub>(p(θ<sub>i</sub>)), with log<sub>2</sub> being the logarithm of base 2. Due to this choice, information is usually measured in bits, with other bases simply leading to other units. To characterize the information content of a probability function of a discrete variable, Shannon (1948) introduced the expected value of information

$$H = \int p(\theta) I(\theta) \, \mathrm{d}\theta = -\int p(\theta) \log_2(p(\theta)) \, \mathrm{d}\theta. \tag{3}$$

As recounted by Tribus and McIrvine (1971), Shannon initially called this quantity uncertainty but was unfortunately convinced by John von Neumann to use the term entropy instead. In addition, the Shannon entropy only describes the uncertainty with respect to a state of complete ignorance, which is implicitly defined in his first and second axioms. Due to this limitation, the Shannon entropy is not suited for Bayesian inference, where updating from arbitrary priors to posteriors is possible. To that end, Equation (3) needs to be amended such that the uncertainty with respect to other degrees of certainty can be described. This extended concept is known as the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) but should be more appropriately called relative entropy/uncertainty or information

$$D_{\rm KL} = \int p(\theta) \log_2 \left( \frac{p(\theta)}{q(\theta)} \right) d\theta. \tag{4}$$

Due to its positive sign, the KL divergence is the negative relative entropy of p with respect to q. It therefore belongs to the class of entropy measures, meaning that uncertainty is the entropy of a probability function. Such measures have several advantages compared to, say, variance-based measures like Sobol indices that are often used in the literature (Ebrahimi et al., 2010). Compared to the Shannon entropy, this quantity is more fundamental in several ways. First, D<sub>KL</sub> can be easily extended to continuous as well as multidimensional variables. Second, D<sub>KL</sub> describes the uncertainty as expressed in one distribution with respect to another and therefore reduces to the Shannon entropy in the marginal case of a flat q. Finally, D<sub>KL</sub> connects the concept of information with the updating from the prior to the posterior in Bayesian inference. If p and q are identified with the posterior and prior distribution, respectively, then D<sub>KL</sub> is a measure for the information and therefore the uncertainty reduction achieved during the inference (Hou, 2005; Tang et al., 2016).
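As a small numerical illustration, the sketch below evaluates discrete analogues of Equations (3) and (4), with sums replacing the integrals. The four-state prior and posterior are made-up numbers, and the function names are hypothetical. For a flat q over N states, the discrete KL divergence equals log<sub>2</sub>(N) minus the Shannon entropy of p, which is the sense in which it reduces to the Shannon entropy up to a constant and a sign.

```python
import math


def shannon_entropy(p):
    """Discrete Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Discrete D_KL(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


prior = [0.25, 0.25, 0.25, 0.25]      # flat prior over four hypothetical states
posterior = [0.70, 0.20, 0.05, 0.05]  # assumed beliefs after seeing the data

# Information gained by the update, i.e., the uncertainty reduction in bits
info_gain = kl_divergence(posterior, prior)
```

Here `info_gain` coincides with `math.log2(4) - shannon_entropy(posterior)`, and it vanishes when the posterior equals the prior, i.e., when the data carry no information about the states.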

#### 2.4. Challenges of Bayesianism

Although Bayesianism has become such a popular position in the sciences, it has also received its fair share of criticism (Gelman, 2008; Easwaran, 2011b). These criticisms often include rather formal issues like the problem of logical omniscience and the problem of old evidence (Garber, 1983). Others, however, are more relevant to its applicability and should therefore be discussed in the following. Looking at Equation (1), we see that Bayesian inference consists of determining only three expressions. Next to the likelihood, which is also often used outside of Bayesian inference, two expressions are peculiar to it and therefore need to be looked at in detail.

First, let us look at the marginal likelihood in Equation (1), which poses the biggest computational problem. Since a direct computation of this expression generally involves multi-dimensional integrals, Bayesian inference was, for a long time, confined to a comparably small number of simple cases. Nowadays, Markov-Chain-Monte-Carlo (MCMC) methods are used (Gelfand and Smith, 1990; Tierney, 1994; Chib and Greenberg, 1995), which circumvent the computation of these often-intractable integrals by sampling directly from the non-normalized posterior. Implementations of MCMC samplers exist either as standalone versions (Lunn et al., 2009; Gelman et al., 2015; Depaoli et al., 2016) or as packages for popular languages like R (Martin et al., 2011; Lindgren and Rue, 2015; Denwood, 2016) and Python (Patil et al., 2010; Foreman-Mackey et al., 2013).
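To show why the marginal likelihood is not needed, the following is a minimal random-walk Metropolis sketch. It is not the algorithm of any of the packages cited above. The Gaussian prior and likelihood, the data, the step size, and the burn-in length are assumptions chosen so that the result can be checked against a closed-form posterior.

```python
import math
import random


def log_unnormalized_posterior(theta, z, sigma, mu0, tau0):
    """log p(z|theta) + log p(theta), with all constant terms dropped."""
    log_prior = -0.5 * ((theta - mu0) / tau0) ** 2
    log_like = sum(-0.5 * ((zi - theta) / sigma) ** 2 for zi in z)
    return log_prior + log_like


def metropolis(log_target, theta0, step, n_samples, rng):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    theta, logp = theta0, log_target(theta0)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # Accept with probability min(1, ratio); p(z) cancels in the ratio
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        chain.append(theta)
    return chain


rng = random.Random(0)
z = [-4.2, -3.9, -4.4]  # hypothetical measurements
chain = metropolis(
    lambda t: log_unnormalized_posterior(t, z, 0.5, -5.0, 2.0),
    theta0=-5.0, step=0.5, n_samples=20000, rng=rng,
)
kept = chain[2000:]  # discard an assumed burn-in period
posterior_mean = sum(kept) / len(kept)
```

Only ratios of the target density enter the acceptance step, which is why the sampler works with the product of likelihood and prior alone. For this conjugate toy problem, the sample mean of `kept` can be compared with the analytical posterior mean.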

Second, let us turn to the prior in Equation (1), which is both a theoretical and computational problem. The theoretical problem, known as the problem of the priors (Osherson et al., 1993), is caused by the limited number of restrictions that Bayesianism puts on a reasoning system to be rational. Taken to the extreme, an agent would be free to believe anything as long as there is no contradiction to the axioms of probability (Romeijn, 2017). In reality, however, common sense dictates that a sound opinion is constrained by additional sources of evidence. Responses from Bayesians to this obvious clash have been mixed, with roughly two extreme camps existing: subjective Bayesianism and objective Bayesianism. Subjective Bayesians stick to the purely theoretical principles and try to mitigate the conflict through expert elicitation, i.e., the application of formal rules to elicit expert opinions on a given topic and turn them into prior distributions (O'Hagan et al., 2006; Albert et al., 2012). On the other end, we find objective Bayesians who claim that additional conditions on rational beliefs, in particular prior beliefs, are necessary. These additional conditions revolve around the principle of equivocation, which means that, if no evidence favors one possibility, a rational agent should initially equivocate between all possibilities (Williamson, 2010). A more sophisticated version of this idea is the Maximum Entropy (ME) method (Jaynes, 1957a,b), where non-flat priors are possible if some additional, often physically based, constraints or symmetry arguments make some possibilities less credible from the start. In practice, most statisticians, scientists and engineers strive for objectivity but use primarily tried-and-tested approaches, which strike a balance between objectivity and applicability. 
While the ME method has found some application in hydrogeology (Woodbury and Ulrych, 1993), most studies rely on using flat or extremely wide distributions over some parameter range derived from the literature (Woodbury and Ulrych, 1996, 2000; Marchant and Lark, 2007; Diggle and Ribeiro, 2007; Murakami et al., 2010; Laloy et al., 2013; Shi et al., 2014; Geiges et al., 2015; Mara et al., 2017; Hayek et al., 2018). Unfortunately, this procedure is not as harmless as often perceived, since neither are ranges particularly objective nor are flat priors necessarily uninformative (a property which is often used as a proxy for objectivity) (Gelman et al., 2017). In general, current use of objective and uninformative priors in hydrogeology is seriously lacking compared to the standards established in statistics. To give an example of the latter, let us consider so-called reference priors. These priors are based on the idea of systematically minimizing the impact of the prior on the inference. If properly done, the results would then be dominated by the data alone (Bernardo, 1997). Another example is the use of the data themselves to determine the best prior. This method is called empirical Bayes (Carlin and Louis, 2000; Malinverno and Briggs, 2004), since it uses only the empirically available data for the inference. These methods have been very successful in statistics due to their ease of application and apparent objectivity (Valakas and Modis, 2016) but have been heavily criticized on theoretical grounds. The main critique is that by using the data for both the likelihood and the prior, empirical priors, and to a lesser extent reference priors as well, use the data twice. This is a clear violation of the LP, which is generally seen as a necessary element of Bayesianism (Lindley, 1987; Good, 2009b).
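The remark that flat priors are not necessarily uninformative can be made concrete with a short change of variables. The following is an illustration under assumed notation, not taken from the cited studies: K denotes a conductivity with a flat prior on a hypothetical range [a, b] with 0 < a < b, and Y = ln K is the corresponding log-conductivity.

$$p_K(k) = \frac{1}{b-a}, \quad a \le k \le b \quad \Longrightarrow \quad p_Y(y) = p_K(e^{y}) \left| \frac{\mathrm{d}k}{\mathrm{d}y} \right| = \frac{e^{y}}{b-a}, \quad \ln a \le y \le \ln b.$$

A prior that is flat in K is therefore exponentially increasing in ln K and concentrates most of its mass on the upper end of the range. Whether a flat prior expresses ignorance thus depends on the parameterization in which it is declared flat.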

A final point concerns Bayesian decision theory. Including this topic under challenges may seem strange given that the Bayesian framework enjoys the best integration with decision theory of any inferential framework. Beginning with the seminal works of von Neumann and Morgenstern (1944) and Savage (1954), Bayesianism has become the de-facto standard in modern decision theory (Berger, 1985; Jeffrey, 1992; Bernardo and Smith, 2000; Robert, 2001; Koehler and Harvey, 2004; Baron, 2004; Parmigiani and Inoue, 2009; Gilboa, 2009). It should consequently be one of its biggest assets. Alas, that is not what we see. Instead, very little effort has been devoted to connecting Bayesian inference with any of the established models from decision theory. To substantiate this pessimism, we simply refer here to the review of Tartakovsky (2013), who gives an excellent overview of this topic yet still fails to find more than a handful of studies that apply Bayesian decision theory to hydrogeology. As explained above, this lack is of little consequence for the specific topic of uncertainty analysis. It is, however, clearly a challenge for Bayesian inference in general. Looking at **Table 1**, current practice implements only the first and second of the three steps. While not the focus of this manuscript, we opine that a full appreciation of the Bayesian framework will only become a reality once all its elements are common knowledge and regularly applied.

# 3. WHAT KIND OF UNCERTAINTIES ARE WE TALKING ABOUT

To organize the different forms and sources of uncertainties, a wide range of often conflicting notations is used in the literature (Hoffman and Hammonds, 1994; Hofer, 1996; Walker et al., 2003; Brown, 2004; Carrera et al., 2005; Refsgaard et al., 2007; Kwakkel et al., 2010; Biegler et al., 2010; Refsgaard et al., 2012; Guillaume et al., 2012; Tartakovsky, 2013; Caers et al., 2014; Bond, 2015; Enemark et al., 2019). In the following, we are going to describe and contextualize these notations using the formalism established above.

#### 3.1. Understanding Uncertainties Using the Framework of Bayesianism

A common approach is to separate uncertainties into epistemic and aleatoric uncertainties (Kiureghian and Ditlevsen, 2009; Beven and Young, 2013; Bond, 2015). As already explained above, all uncertainties in Bayesianism are necessarily epistemic and aleatoric uncertainties simply do not exist in this framework. In our opinion, this distinction can make sense for practical applications because, due to often highly non-linear processes, reality always has a tipping point—sometimes sudden, sometimes more gradual—beyond which the additional collection of data becomes too costly to be reasonably entertained. Identifying such tipping points is important for any engineering task in order to estimate sensible directions for the additional data gathering. This notion is often implicitly confirmed by authors who otherwise seem to argue for the physical presence of aleatoric uncertainty within macroscopic phenomena. Fox and Ülkümen (2011) for example admit as much when saying "Aleatory uncertainty is attributed to outcomes that for practical purposes cannot be predicted and are therefore treated as stochastic". Another situation, where the use of aleatoric uncertainty seems justified, is in the presence of so-called statistical uncertainty (Beven and Young, 2013). This means that the statistical variation of a given population, say, the conductivity values of an aquifer, puts an "inherent" limit on how much the uncertainty can be reduced. From a Bayesian perspective, such reasoning is simply false. Kiureghian and Ditlevsen (2009), for instance, articulate this problem when stating that "The distinction between aleatory and epistemic uncertainties is determined by our modeling choices." To illustrate this point and get at the root of this prevailing misunderstanding, let us look at the already used example of conductivity values of an aquifer. 
Using, e.g., the whole aquifer as the population, there is indeed an intrinsic limit to how much the uncertainty can be reduced through sheer data collection. It would, therefore, seem that this uncertainty fits the above definition of being aleatoric. However, as Kiureghian and Ditlevsen (2009) pointed out, there is no metaphysical reason to model the whole aquifer as a single statistical population. If enough data are collected, the aquifer can easily be split up into, say, its hydrofacies, each of which would now have a much-reduced statistical uncertainty. This process of fine graining our statistical model, depending on the amount of data, can be repeated ad infinitum, which shows that no statistical variation is ever intrinsic to reality but only determined by our model. The last class of examples, which are often used to demonstrate aleatoric uncertainty, are actually cases of deep or Knightian uncertainty (Fox and Ülkümen, 2011; Beven and Young, 2013). Deep uncertainty is certainly an important and even dominant form of uncertainty in everyday situations. However, as explained above, it should be modeled by the Dempster-Shafer framework, and treating it within a probabilistic context is necessarily error prone. In summary, aleatoric uncertainty, as used in the literature, is an inconsistent mixture of several distinct concepts, with varying levels of usefulness.
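The hydrofacies argument can be illustrated with synthetic numbers. The sketch below assumes two hypothetical facies with equal sample sizes and made-up Gaussian log-conductivity statistics. It compares the variance of the pooled aquifer population with the variance remaining once each sample is assigned to its facies.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical log-conductivity samples from two hydrofacies (made-up statistics)
sand = [rng.gauss(-3.0, 0.5) for _ in range(2000)]
clay = [rng.gauss(-7.0, 0.5) for _ in range(2000)]

# Treating the whole aquifer as a single statistical population
pooled_var = statistics.pvariance(sand + clay)

# Treating each facies as its own population (equal sizes, so a plain average)
within_var = (statistics.pvariance(sand) + statistics.pvariance(clay)) / 2

# Variance explained by the facies means alone
between_var = statistics.pvariance([statistics.fmean(sand), statistics.fmean(clay)])
```

By the law of total variance, `pooled_var` equals `within_var + between_var` here. The apparently "inherent" spread of the pooled population is dominated by the difference between the facies means and largely disappears once the statistical model is refined.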

The second differentiation is to separate uncertainties into inferential and predictive uncertainties. Within Bayesianism, these uncertainties are simply defined via Equations (1) and (2), respectively. This means that the inferential uncertainty is the D<sub>KL</sub> of the posterior vs. the prior distribution, whereas the predictive uncertainty is given as the D<sub>KL</sub> of the predictive distribution with the data vs. without them.

Another important differentiation is to separate uncertainties into input and parametric uncertainties (Refsgaard et al., 2012; Tartakovsky, 2013). Within the context of Bayesianism, input uncertainty is simply the uncertainty that is passed down from receiving nodes in a Bayesian network. This means, the input uncertainty of a given node is the combined uncertainty of its parent nodes. On the other hand, parametric uncertainty is uncertainty in the parameters θ of the used parametric model and is therefore identical to the inferential uncertainty given by Equation (1).

The last concept to be discussed is the topic of structural and conceptual uncertainty. Both these terms are used in sometimes overlapping and sometimes conflicting ways (Refsgaard et al., 2012; Tartakovsky, 2013; Enemark et al., 2019). In general, it is not even clear whether these two terms differ in meaningful ways. From a Bayesian perspective, both refer to the necessary approximation of the possibility space by a lower-dimensional parametric subspace Θ. Since describing the possibility space itself, e.g., the conductivity field of a real-world aquifer, in full detail is both impossible, due to the scarcity of data, and numerically intractable, such approximations will always be necessary. From our perspective, it can be beneficial to use two different terms in order to distinguish between the uncertainty expressed between the different parametric models, called structural models in the following, and the uncertainty being expressed within any such given structural model. In the following, we will focus on the latter and use the term structural uncertainty to describe this category.

Having used the Bayesian framework to put the different forms of uncertainty into a proper context, we will finish this section by ranking these different forms according to their relevance for hydrogeological modeling. Since such a ranking is strongly dependent on the context, we will briefly discuss each form individually. First, parametric uncertainty is doubtless one of the dominant forms regardless of the situation. Bayesian inference is perfectly suited to handle it, as long as proper priors are provided. Structural uncertainty is also very important, due to the often-pronounced spatial pattern of many aquifers. Handling this form of uncertainty, as well as structural modeling in general, is comparably underdeveloped. Instead, structural uncertainty is often recast as a form of process uncertainty, in particular in the case of transport processes (Neuman and Tartakovsky, 2009). While such alternative process models may be important for pure modeling purposes, we are very skeptical about their use in uncertainty analysis. In addition to this, the parameters of such models are often pure convenience parameters and they are difficult or impossible to condition on point measurements. As mentioned above, conceptual uncertainty is not going to be the focus of this paper. Instead, we are going to describe in the following the range of structural models that are used in hydrogeology and try to identify the most relevant paradigms. Finally, input uncertainty is technically not part of hydrogeology since the uncertainty is passed down from the parent nodes of the Bayesian network. This is quite apparent in the case of groundwater recharge, where the input uncertainty is the accumulated uncertainty of the meteorological, land surface, and soil compartments of a complex hydrological model.

#### 3.2. On the Role of the Structural Model

At this point, we have identified structural and parametric uncertainty as the most relevant categories of hydrogeological uncertainty. While in theory both are of similar importance, the way to handle them in practice is very different. Parametric uncertainty can be reduced by collecting more data and is primarily a question of data acquisition. On the other hand, structural uncertainty is connected to the model for the subsurface heterogeneity itself. It is therefore much harder to quantify, which has wide ranging ramifications for uncertainty analysis. To investigate this problem in more detail, we will first present and discuss the most common paradigms for generating subsurface structures.

The most famous of these paradigms is the Gaussian process (GP) model, also known as Multivariate Gaussian or a Gaussian random field (Rasmussen and Williams, 2006). In its basic form, a GP is a very parsimonious model and can therefore be applied in situations where only few data are available, as is often the case in hydrogeology. In addition, a GP is hierarchical by nature and therefore scales well with the amount of data available, i.e., the dimensionality of Θ can be arbitrarily matched with the amount of data available for the inference (Gelfand and Schliep, 2016). However, employing a GP as the structural model for a conductivity field makes a number of strong assumptions about the properties and characteristics of the conductivity field, some of which have been strongly criticized (Gómez-Hernández and Wen, 1998; Zinn and Harvey, 2003; de Marsily et al., 2005; Linde et al., 2015). The most important of these criticisms concerns the inability of GPs to reproduce long-ranging high-conductivity structures, which are reported to exist in many real-world aquifers (Abelin et al., 1991; Zheng and Gorelick, 2003; Kerrou et al., 2008). As a result, using a GP as the structural model for an aquifer will lead to a failure to (i) detect the presence of such features as well as (ii) to predict certain behaviors of, say, break-through curves (Heße et al., 2015; Savoy et al., 2017).
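To indicate how parsimonious the basic Gaussian paradigm is, the following sketch draws one unconditional realization of a stationary Gaussian field on a one-dimensional transect. It is a textbook construction via a Cholesky factor of the covariance matrix, not the method of any particular study cited here. The exponential covariance model, the parameter values, and the function names are assumptions for illustration.

```python
import math
import random


def exponential_covariance(x, variance, corr_length):
    """Covariance matrix C_ij = variance * exp(-|x_i - x_j| / corr_length)."""
    return [[variance * math.exp(-abs(xi - xj) / corr_length) for xj in x] for xi in x]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gaussian_field(x, mean, variance, corr_length, rng):
    """One unconditional realization: mean + L * (standard normal vector)."""
    low = cholesky(exponential_covariance(x, variance, corr_length))
    white = [rng.gauss(0.0, 1.0) for _ in x]
    return [mean + sum(low[i][k] * white[k] for k in range(i + 1)) for i in range(len(x))]


x = [float(i) for i in range(50)]  # 1-D transect with unit spacing
log_k = sample_gaussian_field(
    x, mean=-4.0, variance=1.0, corr_length=5.0, rng=random.Random(1)
)
```

The whole field is controlled by three numbers (mean, variance, and correlation length), each of which is a simple statistic of the log-conductivity. The cubic cost of the factorization limits this direct approach to small grids.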

To improve on some of the limitations of GPs, truncated pluri-Gaussian models have been developed (Le Loc'h and Galli, 1997; Galli et al., 1994; Emery, 2004; Armstrong et al., 2011). These methods can be seen as an extension of the Gaussian paradigm by embedding Gaussian SRFs within a larger hierarchical framework. Here, hierarchy means that the Gaussian SRFs are used to create a larger spatial structure. This larger structure typically represents distinct hydrogeological units, like hydrofacies or lithofacies. Despite some success in recent years (Emery, 2007; Mariethoz et al., 2009; Serrano et al., 2014), their overall geological realism has remained limited, and alternative paradigms have continued to attract considerable attention.

Surface-based modeling is one of these other paradigms (Caumon et al., 2009), often implemented in terms of implicit surfaces (Calcagno et al., 2008; Chilés et al., 2004). Although not particularly suited for modeling intricate structures and small-scale heterogeneity, this paradigm provides a realistic representation of the main geological structures and allows the integration of a good range of information, including geological data (contact points, surface orientations, faults) but also information coming from geophysical surveys.

Geological structures can also be represented as geometrical objects using object-based models (Koltermann and Gorelick, 1996). Object-based methods are mainly based on geometrical considerations about the expected shapes.

Process-based methods, on the other hand, arguably provide the most realistic representation of real-world structures (Koltermann and Gorelick, 1996). Software implementations of this paradigm are available both as commercial and academic releases, and for unconditional simulations the computational costs are acceptable. Nevertheless, as is the case for object-based methods, process-based methods have difficulty in honoring all the observed conditioning data, which puts some limit on their applicability.

The last modeling paradigm discussed here comprises the multiple-point statistics (MPS) based techniques (Guardiano and Srivastava, 1993; Strebelle, 2002; Mariethoz and Caers, 2014). In contrast to geostatistical methods based on two-point statistics, these methods allow the reproduction of more realistic structures, which better represent important features observed in real-world aquifers, like connectivity. Computational costs remain relatively high compared to other paradigms. Nevertheless, MPS simulation algorithms intrinsically honor all the observed conditioning data, and the flexibility of the technique allows for the incorporation of information coming from different sources in a straightforward way.

Together, these paradigms form the basis of most of the models used for generating subsurface heterogeneity. In addition to the properties already mentioned, they also differ in how amenable they are to a Bayesian framework (**Table 2**). The Gaussian paradigm is very well-suited, since most of its parameters are simple statistics of typical hydrogeological variables (e.g., conductivity, porosity, storativity, etc.). Consequently, there exists a direct way to derive these parameters from real-world measurements. The truncated pluri-Gaussian paradigm scores lower in this regard since a crucial feature of this method is the derivation and application of the truncation rule. These rules usually ought to come from expert elicitation, but little to no guidelines exist on how to formalize this process. The surface-based paradigm fares much better in this regard since the inference of subsurface structures is able to draw on observable features of the subsurface. In fact, the possibility to frame this paradigm within a Bayesian framework was already explored by Wellmann et al. (2018) and de la Varga et al. (2018). Next is the object-based paradigm, where quite often objects are drawn based on purely geometrical considerations. It is therefore not straightforward to use this paradigm in a Bayesian framework. Nevertheless, object-based methods can profit from the statistics about the morphological attributes of real-world geological objects (Gibling, 2006; Colombera et al., 2012). Process-based models, on the other hand, are built on a plausible physical model by mimicking the geological genesis of the subsurface. However, the parameters of these models are usually pure convenience parameters, making the derivation of prior PDFs subjective. The last paradigm discussed above is MPS, which is not without problems from a Bayesian perspective. However, its overall aptitude is arguably higher than that of the two former paradigms. 
First of all, realizations of MPS models are easy to condition on point measurements, which is an important feature. Looking at the generating mechanism itself, we see that its parameters are simple convenient parameters that are not directly connected to physical principles. Despite this clear drawback, MPS realizations are based in training images, which means the method is based on observable features of the subsurface.

In conclusion, we would argue that two of the paradigms presented above stand out as the most viable candidates for Bayesian uncertainty analysis. The first is the Gaussian paradigm, which scores high on almost every metric except geological realism (see **Table 2**). While this is only one point of many, it is arguably the most important one. At the same time, GPs score so high on the other metrics, in particular their wide use, that they should not be excluded. Furthermore, GPs are good candidates for subdomain models within a larger hierarchical modeling framework and can consequently form an important component of a larger and more realistic framework. The second relevant paradigm is MPS, which, in some sense, can be placed on the other end of the spectrum of the paradigms presented here. That is to say, models based on MPS have a high degree of geological realism but suffer from a lack of ready-made software tools, are comparatively unknown to practitioners, and are computationally demanding. However, both paradigms have the ability to incorporate a variety of data sources, and the generation of heterogeneous structures is based on observable characteristics of the subsurface. In addition, if the MPS framework is used to generate only the categorical SRF of the different hydrogeological units, it can be seamlessly integrated with the Gaussian paradigm. With these two candidate paradigms in mind, we will continue our discussion on Bayesian uncertainty analysis.


TABLE 2 | Overview of the different paradigms for sub-surface structure generation applied in hydrogeology.

#### 4. TOWARD A DATA-DRIVEN UNCERTAINTY CHARACTERIZATION

As we have discussed earlier, Bayesianism is not free of practical and theoretical problems. Of these, the question of prior derivation was presented as the most pressing, with no universally accepted guideline for prior derivation existing in the literature. In this last part of the manuscript, we are going to lay out the current challenges and explain a possible solution by using data-driven priors.

## 4.1. Putting Prior Derivation on Solid Grounds

In this paper, we want to make the case for combining aspects of objective Bayesianism with frequentist reasoning (Rubin, 1984; Bayarri and Berger, 2004; Little, 2006). While we fully agree that objectivity is an important goal in any scientific or engineering enterprise, we do not think that this needs to be achieved by minimizing the impact of the prior, as is often argued. On the contrary, priors can both have a strong impact on the analysis and be objective. This can be achieved if these priors are derived from well-defined empirical frequencies (see, e.g., Li et al., 2017 for a rare example in hydrogeology). Hydrogeological variables like conductivity and porosity are physical properties or can at least be soundly derived from them (Di Palma et al., 2017). As a result, their parametric and structural uncertainties can be calibrated against observed frequencies. This feature, that the parameters of stochastic hydrogeology relate to physical quantities, makes this mixture of Bayesian analysis and frequentist reasoning the natural choice for prior derivation. This notion has, e.g., been voiced by Gelman (2008) when saying that a hardcore Bayesian is someone "who would apply Bayesian methods to all problems" whereas a reasonable person "would apply Bayesian inference in situations where prior distributions have a physical basis or a plausible scientific model."

Such a basis should be a formalized knowledge base, i.e., a database DB = (X<sub>i</sub>, Z<sub>i</sub>)<sub>i≤n</sub>, containing all investigated and cataloged cases. In the field of hydrogeology, these cases should be identified with investigated and cataloged sites. Furthermore, n is the number of these sites, Z<sub>i</sub> are the measurements of the target variable at each site i, and X<sub>i</sub> = (X<sub>i</sub><sup>1</sup>, …, X<sub>i</sub><sup>m</sup>) is a vector containing the m cataloged characteristics/features of each site. In hydrogeology, such characteristics may include latitude, longitude, climatic properties, rock type, environment type, physiographic properties, etc.

#### 4.1.1. Prior Derivation Using Machine Learning

To derive prior distributions from an established database DB, a large variety of techniques can be used, including machine learning. Here, for brevity, we illustrate only a single example of these techniques, namely supervised feature learning (**Figure 2**). As described above, the database DB contains, next to the measurements, the features associated with each site. Feature learning would make it possible to determine the most predictive features as well as the functional dependency for any given site. The result would be a function that maps the observable features of a site to the distribution of its conductivity values (or any other target variable). Common machine learning methods that can tackle such tasks include Random Forests (Breiman, 2001; Segal and Xiao, 2011), Gradient Boosting Trees (Elith et al., 2008), or Bayesian Additive Regression Trees (Chipman et al., 2010; Pratola et al., 2014; Kapelner and Bleich, 2016).

#### 4.1.2. Prior Derivation Using Similarity-Weighted Frequencies

Machine learning tools like neural networks are well-established in data science. However, they can be extremely computationally expensive, need large amounts of data and the resulting models are notoriously hard to interpret.

Recent developments in the field of epistemology have provided a sound mathematical procedure to formalize the intuitive notion about how different cases can be made amenable to frequentist reasoning and therefore provide the basis for combining aspects of frequentism and Bayesianism (Billot et al., 2005; Gilboa et al., 2006, 2010). The basic idea behind it can be easily illustrated (**Figure 3**). Given a similarity function s, more on this later, the probability of the target variable at a new site n + 1 can be expressed as

$$p(Z\_{n+1}^s) = \frac{\sum\_{i \le n} s(X\_i, X\_{n+1}) p(Z\_i)}{\sum\_{i \le n} s(X\_i, X\_{n+1})}.\tag{5}$$

Since Equation (5) strongly depends on the specification of the similarity function s(X<sub>i</sub>, X<sub>k</sub>) between the feature vectors of two sites i and k, Gilboa et al. (2010) propose a simple best-fit approach to find such a function, namely the s<sub>opt</sub>(X<sub>i</sub>, X<sub>k</sub>) that best explains the given database. To handle the curse of dimensionality, a parametric model should be used. Following again Gilboa et al. (2010), we use an exponential model here for demonstration

$$s(X\_i, X\_k) = \exp\left(-\sqrt{\sum\_{j \le m} w\_j \left(X\_i^j - X\_k^j\right)^2}\right). \tag{6}$$

At this point, the task of finding the prior distribution for a target variable has first been transformed into finding s and then, by virtue of a parametric model, into finding the appropriate weights **w** = (w<sub>1</sub>, …, w<sub>m</sub>). Gilboa et al. (2010) propose a simple best-fit approach, i.e., finding **w**<sub>opt</sub> such that the sum of squared errors between all elements in the database is minimized. The principle behind this similarity function and the way it is used to estimate probability is identical to kernel density estimation but extended to the feature space of our database. The name similarity function, instead of kernel, was chosen by Gilboa et al. (2010) to emphasize the epistemic nature of the procedure.
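The similarity-weighted prior can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes each site is summarized by a single number (a hypothetical mean log10-conductivity), that the features have already been rescaled, and that the best-fit criterion is a leave-one-out squared error. The toy database, the function names, and the weights are all assumptions made for demonstration.

```python
import math

def similarity(x_a, x_b, w):
    """Exponential similarity: exp(-sqrt(sum_j w_j * (x_a^j - x_b^j)^2))."""
    d2 = sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x_a, x_b))
    return math.exp(-math.sqrt(d2))

def mixture_weights(features, x_new, w):
    """Normalized similarity weights of the cataloged sites for a new site."""
    s = [similarity(x, x_new, w) for x in features]
    total = sum(s)
    return [si / total for si in s]

def prior_mean(features, site_means, x_new, w):
    """Similarity-weighted average of the site summaries (mean of the prior)."""
    weights = mixture_weights(features, x_new, w)
    return sum(wi * mi for wi, mi in zip(weights, site_means))

def loo_sse(features, site_means, w):
    """Leave-one-out squared error: predict each site from all other sites."""
    sse = 0.0
    for i, (x_i, z_i) in enumerate(zip(features, site_means)):
        others_x = features[:i] + features[i + 1:]
        others_z = site_means[:i] + site_means[i + 1:]
        sse += (prior_mean(others_x, others_z, x_i, w) - z_i) ** 2
    return sse

# Hypothetical toy database: two rescaled features per site and the
# mean log10-conductivity observed at each site (illustrative values only).
features = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
site_means = [-4.0, -5.0, -8.0]
w = (1.0, 1.0)

x_new = (0.0, 0.0)  # features of the new, unexplored site
weights = mixture_weights(features, x_new, w)
print(weights, prior_mean(features, site_means, x_new, w))
print(loo_sse(features, site_means, w))
```

In such a sketch, the weights **w** would be chosen by minimizing `loo_sse` with any general-purpose optimizer. Sites whose features resemble those of the new site dominate the resulting prior, while dissimilar sites contribute little.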

#### 4.1.3. Prior Derivation Using Bayesian Hierarchical Modeling

Although the above procedure is axiomatically elegant, it is not without flaws. One problem is that the database might be too small to contain a relevant number of observations. Basing the prior on such a database would attribute zero probability to structures that have not yet been cataloged. Another problem is the incorporation of data on different scales, such as summary statistics of certain sites.

Such challenges can be handled by Bayesian hierarchical modeling, which provides a natural way to partially pool data from different sites by incorporating the structural dependencies between the data points. In hydrogeology, it is not immediately clear how data from different sites can be combined and jointly used for inference. Hierarchical models facilitate partial pooling by first assimilating data from each site independently and, in a next step, modeling each site as an element of a population of sites (**Figure 4**). Bayesian statistics is easy to adapt to such schemes by modeling the parameters of each site as being conditional on the statistics of the upper levels.
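Partial pooling can be illustrated with the simplest conjugate case. The sketch below assumes a normal-normal hierarchy in which the within-site variance `sigma2` and the population parameters `mu` and `tau2` are known; in a full hierarchical analysis these would themselves be inferred. All names and numbers are hypothetical and serve only to show the shrinkage mechanism.

```python
def partial_pool(site_mean, n_obs, sigma2, mu, tau2):
    """Posterior mean and variance of a site-level mean.

    Assumed model: observations ~ N(theta, sigma2), theta ~ N(mu, tau2).
    The result is a precision-weighted compromise between the site's own
    data and the population of sites.
    """
    prec_data = n_obs / sigma2   # information carried by the site's data
    prec_pop = 1.0 / tau2        # information carried by the population
    post_var = 1.0 / (prec_data + prec_pop)
    post_mean = post_var * (prec_data * site_mean + prec_pop * mu)
    return post_mean, post_var

# Hypothetical population of sites: mean log10-conductivity -4, variance 1.
mu, tau2, sigma2 = -4.0, 1.0, 1.0

sparse_site = partial_pool(-6.0, 2, sigma2, mu, tau2)    # pulled toward mu
dense_site = partial_pool(-6.0, 200, sigma2, mu, tau2)   # stays near its data
new_site = partial_pool(0.0, 0, sigma2, mu, tau2)        # no data: population prior
print(sparse_site, dense_site, new_site)
```

A sparsely sampled site is pulled strongly toward the population mean, whereas a well-sampled site is governed by its own data. A site without any data simply inherits the population distribution.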

One drawback of Bayesian hierarchical modeling, compared to a similarity-weighted approach, is that the prior distribution of a new, as yet unexplored site is simply a random sample from all possible sites and therefore has a comparatively large uncertainty. This uncertainty could be narrowed by restricting the analysis to similar sites only. Yet, this would necessitate the existence of a huge database, which we currently do not have (Cucchi et al., 2019).

## 4.2. Hydrogeological Data Science in the Context of Bayesian Uncertainty Analysis

At this point, it should become clear that, irrespective of the details of the aforementioned approaches, the performance of data-driven methods is not only determined by the sophistication of the method used but is equally dependent on the amount and kind of data being used (Halevy et al., 2009). This means that a simple algorithm with a lot of data can easily outperform a highly complex one with only a modest amount of data. Increasing the amount of available data is consequently as important as the development of ever better algorithms.

In general, data generation in hydrogeology is quite costly compared to, say, hydrology, meteorology or land-surface modeling. To counter such challenges, much has been invested in the development and deployment of cost-effective methods for subsurface characterization. As a result, the total amount of data being collected every year is quite substantial. However, collecting these data and making them available to practitioners remains difficult. To demonstrate why this is a problem, we can use, e.g., the schematic proposed by Rogati (2017). Using the well-known hierarchy of human motivation (Maslow, 1943), Rogati (2017) promotes a hierarchy of needs in data science. According to this schematic, hydrogeology has focused most of its effort only on the first level of this hierarchy, where she puts data collection, data storage and data transformation. However, already at the second level, where she puts routines, protocols and infrastructure for moving and storing the data, hydrogeology is comparatively underdeveloped. Given that the hierarchy proposed by Rogati (2017) has 6 levels in total, we can confidently state that hydrogeology has a long way to go before having a viable ecosystem for modern data-driven analysis.

This is not to say that there have not been efforts to provide standardized procedures for sharing and storing data in hydrogeology (Boisvert and Brodaric, 2011; Brodaric et al., 2018; Wojda et al., 2010). However, these data collection initiatives focus almost exclusively on indirect measurements like hydraulic heads or are restricted to some specific measurement sites. In contrast, the efforts to collect direct hydrogeological measurements have so far been rather modest. For example, the largest open-access database on the topic of hydraulic conductivity is the World-Wide HYdrogeological Parameter DAtabase (WWHYPDA, Comunian and Renard, 2009), which contains a little over 20,000 measurements from approximately 50 different sites. This reflects only a tiny fraction of the total amount of data that has been collected on this topic and is not even close to what anyone would label as big data. This situation means that the field of hydrogeology is currently seriously underequipped for the deployment of any data-driven method in general and the derivation of data-driven priors in particular.

Focusing on the topic of data-driven priors, we can state that only a minimal number of tools currently exists. The WWHYPDA provides a modicum of data on hydraulic conductivity, which can be used to determine parametric uncertainty for a given site (Cucchi et al., 2019). However, the amount of data for other variables (e.g., porosity) is much lower and currently not sufficient for use in a Bayesian context. Concerning structural uncertainty, the situation is even less promising. Since the measurements cataloged in the WWHYPDA currently do not contain spatial coordinates, it is not possible to determine simple two-point statistics like a variogram or covariance function. This means that even for a simple paradigm like Gaussian SRFs, there are no databases from which prior distributions for, say, the parameters of a variogram function can be derived. It should come as no surprise that the situation for more complex structural models is no better. In this last portion of the paper, we would therefore like to present a list of challenges as well as possible solutions for the field of Bayesian uncertainty analysis.

#### 4.2.1. Parametric Uncertainty

Currently, the amount of data contained in WWHYPDA barely allows one to tackle the challenges of parametric uncertainty. The modest amount of data represented in WWHYPDA is probably caused by the lack of user contributions and/or policies that encourage the systematic and central publication of hydrogeological measurements. The best way to quickly populate databases like WWHYPDA with a sufficient amount of data would probably be to incorporate data scattered across regional/national repositories, and also to interact closely with the boards of the main scientific journals and to set up a tighter interaction between authors and open data collection initiatives.

#### 4.2.2. Structural Uncertainty (Gaussian Process Paradigm)

A Gaussian process is the simplest paradigm that can be used to describe structural uncertainty, with spatial structures defined by two-point statistics through variograms or covariance functions.

To the best of the authors' knowledge, there is no widely available open-source database, which provides practitioners with a catalog of either estimated variogram functions or measurements that allows one to estimate them.

The abilities of the WWHYPDA could be expanded to that purpose with only a modest amount of effort. Compared to its current implementation, only the coordinates of the measurements need to be added as a feature.
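To illustrate why coordinates are the missing ingredient, the following sketch computes a classical (Matheron-type) empirical variogram, which requires nothing but measurement locations and values. It is a minimal illustration on hypothetical toy data, not a description of any existing WWHYPDA functionality; the function name and the binning convention are assumptions.

```python
import math
from itertools import combinations

def empirical_variogram(coords, values, bin_width, n_bins):
    """Empirical semivariance per lag bin.

    Bin k collects all pairs with separation h in
    [k * bin_width, (k + 1) * bin_width). Returns (gamma, counts),
    where gamma[k] is None for bins without any pair.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        h = math.dist(p, q)          # Euclidean separation of the pair
        k = int(h // bin_width)
        if k < n_bins:
            sums[k] += (zp - zq) ** 2
            counts[k] += 1
    gamma = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return gamma, counts

# Hypothetical transect: four log-conductivity values at unit spacing.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 3.0]

gamma, counts = empirical_variogram(coords, values, bin_width=1.0, n_bins=4)
print(gamma, counts)
```

Without the `coords` argument, no lag can be assigned to any pair of measurements, which is exactly the limitation described above.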

#### 4.2.3. Structural Uncertainty (Multiple-Point Statistics Paradigm)

In the text above, multiple-point statistics was identified as the overall most promising paradigm to realistically represent structural uncertainty. Some efforts have been made to share repositories of training images [see for example (2014), companion site of the book Mariethoz and Caers, 2014], in some cases focusing on some specific environments (Pushpalatha et al., 2008). In addition, other efforts were made to create databases of analogs, with initiatives mainly sponsored by oil companies like the SafariDB (2019a), the Sedimentary Analogs Database and Research Consortium (2019b), the Fluvial Architecture Knowledge Transfer System (Colombera et al., 2012), CarbDB (Jung and Aigner, 2012), and WODAD (Kenter and Harris, 2006). In these cases, access to the full functionality of the database is very often restricted to the partner institutions/companies. Therefore, when talking about open-access databases, the available resources are quite limited. In practice, a set of TIs that can even remotely be called representative of Earth's subsurface structures simply does not exist, and the efforts made by consortia or private companies are often governed by very restrictive access policies. As a result, the task of building up an open-access knowledge base for Bayesian structural uncertainty analysis, or any other data-driven modeling effort, has to start from scratch. Within the scope of this manuscript, we can neither detail the specific architecture of such a knowledge base nor explain the necessary steps to create one. We will, however, try to formulate a set of desiderata that such a database should meet.

First, the data base should contain a representative sample of the structures encountered in the subsurface. Such a desideratum may sound obvious w.r.t. any statistical analysis. It does, however, need special attention due to being somewhat vague and elusive. For example, particularly in the case of three-dimensional case studies, TIs could come from high-resolution reconstructions of aquifer analogs (Bayer et al., 2011; Comunian et al., 2011; Bayer et al., 2015), but also be the result of a more or less complex simulation with object-based or process-based methods. Therefore TIs, in particular within a Bayesian context, can represent very different entities depending on how they were created.

Second, to derive prior distributions, the most predictive features of the cataloged sites must be reported as well. This desideratum is again somewhat weak, since it is not a priori clear what features of a site are most predictive of its subsurface structure.

In case a database meeting these desiderata becomes successful, the algorithm for prior derivation has to be adapted to the specifics of the MPS paradigm. As already discussed above, recasting this paradigm in a Bayesian framework is not straightforward, since the parameters for generating random realizations are derived ad-hoc and cannot, in general, be exchanged between different workflows. So, instead of deriving prior distributions over some parameters, the prior distribution should be defined over the TIs themselves. In this framework, one could provide a given prior distribution to each TI, for example based on the ranking procedure proposed by Pérez et al. (2014) computed using a portion of the available data. Then, uncertainty could be assessed by distributing the number of realizations for each TI proportionally to the prior computed in the previous step.
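The last step, distributing realizations among TIs in proportion to their prior, amounts to a simple apportionment problem. The sketch below uses the largest-remainder rule to obtain integer counts that sum exactly to the requested total. The TI names and scores are hypothetical, and the rounding rule is our assumption rather than part of the procedure of Pérez et al. (2014).

```python
def allocate_realizations(ti_prior, n_total):
    """Split n_total realizations among TIs proportionally to their prior.

    ti_prior maps a TI name to a non-negative score (need not sum to one).
    Fractional quotas are rounded with the largest-remainder rule.
    """
    total = sum(ti_prior.values())
    quotas = {name: n_total * p / total for name, p in ti_prior.items()}
    alloc = {name: int(q) for name, q in quotas.items()}   # floor of each quota
    leftover = n_total - sum(alloc.values())
    # Hand the remaining realizations to the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical scores for three candidate training images.
ti_prior = {"ti_a": 5, "ti_b": 3, "ti_c": 2}
alloc = allocate_realizations(ti_prior, 24)
print(alloc)
```

Each TI would then be passed to the MPS simulation engine for its allotted number of realizations, so that the ensemble as a whole reflects the prior over TIs.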

Probably, embedding or strongly connecting a database of TIs within a structure like WWHYPDA, as argued by Comunian and Renard (2009), would improve its usefulness, because facies codes of categorical TIs could be directly linked to parameter distributions. Moreover, if included in the WWHYPDA structure, TIs could be also organized in a more efficient and flexible way within the provided catalog of hydrogeological environments.

#### 5. CONCLUSIONS

In this manuscript, we made the case for a unified data-driven framework for hydrogeological uncertainty analysis. Following Pearl (1988b), we attempted this by first identifying the most suitable theory for such a framework, motivating its use, explaining its properties, identifying the current challenges, showing what kind of approximations needed to be made to make the framework viable, and finally detailing a road map to fill the gaps which currently exist.

As the ideal theory for such a unified framework, we have argued for Bayesianism. This was done by contrasting the Bayesian framework with its most relevant competitors. In the realm of uncertain reasoning, these competitors include fuzzy as well as Dempster-Shafer reasoning, whereas in the realm of probabilistic reasoning, the main competitor is the frequentist framework. While all of these frameworks have their merits, the Bayesian framework is the only one combining a sound epistemology with wide-spread use and application. We then explained the main features, as well as challenges, of this framework and how it relates to the specifics of the field of hydrogeology.

With this foundation, we proceeded to contextualize the different forms of uncertainty used in the literature. Of these forms, we identified structural and parametric uncertainty as the two most relevant. Parametric uncertainty is mainly a question of data collection, data preparation, and data dissemination. In contrast, structural uncertainty is equally a conceptual challenge, i.e., the hydrogeological community is still lacking a paradigm for generating subsurface structures that is widely accepted and used. To make this point, we reviewed the most common of these paradigms and discussed their most salient features. Special attention was paid to their compatibility with a Bayesian framework. We tentatively identified two paradigms as the most relevant: Gaussian multivariate random fields and multiple-point statistics. While the Gaussian paradigm is mathematically elegant, it suffers from a lack of geological realism. On the other hand, the multiple-point statistics paradigm is very promising due to its conceptual clarity. To provide the data for the needed prior distribution, training images are the natural choice.

Finally, we make the point that the field of hydrogeology needs to increase its efforts to provide large open-access databases with direct hydrogeological measurements. The examples given here are derived from specific applications, i.e., Bayesian uncertainty analysis, but the general point is independent of such concerns. As long as virtually all efforts are focused on the measurements themselves, we remain stuck on the first level of the hierarchy described by Rogati (2017). Without investing at least a modicum of work into reaching the second level, all the advances of modern data science will remain restricted to extremely myopic and small data sets, at best, and completely out of reach, at worst. From our perspective, the reasons for the inability of our community to "move up" remain somewhat of a mystery. After all, managing and maintaining a database is easy compared to the sophisticated network of measurement operations that are managed worldwide, and a minor redirection of resources from Level 1 to Level 2 of the schematic depicted by Rogati (2017) would make a big difference. In fact, we can hardly think of a more impactful way to spend one's money when trying to improve hydrogeological data science. We therefore close this paper with a call for such an initiative.

#### AUTHOR CONTRIBUTIONS

As the main author, FH has worked on every section of the article. AC's input was focused on the last portion of the manuscript dealing with the models for generating subsurface structures and the challenges of databases. SA provided overall guidance for the development of the main ideas presented here.

#### FUNDING

For the finalization of this manuscript, FH was financially supported by the Deutsche Forschungsgemeinschaft via grant HE 7028/2-1.

#### REFERENCES

Barnett, V. (1999). Comparative Statistical Inference, 3rd Edn. Chichester: Wiley.

Baron, J. (2004). "Chapter: Normative Models of Judgment and Decision Making," in Blackwell Handbook of Judgment and Decision Making; Handbooks of Experimental Psychology, eds D. J. Koehler and N. Harvey (Malden, MA: Blackwell Publishing Ltd), 19–37.

prior distributions in hydrogeology. Adv. Water Resour. 126, 65–78. doi: 10.1016/j.advwatres.2019.02.003

an information theoretic approach. Water Resour. Res. 49, 2253–2273. doi: 10.1002/wrcr.20161

Quine, W. V. O. (1951). Two dogmas of empiricism. Philos. Rev. 60, 20–43.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Heße, Comunian and Attinger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Groundwater Contaminant Transport: Prediction Under Uncertainty, With Application to the MADE Transport Experiment

#### Aldo Fiori <sup>1</sup> \*, Antonio Zarlenga<sup>1</sup> , Alberto Bellin<sup>2</sup> , Vladimir Cvetkovic<sup>3</sup> and Gedeon Dagan<sup>4</sup>

*<sup>1</sup> Department of Engineering, Roma Tre University, Rome, Italy, <sup>2</sup> Department of Civil, Environmental and Mechanical Engineering, University of Trento, Trento, Italy, <sup>3</sup> Department of Water Resources Engineering, Royal Institute of Technology, Stockholm, Sweden, <sup>4</sup> School of Mechanical Engineering, Tel Aviv University, Ramat Aviv, Israel*

#### Edited by:

*Daniel M. Tartakovsky, Stanford University, United States*

#### Reviewed by:

*Gerardo Severino, University of Naples Federico II, Italy Felipe De Barros, University of Southern California, United States*

> \*Correspondence: *Aldo Fiori aldo@uniroma3.it*

#### Specialty section:

*This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science*

Received: *13 December 2018* Accepted: *21 May 2019* Published: *06 June 2019*

#### Citation:

*Fiori A, Zarlenga A, Bellin A, Cvetkovic V and Dagan G (2019) Groundwater Contaminant Transport: Prediction Under Uncertainty, With Application to the MADE Transport Experiment. Front. Environ. Sci. 7:79. doi: 10.3389/fenvs.2019.00079*

Transport of solutes in porous media at the laboratory scale is governed by an Advection Dispersion Equation (ADE). The advection is by the fluid velocity *U* and the dispersion by *D<sub>dL</sub>* = *U*α<sub>dL</sub>, where the longitudinal dispersivity α<sub>dL</sub> is of the order of the pore size. Numerous data have revealed that the longitudinal spreading of plumes at field scale is characterized by a macrodispersivity α<sub>L</sub> that is larger than α<sub>dL</sub> by orders of magnitude. This effect is attributed to the heterogeneity of aquifers, which manifests in the spatial variability of the logconductivity *Y*. Modeling *Y* as a stationary random field and for mean uniform flow (natural gradient), α<sub>L</sub> could be determined in analytical form by a first-order approximation in σ<sub>Y</sub><sup>2</sup> (the variance of *Y*) of the flow and transport equations. Recently, models and numerical simulations for solving transport in highly heterogeneous aquifers (σ<sub>Y</sub><sup>2</sup> > 1), primarily in terms of the mass arrival (the breakthrough curve, BTC), were advanced. In all cases ergodicity, which allows the unknown BTC to be exchanged with the ensemble mean, was assumed to prevail for plumes that are large compared to the logconductivity integral scale. Besides, the various statistical parameters characterizing the logconductivity structure as well as the mean flow were assumed to be known deterministically. The present paper investigates the uncertainty of the non-ergodic BTC due to the finiteness of the plume size as well as due to the uncertainty of the various parameters on which the BTC depends. By the use of a simplified transport model we developed in the past (which led to accurate results for ergodic plumes), we were able to obtain simple results for the variance of the BTC. It depends in an analytical manner on the flow parameters as well as on the dimension of the initial plume relative to the integral scale of the logconductivity covariance. The results were applied to the analysis of the uncertainty of the plume spatial distribution of the MADE transport experiment. This was achieved by using the most recent analysis of the MADE aquifer conductivity data.

Keywords: solute transport, heterogeneous porous formations, breakthrough curve (BTC), uncertainty, MADE experiment, stochastic subsurface hydrology

#### 1. INTRODUCTION

Aquifer pollution by various contaminants constitutes a major threat to fresh water resources all over the world. Unlike pollution of the accessible surface water bodies, groundwater pollution is detectable by wells, which cover a limited zone and often respond after a large portion of the aquifer is already contaminated. Furthermore, the process is slow, occurring over periods of tens of years, and cleaning by natural attenuation or remediation, whenever possible, is also slow. Under these circumstances, mathematical transport models, which may help in analyzing field findings on the one hand and in predicting future solute spreading on the other, are of crucial importance.

There are various modes of quantification of transport. In this study we focus on characterizing the longitudinal spreading of solute plumes by the BTC (breakthrough curve) M(x, t) at vertical control planes located at longitudinal distance x = const, normal to the mean flow direction. An alternative and related measure is the longitudinal mass distribution m(x, t) as a function of distance x, for a given time t. We limit the scope to inert solutes (tracers) and to a constant mean head gradient J (natural gradient flow), a setup of interest for many applications and an essential first step toward the analysis of more complex configurations. For the benefit of the reader not familiar with groundwater transport theory, we recapitulate in the following a few essential developments.

The traditional modeling of transport was based on column laboratory experiments (Bear, 1979) for which the macroscopic (at the pore, Darcy, scale) concentration C(x, t) satisfies the advection dispersion equation (ADE)

$$\frac{\partial C}{\partial t} + U \frac{\partial C}{\partial x} = D\_{dL} \frac{\partial^2 C}{\partial x^2} \tag{1}$$

where U = q/n is the macroscopic flow velocity (q: specific discharge, n: effective porosity) at the Darcy scale and D<sub>dL</sub> is the longitudinal dispersion coefficient. For the large Peclet number Pe<sub>0</sub> = Ud/D<sub>0</sub> (d: pore scale, D<sub>0</sub>: molecular diffusion coefficient) encountered in applications it was found that D<sub>dL</sub> = α<sub>dL</sub>U, where α<sub>dL</sub> is the pore scale dispersivity, of order d (Bear, 1979).
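As a brief illustration of Equation (1), the sketch below evaluates its textbook solution for an instantaneous pulse of unit mass released at x = 0 in an unbounded one-dimensional domain. This idealized initial condition and the parameter values are our assumptions for demonstration; they are not taken from the experiments discussed here.

```python
import math

def ade_pulse(x, t, U, alpha_dL):
    """Concentration of a unit-mass pulse released at x = 0, t = 0.

    Solves the 1D ADE with D_dL = alpha_dL * U: a Gaussian centered at
    x = U * t with spatial variance 2 * D_dL * t.
    """
    D = alpha_dL * U
    spread = 4.0 * D * t
    return math.exp(-(x - U * t) ** 2 / spread) / math.sqrt(math.pi * spread)

# Hypothetical values: velocity 1 m/d, pore-scale dispersivity 1 cm, t = 10 d.
U, alpha_dL, t = 1.0, 0.01, 10.0

# Riemann sum over x in [5, 15] m recovers the injected unit mass.
dx = 0.01
xs = [5.0 + dx * k for k in range(1001)]
mass = sum(ade_pulse(x, t, U, alpha_dL) for x in xs) * dx
peak = ade_pulse(U * t, t, U, alpha_dL)
print(mass, peak)
```

Because the spatial variance grows as 2α<sub>dL</sub>Ut, a dispersivity of pore-scale order produces only centimeter-to-decimeter spreading over such travel distances.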

Field findings (for a recent compendium see Zech et al., 2015) have revealed that the longitudinal α<sub>L</sub> (derived for instance with the aid of the spatial moments of aquifer plumes) is larger than α<sub>dL</sub> by orders of magnitude; α<sub>L</sub> was coined longitudinal macrodispersivity in the literature. The contrast has been attributed to the impact of aquifer heterogeneity, manifesting primarily in the spatial variation of the hydraulic conductivity K(**x**). For illustration we present in **Figure 1** the spatial distribution of Y = ln K in a cross section of the Columbus Air Force Base aquifer, where the MAcro Dispersion Experiment (MADE) took place (Boggs and Rehfeldt, 1990; Boggs et al., 1992). It was obtained by interpolating among the measured values provided by Bohling et al. (2012) at a relatively dense set of points. A few significant features of the aquifer in **Figure 1** are worth mentioning: (i) K varies by orders of magnitude, (ii) the spatial distribution of K is seemingly erratic and difficult to capture by smooth interpolators, (iii) the zones of different K are elongated in the horizontal direction, the aquifer being termed anisotropic at the field scale (notice that for clarity of representation the scale of reduction is smaller in the vertical direction with respect to the horizontal one, such that these zones are more elongated than they appear in the figure).

The above features have a dramatic impact upon the spreading of solute plumes. For illustration we represent in **Figure 2** the concentration spatial distribution obtained by a numerical solution of the advective transport equation (in 2D). The velocity field **V**(**x**) was derived by a numerical solution of the flow equation for a K field statistically similar to that of **Figure 1**, with logconductivity variance σ<sub>Y</sub><sup>2</sup> = 6, under conditions of constant mean head gradient J. The transport equation was solved by using a Smoothed Particle Hydrodynamics (SPH) scheme, which is virtually free of numerical diffusion (Boso et al., 2013), for Pe = I/α<sub>dL</sub> = 1000 (I is the longitudinal integral scale of logconductivity).

A few qualitative features of the distribution of C in **Figure 2** are: (i) for an initial rectangular pulse of constant C = C<sub>0</sub>, the plume becomes highly fragmented with time and spreads progressively; (ii) the plume splits, with quickly advancing "fingers" in zones of high K and practically stagnant solute in regions of low K; (iii) like the K field (see **Figure 1**), the plume is seemingly erratic and defies representation by smooth functions, which makes the identification of point concentration at a given location an elusive goal; (iv) in contrast, global measures like the mass arrival at vertical planes over the entire domain (the BTC M) smooth out the variations, and the extent of spreading can be quantified, for instance, by α<sub>L</sub>; (v) the presence of pore scale dispersion (primarily the vertical one) causes mixing that affects the local C but has a minor effect on M (Fiori and Dagan, 2000); and (vi) space averages like M are the ones of interest in many applications, e.g., those in which the goal is to determine the mass of solute pumped by wells that intercept the plume (Fiori et al., 2016).

This state of affairs motivated the emergence, around 40 years ago (for a review see for instance the books by Dagan, 1989; Gelhar, 1993; Rubin, 2003), of a new discipline, namely stochastic subsurface hydrology. In its frame the hydraulic logconductivity field Y(**x**) is modeled as a stationary random space function whose univariate distribution and two-point covariance are characterized by a few parameters: the geometric mean K<sub>G</sub>, the variance σ<sub>Y</sub><sup>2</sup>, and the horizontal I and vertical I<sub>v</sub> < I integral scales. As a consequence, the steady Eulerian velocity **V**(**x**) is also a random space function, of constant mean **U**(U, 0, 0). The statistics of **V** are obtained by solving the flow equations for conditions of constant mean head gradient **J**(J, 0, 0) and random K(**x**). Similarly, the concentration field C(**x**, t) is random, and so are its global measures like the BTC M(x, t). The latter is obtained by solving the transport equation with advection by the random **V** and dispersion by the local pore scale dispersion tensor. Unfortunately, heterogeneity renders point concentration C(**x**, t) highly uncertain, with a coefficient of variation that is controlled by pore-scale dispersion and reduces slowly with time (Fiori and Dagan, 2000). Uncertainty reduces considerably when global measures such as M(x, t) are used, or when the sampling volume has dimensions comparable with the integral scales of Y, thereby making the ensemble mean ⟨C(**x**, t)⟩ a reliable and robust measure, similarly to M (Bellin et al., 1994; Tonina and Bellin, 2008). If the point concentration is of interest, as in risk assessment for example, uncertainty can be reduced by focusing on the probability that a given concentration threshold is exceeded, irrespective of the position where this occurs, rather than on a given fixed location where the ensemble mean concentration provides an unreliable estimate (Bellin and Tonina, 2007). Another option is conditioning on available data (e.g., Y, head, concentration, etc.), for which, however, several conditioning points are needed in order to significantly reduce uncertainty. In any case, as previously stated, our study focuses on the BTC M, which is relevant to many applications and is more robust and less prone to uncertainty than point concentration.

A commonly adopted assumption is that transport is ergodic, in the sense that M (or similar global attributes) in a realization can be exchanged with the ensemble mean ⟨M⟩. This is a basic tenet in many branches of physics and engineering, e.g., molecular diffusion driven by Brownian motion or effective properties of composite materials. It is justified by the large contrast between the microscopic length scales and the macroscopic ones of interest in applications. For groundwater transport, ergodicity implies that the solute plume samples an aquifer volume sufficiently large compared to the integral scales so as to encounter zones of various K, representative of the entire population. Since typically I and I<sub>v</sub> are of the order of meters and solute plumes of tens of meters, the contrast is not so large; ergodicity may not be obeyed, and the prediction of M by models is subject to uncertainty. This feature differentiates transport by groundwater from the traditional pursuit of the effective properties solely, prevalent in the literature on heterogeneous media.

Similarly, it is common to assume in applications that the various parameters and variables like J, U, K<sub>G</sub>, σ<sub>Y</sub><sup>2</sup>, I are known. In reality, in the field they are only estimated and are subject to uncertainty as well, which adds to the uncertainty of M.

The aim of the present study is to provide a discussion of uncertainty in modeling transport in three-dimensional heterogeneous aquifers, with application to the MADE transport experiment (Zheng et al., 2011) as a platform for discussion; we summarize what we have learned in the last two decades or so, in view of applications, with a particular focus on uncertainty due to plume sampling (i.e., non-ergodicity) and incomplete knowledge of parameters that are both important for MADE. A novel analytical formulation is also proposed for assessing uncertainty due to lack of ergodicity by the plume. It is emphasized that this is not a review paper and we build primarily on our developments in modeling flow and transport in three-dimensional heterogeneous formations.

The plan of the paper is as follows: section 2 provides an overview of concepts development and paper aims, recapitulating some of our recent developments in transport of ergodic plumes; section 3 addresses the modeling of uncertainty in the prediction of the BTC, the main topic of the paper; section 4 presents the application of the uncertainty analysis to the MADE-1 experiment, relying on the latest published data; finally, section 5 summarizes and concludes the study.

# 2. BACKGROUND AND MATHEMATICAL PRELIMINARIES

#### 2.1. The K Structure

As already mentioned above, we limit the study to stationary random Y(**x**). For the sedimentary formations of concern here, the histogram of Y = ln K was found to fit a normal univariate distribution f(Y), of mean ⟨Y⟩ = ln K<sub>G</sub> and variance σ<sub>Y</sub><sup>2</sup> (see for instance the analysis of MADE data by Fiori et al., 2015; Bohling et al., 2016), which we adopt here. At the lowest order, the spatial structure is captured by the two-point covariance C<sub>Y</sub>(**x**<sub>1</sub>, **x**<sub>2</sub>) = σ<sub>Y</sub><sup>2</sup> ρ(**r**), where ρ is the autocorrelation and **r** = **x**<sub>1</sub> − **x**<sub>2</sub> is the separation vector between the two points for which the covariance is computed. In turn, the assumed axisymmetric ρ is characterized by the finite horizontal I and vertical I<sub>v</sub> integral scales. The Y field is often assumed to be multivariate normal (multi-Gaussian), and then the structure is completely characterized by ⟨Y⟩ and C<sub>Y</sub>. This is a very convenient representation which is commonly adopted in numerical simulations or analytical approximations. For other types of structures, higher multipoint correlations are required for a complete statistical characterization. Thus, in our past numerical simulations of 3D flow and transport (for a summary see our recent paper, Jankovic et al., 2017) we generated a few fields which share the same f(Y), I, and I<sub>v</sub> but differ at higher order, as manifested for instance by the spatial connectivity of zones defined by classes of K. Besides the multivariate normal (multi-Gaussian) field, we considered two Y fields devised by Zinn and Harvey (2003), obtained by transformations which lead to more connected or disconnected zones of large conductivity. In addition, we investigated extensively flow and transport in a structure we termed MIM (Multi-Indicator Model): rectangular blocks which tessellate the space, or spheroidal inclusions, of dimensions 2I in the horizontal and 2I<sub>v</sub> in the vertical directions and of independent K = exp(Y). Unlike the three previous ones, the connectivity of different classes of K values is the same.
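To make the role of the structural parameters concrete, the following minimal sketch evaluates C<sub>Y</sub>(**r**) = σ<sub>Y</sub><sup>2</sup>ρ(**r**) for one possible axisymmetric autocorrelation. The exponential form of ρ, the function name `log_k_covariance`, and the numerical values in the usage lines are assumptions made for illustration only; the text does not prescribe a particular ρ.

```python
import math


def log_k_covariance(r, sigma2_y, I_h, I_v):
    """Two-point covariance C_Y(r) = sigma2_y * rho(r).

    ASSUMPTION: rho is an axisymmetric exponential autocorrelation with
    horizontal integral scale I_h and vertical integral scale I_v.
    r is the separation vector (rx, ry, rz).
    """
    rx, ry, rz = r
    # Scale horizontal lags by I_h and the vertical lag by I_v.
    scaled_lag = math.sqrt((rx**2 + ry**2) / I_h**2 + rz**2 / I_v**2)
    return sigma2_y * math.exp(-scaled_lag)


# Hypothetical values: variance 6, I = 10 m, I_v = 1 m.
c_zero = log_k_covariance((0.0, 0.0, 0.0), 6.0, 10.0, 1.0)        # equals the variance
c_one_scale = log_k_covariance((10.0, 0.0, 0.0), 6.0, 10.0, 1.0)  # variance * exp(-1)
```

With this choice, a horizontal lag of one integral scale and a vertical lag of one I<sub>v</sub> give the same correlation, which is how the anisotropy e = I<sub>v</sub>/I enters.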

It is worthwhile to note that even for thoroughly monitored aquifers like that of MADE, field data do not generally allow for determining statistical parameters beyond those of f(Y) and ρ, i.e., K<sub>G</sub>, σ<sub>Y</sub><sup>2</sup>, I, and I<sub>v</sub>, and even those are estimated only within ranges of values.

#### 2.2. Flow

We consider here steady flow governed by

$$\begin{aligned} \mathbf{q} &= -K \,\nabla h \text{ (Darcy's law)},\\ \nabla \cdot \mathbf{q} &= 0 \text{ (continuity)} \rightarrow \nabla \cdot (K \nabla h) = 0 \end{aligned} \tag{2}$$

where h (**x**) is the head. With the assumed constant porosity n (which is much less variable than K), the velocity **V** also satisfies ∇ · **V** =0. The boundary condition of interest here is the one of constant head, such that the head gradient has components **J**(J, 0, 0). Consequently the mean velocity **U**(U, 0, 0) is also constant and its fluctuation **u**(**x**) is stationary. The relationship **U** = Kef**J**/n defines the effective conductivity and its dependence on the structural parameters is the subject of a vast literature (see e.g., Renard and De Marsily, 1997).

Approximate analytical solutions for the statistics of the h and **V** fields, as well as K<sub>ef</sub>, were obtained in the literature (e.g., Dagan, 1989; Rubin, 2003) by a first-order approximation in σ<sub>Y</sub><sup>2</sup>, presumably valid for weak heterogeneity. In contrast, based on numerical simulations with values of the logconductivity variance up to σ<sub>Y</sub><sup>2</sup> = 8, we have recently presented (Zarlenga et al., 2018) the dependence of the K<sub>ef</sub> components on σ<sub>Y</sub><sup>2</sup> for different e = I<sub>v</sub>/I and for the above different structures, as well as by the first-order approximation. A striking result is that the first-order approximation of the horizontal component K<sub>efh</sub> is quite accurate for σ<sub>Y</sub><sup>2</sup> < 2, which reinforces what was already observed for the velocity covariance function and the spatial plume moments in the early works of Bellin et al. (1992) and Salandin and Fiorotto (1998).

#### 2.3. Transport (General)

We consider injection of a solute over an area A<sub>0</sub> in the plane at x = 0. A thin plume with total mass M<sub>0</sub> is modeled as a Dirac pulse of infinitesimal duration δ(t) at x = 0 (extension to arbitrary spatial or temporal initial distributions is straightforward); following the definitions of Kreft and Zuber (1978), injection may be in resident or flux proportional mode. Thus, the initial mass density m<sub>0</sub> = dM<sub>0</sub>/d**b** is constant and equal to M<sub>0</sub>/A<sub>0</sub> for uniform initial resident concentration, while it is given by the variable and random m<sub>0</sub>(**b**) = [V<sub>0</sub>(**b**)/V̄<sub>0</sub>](M<sub>0</sub>/A<sub>0</sub>) for the flux proportional one. Here **b**(b<sub>y</sub>, b<sub>z</sub>) is a coordinate in A<sub>0</sub>, V<sub>0</sub>(**b**) = V<sub>x</sub>(0, b<sub>y</sub>, b<sub>z</sub>), and $\bar{V}_0 = (1/A_0) \int_{A_0} V_0(\mathbf{b})\, d\mathbf{b}$ is the mean velocity over A<sub>0</sub>, which for sufficiently large A<sub>0</sub> is equal to the mean velocity U.

We adopt the flux proportional injection condition, which applies to many cases, e.g., injection by wells, as was the case for field experiments including MADE. It simply states that solutes initially occupy preferential zones of high conductivity. We also adopt detection in the flux proportional mode (Kreft and Zuber, 1978), i.e., the BTC defined by $M(x, t) = M_0 - \int_0^x \iint n\, C(x', y, z, t)\, dx'\, dy\, dz = \int_0^t dt' \iint n\, V_x(x, y, z)\, C(x, y, z, t')\, dy\, dz$, where integration in y, z is over the cross section of the domain. The issue related to injection and detection conditions and their impact on transport was extensively discussed in past work (see e.g., Kreft and Zuber, 1978; Dagan, 2017; Fiori et al., 2017). If M satisfies the ADE (1) with the condition M(0, t) = M<sub>0</sub>H(t) (where H is the Heaviside step function), the solution for the semi-bounded domain is given by the CDF of the Inverse Gaussian (IG) distribution (Kreft and Zuber, 1978):

$$\frac{M(x,t)}{M_0} = \frac{1}{2} \left\{ \text{erfc} \left[ \frac{x - Ut}{2(D_L t)^{1/2}} \right] + \exp\left(\frac{Ux}{D_L}\right) \text{erfc} \left[ \frac{x + Ut}{2(D_L t)^{1/2}} \right] \right\} \tag{3}$$

with D<sub>L</sub> the longitudinal macrodispersion coefficient, whereas the relative mass flux is given by the Inverse Gaussian (hereinafter IG) distribution

$$\frac{\mu(x,t)}{M_0} = \frac{1}{M_0} \frac{\partial M}{\partial t} = \frac{x}{2(\pi D_L t^3)^{1/2}} \exp\left[-\frac{(x - Ut)^2}{4D_L t}\right] \tag{4}$$

In the frame of random walk transport theory, the Inverse Gaussian distribution pertains to a first arrival process (see, e.g., Redner, 2001), i.e., the detection plane at x serves as an absorbing boundary. Note that IG is a special case of the more general Tempered One-Sided Stable distribution (TOSS) with exponent 1/2 (Cvetkovic, 2011).
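Equations (3) and (4) are straightforward to evaluate numerically. The sketch below is one possible implementation that uses only the Python standard library. The function names `ig_cdf` and `ig_pdf` and the switch to an asymptotic expansion of erfc for large arguments (introduced here to avoid overflow of exp(Ux/D<sub>L</sub>) at large Peclet numbers) are choices made for this illustration, not part of the original formulation.

```python
import math


def ig_cdf(x, t, U, D_L):
    """Relative cumulative mass M(x, t)/M0 of Equation (3)."""
    if t <= 0.0:
        return 0.0
    s = 2.0 * math.sqrt(D_L * t)
    a = (x - U * t) / s
    b = (x + U * t) / s
    first = math.erfc(a)
    if b < 25.0:
        # Safe: U*x/D_L = b**2 - a**2 < 625, so exp() cannot overflow.
        second = math.exp(U * x / D_L) * math.erfc(b)
    else:
        # ASSUMPTION (numerical device): leading terms of the large-b
        # expansion of erfc, combined with exp(U*x/D_L - b**2) = exp(-a**2).
        second = math.exp(-a * a) / (b * math.sqrt(math.pi)) * (1.0 - 0.5 / (b * b))
    return 0.5 * (first + second)


def ig_pdf(x, t, U, D_L):
    """Relative mass flux mu(x, t)/M0 of Equation (4), exponent negative."""
    if t <= 0.0:
        return 0.0
    return (x / (2.0 * math.sqrt(math.pi * D_L * t**3))
            * math.exp(-((x - U * t) ** 2) / (4.0 * D_L * t)))


# Hypothetical inputs in units of I and I/U: x = 20, U = 1, D_L = 0.5.
fraction_arrived = ig_cdf(20.0, 20.0, 1.0, 0.5)
```

A quick consistency check is that a finite-difference derivative of `ig_cdf` with respect to t reproduces `ig_pdf`, which holds only when the exponent in (4) carries the minus sign.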

We proceed now with reviewing the results we obtained recently for ergodic transport in heterogeneous aquifers.

#### 2.4. Transport (Summary of Results for Ergodic Plumes)

The starting point for our recent developments is the set of systematic, accurate numerical simulations of flow and transport in 3D (see Jankovic et al., 2017, and references therein); they are recapitulated in the Appendix of Jankovic et al. (2017) and only briefly here. The K field was generated for a lognormal univariate distribution and two-point covariances C<sub>Y</sub> of integral scales I and I<sub>v</sub>, with different values of the anisotropy coefficient e = I<sub>v</sub>/I. The complete characterization of the structures was achieved by a variety of different models: multi-Gaussian, the previously mentioned connected and disconnected fields, spheroidal inclusions, and rectangular blocks tessellating the space (MIM). The BTC was finally calculated by large-scale numerical simulations, for a variety of parameters (logconductivity variance, Peclet number, control plane distances, etc.).

A striking result (Jankovic et al., 2017, **Figure 3**) was that the bulk of M (say M/M<sub>0</sub> < 0.95) did not differ significantly among the various structures, indicating that M is indeed a very robust measure. Furthermore, the simple model (3), with the macrodispersivity given by the well-known first-order approximation D<sub>L</sub> = α<sub>L</sub>U, α<sub>L</sub> = σ<sub>Y</sub><sup>2</sup>I, agreed well with the bulk of the BTC derived numerically (Fiori et al., 2017), while it underestimated the late arrival time of the tail, which comprises a few percent of M. However, the tail prediction is in any case quite imprecise. In particular, it was found that the IG model behaves similarly to the MIMSCA (Multi Indicator Model Self Consistent Approximation) that we developed over the last 20 years (e.g., Dagan et al., 2003; Fiori et al., 2007), which is more accurate for the prediction of late mass arrival.

FIGURE 3 | Comparison of the proposed solution (11) with the results of Cvetkovic et al. (1992) (Figure 5C), for σ<sub>Y</sub><sup>2</sup> = 0.5, *x*/*I* = 20 and a few values of the size of the initial plume (*L<sub>y</sub>*/*I* = *L<sub>z</sub>*/*I* = H). The figure displays the cumulative mass *M* and the bands *M* ± σ<sub>*M*</sub> predicted by the present approach (blue lines) over the original Figure 5C of Cvetkovic et al. (1992).

Thus, the IG model (3) is quite effective in capturing the behavior of the bulk of the BTC for a wide range of flow and transport parameters; the model depends on a few parameters characterizing the permeability structure (σ<sub>Y</sub><sup>2</sup>, I) and the flow (J, K<sub>ef</sub>). It is emphasized that the first-order approximation was applied to deriving the longitudinal macrodispersivity α<sub>L</sub>, while M(x, t) (3) itself depends non-linearly on σ<sub>Y</sub><sup>2</sup> and is different from the Gaussian distribution; the two coalesce for small σ<sub>Y</sub><sup>2</sup> or large tU/I.

Summarizing, the simple formula (3), with D<sub>L</sub> given by the first-order approximation, is a very robust model that can be safely used in applications, e.g., as a screening tool for a preliminary assessment of the BTC. We note that a similar, simple approach, although with parameters based on numerical simulations, was proposed by Hansen et al. (2018).
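As a rough worked example of what the screening model implies (a sketch based on the standard moments of the Inverse Gaussian distribution, not a result quoted from the simulations), the mean and variance of the travel time associated with (4), for D<sub>L</sub> = α<sub>L</sub>U and α<sub>L</sub> = σ<sub>Y</sub><sup>2</sup>I, are

$$\langle \tau \rangle = \frac{x}{U}, \qquad \sigma_\tau^2 = \frac{2 D_L\, x}{U^3}, \qquad \mathrm{CV}_\tau^2 = \frac{\sigma_\tau^2}{\langle \tau \rangle^2} = \frac{2 \alpha_L}{x} = \frac{2 \sigma_Y^2 I}{x}$$

For the illustrative values σ<sub>Y</sub><sup>2</sup> = 0.5 and x/I = 20, this gives CV<sub>τ</sub><sup>2</sup> = 0.05, i.e., a coefficient of variation of about 0.22; the relative spread of arrival times therefore grows with heterogeneity and decays with travel distance as x<sup>−1/2</sup>.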

We move now to the central topic of the present study, the uncertainty of prediction of the BTC.

# 3. UNCERTAINTY OF BTC PREDICTIVE MODELING

#### 3.1. A Few Sources of Uncertainty

Transport predictions by the above modeling approach are prone to several sources of uncertainty, the major ones being:


Cvetkovic, 1998; Attinger et al., 1999). As a consequence, the quantities of interest, including M, are random, and uncertainty due to non-ergodic behavior emerges. As was found in the past and is confirmed by the developments of the following section, uncertainty for small plumes can be quite large. In contrast, the uncertainty related to the size of the sampling volume (Fiori et al., 2002; Bellin and Tonina, 2007; Severino et al., 2010) is not relevant for the transport scenario investigated here, in which solute is detected at a large control plane at distance x from the source.


According to the above discussion, the major sources of uncertainty for transport in mean uniform flow are likely (ii) and (iv), i.e., possible non-ergodicity of the plume and parametric uncertainty; this is particularly true for the MADE experiment, as shown in the sequel. Thus, in the following we shall focus on the uncertainty quantification originating from (ii) and (iv). Furthermore, we shall apply the uncertainty analysis to MADE, which is a well-known benchmark and very useful for a thorough discussion of uncertainty in applications.

#### 3.2. Quantifying Uncertainty Due to Non-ergodic Effect

#### 3.2.1. General

We follow here the theoretical framework and the notations of Cvetkovic et al. (1992) and Dagan et al. (1992). Focusing on the BTC M, we may write for flux proportional injection over an area A<sub>0</sub>

$$\frac{M(t; x)}{M_0} = \frac{1}{A_0} \int_{A_0} \frac{V_0(\mathbf{b})}{\bar{V}_0} H[t - \tau(x, \mathbf{b})]\, d\mathbf{b} \tag{5}$$

where we recall that V<sub>0</sub> is the local velocity at the location **b** within A<sub>0</sub> and $\bar{V}_0 = (1/A_0) \int_{A_0} V_0(\mathbf{b})\, d\mathbf{b} \approx U$. In (5), H is the Heaviside step function and τ(x, **b**) is the random travel time to the control plane at x of a fluid particle injected at x = 0 within A<sub>0</sub>. The travel time is related to the random velocity field by $d\tau/dx = V_x[x, \eta(x), \zeta(x)]^{-1}$, where y = η(x, **b**), z = ζ(x, **b**) are the equations of the streamline originating at **b** within the plane A<sub>0</sub> at x = 0. In words, (V<sub>0</sub>(**b**)/V̄<sub>0</sub>) H[t − τ(x, **b**)] (d**b**/A<sub>0</sub>) in Equation (5) is the contribution of the particle originating at **b** to the mass that has crossed the control section at x by time t; it is nonzero only for t ≥ τ.
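Equation (5) has a simple discrete counterpart, sketched below under the assumption that the injection area is divided into streamtubes of equal area, each with a known injection velocity and travel time. The function name `cumulative_mass_fraction` and the sample values are hypothetical and serve only to illustrate the flux weighting.

```python
def cumulative_mass_fraction(t, v0, tau):
    """Discrete form of Equation (5): M(t; x)/M0.

    ASSUMPTION: the injection area A0 is split into equal-area streamtubes;
    v0[i] is the injection velocity V0 and tau[i] the travel time to the
    control plane for streamtube i.
    """
    n_tubes = len(v0)
    v_mean = sum(v0) / n_tubes  # discrete estimate of the mean velocity over A0
    # Heaviside step: a streamtube contributes once t has reached its travel time.
    arrived = sum(v for v, travel_time in zip(v0, tau) if t >= travel_time)
    return arrived / (v_mean * n_tubes)


# Two hypothetical streamtubes: the fast one carries 3/4 of the injected mass.
fraction = cumulative_mass_fraction(5.0, [1.0, 3.0], [10.0, 2.0])
```

Because of the weight V<sub>0</sub>/V̄<sub>0</sub>, fast streamtubes both arrive earlier and carry more mass, which is why the flux proportional BTC rises faster at early times than its resident counterpart would.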

The expected mass ⟨M(t; x)⟩ is easily calculated from (5), yielding

$$\frac{\langle M(t; x) \rangle}{M_0} = \frac{1}{A_0} \left\langle \int_{A_0} \frac{V_0(\mathbf{b})}{\bar{V}_0} H[t-\tau(x,\mathbf{b})]\, d\mathbf{b} \right\rangle = G_1(t; x) \tag{6}$$

where $G_1 = \int_0^t f(\tau)\, d\tau$ is the cumulative travel time distribution, in which τ is weighted by the injection velocity V<sub>0</sub> (Cvetkovic and Dagan, 1994). Here $f(\tau) = \int (V_0)\, f(\tau, V_0)\, d(V_0)$ is the marginal pdf of τ, with f(τ, V<sub>0</sub>) being the joint pdf of τ and V<sub>0</sub>.

Formula (6) is the well-known result (Shapiro and Cvetkovic, 1988) that the mean mass arrival is the CDF of travel time. Furthermore, in view of the findings of section 2.3, it is seen that G<sub>1</sub>(τ) is the CDF of the Inverse Gaussian distribution, Equation (3), and the pdf g<sub>1</sub> = dG<sub>1</sub>/dτ is the Inverse Gaussian (4).

After recapitulating these preparatory steps we move now, along Dagan et al. (1992), to the derivation of the variance of M

$$\frac{\sigma_M^2(t; x)}{M_0^2} = \left\langle \frac{M^2(t; x)}{M_0^2} \right\rangle - \frac{\langle M(t; x) \rangle^2}{M_0^2} \tag{7}$$

Considering the expression (5) for the mass M we may write

$$\left\langle \frac{M^2(t; x)}{M_0^2} \right\rangle = \frac{1}{A_0^2} \left\langle \int_{A_0} \int_{A_0} \frac{V_0(\mathbf{b})\, V_0(\mathbf{b}')}{\bar{V}_0^2}\, H[t - \tau(x, \mathbf{b})]\, H[t - \tau(x, \mathbf{b}')]\, d\mathbf{b}\, d\mathbf{b}' \right\rangle \tag{8}$$

Thus, by taking advantage of the linearity of the ensemble mean operator and considering Equation (6) we arrive at

$$\frac{\sigma_M^2(t; x)}{M_0^2} = \frac{1}{A_0^2} \int_{A_0} \int_{A_0} G_2\left(t; x, \mathbf{b} - \mathbf{b}'\right) d\mathbf{b}\, d\mathbf{b}' - G_1^2\left(t; x\right) \tag{9}$$

where G<sub>2</sub> is the bivariate travel time CDF, given by $G_2(t; x, \mathbf{b} - \mathbf{b}') = \int_0^t \int_0^t g_2(\tau, \tau'; x, \mathbf{b} - \mathbf{b}')\, d\tau\, d\tau'$, with $g_2(\tau, \tau'; x, \mathbf{b} - \mathbf{b}')$ being the marginal joint pdf of the travel times τ, τ′ of two particles injected at **b** and **b**′, respectively.

The general result (9) of Dagan et al. (1992) served Cvetkovic et al. (1992) to compute σ<sub>M</sub><sup>2</sup>(x, t) by adopting a few assumptions: the bivariate g<sub>2</sub> is lognormal, the travel time moments are derived from the velocity field by a first-order approximation in σ<sub>Y</sub><sup>2</sup>, and A<sub>0</sub> is a square. Two quadratures, carried out numerically, were needed to complete the derivation.

The above approach can be generalized to compute the covariance of M (CM) at two different times t1, t2, leading to

$$\frac{C_M\left(t_1, t_2; x\right)}{M_0^2} = \frac{1}{A_0^2} \int_{A_0} \int_{A_0} G_2\left(t_1, t_2; x, \mathbf{b} - \mathbf{b}'\right) d\mathbf{b}\, d\mathbf{b}' - G_1(t_1; x)\, G_1(t_2; x) \tag{10}$$

with $G_2(t_1, t_2; x, \mathbf{b} - \mathbf{b}') = \int_0^{t_1} \int_0^{t_2} g_2(\tau, \tau'; x, \mathbf{b} - \mathbf{b}')\, d\tau\, d\tau'$.

#### 3.2.2. Simplified Derivation of σ<sub>M</sub><sup>2</sup>(x, t) and Comparison With Numerical Simulations

The derivation of the two-particle covariance needed in (9) is complex and requires additional information, e.g., the shape of the two-particle covariance and its moments. Also, the calculation of σ<sub>M</sub><sup>2</sup> along (9) requires a few numerical quadratures, as done by Cvetkovic et al. (1992). We simplify the calculations by using the basic properties of the MIMSCA model which, as mentioned above, led to very good agreement with the numerical solution of ⟨M⟩.

Consider the covering of the input area A<sub>0</sub> by rectangles of sides 2I and 2I<sub>v</sub> in the y and z directions, respectively. Following the MIMSCA model, the travel time of particles originating within the same areal element in A<sub>0</sub> is the same, whereas travel times are statistically independent for particles originating from different elements. This is the only property we use in the present derivation.

With the above assumption, the calculation of (9) can be considerably simplified. The detailed derivations are given in **Appendix A**, leading to the final result (A.8), that is reproduced here

$$
\sigma_M^2 = M_0^2 \, \omega\left(\mathbf{L}\right) G_1\left(t; x\right) \left[1 - G_1\left(t; x\right)\right] \tag{11}
$$

where the weight function ω is given by

$$
\omega(\mathbf{L}) = \Omega \left( \frac{L_y}{I} \right) \Omega \left( \frac{L_z}{I_v} \right) \tag{12}
$$

with L<sub>y</sub>, L<sub>z</sub> the sides of the rectangular injection area, i.e., A<sub>0</sub> = L<sub>y</sub>L<sub>z</sub>, and

$$\Omega\left(\ell\right) = \begin{cases} 1 - \dfrac{\ell}{6} & \text{for } \ell < 2 \\ \dfrac{1}{\ell} \left( 2 - \dfrac{4}{3\ell} \right) & \text{for } \ell > 2 \end{cases} \tag{13}$$

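In particular, ω ≃ 1 for A<sub>0</sub>/(II<sub>v</sub>) ≪ 1 (maximal uncertainty for a small source) and ω ≃ 4II<sub>v</sub>/A<sub>0</sub>, i.e., ω ∝ II<sub>v</sub>/A<sub>0</sub>, for A<sub>0</sub>/(II<sub>v</sub>) ≫ 1 (ergodic, practically deterministic).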
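Equations (11) to (13) can be evaluated in a few lines. The sketch below is a minimal illustration; the function names are hypothetical, and the first branch of Ω is taken as 1 − ℓ/6, the form that makes Ω continuous at ℓ = 2 (both branches then give 2/3 there).

```python
import math


def big_omega(ell):
    """One-dimensional weight of Equation (13); ell is L_y/I or L_z/I_v.

    ASSUMPTION: the first branch is 1 - ell/6, continuous with the
    second branch at ell = 2.
    """
    if ell < 2.0:
        return 1.0 - ell / 6.0
    return (2.0 - 4.0 / (3.0 * ell)) / ell


def weight(L_y, L_z, I, I_v):
    """Weight function omega of Equation (12) for a rectangular source."""
    return big_omega(L_y / I) * big_omega(L_z / I_v)


def sd_mass_fraction(G1, omega):
    """sigma_M / M0 from Equation (11), with G1 the mean relative BTC."""
    return math.sqrt(omega * G1 * (1.0 - G1))


# For omega = 0.148 the standard deviation at G1 = 1/2 is about 0.19.
sd_at_median = sd_mass_fraction(0.5, 0.148)
```

In practice `G1` would come from Equation (3), so the whole uncertainty band ⟨M⟩ ± σ<sub>M</sub> follows from the same few parameters as the mean BTC plus the source dimensions.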

Thus, σ<sub>M</sub><sup>2</sup> is given by an analytical expression, supposed to apply to highly heterogeneous formations, which separates the effect of spreading, represented by the IG CDF G<sub>1</sub> (3) with D<sub>L</sub> = σ<sub>Y</sub><sup>2</sup>IU, on the one hand, from the weight function ω accounting for the size of the injection area, on the other hand.

As a first test of (11) we compare it in **Figure 3** with the results of Cvetkovic et al. (1992) described above (σ<sub>Y</sub><sup>2</sup> = 0.5, x/I = 20, L<sub>y</sub>/I = L<sub>z</sub>/I = H), and it is seen that the agreement is very good in spite of the different methodologies. A more stringent test is carried out by comparison with the numerical simulations of Jankovic et al. (2017), which were carried out for single realizations with the large source H = L<sub>y</sub>/I = L<sub>z</sub>/I = 90. This was achieved by dividing A<sub>0</sub> into subdomains of H ≃ 2, 5, 10, 15, 30, regarded as independent realizations, a proxy for Monte Carlo simulations. The results are displayed in **Figure 4**, where both the BTC and its standard deviation (SD) for the multivariate normal field at two control plane distances (6I and 18I) are represented. The SD predictions by the simplified model (11) are also displayed (dashed lines). It is seen that, in spite of the limited number of realizations for some cases (9 for L<sub>y</sub> = L<sub>z</sub> = 30I), the agreement is quite good, even for the largest σ<sub>Y</sub><sup>2</sup> = 8. The behavior is very similar for the other K structures examined (e.g., connected/disconnected and blocks, as described in section 2.4; not shown in the figure). Thus, the simple model proposed here can be an effective tool for the prediction of the BTC uncertainty due to the non-ergodic effect (i.e., the finite size of the plume compared to the heterogeneity length scales).

FIGURE 4 | Comparison of solution (11) with the numerical simulations of Jankovic et al. (2017), which were carried out for single realizations and a large source H = *L<sub>y</sub>*/*I* = *L<sub>z</sub>*/*I* = 90. The comparison is achieved by dividing the initial plume into subdomains of H = 2, 5, 10, 15, 30 regarded as independent realizations, a proxy for Monte Carlo simulations. The BTC *M* and its standard deviation (SD) for the multivariate normal field at two control plane distances (6*I* and 18*I*) and two degrees of heterogeneity (σ<sub>Y</sub><sup>2</sup> = 2 and σ<sub>Y</sub><sup>2</sup> = 8) are represented. The SD predictions by the simplified model (11) are represented by the dashed lines.

It is worthwhile noting that for a small plume (ω close to unity) uncertainty affects its time of arrival at the control plane (de Barros et al., 2011; de Barros, 2018). In contrast, for a large plume the practically deterministic prediction reflects the spreading of the BTC. For intermediate cases the two effects are combined, and they are incorporated in the simple function ω (12).
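The geometric content of the weight function can be checked with a small Monte Carlo experiment. The sketch below is a one-dimensional toy version of the block assumption, not the derivation of Appendix A: a line source of length ℓ (in units of I) is covered by blocks of length 2 with a random offset, each block has "arrived" with probability G<sub>1</sub> independently of the others, and the flux weighting is ignored. All names and settings are hypothetical. Under these assumptions the variance of the arrived fraction should approach Ω(ℓ)G<sub>1</sub>(1 − G<sub>1</sub>).

```python
import random


def big_omega(ell):
    """Equation (13), ASSUMING the first branch is 1 - ell/6."""
    if ell < 2.0:
        return 1.0 - ell / 6.0
    return (2.0 - 4.0 / (3.0 * ell)) / ell


def arrived_fraction_moments(ell, g1, n_real=20000, seed=0):
    """Monte Carlo mean and variance of the arrived fraction of a 1D source.

    ASSUMPTIONS: uniform (not flux) weighting; blocks of length 2 with a
    uniformly random offset; independent arrival indicators per block.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_real):
        left = rng.uniform(0.0, 2.0) - 2.0  # left edge of the block covering b = 0
        arrived = 0.0
        while left < ell:
            overlap = min(left + 2.0, ell) - max(left, 0.0)
            if overlap > 0.0 and rng.random() < g1:
                arrived += overlap
            left += 2.0
        values.append(arrived / ell)
    mean = sum(values) / n_real
    var = sum((v - mean) ** 2 for v in values) / (n_real - 1)
    return mean, var


mean_mc, var_mc = arrived_fraction_moments(5.0, 0.5)
var_model = big_omega(5.0) * 0.5 * (1.0 - 0.5)
```

The agreement between `var_mc` and `var_model` reflects the fact that Ω(ℓ) is the probability that two points drawn at random within the source fall in the same block.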

#### 3.3. Impact of Parametric Uncertainty

Even under the basic assumptions of stationarity, steadiness, and mean flow uniformity, the mean BTC (Equation 3 with D<sub>L</sub> = σ<sub>Y</sub><sup>2</sup>IU) depends on a few parameters. Thus, U = JK<sub>ef</sub>/n depends, besides the mean gradient J, on K<sub>G</sub> and σ<sub>Y</sub><sup>2</sup>, since K<sub>ef</sub>/K<sub>G</sub> depends on σ<sub>Y</sub><sup>2</sup> as well as on the structure (Renard and De Marsily, 1997). Of course, knowledge of these parameters is not needed if K<sub>ef</sub> is determined directly, for instance by pumping tests, and/or U by flowmeters. In all cases these parameters are affected by uncertainty due to measurement errors, insufficient data, etc. The same is true for the parameters influencing the mean BTC besides U, namely σ<sub>Y</sub><sup>2</sup> and I. Finally, the uncertainty quantified by σ<sub>M</sub><sup>2</sup> (11) depends on the additional parameter A<sub>0</sub>, reflecting the initial size of the plume.

The uncertainty of these parameters impacts that of M in addition to the non-ergodic effect. However, the magnitude depends on the availability of data and their precision, which is aquifer specific. In the following section we shall examine the impact of parametric uncertainty for the Columbus Air Force Base aquifer where the MADE experiment took place and for which a relatively large amount of data is available.

Nevertheless, we may make a few statements on the relative impact of various parameters based on the numerical simulations and theoretical developments. Thus, as mentioned above, Fiori et al. (2017) and Jankovic et al. (2017) have already found that ⟨M⟩ is quite insensitive to the structure (as characterized by connectivity), whereas U, and to a lesser extent the heterogeneity σ<sub>Y</sub><sup>2</sup>, have a larger impact. Rather than a general discussion, we defer the analysis to the MADE case in the following.

# 4. ANALYSIS OF THE MASS DISTRIBUTION AT THE MADE-1 EXPERIMENT

The first experiment conducted at the Columbus Air Force Base (MADE-1) represents the ideal platform for discussing the above issues regarding uncertainty of solute transport predictions in aquifers. In terms of the quantity and quality of investigations, regarding both aquifer characterization and plume monitoring, the MADE-1 experiment represents a benchmark for analyzing groundwater transport; it has motivated a large body of research work, from the testing of innovative measuring techniques to the development of novel theoretical frameworks. For such reasons, after more than 30 years, MADE is still providing insight and topics of discussion in the scientific community, as witnessed for instance by the recent 2015 AGU Chapman conference (Gómez-Hernández et al., 2016). Before discussing uncertainty, along the previous lines, we briefly recapitulate in the following the main features of the MADE-1 experiment, as well as the flow and transport parameters that shall be used in the present work, together with their uncertainty measures. More details can be found in the original papers (Boggs and Rehfeldt, 1990; Boggs et al., 1992) and the review (Zheng et al., 2011).

The experiment took place in a highly heterogeneous sedimentary aquifer at Columbus, Mississippi (USA). A plume was injected in a relatively small area of the domain, and the plume movement, along the natural gradient, was continuously monitored for about 2 years by a dense network of multilevel samplers. The relevant transport quantity that was analyzed at the MADE-1 experiment is the longitudinal mass distribution m(x; t), first analyzed and presented by Adams and Gelhar (1992). Six snapshots were analyzed, at t = 49, 126, 202, 279, 370, 503 days since injection. The mass distribution was derived by calculating the solute mass (after interpolation of concentration measurements) within a moving window of 10 m length, at spatial intervals of 5 m. The striking feature of m is its skewed shape, very different from the presumed symmetrical Gaussian behavior; the skewness was mainly caused by the highly heterogeneous velocity field induced by the complex aquifer system, and it subsequently motivated a flurry of theoretical developments to explain it. Despite the dense grid of samplers, mass recovery was incomplete, except for the snapshot at t = 126 d, and it decreased continuously afterwards, down to 43% in the last snapshot at t = 503 d [the topic is discussed by Fiori (2014)].

*Data are taken from Bohling et al. (2016) (\*) and Boggs et al. (1992) (\*\*); in the case of the data from Bohling et al. (2016), the SD was calculated from their 95% confidence intervals after assuming a normal distribution for all parameters except K<sub>G</sub>, for which lognormality was assumed. The SD for the mean velocity was estimated from the Darcy formula U = K<sub>ef</sub>J/n, assuming K<sub>ef</sub>/K<sub>G</sub> and J as deterministic.*

In the following, the longitudinal mass distribution at MADE is modeled with the aid of the model (3). Following Adams and Gelhar (1992), the mass distribution is calculated within a moving window of Δ = 10 m, i.e., m(x; t) = [M(x + Δ/2; t) − M(x − Δ/2; t)]/Δ, with space intervals of 5 m (x = 0, 5, 10, 15, ...). The parameters to be used in the model were inferred from different studies and are presented in **Table 1**; the standard deviation (SD) is also reproduced, when available. Of particular relevance is the analysis by Bohling et al. (2016) of the K values based on DPIL measurements, which superseded a previous analysis by the same authors (Bohling et al., 2012). This study analyzed the conductivity field at an unprecedented detail and resolution, and constitutes the best available conductivity analysis of MADE data so far.

The mean velocity is calculated by U = K<sub>G</sub>Jε/n, with K<sub>G</sub>, J, n, ε the geometric mean of K, the mean hydraulic gradient, the mean porosity and the effective conductivity ratio ε = K<sub>ef</sub>/K<sub>G</sub>, respectively. Unfortunately, ε cannot be measured and it is variable, as a function of the particular conductivity structure at hand (the matter is discussed in Zarlenga et al., 2018). An estimate of ε was provided by the formula (5) derived by Zarlenga et al. (2018) based on extensive 3D numerical simulations, obtaining ε = 3.93; this results in the estimated U = 0.026 m/d. The SD of U is calculated by a perturbation approach over the parameters K<sub>G</sub>, n, hence assuming J and ε as deterministic; as a consequence, the standard deviation of U appearing in **Table 1** is likely underestimated.
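The perturbation step can be sketched as follows. This is an illustration under assumptions not stated in the text, namely that K<sub>G</sub> and n are uncorrelated and that their coefficients of variation are small. Since ∂U/∂K<sub>G</sub> = U/K<sub>G</sub> and ∂U/∂n = −U/n, a first-order expansion gives

$$\frac{\sigma_U^2}{U^2} \simeq \left(\frac{\sigma_{K_G}}{K_G}\right)^2 + \left(\frac{\sigma_n}{n}\right)^2$$

Under the same assumptions, treating J and ε as uncertain would add their squared coefficients of variation to the right-hand side, which is consistent with the remark that the tabulated SD of U is likely a lower bound.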

#### 4.1. Prediction of Mass Distribution and Uncertainty Due to Non-ergodic Effects

Before embarking on the analysis of the spatial mass distribution (the snapshots), it is worthwhile to estimate the non-ergodic effect on the uncertainty of the BTC M(x, t), although the latter was not determined experimentally. According to Equation (11) the maximal value of σ<sub>M</sub>/M<sub>0</sub> is obtained by differentiating with respect to t and is reached for G<sub>1</sub> = 1/2. This leads to (σ<sub>M</sub>/M<sub>0</sub>)<sub>max</sub> = ω<sup>1/2</sup>/2. We show in the sequel that for the MADE plume initial size the estimate is ω = 0.148, i.e., (σ<sub>M</sub>/M<sub>0</sub>)<sub>max</sub> = 0.19. Thus, the width of the band ⟨M⟩/M<sub>0</sub> ± σ<sub>M</sub>/M<sub>0</sub> reaches its maximum at the time for which ⟨M⟩/M<sub>0</sub> = 1/2 and its size is ±0.19, diminishing to zero for ⟨M⟩/M<sub>0</sub> → 0, 1. However, a direct comparison with the experimental results (the snapshots) needs reformulation in terms of spatial distribution.
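The quoted maximum can be checked numerically. The sketch below assumes that Equation (11), which is not reproduced in this section, has the form σ<sub>M</sub><sup>2</sup> = ω⟨M⟩(M<sub>0</sub> − ⟨M⟩); this form is inferred from the stated maximum and from the analogous expression for σ<sub>m</sub><sup>2</sup>, and the function name is hypothetical.

```python
import math

def relative_sd(g, omega):
    """sigma_M / M0 for a mean relative mass g = <M>/M0.

    Assumes sigma_M^2 = omega * <M> * (M0 - <M>), a form inferred from the
    text rather than copied from Equation (11).
    """
    return math.sqrt(omega * g * (1.0 - g))

omega = 0.148                    # variance reduction factor quoted for MADE
peak = relative_sd(0.5, omega)   # maximum, reached at <M>/M0 = 1/2
print(round(peak, 2))            # prints 0.19
```

The band closes at both ends, since `relative_sd(0.0, omega)` and `relative_sd(1.0, omega)` are both zero.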

Along the lines of section 4, the longitudinal mass distribution at MADE is given by

$$m\left(\mathbf{x};t\right) = \frac{M\left(\mathbf{x} + \Delta/2; t\right) - M\left(\mathbf{x} - \Delta/2; t\right)}{\Delta} \tag{14}$$

where m is the longitudinal mass distribution aggregated over the spatial interval Δ = 10 m.
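A minimal sketch of the moving-window aggregation of Equation (14) is given here. The cumulative profile `M_demo` is a hypothetical linear ramp chosen only to exercise the function (it is not the MADE data), and it is taken as increasing in x so that the difference in Equation (14) is positive.

```python
import numpy as np

def windowed_mass(M, x, delta=10.0):
    """m(x;t) = [M(x + delta/2; t) - M(x - delta/2; t)] / delta, Equation (14).

    M is any callable returning the cumulative mass at position(s) x.
    """
    x = np.asarray(x, dtype=float)
    return (M(x + delta / 2.0) - M(x - delta / 2.0)) / delta

def M_demo(x):
    # Hypothetical cumulative mass: linear ramp from 0 to 1 over 0..100 m.
    return np.clip(np.asarray(x, dtype=float) / 100.0, 0.0, 1.0)

x_nodes = np.arange(0.0, 105.0, 5.0)       # 5 m spacing, as in the text
m_demo = windowed_mass(M_demo, x_nodes)    # 10 m window, 5 m steps
```

Because the 10 m windows overlap at 5 m spacing, multiplying the sum of `m_demo` by the 5 m spacing recovers the total mass of the demo profile.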

The expected value and variance of m for non-ergodic plumes can be derived with the same procedure as in section 3.2.2, which is detailed in **Appendix A**. The detailed derivations are given in **Appendix B**, and we reproduce here the final result

$$\begin{aligned} \frac{\langle m(\mathbf{x};t)\rangle}{M_0} &= \frac{\mathcal{G}_1(\mathbf{x} + \Delta/2; t) - \mathcal{G}_1(\mathbf{x} - \Delta/2; t)}{\Delta} \\ \sigma_m^2(\mathbf{x};t) &= \omega\left(\mathcal{L}\right) \left\langle m(\mathbf{x};t) \right\rangle \left(\frac{M_0}{\Delta} - \left\langle m(\mathbf{x};t) \right\rangle \right) \end{aligned} \tag{15}$$

In (15), G<sub>1</sub> is given by (3), as previously explained, although any alternative model can be used. As discussed in Fiori et al. (2017), the time of the snapshots at the MADE experiment was very small in dimensionless terms, namely tU/I = 0.15 − 1.0, casting doubt on the use of a constant D<sub>L</sub> in (3). Therefore, the pre-asymptotic D<sub>L</sub>, as predicted by the first-order approximation, was employed here; the issue is discussed in Fiori et al. (2017) and it is further elaborated in **Appendix C**, leading to the revised formula (C.3) for the travel time CDF G<sub>1</sub> that shall be used in the present analysis.
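Once the mean ⟨m⟩/M<sub>0</sub> is available from any model of G<sub>1</sub>, the uncertainty band follows directly from the variance expression in (15). The sketch below uses a hypothetical function name and illustrative input values rather than the MADE results.

```python
import numpy as np

def nonergodic_band(m_mean_rel, omega, delta=10.0):
    """Return (lower, upper) = <m>/M0 -/+ sigma_m/M0.

    Uses sigma_m^2 = omega * <m> * (M0/delta - <m>) from Equation (15),
    written for the normalized mean <m>/M0 (units 1/m).
    """
    m = np.asarray(m_mean_rel, dtype=float)
    var = omega * m * (1.0 / delta - m)
    sigma = np.sqrt(np.clip(var, 0.0, None))   # guard against round-off
    return m - sigma, m + sigma

m_rel = np.array([0.0, 0.01, 0.05, 0.1])       # illustrative values only
lower, upper = nonergodic_band(m_rel, omega=0.148)
```

The band vanishes where ⟨m⟩/M<sub>0</sub> equals 0 or 1/Δ and is widest at ⟨m⟩/M<sub>0</sub> = 1/(2Δ), mirroring the behavior of σ<sub>M</sub> for the BTC.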

The formula (15) for σ<sub>m</sub><sup>2</sup> requires the vertical and transverse dimensions of the initial plume, L<sub>y</sub> and L<sub>z</sub>. The distances between the injection wells and the width of the screens in the wells are not reliable estimates of L<sub>y</sub> and L<sub>z</sub>, as the plume underwent a significant expansion in all directions soon after the injection, as visible in the early snapshots. Figure 6a of Adams and Gelhar (1992) shows that after 9 days the vertical size of the plume was around 8 m, much larger than the vertical size of the screen of the wells. Also, Figure 4a from the same paper suggests that, again 9 days after the injection event, the plume was already about 40 m wide. Thus, in the following we assume L<sub>y</sub> = 40 m and L<sub>z</sub> = 8 m as the initial sizes of the plume; such estimates are rather rough and uncertain, but there is no other way to accurately assess them. With those estimates of L<sub>y</sub>, L<sub>z</sub>, and those of I, I<sub>v</sub> of (1), the variance reduction factor due to the finite size of the plume appearing in (15) is ω(**L**) = 0.148 (Equations 12, 13).

**Figure 5** displays the experimental longitudinal mass distribution at the MADE-1 experiment for the six snapshots presented by Adams and Gelhar (1992) (black lines); the blue solid line depicts the theoretical m(x;t) (Equation 15), while the dashed lines represent the bounds m + σ<sub>m</sub> (green) and m − σ<sub>m</sub> (orange). It is seen that the theoretical model captures the experimental mass distribution at MADE quite well; the result is not entirely new, as a similar comparison was made in Fiori et al. (2017), although the updated estimates of Bohling et al. (2016) and the more accurate spatial aggregation over the Δ interval are used here for the first time. The direct comparison between experiment and theory is made difficult by the overestimation of mass in the first snapshot (around 200% of the injected mass was recovered) and the incomplete mass recovery for increasing time. Still, the model captures the peak and its timing quite accurately in most of the snapshots, all the approximations and uncertainties notwithstanding.

It is seen that the bands of uncertainty, described by the bounds m ± σ<sub>m</sub> (dashed lines), are rather wide for all snapshots (note that m is subject to the constraint ∫<sub>0</sub><sup>∞</sup> m(t)dt = 1). The bounds tend to increase with time; as a matter of fact the behavior is quite expected in view of the nature of the analytical solution (15): the broader the distribution, the larger the uncertainty. The wide bounds of uncertainty cast doubt on the applicability of analytical solutions, based on stochastic approaches, that implicitly assume ergodicity; in such cases, it is advisable to present results together with the bands of uncertainty, as done here. Although the bands can be rather wide, as in the present MADE case, the representation of **Figure 5** in terms of prediction and bands of uncertainty may be of definite help in applications, for instance in risk assessment and plume management.

Uncertainty can in principle be constrained by some conditioning of the solution, e.g., based on available K data. Still, the impact of conditioning permeability at a point is generally limited to a domain of the order of the integral scales of Y (see e.g., Dagan, 1985), which has a minor effect on M for a large plume unless the grid of measurements is dense and covers the advancing plume. For small plumes conditioning may be more effective in reducing uncertainty if the measurement grid covers the trajectory. In any case, conditioning requires a theoretical model more complex than (3), adding a computational burden that may not otherwise be required for a simple preliminary (screening) analysis.

The relatively good agreement of m (solid blue line) with experiments is quite surprising in view of the large uncertainty, as represented by the upper and lower limits (the dashed lines) of the figures; it may suggest that the initial size of the plume was indeed large enough to adequately sample the range of velocity variations in the aquifer, hence more in favor of ergodicity, and the estimates of L<sub>y</sub> and L<sub>z</sub> (that we recall were roughly estimated from the profiles of Adams and Gelhar, 1992) might perhaps be too conservative.

#### 4.2. Parameter Uncertainty

The impact of parameter uncertainty is assessed here by a simple first-order analysis, which provides the variance of the mass distribution m due to parametric uncertainty

$$
\sigma\_m^2 = \sum\_{i=1}^{N\_P} \left( p\_i \frac{\partial m}{\partial p\_i} \right)^2 CV\_{p\_i}^2 = \sum\_{i=1}^{N\_P} s^2 \left( p\_i \right) CV\_{p\_i}^2 \tag{16}
$$

where the function s is the sensitivity, and it represents the relative variation of the solution to changes in the generic parameter p<sub>i</sub>, i = 1, ..., N<sub>P</sub>. The procedure is justified by the relatively small coefficient of variation of the parameters, which is below 0.35 for all parameters (see **Table 1**).
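Equation (16) can be evaluated numerically for any model of m. The sketch below is one possible implementation, assuming uncorrelated, non-zero parameters and using central finite differences for the derivatives; the function name and the polynomial `toy_model` are hypothetical stand-ins, not the transport model of the paper.

```python
import numpy as np

def first_order_variance(model, p, cv, rel_step=1e-4):
    """sigma_m^2 = sum_i (p_i * dm/dp_i)^2 * CV_i^2, Equation (16).

    model(p) returns m on a fixed x grid; derivatives are approximated by
    central differences with a relative step (parameters must be non-zero).
    """
    p = np.asarray(p, dtype=float)
    var = np.zeros_like(np.asarray(model(p), dtype=float))
    for i in range(p.size):
        dp = rel_step * p[i]
        hi, lo = p.copy(), p.copy()
        hi[i] += dp
        lo[i] -= dp
        dm_dp = (np.asarray(model(hi)) - np.asarray(model(lo))) / (2.0 * dp)
        s_i = p[i] * dm_dp                 # sensitivity s(p_i)
        var += (s_i * cv[i]) ** 2
    return var

x = np.linspace(0.0, 1.0, 5)

def toy_model(p):
    # Hypothetical model, linear in both parameters.
    return p[0] * x + p[1] * x ** 2

p0 = np.array([2.0, 3.0])
cv0 = np.array([0.1, 0.2])
var_demo = first_order_variance(toy_model, p0, cv0)
```

For this toy model the sensitivities are known in closed form (p<sub>0</sub>x and p<sub>1</sub>x<sup>2</sup>), so the numerical variance can be compared against 0.04x<sup>2</sup> + 0.36x<sup>4</sup>.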

It is instructive to analyze first the sensitivity function s(p<sub>i</sub>) (Equation 16) as a function of the generic parameter p<sub>i</sub>. **Figure 6** illustrates the sensitivity pertaining to the relevant parameters σ<sub>Y</sub><sup>2</sup>, U, I, I<sub>v</sub> for the snapshot t = 202 d (the sensitivities for the other snapshots are similar). It is seen that the sensitivity displays an antisymmetric behavior, which is determined by the constraint that the area underneath the curve m is unitary. Hence, increasing a parameter has opposite effects in different segments of the mass distribution. The behavior is similar for all parameters except I<sub>v</sub>, which contributes through the anisotropy ratio (see **Appendix C**) and hence has opposite effects with respect to I. The curves of **Figure 6** indicate that the most relevant parameter for uncertainty is the mean velocity U, followed by the logconductivity variance σ<sub>Y</sub><sup>2</sup> and the horizontal integral scale I; the impact of the vertical scale I<sub>v</sub> is rather small. This finding already suggests which parameters require a more careful and precise estimate in order to reduce uncertainty, with the mean velocity playing an important role; the issue was also mentioned in section 3.1.
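The sign change of the sensitivity follows from the unit-area constraint in one line. As a sketch, assume that m is normalized by M<sub>0</sub>, that the constraint is written as an integral over x, and that differentiation and integration can be interchanged:

$$\int_0^\infty m\,dx = 1 \;\;\Longrightarrow\;\; \int_0^\infty \frac{\partial m}{\partial p_i}\,dx = 0 \;\;\Longrightarrow\;\; \int_0^\infty s\left(p_i\right)dx = p_i \int_0^\infty \frac{\partial m}{\partial p_i}\,dx = 0$$

A sensitivity with zero integral that is not identically zero must take both signs, so any gain of mass in one segment of the distribution is balanced by a loss elsewhere.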

The bands of parametric uncertainty m ± σ<sub>m</sub>, following the model (16), are represented in **Figure 7** for the six snapshots of the MADE-1 experiment. Comparison with **Figure 5** indicates that the parametric uncertainty effect is smaller than the one due to non-ergodic behavior (section 3.2). We recall, however, that the bands may be wider, as the variance of the mean velocity U is expected to be larger than the one estimated here and reproduced in **Table 1**; this issue was discussed in section 4.1. The behavior of the uncertainty bands observed in **Figure 7**, with a central area where the bands shrink, is easily explained by the antisymmetric shape of the sensitivity, as previously discussed. The width of the uncertainty bands increases with time, just like the case illustrated in **Figure 5**.

The analysis of parametric uncertainty is indeed a first and relatively easy (if data are available) estimate of possible prediction errors. However, as shown here, it may not be the main source of uncertainty. It is worth noting that Cvetkovic et al. (2015) discussed the global sensitivity including mass transfer reactions.

# 5. SUMMARY AND CONCLUSIONS

Spreading of solute plumes in aquifers, as quantified for instance by the longitudinal macrodispersivity α<sub>L</sub>, is much larger than the one observed in laboratory experiments (pore scale dispersion). This enhancement is caused by the spatial variability of the conductivity K, which in the context of stochastic subsurface hydrology is modeled as a random space function. The paper considers flow which is uniform in the mean (natural gradient flow of velocity U) and inert solutes. Transport is quantified by the BTC M(t, x) at control planes at x as well as the associated spatial longitudinal mass distribution m(x, t). The logconductivity Y = ln K is modeled as stationary, of normal univariate pdf (parameters K<sub>G</sub> and σ<sub>Y</sub><sup>2</sup>) and axisymmetric covariance of horizontal I and vertical I<sub>v</sub> integral scales. The latter are much larger than the pore scale, which explains the above findings. Flow and transport variables, solutions of the flow and transport equations, are consequently random as well.

Most of the transport models developed in the past, aiming at prediction of M, were underlain by the ergodic hypothesis, valid for plumes of large extent at the I, I<sub>v</sub> scales. As a consequence, the one realization M is approximately equal to the ensemble mean ⟨M⟩.

The paper investigates the uncertainty of M (or m) in three-dimensional formations, as quantified by the variance σ<sub>M</sub><sup>2</sup>. Among the various sources of uncertainty, we deal with two: primarily with the non-ergodic effect present for finite plumes, as encountered in many applications. In addition, we consider the impact of uncertainty of parameters like U, K<sub>G</sub>, σ<sub>Y</sub><sup>2</sup>, I. Indeed, the latter are affected by measurement errors even in extensively monitored aquifers.

The non-ergodic effect on transport was investigated in the past by adopting the first-order approximation in σ<sub>Y</sub><sup>2</sup> in solving the flow and transport equations (weakly heterogeneous formations). One of our main aims here is to extend the analysis to highly heterogeneous aquifers for which σ<sub>Y</sub><sup>2</sup> ≤ 8. The investigation is based on our work in the last 15 years on ergodic transport, both by extensive and accurate numerical simulations not available in the past for three-dimensional configurations, as well as by simplified models. This was done for a few types of heterogeneous structures, differing in the connectivity of K classes. One of our main results was that ⟨M⟩ is a very robust predictor whose bulk can be modeled by the Inverse Gaussian CDF, with travel time variance given by the first-order approximation.

The main novel theoretical contribution is the development of a simple analytical model to compute σ<sub>M</sub><sup>2</sup>. It combines the above Inverse Gaussian distribution with an analytical function of the size of the injected plume relative to the integral scale, covering the spectrum from small plumes to ergodic ones. The result is illustrated by depicting the bands of uncertainty delimited by ⟨M⟩ ± σ<sub>M</sub> (**Figure 4**) as dependent on σ<sub>Y</sub><sup>2</sup>. While the present contribution is limited to non-reactive transport, the methodology can be easily extended to reactive solutes, along the lines of Cvetkovic and Dagan (1994) and Fiori et al. (2002).

A major part of the paper is devoted to application of the concepts to the MADE aquifer (σ<sub>Y</sub><sup>2</sup> ≃ 6) transport experiment, which has become a platform for groundwater contaminant transport modeling in the last 30 years. We present the observed snapshots of m(x, t), functions of x for a few values of t, as well as the bands of uncertainty related both to non-ergodic effects and uncertainty of parameters. The analysis relies on recently published analyses of field data, based on renewed characterization campaigns, and it represents a major overhaul of our previous analyses of the MADE-1 experiment. The results indicate that the most relevant parameter for uncertainty is the mean velocity U, followed by the logconductivity variance σ<sub>Y</sub><sup>2</sup> and the horizontal integral scale I. This finding suggests which parameters require a more careful and precise estimate in order to reduce uncertainty.

The main conclusion of the study is that, even for thoroughly characterized aquifers (like MADE), prediction of transport is affected by uncertainty; in particular, the major source of uncertainty for the MADE-1 experiment seems to be the non-ergodic behavior, i.e., the finite size of the plume with respect to the directional correlation scales of hydraulic conductivity. Uncertainty is likely to be even greater for the less detailed site data commonly available in practice.

The above finding, and the general argumentation brought by the present work, reinforce the conclusions from past work that estimating uncertainty of prediction should become an integral part of solving aquifer contamination problems, toward risk analysis. To that aim, characterization efforts should be directed toward reducing the uncertainty of the most influential parameters, like the mean velocity U. Due to the prevailing scarcity of data in practice, it is advisable to use simple models, at least for screening scenarios.

The main envisaged future developments which may contribute to uncertainty and risk reduction are twofold. On the one hand, improvement of characterization technology may provide a detailed and large volume of data, whose analysis may need to rely on Big Data approaches. On the other hand, numerical models of flow and transport in which the detailed aquifer architecture is based on conditioning on this large amount of data should also be devised. At present, simple models like the ones presented here may serve for preliminary and screening analysis.

# AUTHOR CONTRIBUTIONS

AF: derivation of the new analytical expression of variance reduction due to non-ergodic effects; application to the MADE transport experiment using updated field data; exploration of the impact of parameters uncertainty on that of the MADE experiment plume snapshots. AZ: calculation of the variance reduction and related figures; computation of the comparisons of the BTC variance; detailed computation of the updated solution of the mean mass spatial distribution at MADE Site and comparison with the experimental snapshots. AB: computation of the logconductivity profiles at the MADE (**Figure 1**); numerical simulation of a 2D flow and transport with MADE parameters values (**Figure 2**); discussion of **Figures 1**, **2** to serve as starting point for the new developments of the paper. VC: establishing the connection of the results of the present study with those derived previously in the literature; comparison of the new results with the previous ones in the literature; relating the methodology of the paper to random walk theory. GD: coordination of the preparation of the study; establishing and description of the connection with previous work of the authors; formulation and writing of the final text while blending the various contributions of the team members and formulating the conclusions.

# ACKNOWLEDGMENTS

The authors are grateful to Francesca Boso for the numerical simulations of **Figure 2**. AB, AF, and AZ acknowledge funding from the Italian Ministry of Education, University and Research (MIUR) in the frame of the Departments of Excellence Initiative 2018–2022 granted to DICAM of the University of Trento (AB) and the Department of Engineering of Roma Tre University (AF and AZ).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fenvs.2019.00079/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Fiori, Zarlenga, Bellin, Cvetkovic and Dagan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Heterogeneity and Prior Uncertainty Investigation Using a Joint Heat and Solute Tracer Experiment in Alluvial Sediments

Richard Hoffmann<sup>1,2</sup>\*, Alain Dassargues<sup>1</sup>, Pascal Goderniaux<sup>2</sup> and Thomas Hermans<sup>3</sup>

<sup>1</sup> Hydrogeology and Environmental Geology, GEO<sup>3</sup> , UEE, Liège University, Liege, Belgium, <sup>2</sup> Geology and Applied Geology, Polytech Mons, University of Mons, Mons, Belgium, <sup>3</sup> Department of Geology, Ghent University, Ghent, Belgium

In heterogeneous aquifers, imaging preferential flow paths and non-Gaussian effects is critical to reduce uncertainties in transport predictions. Common deterministic approaches relying on a single model for transport prediction show limitations in capturing these processes and tend to smooth parameter distributions. Monte-Carlo simulations give one possible way to explore the uncertainty range of parameter value distributions needed for realistic predictions. Joint heat and solute tracer tests provide an innovative option for transport characterization using complementary tracer behaviors. Heat tracing adds the effect of heat advection-conduction to solute advection-dispersion. In this contribution, a joint interpretation of heat and solute tracer data sets is proposed for the alluvial aquifer of the Meuse River at the Hermalle-sous-Argenteau test site (Belgium). First, a density-viscosity dependent flow-transport model is developed; the water viscosity changes induce up to a 25% change in simulated heat tracer peak times. Second, stochastic simulations with hydraulic conductivity (K) random fields are used for a global sensitivity analysis. The latter highlights the influence of spatial parameter uncertainty on the resulting breakthrough curves, stressing the need for a more realistic uncertainty quantification. This global sensitivity analysis, in conjunction with principal component analysis, helps to investigate the link between the prior distribution of parameters and the complexity of the measured data set. It reveals approximations made when using classical inversion approaches and the need to consider realistic K-distributions. Furthermore, heat tracer transport is shown to be significantly less sensitive to porosity than solute transport. Most proposed models are, nevertheless, not able to simultaneously simulate the complementary heat-solute tracers. Therefore, constraining the model using different observed tracer behaviors necessarily comes with the requirement to use more advanced parameterization and a more realistic spatial distribution of hydrogeological parameters. The added value of data from both tracer signals is highlighted, and their complementary behavior in conjunction with advanced model/prediction approaches shows a strong uncertainty reduction potential.

Keywords: joint heat and solute tracer tests, density-viscosity dependent flow and transport, alluvial sediments, preferential flow paths, uncertainty investigation, distance-based global sensitivity analysis, principal component analysis

#### Edited by:

Frederick Delay, Université de Strasbourg, France

#### Reviewed by:

Jean-Francois Girard, UMR7516 Institut de Physique du Globe de Strasbourg (IPGS), France Bruno Cheviron, National Research Institute of Science and Technology for Environment and Agriculture (IRSTEA), France

\*Correspondence: Richard Hoffmann richard.hoffmann@uliege.be

#### Specialty section:

This article was submitted to Hydrosphere, a section of the journal Frontiers in Earth Science

Received: 30 November 2018 Accepted: 26 April 2019 Published: 29 May 2019

#### Citation:

Hoffmann R, Dassargues A, Goderniaux P and Hermans T (2019) Heterogeneity and Prior Uncertainty Investigation Using a Joint Heat and Solute Tracer Experiment in Alluvial Sediments. Front. Earth Sci. 7:108. doi: 10.3389/feart.2019.00108

# INTRODUCTION AND MOTIVATION

Heterogeneity in porous media, inducing preferential flow paths and non-Gaussian effects, significantly influences subsurface transport (among others: Fuchs et al., 2009; Heeren et al., 2010). An improved imaging of these preferential pathways, in connection with reducing uncertainty in transport simulations and predictions, is crucial for answering future groundwater quality questions.

Innovative tracer test set-ups, along with relevant interpretations, are possible new ways of quantifying this heterogeneity and the associated uncertainty more realistically (Davis et al., 1980; Maliva, 2016). In this context, heat is considered as a complementary tracer, compared to conservative solute tracers (saline tracer or fluorescent dye). Heat is usually considered a non-conservative tracer: it provides information about advection-conduction processes and shows a natural retardation and stronger diffusion linked to the heat capacity of the solid (Anderson, 2005). Using both solute and heat thus provides two tracer plumes that can be compared, allows more information about the solid matrix properties to be obtained, and enables the quantification of subsurface processes (immobile water, matrix contributions) with a better resolution (Anderson, 2005; Irvine et al., 2015). For example, Wildemeersch et al. (2014) combine heat and solute tracer experiments to assess the heterogeneity in an alluvial aquifer. Sarris et al. (2018) also give a recent application of jointly interpreting heat and solute tracer data. They show, in a deterministic way, how these innovative tracer tests can contribute to a high-resolution description of deposits and a significantly improved understanding of transport processes. In the joint heat and solute tracer inversion by Sarris et al. (2018), heat and solute seem to be sensitive to hydraulic conductivity and porosity. In their case study, heat also shows a stronger sensitivity to vertical hydraulic conductivity, resulting in a more complex aquifer parametrization and more realistic transport predictions.

Deterministic approaches are useful for process understanding. However, predictions based on a unique "best" model parametrization bear a lot of uncertainty and, thus, must be justified and used with care (Renard, 2007; Remonti and Mori, 2016). Deterministic approaches generally reduce heterogeneity, typically by replacing spatially distributed properties by averaged properties, leading to poorer predictions with underestimated uncertainty (Alcolea et al., 2006; Renard, 2007). Using additional information gained from joint heat and solute tracer tests adds more constraints to the inversion process. However, when the tracer behavior becomes more complex, deterministic approaches can quickly turn into ill-posed inverse problems (Zhou et al., 2014), reducing their predictive reliability. Classical deterministic inversion approaches could therefore be questioned.

To explain and adequately represent the observed variables, more advanced transport simulation and forecast approaches, such as stochastic methods (Ptak et al., 2004), are generally required. Monte-Carlo simulations may, for example, be used to explore uncertainty ranges and to consider heterogeneity (among others, Ptak et al., 2004; Renard, 2007; Ferré, 2017). However, the full stochastic inversion of hydrogeological data when spatial uncertainty plays a key role remains difficult and time consuming, limiting the applicability of the methods (Renard, 2007). In this context, transdimensional inference (e.g., Sambridge et al., 2012) is a possible approach to combine the parsimony principle with stochastic inversion. In practice, transdimensional inference includes the number of parameters as an unknown, and therefore limits the complexity of the model to what is necessary to explain the data.

However, if field data are sparse and prior uncertainty is large, the transdimensional approach would also lead to an oversimplification of the model, which can be harmful for its predictive capability (see Hermans, 2017, on the importance of a realistic consideration of prior uncertainty). In contrast, a full stochastic approach allows the uncertainty of transport predictions to be quantified realistically, instead of relying on a single deterministic inversion or multiple simulations with (partly) non-quantified approximations (among others, Caers, 2011; Hermans, 2017; Hermans et al., 2018; Scheidt et al., 2018). In combination with the use of complementary tracers such as heat, stochastic approaches and data analysis methods have a strong potential to learn more from collected data sets and falsify approximations made in conceptual models and prior estimations (Hermans et al., 2015a,b, 2016, 2018).

A further promising approach for hydrogeological applications and transport predictions is Bayesian Evidential Learning (BEL) (Hermans, 2017; Hermans et al., 2018; Scheidt et al., 2018). BEL relies on a limited number of Monte-Carlo simulations sampling the prior distribution of model parameters, in order to analyze the global sensitivity of parameters (Park et al., 2016; Hermans et al., 2018) and falsify the prior distribution. In comparison to common single parameter sensitivity analysis, regionalized or global sensitivity analyses consider heterogeneous sources of model uncertainty (e.g., Park et al., 2016). The falsification step consists in checking the consistency between the sampled simulation data (i.e., prior) and the reference data (Hermans et al., 2015a; Scheidt et al., 2018). BEL can also be used, if necessary, to identify a statistical relationship between historical and forecast variables (Hermans, 2017; Hermans et al., 2018; Scheidt et al., 2018). The common inversion step is thus replaced by finding the direct relationship between the prior (sampled simulation data) and the desired forecast, which only depends on the complexity of the subsurface (i.e., model) (Hermans, 2017). Using Monte-Carlo, samples from the prior distribution are generated and used to simultaneously simulate synthetic data and forecasts. Both outcomes are analyzed for detecting direct relationships. Following this innovative approach, computationally expensive inversion can be avoided by reformulating the prediction problem, and the likelihood, directly in terms of the forecast (Hermans, 2017).
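The sampling and falsification steps described above can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: `toy_forward` is a hypothetical stand-in for a transport simulator, the uniform prior ranges are arbitrary, and the range check on principal component scores is only one simple way to test prior-data consistency, not the procedure of the cited BEL studies.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)

def toy_forward(peak_time, width):
    """Hypothetical simulator: a Gaussian-shaped breakthrough curve."""
    return np.exp(-0.5 * ((t - peak_time) / width) ** 2)

# 1. Sample the (hypothetical) prior and simulate data for each sample.
n_samples = 200
peak_times = rng.uniform(2.0, 6.0, n_samples)
widths = rng.uniform(0.5, 1.5, n_samples)
D = np.array([toy_forward(p, w) for p, w in zip(peak_times, widths)])

# 2. PCA of the simulated data via SVD of the centred data matrix.
mean = D.mean(axis=0)
_, _, Vt = np.linalg.svd(D - mean, full_matrices=False)
k = 3
scores = (D - mean) @ Vt[:k].T

def prior_covers(d_obs):
    """True if d_obs lies within the range of prior scores on the first k PCs."""
    z = (np.asarray(d_obs, dtype=float) - mean) @ Vt[:k].T
    return bool(np.all((z >= scores.min(axis=0)) & (z <= scores.max(axis=0))))

# 3. Compare "observed" curves with the prior ensemble.
mid_range = prior_covers(toy_forward(4.0, 1.0))   # mid-range prior values
late_peak = prior_covers(toy_forward(9.0, 1.0))   # peak later than any sample
```

If the observed data fall outside the region spanned by the prior simulations, the prior would be considered falsified and would need to be revised before any forecasting step.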

For addressing the uncertainty of transport predictions, for instance due to preferential flow paths, any forecasting approach should first consider realistic parameter uncertainty related to heterogeneity. In this paper and the context of BEL, "prior" is defined as the prior distribution of model parameters, according to the current knowledge of the field, and from which outcome samples are randomly drawn in Monte-Carlo methods (Rojas et al., 2009; Caers, 2011; Hermans, 2017).

In this paper, the prior uncertainty is investigated for a heterogeneous alluvial aquifer, using joint heat and solute tracer data, through a global sensitivity analysis. A previously performed tracer experiment (Wildemeersch et al., 2014; Hermans et al., 2015b) was used by Klepikova et al. (2016) to calibrate a deterministic model through automatic inversion. The HydroGeoSphere code (HGS) was used, allowing full 3D simulations (Therrien et al., 2010; Brunner and Simmons, 2012). HGS was used in conjunction with PEST as a parameter estimation tool for inversion (Doherty, 1994, 2003). Although those first analyses helped to understand groundwater flow and solute and heat transport in the aquifer, they also showed that the approximations prevented all observations from being explained simultaneously. In Hermans et al. (2015a, 2018), additional stochastic approaches considering spatial uncertainty were successfully used in explaining parts of the experimental data, but they considered only limited data sets or parts of the whole aquifer system. In this paper, the current conceptual approximations (the prior) in the description of the alluvial deposits will be revisited with the goal of improving the heterogeneity characterization. Generating a prior consistent with the observed heat and solute tracer test data is a necessary step toward realistically predicting transport in this complex geological setting.

Within this context, the objective of the paper is to take advantage of the complementary behavior of heat and solute tracers to better characterize the heterogeneity in the aquifer system. The aim is to formulate a more realistic prior and analyze its consistency, before moving toward more advanced prior analysis and direct predictions using the BEL framework. For this purpose, the variability of the tracer output signals will be analyzed (1) through a deterministic model, and (2) using Monte-Carlo simulations, followed by a global sensitivity analysis.

# MATERIALS AND METHODS

#### Test Site: Hermalle-Sous-Argenteau

The test site of Hermalle-sous-Argenteau (HssA), north of Liege (Belgium), lies between the Albert Canal and the Meuse River (**Figure 1A**) in an alluvial plain with a natural groundwater gradient of around 0.06 %. Between the injection well (Pz09) and the pumping well (PP), which are 20 m apart, there are three panels with 10 piezometers including 19 observation points (i.e., most piezometers are screened at two different levels, **Figure 1B**). The first panel is located at 3 m, the second at 8 m and the third at 15 m from the injection well. An evaluation of the borehole logs during drilling shows that the aquifer is mostly composed of sandy gravel. The sand matrix is finer in the top part and its proportion decreases in the bottom part (**Figure 1B**). In previous studies, the adopted conceptual model split the aquifer into two layers, an upper (K = 2.38·10<sup>−3</sup> m s<sup>−1</sup>) and a lower (K = 4.67·10<sup>−2</sup> m s<sup>−1</sup>) part (Klepikova et al., 2016; Hermans et al., 2018). The estimated bulk thermal conductivity is in the range κ<sub>b</sub> = 1.37 W m<sup>−1</sup> K<sup>−1</sup> to 1.86 W m<sup>−1</sup> K<sup>−1</sup> (Klepikova et al., 2016).

The reference data set used in this study was described in Wildemeersch et al. (2014). A joint heat and solute tracer experiment was performed with a 24 h and 20 min continuous injection of heat (ΔT = 25.5 K) and naphthionate (C = 5.48 mg L<sup>−1</sup>) at a rate of 3 m<sup>3</sup> h<sup>−1</sup>, while 30 m<sup>3</sup> h<sup>−1</sup> were extracted from the pumping well PP. The temperature distribution was the focus and was therefore measured at all observation points, while the solute tracer was only measured in PP for validation purposes. Measurements in Pz13 and Pz17 are not used because these observation wells are screened over the whole aquifer thickness and do not allow separate measurements in the upper and lower compartments (Klepikova et al., 2016; Hermans et al., 2018).

The observed heat tracer plume shows that the heat injected in the lower aquifer part tends to move upwards very quickly toward the first panel, then to split and move downwards (Wildemeersch et al., 2014). The measured temperature in the second upper panel is significantly lower than in the first upper panel (for detailed measured reference data at all panels, we refer to the "**Supplementary Material Figures 1–4**"). This observed behavior is currently difficult to simulate for all observation locations with just one deterministic model inverted with common methods, e.g., using pilot points (Klepikova et al., 2016).

#### Deterministic Porous Media Model

Based on the current conceptual two-layer aquifer model (each layer 3.5 m thick), Klepikova et al. (2016) developed a numerical model focusing on the heat transport simulation using HGS in finite difference mode (Therrien et al., 2010) and a pilot point approach (Doherty, 2003) to calibrate the model against heat data. This model is density dependent during the first 24 h only, as density effects were expected to be low afterwards (Ma and Zheng, 2009; Klepikova et al., 2016). It describes a 40 × 60 × 7 m volume of alluvial aquifer with a grid of 84,280 elements in total. No recharge was assumed for the duration of the experiment. Due to the high permeability of the gravel, the vertical leakage at the bottom of the model was considered negligible compared to lateral input/output. The initial groundwater temperature was set to T<sub>ini</sub> = 13.48 °C according to the values measured before the experiment. The model was run under transient flow conditions due to the simulation of the tracer experiment. Peclet numbers of 300 for the upper part and 14,000 for the lower part were computed, suggesting advection-dominated transport (Klepikova et al., 2016).

Here, this model is extended with a simultaneous solute injection. In HGS, the injection is simulated by selecting two nodes next to each other: at the first node, representing the screen location in the borehole, the solute tracer is injected, and one node below the heat is injected, respecting the actual experimental setup. Both injections are simulated using Neumann (2nd type) boundary conditions. For the solute injection, the prescribed mass injection rate is 4.3·10<sup>−6</sup> kg s<sup>−1</sup>. For the heat injection, the prescribed injection rate is 8.3·10<sup>−4</sup> J K<sup>−1</sup> s<sup>−1</sup> (Klepikova et al., 2016). The grid is refined to 140,140 rectangular elements with 14 numerical grid sublayers (each 0.5 m thick), allowing a better representation of the spatial heterogeneity and uncertainty. To account for the influence of immobile water on heat conduction, an apparent thermal conductivity is computed using:

$$\kappa_s' = \frac{\kappa_{\rm f} \left(\theta - n_{\rm eff}\right) + \kappa_s \left(1 - \theta\right)}{\left(1 - n_{\rm eff}\right)} \tag{1}$$

Here, θ is the total porosity (0.12), n<sub>eff</sub> the effective porosity, also called mobile water porosity (0.05), κ<sub>f</sub> the fluid thermal conductivity (0.59 W m<sup>−1</sup> K<sup>−1</sup>), and κ<sub>s</sub> the solid thermal conductivity, estimated from previous works at around 1.43 W m<sup>−1</sup> K<sup>−1</sup>. As temperature affects the groundwater density and dynamic viscosity, the injected heat influences both groundwater flow and transport simulations. In contrast to Klepikova et al. (2016), the new numerical implementation allows fully density-dependent, but also viscosity-dependent simulations.
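As a quick numerical check of Equation (1), the following minimal sketch evaluates the apparent solid thermal conductivity with the parameter values quoted in the text. The function and variable names are illustrative assumptions and are not taken from HGS or from the original study.

```python
def apparent_solid_conductivity(kappa_f, kappa_s, theta, n_eff):
    """Equation (1): apparent solid thermal conductivity in W m^-1 K^-1.

    kappa_f : fluid thermal conductivity
    kappa_s : solid thermal conductivity
    theta   : total porosity
    n_eff   : effective (mobile water) porosity
    """
    if not 0.0 <= n_eff <= theta < 1.0:
        raise ValueError("expected 0 <= n_eff <= theta < 1")
    immobile = theta - n_eff  # immobile water fraction
    return (kappa_f * immobile + kappa_s * (1.0 - theta)) / (1.0 - n_eff)


# Values quoted in the text for the Hermalle-sous-Argenteau model.
kappa_app = apparent_solid_conductivity(
    kappa_f=0.59, kappa_s=1.43, theta=0.12, n_eff=0.05
)
print(f"apparent conductivity: {kappa_app:.3f} W m^-1 K^-1")
```

With these inputs the apparent value is about 1.37 W m<sup>−1</sup> K<sup>−1</sup>, slightly below the solid conductivity because part of the volume is attributed to immobile water.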

#### Stochastic Prior Uncertainty Investigation

This study investigates the prior-uncertainty using Monte-Carlo simulations, followed by a global sensitivity analysis and prior falsification. The applied procedure in this study corresponds to the first, second and third steps of the BEL method as described by Hermans et al. (2018). In the Monte-Carlo simulations, part of the calibrated values from the deterministic model parameters are replaced by random values sampled from random uniform distributions (**Table 1**).

In each Monte-Carlo simulation step (prior sampling), a new HGS forward model is parameterized with randomly generated advection global parameters, like the log(Kmean), the K variance, the porosity, the variogram ranges in X,Y,Z directions, the azimuth, and the gradient between the two main prescribed head boundary conditions located upgradient from the injection well and downgradient from the pumping well. Fixed values [i.e., identical to those chosen by Klepikova et al. (2016)] are considered for dispersivity, thermal conductivity, specific heat capacity, specific storage, and bulk density (**Table 1**). To represent the K-distribution within each Monte-Carlo simulation more realistically, sequential Gaussian simulations are used following two scenarios.

Scenario A uses the prior distribution from Hermans et al. (2015a). This prior was not falsified by geophysical and hydrogeological data acquired during the experiment in the middle panel (Hermans et al., 2015a). It was, however, not tested against the whole available data set, as is proposed here. In particular, it uses the same two-layer approximation and ignores any trend in the grain size distribution of the alluvial deposits. Random K<sub>mean</sub> values between 10<sup>−3.5</sup> and 10<sup>−2.5</sup> m s<sup>−1</sup> in conjunction with a K-variance between 1 and 100 m s<sup>−1</sup> are considered. Models are randomly generated without any additional constraint.

Scenario B considers the trend observed during drilling (**Figure 1B**) in the alluvial deposits, in which it is assumed that grain size distributions influenced the hydraulic conductivity values. Within 14 constrained sublayers, a vertical downwards increasing K-trend is considered in the geostatistical simulations. Within every Monte-Carlo step, for the 12 sublayers between the fixed top (K<sub>mean</sub> = 10<sup>−4</sup> m s<sup>−1</sup>) and bottom (K<sub>mean</sub> = 10<sup>−2</sup> m s<sup>−1</sup>) sublayers, new randomly generated mean values between K<sub>mean</sub> = 10<sup>−3.5</sup> and 10<sup>−2.5</sup> m s<sup>−1</sup>, increasing downwards, are used. To account for field observations (see section Test site: Hermalle-sous-Argenteau, Wildemeersch et al., 2014; Hermans et al., 2015b) showing that the heat plume does not follow a straight path toward the pumping well, the possible occurrence of local low hydraulic conductivity zones (flow barriers) in the aquifer is considered. Loam lenses with low hydraulic conductivity are actually observed at some places in the Meuse river alluvial deposits and are here assumed to be a possible explanation for the observed behavior. Two flow barriers are placed in front of Pz14 and Pz17 in the upper part, and a third one in the lower aquifer part between the injection well and Pz11. The constrained hydraulic conductivity is four orders of magnitude lower. Within the sequential Gaussian simulation, the fixed constrained K-values are considered as hard data for the random simulations. The size of the potential flow barriers is thus dependent on variogram characteristics.

<sup>+</sup>The fixed values are taken from Klepikova et al. (2016).

\*The solid thermal conductivity estimated at 1.43 W m<sup>−1</sup> K<sup>−1</sup> is replaced by an apparent value for simulation (see section: Deterministic porous media model).

[ ] Sampled from a random uniform distribution.
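The sampling of the sublayer means in scenario B can be sketched as follows. The text does not state how the downward increase is enforced, so sorting the uniform draws is an assumption made here for illustration only; the function name, the seed, and the use of log<sub>10</sub> units are likewise hypothetical.

```python
import random


def sample_scenario_b_means(rng, n_sublayers=14, top=-4.0, bottom=-2.0,
                            lo=-3.5, hi=-2.5):
    """Return log10(K_mean) per sublayer, ordered from top to bottom.

    The top and bottom sublayers are fixed; the sublayers in between are
    drawn uniformly in [lo, hi] and sorted so that K increases downwards
    (ASSUMPTION: sorting is one simple way to honor the trend).
    """
    inner = sorted(rng.uniform(lo, hi) for _ in range(n_sublayers - 2))
    return [top] + inner + [bottom]


rng = random.Random(2019)  # arbitrary fixed seed for reproducibility
prior_means = [sample_scenario_b_means(rng) for _ in range(250)]

# Inspect the first realization (log10 of K_mean in m/s, top to bottom).
print([round(v, 2) for v in prior_means[0]])
```

In a full workflow, each list of means would then condition a sequential Gaussian simulation of the K-field, which is outside the scope of this sketch.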

For each scenario, 250 simulations are generated through Monte-Carlo methods. This number was considered sufficient to model the variability in the prior data sets, while keeping the computational cost to a minimum, and to estimate the global sensitivity analysis (see below). The reason lies in the fact that the used approach analyzes the data response (temperature or solute curve) which is less complex than the model spatial heterogeneity, therefore requiring only a limited number of samples (Hermans et al., 2018).

A distance metric, the square root of the sum of squared differences between simulated f(t<sub>i</sub>) and observed g(t<sub>i</sub>) data over the same experimental time interval t<sub>Exp</sub> (0–10 days), is used to compare the ability of different simulations to reproduce field data. It satisfies positive definiteness, symmetry, and the triangle inequality. It allows identifying the best realization at each observation point among the 250 generated realizations, separately for each scenario:

Best simulation at observation point means minimizing:

$$d_{ObsP} = \sqrt{\sum_{i=1}^{t_{Exp}} \left( f\left(t_i\right) - g\left(t_i\right) \right)^2} \tag{2}$$

Best simulations are quantified calculating the root mean square error (RMSE) and the correlation coefficient (R<sup>2</sup> ).
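The selection of the best realization with Equation (2) can be illustrated with a minimal sketch. The curves below are hypothetical values chosen for readability, not field data, and the helper names are assumptions.

```python
import math


def distance(sim, obs):
    """Equation (2): Euclidean distance between two curves on the same times."""
    if len(sim) != len(obs):
        raise ValueError("curves must share the same time steps")
    return math.sqrt(sum((f - g) ** 2 for f, g in zip(sim, obs)))


def rmse(sim, obs):
    """Root mean square error, i.e., the distance scaled by sqrt(n)."""
    return distance(sim, obs) / math.sqrt(len(obs))


def best_realization(simulations, obs):
    """Return (index, distance) of the realization closest to obs."""
    distances = [distance(sim, obs) for sim in simulations]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return idx, distances[idx]


# Hypothetical temperature changes (K) at four time steps.
observed = [0.0, 1.0, 2.0, 1.0]
simulated = [
    [0.0, 0.0, 0.0, 0.0],  # heat never arrives
    [0.0, 1.0, 2.5, 1.0],  # slight overshoot at the peak
    [1.0, 2.0, 3.0, 2.0],  # constant offset of +1 K
]
best_idx, best_d = best_realization(simulated, observed)
print(best_idx, best_d, rmse(simulated[best_idx], observed))
```

Because the metric is evaluated per observation point, the index of the best realization can differ from one point to another.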

#### Distance Based Sensitivity Analysis

A global sensitivity analysis reveals key information about model parameters most influencing the simulated data at observation points. With the output signals of the heat and solute realizations, the distance-based global sensitivity analysis (DGSA, Park et al., 2016) is applied, considering the global and spatial parameters of each simulation. DGSA can also identify conditional effects between pairs of parameters (Fenwick et al., 2014; Park et al., 2016). In DGSA, the sensitivity is defined by comparing the parameter cumulative distribution function (cdf) within k clusters to the original distribution. The number of clusters must be chosen so that there are enough simulations in each cluster while allowing sufficient discrimination between them (Hermans et al., 2018). The k clusters are computed using the k-medoid clustering technique applied on a multi-dimensional scaling map of the models. The latter is computed based on the metrics of equation (2). In DGSA, random parameters for each simulation are linked to the corresponding output signal produced by the forward model. We refer to Park et al. (2016) and Fenwick et al. (2014) for details.
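The core idea of comparing class-conditional and global cdfs can be sketched in a few lines. This is a simplified illustration and not the DGSA implementation of Park et al. (2016): the cluster labels are assumed to be given (instead of coming from k-medoid clustering), the cdf distance is approximated on a regular grid, and all names and synthetic values are hypothetical.

```python
import random


def ecdf(sample, x):
    """Empirical cdf of sample evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)


def cdf_distance(subset, full, n_grid=50):
    """Approximate area between the empirical cdfs of subset and full."""
    lo, hi = min(full), max(full)
    step = (hi - lo) / n_grid
    grid = [lo + (i + 0.5) * step for i in range(n_grid)]
    return sum(abs(ecdf(subset, x) - ecdf(full, x)) for x in grid) * step


def standardized_sensitivity(values, labels, alpha=0.95, n_boot=100, seed=0):
    """Per-cluster cdf distance divided by the alpha-quantile of the
    distances obtained for random subsets of the same size."""
    rng = random.Random(seed)
    result = {}
    for k in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == k]
        d_obs = cdf_distance(members, values)
        boot = sorted(
            cdf_distance(rng.sample(values, len(members)), values)
            for _ in range(n_boot)
        )
        d_alpha = boot[min(n_boot - 1, int(alpha * n_boot))]
        result[k] = d_obs / d_alpha if d_alpha > 0 else float("inf")
    return result


# Synthetic check: one parameter drives the clusters, the other does not.
rng = random.Random(42)
param = [rng.random() for _ in range(90)]
labels_dependent = [min(2, int(v * 3)) for v in param]  # clusters follow param
labels_random = [rng.randrange(3) for _ in param]       # clusters ignore param
s_dep = standardized_sensitivity(param, labels_dependent)
s_rand = standardized_sensitivity(param, labels_random)
print(s_dep, s_rand)
```

A standardized value above 1 for at least one cluster indicates that the parameter distribution within that cluster differs from the prior more than expected by chance, which is how a parameter is flagged as sensitive in this simplified setting.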

Every sample of the prior distribution is parameterized using randomly generated global parameters (e.g., porosity, gradient, K<sub>mean</sub>, and K-variance values) and a local parameter, i.e., the spatial random K-field generated by geostatistical simulation. A local parameter is high-dimensional (its dimension is the number of elements) and therefore difficult to analyze using a sensitivity analysis. However, such local parameters can be reduced using Principal Component Analysis (PCA). PCA is one way to structure, simplify and visualize complex data sets by replacing multiple statistical variables with a smaller number of linear combinations obtained from an eigenvector decomposition (Krzanowski, 2000). With PCA, the first dimensions explain the average K distribution and, thus, larger scale heterogeneity, while higher dimensions are characteristic of smaller-scale heterogeneity (e.g., Oware et al., 2013; Park and Caers, 2018). The first PCA dimensions therefore represent the degree of heterogeneity in the aquifer. They are used to compute the score variables for each of the 250 simulations, which are subsequently included in the sensitivity analysis (Park and Caers, 2018).
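The reduction of a K-field ensemble to PCA scores can be sketched without external libraries. The sketch below is an assumption about one possible implementation, not the procedure used in the study: it works on the small realization-by-realization Gram matrix (which shares its nonzero eigenvalues with the cell covariance matrix) and extracts only the first component by power iteration. The synthetic fields and all names are hypothetical.

```python
import math
import random


def pca_first_component(rows, n_iter=100):
    """Return (scores, explained_fraction) for the first principal component.

    rows: list of realizations, each a list of (log) K values per cell.
    """
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    # n x n Gram matrix of the centered data.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]
    total = sum(gram[i][i] for i in range(n))  # sum of all eigenvalues
    if total == 0.0:
        raise ValueError("all realizations are identical")
    vec = [1.0 / (i + 1) for i in range(n)]    # arbitrary start vector
    eigval = 0.0
    for _ in range(n_iter):                    # power iteration
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        eigval = math.sqrt(sum(x * x for x in new))
        vec = [x / eigval for x in new]
    # Scores equal sqrt(eigenvalue) times the Gram eigenvector (sign arbitrary).
    scores = [math.sqrt(eigval) * v for v in vec]
    return scores, eigval / total


# Synthetic ensemble: 6 realizations, 8 cells, one dominant vertical pattern.
rng = random.Random(0)
trend = [j / 7 for j in range(8)]
amplitudes = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
fields = [[a * t + rng.gauss(0.0, 0.05) for t in trend] for a in amplitudes]
scores, explained = pca_first_component(fields)
print([round(s, 2) for s in scores], round(explained, 3))
```

In this synthetic case one component captures almost all of the variance, because the fields differ mainly through a single pattern. For strongly varying random fields, many components are needed and each one explains only a small share of the variance.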

#### Tracer Velocity Comparison

The modified deterministic model and the Monte-Carlo simulations are further used for a synthetic heat and solute tracer velocity comparison. The velocity comparison follows the approach of Irvine et al. (2015), but uses the peak times instead of the time of 50 % tracer recovery in the breakthrough curve. The peak time is preferred here because of the large uncertainty related to the missing solute tracer information between the injection and pumping wells. The equations of Irvine et al. (2015) are adapted by using a thermal retardation factor R<sub>th</sub> = 1 for the strongly advective aquifer system of Hermalle-sous-Argenteau. A thermal retardation factor estimated from a fixed specific heat capacity is not sufficient in this study to capture the difference between the two tracers, as it cannot account for spatial heterogeneity. The calculations are:

$$v_{\text{sol}_{\text{peak}}} = \frac{x}{t_{\text{sol}_{\text{peak}}}} \quad \text{(modal velocity)} \tag{3}$$

$$v_{\text{heat}_{\text{peak}}} = v_{\text{th}_{\text{peak}}} \cdot R_{\text{th}} = \frac{x}{t_{\text{heat}_{\text{peak}}}} \tag{4}$$

where t<sub>solpeak</sub> [s] is the solute peak time of each prediction/simulation, x [m] is the shortest distance from the observation well to the injection point and v<sub>solpeak</sub> [m s<sup>−1</sup>] the corresponding modal velocity. In the heat case, t<sub>heatpeak</sub> [s] is the peak time of each prediction/simulation, R<sub>th</sub> the thermal retardation factor, and v<sub>thpeak</sub> [m s<sup>−1</sup>] the thermal front velocity using the peak time.

High-K zones (i.e., preferential flow paths) that result in a mismatch between solute and heat distributions lead to different v<sub>heat</sub> and v<sub>solute</sub> values (i.e., diverging from the 1:1 line in a v<sub>heat</sub> vs. v<sub>solute</sub> diagram), inducing a decrease of the regression coefficient (Irvine et al., 2015).
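Equations (3) and (4) with R<sub>th</sub> = 1 reduce to dividing the travel distance by the peak time of each breakthrough curve. A minimal sketch, using hypothetical curves and an assumed travel distance of 20 m (the spacing between injection and pumping wells given in the site description), could look like this:

```python
SECONDS_PER_DAY = 86400.0


def peak_time(times, values):
    """Time at which the breakthrough curve reaches its maximum."""
    idx = max(range(len(values)), key=values.__getitem__)
    return times[idx]


def peak_velocity(distance_m, times_s, values):
    """Equations (3)-(4) with R_th = 1: v = x / t_peak, in m/s."""
    t_peak = peak_time(times_s, values)
    if t_peak <= 0:
        raise ValueError("peak time must be positive")
    return distance_m / t_peak


# Hypothetical breakthrough curves sampled every half day.
times_s = [d * SECONDS_PER_DAY for d in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
solute = [0.0, 0.4, 1.0, 0.6, 0.3, 0.1, 0.0]    # peaks after 1 day
heat = [0.0, 0.0, 0.1, 0.3, 0.5, 0.4, 0.2]      # peaks after 2 days

x = 20.0  # m, assumed travel distance
v_sol = peak_velocity(x, times_s, solute)
v_heat = peak_velocity(x, times_s, heat)
print(v_sol, v_heat, v_sol / v_heat)
```

Repeating this for every Monte-Carlo realization gives the point cloud of a v<sub>heat</sub> vs. v<sub>solute</sub> diagram, in which departures from the 1:1 line can be inspected.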

# RESULTS

#### Prior Uncertainty Investigation

The heat observations at panel 1, 2, 3, and the joint observed heat and solute information at the pumping well are investigated and used to attempt prior falsification. In a first step, the numerical model considers groundwater density and dynamic viscosity effects caused by the injected heat. In a second step, the multiple heat breakthrough curves and the solute one at the pumping well are simulated using the multiple realizations generated by the Monte-Carlo procedure in conjunction with both K-distribution scenarios. **Figure 2** shows the comparison of the deterministic solution of the density (basis model) and the density-viscosity dependent model with the reference data, the two simulation scenarios A and B and the individual best heat simulation for the upper screened part in Pz11-up, Pz15-up, and Pz19-up (observation points in the upper middle lane of Panel 1, 2, and 3).

The change in dynamic viscosity (e.g., about 25 % at the peak time for the upper screen Pz11-up) has a significant effect on the simulated temperature, while the effect of density is limited (0.02 %) (**Figures 2E,F**, **3**). Accounting for this effect slightly improves the fit with the observed heat breakthrough curve at Pz11-up and Pz15-up using a deterministic approach (**Figures 2C–F**). For example, the simulated peak at Pz11-up is slightly improved, as is the tailing, but the overall fit is still not satisfactory for all three observation points.

Monte-Carlo realizations surround the real data set of the heat tracer at all three points (**Figure 2**). This highlights the influence of spatial parameter heterogeneity on the resulting breakthrough curves. However, the prior of scenario A seems to be falsified by the tailing part of the curve observed in Pz11-up and Pz15-up (**Figures 2C,E**). Clearly, scenario B, which considers the observed vertical downwards increasing K-trend, describes the tailing part of the curve more realistically than the deterministic approach and scenario A (e.g., compare **Figures 2E,F**). For Pz15-up and Pz19-up, the best simulation represents the reference data more accurately than the deterministic solutions (**Figures 2A–D**). Again, scenario B gives more realistic solutions than ignoring any trend in the simulations.

Selecting the 10 best heat simulations from both prior scenarios for all observation points at panels 1 to 3, upper and lower screen, further confirms that considering a vertical downwards increasing K-trend in the simulations is a more realistic description of spatial heterogeneity (referring to "**Supplementary Material Figures 1–4**"). At panel 3, the observed data fluctuate around a temperature change of 0 °C, without a significant peak. Thus, the simulation with ΔT = 0 °C is identified as the best one. The modeled ΔT is exactly zero for this simulation because it corresponds to a set of parameters for which diffusion dominates over advective transport; the heat therefore does not reach panel 3. These results stress the need for a more realistic prior-uncertainty quantification and falsification of prior hypotheses. Here, a purely random K-field can be considerably improved by including sedimentological observations, which the advocated procedure is able to take advantage of.

The previous paragraphs do not integrate the joint heat and solute breakthrough curves at the pumping well (located 5 m downstream from panel 3). Here, the deterministic model solution calibrated on heat data only (Klepikova et al., 2016), now including both the density and viscosity changes, fails to predict the behavior of the heat and solute tracers at the pumping well (**Figures 4A,B**; note that the solution depending on density only is not realistic and is not shown).

Monte-Carlo simulations with the random K-field without trend (scenario A) surround the observed heat and solute data (**Figures 4A,B**), adding indications that spatial heterogeneity is necessary to generate realistic predictions. However, the best simulation for the observed heat signal using equation (2) (R<sup>2</sup> = 0.96, RMSE = 0.01 °C) is different from the one for the solute signal (R<sup>2</sup> = 0.99, RMSE = 1.2·10<sup>−5</sup> g L<sup>−1</sup>). Thus, the best heat simulation poorly predicts the solute signal and vice versa (**Figures 4A,B**). For the random K-field with the vertical downwards increasing K-trend (scenario B), the heat breakthrough at the pumping well is not as well simulated as in the intermediate panels and the solute breakthrough concentrations are strongly underestimated, even though the timing of the peak seems to be correctly predicted (**Figures 4C,D**). Near the pumping well, the tracer is intensively diluted due to the high pumping rate (30 m<sup>3</sup> h<sup>−1</sup>), making the heterogeneity around this well crucial for explaining the breakthrough curve (see Discussion section).

FIGURE 2 | Comparison between deterministic solution, real data and prior + best simulation for both random K-distribution scenarios for Pz19-up (Panel 3): (A) without K-trend (scenario A), (B) with a downwards increasing K-trend (scenario B), for Pz15-up (Panel 2): (C) without K-trend, (D) with K-trend, and for Pz11-up (Panel 1): (E) without K-trend, (F) with K-trend (10 best heat simulations at each observation point are in Supplementary File). The index of the best solution refers to its number within the 250 simulations.

The K-fields corresponding to the best heat and solute simulations of the tracer breakthrough curves at the pumping well using scenario A are largely heterogeneous (**Figures 5A,B**). However, the K-distributions differ slightly between the two tracers, which explains, together with the poorer modeling of the tailing (**Figure 2**), why scenario A is not suitable to adequately describe complementary tracer movement (**Figures 4**, **5**). The injected heat forms a plume around the injection well that enlarges with time by conduction and advection, while its temperature amplitude decreases (**Figure 5C**). The solute, in contrast, mainly follows the high hydraulic conductivity pathways, moving faster in the lower part than in the upper part (**Figure 5D**). Both tracers are best described by two different parameter distributions, and only stochastic inversion could here result in simulations fitting both, whereas deterministic inversion, which finds one global parameterization, may tend to derive a smoothed parameter distribution that fits the data poorly. However, it should be stressed that this prior (scenario A) would not be able to reproduce the observations in the intermediate panels. For the K-field and tracer distribution of scenario B (**Supplementary Material Figure 5**), the best solute tracer simulation displays a strongly heterogeneous model with preferential flow paths, while for heat, a more homogeneous K-field respecting the trend observed in the borehole sedimentology (**Figure 1B**) is found.

These single local best realizations, one for each observation point, do not explain the full data set, further highlighting the role of local heterogeneity in the measured signals. This probably explains why the global fit of the deterministic solution is poor and highlights the need for more realistic priors instead of trying to find a unique parameterization describing reality.

For the Hermalle-sous-Argenteau test site, the prior considering a vertical downwards increasing K-trend seems to better represent the overall hydraulic conditions (until panel 3) and constitutes a better prior assumption than neglecting any K-trend. However, it seems to be somewhat falsified between the last panel and the pumping well in terms of solute concentration amplitude. This suggests that the current parameterization still oversimplifies the heterogeneity of the deposits at a larger scale and cannot be used for inversion or prediction. New hypotheses should be formulated, or new data collected to identify the specific processes taking place between the third panel and the pumping well. Existing ERT transects (Hermans et al., 2015a,b) and newly acquired cross-hole GPR sections are promising tools to image heterogeneity patterns with a higher resolution and at a larger scale.

#### Distance Based Sensitivity Analysis

The 250 generated models from scenario B are further used for the distance-based global sensitivity analysis. The distance metric (Equation 2) between pairs of Monte-Carlo simulations is calculated and used as the starting point for DGSA. The sensitivity analysis investigates the global simulation parameter values within their given range (**Table 1**) and the spatial heterogeneity. To analyze spatial heterogeneity, the K-fields from Monte-Carlo simulations are represented using orthogonal basis vectors computed through principal component analysis (PCA) with 250 observation rows and 140,140 corresponding cell K-values as columns (in total: 35,035,000 K values). Only the first 15 principal components are retained and used as an approximate measure of heterogeneity. These first 15 dimensions explain only 23 % of the total variance in K, of which the first three principal components together describe 9.6 %. The small amount of explained variance is related to the strongly varying K-values from one simulation to the other. The 15 corresponding PCA scores are further included in the sensitivity analysis to characterize the role of spatial heterogeneity.

The sensitivity analysis results for the heat signal at panels 1, 2, and 3 are presented in **Figures 6A,C,E**. **Figures 6B,D,F** show the corresponding classification of the 250 models in three clusters. Clusters are used to group the simulations according to their response (i.e., the first cluster contains simulations with temperatures far above the reference data, the second contains simulations with temperatures below the reference data and the third consists of simulations around the reference data). Using three clusters gives satisfactory results in this case.

The sensitivity of a parameter is computed based on the difference between its cumulative distribution function (cdf) in each of the clusters and the global cdf. Significant differences mean that the parameter is considered sensitive. For all panels, the resampling quantile of the distance is α = 0.95 (we refer to Park et al. (2016) for detailed explanations of the parameters used in DGSA).

The global "log(Kvar)" is the most sensitive parameter at Panels 1 and 2 (**Figures 6C,E**). Its sensitivity decreases at the third panel (**Figure 6A**) while "log(Kmean)" becomes more sensitive (**Figures 6A,C,E**). The influence of the first component of spatial heterogeneity "PC1" is large at every distance from the injection well. The increasing sensitivity and influence of "PC1" with distance from the injection well is related to the hydraulic conductivity in the direction of the gradient, which indicates a strong link to preferential pathways (**Figures 6A,C,E**). For the first two panels, most PCA components are sensitive. This is in accordance with the previous results, showing that the introduced K-trend is crucial to explain the observed breakthrough curves (section Prior uncertainty investigation). The global parameter "gradient" shows a decreasing sensitivity with distance. The sensitivity of the gradient highlights the influence of uncertain boundary conditions on the simulations. The fluxes around the injection well are crucial to initiate the tracer transport (**Figures 6A,C,E**). The global "porosity" and the "azimuth" are, at all three panels, much less sensitive parameters for the strongly advective system at Hermalle-sous-Argenteau. The variance explained in the low dimensional space is relatively constant over all three panels, but the clusters are getting closer to each other (**Figures 6B,D,F**). The sensitivity analysis further indicates that the scale of heterogeneity that plays a role in the tracer distribution at panel three is different. The vertical K-trend is no longer sufficient to explain the observations. This underpins that more realistically imaged preferential pathways (e.g., using improved imaging methods like full-waveform GPR inversion) are probably necessary to understand the heterogeneity surrounding the pumping well.

The sensitivity analysis results at the pumping well for the heat and solute signals, using the 250 simulations of scenario B, are presented in **Figure 7**. For the heat and the solute signals, the "log(Kmean)" is more sensitive than the "log(Kvar)" and the "PC1," "PC2," and "PC3." The local heterogeneity is still sensitive and important to consider, but the results are less sensitive to small scale heterogeneity, as mostly "PC1" to "PC4" are sensitive. For solute transport, "porosity" is also a sensitive parameter to be considered (**Figures 7A,B**), probably due to its direct effect on advection velocity. The analysis supports that simulating complementary tracer behavior requires a realistic description of heterogeneity.

The sensitivity analysis of the pumping well simulations using scenario B is extended by using alternatively the synthetic velocity ratio vheat/vsol as prior response for the DGSA (**Figure 8**). Replacing the breakthrough curve as model response by the velocity ratio vheat/vsol, averages the model response over the complete transport path.

As a reference, the modal velocities derived from the field measurements at the peak time of heat (i.e., with R<sub>th</sub> = 1, the wavefront velocity) and solute at the pumping well are:

$$\frac{v_{\text{sol}_{\text{peak}}}}{v_{\text{th}_{\text{peak}}}} = \frac{2.05 \cdot 10^{-4} \text{ m s}^{-1}}{1.08 \cdot 10^{-4} \text{ m s}^{-1}} = 1.90 \tag{5}$$
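The arithmetic of Equation (5) can be checked directly, and the two velocities can be converted back to implied peak times. The conversion assumes a travel distance of 20 m between the injection and pumping wells, as given in the site description; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86400.0

v_sol = 2.05e-4   # m/s, solute modal velocity from Equation (5)
v_th = 1.08e-4    # m/s, heat peak velocity from Equation (5)
x = 20.0          # m, ASSUMED travel distance (injection to pumping well)

ratio = v_sol / v_th
t_sol_d = x / v_sol / SECONDS_PER_DAY    # implied solute peak time, days
t_heat_d = x / v_th / SECONDS_PER_DAY    # implied heat peak time, days
lag_d = t_heat_d - t_sol_d               # heat peak delay, days

print(f"ratio = {ratio:.2f}")
print(f"solute peak after {t_sol_d:.2f} d, heat peak after {t_heat_d:.2f} d")
print(f"lag = {lag_d:.2f} d")
```

Under this distance assumption, the implied heat peak lags the solute peak by about one day.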

Using the assumption of R<sub>th</sub> = 1, for scenario B (**Figure 8A**), the obtained velocity ratios are less spread around the observed data. Many solute velocity forecasts have the same value, but with a different corresponding heat velocity (**Figure 8A**). This is an indication that the heat signal provides more information, while the variance of the solute velocity responses decreases.

Applying the DGSA to this alternative prior response, the sensitivity of "porosity" is less strong than in the solute case, but stronger than in the heat case (**Figures 7**, **8B**). Then, similar to panels 1 to 3, the "log(KVar)," "gradient," and "PC1" are the most sensitive parameters at the pumping well (**Figure 8B**). Interestingly, the velocity ratio seems not to be directly sensitive to "log(Kmean)," but to the global heterogeneity (Kvar and range), spatial heterogeneity (PC1) and fluxes (gradient). This is a clear indication that preferential flow paths, which result from an interaction between K-heterogeneity and gradient, are the main reason for the variation in velocity ratio.

At this stage, it can be assessed that the joint heat-solute tracer experiment results are indeed better represented by a K-distribution with a vertical downwards increasing K-trend. The prior with the K-trend is the current best heterogeneity representation for the Hermalle-sous-Argenteau test site between the injection well and panel 3. However, this K-trend is not representative for the part of the simulated domain between the third panel and the pumping well, as shown by the Monte-Carlo prior investigation and the DGSA results. This highlights how important a non-falsified prior is for robust decision making, and that every model containing approximations must be used with care for predictions. Furthermore, although the proposed prior seems valid at the local scale, it is not sufficient to explain all observations made at the site. The latter probably requires the inclusion of another level of heterogeneity, accounting for the change of behavior of the tracer.

# DISCUSSION

In the field tracer experiment, the heat tracer arrives 1 day after the solute tracer and the recovered energy at the pumping well is very low. This delay is a consequence of the different transport processes, mainly the retardation effect related to heat conduction in the solid phase and in the immobile water. For example, the heat tracer test provides useful information to better understand the matrix processes and to quantify the immobile water part. This complementary behavior helps to better characterize the actual transport processes occurring in the aquifer, in particular the preferential flow paths.

The previously calibrated model was based on heat tracing data only. The new numerical model implemented in the framework of the present study showed that the dynamic viscosity has a strong impact on simulated temperature values even in a narrow temperature range. It clearly appeared that this model failed to reproduce the observed solute concentrations at the pumping well. All attempts to find one single deterministic model fitting both tracer data sets failed, illustrating the difficulty of approximating solute advection and heat conduction/storage with one single (smoothed) spatial parameter distribution. Even if a global minimum was found, any prediction would remain based on a simplified model with limited prediction capabilities.

To overcome the limitations of the deterministic approach, and to avoid full stochastic inversions, investigating prior parameter uncertainty with multiple Monte-Carlo realizations offers the possibility to generate more geologically realistic subsurface parameter distributions. In contrast to transdimensional inference, which, although stochastic in essence, involves some degree of simplification or parsimony (Sambridge et al., 2012), keeping the full variability in the model is necessary to generate realistic predictions. In this study, the analysis of those simulations revealed that increasing the spatial heterogeneity of the alluvial deposits allows the observed breakthrough curves to be better reproduced. The considered prior uncertainty generates a range of possible outcomes surrounding the observed data. The specific behavior of the breakthrough curves, such as the sharp decrease of temperature after the peaks, is much better reproduced. It is also clearly shown that approximations made in deterministic approaches (e.g., using smoothed K-distributions) strongly influence the results and contribute to higher uncertainty. Furthermore, it appeared that modeling the deposits with two separate layers did not allow the reproduction of the tailing part of the breakthrough curves, whereas a continuous distribution with a vertical downwards increasing trend was better able to model this behavior. However, it was also shown that the vertical K-trend used does not seem appropriate everywhere, notably between the third panel and the pumping well. Investigating prior uncertainty here has greatly helped to update the previous conceptual ideas, which were mostly based on simple investigations like borehole log description.

The proposed prior with the K-trend is consistent with all observation points (Panels 1 to 3) except the pumping well and Pz18 and Pz19 in the lower aquifer part. Between panel 3 and the pumping well, there are likely preferential flow paths influencing the tracer behaviors that are not properly described by the proposed prior. The latter was mainly built based on the high borehole density of the intermediate panels. In the original log description of the pumping well (drilled in the 1990s), no grain size trend is described. One possibility is therefore that the strongly heterogeneous alluvial deposits cannot be described by a single simplified parameterization (here Gaussian simulations with a trend) but must include more heterogeneity at the larger scale (for example, a different vertical trend). It appears that lateral variations occur in the aquifer, stressing the need for a more global description of the heterogeneity, including larger scale sedimentological structures such as channels, and advanced integration of secondary data such as geophysical tomographies (e.g., Hermans et al., 2015a). Indeed, geophysical data acquired on the site showed lateral variations in electrical resistivity related to gravel structures (Hermans and Irving, 2017). A trade-off between the acquisition of new data to refine understanding and cost-affordable field studies must be found. In this framework, the combination of hydrogeological testing (such as joint heat-solute tracer tests) with static and time-lapse geophysical data (such as GPR and ERT) at an early stage of site characterization is key to acquiring informative data sets at limited cost.

The prior uncertainty analysis also reveals that each specific temperature breakthrough observation is better reproduced by a different prior realization and, therefore, spatial parameter distribution. This clearly identifies spatial heterogeneity as having a major influence on the simulation results. The solute tracer breakthrough at the pumping well is better represented with models showing preferential flow paths, largely influencing advective-dispersive processes. In contrast, temperature observations in the intermediate panels and at the pumping well seem to be better represented with a slightly more homogeneous model, as conduction is indeed important. This might indicate that a significant part of the pore space is occupied by immobile water. This interpretation shows clearly that the previous conceptual model represented by Peclet numbers of 300 (in the upper layer) and 14,000 (in the lower layer) is not adequate. Furthermore, it shows that the use of a heat tracer alone is not necessarily a good choice to calibrate a model, especially if solute transport should be predicted. Trying to use one single deterministic model parametrization is limited here by two points: (1) complementary tracers cannot really predict each other with classic underlying simplifying assumptions and one parametrization and (2) heterogeneity patterns are complex, meaning that highly parameterized inversion might fail to converge toward a realistic solution.

Similarly, stochastic inversion or optimization techniques might be very complicated to tune to convergence in such a complex layout. This is where advanced prediction approaches such as Bayesian Evidential Learning (BEL) show their potential. An informative prior sampled by multiple realizations containing complementary tracer processes might be directly used for prediction if a statistical relationship can be found between data and prediction variables. This kind of approach is probably very promising for the future of hydrogeological modeling where the full, explicit inversion fails due to the lack of sufficient qualitative data to constrain the geometry of the deposits. Some uncertainty components might be irreducible and impossible to resolve through inversion methodologies. Approaches such as BEL, combined with an in-depth prior uncertainty analysis, can therefore be a good way to account for those in prediction uncertainty assessment in a computationally efficient way.

The fact that a global sensitivity analysis shows different sensitivity patterns for heat and solute responses, here with respect to porosity, is another indication of the complementarity between the tracers. If heterogeneity is more realistically represented by the K-Trend distribution (scenario B), heat seems, in comparison to solute, insensitive to porosity. Although the Hermalle-sous-Argenteau site is characterized by a strongly advective system, the heat data set remains dominated by the effect of conduction. Heat is mainly stored around the injection well and is only slightly and very slowly withdrawn from the reservoir at the pumping well. Some parts of the heat are transported fully through conduction (immobile water and solid matrix).

The present study also shows that using the Euclidean distance for the distance-based global sensitivity analysis might be of limited interest for data sets containing strongly complementary behavior. It can result in different sensitivities between needed parameterization for the related output. An alternative is to use the velocity ratio as a proxy for the model response, as it allowed preferential flow paths to be clearly identified as the main explanation for the difference in tracer behaviors.

# CONCLUSION

Innovative imaging methods, namely joint heat and solute tracer tests, were combined with advanced field data analysis tools to better assess preferential pathways and associated uncertainty in complex alluvial deposits. This paper demonstrates the limitation of deterministic inversion approaches in capturing the complementary behavior of heat and solute tracers. To overcome those limitations, a prior-uncertainty investigation and a heat-solute velocity comparison are applied. Monte-Carlo simulations are used to investigate the range of simulated data and are complemented by a distance-based global sensitivity analysis. The main results are:


strong advective system at Hermalle-sous-Argenteau, heat transport does not seem to be affected by porosity, as long as realistic heterogeneity is considered, using a vertical downwards increasing K-Trend distribution respecting the borehole sedimentology. Indicators linked to local spatial heterogeneity are sensitive parameters for both heat and solute transport, stressing the need to use an adequate prior description of the deposits, a prerequisite for any stochastic Bayesian inversion.

5) The tracer velocity comparison shows that the prior and the sampled Monte-Carlo simulations yield a better representation of the joint heat and solute behavior as observed on the field. This is a key point for further research steps in modeling and predicting the transport processes in this aquifer.

# DATA AVAILABILITY

Note that the datasets analyzed for this study must be stored on the H+ Network (http://hplus.ore.fr/en/enigma/data-hermalle) as a mandatory part of the funding project (ENIGMA ITN). The files are ready and currently being uploaded. We guarantee that they will be uploaded, as it is a mandatory requirement. Codes for DGSA are freely available at https://github.com/SCRFpublic.

# AUTHOR CONTRIBUTIONS

RH generated the results, did most of the writing and the layout of the contribution. TH was the main supervisor and motivator for the paper, contributed mostly to the redaction and writing part as well as assisted actively the modeling part. AD ensured the financial support of the joined tracer tests in Hermalle, did the supervision of the previous works (Wildemeersch et al., 2014; Klepikova et al., 2016), contributed in conceptual and methodological discussions, helped for the writing part and is the main promoter of the Ph.D. of RH. PG reviewed the paper from a global and external position and is the second promoter of the Ph.D. project of RH.

# ACKNOWLEDGMENTS

This work is part of the Ph.D. thesis of RH in the ENIGMA ITN framework. ENIGMA has received funding from European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement No. 722028. We thank Jef Caers for our fruitful discussion about the results of the sensitivity analysis. Further, we are thankful for the review and the fruitful exchange with three reviewers and the associated Editor.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feart.2019.00108/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Hoffmann, Dassargues, Goderniaux and Hermans. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparison Between Hydraulic Conductivity Anisotropy and Electrical Resistivity Anisotropy From Tomography Inverse Modeling

#### Simon Gernez<sup>1</sup>\*, Abderrezak Bouchedda<sup>1</sup>, Erwan Gloaguen<sup>1</sup> and Daniel Paradis<sup>2</sup>

<sup>1</sup> Institut National de la Recherche Scientifique, Eau-Terre-Environnement, Quebec City, QC, Canada, <sup>2</sup> Natural Resources Canada, Geological Survey of Canada, Quebec City, QC, Canada

#### Edited by:

Philippe Renard, Université de Neuchâtel, Switzerland

#### Reviewed by:

Renaud Toussaint, Université de Strasbourg, France; Abderrahim Jardani, Université de Rouen, France

> \*Correspondence: Simon Gernez simon.gernez@ete.inrs.ca

#### Specialty section:

This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science

Received: 19 September 2018 Accepted: 06 May 2019 Published: 04 June 2019

#### Citation:

Gernez S, Bouchedda A, Gloaguen E and Paradis D (2019) Comparison Between Hydraulic Conductivity Anisotropy and Electrical Resistivity Anisotropy From Tomography Inverse Modeling. Front. Environ. Sci. 7:67. doi: 10.3389/fenvs.2019.00067

Hydrogeophysics is increasingly used to understand groundwater flow and contaminant transport, an essential basis for groundwater resources forecast, management, and remediation. It has proven its ability to improve the characterization of the hydraulic conductivity (K) when used along with hydrogeological knowledge. Geophysical tools and methods provide high-density information on the spatial distribution of physical properties in the ground at relatively low cost and in a non-destructive manner. Amongst them, Electrical Resistivity Tomography (ERT) has been widely used for its high spatial coverage and for the strong theoretical links between electrical resistivity (ρ) and key hydrogeological parameters, such as K. Historically, ERT data processing was based on an isotropic hypothesis. However, the unconsolidated aquifers in Canada reveal in most cases a strongly anisotropic behavior for K, in both in situ and laboratory measurements. Recently, electrical anisotropy has been considered model-wise, but it is seldom considered as an interpretation tool or in the characterization process of the anisotropy of K. In order to evaluate the potential of ERT to assess the anisotropy of electrical resistivity, we developed forward and inverse modeling codes. These codes have been validated and tested on a realistic synthetic case reproducing the behavior of a real, extensively characterized aquifer, the site of Saint-Lambert-de-Lauzon in Quebec (Canada). On this site, innovative in situ hydraulic tomography has revealed a strong anisotropy, with up to three orders of magnitude between horizontal and vertical K components. In order to confirm the link between in situ K- and ρ-anisotropies, an ERT survey has been performed, using the same wells as for the hydraulic tomography. The inversion confirms a strong link between K- and ρ-anisotropies. It demonstrates the suitability of the anisotropic ERT approach coupled with well measurements to provide better estimates of K and its anisotropy at the scale of a site.

Keywords: hydrogeophysics, anisotropy, electrical resistivity tomography, hydraulic conductivity, modeling, groundwater

# 1. INTRODUCTION

Understanding groundwater flow and contaminant transport in the subsurface for water management and aquifer remediation generally requires a good knowledge of the spatial distribution of hydraulic properties within the aquifers. The hydraulic conductivity (K) is a key parameter to assess as it affects both the direction and velocity of flow and contaminant in aquifers. K can also vary over several orders of magnitude within the same geological unit, which highlights the importance of having accurate high-resolution and high-coverage estimates to reduce errors in groundwater flow and mass transport (de Marsily et al., 2005) and improve groundwater management. While several methods have shown their potential to estimate K at different scales (Butler, 2005), few have focused on the characterization of its anisotropy, which can greatly affect the outcomes of different hydrogeological in situ problems, such as groundwater recharge (e.g., Hart et al., 2006), well capture zone (e.g., Barry et al., 2009), and spreading of contaminant plumes (e.g., Falta et al., 2005).

Indeed, K-anisotropy can be obtained from laboratory permeameters on sediment or rock samples collected in the field (Wenzel and Fishel, 1942). However, the difficulties in the experimental procedures related to sample collection and manipulation may restrict reliable estimations of K-anisotropy for certain kinds of materials. Moreover, permeameter estimates may require an up-scaling to field conditions to be representative. In order to overcome these burdens, several authors have proposed different hydraulic tests in wells to estimate K-anisotropy, such as the dipole-flow test using one well (Kabala, 1993; Zlotnik and Ledder, 1996; Xiang and Kabala, 1997; Zlotnik and Zurbuchen, 1998; Hvilshøj et al., 2000; Sutton et al., 2000; Zlotnik et al., 2001) or two wells (Goltz et al., 2008), the single-well vertical interference test (Burns Jr et al., 1969; Hirasaki et al., 1974; Onur et al., 2002; Sheng, 2009; Paradis and Lefebvre, 2013), and hydraulic tomography (Paradis et al., 2015a, 2016a,b).

While previous hydraulic tests were shown to provide invaluable estimates of K-anisotropy in real field conditions, these methods are time consuming to operate and can thus only provide very local information. In this study, we propose using geophysical data to complement hydraulic tests, as geophysical methods can provide broad pictures of the subsurface in a considerably shorter amount of time than hydraulic methods. Electrical methods, in particular direct current (DC) methods, are frequently used to infer porosity and K (Archie, 1942; Lesmes and Friedman, 2005). However, only a few studies have addressed the anisotropy of the resistivity of unconsolidated sediments. Anisotropy of electrical resistivity (ρ) is a well-known phenomenon (Maillet, 1947) but its accurate in situ estimation has only been studied recently (Greenhalgh et al., 2010; Kenkel and Kemna, 2016; Gernez et al., 2018). Moreover, there is a theoretical equivalence between K-anisotropy and ρ-anisotropy in unconsolidated sediments where the electric current flows in the conductive saturated pores (Hubbard and Rubin, 2005). Recently, laboratory investigations have demonstrated strong similarities between ρ- and K-anisotropies on core samples (Adams et al., 2016). In addition, recent field works have shown that taking into account ρ-anisotropy in DC surveys leads to more accurate estimations of both ρ values and structures (Pekşen and Yas, 2018), and have shown reasonable estimates of hydraulic anisotropy in slightly anisotropic aquifer systems (Yeboah-Forson and Whitman, 2014).

The objective of this paper is to demonstrate the ability of DC methods to quantify ρ-anisotropy and to illustrate how it compares with K-anisotropy in a real case study. After introducing the study area (section 2) and presenting theoretical considerations related to ρ-anisotropy (section 3), we provide methodological insights, through a synthetic case, related to DC data acquisition to ascertain the presence of ρ-anisotropy (section 4). Then, the methodology is applied to a real case study known to be highly heterogeneous, and ρ-anisotropy estimated through anisotropic inversion is compared to K-anisotropy obtained with hydraulic tests at the study site to strengthen the reliability of the proposed approach (section 5). This study exposes the capacity of DC surveys to improve hydrogeological characterization.

# 2. STUDY AREA AND EVIDENCES OF ANISOTROPIC CONDITIONS

The study area is located in Saint-Lambert-de-Lauzon (SLdL), 30 km south of Quebec City, Canada (**Figure 1**). The SLdL study area is a 12 km<sup>2</sup> sub-watershed surrounding a decommissioned sanitary landfill site in an unconfined granular aquifer. The surficial sediments composing the aquifer consist primarily of Late Quaternary sandy and silty sediments that were deposited in the receding Champlain Sea, which was an arm of the Atlantic Ocean that invaded the St-Lawrence Valley at the time of the last deglaciation (Bolduc, 2003). Deposition of the Saint-Lambert site was controlled mainly by longshore currents that redeposited in littoral and sublittoral settings that supplied the Chaudière River paleodelta. This depositional environment thus leads to sediment sizes ranging from fine sand to very fine silt with poor to very poor grain-size sorting. Furthermore, this environment shows minor proportions of clay (generally <20%). Clay in major proportions (>50%) is only present below the cross-section studied. The resulting superposition of finely layered sand and silt sediments creates a very heterogeneous distribution of sediments at centimetric to decametric scales, along with more gradual lateral transitions in these littoral and sublittoral sediments as a result of changing energy levels along the Champlain Sea shorelines. The depth of the granular aquifer varies between 0 and 22 m, and the water table is generally within 2 m of the surface (Paradis et al., 2014; Tremblay et al., 2014).

This site has been extensively characterized by previous studies using different techniques, such as Ground Penetrating Radar (GPR) and resistivity surveys, Cone Penetrometer Test/Soil Moisture Resistivity (CPT/SMR) soundings, hydraulic tests in wells, and logging (Paradis et al., 2014; Tremblay et al., 2014). These data provided valuable information on the structure of the aquifer system (aquifer and aquitard layers), including information on its heterogeneity. In particular, several observations suggest that the heterogeneous nature of the sediments at a fine scale may induce anisotropy at a larger scale, posing challenges to the interpretation of flow and transport processes in this environment.

FIGURE 1 | Saint-Lambert-de-Lauzon (SLdL) study area. It is located (A) in Québec, Canada, (B) 30 km south of Québec City between the Chaudière and Beaurivage rivers. (C) Geology and characterization details of the study area. (D) Anisotropic hydraulic and electrical tomography site corresponding to the "W" on (C). ERT acquisition is done using an IRIS Syscal Pro system. Nine and eight electrodes are, respectively, immersed in P17 and P21. Seventeen electrodes are planted between P17 and P21. P17 and P21 are separated by 8 m; electrode separation is 1 m inside the wells and 0.5 m at the surface. Adapted from Paradis et al. (2015b).

Anisotropy can be due to the microscopic scale organization of the minerals (micro or intrinsic anisotropy, e.g., crystals ordered structure or oblong grains) or, as is the case here, to the macroscopic structural elements of the ground (macro or extrinsic anisotropy, e.g., fractures or alternating heterogeneous beds). First, the comparison of 59 vertical hydraulic conductivity (K<sub>V</sub>) estimates made on 15 cm undisturbed sediment samples with a laboratory permeameter to horizontal hydraulic conductivity (K<sub>H</sub>) values obtained from high-resolution multi-level slug tests on similar intervals reveals a strong K-anisotropy even at this small scale. K-anisotropy (the ratio of K<sub>H</sub> to K<sub>V</sub>, K<sub>H</sub>/K<sub>V</sub>) was indeed up to two orders of magnitude (Paradis and Lefebvre, 2013). Then, numerical inversion of vertical interference slug tests and hydraulic tomography experiments indicates that K-anisotropy should be considered to match hydraulic responses measured in wells (Paradis and Lefebvre, 2013; Paradis et al., 2016a). For a 60 cm vertical resolution of the numerical grid, K<sub>H</sub>/K<sub>V</sub> values ranged from near isotropy (1) to more than 100. Moreover, comparison of high-resolution cone-based ρ measurements (SMR probe) with collocated estimates of ρ computed with surface-based surveys (ERT) revealed a bias between the two data sets (Ruggeri et al., 2014). For instance, the magnitudes of the SMR probe data are generally higher than those of the ERT surveys. This suggests that, given that the SMR probe essentially senses the horizontal component of ρ due to the configuration of its electrodes (spaced vertically by 9 cm), lower ρ values from ERT surveys are the result of the influence of the ρ-anisotropy induced by the heterogeneous nature of the sediments (section 3.2.2). Finally, this evidence motivates the need to develop a geophysical approach able to handle this anisotropy to provide insights about K-anisotropy in order to better characterize aquifer systems for groundwater flow and contaminant transport studies.

# 3. ELECTRICAL RESISTIVITY ANISOTROPY

#### 3.1. Theoretical Considerations and Definitions

Electrical anisotropy refers to the directional dependence of electrical conductivity or resistivity, which results in the directional dependence of the measured potential fields. This means that the current can preferentially flow in certain directions compared to others. Ohm's law establishes the relationship between an injected electric current in the ground and the induced potential field (Dey and Morrison, 1979). In order to take into account the 2D electrical anisotropy, the scalar conductivity σ in Ohm's law is replaced by the conductivity tensor $\bar{\bar{\sigma}} = \begin{pmatrix} \sigma_H & 0 \\ 0 & \sigma_V \end{pmatrix}$ (or its inverse, the resistivity tensor $\bar{\bar{\rho}} = \bar{\bar{\sigma}}^{-1}$), with σ<sub>H</sub> and σ<sub>V</sub> being the conductivity values in the horizontal and vertical directions, respectively (Greenhalgh et al., 2009). The anisotropic Poisson's equation has the following expression in the 2.5D case, i.e., 2-D resistivity structure (plane invariance) and 3-D current flow (Zhou et al., 2009):

$$\nabla \cdot (\bar{\bar{\sigma}} \nabla \tilde{\phi}) + k_y^2 \sigma_H \tilde{\phi} = -\frac{I}{2} \delta \left( \mathbf{r}(x, z) - \mathbf{r}_s(x_s, z_s) \right) \tag{1}$$

where $\tilde{\phi}$ is the potential in the spatial-frequency (wavenumber) domain, $k_y$ is the wavenumber, $\mathbf{r}(x, z)$ are the coordinates in the computational domain or on its boundaries, $I$ is the intensity of the current source located at $\mathbf{r}_s(x_s, z_s)$ and δ is the Dirac function. The coefficient of anisotropy is defined as $\lambda = \sqrt{\sigma_H/\sigma_V} = \sqrt{\rho_V/\rho_H} \geq 1$: anisotropy increases as λ departs from the value of 1 (λ = 1 corresponds to isotropy). In this study, we will consider an H/V anisotropy. More complex geometries can be handled by the numerical modeling tool we developed to this end (AIM4RES), but they will not be investigated in this study.
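As a minimal sketch of these definitions, the snippet below builds the diagonal H/V conductivity tensor and evaluates the coefficient of anisotropy from either resistivities or conductivities. The function names and the numerical values are illustrative assumptions; this is not part of AIM4RES.

```python
import math

def conductivity_tensor(rho_h, rho_v):
    """Return the 2x2 diagonal tensor [[sigma_H, 0], [0, sigma_V]] in S/m.

    rho_h and rho_v are the horizontal and vertical resistivities in ohm.m.
    """
    if rho_h <= 0 or rho_v <= 0:
        raise ValueError("resistivities must be positive")
    return [[1.0 / rho_h, 0.0], [0.0, 1.0 / rho_v]]

def anisotropy_coefficient(rho_h, rho_v):
    """Return lambda = sqrt(rho_V / rho_H) = sqrt(sigma_H / sigma_V)."""
    return math.sqrt(rho_v / rho_h)

# Illustrative (hypothetical) values: rho_H = 100 ohm.m, rho_V = 400 ohm.m.
rho_h, rho_v = 100.0, 400.0
sigma = conductivity_tensor(rho_h, rho_v)
lam = anisotropy_coefficient(rho_h, rho_v)
# The same coefficient computed from the tensor entries.
lam_from_sigma = math.sqrt(sigma[0][0] / sigma[1][1])
print(lam, lam_from_sigma)
```

Both routes give the same λ up to floating-point rounding, which is simply the identity σ<sub>H</sub>/σ<sub>V</sub> = ρ<sub>V</sub>/ρ<sub>H</sub> for a diagonal tensor.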

#### 3.2. Diagnosis of Electrical Anisotropy

The next sections aim at demonstrating the effects of electrical anisotropy on the interpretation of ERT data using isotropic ERT inversion. We take advantage of three particular effects to propose an electrical diagnosis to detect anisotropy on measured electric potentials. These effects are observable without the need for a complete characterization study, both in terms of field and numerical resources.

#### 3.2.1. Importance of Data Acquisition Protocols

The measured electric potential field is linked to the amount of electric current passing through the different heterogeneous parts of the ground. Hence, in the case of surface ERT measurements, a thin conductive anisotropic layer and a thicker, less conductive isotropic layer can produce the same electric potential differences (equivalence principle, Maillet, 1947). In other words, it is impossible to distinguish an isotropic layer response from an anisotropic layer response using surface ERT data alone. Consequently, isotropic ERT inversion (Loke, 2001; Bouchedda, 2010) of surface anisotropic data always converges to an equivalent resistivity model that is not representative of the true electrical state of the ground, leading to an erroneous resistivity model of the earth. To overcome this problem, anisotropic ERT acquisition and inversion should be used. To address the data directionality problem, borehole electrodes are needed along with surface electrodes. Anisotropic ERT inversion therefore requires an optimization of the acquisition protocol in order to converge toward the true solution.

In the presence of anisotropy, however, isotropic inversion of directional ERT data leads to unrealistic solutions. This is explained by the fact that no physical isotropic solution fits both surface and in-hole data. To demonstrate this effect, two experiments were simulated, the first using only surface electrodes and the second using both borehole and surface electrodes. The resistivity model consists of two horizontally anisotropic layers (**Figure 2A**). The first layer has a thickness h = 4 m with ρ<sub>H</sub> = 100 Ω·m and ρ<sub>V</sub> = 400 Ω·m. The second layer is a semi-infinite space with ρ<sub>H</sub> = 10 Ω·m and ρ<sub>V</sub> = 40 Ω·m. For the whole section, the anisotropy coefficient λ is 2.

Semi-infinite space: ρ<sub>H</sub> = 10 Ω·m and ρ<sub>V</sub> = 40 Ω·m. λ = 2 for the whole model. Electrodes (white dots) are located at the surface and in-hole borehole at x = 14 m. In yellow is shown an example of surface-borehole measure angle (D). (B) (section 3.2.1) Isotropic inversions of potentials acquired using (B.1) only surface electrodes and (B.2) surface and borehole electrodes (borehole at x = 14 m). (C) (section 3.2.2) Comparison between logged (ρ<sub>log</sub>, blue curve) and inverted (ρ<sub>inv</sub>, red curves) resistivities: (C.1) isotropically inverted resistivity. (C.2) ρ<sub>H</sub> from inversion vs. ρ<sub>log</sub>. (C.3) ρ<sub>V</sub> from inversion vs. ρ<sub>log</sub> (borehole at x = 20 m). The logged resistivity is the direct resistivity measurement at x = 20 m, therefore the blue curve is the same for all (C.1–3) graphs. (D) (section 3.2.3) Relative error behavior as a function of the measure angle θ. The points are the error values, their color represents the associated dipole-dipole distance. Orange area represents a positive error, blue area represents a negative error. (D.1) Relative error from an isotropic inversion of data acquired on an anisotropic model, displaying a sigmoid shape. (D.2) Relative error from an anisotropic inversion of data acquired on an anisotropic model, relative error is close to zero and is not angle dependent (borehole at x = 25 m).

The first experiment was performed using only surface Wenner array data. Convergence is assumed to be reached, as the RMSE value is very low (0.0026%), but the inverted model (**Figure 2B.1**) is not consistent with the true resistivity model (**Figure 2A**), neither in terms of resistivity amplitude nor in terms of geology. In accordance with theory, the resistivity of the upper layer appears to be $\sqrt{100 \cdot 400} = 200$ Ω·m and the resistivity of the semi-infinite space appears to be $\sqrt{10 \cdot 40} = 20$ Ω·m. In addition, the thickness of the upper layer appears to be λ · h = 8 m.
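The equivalent isotropic model quoted for the surface-only experiment can be checked with a few lines of arithmetic. The sketch below assumes the classical equivalence relations for a horizontally layered H/V anisotropic medium (apparent resistivity equal to the geometric mean of ρ<sub>H</sub> and ρ<sub>V</sub>, apparent thickness equal to λ · h); the function name is hypothetical.

```python
import math

def equivalent_isotropic_layer(rho_h, rho_v, thickness=None):
    """Return (rho_m, lam, apparent_thickness) for one anisotropic layer.

    rho_m is the geometric-mean resistivity seen by surface arrays,
    lam is the coefficient of anisotropy, and apparent_thickness is
    lam * thickness (None for a semi-infinite layer).
    """
    lam = math.sqrt(rho_v / rho_h)
    rho_m = math.sqrt(rho_h * rho_v)
    apparent_thickness = None if thickness is None else lam * thickness
    return rho_m, lam, apparent_thickness

# Two-layer synthetic model of the text (resistivities in ohm.m, thickness in m).
upper = equivalent_isotropic_layer(100.0, 400.0, thickness=4.0)
lower = equivalent_isotropic_layer(10.0, 40.0)
print(upper)  # (200.0, 2.0, 8.0)
print(lower)  # (20.0, 2.0, None)
```

The values reproduce the 200 Ω·m, 20 Ω·m and 8 m stated above, which illustrates why a surface-only isotropic inversion can fit the data almost perfectly while returning the wrong resistivities and the wrong layer thickness.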

In addition to the previous Wenner surface array, a dipole-dipole array in the borehole and a mixed surface-borehole array were added to the acquisition protocol in the second experiment. The isotropic ERT inversion result for these data sets is presented in **Figure 2B.2**. It can be clearly seen that isotropic inversion of directional ERT data leads to unrealistic solutions. Furthermore, the final misfit between measured and predicted data is high for isotropic inversion (RMSE = 22.1%) in comparison to anisotropic inversion (0.002%), showing that directional data cannot be fitted by an isotropic solution. This can be used as evidence of electrical anisotropy.

#### 3.2.2. Effect of Anisotropy on ERT Measurements

In the case of a horizontally anisotropic resistivity model, it has been pointed out by Maillet (1947) and Keller and Frischknecht (1966) that the measurements made in the horizontal direction are equal to the geometric mean of the horizontal and vertical resistivity components, while the measurements made in the vertical direction are equal to the horizontal resistivity component. This is the paradox of electrical resistivity anisotropy: measurements along vertical profiles in the case of a layered anisotropic model are sensitive to the horizontal component, as shown in our synthetic model example. For a formal demonstration, please see Lüling (2013). Indeed, electrical resistivity logs can be used as in situ measurements of the horizontal resistivity ρ<sub>H</sub> that can be introduced as a constraint in the inversion system or employed in combination with surface ERT data to diagnose the anisotropy.
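The paradox can be illustrated numerically with the textbook point-source potential in a homogeneous anisotropic whole space with a vertical symmetry axis, $\phi = I \rho_m / \left(4\pi \sqrt{x^2 + \lambda^2 z^2}\right)$ with $\rho_m = \sqrt{\rho_H \rho_V}$. This idealized whole-space, pole-pole setting is an assumption made for the sketch and is not the layered model or the code used in the study; the function names are hypothetical.

```python
import math

def potential(current, rho_h, rho_v, dx, dz):
    """Potential (V) at offset (dx, dz) from a point source of intensity
    `current` (A) in a homogeneous H/V anisotropic whole space."""
    lam = math.sqrt(rho_v / rho_h)
    rho_m = math.sqrt(rho_h * rho_v)
    scaled_distance = math.sqrt(dx ** 2 + (lam * dz) ** 2)
    return current * rho_m / (4.0 * math.pi * scaled_distance)

def apparent_resistivity(current, voltage, distance):
    """Pole-pole apparent resistivity assuming an isotropic whole space."""
    return 4.0 * math.pi * distance * voltage / current

rho_h, rho_v = 100.0, 400.0   # ohm.m, upper layer of the synthetic model
current, spacing = 1.0, 2.0   # A and m, arbitrary illustrative values

# Measurement along the horizontal direction (dz = 0).
rho_a_horizontal = apparent_resistivity(
    current, potential(current, rho_h, rho_v, spacing, 0.0), spacing)
# Measurement along the vertical direction (dx = 0), as in a resistivity log.
rho_a_vertical = apparent_resistivity(
    current, potential(current, rho_h, rho_v, 0.0, spacing), spacing)
print(rho_a_horizontal, rho_a_vertical)
```

Under these assumptions the horizontal measurement returns the geometric mean (200 Ω·m) whereas the vertical one returns ρ<sub>H</sub> (100 Ω·m), up to floating-point rounding, which is the behavior described by Maillet (1947).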

In our case, the electrical resistivity logging was measured using a CPT-SMR instrument, which does not require a well installation, simplifying its implementation. The probe is 5 cm thick and 9 cm long. Note that the small probe diameter and the small electrode separation make the hole effect negligible, and the measured resistivity is only sensitive to ρ<sub>H</sub>.

In order to demonstrate the effectiveness of resistivity well logs as an anisotropy diagnosis tool, let us compare well log resistivities with the resistivities estimated from isotropic ERT inversion of surface data acquired over an anisotropic resistivity model. When the ground is isotropic, both resistivities are expected to be similar. For horizontally anisotropic ground (as in SLdL), well log resistivities are equal to the horizontal resistivity component, whereas isotropic ERT inversion returns an equivalent resistivity model that combines both horizontal and vertical resistivity components (equivalence and paradox effects). In other words, the difference between the two resistivities can be very large, depending on the value of the anisotropy coefficient. Consequently, any difference between the two resistivities is an indication of the presence of anisotropy.

To illustrate the anisotropy diagnosis using a synthetic model, let us consider the previous two-layered anisotropic model (**Figure 2A**). **Figure 2C.1** shows the comparison between the electrical resistivity log (blue curve, e.g., obtained with CPT-SMR logging at x = 14 m) and the corresponding resistivity (red curve) estimated using the isotropic ERT inversion of surface data. The two curves depart from each other, indicating the presence of anisotropy. **Figures 2C.2,C.3** show the comparison between the same electrical resistivity log and the collocated horizontal and vertical resistivities obtained from anisotropic ERT inversion of surface and borehole data. As the resistivity is logged along a vertical profile, it is only sensitive to the horizontal resistivity component of the ground and thus departs from the estimated vertical resistivity of the anisotropic medium, which confirms the validity of our methodological approach to quantify the anisotropy. Please see the references for more details.

#### 3.2.3. Relative Error vs. Array Angle

To assess the effect of anisotropy on the data misfit error of isotropic ERT inversion, we consider the same two-layered synthetic model as in sections 3.2.1 and 3.2.2. The data acquisition is simulated using only surface-borehole data. The current electrodes are located in wells and the potential electrodes are located at the surface. Array configurations were built using 50 surface electrodes and 15 borehole electrodes. Electrode spacing is 1 m. The line joining the centers of the two bipoles is skewed with respect to the horizontal, forming an angle θ (**Figure 2A**). The simulated potential data (φtrue) are isotropically inverted. The data misfit relative error is computed as the normalized difference between the potential data calculated using the inverted model (φcalc) and the simulated potential data (φtrue):

$$\frac{\phi_{\text{calc}} - \phi_{\text{true}}}{\phi_{\text{true}}} \times 100 \tag{2}$$

**Figure 2D** displays scatter plots of the data misfit relative error as a function of array angle θ for isotropic and anisotropic ERT inversion. For angles between 0 and 45°, the relative errors are mostly negative, meaning that φcalc values are underestimated (blue area in **Figure 2D.1**). The errors become positive for angles between 45 and 90° (orange area in **Figure 2D.1**). This sigmoid error shape is expected when an isotropic inversion is used to invert ERT data from horizontally anisotropic media. The horizontal resistivity component ρ<sup>H</sup> is lower than the vertical resistivity component ρV. At low acquisition angles, current flow is mainly driven by ρH, and isotropic inversion underestimates the apparent resistivity values. Conversely, current flow is mainly driven by ρ<sup>V</sup> at high angles, and isotropic inversion overestimates the apparent resistivity values. Overall, the underestimations (blue area at low angles) compensate for the overestimations (orange area at high angles). The sigmoid shape arises for any borehole dipole depth. For a given angle, the more space is integrated (i.e., the deeper the borehole dipole), the higher the local relative error (as represented by the colored points in **Figure 2D.1**). The total mean error of the measurements is −2%. The same relative error computation is made with anisotropically inverted data. It shows an error close to zero (−2.75%) and independent of array angle (**Figure 2D.2**). A sigmoid relative error shape between true data and data calculated from isotropic ERT inversion is thus a strong indication of anisotropy in the ground.
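The relative error of Equation (2) and its grouping by array angle can be sketched as follows. This is an illustrative sketch only: the potentials and angles are made-up numbers, the 45° split simply mirrors the description of **Figure 2D.1**, and the function names are hypothetical.

```python
import numpy as np

def relative_misfit_percent(phi_calc, phi_true):
    """Equation (2): (phi_calc - phi_true) / phi_true * 100, element-wise."""
    phi_calc = np.asarray(phi_calc, dtype=float)
    phi_true = np.asarray(phi_true, dtype=float)
    return (phi_calc - phi_true) / phi_true * 100.0

def mean_error_by_angle(theta_deg, err, split=45.0):
    """Mean relative error below and above a split angle (degrees)."""
    theta_deg = np.asarray(theta_deg, dtype=float)
    err = np.asarray(err, dtype=float)
    return err[theta_deg < split].mean(), err[theta_deg >= split].mean()

# Hypothetical data: four quadrupoles with increasing array angle
theta = np.array([10.0, 30.0, 60.0, 80.0])
phi_true = np.array([1.0, 2.0, 4.0, 5.0])
phi_calc = np.array([0.9, 1.9, 4.2, 5.5])

err = relative_misfit_percent(phi_calc, phi_true)
low_mean, high_mean = mean_error_by_angle(theta, err)
print(err, low_mean, high_mean)
```

A negative mean at low angles together with a positive mean at high angles would correspond to the sigmoid pattern described above, whereas both means staying near zero would correspond to the anisotropic inversion case.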

The points addressed in section 3.2 give various ways to detect electrical anisotropy by analyzing ERT data. It can be difficult to gather information from multiple sources in the field: lack of outcrops, inability to drill numerous wells (e.g., as needed for hydraulic tomography), or even the total absence of wells. We propose this preliminary qualitative methodological study to ascertain the presence of electrical anisotropy, and then a fortiori the presence of hydraulic anisotropy. A full quantitative anisotropic study, in terms of data acquisition and processing, requires more time, resources, and costs than a common isotropic study. Nevertheless, the understanding and interpretation of processes suffer greatly from the lack of reliable data, and accounting for anisotropy may be unavoidable to produce better forecasts, reducing the uncertainties and hence the risks for investigation or engineering works.

The next sections methodologically demonstrate the ability of anisotropic ERT campaigns to quantify the electrical and hydraulic ground anisotropies.

# 4. ANISOTROPIC ELECTRICAL RESISTIVITY INVERSION FOR A SYNTHETIC CASE

Before starting the real case study, a synthetic electrical model is created on the basis of hydraulic tomography results. Forward modeling is performed on this model to generate synthetic electric potentials that simulate data acquisition in the field. After that, anisotropic ERT inversion is used to reconstruct the ρ<sup>H</sup> and ρ<sup>V</sup> fields. The comparison between the anisotropically inverted fields and the original synthetic model allows us to assess the robustness of the proposed approach for estimating ρ-anisotropy. The section describes the synthetic model (section 4.1), the optimal data acquisition protocol for anisotropic characterization of the subsurface (section 4.2), and the details of the forward and inverse modeling procedures (section 4.3), along with the performance of the inversion (section 4.4).

## 4.1. Synthetic Model

The synthetic model used in this numerical experiment mimics the K fields obtained by Paradis et al. (2016a) from the hydraulic tomography experiment carried out between wells P21 and P17 (see **Figure 1** for location). The two wells are separated by 8 m and the aquifer thickness is 9 m, which corresponds to the approximate length of the wells. The K fields were directly transformed into σ values by multiplying K by a factor of 10<sup>5</sup>, and the reciprocals of these values give ρ values that are realistic for earth materials at the site (values between 10<sup>2</sup> and 10<sup>5</sup> Ω.m, **Figure 3**). This transformation leads to log(ρH) ∈ [1.30; 2.52] log(Ω.m) (**Figure 3A**), log(ρV) ∈ [2.44; 4.60] log(Ω.m) (**Figure 3B**), and log(ρV/ρH) ∈ [1; 3] (**Figures 3C,D**), which could be qualified as a moderately anisotropic field. For the synthetic simulation, 34 electrodes (black dots in **Figure 3**) were placed around the synthetic model: every 1 m inside the wells and every 0.5 m at the surface.
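The K-to-ρ transformation described above can be sketched as follows. This is a hedged illustration: the K values are made up, the units are left implicit as in the text, and the names `SCALE` and `k_to_rho` are hypothetical.

```python
import numpy as np

SCALE = 1e5  # multiplicative factor quoted in the text (K -> sigma)

def k_to_rho(k):
    """Scale K into a conductivity-like value, then take the reciprocal."""
    sigma = np.asarray(k, dtype=float) * SCALE
    return 1.0 / sigma

# Hypothetical horizontal and vertical K values for two cells
k_h = np.array([1e-7, 5e-7])
k_v = np.array([1e-9, 5e-8])

rho_h = k_to_rho(k_h)
rho_v = k_to_rho(k_v)
log_anisotropy = np.log10(rho_v / rho_h)  # equals log10(k_h / k_v)
print(rho_h, rho_v, log_anisotropy)
```

Because the same factor is applied to both components, the resistivity anisotropy ratio ρV/ρH of such a model equals the hydraulic ratio K<sub>H</sub>/K<sub>V</sub> by construction.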

## 4.2. Optimal Data Acquisition Protocol

As in the isotropic case presented in section 3, it is crucial to adapt the data acquisition protocol, given that electrode configurations are not necessarily sensitive to the same subsurface features. Different electrode configurations do not have the same sensitivity to anisotropy (Wiese et al., 2009; Greenhalgh et al., 2010; Kenkel and Kemna, 2016). In particular, Bing and Greenhalgh (2000) have detailed the use of cross-hole ERT. Configurations that are not sensitive to anisotropy should therefore be avoided, as using them will drive the inversion toward an isotropic (hence wrong) solution. Using the synthetic model previously described (**Figure 3**), nine quadrupole configurations were tested (**Figure 4**) to assess their ability to detect anisotropy (**Figure 5**). These quadrupoles use different combinations of electrodes placed in wells and at the soil surface. The following arrays were tested: inline borehole (IB1, IB2), surface (S), cross-hole (XH1, XH2), and surface-borehole (SB1, SB2, SB3, SB4).

FIGURE 3 | Electrical resistivity synthetic model based on real case hydraulic study results from Paradis et al. (2016a). The horizontal (A) and vertical (B) resistivities ρH and ρV are the components of the anisotropic resistivity tensor $\bar{\bar{\rho}}$. The anisotropy (C) locally reaches up to three orders of magnitude. Black dots represent the electrode locations. The distribution of anisotropy is represented on the histogram (D).


For each of these quadrupoles, the sensitivity of the electric potentials to ρ<sup>H</sup> and ρ<sup>V</sup> was analyzed using the values of the Jacobian matrix, which is the matrix of the derivatives of the potentials with respect to the resistivity values of the model (Greenhalgh et al., 2009).

As each quadrupole has a distinct sensitivity pattern, several observations can be made from **Figure 5**. First, inline borehole (IB1, IB2) and surface (S) configurations show larger sensitivities close to the location of the electrodes, which limits the area of investigation of the surveys performed with those quadrupoles. On the other hand, crosshole (XH1, XH2) and surface-borehole (SB1, SB2, SB3, SB4) quadrupoles are sensitive to much larger areas. However, the magnitude of the sensitivities is larger for the S, SB1, and SB3 quadrupoles, which can therefore better resolve ρ<sup>H</sup> and ρ<sup>V</sup> in the associated sensitive areas. Next, the sensitivity patterns for symmetric configurations (IB1 and IB2, SB1 and SB3, SB2 and SB4) show similar behavior even though, owing to the heterogeneous nature of the synthetic model, the electrodes are located in different materials. The contrast in ρ between materials thus seems to have less impact on sensitivities than the electrode configuration itself. Also, the sensitivity patterns for ρ<sup>H</sup> and ρ<sup>V</sup> are different. This means that a quadrupole can be more influenced by one component of $\bar{\bar{\rho}}$ than by the other. Quadrupole configurations should thus be chosen accordingly to avoid bias in the measurements.
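To make the notion of per-component sensitivity concrete, the sketch below approximates a Jacobian by forward finite differences and compares the summed sensitivities to ρ<sup>H</sup> and ρ<sup>V</sup> for each quadrupole. It is an illustration under strong assumptions: finite differences are a stand-in for the sensitivity computation referenced in the text (Greenhalgh et al., 2009), and the linear operator `A` is a made-up toy replacing a real ERT forward solver.

```python
import numpy as np

def finite_difference_jacobian(forward, m, rel_step=1e-6):
    """Approximate J[i, j] = d forward(m)[i] / d m[j] by forward differences."""
    m = np.asarray(m, dtype=float)
    f0 = np.asarray(forward(m), dtype=float)
    J = np.empty((f0.size, m.size))
    for j in range(m.size):
        h = rel_step * max(abs(m[j]), 1.0)
        m_pert = m.copy()
        m_pert[j] += h
        J[:, j] = (np.asarray(forward(m_pert), dtype=float) - f0) / h
    return J

# Hypothetical toy forward operator: 3 quadrupoles, 2 cells.
# Model vector m = [ln rho_H (cell 1, 2), ln rho_V (cell 1, 2)].
A = np.array([[0.8, 0.1, 0.2, 0.0],
              [0.1, 0.7, 0.0, 0.3],
              [0.3, 0.3, 0.4, 0.4]])

def toy_forward(m):
    return A @ m

n_cells = 2
m0 = np.log(np.array([10.0, 10.0, 40.0, 40.0]))
J = finite_difference_jacobian(toy_forward, m0)

# Summed absolute sensitivity of each quadrupole to each tensor component
sens_h = np.abs(J[:, :n_cells]).sum(axis=1)
sens_v = np.abs(J[:, n_cells:]).sum(axis=1)
print(sens_h, sens_v)
```

In this toy setting, the first quadrupole is dominated by ρ<sup>H</sup> and the third by ρ<sup>V</sup>, which is the kind of imbalance that motivates combining several configurations.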

Given the previous observations, the S, IB1, IB2, XH1, XH2, SB1, and SB3 quadrupoles were found to be the most informative and useful for an anisotropic inversion. The IB1, IB2, XH1, XH2, SB1, and SB3 quadrupoles were chosen because they appear to be more sensitive to the central region of the investigated section, far from the surface and the wells. They provide information on the whole section. The S quadrupoles, although not significantly sensitive at depth, were retained because they provide constraints for the model. These additional constraints are particularly important since the surficial cells are not well-constrained by borehole electrode configurations and have an important effect on the inversion. Amongst the chosen subprotocols, some electrode configurations may be sensitive to the same parts of the characterized section, which introduces redundancy. This redundancy should be avoided in order to ease the convergence of the inversion.

## 4.3. Forward and Inverse Modeling

In this section, the forward and inverse modeling approach (Dey and Morrison, 1979) adapted for anisotropic conditions (Gernez et al., 2018) is used to process ERT data on a numerical grid made of 8,970 square cells of 25 × 25 cm. The forward modeling on the synthetic model used a protocol of 755 quadrupoles chosen to be sensitive to the anisotropy (IB1[73], IB2[63], S[39], XH2[220], SB1[118], SB3[242]). The synthetic potentials were transformed into equivalent apparent resistivities ρapp and then inverted to reconstruct the ρ<sup>H</sup> and ρ<sup>V</sup> fields. For this reason, XH1 was not considered, since most of its apparent resistivities were negative. To reduce the risk of the model converging toward a local minimum, homogeneous and anisotropic ρ values were used to initialize the inverted model (ln ρ<sup>H</sup> = 4.75 ln Ω.m, ln λ = 6.2146). A first-order Tikhonov constraint (α), weaker in the vertical direction (αV/α<sup>H</sup> = 0.5), was used in order to promote horizontal structures (which is consistent with geological information and GPR data). This horizontal smoothing favors a layered inverted model. Conversely, a ratio departing too much from 1 produces horizontal artifacts. As a rule of thumb, we choose 0.1 < αV/α<sup>H</sup> < 1, and the value is refined by trial and error. A regularized iterative Gauss-Newton method was used to tackle the non-linear inverse problem.
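The role of the directional smoothing weights can be sketched with a single regularized Gauss-Newton update. This is a minimal sketch and not the authors' inversion code: a random linear operator `G` stands in for the ERT forward solver (so one step already reaches the regularized solution, whereas the real non-linear problem requires iterating), and the grid size, `beta`, and all function names are hypothetical. Only the ratio αV/αH = 0.5 is taken from the text.

```python
import numpy as np

def first_differences(nz, nx):
    """Horizontal (Dx) and vertical (Dz) first-difference operators for a
    model stored row-major on an (nz, nx) grid."""
    def diff(n):
        return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
    Dx = np.kron(np.eye(nz), diff(nx))  # differences between horizontal neighbours
    Dz = np.kron(diff(nz), np.eye(nx))  # differences between vertical neighbours
    return Dx, Dz

def smoothing_matrix(nz, nx, alpha_h=1.0, alpha_v=0.5):
    """First-order Tikhonov operator; alpha_v / alpha_h < 1 penalizes vertical
    changes less than horizontal ones, favouring layered models."""
    Dx, Dz = first_differences(nz, nx)
    return alpha_h * Dx.T @ Dx + alpha_v * Dz.T @ Dz

def gauss_newton_step(m, d_obs, forward, jacobian, W, beta):
    """One update for 0.5*||d_obs - f(m)||^2 + 0.5*beta*m^T W m."""
    residual = d_obs - forward(m)
    J = jacobian(m)
    lhs = J.T @ J + beta * W
    rhs = J.T @ residual - beta * (W @ m)
    return m + np.linalg.solve(lhs, rhs)

nz, nx = 3, 4
rng = np.random.default_rng(0)
G = rng.uniform(0.1, 1.0, size=(20, nz * nx))  # toy linear forward operator

# Layered "true" model: constant along x, varying with depth
m_true = np.repeat([[1.0], [2.0], [3.0]], nx, axis=1).ravel()
d_obs = G @ m_true

Dx, Dz = first_differences(nz, nx)
W = smoothing_matrix(nz, nx, alpha_h=1.0, alpha_v=0.5)
beta = 1e-2
m0 = np.full(nz * nx, 2.0)  # homogeneous starting model
m1 = gauss_newton_step(m0, d_obs, lambda m: G @ m, lambda m: G, W, beta)

# Gradient of the regularized objective at the updated model
gradient = G.T @ (G @ m1 - d_obs) + beta * (W @ m1)
print(np.abs(gradient).max())
```

Note that a layered model has zero horizontal differences (`Dx @ m_true` vanishes), so it is only penalized through the vertical term, which carries the smaller weight.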

## 4.4. Inversion Performances

**Figure 6C** presents a histogram of the relative error between synthetic and inverted potential values after convergence of the model at the seventh iteration. With most of the relative error centered on zero and an overall low RMSE of 1.7%, the inversion is considered to fit the synthetic potentials almost perfectly.

Moreover, **Figure 6A** (right) shows the ρ fields resulting from the inversion. While the ρ fields are smoother than the synthetic model [**Figure 3**, **6A** (left)], the main features of the subsurface are reproduced, such as the alternations of low and high ρ layers and the overall range of ρ variations. Also, the analysis of the frequency distributions of synthetic and inverted ρ-anisotropy reveals similarities with quasi-normal distributions (**Figures 3D**, **6B**). Examination of the ρ profiles along depth (**Figure 6D**) illustrates more specifically the good agreement between the synthetic and inverted fast alternations between low and high values of ρ-anisotropy: both the trends and magnitudes of the synthetic profiles are well-reproduced by the inverted profiles. Finally, the inverted ρ<sup>H</sup> matches the logged ρ well (**Figure 6E**), in agreement with the paradox of electrical resistivity anisotropy detailed in section 3.2.2.

Given the above model performances, we have shown that anisotropic inverse modeling is able to reconstruct ρ fields, particularly ρ-anisotropy, even for a very challenging aquifer with moderate heterogeneity and anisotropy.

# 5. FIELD CASE STUDY: COMPARISON BETWEEN ELECTRICAL AND HYDRAULIC ANISOTROPIES

Through the synthetic study presented in section 4, we demonstrated the ability of our methodology to characterize an electrically anisotropic environment using an adapted ERT survey (acquisition and inversion), without further external information. In this section, we want to verify with in situ measurements the possibility of characterizing K-anisotropy from anisotropic ERT inversion.

The wells used for ERT were installed by a direct-push technique in order to minimize skin effects around the wells during testing (Paradis et al., 2011). Conventional well installation procedures require the use of a sand pack to fill the space between the drilling hole and the screen, which may mask the electrical response of the natural formation behind it. The direct-push well installation procedure allows the screen to be in direct contact with the aquifer, with minimal disturbance to the surrounding sediments. The screen of each well is open to the entire thickness of the aquifer, allowing for multi-level hydrogeological and geophysical surveys. The screens ensure the free flow of water through slotted openings of 2.5 mm, spaced vertically every centimeter and covering over half of the circumference of the screens. The wells are also made of an electrically insulating material (PVC) to ensure the integrity of electrical measurements.

The ERT setup is displayed in **Figure 1**. It consists of 17 in-hole electrodes (9 in P17 and 8 in P21) and 17 surface electrodes located around the plane formed by wells P17 and P21. P17 and P21 are separated by 8 m; the electrode separation is 1 m inside the wells and 0.5 m at the surface. Using this configuration, 18,936 electric potentials were measured with an IRIS Syscal Pro system. In addition, high resolution horizontal resistivity log data are available, acquired along P17 and P21 with the CPT-SMR probe. The SMR probe measures the resistivity using two ring electrodes 9 cm apart at a 1 kHz frequency to reduce polarization effects (Shinn et al., 1998; Paradis et al., 2015b). The log is 10.41 m deep at P17 and 9.96 m deep at P21, and ρ<sup>H</sup> is measured over a 5 cm interval.

Before the inversion, quality control was performed on the data using reciprocal measurements. Interchanging the two electrodes inside a pair (current or potential) should only alter the sign of the measured potential. Alternatively, interchanging the two pairs (current with potential) should provide the same measured potential by the principle of reciprocity (Slater et al., 2000). During our survey, 5,369 data were acquired to that end. Amongst them, 89.6% show a difference of <15%, and 84.5% show a difference of <5%. These values indicate a data set of good overall quality. From the whole data set, we extract the data used for the inversion (section 5.2). Inverse modeling is computed on a numerical grid made of 8,970 square cells of 25 × 25 cm, as in the synthetic case (section 4).
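The reciprocity check can be sketched as below. This is an illustration only: the way the "difference" is normalized (by the mean of the normal and reciprocal readings) is an assumption, since the text does not specify it, and the potentials are made-up numbers. Only the 5% and 15% thresholds come from the text.

```python
import numpy as np

def reciprocal_error_percent(v_normal, v_reciprocal):
    """Relative difference (%) between normal and reciprocal readings,
    normalized by their mean (assumed definition)."""
    v_normal = np.asarray(v_normal, dtype=float)
    v_reciprocal = np.asarray(v_reciprocal, dtype=float)
    mean = 0.5 * (v_normal + v_reciprocal)
    return np.abs(v_normal - v_reciprocal) / np.abs(mean) * 100.0

def percent_below(errors, threshold):
    """Share (%) of measurements whose reciprocal error is below a threshold."""
    return float(np.mean(np.asarray(errors) < threshold)) * 100.0

# Hypothetical normal and reciprocal potentials for four quadrupoles
v_n = np.array([1.00, 2.00, 0.50, 4.00])
v_r = np.array([1.02, 2.20, 0.50, 3.00])

err = reciprocal_error_percent(v_n, v_r)
share_5 = percent_below(err, 5.0)
share_15 = percent_below(err, 15.0)
print(err, share_5, share_15)
```

Measurements above a chosen threshold could then be discarded before inversion; which threshold to apply is a survey-specific choice.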

## 5.1. Anisotropy Diagnosis of Real Case Study ERT Data

In section 3.2, two different approaches were presented to assess electrical anisotropy by analyzing ERT data. In the following, the anisotropy diagnosis of the real case study ERT data is carried out by performing an isotropic ERT inversion. After removing negative ρapp data and poor quality data, 12,933 resistance measurements were considered for the inversion. The isotropic ERT inversion converges after 10 iterations with an acceptable RMSE of 8.3%. Nevertheless, numerous erratic structures that do not correspond to the known geology of the site (section 2) appear on the isotropic resistivity image (ρiso, **Figure 7A**). More precisely, a few small resistivity structures are close to the electrodes, where the inverted section is usually better resolved, as shown in **Figure 2B.2**.

When ρiso is compared to the logged ρ in wells P17 and P21 (**Figure 7B**), ρiso shows very high frequency variations in both wells, and its values do not correspond to the logged ρ values. As we have shown before (section 3 and **Figure 3C**), this is due to anisotropy. We thus established the presence of anisotropy using an approach that is very fast in comparison to hydrogeological experiments. In fact, ERT data are acquired in less than a few hours, whereas several weeks are needed for anisotropic hydraulic tomography data acquisition. Therefore, electrical anisotropy diagnosis approaches can be used as an assisting tool to help make hydraulic decisions.

FIGURE 8 | Real case study inversion results displaying logarithmic inverted resistivities. (A) Inverted results from the apparent resistivities acquired between P17 (x = 0 m) and P21 (x = 8 m) on the tomography site (Figure 1). (B) Histogram of the relative error between measured and inverted ρapp. (C) Comparison between inverted hydraulic (blue curves) and electrical (red curves) anisotropies. Hydraulic data start at z = 1 m (saturated depth). To compensate for the resolution difference, the inverted resistivities are averaged over the 2.5 m around the wells and in the center of the modeled section. (D) Comparison between the logged (ρSMR, blue curves), and the inverted horizontal (ρH, red curves) and vertical (ρV, yellow curves) resistivities (corresponding to the P17 and P21 resistivities).

## 5.2. Anisotropic Inversion of Anisotropic ERT Data

As for the synthetic case, the XH1, SB2, and SB4 arrays are not considered. The protocol used comprises 975 measurements (IB1[75], IB2[64], S[38], XH2[436], SB1[186], SB3[176]), based on the results obtained in the synthetic study. Homogeneous and anisotropic ρ values were used to initialize the inverted model, with an initial ρ<sup>H</sup> value corresponding to the median of the measured apparent resistivities and an initial anisotropy value of λ = 10. Convergence is reached after 6 iterations, after which the misfit (RMSE) decreases only slightly and no longer improves significantly. At this point, the inverted model starts fitting the noise in the data, so further iterations are ignored. The relative error between the measured ρapp and the ρapp computed from the inverted model is shown in **Figure 8B**. The histogram displays a slight bias of 4.3% in the relative error and a standard deviation of 10.83%. Unlike in the synthetic case study, detailed information on the ground structure is unavailable, which makes it hard to build an optimized protocol. The chosen protocol was inspired by the synthetic case study, since the latter was based on the hydraulic characterization of the ground. The residual error can be explained by the difficulty of reconciling the different sensitivities of the subprotocols, which cannot be examined beforehand. Nevertheless, the final error (9.3%) and the relative error are considered low in the context of a noisy real case study.

The inverted sections are shown in **Figure 8A**. Both the ρ<sup>H</sup> and ρ<sup>V</sup> sections show subhorizontal structures, as expected from our geological and hydrogeological knowledge of SLdL. The comparison between the K (blue curve in **Figure 8C**) and ρ (red curve in **Figure 8C**) anisotropies shows strong similarities in their patterns and amplitudes. This indicates the ability of anisotropic ERT inversion to characterize K-anisotropy. Finally, the CPT-SMR resistivity is compared to the anisotropic ERT inversion results (**Figure 8D**). As in the synthetic case, the graphs display the collocated logged (ρSMR, blue curve), horizontal (ρH, red curve), and vertical (ρV, yellow curve) resistivities at wells P17 and P21. In contrast to isotropic ERT inversion, the horizontal resistivities at P17 and P21 are smooth, which is consistent with the CPT-SMR resistivities. However, there is a gap between the two curves. More precisely, the horizontal resistivities are several times higher than the CPT-SMR resistivities. This difference is due to the effect of the fully-screened wells on ERT data. The CPT-SMR measurement is carried out by direct-push before the fully-screened well installation. Its coupling is very good, the electrodes being in direct contact with the undisturbed ground. In the ERT case, acquisition is done using electrodes immersed in the water of the screened well. Because of this aqueous environment, part of the current is channeled along the well and affects the inverted model. The use of packers could prevent this channeling. Unfortunately, we were not able to implement this experiment during the acquisition campaign. Moreover, to our knowledge, the borehole effect has only been studied in the isotropic case, with the electrodes mounted on an electrically insulated borehole casing (Doetsch et al., 2010; Wagner et al., 2015; Lee et al., 2016). This effect is important only for large resistivity contrasts between the rock formation and the borehole fluid and for large borehole diameters (Doetsch et al., 2010). Furthermore, sensitivity is very high close to the electrodes, so the impact of nearby objects or structures is important (Binley and Kemna, 2005). In our view, because the ERT method is very sensitive to resistivity variations close to the electrodes, the screened borehole casing affects the resistivity model. We also believe that the borehole effect in our case is substantially handled by the anisotropic inversion, because hydraulic tomography shows approximately the same anisotropy variations as electrical resistivity tomography. In contrast, isotropic inversion shows many artifacts around the boreholes.

Although **Figures 8C,D** show the comparison between various linked parameters, a direct proportionality relation between ρ and K anisotropies is hard to establish. The sensitivities of the geophysical and hydraulic methods are not the same. Moreover, both anisotropy sections are inverted sections, coming from two different inversions (in terms of grid size, regularization, etc.). To obtain a direct proportionality relation between ρ and K anisotropies, or even directly between ρ and K values, further investigations are needed that go beyond the framework of this work (use of packers to prevent current channeling, use of a 3-D finite element code to remove the well effects more efficiently, etc.).

# 6. CONCLUSIONS

Hydraulic anisotropy has a major influence on groundwater flow and mass transport. Its consideration is essential when it exists. Through this study, we pointed out: 1. the ability of isotropic ERT modeling to assess the presence of ρ-anisotropy, 2. the ability of anisotropic ERT modeling to quantify ρ-anisotropy, and 3. the strong relationship existing between K- and ρ-anisotropies through an in situ survey. To achieve this work, we developed a new methodology based on an innovative anisotropic ERT modeling tool. To overcome the equivalence problem, electrodes were placed inside a fully screened borehole along with surface electrodes. Anisotropic ERT inversion is then carried out to estimate the ρ-anisotropic model. The latter suggests a strong link with the collocated K-anisotropic characterization: even though the setup used does not allow a direct proportionality relation to be established, the proposed geophysical method is able to provide a proxy of the in situ hydraulic anisotropy.

In this study, we have shown that anisotropic electrical resistivity surveys are helpful for characterizing anisotropic hydrogeologic parameters, which paves the way for large scale hydrogeophysical characterization campaigns, even in challenging anisotropic environments. Integrated hydrogeophysical studies can therefore be powerful approaches for understanding processes in order to produce more reliable forecasts.

## AUTHOR CONTRIBUTIONS

EG funded the project. DP and EG conceived the idea of linking the in situ resistivity and hydraulic anisotropies. AB, DP, and EG supervised the project. AB and SG developed the physical and numerical theory, worked on data curation and analysis, and designed the computational modeling framework. AB encouraged further investigation on the data selected for the inversion, contributed to the interpretation of the results, and proposed further methodological experiments to improve data interpretation and final conclusion. SG conceived and carried out the synthetic and field experiments and wrote the manuscript draft. SG, EG, AB, and DP provided review, editing, and final approval of the manuscript.

#### ACKNOWLEDGMENTS

The field data considered in this paper were acquired through the support of the Régie Intermunicipale de gestion des déchets des Chutes-de-la-Chaudière. The authors would like to acknowledge the participation of students for fieldwork. The authors would also like to thank the two reviewers that helped to improve the initial manuscript. This is LMS contribution number 20180391.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Gernez, Bouchedda, Gloaguen and Paradis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Parallelized Adaptive Importance Sampling for Solving Inverse Problems

#### Christoph Jäggli\*, Julien Straubhaar and Philippe Renard

Stochastic Hydrogeology and Geostatistics Group, University of Neuchâtel, Neuchâtel, Switzerland

#### Edited by:

Yuri Fialko, University of California, San Diego, United States

#### Reviewed by:

Thomas Romary, ParisTech École Nationale Supérieure des Mines de Paris, Université de Sciences Lettres de Paris, France Shuai Zhang, Lawrence Livermore National Laboratory, United States Department of Energy (DOE), United States Sarah Minson, Earthquake Science Center, United States Geological Survey, United States

> \*Correspondence: Christoph Jäggli christoph.jaeggli@gmail.com

#### Specialty section:

This article was submitted to Solid Earth Geophysics, a section of the journal Frontiers in Earth Science

Received: 23 July 2018 Accepted: 26 October 2018 Published: 21 November 2018

#### Citation:

Jäggli C, Straubhaar J and Renard P (2018) Parallelized Adaptive Importance Sampling for Solving Inverse Problems. Front. Earth Sci. 6:203. doi: 10.3389/feart.2018.00203

In the field of groundwater hydrology, and more generally geophysics, solving inverse problems in a complex, geologically realistic, and discrete model space often requires the use of Monte Carlo methods. In a previous paper we introduced PoPEx, a sampling strategy able to handle such constraints efficiently. Unfortunately, the predictions suffered from a slight bias. In the present work, we propose a series of major modifications of PoPEx. The computational cost of the algorithm is reduced and the underlying uncertainty quantification is improved. Advanced machine learning techniques are combined with an adaptive importance sampling strategy to define a highly efficient and ergodic method that produces unbiased and rapidly convergent predictions. The proposed algorithm may be used for solving a broad range of inverse problems in many different fields. It only requires a forward problem solver, an inverse problem description, and a conditional simulation tool that samples from the prior distribution. Furthermore, its parallel implementation scales perfectly. This means that the required computational time can be decreased almost arbitrarily, such that it is only limited by the available computing resources. The performance of the method is demonstrated using the inversion of a synthetic tracer test problem in an alluvial aquifer. The prior geological knowledge is modeled using multiple-point statistics. The problem consists of the identification of 2 · 10<sup>4</sup> parameters corresponding to 4 geological facies values. It is used to show empirically the convergence of the PoPEx method.

Keywords: adaptive importance sampling, machine learning, uncertainty quantification, Bayesian inversion, Monte Carlo, multiple-point statistics, parallelization

# 1. INTRODUCTION

Inverse problems play a key role in almost all the geosciences. Indeed, inversion is often the only approach that allows hidden structures of the interior of the earth to be identified and the physical properties of the buried rocks to be estimated from indirect physical measurements at the surface or in a few boreholes. In groundwater hydrology, the aim is generally to infer the position of highly permeable or impermeable rocks and estimate their porosities and permeabilities from point measurements of state variables (e.g., hydraulic heads, tracer concentrations, water temperature, etc.). As for any geophysical problem, inverse methods are of utmost importance and a fundamental step in most quantitative hydrogeological studies (de Marsily et al., 2000; Carrera et al., 2005; Zhou et al., 2014) as well as many environmental modeling problems (Moles et al., 2003; Wainwright and Mulligan, 2005).

However, despite the huge significance of this topic and more than 50 years of research on it in geophysics and hydrology, current methods are still unable to solve certain types of problems efficiently. For instance, an open problem is to solve probabilistic inverse problems that involve discrete structures such as channels, lenses, karst conduits, or faults, which cannot be represented by standard multi-Gaussian fields (Gómez-Hernández and Wen, 1998; Journel and Zhang, 2006). The identification and representation of such geological features is indispensable because they heavily control fluid flow in the underground (Feyen and Caers, 2006). Using a wrong and smoothed representation of such discrete features is known to significantly bias groundwater forecasts and the corresponding uncertainty analysis (Gómez-Hernández and Wen, 1998; Kerrou et al., 2008).

To overcome this difficulty, different approaches have been developed and were recently reviewed by Linde et al. (2015). One general strategy is to construct first a probabilistic prior able to represent stochastic but geologically realistic structures and to embed it in the inverse method. Often, this geological prior can take only discrete values representing the rock types or some specific geological features.

Inverse methods relying heavily on continuity assumptions or simple statistical distributions (typically multi-Gaussian) are not capable of handling this type of problem. In contrast, sampling algorithms can account for such complex setups (Oliver et al., 1997; Robert and Casella, 2004; Fu and Gómez-Hernández, 2008; Mariethoz et al., 2010a; Hansen et al., 2012; Laloy et al., 2016; Rubinstein and Kroese, 2016). These methods represent the solution of the inverse problem as a set of models (or samples) describing the posterior distribution. From this set of samples, one may approximate any quantity of interest such as mean values, maximum likelihood values, uncertainty bounds, or probabilities of characteristic events. Unfortunately, for most of these approaches, the computational effort is extremely demanding (Fu and Gómez-Hernández, 2008; Romary, 2010; Linde et al., 2015) and the challenge is to design an efficient sampling scheme able to deal with categorical information in the prior distribution.

In a previous paper (Jäggli et al., 2017), we proposed the Posterior Population Expansion (PoPEx) algorithm to expand iteratively an existing set of geological models. PoPEx was specifically designed for handling discrete parameter values, even if it can be applied to the continuous case as well. The discrete parameter fields can be generated with any geostatistical method.

In our previous paper and in this one, we use a multiple-point statistics technique for expressing the prior distribution because this allows users of PoPEx to formulate their prior geological knowledge of the area where they are carrying out the inversion. This knowledge is expressed by providing a training image (TI). Multiple-point statistics (MPS) simulation techniques (Strebelle, 2002; Arpat and Caers, 2007; Honarkhah and Caers, 2010; Mariethoz et al., 2010b; Straubhaar et al., 2013) can learn the spatial patterns from the TI and can produce stochastic simulations that resemble the TI. The simulations can be conditioned by local values if they are known (hard data). The advantage of this approach is its flexibility. The same code can generate all kinds of geological structures (channels, lobes, braided systems, fractures, etc.) and therefore it can be applied to a very wide range of inverse problems and applications.

Like most sampling techniques, PoPEx iteratively produces new parameter fields (the samples) using a geostatistical technique (see for example the book of Chilès and Delfiner, 2009), then runs the forward problem (in our case a groundwater flow and transport simulation, but it could be any forward operator), evaluates the misfit and likelihood for that solution, and accumulates novel knowledge. At each iteration, the geostatistical simulation algorithm is controlled by PoPEx: the general mechanism is to condition the simulation of the parameter fields with a set of point values (hard data) selected preferentially from previous models having a high likelihood.

This method proved to be very efficient on a synthetic example (Jäggli et al., 2017): a comparison with two existing Markov chain Monte Carlo (McMC) methods showed that the method was able to considerably decrease the computational cost. But this study also allowed us to identify that the initial version of PoPEx produced slightly biased predictions.

In this paper, we completely revisit the core of the PoPEx algorithm. The overall goal is to improve the usability, accuracy, and computational time. The most important contribution is a new strategy that produces unbiased predictions. The bias arises because the generation of a new realization is influenced by all the previous models in the chain. This sampling strategy favors some realizations over others, and these correlations must be taken into account when computing predictions. In other words, we propose to consider the method as an adaptive importance sampling (AIS) scheme (Naylor and Smith, 1988; Oh and Berger, 1992; Murphy, 2012) and suggest a simple technique to produce unbiased predictions. The additional computational cost is negligible and does not increase the overall running time. From this perspective, the method can be interpreted as an unsupervised machine learning scheme that aims to learn an optimal probability density which can be used in the AIS scheme. The class of inverse problems that can be addressed is very broad and goes beyond applications in the field of geostatistics. The only requirements are a forward problem solver, an inverse problem description (including the likelihood function), and a conditional simulation tool (e.g., any geostatistical method) that generates models according to the prior distribution.

On top of that, we show how the algorithm, together with all modifications, can be parallelized. We show that it scales perfectly in the considered example. Hence, the computational time is directly reduced by the number of parallel chains, without compromising the outcomes. This is a powerful result, because the main hindrance to the use of sampling strategies is their computational cost. With the proposed methodology, models can be produced in parallel. The only limitation concerns the number of available CPUs or, more precisely, the number of forward problem evaluations that can be run in parallel. Today, most research and engineering groups have access to high performance computing facilities, and therefore these requirements are not too restrictive.

The paper is organized as follows. Section 2 provides the required background related to the inverse problem and the general concepts of the method before explaining the details of the modified algorithm. A case study together with a convergence analysis is presented in section 3. Finally, in section 4, the advantages and limitations of the methodology are discussed and summarized.

# 2. METHODOLOGY

In this section, we first review the general definition of the inverse problem following the notations and approach from Tarantola (2005). Then we introduce the most important techniques constituting the base of PoPEx (Jäggli et al., 2017). As a consequence, the first part of this section mainly presents material that has been proposed and discussed elsewhere. It is toward the end of section 2.2 and in section 2.3 that we present the novel methods that constitute the core of this paper.

## 2.1. Inverse Problem

The general inverse theory presented by Mosegaard and Tarantola (2002) and Tarantola (2005) contains the commonly used Bayesian formulation as a special case. Furthermore, it does without the (problematic) notion of conditional probabilities (e.g., Borel's paradox) and instead uses the concept of states of information. In the following, we slightly enrich their explanations with a few comments specifically dedicated to the hydrogeological framework.

Solving an inverse problem is usually related to honoring a sparse set of observations **d**<sup>obs</sup> = {d<sup>obs</sup><sub>1</sub>, . . . , d<sup>obs</sup><sub>m</sub>} called **data**. The nature of these observations can differ widely and may depend on the overall framework. When studying subsurface properties, they often represent measurements of state variables such as hydraulic heads, production data, or contaminant concentrations. Due to imperfect measuring devices, these quantities usually include uncertainties. It is common to use a finite set of parameters **m** = {m<sub>1</sub>, . . . , m<sub>n</sub>} to fully describe the physical system under study. Any possible collection of such values will henceforth be called a **model** or equivalently a **realization**. In this regard, a model can cover a vast number of physical and conceptual quantities, such as boundary conditions, hydraulic conductivity maps, or specific storage values. The collection of all possible models is called the **model space** and is denoted by M. In the hydrogeological framework, a common approach is to subdivide an aquifer into a finite number of volume elements (simulation grid) and characterize the hydraulic conductivity in each grid cell. In this case, the underlying model **m** includes one parameter m<sub>i</sub> per grid element that defines the physical property in this small sub-domain. The choice of a set of representative dimensions is equivalent to the definition of a parametrization of M. Note that for a given system, such a coordinate system is not unique. "Permeability," for example, can be replaced by "resistivity," "speed" with "slowness" or "frequency" with "period."

In practice it is possible to observe parameters that can also be included in **m**. Boreholes, for example, often provide cores, from which petrophysical values can be deduced with high precision. If the model space is designed to describe the same quantities, we simply remove the corresponding degrees of freedom from any possible model **m**, and reduce the number of dimensions in the model space M.

In many fields, well-founded physical theories have been established in order to describe processes and interactions. They can be used to describe relations between the models and the observations. From a naïve point of view, it means that for a given model **m** the error-free values of the corresponding data set **d** can be predicted. This theoretical link between a model and the observable parameters is called the forward problem and is described by **d** = **g**(**m**). The function **g** = {g<sub>1</sub>, . . . , g<sub>m</sub>} denotes the **forward operator**. Tarantola (2005) formulated the probabilistic solution of an inverse problem as a non-negative measure function that combines two different states of information. Typically, these states of information are captured by the prior and the likelihood function. The prior distribution ρ(**m**) describes any available information on the model parameters that is independent of the data set. The likelihood function, L(**m**), usually embeds the forward operator and is a probabilistic measure of how well a given model is able to explain the observations. The solution of an inverse problem, called the **posterior distribution**, is the conjunction of the prior and the likelihood function such that

$$
\sigma(\mathbf{m}) = c\,\rho(\mathbf{m})L(\mathbf{m}),\tag{1}
$$

where c is a normalization constant. In the Bayesian framework, the posterior measure is considered to be the product of (conditional) probability distributions. The latter approach is contained in Equation (1) and applies under some regularity conditions. For this reason, the formulation by Tarantola (2005) is more general.

## 2.2. Posterior Population Expansion (PoPEx)

It is worthwhile to recall several important concepts that were originally introduced by Jäggli et al. (2017). Afterwards, some small improvements will be suggested. These modifications only slightly influence the evolution of the sampling scheme, so we decided not to rename the method and still call it Posterior Population Expansion (PoPEx). The general approach of the PoPEx algorithm is to generate a large number of models **m**<sub>1</sub>, . . . , **m**<sub>N</sub> that represent the posterior probability density in Equation (1). From this approximation it is possible to compute posterior probabilities of events. The sampling procedure, however, requires computing σ(**m**<sub>k</sub>) for every k = 1, . . . , N, which can be computationally very expensive. For this reason, the main idea of the PoPEx method is to make the sampling as efficient as possible. Each generation of a new model **m**<sub>k</sub> is therefore guided by all the previous samples **m**<sub>1</sub>, . . . , **m**<sub>k−1</sub>. To do so, information maps (denoted by P<sup>k</sup> and D(P<sup>k</sup>||Q), see below) are computed iteratively and ensure that the sampling of **m**<sub>k</sub> is strongly guided by "good" models with high posterior values. The transfer of information from **m**<sub>1</sub>, . . . , **m**<sub>k−1</sub> to **m**<sub>k</sub> runs through a set of value restrictions imposed on the new model (denoted by HD<sup>k</sup>, see below).

#### 2.2.1. Set of Models M<sup>k</sup>

The underlying algorithm is able to examine many different types of uncertainties and parameter identification problems. It is possible, for example, to consider parameters concerning boundary and/or initial conditions, spatial heterogeneities, recharge time series, etc. The model set **m** is then simply subdivided into different parts **m** = {**m**<sub>1</sub>, **m**<sub>2</sub>, . . .}, where each **m**<sub>i</sub> = {m<sub>i<sub>1</sub></sub>, . . . , m<sub>i<sub>r</sub></sub>} represents one specific parameter type. The only requirement is that samples representing that uncertainty can be generated from a conditional simulation tool.

In order to keep the following descriptions as simple as possible, we will only consider one type of model parameters and write **m** = {**m**<sub>1</sub>} = {m<sub>1</sub>, . . . , m<sub>n</sub>}. This set will be used to describe spatial heterogeneities of hydraulic permeabilities and is generated by a pixel-based MPS technique (Strebelle, 2002; Mariethoz et al., 2010b; Straubhaar et al., 2011). Such methods require a spatial subdivision of the computational domain into a finite number n ∈ ℕ of elements (**pixels**). The union of all pixels is called the **simulation grid**. MPS generates realizations of a random variable by reproducing multiple-point statistics from a training image. Each realization can be associated with a model **m** = {m<sub>1</sub>, . . . , m<sub>n</sub>} by putting the MPS value from pixel j into the parameter m<sub>j</sub>. In the example above, a variable m<sub>j</sub> could then be linked to the constant permeability (or resistivity) in the j-th volume element of the computational domain. The term "linked" is used because it is not uncommon for the model parameters m<sub>j</sub> to contain not the permeability (or resistivity) values themselves but only conceptual representatives of them. For the present work, it is assumed that the prior probability density ρ is precisely the distribution of the MPS random variable. Therefore, using the MPS machine to produce independent and unconditioned models is equivalent to drawing realizations from ρ. It is important to note that conditional simulators work sequentially. This means that they start by randomly selecting a permutation ς over the set of indices {1, . . . , n} that defines the order in which the components of a new model are treated. Whenever m<sub>ς(j)</sub> is about to be informed, conditional simulation tools only consider previously simulated components and draw m<sub>ς(j)</sub> according to the probability

$$\mathbb{P}(\cdot \mid m_{\varsigma(1)}, \dots, m_{\varsigma(j-1)}).$$

In other words, at this point of the simulation, m<sub>ς(j)</sub> is considered to be independent of any uninformed component in **m**.

Sampling a model space to solve an inverse problem means iteratively producing a finite number N of realizations

$$
\mathbf{m}_1 \to \mathbf{m}_2 \to \cdots \to \mathbf{m}_N,
$$

that characterize (in some way) the posterior distribution. During this procedure, the likelihood function must be evaluated for every model in the chain. It is not uncommon that this computation is very demanding and represents the most important source of computational cost. After each iteration k = 1, . . . , N, the models can be assembled within the collection

$$\mathcal{M}^k = \{\mathbf{m}\_1, \dots, \mathbf{m}\_k\},\tag{2}$$

while the normalized likelihood values

$$\tilde{L}(\mathbf{m}_j) = \frac{L(\mathbf{m}_j)}{\sum_{r=1}^k L(\mathbf{m}_r)}, \qquad j = 1, \dots, k,$$

are joined in L̃<sup>k</sup> = {L̃(**m**<sub>1</sub>), . . . , L̃(**m**<sub>k</sub>)}. The tilde notation indicates that a normalization has been applied, a convention that will be used throughout this paper. Two different kinds of normalization will be used. In the equation above, the total weight is computed by summing all likelihood values from the previous iterations. This normalization must be recomputed whenever a new model **m**<sub>k+1</sub> is sampled. Secondly, we will define spatial maps. The normalization is then performed over all pixel locations, and the resulting map can be interpreted as a spatial probability density (cf. Equation 5).
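As an illustration of the first kind of normalization, the following minimal Python sketch computes the normalized likelihood values. It assumes that the likelihoods are stored as log-values (a common practice that is not stated in the text) and the function name `normalized_likelihoods` is hypothetical.

```python
import numpy as np


def normalized_likelihoods(log_likelihoods):
    """Return L~(m_j) = L(m_j) / sum_r L(m_r) from log-likelihood values.

    Assumption: likelihoods are supplied as logarithms. Subtracting the
    maximum leaves the normalized ratios unchanged but avoids underflow.
    """
    log_l = np.asarray(log_likelihoods, dtype=float)
    shifted = np.exp(log_l - log_l.max())
    return shifted / shifted.sum()


# Three models with likelihoods in the ratio 1 : 2 : 1.
example = normalized_likelihoods(np.log([1.0, 2.0, 1.0]))
```

The whole vector has to be recomputed after each new model, since the denominator changes with k.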

#### 2.2.2. Probability Maps Q and P<sup>k</sup>

The possible value range for each model parameter m<sub>i</sub> depends on the TI. After defining a set of s − 1 threshold levels, this range may be separated into s different categories, called **facies values** or simply **facies** and denoted by {f<sub>1</sub>, . . . , f<sub>s</sub>}. When working with discrete models, these categories usually define a one-to-one relation to the set of all possible values in the TI. From the facies values, it is possible to establish a collection of pixel-based indicator functions. If **m** is a given model and each pixel j ∈ {1, . . . , n} is represented by its center location **x**<sub>j</sub>, these functions are defined as

$$\mathbf{1}_{f_i}(\mathbf{m}; \mathbf{x}_j) = \begin{cases} 1 & \text{if } m_j \text{ belongs to category } f_i, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Any linear combination of the quantities in Equation (3) can be interpreted as a map with constant value in each pixel. The concept of these indicator functions is very important throughout the present paper. If the precise pixel location **x**<sub>j</sub> is not relevant, we will henceforth omit its explicit notation. The indicator functions help to compute moments of the random vector that is associated with the MPS tool. Let q<sub>i</sub> represent the pixel-wise probability of the model values to fall into category f<sub>i</sub>. If E(·) denotes the usual expectation operator, they read

$$q\_i = \mathbb{E}(\mathbf{1}\_{f\_i}(\mathbf{m})), \quad i = 1, \ldots, s.$$

The set Q = {q<sub>1</sub>, . . . , q<sub>s</sub>} then collects all the prior probability maps for the facies categories. If the MPS machine is trained to produce stationary and unconditioned simulations, then the maps q<sub>i</sub> are constant over the computational domain and equal the corresponding facies proportion in the training image. On the other hand, a set M<sup>k</sup> = {**m**<sub>1</sub>, . . . , **m**<sub>k</sub>} can be used to define a second collection P<sup>k</sup> = {p<sup>k</sup><sub>1</sub>, . . . , p<sup>k</sup><sub>s</sub>} such that

$$p\_i^k = \sum\_{j=1}^k \mathbf{1}\_{f\_i}(\mathbf{m}\_j)\tilde{L}(\mathbf{m}\_j). \tag{4}$$

The superscript k in the notation p<sup>k</sup><sub>i</sub> indicates the number of realizations that has been used in its computation. It is important to understand the consequences of weighting the summands by the normalized likelihood values L̃(**m**<sub>j</sub>). If **m**<sub>j<sub>0</sub></sub> is a model with a large likelihood value (with respect to the other ones), this means that some facies patterns in **m**<sub>j<sub>0</sub></sub> may be very important. Therefore, the probability maps in Equation (4) are formed by weighting "good" facies patterns more heavily than "bad" ones. Consequently, these maps may be able to provide information that can be used to generate "good" models. But at this point it is unclear where this information can be found and how it could be used. The answer to this question lies in the relation between Q and P<sup>k</sup>. The central idea of the PoPEx sampling is to consider and learn from all models **m**<sub>1</sub>, . . . , **m**<sub>k</sub> before generating **m**<sub>k+1</sub>. This procedure can be split into two parts, which are explained in the following.
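The likelihood-weighted maps of Equation (4) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: models are stored as integer arrays of facies indices (0, . . . , s − 1) on a flattened grid, and the function name `facies_probability_maps` is hypothetical.

```python
import numpy as np


def facies_probability_maps(models, norm_lik, n_facies):
    """Likelihood-weighted facies maps p_i^k of Equation (4).

    models   : int array of shape (k, n), facies index per pixel (assumed layout)
    norm_lik : float array of shape (k,), normalized likelihoods summing to 1
    returns  : float array of shape (n_facies, n)
    """
    models = np.asarray(models)
    norm_lik = np.asarray(norm_lik, dtype=float)
    facies = np.arange(n_facies)[:, None, None]
    # indicators[i, j, x] is 1 where model j shows facies i at pixel x.
    indicators = (models[None, :, :] == facies).astype(float)
    return np.einsum("ijx,j->ix", indicators, norm_lik)


# Three models on a three-pixel grid with two facies.
models = np.array([[0, 1, 1],
                   [0, 0, 1],
                   [1, 0, 1]])
p_maps = facies_probability_maps(models, [0.5, 0.3, 0.2], n_facies=2)
```

In this toy case the first pixel receives probability 0.8 for facies 0, because the two models showing that facies carry normalized likelihoods 0.5 and 0.3.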

#### 2.2.3. Kullback-Leibler Divergence D(P<sup>k</sup>||Q)

Kullback and Leibler (1951) introduced a measure called Kullback-Leibler divergence (KLD) to compare two probability distributions. It computes how a candidate probability diverges from an expected one. This is precisely what is needed to measure the information content of P<sup>k</sup> with respect to Q. In other words, the Kullback-Leibler divergence can be used to identify pixel locations where the facies probabilities in P<sup>k</sup> are "extreme" with respect to Q. It is given by

$$D(P^k || Q) = \sum\_{i=1}^s p\_i^k \log \left(\frac{p\_i^k}{q\_i}\right). \tag{5}$$

Whenever q<sub>i</sub> > 0 for all i = 1, . . . , s, this equation is well defined. But let us assume that there is an i ∈ {1, . . . , s} and a pixel **x**<sub>j</sub> with q<sub>i</sub>(**x**<sub>j</sub>) = 0. This means that it is impossible for the MPS tool to produce a model **m** where the value m<sub>j</sub> falls into the i-th category. From Equation (4) it follows that p<sup>k</sup><sub>i</sub>(**x**<sub>j</sub>) must vanish as well. In short, q<sub>i</sub>(**x**<sub>j</sub>) = 0 implies p<sup>k</sup><sub>i</sub>(**x**<sub>j</sub>) = 0, and the corresponding terms in Equation (5) can be ignored. A brief comment on the prior maps q<sub>i</sub> may help to clarify the meaning of Equation (5). If there is a large set of independent models {**m**<sub>1</sub>, . . . , **m**<sub>N</sub>} that is distributed according to ρ, the law of large numbers (LLN) [cf. Durrett (2010)] suggests using the approximations

$$q\_i \approx \frac{1}{N} \sum\_{j=1}^{N} \mathbf{1}\_{f\_i}(\mathbf{m}\_j). \tag{6}$$

From this perspective, the relation between p<sup>k</sup><sub>i</sub> and q<sub>i</sub> is easier to detect. Both definitions use the same indicator functions, but are weighted differently. D(P<sup>k</sup>||Q) provides a pixel-based information map that indicates how surprising the facies patterns become when they are weighted by the likelihood values. As mentioned earlier, it is possible to normalize the Kullback-Leibler divergence map spatially. The rescaled map is denoted by D̃(P<sup>k</sup>||Q) and can be interpreted as a discrete probability density defined over the pixel locations.
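The pixel-wise divergence of Equation (5) and its spatial normalization can be sketched as follows. The array layout (one row per facies, one column per pixel), the function names, and the uniform fallback for an all-zero map are assumptions made for this illustration only.

```python
import numpy as np


def kld_map(p, q):
    """Pixel-wise D(P^k || Q) of Equation (5); p and q have shape (s, n)."""
    p = np.asarray(p, dtype=float)
    q = np.broadcast_to(np.asarray(q, dtype=float), p.shape)
    terms = np.zeros_like(p)
    mask = p > 0  # terms with p = 0 are ignored, as discussed in the text
    terms[mask] = p[mask] * np.log(p[mask] / q[mask])
    return terms.sum(axis=0)


def normalize_spatially(d):
    """Rescale the map so that it sums to one over all pixels."""
    total = d.sum()
    if total <= 0:
        # Assumption: fall back to a uniform density if P^k equals Q everywhere.
        return np.full(d.shape, 1.0 / d.size)
    return d / total


p = np.array([[0.8, 0.5, 0.0],
              [0.2, 0.5, 1.0]])
q = np.full((2, 3), 0.5)
d = kld_map(p, q)
d_tilde = normalize_spatially(d)
```

In this toy example the second pixel carries no information (P<sup>k</sup> equals Q there), whereas the third pixel, where one facies has become certain, reaches the largest divergence, log 2.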

#### 2.2.4. Hard Conditioning Data HD<sup>k</sup>

We mentioned earlier that each model must be generated by a "conditional simulation tool." This means that it must be possible to condition (impose) some of the values in **m**. Doing so allows fields that honor local data to be generated, a practice commonly known as **hard conditioning (HD)** (Mariethoz and Caers, 2014). The enforced value v together with the pixel location **x** forms one conditioning object, denoted by (**x**, v). A reliable set of hard conditioning data may enhance the chance to generate a new model **m**<sub>k+1</sub> that provides a large likelihood value L(**m**<sub>k+1</sub>). Considering the previous explanations, it seems natural to sample a set {**x**<sub>1</sub>, . . . , **x**<sub>n<sub>k</sub></sub>} of hard conditioning locations (where conditioning should apply) from the normalized Kullback-Leibler information D̃(P<sup>k</sup>||Q). For every selected position **x**<sub>i</sub>, we can then sample a model index j ∈ {1, . . . , k} according to L̃<sup>k</sup> and extract the conditioning value (which value should be imposed) from **m**<sub>j</sub>(**x**<sub>i</sub>). This produces a set of hard conditioning data HD<sup>k</sup> = {(**x**<sub>1</sub>, v<sub>1</sub>), . . . , (**x**<sub>n<sub>k</sub></sub>, v<sub>n<sub>k</sub></sub>)}.

So far, nothing original has been proposed. The modifications that we suggest now concern the number of elements in HD<sup>k</sup>. Jäggli et al. (2017) started with a set of unconditioned models, before fixing the number of conditioning points to a user-defined parameter and leaving it unchanged. However, the statistical significance and robustness of the algorithm could certainly be increased by adding some "randomness" to this selection procedure. We suggest randomly changing the number of conditioning points in each iteration. To this end, we fix an upper bound n<sub>max</sub> and draw the number of conditioning points from a uniform distribution over the set {0, 1, . . . , n<sub>max</sub>}. The amount of hard conditioning data n<sub>k</sub> thus may change in each iteration k. It is therefore possible to occasionally generate unconditioned realizations.
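The selection of hard conditioning data described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_hard_data` and the array layout are hypothetical, and drawing the locations without replacement is an assumption (the text does not specify how repeated locations are handled).

```python
import numpy as np


def sample_hard_data(models, norm_lik, kld_tilde, n_max, rng):
    """Draw a set HD^k of (pixel index, value) pairs.

    models    : int array (k, n) of previously generated models
    norm_lik  : normalized likelihoods L~^k, shape (k,)
    kld_tilde : spatially normalized KLD map, shape (n,), sums to 1
    """
    models = np.asarray(models)
    kld_tilde = np.asarray(kld_tilde, dtype=float)
    k, n = models.shape
    # Number of conditioning points, uniform over {0, ..., n_max}.
    n_k = int(rng.integers(0, n_max + 1))
    # Assumption: distinct locations, so n_k cannot exceed the number of
    # pixels with non-zero selection probability.
    n_k = min(n_k, int(np.count_nonzero(kld_tilde)))
    if n_k == 0:
        return []  # unconditioned realization
    locations = rng.choice(n, size=n_k, replace=False, p=kld_tilde)
    # One source model per location, drawn according to L~^k.
    sources = rng.choice(k, size=n_k, replace=True, p=norm_lik)
    values = models[sources, locations]
    return list(zip(locations.tolist(), values.tolist()))


rng = np.random.default_rng(0)
models = np.array([[0, 1, 1, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1]])
norm_lik = np.array([0.5, 0.3, 0.2])
kld_tilde = np.array([0.4, 0.0, 0.35, 0.25])
hd = sample_hard_data(models, norm_lik, kld_tilde, n_max=3, rng=rng)
```

Pixels with zero divergence, such as the second pixel in this toy map, are never selected, so conditioning concentrates where the likelihood-weighted statistics differ from the prior.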

#### 2.2.5. Parallelization of the Algorithm

Every loop of the PoPEx algorithm consists of four main steps: derive a set of hard conditioning points, generate a new model, compute its likelihood value, and update the Kullback-Leibler divergence map. One strategy to parallelize this procedure is to encapsulate the first three steps in a subprocess separated from the last one. Then, a master process launches such subprocesses in parallel on other CPUs. Each subprocess is simply fed with the currently available KLD map and performs the enclosed steps independently. After the result of a subprocess is communicated back to the master process, the latter updates the KLD map and launches another subprocess. A brief overview of this workflow is presented in **Figure 1**.

The pseudocodes of the parallelized PoPEx algorithm and the corresponding subprocesses are given in algorithms 1 and 2, respectively. The variable "manager" appearing within the main algorithm is a FIFO ("first in first out") queue of maximal length n<sub>par</sub> that maintains the communication with the subprocesses. FIFO stands for queues where new elements are appended at the tail (line 9) and removed from the head (line 11). In this regard, lines 5-10 of algorithm 1 are designed to launch n<sub>par</sub> parallel subprocesses (line 8) and retain the corresponding handles (line 9). Lines 11-18, on the other hand, check the status of the first subprocess (line 12) and react accordingly. If it has terminated, its outputs are received (line 13) and the corresponding variables are updated (line 14). If it is still running, however, the handle is sent to the back of the queue (line 17). The main motivation for appending the running subprocesses at the end is to rapidly detect and remove other jobs that have been completed. But as a consequence, reproducibility of the algorithm is not guaranteed. If reproducibility is crucial, we could simply change line 17 such that the processes are re-appended at the head of the queue and ensure that the first n<sub>par</sub> workers are launched before lines 11-18 may apply.

#### **Algorithm 1** PoPEx

1: **Input:** n<sub>max</sub>, n<sub>par</sub>, N and Q
2: k ← 0 and P<sup>0</sup> ← Q
3: manager ← empty queue # FIFO queue
4: **while** k < N **do**
5:   n<sub>m</sub> = length(manager)
6:   **if** n<sub>m</sub> < n<sub>par</sub> **and** k + n<sub>m</sub> < N **then**
7:     p ← **new** subprocess
8:     p.start(Subprocess(M<sup>k</sup>, L̃<sup>k</sup>, D(P<sup>k</sup>||Q), n<sub>max</sub>))
9:     manager.append(p)
10:   **end if**
11:   p ← manager.pop()
12:   **if** p.ready() **then**
13:     (**m**<sub>k+1</sub>, L(**m**<sub>k+1</sub>)) = p.get()
14:     update M<sup>k</sup>, L̃<sup>k</sup> and D(P<sup>k</sup>||Q)
15:     k ← k + 1
16:   **else**
17:     manager.append(p)
18:   **end if**
19: **end while**

#### **Algorithm 2** Subprocess

1: **Input:** M<sup>k</sup>, L̃<sup>k</sup>, D(P<sup>k</sup>||Q) and n<sub>max</sub>
2: **Output:** **m**<sub>k+1</sub> and L(**m**<sub>k+1</sub>)
3: sample n<sub>k</sub> ∼ U(0, n<sub>max</sub>)
4: HD<sup>k</sup> ← hd(n<sub>k</sub>, M<sup>k</sup>, L̃<sup>k</sup>, D(P<sup>k</sup>||Q))
5: **m**<sub>k+1</sub> ← model(HD<sup>k</sup>)
6: L(**m**<sub>k+1</sub>) ← likelihood(**m**<sub>k+1</sub>)

Calling "hd(n<sub>k</sub>, M<sup>k</sup>, L̃<sup>k</sup>, D(P<sup>k</sup>||Q))" within a subprocess (algorithm 2, line 4) uses the above strategy to compute a set of n<sub>k</sub> hard conditioning couples. On the other hand, the methods "model(HD<sup>k</sup>)" (line 5) and "likelihood(**m**)" (line 6) are application-dependent functions that generate a new model from a given set of conditioning data and compute the corresponding likelihood value.
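The control flow of algorithm 1 can be mimicked with the Python standard library. The sketch below is only an illustration under several assumptions: threads stand in for the subprocesses, `subprocess_task` is a hypothetical placeholder for algorithm 2 (it draws a toy "model" instead of calling an MPS tool and a forward solver), and a short sleep is added to avoid busy waiting.

```python
import random
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def subprocess_task(n_max, seed):
    """Hypothetical stand-in for algorithm 2: returns (model, likelihood)."""
    rng = random.Random(seed)
    n_k = rng.randint(0, n_max)  # number of conditioning points
    model = {"n_k": n_k, "values": [rng.random() for _ in range(4)]}
    likelihood = 1.0 / (1.0 + sum(model["values"]))  # toy likelihood
    return model, likelihood


def popex_master(n_total, n_par, n_max):
    models, likelihoods = [], []
    manager = deque()  # FIFO queue of job handles
    launched = 0
    with ThreadPoolExecutor(max_workers=n_par) as pool:
        while len(models) < n_total:
            # Lines 5-10: launch a job if a slot is free and work remains.
            if len(manager) < n_par and len(models) + len(manager) < n_total:
                manager.append(pool.submit(subprocess_task, n_max, launched))
                launched += 1
            # Lines 11-18: inspect the job at the head of the queue.
            job = manager.popleft()
            if job.done():
                model, likelihood = job.result()
                models.append(model)
                likelihoods.append(likelihood)
                # The normalized likelihoods and the KLD map would be
                # updated here before further jobs are launched.
            else:
                manager.append(job)  # still running: back of the queue
                time.sleep(0.001)
    return models, likelihoods


models_out, likelihoods_out = popex_master(n_total=20, n_par=4, n_max=5)
```

As noted in the text, the order in which results are collected depends on job timing, so two runs of such a loop need not produce the same chain.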

In practice it might be unclear how to provide a suitable collection Q. Assuming that the involved modeling tool samples from the prior distribution makes it possible to approximate Q. Even before launching the PoPEx algorithm, we could produce a sufficiently large number of unconditioned models and approximate Q by Equation (6). As the effort of generating a model is often negligible with respect to the computation of the likelihood value, the additional cost for approximating Q is unimportant. If a considerable effort is required to generate a model, we could also consider starting with an initial guess of Q and improving it iteratively. However, changing Q along the sampling procedure may render the algorithm unstable.
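The approximation of Q by Equation (6) amounts to a pixel-wise frequency count over unconditioned prior realizations. In the sketch below, the prior realizations are drawn with independent pixels purely for illustration; this is a hypothetical stand-in for an MPS tool, and `estimate_prior_maps` is not a function of any existing library.

```python
import numpy as np


def estimate_prior_maps(prior_models, n_facies):
    """Approximate q_i by the fraction of prior models showing facies i (Equation 6)."""
    prior_models = np.asarray(prior_models)
    return np.stack([(prior_models == i).mean(axis=0) for i in range(n_facies)])


rng = np.random.default_rng(0)
# Stand-in for unconditioned prior draws: 5000 models on a 10-pixel grid.
prior_models = rng.choice(3, size=(5000, 10), p=[0.6, 0.3, 0.1])
q_maps = estimate_prior_maps(prior_models, n_facies=3)
```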

## 2.3. Posterior Prediction of Events

Solving an inverse problem not only serves to represent the posterior measure function, but also aims to compute the (posterior) probability of events A ⊂ M. More generally, we would like to compute integrals with respect to σ, such as

$$
\mu = \int\_{\mathcal{M}} f(\mathbf{m}) d\sigma,\tag{7}
$$

where f(·) is an operator that expresses some quantity of interest. Because the model space M and the posterior measure function σ can be very complex, an analytical solution of these integrals is usually not available. The generic term importance sampling (IS) (Hesterberg, 2003; Robert and Casella, 2004; Liu, 2008; Rubinstein and Kroese, 2016) stands for a framework that provides approximations of such integrals by a weighted sum over a large number of realizations. Because it is often difficult or inefficient to sample directly from the distribution σ, importance sampling suggests instead drawing realizations from a sampling distribution φ and weighting the summands proportionally to the ratio σ(**m**)/φ(**m**). Finding and using an appropriate sampling distribution φ, however, can be challenging.

We propose to consider the PoPEx algorithm as a procedure that iteratively learns and adapts the sampling distribution φ<sub>k</sub>. During this procedure, all the previously generated realizations are combined and used to localize important regions in the model space. This is known as adaptive importance sampling (AIS) and was introduced in an econometric framework (Naylor and Smith, 1988; Oh and Berger, 1992). The generation of a new model **m**<sub>k+1</sub> is understood as randomly drawing one sample according to φ<sub>k</sub>. By construction, this distribution must include the random selection of HD<sup>k</sup> as well as the conditional modeling tool. For each model in a chain of N realizations, we compute a weight ratio w<sub>k</sub> = σ(**m**<sub>k</sub>)/φ<sub>k</sub>(**m**<sub>k</sub>) and estimate the integral µ by

$$
\hat{\mu} = \sum\_{k=1}^{N} f(\mathbf{m}\_k) \tilde{w}\_k. \tag{8}
$$

Again, the tilde notation is used to indicate normalized weights w̃<sub>k</sub> such that

$$
\tilde{w}_k = \frac{w_k}{\sum_{j=1}^{N} w_j}.
$$

Several remarks are worth considering. The computation of Equation (8) only uses normalized weights. Therefore, it is not required to know precisely the normalization constant of either σ or φ. Furthermore, the computation of the weights w<sub>k</sub> can be simplified by using the factorization of σ (cf. Equation 1). Each likelihood value L(**m**<sub>k</sub>) is evaluated during the PoPEx procedure, so that for constructing the weights, it is sufficient to compute the ratio

$$\frac{\rho(\mathbf{m}\_k)}{\phi\_k(\mathbf{m}\_k)}, \quad \text{for } k = 1, \ldots, N. \tag{9}$$

Roughly speaking, this ratio compares the probability measure of generating a model **m**<sub>k</sub> with and without observed data.
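The estimator of Equation (8) can be checked on a toy problem with a five-element model space, where the exact posterior expectation is available. All numbers below are invented for this illustration, and the sampling distribution is kept fixed for simplicity, whereas in PoPEx it changes with k.

```python
import numpy as np

rng = np.random.default_rng(1)
states = np.arange(5)                              # toy model space
rho = np.full(5, 0.2)                              # prior
lik = np.array([0.01, 0.1, 1.0, 0.5, 0.05])        # likelihood values
phi = np.array([0.05, 0.15, 0.4, 0.3, 0.1])        # sampling distribution

samples = rng.choice(states, size=20000, p=phi)
# Unnormalized weights: likelihood times the ratio rho / phi of Equation (9).
w = lik[samples] * rho[samples] / phi[samples]
w_tilde = w / w.sum()                              # normalization constants cancel

f_values = samples.astype(float)                   # quantity of interest f(m) = m
mu_hat = float(np.sum(f_values * w_tilde))         # Equation (8)

sigma = rho * lik / np.sum(rho * lik)              # exact posterior
mu_exact = float(np.sum(states * sigma))
```

The estimate `mu_hat` should approach `mu_exact` as the number of samples grows, even though no sample was drawn from the posterior itself.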

#### Computation of the Sampling Weights

In every iteration of the PoPEx algorithm, a set of location-value pairs is derived and imposed as hard conditioning for the next model. We will show in this section that, when using a pixel-based MPS technique to generate the models, the sampling ratios in Equation (9) only depend on the sets HD<sup>k</sup>. Let us consider HD<sup>k</sup> = {(**x**<sub>1</sub>, v<sub>1</sub>), . . . , (**x**<sub>n<sub>k</sub></sub>, v<sub>n<sub>k</sub></sub>)} and distinguish two events that henceforth will be noted similarly:


Henceforth, we will only consider combinations of **m** and HD<sup>k</sup> that produce strictly positive measure values (i.e., where the values at the n<sub>k</sub> locations coincide). Furthermore, we will assume that all the conditioning pairs in HD<sup>k</sup> are independent of each other. This assumption is reasonable if the conditioning locations are well separated. It is therefore necessary that the number of conditioning points is adequate with respect to the simulation grid. The MPS processes involved behind ρ(**m**|HD<sup>k</sup>) and φ<sub>k</sub>(**m**|HD<sup>k</sup>) are the same. It follows that the two measure values must be equal, and thus

$$\frac{\rho(\mathbf{m})}{\phi\_k(\mathbf{m})} = \frac{\rho(\mathbf{m})}{\rho(\mathbf{m}|HD^k)} \frac{\phi\_k(\mathbf{m}|HD^k)}{\phi\_k(\mathbf{m})}.$$

Using the definition of conditional probabilities, the ratio can be rearranged as

$$\frac{\rho(\mathbf{m})}{\phi\_k(\mathbf{m})} = \frac{\rho(HD^k)}{\rho(HD^k|\mathbf{m})} \frac{\phi\_k(HD^k|\mathbf{m})}{\phi\_k(HD^k)}.$$

Standard techniques from combinatorial probability allow us to express all the above quantities. On the one hand, $\rho(HD^k|\mathbf{m})$ measures the probability of informing the first $n\_k$ pixels according to $HD^k$ when the sampled model is known. But knowing $\mathbf{m}$ implies that the conditioning values in $HD^k$ are given, so that we only need to compute the probability of visiting the $n\_k$ conditioning locations (in any order) at the very beginning of the MPS simulation. If there are n pixels in the simulation grid, $\rho(HD^k|\mathbf{m})$ is given by

$$\rho(HD^k|\mathbf{m}) = \frac{n\_k!(n-n\_k)!}{n!}.$$

On the other hand, because the hard conditioning data is assumed to be independent, ρ(HD<sup>k</sup> ) reads

$$\rho(HD^k) = \frac{n\_k!(n-n\_k)!}{n!} \prod\_{j=1}^{n\_k} \rho(v\_j; \mathbf{x}\_j),$$

where $\rho(v\_j; \mathbf{x}\_j)$ is the prior probability of meeting the value $v\_j$ at location $\mathbf{x}\_j$. In section 2.2, the simulation values have been categorized into the set $\{f\_1, \ldots, f\_s\}$. For a fixed $HD^k$, let us define an index-to-index map $r = r(j)$ such that $f\_{r(j)}$ identifies the category of $v\_j$. An approximation to $\rho(v\_j; \mathbf{x}\_j)$ can be obtained from Equation (6) by specifying $\rho(v\_j; \mathbf{x}\_j) \approx q\_{r(j)}(\mathbf{x}\_j)$. This simply amounts to finding the map $q\_{r(j)}$ that corresponds to the category of $v\_j$ and extracting the probability value at $\mathbf{x}\_j$.

Every iteration contains the following three steps: select a number $n\_k$, sample conditioning locations from $\widetilde{D}(P^k||Q)$, and extract conditioning values by weighting the simulations according to the computed likelihood measures in $\tilde{L}^k$. These steps are performed independently, so that the probability of selecting $HD^k$, knowing $\mathbf{m}$, is measured as

$$\phi\_k(HD^k|\mathbf{m}) = \phi\_k(n\_k) \prod\_{j=1}^{n\_k} \widetilde{D}(P^k||Q)(\mathbf{x}\_j)$$

while similarly (with the hard conditioning data points being independent)

$$\phi\_k(HD^k) = \phi\_k(n\_k) \prod\_{j=1}^{n\_k} \phi\_k(v\_j; \mathbf{x}\_j) \widetilde{D}(P^k || Q)(\mathbf{x}\_j).$$

The value $\phi\_k(n\_k)$ is the probability of selecting $n\_k$, while the measure $\phi\_k(v\_j; \mathbf{x}\_j)$ is the probability of drawing a model (according to $\tilde{L}^k$) that presents the value $v\_j$ at location $\mathbf{x}\_j$. This quantity can again be approximated by using the index-to-index relation $r(j)$ together with the categorical probabilities in $P^k$ (c.f. Equation 4), such that $\phi\_k(v\_j; \mathbf{x}\_j) \approx p\_{r(j)}^k(\mathbf{x}\_j)$. It is worthwhile to note that when working with discrete models, where the categories $\{f\_1, \ldots, f\_s\}$ have a one-to-one relation to the range of all simulation values in the training image, these approximations are exact. Finally, a computable ratio is provided by

$$\frac{\rho(\mathbf{m})}{\phi\_k(\mathbf{m})} = \prod\_{j=1}^{n\_k} \frac{q\_{r(j)}(\mathbf{x}\_j)}{p\_{r(j)}^k(\mathbf{x}\_j)}.\tag{10}$$

This expression is very practical. All the quantities in Equation (10) are assembled during the PoPEx algorithm, so that the effort required for evaluating the ratio is negligible. Moreover, the expression is easily translated into log-probabilities, which can simplify the floating-point representation of the values. Although it only approximates the true ratio, the assumptions are often not too strongly violated, and using the above equation is feasible. Finally, the weights $w\_k$ are computed by correcting the likelihood measure according to the hard conditioning data:

$$w\_k = L(\mathbf{m}\_k) \frac{\rho(\mathbf{m}\_k)}{\phi\_k(\mathbf{m}\_k)} = L(\mathbf{m}\_k) \prod\_{j=1}^{n\_k} \frac{q\_{r(j)}(\mathbf{x}\_j)}{p\_{r(j)}^k(\mathbf{x}\_j)}. \tag{11}$$

The ratios in the correction term compare the prior vs. the likelihood weighted probabilities of observing the selected values at the locations of the conditioning data. These quantities are directly available in the Q and P <sup>k</sup> maps.
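The log-probability form mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `log_weight` and the example probabilities are hypothetical, and the values of the Q and $P^k$ maps at the conditioning locations are assumed to have been looked up beforehand.

```python
import math


def log_weight(log_likelihood, q_at_hd, p_at_hd):
    """log w_k = log L(m_k) + sum_j [log q_r(j)(x_j) - log p^k_r(j)(x_j)].

    q_at_hd and p_at_hd hold the prior (Q) and likelihood-weighted (P^k)
    probabilities of the selected category at each conditioning location.
    """
    if len(q_at_hd) != len(p_at_hd):
        raise ValueError("one q and one p value per conditioning point")
    correction = sum(math.log(q) - math.log(p) for q, p in zip(q_at_hd, p_at_hd))
    return log_likelihood + correction


# Hypothetical example with n_k = 2 conditioning points.
lw = log_weight(math.log(0.5), q_at_hd=[0.2, 0.4], p_at_hd=[0.4, 0.8])
w = math.exp(lw)  # 0.5 * (0.2 / 0.4) * (0.4 / 0.8) = 0.125
```

Summing logarithms instead of multiplying many small probabilities avoids underflow when $n\_k$ is large; with no conditioning points the weight reduces to the likelihood.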

#### 2.3.1. Degeneracy of the Sampling Weights

The estimator $\tilde{\mu}$ in Equation (8) suffers from a degeneracy in the sense that the distribution of $W^N = \{w\_1, \ldots, w\_N\}$ may become increasingly skewed when the dimension of M grows large (Doucet et al., 2001; Robert and Casella, 2004; Liu, 2008). This means that the weights may take small values with high probability but occasionally become very large. Using such weights in Equation (8) would produce estimators that are dominated by very few samples. Several preventive techniques exist, and they often try to consider a reduced dimensionality in the computation of the weights (Doucet et al., 2001; Rubinstein and Kroese, 2016). The expression in Equation (10) uses a reduction technique by limiting the computation of the ratio to the hard conditioning data. But this expression only represents one part of the weights in Equation (11), so that the degeneracy problem still exists. A diagnostic that can be used to assess the skewness of the weights is the **effective sample size** (Owen, 2013), defined as

$$n\_{e}(W^{N}) = \frac{\left(\sum\_{i=1}^{N} w\_{i}\right)^{2}}{\sum\_{i=1}^{N} w\_{i}^{2}} = \frac{N\overline{w}^{2}}{\overline{w^{2}}},\tag{12}$$

where $\overline{w} = (1/N) \sum\_{i=1}^{N} w\_i$ and $\overline{w^2} = (1/N) \sum\_{i=1}^{N} w\_i^2$. There is an obvious link between $n\_e$ and the variance of $W^N$: an estimator of the variance is given by $\overline{w^2} - \overline{w}^2$. Strongly varying weights give $\overline{w}^2 / \overline{w^2} \ll 1$ and therefore $n\_e \ll N$. In general, lowering the variance increases the effective number $n\_e$. In practice, it is often hard to specify a bound below which $n\_e$ is alarmingly small, because this strongly depends on the application.
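As a small illustration of Equation (12), the following Python sketch computes the effective sample size for a uniform and for a strongly skewed set of weights. The weight values are hypothetical and chosen only to show the two extremes.

```python
def effective_sample_size(weights):
    """n_e(W) = (sum_i w_i)^2 / sum_i w_i^2, as in Equation (12)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    if s2 == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return s1 * s1 / s2


uniform = effective_sample_size([1.0] * 100)           # all samples count equally
skewed = effective_sample_size([1.0] * 99 + [1000.0])  # one dominant sample
```

With equal weights the effective sample size equals N = 100, whereas a single dominant weight pushes it close to 1.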

We will now present a method that aims to soften the degeneracy by modifying the variance of the weights. The value of any positive weight $w\_i > 0$ can be changed by exponentiation, $(w\_i)^{\alpha}$, and we know that

$$\lim\_{\alpha \to 0} (w\_i)^{\alpha} = 1.$$

For a given 0 < α < 1, the variance can therefore be reduced by transforming the set W<sup>N</sup> into

$$W\_{\alpha}^{N} = \{ (w\_1)^{\alpha}, \dots, (w\_N)^{\alpha} \}.$$

It is clear that in the limit α → 0, $n\_e(W\_{\alpha}^{N})$ is equal to the total number of positive weights in $W^N$. Before computing an estimator $\hat{\mu}$ from $W^N$, we select an appropriate α and use the weights in $W\_{\alpha}^{N}$ instead. A good choice of α depends on the application and can be challenging. We propose to define a lower bound $l\_0$ and choose α such that

$$n\_{e}(W\_{\alpha}^{N}) = \max\left\{l\_{0}, \ n\_{e}(W^{N})\right\}.\tag{13}$$

The idea of Equation (13) is to ensure that the computation of $\hat{\mu}$ is based on at least $l\_0$ significant models. Furthermore, it ensures that the growth rates of $n\_e(W^N)$ and $n\_e(W\_{\alpha}^{N})$ are equal for $n\_e(W^N) > l\_0$. This might be important for the asymptotic behavior of the method. Finally, we propose to use the pseudo code in Algorithm 3 to compute predictions. The computation of α can be translated into a smooth, 1-dimensional optimization problem and does not require considerable effort. The most important effort usually goes into the evaluation of $f(\mathbf{m}\_i)$. But all the weights are known in advance, and therefore we can omit computations that are associated with zero weights. Furthermore, the iterations in Algorithm 3 are independent and can be performed in parallel.
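One possible way to determine α from Equation (13) is sketched below. The text describes a smooth 1-dimensional optimization; this sketch substitutes a plain bisection, which works because the effective sample size of the tempered weights decreases monotonically as α grows from 0 to 1. The function names, the tolerance and the example weights are assumptions made for illustration.

```python
def ess(weights):
    """Effective sample size of Equation (12)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2


def tempered(weights, alpha):
    """Return the positive weights raised to the power alpha (zeros dropped)."""
    return [w ** alpha for w in weights if w > 0.0]


def find_alpha(weights, l0, tol=1e-10):
    """Bisection for alpha in (0, 1] with ess(W_alpha) = max(l0, ess(W)).

    Bisection is a stand-in chosen for this sketch; ess(W_alpha) decreases
    monotonically as alpha grows, so the root is bracketed by (0, 1].
    """
    positive = [w for w in weights if w > 0.0]
    target = max(float(l0), ess(positive))
    if target <= ess(positive):
        return 1.0                      # weights already meet the bound
    if target >= len(positive):
        raise ValueError("l0 must be below the number of positive weights")
    lo, hi = 0.0, 1.0                   # ess is too large at lo, too small at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(tempered(positive, mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


weights = [1.0] * 99 + [1000.0]         # hypothetical, strongly skewed weights
alpha = find_alpha(weights, l0=50)
```

For these example weights the untempered effective sample size is close to 1, and the returned α lies strictly between 0 and 1 so that the tempered weights reach the requested bound of 50.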

#### 3. CASE STUDY AND RESULTS

In this section, we illustrate how PoPEx performs to solve an inverse problem with an example of a tracer test in a fluvial aquifer. We also consider the problem of quantifying the uncertainty related to the prediction of the capture zone of a pumping well in such geological environments.

#### 3.1. Problem Setup

For this example, the conceptual model for the geological heterogeneity is derived from a 3D simulation of the geological processes occurring in a fluvial plain using the FLUMY software (Lopez et al., 2009). This tool combines a process-based approach with a stochastic component. A meandering river crosses an alluvial valley with a given slope and causes erosion and deposition of sediments. Over time, the river migrates and alters the topography of the alluvial plain. This process generates complex geological patterns with a realistic and highly heterogeneous architecture. However, the thickness of the alluvial sediments is usually negligible with respect to the horizontal dimensions of the plain. It is therefore reasonable to reduce the complexity of the problem and neglect the vertical component of the flow. The parameter of interest is then the transmissivity of the aquifer.

Following this approach, a 2-dimensional training image was generated. It is represented in **Figure 2**. The training image represents a domain of 5,000 × 4,000 m and is subdivided into 1,000 × 800 square pixels. It was obtained by vertical integration of a 3D model generated with FLUMY. The resulting field was categorized into four facies types $f\_1$, $f\_2$, $f\_3$, and $f\_4$ that represent transmissivity values of 10<sup>−5</sup>, 10<sup>−3</sup>, 10<sup>−2</sup>, and 10<sup>−1</sup> m<sup>2</sup>/s, respectively. The drainage porosity and the specific storage were fixed uniformly to 0.2 and 10<sup>−6</sup>.

We then consider a smaller area of size 1,000 × 500 m, discretized into 200 × 100 square pixels. The area hosts a pumping well at the location (750, 250) that extracts 15 l/s of groundwater for a total duration of 20 days. The terrain is exposed to a natural slope of 4‰ in the x-direction, while the basin is closed at y = 0 m and y = 500 m. The corresponding boundary conditions are fixed head values of 4 m (left) and 0 m (right), together with no-flow conditions on the upper and lower boundaries. A constant tracer concentration of 1 kg/m<sup>3</sup> is enforced at (250, 250) throughout the time period.

For any given model, the subsurface water flow together with the tracer transport is computed by the GroundWater simulation software (Arpat and Caers, 2007). At days 2, 4, 6, 8, 10, 12, 15, and 20 the solute concentration is recorded at the pumping well. This provides a set of 8 observations and represents all the data constraints used for conditioning the inverse problem.

For modeling the spatial structures, we used the training image shown in **Figure 2** and the DeeSse multiple-point statistics software (Straubhaar, 2011). An arbitrary seed was used to generate the reference domain in **Figure 3A**. The black triangles indicate the tracer injection (left, pointing right) and the pumping well location (right, pointing left), respectively. The tracer concentration at the pumping well resulting from this reference domain is shown in **Figure 3B**. The red dots indicate the extracted data that was used in the inverse procedure. This means that the entire reference domain in **Figure 3A** is unknown to the PoPEx algorithm. Its only task is to represent an unknown subsurface model and provide a sparse set of data points that can be used in the sampling procedure. For constructing the likelihood measure $L(\mathbf{m})$, we assume the observations to be independent and consider a multivariate normal distribution between the predictions $\mathbf{g}(\mathbf{m}) = \{g\_1(\mathbf{m}), \ldots, g\_8(\mathbf{m})\}$ and observations $\mathbf{d}^{\text{obs}} = \{d\_1^{\text{obs}}, \ldots, d\_8^{\text{obs}}\}$ with a uniform standard deviation of $\sigma\_L$ = 0.0015 kg/m<sup>3</sup>. This represents 1.5‰ of the concentration at the injection point, and roughly 5% of the maximal concentration at the extraction location in the reference domain (c.f. **Figure 3B**). The subscript L distinguishes the standard deviation of the likelihood measure $\sigma\_L$ from the posterior density σ in Equation (1). Assuming a uniform and independent Gaussian behavior of $\mathbf{g}(\mathbf{m})$ around $\mathbf{d}^{\text{obs}}$, the density function of the likelihood measure is proportional to $\exp\left\{ -\frac{1}{2\sigma\_L^2} \sum\_i \left(g\_i(\mathbf{m}) - d\_i^{\text{obs}}\right)^2 \right\}$.
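The Gaussian likelihood described above is conveniently evaluated in logarithmic form. The following Python sketch is illustrative only: the observation values are hypothetical, and the function returns the unnormalized log-likelihood, which is sufficient because only normalized weights enter the predictions.

```python
SIGMA_L = 0.0015  # kg/m^3, standard deviation stated in the text


def log_likelihood(predicted, observed, sigma=SIGMA_L):
    """Unnormalized log L(m) = -sum_i (g_i(m) - d_i^obs)^2 / (2 sigma^2)."""
    if len(predicted) != len(observed):
        raise ValueError("predictions and observations must have equal length")
    misfit = sum((g - d) ** 2 for g, d in zip(predicted, observed))
    return -misfit / (2.0 * sigma ** 2)


# Hypothetical concentrations (kg/m^3) at the 8 observation times.
d_obs = [0.000, 0.002, 0.008, 0.015, 0.021, 0.025, 0.028, 0.030]
g_m = [d + SIGMA_L for d in d_obs]   # every prediction is off by one sigma

ll = log_likelihood(g_m, d_obs)      # 8 * (-1/2) = -4, up to round-off
```

A model whose 8 predictions each deviate by one standard deviation therefore has a log-likelihood of about −4, while a perfect match gives 0.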

#### 3.2. Tracer Breakthrough Curve

PoPEx was run on the above problem for a total of N = 20,000 models with $n\_{\max}$ = 25. Three random realizations are shown in **Figure 4**.

The prior facies probabilities in Q were computed from 500 unconditioned MPS models. For each realization in the PoPEx chain, the algorithm computed the tracer concentration at the pumping well, extracted 8 data points and compared them to the reference data in **Figure 3B**. Together with the weights from Equation (11), the posterior distribution of the tracer breakthrough curve can be computed. **Figure 5** shows the 2.5–97.5% (dashed), 25–75% (full) and average (blue) curves of the prior and the posterior tracer concentration at the pumping well.

The red dots indicate the extracted reference data. For any sampling strategy, a critical measure is the required computational effort, which is usually proportional to the number of samples. For this reason, all results are shown for two different stages in the sampling procedure: after 10,000 and after 20,000 realizations. At first glance, both estimations of the posterior probabilities are quite similar. This may be surprising when keeping in mind that the computational effort for the second estimation is twice as high. However, it can be seen that the probability lines are steadier and smoother in the last image. Both estimations of the 50% region (between the full lines) fully embed the red reference data and follow the shape of the reference curve very precisely. The estimation of the posterior expectation (blue) almost matches the entire curve. The higher density of data points in the first 10 days increases the relative importance of this period with respect to the second half. Thus, it is reasonable to allow less uncertainty at the beginning of the simulation. The more generous 95% regions (between the dashed curves) are still appropriate in reproducing the shape of the reference curve. This is even more significant when realizing that the prior distribution is far from being centered around the reference curve.
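Percentile curves of the kind described in this section can be obtained from weighted samples, one time step at a time. The Python sketch below uses one simple convention for a weighted quantile (the smallest value whose cumulative normalized weight reaches the requested level); the text does not state which convention was used, and the concentrations and weights are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q (0 < q <= 1)."""
    total = sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight / total
        if cumulative >= q:
            return value
    return max(values)  # guards against round-off when q is close to 1


# Hypothetical concentrations at one observation time for four models.
conc = [0.010, 0.020, 0.030, 0.040]
w = [1.0, 1.0, 6.0, 2.0]

median = weighted_quantile(conc, w, 0.5)
lower = weighted_quantile(conc, w, 0.025)
upper = weighted_quantile(conc, w, 0.975)
mean = sum(c * wi for c, wi in zip(conc, w)) / sum(w)
```

Repeating this computation for every time step yields the dashed, full and average curves; the heavily weighted third model determines the median in this toy example.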

#### 3.3. Predict 10-Days Capture Zone

In practice, when producing freshwater from an aquifer, it is often crucial to protect the resource and determine the capture zone (Leeuwen et al., 1998). Here, we used the results of the PoPEx model chain for predicting the posterior probabilities of the 10-days capture zone. This means that for each location in the simulation grid, we computed a Bernoulli probability value for the water to be captured within 10 days. **Figure 6** shows the predicted probabilities for the prior distribution and for the posterior distribution after 10,000 and 20,000 iterations, respectively.
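The pixel-wise Bernoulli probabilities can be viewed as a weighted average of binary capture-zone maps, one per model. The following Python sketch assumes that such 0/1 maps have already been produced by the flow simulation (which is outside the scope of the sketch); the function name, the flattened four-pixel grid and the weights are hypothetical.

```python
def capture_probability(indicator_maps, weights):
    """Pixel-wise weighted mean of 0/1 capture-zone maps (one map per model)."""
    total = sum(weights)
    n_pixels = len(indicator_maps[0])
    prob = [0.0] * n_pixels
    for zone, weight in zip(indicator_maps, weights):
        if weight == 0.0:
            continue                     # zero-weight models contribute nothing
        for i, captured in enumerate(zone):
            prob[i] += weight * captured / total
    return prob


# Three hypothetical models on a flattened grid of four pixels;
# 1 means the water at that pixel reaches the well within 10 days.
zones = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
w = [0.5, 0.25, 0.25]

p = capture_probability(zones, w)
```

In this toy example the first pixel is captured in every model and the last pixel in none, so their probabilities are 1 and 0, respectively.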

As expected, since the tracer arrives at the pumping well in less than 10 days, the injection point is located within a region having a high probability to belong to the 10-days capture zone. This is already clearly visible in the map generated from 10,000 realizations. These results show the existence of a connected path of high transmissivity between the injection point and the pumping well. However, zones of lower probability are located in between these two points. This indicates that the position of the channel is not well identified from these tracer data alone. In the reference domain, shown in **Figure 3**, we can see that the yellow facies (with the largest transmissivity value) first shows a very tight upward bend before heading almost directly toward the extraction well. The injected tracer will mostly follow the region with the largest transmissivity. Therefore, it will not take a direct path toward the well and its arrival time will be delayed. The only information that can be extracted from the observations is the delay. From the available data, it is therefore impossible to predict precisely water pathways that are far from the tracer injection, and the algorithm is correctly informing us about that uncertainty.

It is interesting that the reference capture zone (red line) slightly passes outside the 95% region in the top section of the computational domain. This should not be interpreted as an inaccuracy of the PoPEx method, because it similarly appears in the approximation of the exact solution in **Figure 7**. However, it indicates that the training image (prior knowledge) together with the available observations (likelihood function) make the upward extension of the reference zone very unlikely in terms of the posterior probability.

FIGURE 6 | Prior and posterior 10 days capture zone probabilities. The curves indicate 5, 25, 50, 75, and 95% regions, while the red line delineates the capture zone in the reference domain.

#### 3.4. Convergence and Parallel Behavior

The synthetic inverse problem described above allows us to compute exact predictions from a sufficiently large set of models. To do so, we set $n\_{\max}$ = 0 and generated an empirical reference set of 1,000,000 unconditioned realizations. From this large ensemble, any prediction can be computed accurately by using Equation (8) together with weights $w\_k = L(\mathbf{m}\_k)$ (c.f. Equation 11). As the reference set is sufficiently large, the degeneracy problem described in section 2.3 can be ignored. The resulting predictions are considered to be the exact solutions and are denoted by $\mu\_{\text{ex}}$. Although the number of realizations is very large, it is not strictly rigorous to call these solutions exact. Nevertheless, they are very accurate approximations of the true solution, so that in this work we call them "exact prediction" or "exact solution." The corresponding prediction of the 10-days capture zone probability is shown in **Figure 7**.

Once an exact solution is available, we might be interested in the convergence speed of the PoPEx algorithm. Therefore, after each iteration k = 1, . . . , N, a prediction $\hat{\mu}\_k$ is computed by using Algorithm 3 and compared to $\mu\_{\text{ex}}$. As mentioned earlier, these two maps define Bernoulli probability values for each point in the computational domain. Each value is the probability that the groundwater at the corresponding location belongs to the 10 days capture zone. A convenient distance between two Bernoulli probability maps $\hat{\mu}\_k$ and $\mu\_{\text{ex}}$ is the Jensen-Shannon divergence (JSD) [e.g., Lin (2006)], which reads

$$J(\hat{\mu}\_k || \mu\_{\text{ex}}) = \frac{1}{2} \Big( D(\hat{\mu}\_k || m) + D(\mu\_{\text{ex}} || m) \Big),$$

with $m = (\hat{\mu}\_k + \mu\_{\text{ex}})/2$ and D being the Kullback-Leibler divergence as in Equation (5). This distance measure is computed pointwise over the simulation grid and therefore defines one distance value per pixel. A (scalar) error value is then obtained by computing the spatial average of the Jensen-Shannon divergence map.
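The error measure described above can be sketched in a few lines of Python. Equation (5) is not reproduced in this section, so the sketch assumes the standard Kullback-Leibler divergence between two Bernoulli distributions with natural logarithms and the convention 0 · log 0 = 0; the maps are hypothetical and flattened to lists.

```python
import math


def kl_bernoulli(p, m):
    """Kullback-Leibler divergence D(p || m) between two Bernoulli laws."""
    terms = 0.0
    for a, b in ((p, m), (1.0 - p, 1.0 - m)):
        if a > 0.0:                      # convention: 0 * log(0 / b) = 0
            terms += a * math.log(a / b)
    return terms


def jsd_bernoulli(p, q):
    """Jensen-Shannon divergence with mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * (kl_bernoulli(p, m) + kl_bernoulli(q, m))


def mean_jsd(map_a, map_b):
    """Spatial average of the pixel-wise Jensen-Shannon divergence."""
    return sum(jsd_bernoulli(a, b) for a, b in zip(map_a, map_b)) / len(map_a)


mu_hat = [0.0, 0.5, 1.0, 0.9]            # hypothetical prediction
mu_ex = [0.0, 0.5, 0.0, 0.9]             # hypothetical reference
error = mean_jsd(mu_hat, mu_ex)
```

Only the third pixel differs in this example, and it does so maximally, so the scalar error equals log(2)/4. The mixture m is never zero where p or q is positive, which keeps every logarithm finite.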

**Figure 8** shows the evolution of the error between $\hat{\mu}\_k$ and $\mu\_{\text{ex}}$ with respect to the iteration k. To increase the statistical significance of the results, every curve represents the average performance of 5 similar runs with different initial seeds. First, the minimum number of effective models $l\_0$ was fixed to 100 and we varied the maximum number of conditioning values $n\_{\max} \in \{10, 25\}$. **Figure 8A** shows that the two convergence curves are quite similar. This is not surprising, because the PoPEx algorithm is designed to correct the influence of the hard conditioning by using Equation (10). It follows that for a reasonable hard conditioning bound, the results are not highly sensitive to the choice of $n\_{\max}$. On the other hand, it can be seen from the blue curve that for $n\_{\max}$ = 10 and k > 9,000 the error reaches a "plateau." This signifies that for a certain time, the PoPEx algorithm was not able to further improve the prediction or, in other words, that the method could not find sufficiently important realizations. From such behavior it can be deduced that the learning effect must be reinforced by increasing $n\_{\max}$. However, what is important is that the overall convergence rate of both curves compares well with the dashed line representing $k^{-1/2}$. This is significant because if we directly sample from the posterior probability distribution σ and k is the number of samples, the Central Limit Theorem (CLT) (Durrett, 2010) predicts a convergence rate of $k^{-1/2}$. Because the error curves represent the average performance of only 5 PoPEx runs, it is not surprising that they slightly fluctuate and do not reproduce the theoretical rate of $k^{-1/2}$ precisely.

For the second experiment, we fixed $n\_{\max}$ = 25 and varied $l\_0 \in \{0, 25, 100, 500\}$. We recall that the choice of a large value for $l\_0$ generally increases the effective number of weights but implies a risk of producing biased predictions. On the other hand, when the effective number of weights is too low, the predictions will be based on very few models and may be biased as well. It is therefore not surprising that for $l\_0$ = 0 the approximation accuracy is very poor (green curve in **Figure 8B**). However, the remaining three convergence curves are highly similar for k ≤ 4,000, where the magenta curve ($l\_0$ = 25) reaches a "plateau" and has difficulty further improving the approximations. As in the previous figure, the curves represent the average performance of 5 similar PoPEx runs with different initial seeds. It follows that small fluctuations may arise and should not be overinterpreted. However, the stagnation of the curve with $l\_0$ = 25 might have a different cause. Whenever the parameter $l\_0$ is small, the weights in $W\_{\alpha}^{k}$ are more sensitive to highly dominant values. This means that a model $\mathbf{m}\_{k\_0}$ with a very large weight $w\_{k\_0}$ might dominate the prediction $\hat{\mu}\_k$ for many iterations $k > k\_0$, so that the approximation error changes only slightly. Such behavior indicates that $l\_0$ should not be too small. We can again conclude that for a reasonably large $l\_0$, the overall convergence rate compares very well with the theoretical rate of $k^{-1/2}$.

The last part of the results section is dedicated to a short analysis of the parallel scalability of PoPEx. We repeat the same exercise by first using $n\_{\text{par}}$ = 15 on a 64 CPU facility (34.4 Tflop/s), and then changing to $n\_{\text{par}}$ = 75 on 320 CPUs (172 Tflop/s). Between the first and the second procedure, the computational capacity was therefore increased by a factor of 5. The performances are compared by measuring the total sampling time and by a convergence analysis similar to the one in **Figure 8**. We fixed $n\_{\max}$ = 25 and $l\_0$ = 100 and ran PoPEx until 20,000 models had been sampled. All runs were performed 5 times with different initial seeds. The total runtime for the two setups was 27.51 ± 1.521 h and 5.00 ± 0.397 h, respectively. This corresponds to an overall speedup factor of 5.5 ± 0.74 and therefore fully satisfies the expectations.
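The reported speedup can be checked with a few lines of arithmetic. The text does not say how the uncertainty of the ratio was obtained; the Python sketch below assumes a first-order propagation in which the relative errors of the two runtimes are added linearly, which is the rule that reproduces the stated ± 0.74.

```python
t1, dt1 = 27.51, 1.521   # hours, n_par = 15 on 64 CPUs (values from the text)
t2, dt2 = 5.00, 0.397    # hours, n_par = 75 on 320 CPUs

speedup = t1 / t2
# First-order error propagation with relative errors added linearly
# (an assumption; it is the rule that reproduces the reported +/- 0.74).
speedup_err = speedup * (dt1 / t1 + dt2 / t2)

resource_ratio = 320 / 64   # increase in computational capacity
```

The measured speedup of about 5.5 is compatible, within its uncertainty, with the fivefold increase in resources.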

Considering the convergence analysis in **Figure 8**, we are now interested in the speedup factor for obtaining the same approximation accuracy when predicting the 10-days capture zone probability. This means that in each iteration k = 1, . . . , 20,000, the approximation errors are again computed by a Jensen-Shannon divergence between the prediction and the exact solution. In **Figure 9A**, however, we compare the approximation error with the elapsed time in seconds.

It can be seen that the convergence rates of both curves are very similar and that the obtained gain factor closely matches the increase in computing resources. This becomes even more obvious in **Figure 9B**, which shows the observed speedup in time for obtaining the same approximation accuracy. This means that for any error value (y-axis) we computed the times (and the corresponding speedup factor) that were needed for reaching the considered approximation accuracy. Given the relatively small statistical set of 5 chains per exercise, it is not surprising that there is a certain variability in the computations. However, it is evident that the curve closely matches the predicted speedup factor of 5 and therefore underlines the exceptional scaling behavior of the PoPEx algorithm.

#### 4. DISCUSSION

This paper presents a fast and efficient sampling method for solving inverse problems having a complex and discrete prior. The algorithm is parallelized and scales perfectly. This means that the number of samples computed in parallel is equal to the time reduction factor without compromising the quality of the results. Every sample involves two different main processes: generate a new model and compute the corresponding likelihood value. In this regard, the main concern for using the proposed method in practice is the number of such processes that can be run simultaneously. As there are many supercomputers publicly available however, handling a significant number of computations in parallel should not be a major issue.

Some important concepts of the above algorithm were originally introduced by Jäggli et al. (2017), where the inverse method was named Posterior Population Expansion (PoPEx). In the present paper we suggest some minor changes concerning the sampling procedure and completely reconsider the method to compute predictions. Nevertheless, we decided to keep the name of the algorithm, so that whenever the term PoPEx is used in the following, it refers to the algorithm as presented in this paper. PoPEx is capable of handling all four types of uncertainty distinguished by Sagar et al. (1975): spatial heterogeneities, initial conditions, boundary conditions and sources/sinks. The only requirement for the algorithm to be efficient is that some uncertainties are modeled by conditional simulation tools.

As illustrated in the case study, a possibility is to use Multiple Point Statistics (MPS) to produce the conditional simulations of heterogeneity. But whenever MPS tools are used, a critical issue is to select an appropriate training image. In practice, it is therefore not uncommon to hesitate about this choice. With the above method, multiple training images can be included. This corresponds to a discrete choice that needs to be formulated in the inverse problem. PoPEx can iteratively learn which image is most appropriate and provide a posterior distribution of the training image selection issue.

PoPEx has been tested on a two-dimensional meandering channel aquifer of size 1,000 × 500 m. A natural gradient of 4‰ and a groundwater extraction rate of 15 l/s control the groundwater flow. Considering the high complexity of the categorical models, together with the small number of extracted data points, the method solved the inverse problem efficiently and produced accurate estimations of prediction uncertainty. After a very large computational effort, we were able to compute the exact solution and compare it with the predictions made by a PoPEx chain. It was shown empirically that the prediction converged to the exact solution very fast. The convergence speed was comparable with the theoretical rate of $k^{-1/2}$ predicted by the central limit theorem (where k is the number of samples). Furthermore, we demonstrated that the PoPEx results are not very sensitive to the choice of the two main input variables $n\_{\max}$ and $l\_0$. This is very convenient, because there is no uniform criterion for their optimal choice.

In section 2, we mentioned that PoPEx can be interpreted as an adaptive importance sampler (AIS). According to Oh and Berger (1992), the sampling distribution φ<sup>k</sup> of an AIS technique should follow three properties:


The first property depends on the conditional simulation tool employed to generate new models and is usually satisfied. Regarding the third property, it can be shown that the sampling distribution that minimizes the variance of $\hat{\mu}\_k$ in Equation (8) is proportional to $f \cdot \sigma$. When working with prior distributions ρ that are fairly flat over the region where $f(\mathbf{m})L(\mathbf{m})$ is concentrated, taking a sampling distribution proportional to $f \cdot L$ is nearly optimal (Oh and Berger, 1992). But as samples may be used to generate predictions for many different functions, PoPEx tries to learn a sampling distribution according to the likelihood values $L(\mathbf{m})$ (c.f. Equation 4). However, the link between L and $\phi\_k$ must not be too strong. Let us assume that for a sufficiently large k, the sampling distribution is approximately proportional to $L^r$ for a given power r > 1. In this case we have

$$\frac{\sigma}{\phi\_k} \propto \frac{\rho}{L^{r-1}}.$$

For a flat distribution ρ and an infinite model space M this ratio might be unbounded so that the variance of Equation (8) is not finite.

The main limitation of the PoPEx method is that the likelihood values in Equation (8) must be evaluated and represented by a floating-point number. If the dimension of the data space is very large, it may happen that the numerical likelihood values are zero for most realizations. In this case, most of the indicator functions in Equation (4) are multiplied by zero and the learning process of the method is very slow. But if the number of observations is large, it is not uncommon that they are highly correlated. This means that it might be possible to trim the data set and project the observations onto a smaller data space. In other words, a possible strategy to overcome this issue would be to analyze the set of observations, extract a smaller amount of independent information and define an appropriate likelihood function. Alternatively, the likelihood function may be written as a Gibbs field (or measure) (Winkler, 2012), i.e.,

$$L(\mathbf{m}) = \frac{1}{C} \exp\{-H(\mathbf{m})\}.$$

Such a measure is induced by a normalization constant C and an energy function H. The latter is unique up to an additive constant and therefore, for finite model spaces as well as for Gaussian distributions, we may assume that H ≥ 0. Usually, for floating point operations it is easier to work with the energy $H(\mathbf{m})$ rather than with the unnormalized Gibbs measure $\exp\{-H(\mathbf{m})\}$ directly. During the evolution of the PoPEx algorithm, whenever $P^k$ is computed from Equation (4), we could weight the indicator functions $\mathbf{1}\_{f\_i}(\mathbf{m}\_j)$ proportional to

$$\frac{1}{1 + H(\mathbf{m}\_j)}$$

rather than $\tilde{L}(\mathbf{m}\_j)$. For the computation of ergodic predictions however, we would still need to compute the likelihood values. But considering the underlying floating point operations, it can be advantageous to learn from non-zero energy values $H(\mathbf{m})$ in order to obtain a sufficiently large number of non-zero likelihood values $L(\mathbf{m})$.
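The floating-point argument can be made concrete with a short Python sketch. It assumes the Gaussian energy $H(\mathbf{m}) = \sum\_i (g\_i - d\_i)^2 / (2\sigma^2)$ and a hypothetical data set of 10,000 observations; the function names are illustrative.

```python
import math


def energy(predicted, observed, sigma):
    """Energy H(m) = sum_i (g_i - d_i)^2 / (2 sigma^2), so L is prop. to exp(-H)."""
    return sum((g - d) ** 2 for g, d in zip(predicted, observed)) / (2.0 * sigma ** 2)


def energy_weight(h):
    """Learning weight 1 / (1 + H) suggested in the text; positive for H >= 0."""
    return 1.0 / (1.0 + h)


# Hypothetical large data set: 10,000 observations, each misfit equal to sigma.
n_obs, sigma = 10_000, 1.0
h = energy([1.0] * n_obs, [0.0] * n_obs, sigma)   # H = n_obs / 2 = 5000

likelihood = math.exp(-h)      # underflows to 0.0 in double precision
weight = energy_weight(h)      # about 2e-4, still usable for learning
```

Even though the likelihood of this model is numerically zero, the energy-based weight remains positive, so the model can still contribute to the learning of the probability maps.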

#### DATA AVAILABILITY

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

#### AUTHOR CONTRIBUTIONS

All authors conceived of the presented idea and developed the theory. CJ designed and wrote the software in order to perform the computations. All authors designed the case study, discussed the results and contributed to the final manuscript.

#### REFERENCES


#### ACKNOWLEDGMENTS

We would like to thank Przemyslaw Juda for his contributions toward the generation of the conceptual training image. This work was funded by the Swiss National Science Foundation through the grant number 153637.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Jäggli, Straubhaar and Renard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Efficient gHMC Reconstruction of Contaminant Release History

David A. Barajas-Solano<sup>1</sup>, Francis J. Alexander<sup>2</sup>, Marian Anghel<sup>3</sup> and Daniel M. Tartakovsky<sup>4</sup>\*

<sup>1</sup> Pacific Northwest National Laboratory, Computational Mathematics, Richland, WA, United States, <sup>2</sup> Brookhaven National Laboratory, Computational Science Initiative, Upton, NY, United States, <sup>3</sup> Los Alamos National Laboratory, Computer, Computational, and Statistical Sciences Division, Los Alamos, NM, United States, <sup>4</sup> Department of Energy Resources Engineering, Stanford University, Stanford, CA, United States

We present a generalized hybrid Monte Carlo (gHMC) method for fast, statistically optimal reconstruction of release histories of reactive contaminants. The approach is applicable to large-scale, strongly nonlinear systems with parametric uncertainties and data corrupted by measurement errors. The use of discrete adjoint equations facilitates numerical implementation of gHMC without putting any restrictions on the degree of nonlinearity of advection-dispersion-reaction equations that are used to describe contaminant transport in the subsurface. To demonstrate the salient features of the proposed algorithm, we identify the spatial extent of a distributed source of contamination from concentration measurements of a reactive solute.

#### Edited by:

Philippe Renard, Université de Neuchâtel, Switzerland

#### Reviewed by:

Guillaume Pirot, Universität Lausanne, Switzerland

Roseanna M. Neupauer, University of Colorado System, United States

#### \*Correspondence:

Daniel M. Tartakovsky, tartakovsky@stanford.edu

#### Specialty section:

This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science

Received: 12 November 2018 Accepted: 17 September 2019 Published: 11 October 2019

#### Citation:

Barajas-Solano DA, Alexander FJ, Anghel M and Tartakovsky DM (2019) Efficient gHMC Reconstruction of Contaminant Release History. Front. Environ. Sci. 7:149. doi: 10.3389/fenvs.2019.00149

Keywords: source identification, contaminant transport, Markov Chain Monte Carlo, hybrid Monte Carlo, inverse problems, uncertainty quantification

# 1. INTRODUCTION

An accurate reconstruction of the release history of contaminants in geophysical systems is essential to regulatory and remedial efforts. These efforts rely on measurements of pollutant concentration to identify the sources and/or release history of a pollutant. Unfortunately, available concentration data are typically sparse in both space and time and are corrupted by measurement errors. Source identification and reconstruction of release history are further complicated by both spatial heterogeneity of model parameters and their insufficient characterization, although we do not consider these effects in the present work.

Detailed reviews of the historic developments and state-of-the-art in the field of inverse modeling as related to contaminant source identification are presented in Atmadja and Bagtzoglou (2001b) and Hutchinson et al. (2017). The existing approaches can be subdivided into two broad classes: deterministic and probabilistic. Deterministic approaches include, but are not limited to, Tikhonov regularization of convolution integrals (Liu and Ball, 1999; Ito and Jin, 2015), least-squares estimation from analytical approximations (Butcher and Gauthier, 1994), least-squares solution of an optimal control problem (Gugat, 2012), the method of quasi-reversibility (Skaggs and Kabala, 1995; Bagtzoglou and Atmadja, 2003), and the backward beam equation method (Atmadja and Bagtzoglou, 2001a; Bagtzoglou and Atmadja, 2003). These approaches provide estimates of the release history from sources at known locations and are not designed for quantifying the uncertainty associated with these estimates. These methods are highly sensitive to measurement errors, and their mathematical formulations are often fundamentally ill-posed.

While existing probabilistic approaches, such as random walk particle tracking for the backward transport equation (Bagtzoglou et al., 1992), minimum relative entropy (Woodbury and Ulrych, 1996), and adjoint methods (e.g., Neupauer and Wilson, 1999), alleviate some of these problems, others remain. For example, these and similar methods do not take advantage of the regularizing nature of the measurement noise and, hence, are often ill-posed. Thus, the minimum relative entropy method treats concentration measurements as ensemble averages. Additionally, there are some outstanding issues with quantifying uncertainty (Neupauer et al., 2000) and the inability of many existing approaches to handle more than one observation point (Neupauer and Wilson, 2005).

Finally, most existing approaches to the reconstruction of release history are restricted to linear transport phenomena, that is, transport phenomena for which the transport equation is linear in the concentration, and thus are limited to migration of contaminants that are either conservative (all the references above) or exhibit first-order (linear) reaction rates (Neupauer and Wilson, 2003, 2004). This is because such approaches are based on either Green's functions (Skaggs and Kabala, 1994; Woodbury and Ulrych, 1996; Stanev et al., 2018) or analytically derived adjoint equations (Neupauer and Wilson, 1999, 2005). The use of Kalman filters for source identification (Herrera and Pinder, 2005) is formally limited to linear transport phenomena and Gaussian errors. While both limitations can be relaxed by employing various generalizations of the Kalman filter such as the extended and ensemble Kalman filter (e.g., Xu and Gómez-Hernández, 2016, 2018), these generalizations are known to fail if the nonlinearity is too strong. Bayesian optimization approaches (Pirot et al., 2019), accelerated by the use of Gaussian process models as surrogates, provide a promising alternative to the Kalman filter since they impose no linearity requirements.

Purely statistical approaches to history reconstruction, such as the geostatistical inversion with Bayesian updating (Snodgrass and Kitanidis, 1997) and various machine learning techniques (Vesselinov et al., 2018, 2019), are applicable to nonlinear transport. Since this is achieved by ignoring governing equations, the reconstructed release histories could have nonphysical characteristics, including negative concentrations. These problems have been alleviated by introducing additional constraints into an optimization functional and requiring the reconstructed field to be Gaussian (Michalak and Kitanidis, 2003, 2004a). Combining these geostatistical approaches with analytically derived adjoint equations (Michalak and Kitanidis, 2004b; Shlomi and Michalak, 2007) however brings back the linearity requirement.

We present an optimal reconstruction of contaminant release history that fully utilizes all available information and requires neither the linearity of governing transport equations nor the Gaussianity of the underlying fields. In section 2 we formulate the problem of reconstructing the contaminant release history from noisy observations. Section 3 introduces our general computational framework, which is further implemented in section 4 for various examples.

# 2. PROBLEM FORMULATION

## 2.1. Reconstruction of Contaminant Release History

We consider migration of a single chemically active contaminant in a porous medium Ω ⊂ ℝ<sup>d</sup>, d ∈ [1, 3]. We assume that reactive transport is adequately described by the advection-dispersion-reaction equation with reaction term R(c):

$$\frac{\partial c}{\partial t} = \nabla \cdot (\mathbf{D} \nabla c) - \nabla \cdot (\mathbf{u} c) - R(c) + r(\mathbf{x}, t), \quad \mathbf{x} \in \Omega, \quad t > 0, \tag{1}$$

together with corresponding boundary conditions. Here c = c(**x**, t) is the solute concentration at point **x** and time t, **u** is the average macroscopic pore velocity, **D** is the dispersion coefficient tensor, and r(**x**, t) is the source function. Both the location and duration of the contaminant release, i.e., the source function r(**x**, t), can be unknown, but only the former source of uncertainty is treated in the computational examples of section 4, that is, we assume that r(**x**, t) = r(**x**)δ(t).

Introducing the dimensionless quantities

$$\begin{aligned} c' &= \frac{c}{c\_0}, \quad x'\_i = \frac{x\_i}{x\_0}, \quad i \in [1, d], \quad t' = \frac{t}{t\_0}, \quad \mathbf{D}' = \mathbf{D} \frac{t\_0}{x\_0^2}, \\ \mathbf{u}' &= \mathbf{u} \frac{t\_0}{x\_0}, \quad r' = r \frac{t\_0}{c\_0}, \end{aligned}$$

and the normalized reaction term R′ = Rt<sub>0</sub>/c<sub>0</sub>, we rewrite (1) in terms of dimensionless quantities,

$$\frac{\partial c'}{\partial t'} = \nabla' \cdot \left( \mathbf{D}' \nabla' c' \right) - \nabla' \cdot \left( \mathbf{u}' c' \right) - R' + r', \qquad \mathbf{x}' \in \Omega', \quad t' > 0. \tag{2}$$

The non-dimensionalization of the reaction term is specific to its functional form. The non-dimensionalization of a particular case employed in this manuscript is discussed in section 4. For all the following discussions we will consider the dimensionless Equation (2) and we will drop the prime notation for denoting dimensionless quantities.

For the remainder of this work we assume that **u**, **D**, and the boundary conditions are known, while the source function r(**x**, t) is unknown. Our goal is to reconstruct the release history r(**x**, t) from concentration data c̄<sub>mi</sub> = c̄(**x**<sub>m</sub>, t<sub>i</sub>) collected at points {**x**<sub>m</sub>}<sub>m=1</sub><sup>M</sup> at times {t<sub>i</sub>}<sub>i=1</sub><sup>I</sup>.

Concentration measurements are corrupted by measurement errors. We assume that the measured concentrations c̄<sub>mi</sub> differ from the true concentration by an additive measurement noise, so that

$$
\bar{c}\_{mi} = c(\mathbf{x}\_m, t\_i) + \epsilon\_{mi} \tag{3}
$$

where the errors ϵ<sub>mi</sub> are zero-mean Gaussian random variables with covariance E[ϵ<sub>mi</sub>ϵ<sub>nj</sub>] = δ<sub>ij</sub>R<sub>mn</sub>, where E[·] is the expectation operator, δ<sub>ij</sub> denotes the Kronecker delta, and R<sub>mn</sub>, m, n ∈ [1, M], are components of the spatial covariance matrix **R** ∈ ℝ<sup>M×M</sup> of measurement errors. This treatment of measurement noise assumes that the measurements are sufficiently separated in time for temporal correlations to be negligible, but the model can be easily extended to include temporal correlations. We use the additive error model (3) of Woodbury and Ulrych (1996) rather than the multiplicative error model of Skaggs and Kabala (1994) for the purpose of illustration only. Both models have similar effects on the accuracy of history reconstruction (Neupauer et al., 2000) and can be handled by our approach.

In a typical situation, one has prior information (or a belief) about potential sources of contamination (a region Ω<sub>c</sub> within the flow domain Ω) and a time period [T<sub>l</sub>, T<sub>u</sub>] during which the release has occurred. Examples of Ω<sub>c</sub> include spatially distributed zones of contamination (e.g., landfills) and a collection of point sources (e.g., localized/small industrial sites or storage facilities), some of which have contributed to contamination. The lower (T<sub>l</sub>) and upper (T<sub>u</sub>) bounds of the release interval might represent the time when a landfill became operational and the time when contamination was first detected, respectively. In the absence of prior information about the release occurrence, one can assume a uniform random distribution of the release in [T<sub>l</sub>, T<sub>u</sub>] × Ω. We allow for an arbitrary number of measurement points and for either discrete or continuous-in-time measurements.

## 2.2. Likelihood Function

To simplify the exposition, we assume a spatially distributed chemical release at time t = 0 only, i.e., r(**x**, t) = c(**x**, 0)δ(t). Given the measurements c̄(**x**<sub>m</sub>, t<sub>i</sub>) and the noise model (3), our goal is to determine the likelihood of a given release configuration c(**x**, 0). Unfortunately, the measurements, generally taken at later times, do not estimate directly the likelihood of a release configuration. Nevertheless, because the transport Equation (2) is deterministic, we can implicitly assess the likelihood of a given release configuration, P[c(**x**, 0)], from the probability (likelihood) of a given (computed) concentration history c(**x**, t). This likelihood can be expressed as (Alexander et al., 2005)

$$P[c(\mathbf{x}, 0)] \sim \exp\{-\tilde{H}\_{\text{obs}}[c(\mathbf{x}, t)]\},\tag{4}$$

where H̃<sub>obs</sub>[c(**x**, t)] is the so-called "Hamiltonian" or log-likelihood function,

$$\tilde{H}\_{\text{obs}}[c(\mathbf{x}, t)] = \frac{1}{2} \sum\_{i=1}^{I} \sum\_{m, n=1}^{M} \Delta\_{mi} (\mathbf{R}^{-1})\_{mn} \Delta\_{ni},\tag{5}$$

where Δ<sub>mi</sub> ≡ c(**x**<sub>m</sub>, t<sub>i</sub>) − c̄(**x**<sub>m</sub>, t<sub>i</sub>), and **R** is the covariance matrix of measurement errors, as defined in section 2.1. Since (2) uniquely determines the evolution of the solute concentration from its initial state c(**x**, 0), the Hamiltonian (5) is a nonlinear functional of the initial conditions c(**x**, 0), i.e., H̃<sub>obs</sub>[c(**x**, t)] = H<sub>obs</sub>[c(**x**, 0)].

This formulation assumes that the measurement errors ϵ<sub>mi</sub> are Gaussian and uncorrelated with the state of the system. Other distributions of the measurement noise and the stochasticity of governing equations can be handled as well (Alexander et al., 2005). The Hamiltonian for stochastic systems, which can represent, e.g., uncertain hydraulic conductivity and flow velocity that are treated as random fields, can be reformulated to explicitly include the transport equation (Alexander et al., 2005; Archambeau et al., 2007).

The contribution of highly fluctuating or unphysical initial conditions is reduced by adding a regularization term H<sub>reg</sub>[c(**x**, 0)] to the observation Hamiltonian (5) and replacing the likelihood function (4) with

$$P[c(\mathbf{x}, 0)] \sim \exp\{-H[c(\mathbf{x}, 0)]\},\tag{6a}$$

where

$$H[c(\mathbf{x},0)] = H\_{\text{obs}}[c(\mathbf{x},0)] + \gamma H\_{\text{reg}}[c(\mathbf{x},0)],\tag{6b}$$

and the weight γ > 0 is a tuning hyperparameter. The regularization term H<sub>reg</sub> is equivalent to a Bayesian log-prior distribution on the initial condition. The selection of an appropriate regularization Hamiltonian is particularly important for problems in which the observation Hamiltonian does not specify a proper probability distribution for c(**x**, 0) due to a lack of measurements. For a one-dimensional source profile, the squared gradient of the initial spatial profile can play the role of the regularization Hamiltonian. In higher dimensions, one can use a thin-plate penalty functional (Wahba, 1990).
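A minimal Python sketch of how the regularized Hamiltonian (6b) could be evaluated once model predictions at the measurement points are available. The function names, the array layout (`pred[m][i]` for location m and time i), the finite-difference form of the squared-gradient penalty, and all numerical values are assumptions made for illustration; the forward transport solve that produces `pred` is not shown.

```python
from typing import List

Matrix = List[List[float]]


def observation_hamiltonian(pred: Matrix, obs: Matrix, r_inv: Matrix) -> float:
    """0.5 * sum_i sum_{m,n} D_mi (R^-1)_mn D_ni with D = pred - obs, as in (5)."""
    n_loc, n_time = len(pred), len(pred[0])
    total = 0.0
    for i in range(n_time):
        delta = [pred[m][i] - obs[m][i] for m in range(n_loc)]
        for m in range(n_loc):
            for n in range(n_loc):
                total += delta[m] * r_inv[m][n] * delta[n]
    return 0.5 * total


def gradient_penalty(q: List[float], dx: float) -> float:
    """Finite-difference squared-gradient penalty: sum ((q[k+1] - q[k]) / dx)^2 * dx."""
    return sum((q[k + 1] - q[k]) ** 2 for k in range(len(q) - 1)) / dx


def total_hamiltonian(pred: Matrix, obs: Matrix, r_inv: Matrix,
                      q: List[float], dx: float, gamma: float) -> float:
    """H = H_obs + gamma * H_reg, following (6b)."""
    return observation_hamiltonian(pred, obs, r_inv) + gamma * gradient_penalty(q, dx)


# Illustrative data: 2 locations, 2 times, uncorrelated errors with variance 0.25.
pred = [[1.0, 2.0], [0.0, 1.0]]
obs = [[0.0, 2.0], [0.0, 0.0]]
r_inv = [[4.0, 0.0], [0.0, 4.0]]
q0 = [0.0, 1.0, 0.0]  # hypothetical discretized release profile

print(total_hamiltonian(pred, obs, r_inv, q0, dx=0.5, gamma=0.05))
```

Increasing `gamma` penalizes oscillatory release profiles more strongly, which mirrors the role of the log-prior described above.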

A conceptual difference between our approach and maximum likelihood methods is worth emphasizing. Rather than sampling the Gibbs distribution exp{−H[c(**x**, 0)]}, as we do here, maximum likelihood methods minimize the Hamiltonian (6b) over c(**x**, 0). While standard maximum likelihood methods determine the mode and variance of the posterior distribution under a Gaussian approximation, the approach described below can be used to determine the mean and higher-order statistics, and it is valid even when the posterior distribution is multi-modal.

# 3. MONTE CARLO SAMPLING

In principle, one can sample the Gibbs distribution by using Markov-chain Monte Carlo (MCMC) (e.g., Michalak and Kitanidis, 2003). However, local MCMC-based methods often converge slowly. To improve convergence, we apply generalized hybrid Monte Carlo (gHMC), which enables one to efficiently sample release configurations c(**x**, 0) with probability given by (6a).

## 3.1. Hybrid Monte Carlo (HMC)

Hybrid Monte Carlo (HMC) refers to a class of methods that combine Hamiltonian molecular dynamics with Metropolis-Hastings Monte Carlo simulations (see Neal, 1993 for an introductory survey). Specifically, a time-discretized integration of the molecular dynamics equations is used to propose a new configuration, which is then accepted or rejected by the standard Metropolis-Hastings Monte Carlo criterion. The change in total energy serves as the acceptance/rejection criterion.

In HMC one treats the log-likelihood function H in (6b) as the configurational Hamiltonian for a system of N "particles," each of which has unit mass and generalized coordinates q<sub>1</sub>, q<sub>2</sub>, . . . , q<sub>N</sub>. Each of these generalized positions corresponds to the solute concentration c(**x**, t) at a space-time point (**x**, t). In the following, the particle positions correspond to the initial concentration at time t = 0, e.g., q<sub>i</sub> = c(x<sub>i</sub>, 0), x<sub>i</sub> = iL/(N−1), i = 0, . . . , N−1 for a contaminant release over the one-dimensional domain [0, L].

At any given time, the state of the system is completely described by (**q**, **p**), where **q** = {q<sub>i</sub>}<sub>i=1</sub><sup>N</sup> and **p** = {p<sub>i</sub>}<sub>i=1</sub><sup>N</sup>. Here, the momentum of the ith particle is p<sub>i</sub> = dq<sub>i</sub>/dτ, where τ is the fictitious time of the molecular dynamics. The kinetic energy of the system of N particles is given by

$$H\_K(\mathbf{p}) = \frac{1}{2} \sum\_{i=1}^N p\_i^2,\tag{7}$$

and the total Hamiltonian of the system is

$$
\hat{H}(\mathbf{q}, \mathbf{p}) = H(\mathbf{q}) + H\_K(\mathbf{p}).\tag{8}
$$

It follows that the Hamiltonian dynamics are given by

$$\frac{\mathrm{d}q\_i}{\mathrm{d}\tau} = p\_i, \quad \frac{\mathrm{d}p\_i}{\mathrm{d}\tau} = F\_i, \quad F\_i = -\frac{\partial H}{\partial q\_i}, \tag{9}$$

where F<sub>i</sub> is the force acting on the ith particle, which is computed from the governing transport equation. During the time interval Δτ, the system evolves from its current state (**q**, **p**) to a new state (**q̃**, **p̃**), which can be computed by discretizing the Hamiltonian dynamics (9). An example of such a discretization is the standard leapfrog method, which is written as

$$
\tilde{q}\_i = q\_i + \Delta \tau \, p\_i + \frac{\Delta \tau^2}{2} F\_i(\mathbf{q}), \tag{10a}
$$

$$
\tilde{p}\_i = p\_i + \frac{\Delta \tau}{2} \{ F\_i(\mathbf{q}) + F\_i(\tilde{\mathbf{q}}) \}, \tag{10b}
$$

for i = 1, . . . , N. Multiple leapfrog steps, i.e., multiple applications of Equation (10), can be performed. For the hybrid Monte Carlo method, the number of leapfrog steps S is larger than one. For S = 1 we obtain the Langevin Monte Carlo method (Neal, 1993). This completes the "proposal part" of HMC.
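The leapfrog proposal in Equation (10) can be sketched in a few lines of Python. The function name `leapfrog`, the user-supplied `force` callable, and the harmonic test problem H(q) = q²/2 (for which the exact trajectory is known) are assumptions made for illustration; in the application, the force would come from the transport model.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(q: Vector, p: Vector, force: Callable[[Vector], Vector],
             dt: float, n_steps: int = 1) -> Tuple[Vector, Vector]:
    """Apply n_steps leapfrog updates of the form (10a)-(10b) with unit masses."""
    q, p = list(q), list(p)
    f = force(q)
    for _ in range(n_steps):
        # (10a): position update uses the force at the current position.
        q_new = [qi + dt * pi + 0.5 * dt * dt * fi for qi, pi, fi in zip(q, p, f)]
        f_new = force(q_new)
        # (10b): momentum update averages the forces at old and new positions.
        p = [pi + 0.5 * dt * (fi + fni) for pi, fi, fni in zip(p, f, f_new)]
        q, f = q_new, f_new
    return q, p


def harmonic_force(q: Vector) -> Vector:
    """Force F = -dH/dq for the test Hamiltonian H(q) = 0.5 * sum(q_i^2)."""
    return [-qi for qi in q]


# Integrate to fictitious time 1 and compare with the exact solution (cos, -sin).
q_end, p_end = leapfrog([1.0], [0.0], harmonic_force, dt=0.1, n_steps=10)
print(q_end[0], math.cos(1.0))
print(p_end[0], -math.sin(1.0))
```

Because the scheme is time-reversible, integrating from the end state with the momentum negated recovers the starting point up to round-off, which is one of the properties that makes it suitable as a Metropolis proposal.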

The remaining part of HMC consists of deciding whether to accept or reject the new state (**q̃**, **p̃**). This is done by the Metropolis-Hastings sampling strategy, according to which the new state (**q̃**, **p̃**) is accepted with probability

$$Q = \min\left\{1, \exp\{\hat{H}(\mathbf{q}, \mathbf{p}) - \hat{H}(\tilde{\mathbf{q}}, \tilde{\mathbf{p}})\}\right\}.\tag{11}$$

The momentum variables **p** are resampled after each acceptance/rejection of the position variables according to a Gaussian distribution of independent variables ∼ exp(−H<sub>K</sub>). The time-marching and acceptance/rejection process represents one step in the Markov chain, and therefore one Monte Carlo sample. It is important to note that the update from (**q**, **p**) to (**q̃**, **p̃**) does not conserve energy as a result of the time discretization. The extent to which energy is not conserved is controlled by the time step Δτ. Detailed balance is achieved if the configuration obtained after evolving several steps is accepted with probability Q in (11). Thus, the Metropolis step corrects for time discretization errors.

As we have noted before, the method samples from the multivariate target distribution, ∼ exp(−Ĥ), by computing a Markov chain. Sampling from this density allows us to estimate the mean state (reconstructed initial configuration) and its variance. Markov chain sampling from the posterior distribution involves a transient phase, in which we start from some initial state and simulate the Markov chain for a period long enough to reach its stationary density, followed by a sampling phase, in which we assume that the Markov chain visits states from this stationary density. If the chain has converged and the sampling phase is long enough to cover the entire posterior distribution, accurate inferences about any quantity of interest are made by computing the sample mean, variance, and other desired statistics (Landau and Binder, 2009).
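The proposal, acceptance, and sampling phases described above can be combined into a short sampler. The following Python sketch targets a one-dimensional Gaussian with hypothetical mean 1.0 and standard deviation 0.5 instead of a transport problem; the function names, step size, chain length, burn-in length, and random seed are illustrative assumptions, not the settings used in the numerical experiments.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(q: Vector, p: Vector, force: Callable[[Vector], Vector],
             dt: float, n_steps: int) -> Tuple[Vector, Vector]:
    """Leapfrog integration of (10) with unit masses."""
    q, p = list(q), list(p)
    f = force(q)
    for _ in range(n_steps):
        q_new = [qi + dt * pi + 0.5 * dt * dt * fi for qi, pi, fi in zip(q, p, f)]
        f_new = force(q_new)
        p = [pi + 0.5 * dt * (fi + fni) for pi, fi, fni in zip(p, f, f_new)]
        q, f = q_new, f_new
    return q, p


def hmc(energy: Callable[[Vector], float], force: Callable[[Vector], Vector],
        q0: Vector, dt: float, n_steps: int, n_samples: int, burn_in: int,
        seed: int = 0) -> Tuple[List[Vector], float]:
    """Return post-burn-in samples and the overall acceptance rate."""
    rng = random.Random(seed)
    q = list(q0)
    samples: List[Vector] = []
    accepted = 0
    for it in range(burn_in + n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in q]            # momenta ~ exp(-H_K)
        h_old = energy(q) + 0.5 * sum(pi * pi for pi in p)
        q_new, p_new = leapfrog(q, p, force, dt, n_steps)
        h_new = energy(q_new) + 0.5 * sum(pi * pi for pi in p_new)
        dh = h_old - h_new
        # Metropolis test (11): accept with probability min{1, exp(dh)}.
        if dh >= 0.0 or rng.random() < math.exp(dh):
            q = q_new
            accepted += 1
        if it >= burn_in:                               # discard the transient phase
            samples.append(list(q))
    return samples, accepted / (burn_in + n_samples)


MU, SIGMA = 1.0, 0.5  # hypothetical target parameters


def target_energy(q: Vector) -> float:
    return 0.5 * ((q[0] - MU) / SIGMA) ** 2


def target_force(q: Vector) -> Vector:
    return [-(q[0] - MU) / SIGMA ** 2]


samples, rate = hmc(target_energy, target_force, [0.0], dt=0.2, n_steps=5,
                    n_samples=5000, burn_in=500, seed=1)
values = [s[0] for s in samples]
mean = sum(values) / len(values)
std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
print(f"acceptance rate {rate:.2f}, sample mean {mean:.3f}, sample std {std:.3f}")
```

The sample mean and standard deviation should approach the target values as the chain grows; in practice, convergence would be checked with diagnostics rather than assumed.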

## 3.2. Generalized Hybrid Monte Carlo (gHMC)

In many cases, the generalized hybrid Monte Carlo (gHMC) of Toral and Ferreira (1994) can improve the efficiency of standard HMC by means of the nonlocal sampling strategy described in some detail below. For **q**, **p** ∈ ℝ<sup>N</sup>, gHMC replaces the Hamiltonian dynamics in (9) with a more general formulation,

$$\frac{d\mathbf{q}}{d\tau} = \mathbf{A}\mathbf{p}, \quad \frac{d\mathbf{p}}{d\tau} = \mathbf{A}^\top \mathbf{F}, \tag{12}$$

where **A** is a linear operator represented by a matrix in ℝ<sup>N×N</sup>. The corresponding leapfrog discretization is then given by

$$\mathbf{q}(\delta \tau) = \mathbf{q} + \delta \tau \mathbf{A} \mathbf{p} + \frac{\delta \tau^2}{2} \mathbf{A} \mathbf{A}^\top \mathbf{F}[\mathbf{q}],\tag{13a}$$

$$\mathbf{p}(\delta \tau) = \mathbf{p} + \mathbf{A}^{\top} \frac{\delta \tau}{2} \{ \mathbf{F}[\mathbf{q}] + \mathbf{F}[\mathbf{q}(\delta \tau)] \}. \tag{13b}$$

The two formulations, (9) and (12), are identical if **A** is the identity matrix. The goal is to find a matrix **A** that leads to a significant reduction of the temporal correlations of the Markov chain without appreciably increasing the cost of the update due to matrix-vector multiplications.

In order to illustrate how the introduction of the matrix **A** can help to reduce the correlations of the Markov chain, consider the problem with **q** ∈ ℝ<sup>N</sup> and Hamiltonian

$$H(\mathbf{q}) = \frac{1}{2} (\mathbf{q} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{q} - \boldsymbol{\mu}),$$

so that the forcing is given by **F** = −**Σ**<sup>−1</sup>(**q** − **μ**). For the case **A** = **I**, it can be seen from (13) that the different components q<sub>i</sub> are updated at different rates, determined by the covariance matrix **Σ**. For a given δτ, some components would be updated with long steps, while others would be updated with shorter steps.

The disadvantage of such a configuration is that too long a step for a certain component might increase the total Hamiltonian enough to produce a rejection according to (11). If the rejection rate of the chain is too large, one would have to reduce δτ, which affects all components. The issue of the rejection rate would be addressed, but then some components would be updated with very short steps, increasing their correlation. To solve this issue, one can remove the appearance of **Σ** altogether by choosing **A** such that **AA**<sup>⊤</sup>**Σ**<sup>−1</sup> = **I**. If **Σ** is a valid covariance matrix, this is trivially accomplished by choosing **A** as the Cholesky factor of **Σ**. We therefore refer to **A** as the "acceleration" matrix.
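The effect of the Cholesky choice can be checked numerically on a small Gaussian example. The Python sketch below uses a hypothetical 2 × 2 covariance matrix and mean; the helper names (`cholesky`, `solve_spd`, `ghmc_step`) are illustrative and written with the standard library only. With **A** taken as the lower-triangular Cholesky factor, the deterministic part of the position update in (13a) reduces to −(δτ²/2)(**q** − **μ**), so the covariance no longer sets different step lengths for different components.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cholesky(s: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = s, for a symmetric positive definite s."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve_spd(low: Matrix, b: Vector) -> Vector:
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def ghmc_step(q: Vector, p: Vector, force: Callable[[Vector], Vector],
              a: Matrix, dt: float) -> Tuple[Vector, Vector]:
    """One generalized leapfrog step following (13a)-(13b)."""
    at = transpose(a)
    f = force(q)
    ap = matvec(a, p)
    aatf = matvec(a, matvec(at, f))
    q_new = [qi + dt * api + 0.5 * dt * dt * gi for qi, api, gi in zip(q, ap, aatf)]
    f_new = force(q_new)
    kick = matvec(at, [fi + fni for fi, fni in zip(f, f_new)])
    p_new = [pi + 0.5 * dt * ki for pi, ki in zip(p, kick)]
    return q_new, p_new


# Hypothetical Gaussian target: covariance sigma_mat and mean mu.
sigma_mat = [[4.0, 1.2], [1.2, 1.0]]
mu = [1.0, 1.0]
chol = cholesky(sigma_mat)


def gaussian_force(q: Vector) -> Vector:
    """F = -Sigma^{-1} (q - mu)."""
    return [-v for v in solve_spd(chol, [qi - mi for qi, mi in zip(q, mu)])]


q_start = [3.0, -1.0]
q_next, p_next = ghmc_step(q_start, [0.0, 0.0], gaussian_force, chol, dt=0.5)
print(chol)     # approximately [[2.0, 0.0], [0.6, 0.8]]
print(q_next)   # approximately q_start - 0.125 * (q_start - mu)
```

For the non-Gaussian Hamiltonian (6b) no exact factor of this kind is available, so a choice of **A** can only approximate this behavior.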

Unfortunately, in general, the Hamiltonian (6b) for our problem does not have a simple bilinear form for which an appropriate selection of acceleration matrix **A** can be derived. Nevertheless, it stands to reason that one can build acceleration matrices for more complex systems to partially reduce the correlation of the Markov chain.

# 4. NUMERICAL EXPERIMENTS

In this section we illustrate how the framework outlined above can be applied to source identification problems. In the first case we study the implementation of HMC to contaminant transport problems with a nonlinear reaction term for different configurations of observations. In the second case we study a linear advection-dispersion problem and explore possible selections of the gHMC acceleration matrix.

## 4.1. Discrete in Space, Continuous in Time Measurements

We consider one-dimensional transport with uniform velocity u and dispersion coefficient D. We employ as the regularization operator the ℓ<sub>2</sub>-norm of the gradient of the initial spatial distribution. Furthermore, we assume that the time of the contaminant release is precisely known and we impose no constraints on the total mass of the released contaminant. The measurements are taken continuously over the time interval (0, T) at a subset J of discrete locations in the spatial domain. We assume that the measurement errors are uncorrelated in space and time and have the same variance σ<sub>ϵ</sub><sup>2</sup> at every point. This setup represents observations of contaminant breakthrough curves at a number of sampling locations.

The Hamiltonian corresponding to this setup is

$$H = \frac{1}{2\sigma\_{\epsilon}^{2}} \int\_{0}^{T} \sum\_{j \in J} \left[ c(\mathbf{x}\_{j}, t) - \bar{c}(\mathbf{x}\_{j}, t) \right]^{2} \, \mathrm{d}t + \gamma \int\_{\Omega} |\nabla c(\mathbf{x}, 0)|^{2} \, \mathrm{d}x. \tag{14}$$

It is evaluated, together with its sensitivity with respect to the initial condition, by using a method-of-lines discretization of the concentration field c(**x**, t). Once the governing equation has been discretized into a system of ordinary differential equations (ODEs), one can compute the sensitivity ∇**q**H via the adjoint sensitivity method (Cao et al., 2003; Li and Petzold, 2004). The disadvantage of this approach is that it incurs two levels of numerical error: the integration error of the forward problem, which affects the initial condition of the adjoint problem, and the integration error of the backward problem. If these errors are significant, both the quality of the estimator and the rejection rate of the Markov chain can be compromised. Reducing the error requires one to decrease the time step used for integration in both directions, which would increase the computational cost per leapfrog step.

To partially alleviate this problem, we use a single-step ODE integration scheme for the forward problem and compute the sensitivity of H with respect to the initial condition via multiple applications of the chain rule (Daescu et al., 2000). Let **c**<sup>i</sup> (i = 0, . . . , I) be a vector of discretized states evaluated at time t<sub>i</sub> = iT/I and **c̃**<sup>i</sup> be a vector containing the measurements at time t<sub>i</sub> in the elements corresponding to the measurement locations in J and zeros in the other elements. Let **C** be a diagonal matrix with ones on the diagonal elements corresponding to the subset J of measurement locations and zeros in all other locations. We use this notation to rewrite the observation Hamiltonian and the sensitivity as

$$H\_{\rm obs}(\mathbf{c}, \mathbf{q}) = \frac{1}{2\sigma\_{\epsilon}^{2}} \Delta t \sum\_{i=1}^{I} (\mathbf{c}^{i} - \tilde{\mathbf{c}}^{i})^{\top} \mathbf{C} (\mathbf{c}^{i} - \tilde{\mathbf{c}}^{i}),$$

and

$$\nabla\_{\mathbf{q}} H\_{\text{obs}} = \frac{1}{\sigma\_{\epsilon}^{2}} \Delta t \sum\_{i=1}^{I} \left(\frac{\mathrm{d} \mathbf{c}^{i}}{\mathrm{d} \mathbf{q}}\right)^{\top} \mathbf{C} (\mathbf{c}^{i} - \tilde{\mathbf{c}}^{i}),$$

respectively, where d**c**<sup>i</sup>/d**q** denotes the Jacobian of **c**<sup>i</sup> with respect to **q**. Using the chain rule, the sensitivity ∇<sub>**q**</sub>H<sub>obs</sub> can be rewritten as

$$\nabla\_{\mathbf{q}} H\_{\mathrm{obs}} = \frac{1}{\sigma\_{\epsilon}^{2}} \Delta t \left(\frac{\mathrm{d}\mathbf{c}^{1}}{\mathrm{d}\mathbf{q}}\right)^{\top} \left[\mathbf{C}(\mathbf{c}^{1} - \tilde{\mathbf{c}}^{1}) + \left(\frac{\mathrm{d}\mathbf{c}^{2}}{\mathrm{d}\mathbf{c}^{1}}\right)^{\top} \left[\mathbf{C}(\mathbf{c}^{2} - \tilde{\mathbf{c}}^{2}) + \cdots \right]\right].$$

This implies that the sensitivities can be evaluated by repeatedly computing products of the form (d**c**<sup>i+1</sup>/d**c**<sup>i</sup>)<sup>⊤</sup>**u** for a given vector **u**. If these products can be computed exactly, then this approach provides the exact sensitivity of the (space-time discretized) system, which is useful for problems with costly forward and backward solutions. The disadvantage of this approach is that it is highly application-specific and restricts the selection of ODE solvers to a specific family. Details of the implementation of this discrete sensitivity analysis approach are presented in the **Appendix**.
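The nested chain-rule expression can be evaluated with a backward sweep over the stored forward states. The Python sketch below is an illustration under stated assumptions rather than the scheme of the Appendix: it uses a hypothetical explicit Euler step for dc/dt = D ∂²c/∂x² − kc² on a periodic grid with unit spacing, takes the measurement interval equal to the integration step, and all function names and numerical values are made up for the example.

```python
from typing import List

Vector = List[float]


def step(c: Vector, dt: float, diff: float, k: float) -> Vector:
    """One explicit Euler step of dc/dt = diff * Lap(c) - k * c^2 (periodic grid, dx = 1)."""
    n = len(c)
    return [c[j] + dt * (diff * (c[j - 1] - 2.0 * c[j] + c[(j + 1) % n]) - k * c[j] ** 2)
            for j in range(n)]


def step_jt(c: Vector, v: Vector, dt: float, diff: float, k: float) -> Vector:
    """Product (d step / d c)^T v at state c; the periodic Laplacian is symmetric."""
    n = len(c)
    return [v[j] + dt * (diff * (v[j - 1] - 2.0 * v[j] + v[(j + 1) % n])
                         - 2.0 * k * c[j] * v[j]) for j in range(n)]


def forward(q: Vector, n_steps: int, dt: float, diff: float, k: float) -> List[Vector]:
    """Return the states c^0 = q, c^1, ..., c^I."""
    states = [list(q)]
    for _ in range(n_steps):
        states.append(step(states[-1], dt, diff, k))
    return states


def h_obs(q: Vector, obs: List[Vector], mask: Vector, n_steps: int,
          dt: float, diff: float, k: float, sigma: float) -> float:
    """Discrete observation Hamiltonian; mask holds the diagonal of C."""
    states = forward(q, n_steps, dt, diff, k)
    total = sum(mask[j] * (states[i][j] - obs[i][j]) ** 2
                for i in range(1, n_steps + 1) for j in range(len(q)))
    return 0.5 * dt * total / sigma ** 2


def grad_h_obs(q: Vector, obs: List[Vector], mask: Vector, n_steps: int,
               dt: float, diff: float, k: float, sigma: float) -> Vector:
    """Gradient of h_obs with respect to q via repeated Jacobian-transpose products."""
    states = forward(q, n_steps, dt, diff, k)
    lam = [0.0] * len(q)
    for i in range(n_steps, 0, -1):
        # Add the data residual at step i, then pull back through step i-1 -> i.
        lam = [lam[j] + mask[j] * (states[i][j] - obs[i][j]) for j in range(len(q))]
        lam = step_jt(states[i - 1], lam, dt, diff, k)
    return [dt * v / sigma ** 2 for v in lam]


# Hypothetical setup: 6 cells, 4 time steps, observations at cells 1 and 4.
q_demo = [0.1, 0.9, 0.4, 0.0, 0.2, 0.7]
mask_demo = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
obs_demo = [[0.0] * 6 for _ in range(5)]  # obs_demo[0] is unused
params = dict(n_steps=4, dt=0.05, diff=0.5, k=1.0, sigma=0.5)

print(grad_h_obs(q_demo, obs_demo, mask_demo, **params))
```

The cost of the backward sweep is comparable to that of one forward solve, independent of the number of unknowns in **q**, which is the usual motivation for adjoint-type sensitivities.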

We test this formulation on a one-dimensional transport problem defined in the domain [0, L], L = 1 with constant velocity u, dispersion coefficient D, and the reaction model (Lichtner and Tartakovsky, 2003)

$$R(c) = 2k(c^2 - c\_{eq}^2),\tag{15}$$

corresponding to a nonlinear heterogeneous (precipitation/dissolution) reaction with equilibrium concentration c<sub>eq</sub>. Here k denotes the dimensionless kinetic rate constant, given by k = k̂t<sub>0</sub>c<sub>0</sub>, where k̂ is the kinetic rate constant normalized by porosity, with dimensions of inverse concentration times inverse time, and t<sub>0</sub> and c<sub>0</sub> are the time and concentration scales defined in section 2.1. The parameter values are set to D = 1.0, u = 50.0, c<sub>eq</sub> = 0.4, and k = 1.0. Boundary conditions are dc/dx = 0 at x = 0, L. The transport equation is discretized employing a finite-volume scheme consisting of N = 128 cells of uniform size Δx. Concentration measurements are taken at the M = 3 spatial locations x = L/2, x = 3L/4, and x = L over the time period (0, 2.5 × 10<sup>−2</sup>) (**Figure 1**). The standard deviation of the measurement errors is set to σ<sub>ϵ</sub> = 0.02.
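As a worked check of this scaling, assume that the dimensional rate law has the same quadratic form as (15), with dimensional rate constant k̂ and a dimensional equilibrium concentration equal to c<sub>0</sub>c<sub>eq</sub> (an assumption made here for illustration). Substituting c = c<sub>0</sub>c′ into the normalized reaction term R′ = Rt<sub>0</sub>/c<sub>0</sub> of section 2.1 gives

$$R' = \frac{t\_0}{c\_0} \, 2\hat{k} \left( c\_0^2 c'^2 - c\_0^2 c\_{eq}^2 \right) = 2 \left( \hat{k} \, t\_0 c\_0 \right) \left( c'^2 - c\_{eq}^2 \right),$$

so the dimensionless group multiplying the quadratic term is k̂t<sub>0</sub>c<sub>0</sub>, which matches the definition of k given above.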

The release configuration is inferred using the HMC scheme, carried out with hybrid time step Δτ = 0.17, number of leapfrog steps S = 5, and regularization parameter γ = 0.05. A total of 1 × 10<sup>5</sup> samples of the release profile were retained after the burn-in period and are employed to compute the Monte Carlo estimates c̄<sub>0</sub> and σ̂<sub>c<sub>0</sub></sub> of the posterior mean and standard deviation, respectively. These estimates are shown in **Figure 2**. In particular, **Figure 2A** shows the posterior mean estimate c̄<sub>0</sub> compared to the true release profile, c<sub>0</sub>. We also show the 95% confidence interval of the posterior mean estimate. It can be seen that the HMC scheme is able to infer the main features of the initial condition, namely the location of the release and the total mass of contaminant released. For comparison, we also compute the Bayesian maximum a posteriori (MAP) point estimate c<sub>MAP</sub> of the release profile, also presented in **Figure 2A**. The MAP is given by

$$c\_{\mathrm{MAP}} = \underset{c}{\mathrm{arg\,min}} \, H[c],$$

where H[·] is the Hamiltonian given in (6b). MAP estimation is similar to the method of Bayesian global optimization (BGO) (Pirot et al., 2019) in that both aim to minimize the data misfit. BGO yields an estimate guaranteed to be the global minimum over the search space, while MAP may converge toward a local minimum of the data misfit function. It can be seen that the MAP estimate and the posterior mean estimate are similar, although in general they need not coincide, as they correspond to different statistics. We note that, unlike MAP estimation and Bayesian global optimization, the HMC method is not limited to point estimates and can be used to quantify the uncertainty in the reconstruction. Nevertheless, MAP and Bayesian global optimization estimates are useful when quantifying uncertainty is not critical, as their computational cost is smaller than that of HMC.

**Figure 2B** shows the posterior standard deviation. It can be seen that the posterior standard deviation is large, which is due to the dearth of data available and the ill-posedness of the inversion problem. We also note that the posterior standard deviation is the largest for x = L. Due to the strong advective velocity together with the outflow boundary condition, c<sub>0</sub>(L) is only informed by the observations at x = L at early times.
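The leapfrog integration underlying each HMC proposal (S steps of size Δτ) can be illustrated with a minimal sketch. The block below is not the implementation used for the results above: it assumes the standard leapfrog scheme with unit mass, and the one-dimensional quadratic potential `u_quadratic` is a hypothetical stand-in for the Hamiltonian of the transport problem. It checks two properties that HMC relies on, namely time reversibility and a small energy error.

```python
def leapfrog(q, p, grad_u, step, n_steps):
    """Standard leapfrog integration of dq/dt = p, dp/dt = -dU/dq (unit mass)."""
    p = p - 0.5 * step * grad_u(q)        # initial half step in momentum
    for i in range(n_steps):
        q = q + step * p                  # full step in position
        if i < n_steps - 1:
            p = p - step * grad_u(q)      # full step in momentum
    p = p - 0.5 * step * grad_u(q)        # final half step in momentum
    return q, p


def u_quadratic(q):
    """Hypothetical potential energy (standard normal target); an assumption."""
    return 0.5 * q * q


def grad_u_quadratic(q):
    return q


def hamiltonian(q, p):
    return u_quadratic(q) + 0.5 * p * p


# Step size and number of steps taken from the text (0.17 and 5).
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, grad_u_quadratic, step=0.17, n_steps=5)
energy_error = hamiltonian(q1, p1) - hamiltonian(q0, p0)

# Reversibility: flip the momentum, integrate again, and flip back.
q_back, p_back = leapfrog(q1, -p1, grad_u_quadratic, step=0.17, n_steps=5)
p_back = -p_back
```

In a full sampler the proposal (q1, p1) would be accepted with probability min{1, exp(−energy_error)}, so a small energy error translates into a low rejection rate.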

#### 4.2. Application of gHMC to Linear Transport

In order to study the construction of an acceleration matrix **A** appropriate for contaminant transport, we consider a 1-D advection-dispersion (no reaction) problem

$$\frac{\partial c}{\partial t} + u \frac{\partial c}{\partial x} = D \frac{\partial^2 c}{\partial x^2}, \quad x \in [0, 2\pi], \quad t \in (0, T), \tag{16}$$

with uniform coefficients u and D. This equation is subject to the periodic boundary condition

$$c(0, t) = c(2\pi, t),\tag{17}$$

and (unknown) initial condition

$$c(x, 0) = c\_0(x). \tag{18}$$


Similar to the case studied in section 4.1, available concentration data consist of a set of measurements continuous in time on the interval (0, T), collected at a subset J of the discrete locations x<sub>j</sub>, c<sub>obs,j</sub> = c<sub>obs</sub>(x<sub>j</sub>, t). The measurements are subject to space-time uncorrelated additive errors of equal variance σ<sub>ε</sub><sup>2</sup>.

#### 4.2.1. Observation Hamiltonian

The state variable c(x, t) is discretized into N functions c<sub>j</sub>(t) = c(x<sub>j</sub>, t), where x<sub>j</sub> = 2πj/N, j = 0, . . . , N − 1, are N equidistant nodes along the domain [0, 2π). We define the measurement Hamiltonian as

$$H\_{\rm obs} = \frac{1}{2\sigma\_{\epsilon}^{2}} \sum\_{j \in J} \int\_{0}^{T} \left[ c\_{j}(t) - c\_{\rm obs,j}(t) \right]^{2} \,\mathrm{d}t,\tag{19}$$

which defines the likelihood of the measurements given an initial release vector **q** with components q<sub>j</sub> = c<sub>0,j</sub>. The solution to (16)–(18) can be represented in terms of its discrete Fourier transform (DFT)

$$\hat{c}\_{k} = \frac{1}{N} \sum\_{j=0}^{N-1} c\_{j} e^{-ikx\_{j}}, \quad k = -N/2, \dots, N/2 - 1,\tag{20}$$

which defines the N Fourier modes ĉ<sub>k</sub>. The backward or inverse transform is given by

$$c\_j = \sum\_{k=-N/2}^{N/2-1} \hat{c}\_k e^{ikx\_j}, \quad j = 0, \dots, N-1. \tag{21}$$

Let **c** denote the vector of discrete values c<sub>j</sub> and **ĉ** denote its DFT. Then, (20) and (21) can be rewritten as

$$
\hat{\mathbf{c}} = \frac{1}{N} \mathcal{F} \mathbf{c}, \quad \mathbf{c} = \mathcal{F}^\* \hat{\mathbf{c}}, \tag{22}
$$

where F is the DFT matrix whose elements are

$$\mathcal{F}\_{pq} = \omega^{(p-N/2)q}, \quad \omega = e^{-2\pi i/N} \tag{23}$$

and (·)<sup>∗</sup> denotes the Hermitian adjoint. By projection, (16) is discretized into the set of uncoupled ODEs for the Fourier modes

$$\frac{\partial \hat{c}\_k(t)}{\partial t} = -(Dk^2 + iku)\hat{c}\_k, \quad k = -N/2, \dots, N/2 - 1,$$

with initial conditions ĉ<sub>k</sub>(0) = q̂<sub>k</sub>, where the q̂<sub>k</sub>, k = −N/2, . . . , N/2 − 1, are the components of **q̂**, the DFT of **q** ≡ **c**(0). The solution to the ODEs is then given by

$$
\hat{c}\_k = \hat{q}\_k \exp\left\{-(Dk^2 + iku)t\right\}.\tag{24}
$$
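The spectral solution (20), (21), and (24) can be sketched in a few lines. The block below is an illustration only: it evaluates the transforms as explicit O(N<sup>2</sup>) sums rather than with a fast Fourier transform, the function names are hypothetical, and the single-cosine initial condition is an assumption chosen because its exact solution, exp(−Dt) cos(x − ut), is known.

```python
import cmath
import math


def dft(c):
    """Forward transform (20): modes k = -N/2, ..., N/2 - 1, with 1/N normalization."""
    n = len(c)
    return {k: sum(c[j] * cmath.exp(-1j * k * 2.0 * math.pi * j / n)
                   for j in range(n)) / n
            for k in range(-n // 2, n // 2)}


def idft(c_hat, n):
    """Inverse transform (21): nodal values from the Fourier modes."""
    return [sum(c_hat[k] * cmath.exp(1j * k * 2.0 * math.pi * j / n) for k in c_hat)
            for j in range(n)]


def evolve(q, diff, vel, t):
    """Apply (24) mode by mode to the initial release q and return nodal values."""
    n = len(q)
    q_hat = dft(q)
    c_hat = {k: q_hat[k] * cmath.exp(-(diff * k * k + 1j * k * vel) * t)
             for k in q_hat}
    # The real part is taken because the unpaired mode k = -N/2 can leave a
    # small imaginary residue; it is negligible for smooth initial data.
    return [value.real for value in idft(c_hat, n)]


n = 64
x = [2.0 * math.pi * j / n for j in range(n)]
q = [math.cos(xj) for xj in x]            # assumed initial release (illustrative)
diff, vel, t = 1.0, 10.0, 0.05
c = evolve(q, diff, vel, t)
exact = [math.exp(-diff * t) * math.cos(xj - vel * t) for xj in x]
max_error = max(abs(a - b) for a, b in zip(c, exact))
mean_change = abs(sum(c) / n - sum(q) / n)
```

Because the mode k = 0 is left unchanged by (24), the spatial average of the concentration is conserved, which `mean_change` verifies.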

Substituting (22) and (24) into (19) yields the following expression for the measurement Hamiltonian:

$$\begin{split} H\_{\mathrm{obs}} &= \frac{1}{2\sigma\_{\epsilon}^{2}} (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\mathrm{obs}})^{\*} \left( \int\_{0}^{T} \mathbf{B}^{\*} \mathcal{F}\_{J} \mathcal{F}\_{J}^{\*} \mathbf{B} \, \mathrm{d}t \right) (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\mathrm{obs}}) \\ &= \frac{1}{2\sigma\_{\epsilon}^{2}} (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\mathrm{obs}})^{\*} \hat{\mathbf{G}}\_{\mathrm{obs}} (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\mathrm{obs}}), \end{split} \tag{25}$$

where **q̂**<sub>obs</sub> is the DFT of **c**<sub>obs</sub>(0), F<sub>J</sub> corresponds to the J columns of F, **B**(t) is a diagonal matrix with elements B<sub>kk</sub> = exp[−(Dk<sup>2</sup> + iku)t], and **Ĝ**<sub>obs</sub> is a Hermitian positive semidefinite matrix. The measurement Hamiltonian specifies a multivariate normal distribution for **q̂**, and given that **q** and **q̂** are related via a linear transformation, it follows that the measurement Hamiltonian specifies a multivariate normal distribution for **q**.

If the measurements are available at every node of the computational domain, i.e., if F<sub>J</sub> = F, then FF<sup>∗</sup> = N**I** and (25) simplifies to

$$H\_{\rm obs} = \frac{N}{2\sigma\_{\epsilon}^{2}} (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\rm obs})^{\*} \left( \int\_{0}^{T} \mathbf{B}^{\*} \mathbf{B} \, \mathrm{d}t \right) (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\rm obs}),$$

which is equivalent to

$$H\_{\rm obs} = \frac{N}{2\sigma\_{\epsilon}^{2}} \sum\_{k=-N/2}^{N/2-1} |\hat{q}\_{k} - \hat{q}\_{\rm obs,k}|^{2} \hat{g}\_{k} \tag{26}$$

where the coefficients ĝ<sub>k</sub> are given by

$$\begin{aligned} \hat{g}\_k &= \int\_0^T \left|\exp\left\{- (Dk^2 + iku)t\right\}\right|^2 \,\mathrm{d}t \\ &= \int\_0^T e^{-2Dk^2t} \,\mathrm{d}t = \frac{1 - \exp(-2Dk^2T)}{2Dk^2} .\end{aligned}$$

Note that all coefficients ĝ<sub>k</sub> are real, symmetric (ĝ<sub>k</sub> = ĝ<sub>−k</sub>), and independent of the advection velocity u: they depend only on the dispersion coefficient D and the observation period T. For k = 0 the closed form is understood as its limit, ĝ<sub>0</sub> = T.
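The behavior of the coefficients ĝ<sub>k</sub> can be checked numerically with a short sketch. The helper names below are hypothetical, the k = 0 case is handled through the limit ĝ<sub>0</sub> = T, and the closed form is compared against a midpoint-rule quadrature of the integrand exp(−2Dk<sup>2</sup>t); the values D = 1.0 and T = 0.6 are those of the example in section 4.2.5.

```python
import math


def g_hat(k, diff, t_final):
    """Closed form of the integral of exp(-2 D k^2 t) over (0, T)."""
    rate = 2.0 * diff * k * k
    if rate == 0.0:
        return t_final                      # limit of the closed form as k -> 0
    return -math.expm1(-rate * t_final) / rate


def g_hat_quadrature(k, diff, t_final, n_cells=20000):
    """Midpoint-rule approximation of the same integral, used only as a check."""
    h = t_final / n_cells
    return h * sum(math.exp(-2.0 * diff * k * k * (i + 0.5) * h)
                   for i in range(n_cells))


diff, t_final = 1.0, 0.6
g_zero = g_hat(0, diff, t_final)
g_sym_gap = abs(g_hat(3, diff, t_final) - g_hat(-3, diff, t_final))
g_quad_gap = abs(g_hat(2, diff, t_final) - g_hat_quadrature(2, diff, t_final))
# The weights decay roughly like 1 / (2 D k^2): high modes are weakly informed.
weights = [g_hat(k, diff, t_final) for k in range(0, 9)]
```

The monotone decay of `weights` with |k| is consistent with the observation that high-frequency features of the release are the hardest to identify from the data.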

It follows from (25) that

$$\begin{split} \nabla\_{\mathbf{q}} H\_{\mathrm{obs}} &= \frac{1}{\sigma\_{\epsilon}^{2}} \mathcal{F}^{\*} \hat{\mathbf{G}}\_{\mathrm{obs}} (\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\mathrm{obs}}) = \mathcal{F}^{\*} \hat{\mathbf{G}}\_{\mathrm{obs}} \mathcal{F} (\mathbf{q} - \mathbf{q}\_{\mathrm{obs}}) \\ &= \mathbf{G}\_{\mathrm{obs}} (\mathbf{q} - \mathbf{q}\_{\mathrm{obs}}) \end{split}$$

where **G**<sub>obs</sub> = F<sup>∗</sup>**Ĝ**<sub>obs</sub>F. That brings the forcing into the form **F**[**q**] = −Σ<sup>−1</sup>(**q** − µ) required by our analysis in section 3.2, which suggests the possibility of computing the acceleration matrix from **AA**<sup>∗</sup> = **G**<sub>obs</sub><sup>−1</sup>. Unfortunately, this is not generally feasible, because the matrix **G**<sub>obs</sub> is singular unless the measurements are taken at every node of the domain. The singularity of **G**<sub>obs</sub> implies the singularity of the multivariate normal distribution of **q̂** given by H<sub>obs</sub>, which means that the distribution is concentrated in an r-dimensional subspace of ℂ<sup>N</sup>, r < N. Since **q** results from a linear transformation of **q̂**, the multivariate normal distribution of **q** is also degenerate. This implies that there is a linear subspace of configurations **q** to which H<sub>obs</sub> does not assign a probability and which therefore cannot be identified.

An empirical study of the singular value decomposition (SVD) **Ĝ**<sub>obs</sub> = **USV**<sup>∗</sup> of the matrix **Ĝ**<sub>obs</sub> computed for the example of section 4.2.5 provides some insight into features of the degenerate distribution of **q** defined by the observations. Specifically, the vectors forming a basis for ker **Ĝ**<sub>obs</sub> have negligible terms associated with the lower Fourier modes of **q**, i.e., |V<sub>k,j</sub>| ≈ 0 for small |k| and j > rank **Ĝ**<sub>obs</sub>. This implies that the lower-frequency components of **q** fall mostly on the subspace of identifiable configurations. In general, |V<sub>k,j</sub>| ≠ 0 for high |k| and j > rank **Ĝ**<sub>obs</sub>, which implies that high-frequency features generally cannot be identified.

#### 4.2.2. Regularization Hamiltonian

The regularization Hamiltonian extends the distributions of **q** and **q̂** in order to make them well-defined. After a real-space discretization, and accounting for the periodic boundary conditions (17), the ℓ<sub>2</sub>-norm regularization Hamiltonian takes the form

$$H\_{\rm reg} = \gamma \,\mathbf{q}^{\top} \mathbf{G}\_{\rm reg} \mathbf{q},\tag{27}$$

where γ is a regularization hyperparameter and **G**reg is the circulant matrix

$$\mathbf{G}\_{\text{reg}} = \frac{1}{\Delta x} \begin{bmatrix} 2 & -1 & & & -1 \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ -1 & & & -1 & 2 \end{bmatrix},$$

or **G**<sub>reg</sub> = circ{**r**}<sup>⊤</sup>, where **r** = (2, −1, 0, . . . , 0, −1)<sup>⊤</sup>/Δx and Δx = 2π/N. The matrix **G**<sub>reg</sub> extends the probability distribution by assigning a high energy (low probability) to configurations with large high-frequency components. To demonstrate this, we rewrite **G**<sub>reg</sub> as

$$\mathbf{G}\_{\text{reg}} = \text{circ}\{\mathbf{r}\}^{\top} = \mathcal{F}^{\*} \text{diag}\{\hat{\mathbf{r}}\} \mathcal{F}, \qquad \hat{r}\_{k} = \frac{1}{\pi} \left[ 1 - \cos\left(\frac{2\pi k}{N}\right) \right],$$

where **r̂** is the DFT of **r**. The components r̂<sub>k</sub> of vector **r̂** increase with frequency |k|, with the zeroth frequency giving rise to r̂<sub>0</sub> = 0. The latter is to be expected since the regularization operator does not affect the observability of the zeroth frequency, which corresponds to the average of the initial release.

Note that a Fourier-space discretization of the regularization Hamiltonian leads to a similar bilinear form for **G**<sub>reg</sub>, with r̂<sub>k</sub> = 2πk<sup>2</sup>/N<sup>2</sup>. Indeed, these r̂<sub>k</sub> have a similar asymptotic behavior as k → 0.
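The closed form for r̂<sub>k</sub> and its small-k limit can be verified directly. The sketch below builds the generating vector **r** of the circulant matrix, applies the DFT with the 1/N normalization of (20), and compares the result with (1/π)[1 − cos(2πk/N)]; the helper names are hypothetical and N = 64 is taken from the example of section 4.2.5.

```python
import cmath
import math


def stencil(n):
    """Generating vector r = (2, -1, 0, ..., 0, -1) / dx with dx = 2*pi/n (n >= 3)."""
    dx = 2.0 * math.pi / n
    r = [0.0] * n
    r[0] = 2.0 / dx
    r[1] = -1.0 / dx
    r[-1] = -1.0 / dx
    return r


def dft(values):
    """Transform (20) with 1/N normalization, modes k = -N/2, ..., N/2 - 1."""
    n = len(values)
    return {k: sum(values[j] * cmath.exp(-1j * k * 2.0 * math.pi * j / n)
                   for j in range(n)) / n
            for k in range(-n // 2, n // 2)}


n = 64
r_hat = dft(stencil(n))
closed_form = {k: (1.0 - math.cos(2.0 * math.pi * k / n)) / math.pi for k in r_hat}
max_diff = max(abs(r_hat[k] - closed_form[k]) for k in r_hat)

# Small-k behavior: (1/pi) * (1 - cos(2*pi*k/N)) ~ 2*pi*k^2 / N^2.
fourier_space_value = 2.0 * math.pi * 1 ** 2 / n ** 2
relative_gap = abs(closed_form[1] - fourier_space_value) / closed_form[1]
```

The vanishing of `closed_form[0]` reflects the fact that the regularization term does not penalize the mean of the release.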

#### 4.2.3. Acceleration Matrix

For the full Hamiltonian H = H<sub>obs</sub> + H<sub>reg</sub>, the forcing is given by

$$\mathbf{F[q]} = -\hat{\mathbf{G}}\_{\text{obs}}(\hat{\mathbf{q}} - \hat{\mathbf{q}}\_{\text{obs}}) - \hat{\mathbf{G}}\_{\text{reg}}\hat{\mathbf{q}} = -\mathbf{G}\_{\text{obs}}(\mathbf{q} - \mathbf{q}\_{\text{obs}}) - \mathbf{G}\_{\text{reg}}\mathbf{q} \tag{28}$$

where **G**<sub>obs</sub> = F<sup>∗</sup>**Ĝ**<sub>obs</sub>F and **G**<sub>reg</sub> = F<sup>∗</sup>**Ĝ**<sub>reg</sub>F. This suggests that choosing the acceleration matrix **A** such that **AA**<sup>∗</sup>(**G**<sub>obs</sub> + **G**<sub>reg</sub>) = **I** would reduce the correlation of the Markov chain. Since **G**<sub>obs</sub> and **G**<sub>reg</sub> are Hermitian positive semidefinite, their sum **G** = **G**<sub>obs</sub> + **G**<sub>reg</sub> is at least Hermitian positive semidefinite. In fact, **G** is a full-rank matrix and thus can be factorized via a Cholesky decomposition **G** = **S**<sup>⊤</sup>**S**. The matrix **A**, defined by **AA**<sup>∗</sup>**G** = **I**, is then given by

$$\mathbf{A} = \mathbf{A}\_1 = \mathbf{S}^{-1}, \quad \mathbf{S}^\top \mathbf{S} = \mathbf{G}. \tag{29}$$

The added cost of computing the acceleration matrix **A** is the Cholesky factorization cost, and the leapfrog scheme for gHMC incurs four matrix-vector products. For dense matrices, these costs are O(N<sup>3</sup>) and O(N<sup>2</sup>), respectively.

An advantage of the Cholesky factorization is that the vector of momenta **p** in (13) can be chosen as real and multivariate normal, with zero mean and identity covariance matrix. A drawback is its relatively high cost per step in the Markov chain. Moreover, the matrix **G** becomes more poorly conditioned as γ → 0, which affects the stability of the Cholesky decomposition.
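The construction (29) can be illustrated on a small example. The sketch below is not the matrix of section 4.2.5: as a hypothetical stand-in for **G**<sub>obs</sub> it uses unit weights at two observed nodes (the true **G**<sub>obs</sub> from (25) is dense), to which a scaled periodic second-difference matrix is added. Neither term is full rank on its own, but their sum is, so the factorization **G** = **S**<sup>⊤</sup>**S** exists and **A** = **S**<sup>−1</sup> satisfies **AA**<sup>⊤</sup>**G** = **I**.

```python
import math


def cholesky_upper(g):
    """Return upper-triangular S with S^T S = g (g symmetric positive definite)."""
    n = len(g)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            acc = g[i][j] - sum(s[k][i] * s[k][j] for k in range(i))
            if i == j:
                s[i][i] = math.sqrt(acc)
            else:
                s[i][j] = acc / s[i][i]
    return s


def invert_upper(s):
    """Invert an upper-triangular matrix column by column (back substitution)."""
    n = len(s)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            rhs = 1.0 if i == col else 0.0
            rhs -= sum(s[i][k] * inv[k][col] for k in range(i + 1, n))
            inv[i][col] = rhs / s[i][i]
    return inv


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Hypothetical small problem: n nodes, regularization weight gamma, two observed nodes.
n, gamma = 6, 1e-2
dx = 2.0 * math.pi / n
g = [[0.0] * n for _ in range(n)]
for i in range(n):
    g[i][i] += 2.0 * gamma / dx            # periodic second-difference stencil
    g[i][(i + 1) % n] -= gamma / dx
    g[i][(i - 1) % n] -= gamma / dx
for node in (2, 5):
    g[node][node] += 1.0                   # stand-in for the observation term

s = cholesky_upper(g)
a1 = invert_upper(s)                       # A_1 = S^{-1}
product = matmul(matmul(a1, transpose(a1)), g)
residual = max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))
```

Decreasing `gamma` in this sketch makes the smallest pivot of the factorization shrink, which mirrors the conditioning issue noted above for γ → 0.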

A less computationally costly alternative for the construction of **A** is to employ the following heuristic: Instead of using the full correlation matrices in Fourier space, **Ĝ**<sub>obs</sub> and **Ĝ**<sub>reg</sub>, to define **G**, we approximate it as **G** ≈ F<sup>∗</sup> diag{**ḡ**}F, where **ḡ** is the vector with components ḡ<sub>i</sub> = {**Ĝ**<sub>obs</sub> + **Ĝ**<sub>reg</sub>}<sub>ii</sub>. This approximation allows one to factorize **G** as **G** ≈ **Ḡ** = F<sup>∗</sup>**DD**<sup>∗</sup>F, where **D** = diag{(**ḡ**)<sup>1/2</sup>} with the square root understood as element-wise. This argument suggests that the acceleration matrix **A** can be constructed as

$$\mathbf{A} = \mathbf{A}\_2 = \frac{1}{N} \mathcal{F}^{\*} \mathbf{D}^{-1}, \quad \mathbf{D} = \text{diag}\{ (\bar{\mathbf{g}})^{1/2} \}, \quad \bar{\mathbf{g}} = \text{diag}\{ \hat{\mathbf{G}}\_{\text{obs}} + \hat{\mathbf{G}}\_{\text{reg}} \}, \tag{30}$$

which gives

$$\mathbf{A}\mathbf{p} = \frac{1}{N}(\mathcal{F}^\*\mathbf{D}^{-1}\mathbf{p}), \quad \mathbf{A}^\*\mathbf{F} = \mathbf{D}^{-1}\left(\frac{1}{N}\mathcal{F}\mathbf{F}\right).$$

Note that we have replaced the transpose of **A** with its Hermitian transpose due to the complex nature of the DFT. This implies that the transpose in (12) and (13) must be replaced with a Hermitian transpose, and that in order to guarantee that **q** ∈ ℝ<sup>N</sup> we must generalize the momenta such that **p** ∈ ℂ<sup>N</sup>. Once the acceleration matrix **A** in (30) is constructed, products of the form **Ap** and **A**<sup>∗</sup>**F** can be computed using DFTs. The computational cost per leapfrog step is reduced from four matrix-vector products of cost ∼ O(N<sup>2</sup>) to four of cost ∼ O(N log N), and no Cholesky decomposition is necessary.
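That the matrix in (30) satisfies **AA**<sup>∗</sup>**Ḡ** = **I** can be confirmed with a small numerical check. In the sketch below the DFT matrix follows (23), while the diagonal `g_bar` is a hypothetical positive vector standing in for diag{**Ĝ**<sub>obs</sub> + **Ĝ**<sub>reg</sub>}. For clarity the matrices are formed explicitly, which costs O(N<sup>2</sup>) per product; the O(N log N) cost quoted above would require replacing these products with fast Fourier transforms.

```python
import cmath
import math

N = 8                                      # small even size, for illustration only
OMEGA = cmath.exp(-2j * math.pi / N)

# DFT matrix (23): row p corresponds to mode k = p - N/2, column q to node q.
F = [[OMEGA ** ((p - N // 2) * q) for q in range(N)] for p in range(N)]

# Hypothetical positive diagonal (an assumption, not the matrix of section 4.2.5).
g_bar = [1.0 + (p - N // 2) ** 2 for p in range(N)]


def conj_transpose(m):
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


F_star = conj_transpose(F)

# A_2 = (1/N) F* D^{-1} with D = diag(sqrt(g_bar)).
A2 = [[F_star[j][p] / (N * math.sqrt(g_bar[p])) for p in range(N)] for j in range(N)]

# G_bar = F* D D* F = F* diag(g_bar) F.
G_bar = matmul([[F_star[j][p] * g_bar[p] for p in range(N)] for j in range(N)], F)

product = matmul(matmul(A2, conj_transpose(A2)), G_bar)
residual = max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(N) for j in range(N))
```

The factor 1/N in **A**<sub>2</sub> is what compensates for FF<sup>∗</sup> = N**I**; omitting it would leave the product equal to N<sup>2</sup>**I** instead of **I**.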

Since **ḡ** = ∂<sup>2</sup>H/∂**q̂**<sup>2</sup>, the approximation (30) can be thought of as building **A** from the diagonal of the Hessian of H with respect to **q̂** (a similar heuristic is employed in Neal, 1995 for Bayesian learning). This observation raises the following question: Why do we take **ḡ** = ∂<sup>2</sup>H/∂**q̂**<sup>2</sup> instead of **ḡ** = ∂<sup>2</sup>H/∂**q**<sup>2</sup>, which would produce a similar acceleration matrix **A** without the Fourier transforms? The answer is that the matrix **Ĝ**<sub>obs</sub> + **Ĝ**<sub>reg</sub> is more concentrated along its diagonal than **G** is. Hence, more information about the observation operator is conserved by taking the diagonal of **Ĝ**<sub>obs</sub> + **Ĝ**<sub>reg</sub> than the diagonal of **G**.

#### 4.2.4. Sampling of Momentum Vector

In order to retain the validity of the leapfrog method with generalized momenta, we require said momenta to be associated with a kinetic energy following a bilinear form. We achieve this by assuming that **p** ∈ ℂ<sup>N</sup> has a general complex normal distribution CN(0, **Γ**, **C**) with unit covariance **Γ** and

$$\mathbf{C} = \mathbf{A}^\* \bar{\mathbf{G}} (\text{conj} \, \mathbf{A}) = \frac{1}{N} \mathbf{D} \mathcal{F} \mathcal{F}^\top \mathbf{D}^{-1}. \tag{31}$$

For F<sub>pq</sub> given by (23),

$$\begin{aligned} \frac{1}{N} (\mathcal{F}\mathcal{F}^\top)\_{pq} &= \frac{1}{N} \sum\_{r=0}^{N-1} \omega^{r(p+q-N)} = P\_{pq} \\ &= \begin{cases} 1 & \text{if } p+q = kN, \; k = \dots, -1, 0, 1, \dots, \\ 0 & \text{otherwise}, \end{cases} \end{aligned}$$

where P<sub>pq</sub> are the components of the permutation matrix **P**. Since **D** is diagonal and **P** is a permutation matrix, substituting this result into (31) yields **C** = **P**.
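The permutation structure of FF<sup>⊤</sup>/N is easy to confirm numerically. The sketch below builds F from (23) for a small, arbitrarily chosen N and compares FF<sup>⊤</sup>/N (plain transpose, no conjugation) with the pattern P<sub>pq</sub> = 1 when p + q is a multiple of N; the variable names are illustrative.

```python
import cmath
import math

N = 8                                      # illustrative size (an assumption)
OMEGA = cmath.exp(-2j * math.pi / N)

# DFT matrix (23): F[p][q] = omega^((p - N/2) q).
F = [[OMEGA ** ((p - N // 2) * q) for q in range(N)] for p in range(N)]

# (1/N) F F^T, using the plain transpose rather than the Hermitian adjoint.
ffT = [[sum(F[p][r] * F[q][r] for r in range(N)) / N for q in range(N)]
       for p in range(N)]

# Expected pattern: entry (p, q) equals 1 when p + q is a multiple of N.
P = [[1.0 if (p + q) % N == 0 else 0.0 for q in range(N)] for p in range(N)]

max_diff = max(abs(ffT[p][q] - P[p][q]) for p in range(N) for q in range(N))
row_sums = [sum(row) for row in P]
```

In terms of the mode index k = p − N/2, the matrix **P** pairs each mode k with −k, while the modes k = 0 and k = −N/2 are paired with themselves.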

Let **p** = **X** + i**Y** with **X**, **Y** ∈ ℝ<sup>N</sup>. The vector **V**<sup>⊤</sup> = [**X**<sup>⊤</sup> **Y**<sup>⊤</sup>] is multivariate normal with zero mean. Given **Γ** and **C**, the cross-covariance matrix of this vector is

$$\mathbb{E}[\mathbf{X}\mathbf{Y}^\top] = \frac{1}{2}\operatorname{Im}\{\boldsymbol{\Gamma} + \mathbf{C}\} = 0, \quad \mathbb{E}[\mathbf{Y}\mathbf{X}^\top] = \frac{1}{2}\operatorname{Im}\{-\boldsymbol{\Gamma} + \mathbf{C}\} = 0,$$

since both **Γ** and **C** are real. In other words, the real and imaginary parts of **p** are mutually uncorrelated. The covariance matrix of this vector is

$$\mathbb{E}[X\_i X\_j] = \frac{1}{2} \operatorname{Re} \{ \Gamma\_{ij} + C\_{ij} \} = \begin{cases} 1 & \text{if } i = j \in \{0, -N/2\}, \\ 1/2 & \text{if } i = \pm j, \ i \notin \{0, -N/2\}, \\ 0 & \text{otherwise,} \end{cases} \tag{32a}$$

$$\mathbb{E}[Y\_i Y\_j] = \frac{1}{2} \operatorname{Re} \{ \Gamma\_{ij} - C\_{ij} \} = \begin{cases} 1/2 & \text{if } i = j, \ i \notin \{0, -N/2\}, \\ -1/2 & \text{if } i = -j, \ i \notin \{0, -N/2\}, \\ 0 & \text{otherwise.} \end{cases} \tag{32b}$$

It follows from (32) that only the components p<sub>k</sub> = X<sub>k</sub> + iY<sub>k</sub> and p<sub>−k</sub> are correlated. Their covariances are E[X<sub>k</sub>X<sub>−k</sub>] = 0.5 and E[Y<sub>k</sub>Y<sub>−k</sub>] = −0.5. Since **p** must be complex-symmetric (p<sub>−k</sub> equal to the complex conjugate of p<sub>k</sub>) to guarantee that **q** remains real, we generate **p** as

$$\begin{aligned} X\_{-N/2} &\sim \mathcal{N}(0,1), & Y\_{-N/2} &= 0, \\ X\_{-N/2+1} &\sim \mathcal{N}(0,1/2), & Y\_{-N/2+1} &\sim \mathcal{N}(0,1/2), \\ &\vdots & &\vdots \\ X\_0 &\sim \mathcal{N}(0,1), & Y\_0 &= 0, \\ X\_1 &= X\_{-1}, & Y\_1 &= -Y\_{-1}, \\ &\vdots & &\vdots \\ X\_{N/2-1} &= X\_{-N/2+1}, & Y\_{N/2-1} &= -Y\_{-N/2+1}. \end{aligned}$$

Hence the vector **p** is generated with N independent normal random variables (N/2 + 1 real parts and N/2 − 1 imaginary parts).
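The sampling scheme above can be sketched as follows. The function name and the use of Python's `random.Random` generator are assumptions made for illustration; the variances (1 for the two unpaired modes, 1/2 otherwise) and the mirroring rule follow the scheme just described. The check confirms that the inverse transform (21) of a sampled **p** is real to within round-off.

```python
import cmath
import math
import random


def sample_momentum(n, rng):
    """Draw p_k, k = -n/2, ..., n/2 - 1 (n even), with p_{-k} = conj(p_k)."""
    half = n // 2
    p = {}
    p[-half] = complex(rng.gauss(0.0, 1.0), 0.0)    # unpaired mode, variance 1
    p[0] = complex(rng.gauss(0.0, 1.0), 0.0)        # mean mode, variance 1
    sd = math.sqrt(0.5)
    for k in range(-half + 1, 0):
        x = rng.gauss(0.0, sd)                      # real part, variance 1/2
        y = rng.gauss(0.0, sd)                      # imaginary part, variance 1/2
        p[k] = complex(x, y)
        p[-k] = complex(x, -y)                      # mirrored mode
    return p


def inverse_dft(p, n):
    """Inverse transform (21) applied to the sampled modes."""
    return [sum(p[k] * cmath.exp(1j * k * 2.0 * math.pi * j / n) for k in p)
            for j in range(n)]


rng = random.Random(0)                              # fixed seed, for reproducibility
n = 16
p = sample_momentum(n, rng)
nodal = inverse_dft(p, n)
max_imag = max(abs(value.imag) for value in nodal)
```

Each call to `sample_momentum` consumes exactly n normal draws, in agreement with the count given above.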

#### 4.2.5. Computational Example

We apply the gHMC algorithm to the model problem (16) with parameters D = 1.0, u = 10.0, N = 64, and σ<sub>ε</sub> = 0.02. Measurements are taken at locations x<sub>j</sub>, j ∈ J = {47, 63}, over the time period (0, 6 × 10<sup>−1</sup>). The measurements are shown in **Figure 3**. To infer the release profile from the observations, we employ **A** = **A**<sub>1</sub>, S = 5 leapfrog steps, regularization parameter γ = 1 × 10<sup>−2</sup>, and a hybrid timestep tuned to achieve a rejection rate of 30–35%. A total of 10 chains were generated, each with 2 × 10<sup>4</sup> samples retained after burn-in. These samples are employed to compute Monte Carlo estimates of the posterior mean and standard deviation, shown in **Figure 4**.


**Figure 4A** compares the posterior mean estimate of the release profile, together with its 95% confidence interval, against the MAP estimate and the true release profile. It can be seen that the gHMC scheme is able to infer the main feature of the release profile. The gHMC estimate also compares favorably to the MAP estimate. **Figure 4B** shows the posterior standard deviation, which, as in section 4.1, is largely due to the relatively small number of observation locations, the relatively high measurement error, and the ill-posedness of the inverse problem.

TABLE 1 | Effective sample size reduction ratio η for various choices of regularization parameter γ and acceleration matrix **A**.

Next, we study the effect of the choice of acceleration matrix **A** on the Markov chains produced by the gHMC algorithm and the Monte Carlo standard error of the posterior mean estimate. Three alternatives for **A** are considered: **A** = **A**<sub>1</sub> in (29), **A** = **A**<sub>2</sub> in (30), and **A** = **I** (no acceleration). The Monte Carlo standard error is given by σ̂/√n<sub>eff</sub>, where σ̂ is the posterior standard deviation of the inferred parameter, and n<sub>eff</sub> denotes the MCMC effective sample size, given by n<sub>eff</sub> ≡ n/η (Kass et al., 1998), where n is the number of MCMC samples, and η > 1 is a reduction factor due to the correlation between MCMC samples. This reduction factor is given by

$$\eta = 1 + 2 \sum\_{s=1}^{\infty} \rho(s),$$

where ρ(s) is the autocorrelation of the MCMC chain at lag s. We note that, after convergence, gHMC chains for different choices of **A** converge to the same posterior mean and standard deviation. The difference of performance between choices of **A** is in terms of the reduction factor η: Better-performing choices of acceleration matrix result in smaller values of η, so that fewer samples are necessary to achieve a certain target error in the estimation of the posterior mean.
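The reduction factor η and the resulting Monte Carlo standard error can be estimated from a chain as sketched below. Two choices in this block are assumptions rather than part of the method described here: the infinite sum is truncated at the first non-positive sample autocorrelation (a common heuristic) and capped at `max_lag`, and the test chain is a synthetic first-order autoregressive sequence with coefficient 0.9, for which the theoretical value is η = (1 + 0.9)/(1 − 0.9) = 19.

```python
import math
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def reduction_factor(chain, max_lag=200):
    """eta = 1 + 2 * sum of rho(s), truncated at the first non-positive rho(s)."""
    eta = 1.0
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(chain, lag)
        if rho <= 0.0:
            break
        eta += 2.0 * rho
    return eta


def standard_error(chain):
    """Monte Carlo standard error sigma / sqrt(n_eff), with n_eff = n / eta."""
    n = len(chain)
    mean = sum(chain) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in chain) / (n - 1))
    n_eff = n / reduction_factor(chain)
    return sigma / math.sqrt(n_eff)


# Synthetic correlated chain (hypothetical; not output of the gHMC sampler).
rng = random.Random(1)
chain = [0.0]
for _ in range(9999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

eta = reduction_factor(chain)
error = standard_error(chain)
eta_alternating = reduction_factor([0.0, 1.0] * 50)
```

For a chain with negative lag-one autocorrelation, such as the alternating sequence above, the truncation rule returns η = 1, so the estimator never reports an effective sample size larger than n.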

We compute 10 gHMC chains for each choice of **A** and for two choices of regularization parameter γ, 1 × 10<sup>−2</sup> and 1 × 10<sup>−3</sup>, and employ the samples to compute the autocorrelations ρ(s) and the effective sample size reduction ratios for each q<sub>j</sub>, j = 0, . . . , N − 1, except for j = 47, 63, which are included in the observations. **Figure 5** presents the autocorrelations for γ = 1 × 10<sup>−3</sup> and each choice of **A**. It can be seen that **A** = **A**<sub>1</sub> produces highly uncorrelated chains for each of the q<sub>j</sub> studied, **A** = **A**<sub>2</sub> produces more correlated chains, and **A** = **I** produces the most correlated chains. As expected, **A**<sub>2</sub> provides a compromise between the low autocorrelation / high expense of the full Cholesky decomposition and the high autocorrelation / low cost of **A** = **I**.

The maximum effective sample size reduction ratio η for each choice of **A** and γ is shown in **Table 1**. It can be seen that, consistent with **Figure 5**, **A** = **A**<sub>1</sub> produces highly uncorrelated chains, which leads to low values of η, and therefore to smaller standard errors. Similarly, **A** = **I** leads to the most correlated chains, the highest values of η, and the highest standard errors, while the choice **A** = **A**<sub>2</sub> leads to values of η between those of **A** = **A**<sub>1</sub> and **A** = **I**. We also note that the values of η for **A** = **A**<sub>2</sub> and **A** = **I** increase significantly by going from γ = 1 × 10<sup>−2</sup> to γ = 1 × 10<sup>−3</sup>, which is to be expected as the latter is a more challenging case due to weaker regularization. On the other hand, the value of η for **A** = **A**<sub>1</sub> does not increase as dramatically when going from γ = 1 × 10<sup>−2</sup> to γ = 1 × 10<sup>−3</sup>, which indicates that **A**<sub>1</sub> is the best choice of acceleration matrix despite its higher computational cost per leapfrog step. We conclude that the gHMC scheme leads to a significant reduction in the estimation error over the use of HMC without an acceleration matrix.

For problems with reaction terms, the forcing **F** = −∇<sub>**q**</sub>H is not a linear function of **q** as in (28). In such cases, the selection of the acceleration matrix **A** is not straightforward. The challenge is to find an approximation to the forcing that is linear in **q**, i.e., preserves the form (28) with **G**<sub>obs</sub> and **G**<sub>reg</sub> independent of **q**. This is required to guarantee the reversibility of the Hamiltonian dynamics.

Such an approximation can be obtained by disregarding the nonlinear reaction term and using **G**<sub>obs</sub> in (25) and **G**<sub>reg</sub> in (27), which are functions only of the temporal and spatial domain properties and the hyperparameters σ<sub>ε</sub><sup>2</sup> and γ. This selection is equivalent to taking **F** ≈ −**G**<sub>lin</sub>(**q** − **q**<sub>obs</sub>), where **G**<sub>lin</sub> is the Hessian of the advection-diffusion (linear) portion of the Hamiltonian. It gives the acceleration matrix **A**<sub>1</sub> of (29). This choice is justified for non-periodic boundary conditions if the contaminant plume does not reach the domain's boundaries during the simulation time. An alternative is to take only the diagonal portion of the Hessian of the advection-diffusion portion of the Hamiltonian. This would produce the acceleration matrix **A**<sub>2</sub> in (30).

#### 5. CONCLUSIONS AND FURTHER WORK

We presented a computationally efficient and accurate algorithm for identification of sources and release histories of (geo)chemically active solutes. The algorithm is based on a generalized hybrid Monte Carlo approach, in which MC sampling is accelerated by the use of discrete adjoint equations. Some of the salient features of our approach are: (1) its ability to handle nonlinear systems, since it requires no linearizations, and (2) its compatibility with various regularization strategies.

The introduction of an acceleration matrix to the gHMC scheme was tested for an advection-dispersion problem. While the example presented was limited to one-dimensional domains, periodic boundary conditions, and homogeneous porous media, our analysis demonstrated that the proposed acceleration matrices improve upon basic HMC; therefore, we consider the proposed acceleration strategy to be promising. The generalization of these constructions to problems with nonlinear reaction terms, two- and three-dimensional, heterogeneous media, and non-periodic boundary conditions, will be the subject of future work.

Finally, we note the importance of considering the heterogeneity of flow and transport parameters, such as the hydraulic conductivity and dispersion coefficient tensors, for source identification tasks (Xu and Gómez-Hernández, 2018). Attempting to perform Bayesian inference when the values of these coefficients are assumed to be known but their values are erroneous may lead to model misspecification and consequently to posterior densities with little predictive value. Fortunately, the HMC and gHMC schemes presented in this work can accommodate the simultaneous identification of heterogeneous coefficients together with the release history by extending the state vector **q** to include the discretized heterogeneous coefficients. The calculation of the gradient of the data misfit with respect to the extended state vector can be accomplished via discrete adjoint sensitivity analysis (Zhang et al., 2017) for complex dynamical systems. The extension of the presented framework to the identification of heterogeneous parameters of geophysical models will be considered in future work.

#### AUTHOR CONTRIBUTIONS

DB-S implemented the presented algorithms in MATLAB. All authors contributed equally to the preparation of this manuscript.

#### FUNDING

This work was supported in part by the Applied Mathematics Program within the U.S. Department of Energy Office of Advanced Scientific Computing Research Mathematical Multifaceted Integrated Capability Centers (MMICCS) project—AEOLUS—Advances in Experimental Design, Optimal Control, and Learning for Uncertain Complex Systems under award number DE-SC0019393, by the U.S. National Science Foundation under award number DMS-1802189, and by the U.S. Department of Energy Office of Basic Energy Research under award number DE-SC0019130. Pacific Northwest National Laboratory is operated by Battelle for the DOE under Contract DE-AC05-76RL01830.

#### ACKNOWLEDGMENTS

We thank N. Gulbahce and T. Bewley for fruitful discussions.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor is currently co-organizing a Research Topic with one of the authors DT, and confirms the absence of any other collaboration.

Copyright © 2019 Barajas-Solano, Alexander, Anghel and Tartakovsky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

#### Discrete Sensitivity Analysis

For the problem in section 4.1 we use the linearized Runge-Kutta (Rosenbrock) method ROS2 of Verwer et al. (1999) for time stepping of the forward ODE problem. The advantage of using this method is that it allows for a linear implicit treatment of the dispersion operator and a linearization of the reaction operator, while the advection operator is treated explicitly.

We assume that the advection-dispersion-reaction equation can be discretized into an autonomous system of ODEs

$$\mathbf{c}\_t = \mathbf{f}(\mathbf{c}) = (\mathbf{A}\_\mathrm{D} + \mathbf{A}\_\mathrm{A})\mathbf{c} - \mathbf{R}(\mathbf{c}),$$

where **c** is the state vector, **A**<sub>D</sub> is the discretized linear dispersion operator, **A**<sub>A</sub> is the discretized linear advection operator, and **R**(**c**) is the reaction vector. Time stepping is performed via a scheme

$$\mathbf{c}^{n+1} = \mathbf{c}^n + (2 - b)\Delta t \mathbf{k}\_1 + b\Delta t \mathbf{k}\_2,\qquad \text{(A1)}$$

$$(\mathbf{I} - \theta \Delta t \mathbf{J})\mathbf{k}\_1 = \mathbf{f}(\mathbf{c}^n),\tag{A2}$$

$$(\mathbf{I} - \theta \Delta t \mathbf{J})\mathbf{k}\_2 = \mathbf{f}\left(\mathbf{c}^n + \frac{1}{2b} \Delta t \mathbf{k}\_1\right) - \frac{1}{b} \mathbf{k}\_1,\tag{A3}$$

where **J** = **f**<sub>**c**</sub>(**c**<sup>n</sup>) is the Jacobian of **f** with respect to the state. The coefficients θ and b are taken for this application as θ = 1 − √2/2 and b = 1/2, respectively. The left-hand side operators of (A2, A3) are approximated via approximate matrix factorization (AMF) to obtain the split form

$$(\mathbf{I} - \theta \Delta t \mathbf{J}) \approx (\mathbf{I} - \theta \Delta t \mathbf{A}\_{\mathbf{D}}) (\mathbf{I} + \theta \Delta t \mathbf{R}\_{\mathbf{c}} (\mathbf{c}^n)).$$
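A scalar sketch of the time-stepping scheme (A1)–(A3) is given below. It is a simplification made for illustration: the state is a single number, the exact Jacobian is used in place of the AMF split, and the two test problems (linear decay c′ = −c and the quadratic sink c′ = −c<sup>2</sup>) are hypothetical stand-ins for the discretized transport operator. The ratio of errors obtained with two step sizes provides a check of the second-order accuracy of the scheme.

```python
import math

THETA = 1.0 - math.sqrt(2.0) / 2.0         # value of theta quoted in the text
B = 0.5                                    # value of b quoted in the text


def ros2_step(c, dt, f, jac):
    """One step of (A1)-(A3) for a scalar ODE c' = f(c) with exact Jacobian."""
    lhs = 1.0 - THETA * dt * jac(c)        # scalar version of I - theta*dt*J
    k1 = f(c) / lhs
    k2 = (f(c + dt * k1 / (2.0 * B)) - k1 / B) / lhs
    return c + (2.0 - B) * dt * k1 + B * dt * k2


def integrate(c0, t_end, n_steps, f, jac):
    c = c0
    dt = t_end / n_steps
    for _ in range(n_steps):
        c = ros2_step(c, dt, f, jac)
    return c


# Linear decay c' = -c, c(0) = 1, exact solution exp(-t).
def f_linear(c):
    return -c


def jac_linear(c):
    return -1.0


exact_linear = math.exp(-1.0)
err_coarse = abs(integrate(1.0, 1.0, 100, f_linear, jac_linear) - exact_linear)
err_fine = abs(integrate(1.0, 1.0, 200, f_linear, jac_linear) - exact_linear)
ratio = err_coarse / err_fine              # close to 4 for a second-order scheme


# Nonlinear sink c' = -c^2, c(0) = 1, exact solution 1 / (1 + t).
def f_quadratic(c):
    return -c * c


def jac_quadratic(c):
    return -2.0 * c


err_nonlinear = abs(integrate(1.0, 1.0, 100, f_quadratic, jac_quadratic) - 0.5)
```

In the vector case the division by `lhs` becomes a linear solve, which the AMF split above replaces by two cheaper solves, one with the dispersion operator and one with the (diagonal) reaction Jacobian.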

The discussion in section 4.1 led us to conclude that it is necessary to compute products of the form (d**c**<sup>i+1</sup>/d**c**<sup>i</sup>)<sup>⊤</sup>**u** in order to apply the discrete sensitivity technique of Daescu et al. (2000). The formulae for the computation of these products are derived from the time-stepping scheme (A1-A3). In particular, differentiating (A1) with respect to the state and multiplying by a test vector **u** gives the single-step sensitivity product as

$$\left(\frac{\mathrm{d}\mathbf{c}^{n+1}}{\mathrm{d}\mathbf{c}^{n}}\right)^{\top}\mathbf{u} = \mathbf{u} + \frac{3}{2}\Delta t \left(\frac{\mathrm{d}\mathbf{k}\_{1}^{n}}{\mathrm{d}\mathbf{c}^{n}}\right)^{\top}\mathbf{u} + \frac{1}{2}\Delta t \left(\frac{\mathrm{d}\mathbf{k}\_{2}^{n}}{\mathrm{d}\mathbf{c}^{n}}\right)^{\top}\mathbf{u}.$$

The next task is to derive formulas for the Jacobians of the stage derivatives **k**<sub>1</sub> and **k**<sub>2</sub>. Let **M** be the AMF-ed left-hand-side matrix of (A2, A3). Differentiating (A2, A3) with respect to the state and multiplying by the test vector **u** gives the formulae

$$\left(\frac{\mathrm{d}\mathbf{k}\_1^n}{\mathrm{d}\mathbf{c}^n}\right)^\top \mathbf{u} = \left[\mathbf{J}\_0 - \left(\frac{\mathrm{d}\mathbf{M}}{\mathrm{d}\mathbf{c}^n} \mathbf{k}\_1^n\right)\right]^\top \mathbf{v}, \quad \mathbf{M}^\top \mathbf{v} = \mathbf{u},$$

and

$$\left(\frac{\mathrm{d}\mathbf{k}\_2^n}{\mathrm{d}\mathbf{c}^n}\right)^\top \mathbf{u} = \mathbf{J}\_1^\top \mathbf{v} + \left(\frac{\mathrm{d}\mathbf{k}\_1^n}{\mathrm{d}\mathbf{c}^n}\right)^\top (\Delta t \mathbf{J}\_1 - 2\mathbf{I})^\top \mathbf{v} - \left(\frac{\mathrm{d}\mathbf{M}}{\mathrm{d}\mathbf{c}^n} \mathbf{k}\_2^n\right)^\top \mathbf{v},$$

with **J**<sub>0</sub> = **f**<sub>**c**</sub>(**c**<sup>n</sup>) and **J**<sub>1</sub> = **f**<sub>**c**</sub>(**c**<sup>n</sup> + Δt**k**<sub>1</sub><sup>n</sup>).

The computation of the products (d**M**/d**c**<sup>n</sup>)**k**<sub>i</sub><sup>n</sup>, i = 1, 2, is highly problem-specific. It depends on the structure of the second-order derivatives of the reaction vector with respect to the state. For the reaction model (15) and a method-of-lines discretization, the Jacobian **R**<sub>**c**</sub> is diagonal, and so the computation of these products is straightforward. For different reaction models and more sophisticated discretization schemes the computation might be more involved.

# A Machine Learning Based Hybrid Multi-Fidelity Multi-Level Monte Carlo Method for Uncertainty Quantification

#### Nagoor Kani Jabarullah Khan\* and Ahmed H. Elsheikh

School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, United Kingdom

This paper focuses on reducing the computational cost of the Monte Carlo method for uncertainty propagation. Recently, the Multi-Fidelity Monte Carlo (MFMC) method (Ng, 2013; Peherstorfer et al., 2016) and the Multi-Level Monte Carlo (MLMC) method (Müller et al., 2013; Giles, 2015) were introduced to reduce the computational cost of the Monte Carlo method by making use of low-fidelity models that are cheap to evaluate in addition to the high-fidelity models. In this paper, we use machine learning techniques to combine the features of both the MFMC method and the MLMC method into a single framework called the Multi-Fidelity-Multi-Level Monte Carlo (MFML-MC) method. In the MFML-MC method, we use a hierarchy of proper orthogonal decomposition (POD) based approximations of high-fidelity outputs to formulate an MLMC framework. Next, we utilize a Gradient Boosted Tree Regressor (GBTR) to evolve the dynamics of the POD-based reduced order model (ROM) (Xiao et al., 2017) on every level of the MLMC framework. Finally, we incorporate the MFMC method in order to exploit the POD ROM as a level-specific low-fidelity model in the MFML-MC method. We compare the performance of the MFML-MC method with the Monte Carlo method that uses either a high-fidelity model or a single low-fidelity model on two subsurface flow problems with a random permeability field. Numerical results suggest that the MFML-MC method provides an unbiased estimator with speedups by orders of magnitude in comparison to the Monte Carlo method that uses the high-fidelity model only.

Keywords: uncertainty quantification, POD, multi-fidelity Monte Carlo method, multi-level Monte Carlo method, machine learning

#### Edited by:

Philippe Renard, Université de Neuchâtel, Switzerland

#### Reviewed by:

Maruti Kumar Mudunuru, Los Alamos National Laboratory (DOE), United States

Niklas Linde, Université de Lausanne, Switzerland

\*Correspondence:

Nagoor Kani Jabarullah Khan nj7@hw.ac.uk

#### Specialty section:

This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science

> Received: 27 November 2018 Accepted: 20 June 2019 Published: 27 August 2019

#### Citation:

Jabarullah Khan NK and Elsheikh AH (2019) A Machine Learning Based Hybrid Multi-Fidelity Multi-Level Monte Carlo Method for Uncertainty Quantification. Front. Environ. Sci. 7:105. doi: 10.3389/fenvs.2019.00105

# 1. INTRODUCTION

Effective propagation of uncertainties through nonlinear dynamical systems has become an essential task for model-based engineering applications (e.g., water resources management, petroleum reservoir management) (Elsheikh et al., 2013; Petvipusit et al., 2014; Kani and Elsheikh, 2018). There are many possible sources of uncertainty in the inputs of multi-phase porous media flow models, such as material properties (e.g., permeability and porosity), boundary conditions, and geometrical information of the simulated domain. In this work, we focus on the canonical problem of uncertainty propagation in subsurface flow models due to stochastic model inputs, mainly the spatially distributed hydraulic conductivity field. In this setting, the high-fidelity model outputs [quantities of interest (QoI)] are usually defined as a time series of transport variables at selected grid blocks (e.g., well locations) in the porous media domain. The propagation of uncertainties through multi-phase porous media flow models remains challenging because of the high dimensionality of the input parameter space (e.g., heterogeneous permeability) and the nonpolynomial model nonlinearities (Elsheikh et al., 2012, 2013). For this class of problems, probabilistic techniques, including stochastic Galerkin (Ghanem and Spanos, 1991; Stefanou, 2009) and stochastic collocation methods (Babuška et al., 2007; Doostan and Owhadi, 2011), have limited applicability, although they are computationally very effective for quasi-linear flow models with a small number of random variables (Li and Zhang, 2007; Lin and Tartakovsky, 2009).

One viable option to handle such situations is the Monte Carlo (MC) method, where repeated evaluations of the high-fidelity flow model are performed using different instantiations of the random input. The output of these simulations is post-processed to estimate the desired statistics, such as the mean and the variance of the QoI. Generally, the estimators of the MC method are unbiased. However, since the accuracy of the MC method is measured in terms of the estimator variance (Giles, 2013), the error of MC estimators decays only as $1/\sqrt{N}$, where $N$ is the number of random samples. Given this slow convergence rate, the MC method is computationally expensive, since a large number of high-fidelity simulations have to be performed to obtain a reasonably accurate statistical estimate of the QoI. One notable advantage of MC methods in comparison to other techniques (Li and Zhang, 2007; Lin and Tartakovsky, 2009) is the ease of implementation using black-box simulators. In addition, the rate of convergence is independent of the dimensionality of the random model inputs.

In this work, in order to make use of the aforementioned advantages of the MC method and to alleviate its slow convergence rate, we employ a variant of the control variate method (Ng, 2013; Giles, 2015) called the Multi-Level Monte Carlo method (Giles, 2013, 2015), which makes use of the correlation between the high-fidelity model output and a multilevel hierarchy of low-fidelity model outputs. The key aspect of the MLMC method is the repartition of the computational cost between the different hierarchical levels of models based on the number of samples required to decrease the variance at each level. More precisely, at low levels the variance is reduced by increasing the number of cheap samples, whereas at high levels the level variances are expected to be small, so the MLMC method incurs only a few expensive high-fidelity simulations (Giles, 2013; Müller et al., 2013).

Similar to the MLMC method, the Multi-Fidelity Monte Carlo method (Ng, 2013; Peherstorfer et al., 2016) is another control variate method, which combines the outputs from an arbitrary number of low-fidelity models with the high-fidelity model in order to speed up the statistical estimation of the QoI. The key aspect of the MFMC approach is the initial selection of low-fidelity models and the corresponding number of model runs for each model (Ng, 2013; Peherstorfer et al., 2016). Ng (2013) proposed a multifidelity approach to reduce the cost of expensive objective functions in stochastic optimization problems by making use of inexpensive, low-fidelity models. Peherstorfer et al. (2016) extended the MFMC method introduced in Ng (2013) to accelerate uncertainty quantification (UQ) tasks by making use of many low-fidelity models. Furthermore, the MFMC method introduced in Ng (2013) can utilize low-fidelity models of any type; for example, up-scaled models (Durlofsky and Chen, 2012), POD reduced order models (Berkooz et al., 1993; Antoulas et al., 2001; Lassila et al., 2014), and response surface based models (Frangos et al., 2010) could be combined in the MFMC framework. We refer readers to Peherstorfer et al. (2018) for a complete review of the MFMC method.

We now present a brief literature review of the MLMC method as applied to uncertainty quantification (UQ) tasks. Heinrich (2001) appears to have been the first to apply MLMC, in the context of parametric integration. Kebaier (2005) then used similar ideas for a two-level Monte Carlo method to approximate weak solutions to stochastic differential equations in mathematical finance. Giles (2008) extended the MLMC method to solve stochastic ordinary differential equations of Itô type. Barth et al. (2011) and Cliffe et al. (2011) introduced the MLMC method for elliptic partial differential equations (PDEs) with stochastic coefficients. Abdulle et al. (2013) applied the MLMC method to solve elliptic PDEs in divergence form, where the coefficients are random with multiple scales. Mishra et al. (2012) generalized the MLMC method to nonlinear, scalar hyperbolic conservation laws with random initial data. Mishra et al. (2016) extended the work of Mishra et al. (2012) to systems of nonlinear, hyperbolic conservation laws in several space dimensions. Geraci et al. (2015) proposed a Multi-Level Multi-Fidelity method in which the MLMC estimator is modified at each level to benefit from a level-specific low-fidelity model.

In the context of fluid flow in porous media, Müller et al. (2013) applied the MLMC method to two-phase transport simulations of an oil reservoir with uncertain heterogeneous permeability. Efendiev et al. (2013) used mixed multi-scale finite element methods within the MLMC framework to speed up computations involving multiphase flow and transport simulations. Efendiev et al. (2015) coupled the generalized multi-scale finite element method with the Multi-Level Markov chain Monte Carlo method (MLMCMC), which sequentially screens the proposal with different levels of approximation and combines the samples at different levels to arrive at an accurate estimate. Elsakout et al. (2015) demonstrated that MLMCMC can perform uncertainty quantification tasks involving reservoir simulation at a lower computational cost than the standard Markov chain Monte Carlo method. Fagerlund et al. (2016) combined a selective refinement technique with MLMC to estimate the sweep efficiency in a two-phase flow scenario where an absolute accuracy of the failure probability of the order of 5 to 10 percent is required. Lu et al. (2016) applied the MLMC method to estimate cumulative distribution functions of QoI obtained from the numerical approximation of large-scale stochastic subsurface simulations. For a complete review of the MLMC method, we refer readers to Giles (2013) and Giles (2015).

Historically, the MLMC method constructs a hierarchy of coarse spatial and/or time discretization models as low-fidelity models. However, it is also possible to formulate a sequence of low-fidelity models utilizing projection based reduced order models (Wang et al., 2017; Xiao et al., 2017) of different dimensions. For example, Codina et al. (2015) employed different reduced basis ROMs in the MLMC framework to estimate the statistical outputs of stochastic elliptic PDEs. In that work, the authors proposed an algorithm for optimally choosing both the dimensions of the reduced basis ROMs and the number of Monte Carlo samples at each level to achieve a given error tolerance.

In this manuscript, we propose a Multi-Fidelity-Multi-Level Monte Carlo (MFML-MC) method to address some of the limitations of the standard MLMC method with Galerkin projection based ROMs (Antoulas et al., 2001; Lassila et al., 2014; Codina et al., 2015) as low-fidelity models, in particular for large-scale nonlinear UQ problems. First, the variance, and hence the mean square error, of the standard MLMC estimator depends on the correlation between every two consecutive level ROMs. This requires a large number of levels with a small difference in the number of dimensions between every two consecutive ROMs. Therefore, the standard MLMC estimator not only requires many levels of ROMs but also requires ROMs of high dimensions until high correlation with the high-fidelity model is achieved. Hence, relying on ROMs obtained directly from high-fidelity model solution data, such as those in Codina et al. (2015), can significantly limit the performance of the MLMC method. Second, Galerkin-projection ROMs such as POD ROMs obtained from a nonlinear high-fidelity model are subject to severe convergence and stability issues, especially when the dimensions of the ROMs are much smaller than the dimension of the high-fidelity model (Bui-Thanh et al., 2007; He, 2010; Wang et al., 2012). This severely limits the use of low-dimensional POD ROMs in the MLMC framework, and therefore we cannot expect a reduction in computational complexity by orders of magnitude as a result of reducing the dimension of the state variable (Kani and Elsheikh, 2018). Third, for nonlinear problems, the MLMC method based on ROMs requires reconstruction of the high-fidelity model state variable for every sample at each level. Such reconstruction involves a high-dimensional matrix-vector multiplication, and therefore employing ROMs in the MLMC method can easily cause computational overhead, in particular for UQ problems with nonpolynomial nonlinearity. However, we note that this limitation about reconstructing the high-fidelity model solution to predict outputs of interest does not apply to linear ROMs with linear outputs. Fourth, finding the optimal dimensions of the ROMs is not guaranteed, despite the additional computational complexity of the nonlinear integer optimization problem formulated in Codina et al. (2016).

The proposed MFML-MC method utilizes a number of ideas, detailed as follows. The first idea of the MFML-MC approach is to obtain a sequence of POD based approximations of the QoI and use this sequence as the low-fidelity models in the MLMC framework. More precisely, we compute the optimal POD bases from the singular value decomposition of the snapshot matrix built directly from the training samples of the QoI. We then employ the computed POD bases in the least-squares reconstruction method to obtain a sequence of POD based approximations of the QoI (see section 4 for more details). Since the dimension of the QoI is much smaller than the dimension of the state variable, the dimension of the basis vectors utilized to approximate the QoI is much smaller than that of the basis vectors utilized to build a standard POD ROM. Therefore, building a QoI POD instead of a full state variable POD enables the efficient extraction of high-level PODs at a limited computational cost. The second idea is to employ the MFMC method at each level of the MLMC method, so that the high-fidelity model is utilized to provide an unbiased estimator, while the low computational cost of the low-fidelity models is exploited to run a very large number of realizations in order to obtain a low variance estimator. The third idea in the MFML-MC approach is to represent the difference between every two consecutive level models of the MLMC framework in a reduced dimension. We utilize principal component analysis (PCA) to perform this dimensionality reduction. The main reason to utilize PCA for dimensionality reduction is to exploit the linearity of the expected value operator. The fourth idea is to use a data-driven approach to construct a non-intrusive ROM (Wang et al., 2017; Xiao et al., 2017) in order to compute the reduced representation mentioned in the third idea.
We use a Gradient Boosted Tree Regressor (GBTR) (Friedman, 2001) to formulate such a level-specific low-fidelity non-intrusive ROM. We then utilize the constructed non-intrusive ROM as a low-fidelity model in the MFMC setup on every level of the MFML-MC method. To the best of our knowledge, this paper presents the first attempt to combine the features of the MFMC method and the MLMC method using machine learning techniques for UQ analysis of nonlinear dynamical systems representing multi-phase porous media flow with uncertainty in the permeability field. In addition, this paper presents the first attempt to use the MFMC method to estimate the statistics of a vector-valued time series QoI, whereas the standard MFMC method is mainly used for estimating the statistics of a scalar QoI (Ng, 2013; Peherstorfer et al., 2016).

The remainder of this manuscript is organized as follows. In section 2, the multi-phase porous media flow problem is formulated. In section 3, the MC, MFMC, and MLMC methods are briefly explained. In section 4, the MFML-MC method is introduced. In section 5, numerical results for two subsurface multi-phase porous media flow problems showing the performance of the MFML-MC method are reported. We note that building reduced order models for these porous media flow problems is quite challenging: standard POD-Galerkin reduced order models produce inaccurate and unstable results even when a large number of POD basis vectors is utilized (He et al., 2011; Kani and Elsheikh, 2018). Hence, in the two numerical test cases, standard MLMC with a POD-Galerkin ROM is subject to all four limitations mentioned earlier in this section. Finally, in section 6, conclusions and perspectives are drawn.

## 2. PROBLEM FORMULATION

We consider immiscible two-phase (oil and water) flow in an incompressible porous media domain. The flow behavior of oil and water in a porous media domain can be described by conservation of mass and Darcy's law for each phase (Bastian, 1999; Chen et al., 2006; Aarnes et al., 2007). Neglecting the effects of gravity, capillarity, and compressibility, and assuming the density ratio to be equal to one, Darcy's law for each phase can be written as

$$\mathbf{v}\_{\alpha} = -\mathbf{K} \frac{k\_{r\alpha}}{\mu\_{\alpha}} \,\nabla p \tag{1}$$

where the subscript $\alpha = w$ denotes the water phase, the subscript $\alpha = o$ denotes the oil phase, $\mathbf{v}_{\alpha}$ is the phase velocity, $p$ is the global pressure, $\mathbf{K}$ is the absolute permeability tensor, $k_{r\alpha}$ is the relative permeability of phase $\alpha$, and $\mu_{\alpha}$ is the viscosity of phase $\alpha$ (Bastian, 1999; Chen et al., 2006; Aarnes et al., 2007). The phase relative permeability $k_{r\alpha}$ models the interactions between the two phases and is usually described as a function of the phase saturation (the volume of phase $\alpha$ in a given pore space of the porous media domain) (Aarnes et al., 2007).

The total conservation of mass can be expressed in terms of incompressibility condition that takes the form

$$\nabla \cdot \mathbf{v} = q \tag{2}$$

where $\mathbf{v} = \mathbf{v}_o + \mathbf{v}_w$ is the total velocity vector and $q$ is the total source and sink term. We can combine Darcy's law for each phase (Equation 1) and the conservation of mass (Equation 2) to derive equations for the global pressure and the water saturation:

$$
\begin{aligned}
\nabla \cdot \mathbf{K} \lambda \,\nabla p &= q \\
\phi \frac{\partial s_w}{\partial t} - \nabla \cdot (f_w \mathbf{v}) + q_w &= 0
\end{aligned} \tag{3}
$$

where $\lambda = \lambda_w + \lambda_o$ is the total mobility, $\lambda_{\alpha} = k_{r\alpha}/\mu_{\alpha}$ is the phase mobility, $f_w = \lambda_w/(\lambda_w + \lambda_o)$ is the fractional flow function for the water phase, and the saturations satisfy the constraint $s_w + s_o = 1$. In the rest of the manuscript, we use $s$ in place of $s_w$ to denote the water saturation.
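As a small illustration of the mobility and fractional flow definitions above, the following sketch evaluates $f_w$ over a range of water saturations. The quadratic relative permeability curves, the viscosity values, and the function name `fractional_flow` are assumptions made for this example only; they are not specified in the text above.

```python
import numpy as np

def fractional_flow(s_w, mu_w=1.0, mu_o=1.0):
    """Water fractional flow f_w = lambda_w / (lambda_w + lambda_o).

    Assumes quadratic relative permeabilities k_rw = s_w**2 and
    k_ro = (1 - s_w)**2 (an illustrative choice, not taken from the text).
    """
    s_w = np.asarray(s_w, dtype=float)
    lam_w = s_w**2 / mu_w             # water mobility k_rw / mu_w
    lam_o = (1.0 - s_w)**2 / mu_o     # oil mobility, using s_o = 1 - s_w
    return lam_w / (lam_w + lam_o)

# Hypothetical viscosities: oil five times more viscous than water
saturations = np.linspace(0.0, 1.0, 11)
f_w = fractional_flow(saturations, mu_w=1.0, mu_o=5.0)
```

With these assumed curves, $f_w$ increases monotonically from 0 at $s_w = 0$ to 1 at $s_w = 1$, and a larger oil viscosity shifts the curve toward earlier water breakthrough.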

In this problem, we consider Equation (3) as the high-fidelity model, and we solve it for pressure and saturation using a sequential formulation in which we solve for the pressure first and then for the water saturation. We use the finite volume method to discretize the spatial derivatives of Equation (3) on a spatial domain of $n$ grid blocks. We use an implicit time stepping method to solve Equation (3) for the high-fidelity model state variable $\mathbf{y}_s \in \mathbb{R}^n$, where the $i$th component of $\mathbf{y}_s$ is the water saturation value at the $i$th grid block.

The QoI is defined as $\mathbf{u}(t) \in \mathbb{R}^m$, where $u_i = y_s(x_i, y_i)$, $i = 1, \dots, m$ with $m \ll n$, at specific time steps (say $t = 10, 20, \dots, 200$). In the following, we use $\mathbf{u}$ in place of $\mathbf{u}(t)$ to simplify the notation, and we are interested in the first moment estimate (i.e., the mean) of $\mathbf{u}$. The grid points of interest $(x_i, y_i)$, $i = 1, \dots, m$, can be a set of arbitrary user-specified spatial locations, for example, the set of grid points where injectors and producers are located.

#### 3. MULTI-FIDELITY MONTE CARLO AND MULTI-LEVEL MONTE-CARLO METHOD

Let $\mathbf{x}$ be a realization of the input random vector $\mathbf{X}(\omega)$, $\omega \in \Omega$, where $\Omega$ is the sample space, and let the quantity of interest be the expectation of the random variable $\mathbf{u}$. The standard Monte Carlo method estimates the expectation $\mathbb{E}[\mathbf{u}]$ of the random variable $\mathbf{u}$ as

$$
\hat{\mathbf{u}} = \frac{1}{N} \sum\_{i=1}^{N} \mathbf{u}^{i} \tag{4}
$$

where $\hat{\mathbf{u}}$ is the estimator of $\mathbb{E}[\mathbf{u}]$, $\mathbf{u}^i = \mathbf{u}(\mathbf{x}_i)$, and $N$ is the number of realizations of the model output. As per the law of large numbers (Central Limit Theorem) (Giles, 2015), a sample based estimate of the expectation $\mathbb{E}[\mathbf{u}]$ introduces a sampling error (mean square error) defined as

$$
\epsilon = \mathbb{V} \text{ar}(\hat{\mathbf{u}}) = \frac{1}{N} \mathbb{V} \text{ar}(\mathbf{u}) \tag{5}
$$

where $\mathbb{V}\text{ar}(\mathbf{u})$ is the variance of $\mathbf{u}$. Since $\sqrt{\epsilon}$, known as the standard error, scales as $1/\sqrt{N}$ for a constant $\mathbb{V}\text{ar}(\mathbf{u})$, MC simulations are computationally prohibitive because of the slow convergence rate. One way to achieve a lower $\epsilon$ is to reduce the numerator in Equation (5) (Ng, 2013).
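The $1/\sqrt{N}$ scaling of the standard error can be checked numerically. The sketch below is a minimal illustration of Equations (4) and (5), not the simulator setup of this paper: the toy model $u(x) = \cos(x)$ with a standard normal input and the helper name `mc_estimate` are assumptions chosen so that the exact mean, $e^{-1/2}$, is known.

```python
import numpy as np

def mc_estimate(model, sampler, n_samples, rng):
    """Plain Monte Carlo estimate of E[u] (Equation 4) and its standard error."""
    x = sampler(rng, n_samples)                 # realizations of the random input
    u = np.asarray([model(xi) for xi in x])     # one model run per realization
    mean = u.mean(axis=0)
    # Square root of Equation (5), with Var(u) replaced by the sample variance
    std_err = np.sqrt(u.var(axis=0, ddof=1) / n_samples)
    return mean, std_err

# Hypothetical toy model standing in for an expensive simulator
toy_model = lambda x: np.cos(x)
toy_sampler = lambda rng, n: rng.standard_normal(n)

rng = np.random.default_rng(0)
mean_small, se_small = mc_estimate(toy_model, toy_sampler, 1_000, rng)
mean_large, se_large = mc_estimate(toy_model, toy_sampler, 100_000, rng)
```

Increasing the number of samples by a factor of 100 reduces the standard error by roughly a factor of 10, which is why lowering $\mathbb{V}\text{ar}(\mathbf{u})$ itself is the more attractive route.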

The control variate method is a variance reduction technique that uses an alternative estimator for $\mathbb{E}[\mathbf{u}]$, $\mathbf{u} \in \mathbb{R}$, of the form

$$
\hat{\mathbf{u}}^{cv} = \hat{\mathbf{u}} + \beta \left( \hat{\mathbf{v}} - \mathbb{E}[\mathbf{v}] \right) \tag{6}
$$

where $\mathbf{v}(\mathbf{x}) \in \mathbb{R}$ is an auxiliary random variable. The estimator $\hat{\mathbf{u}}^{cv}$ is an unbiased estimator of $\mathbb{E}[\mathbf{u}]$ and, for the variance-minimizing choice of $\beta$, its variance is (Ng, 2013)

$$\mathbb{V}\text{ar}(\hat{\mathbf{u}}^{\text{cv}}) = \mathbb{V}\text{ar}(\hat{\mathbf{u}}) \left(1 - \rho^2\right) \tag{7}$$

where $\rho$ is the correlation between $\mathbf{u}(\mathbf{x})$ and $\mathbf{v}(\mathbf{x})$. Since $\rho^2$ lies between 0 and 1, $\mathbb{V}\text{ar}(\hat{\mathbf{u}}^{cv})$ never exceeds $\mathbb{V}\text{ar}(\hat{\mathbf{u}})$. For UQ tasks where the QoI is governed by partial differential equations, $\mathbf{u}(\mathbf{x})$ is obtained from a high-fidelity model output and $\mathbf{v}(\mathbf{x})$ is generally obtained from a low-fidelity model output. In general, $\mathbb{E}[\mathbf{v}]$ is not known exactly and has to be replaced by an accurate estimate. For example, Ng (2013) replaced $\mathbb{E}[\mathbf{v}]$ in Equation (6) by the sample mean $\frac{1}{M} \sum_{i=1}^{M} \mathbf{v}(\mathbf{x}_i)$, where $M \gg M_{HF}$ and $M_{HF}$ is the number of high-fidelity model samples. Furthermore, it was proved in Ng (2013) that, for a fixed computational budget $p$, high correlation between the low-fidelity and high-fidelity models is not the only condition for variance reduction over the standard MC estimator: the low-fidelity model must also be cheaper to evaluate than the high-fidelity model.
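The variance reduction factor $(1 - \rho^2)$ of Equation (7) can be observed empirically in the idealized case where $\mathbb{E}[\mathbf{v}]$ is known. The sketch below is an illustration under stated assumptions: the toy outputs $u(x) = e^{x/2}$ and $v(x) = x$ with a standard normal input are hypothetical stand-ins for high- and low-fidelity models, and the coefficient $\beta$ is estimated from a pilot sample.

```python
import numpy as np

rng = np.random.default_rng(1)
u = lambda x: np.exp(0.5 * x)   # stand-in for the high-fidelity output
v = lambda x: x                 # stand-in low-fidelity output with known E[v] = 0

# Pilot run to estimate the correlation and the variance-minimizing coefficient
x_pilot = rng.standard_normal(200_000)
cov = np.cov(u(x_pilot), v(x_pilot))
rho2 = cov[0, 1] ** 2 / (cov[0, 0] * cov[1, 1])
beta = -cov[0, 1] / cov[1, 1]   # sign follows the convention of Equation (6)

# Repeat both estimators many times to measure their variances empirically
x = rng.standard_normal((2000, 100))                  # 2000 experiments, N = 100 each
u_hat = u(x).mean(axis=1)                             # plain MC, Equation (4)
u_hat_cv = u_hat + beta * (v(x).mean(axis=1) - 0.0)   # control variate, Equation (6)
variance_ratio = u_hat_cv.var() / u_hat.var()
```

In this toy setting `variance_ratio` should be close to `1 - rho2`. When $\mathbb{E}[\mathbf{v}]$ must itself be estimated from additional low-fidelity runs, the achievable reduction is smaller and depends on the relative cost of the two models.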

A potential limitation of the aforementioned multifidelity estimator (Ng, 2013) is that it repartitions the given computational budget $p$ between the high-fidelity model and only a single low-fidelity model such that the mean square error of the estimator is minimized. In order to allow an arbitrary number of low-fidelity models in the control variate method, Peherstorfer et al. (2016) extended the multi-fidelity approach introduced in Ng (2013). The Multi-Fidelity Monte Carlo method introduced in Peherstorfer et al. (2016) formulates an optimization problem that uses an arbitrary number of low-fidelity models to derive an unbiased MFMC estimator of $\mathbb{E}[\mathbf{u}]$ of the form

$$\hat{\mathbf{u}}^{mf} = \hat{\mathbf{u}} + \beta^1 \left(\hat{\mathbf{v}}^1 - \hat{\mathbf{u}}\right) + \sum_{i=2}^{I} \beta^i \left(\hat{\mathbf{v}}^i - \hat{\mathbf{v}}^{i-1}\right) \tag{8}$$

where $\mathbf{v}^1, \dots, \mathbf{v}^I \in \mathbb{R}$ are auxiliary random variables obtained from $I$ different low-fidelity models, $\hat{\mathbf{v}}^i$ estimates the expectation $\mathbb{E}[\mathbf{v}^i]$ using $M_i$ samples of low-fidelity model $i$, and $\beta^1, \dots, \beta^I \in \mathbb{R}$ are the coefficients. The low-fidelity model $i$ uses the realizations $\mathbf{x}_1, \dots, \mathbf{x}_{M_i}$ of the input random vector $\mathbf{X}(\omega)$ to estimate $\hat{\mathbf{v}}^i$, whereas the low-fidelity model $i-1$ uses only the first $M_{i-1}$ realizations of $\mathbf{X}(\omega)$ to estimate $\hat{\mathbf{v}}^{i-1}$. Therefore, the two consecutive estimators $\hat{\mathbf{v}}^i$ and $\hat{\mathbf{v}}^{i-1}$ are dependent for all $i = 1, \dots, I$. The cost of the MFMC estimator is $C(\hat{\mathbf{u}}^{mf}) = \sum_{i=1}^{I} C_i \cdot M_i + C_{HF} \cdot M_{HF}$, where $C_{HF}$ is the cost of evaluating the high-fidelity model and $C_i$ is the cost of evaluating the low-fidelity model $i$ for all $i = 1, \dots, I$. In Peherstorfer et al. (2016), an optimization problem was formulated to select optimal values for the numbers of samples $\{M^*_{HF}, M^*_1, \dots, M^*_I\}$ and for the coefficients $\{\beta^{1*}, \dots, \beta^{I*}\}$ such that the mean square error of the MFMC estimator is lower than that of the Monte Carlo estimator for a fixed computational budget.
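The nested use of samples and the cost accounting described above can be sketched as follows. This is a simplified illustration rather than the optimization-based procedure of Peherstorfer et al. (2016): each correction is written as the difference of two averages of the same low-fidelity model over nested sample sets, which is the form used in that reference and a notational assumption relative to Equation (8); the coefficients are estimated from the paired samples instead of being solved for; and the toy models, sample counts, and costs are hypothetical.

```python
import numpy as np

def mfmc_estimate(models, n_samples, sampler, rng):
    """Multi-fidelity estimate of E[u] with nested samples.

    models    : [high_fidelity, low_fidelity_1, ..., low_fidelity_I]
    n_samples : [M_HF, M_1, ..., M_I], non-decreasing
    """
    x = sampler(rng, n_samples[-1])   # shared pool: model k uses the first n_samples[k] draws
    outputs = [f(x[:m]) for f, m in zip(models, n_samples)]
    u = outputs[0]
    estimate = u.mean()
    for k in range(1, len(models)):
        v = outputs[k]
        c = np.cov(u, v[: n_samples[0]])   # estimated from the paired samples
        alpha = c[0, 1] / c[1, 1]
        # Same low-fidelity model averaged over M_k and over M_{k-1} samples
        estimate += alpha * (v.mean() - v[: n_samples[k - 1]].mean())
    return estimate

def mfmc_cost(costs, n_samples):
    """Total cost C_HF * M_HF + sum_i C_i * M_i."""
    return float(np.dot(costs, n_samples))

rng = np.random.default_rng(2)
sampler = lambda rng, n: rng.standard_normal(n)
models = [lambda x: np.exp(0.5 * x),                # stand-in high-fidelity model
          lambda x: 1 + 0.5 * x + 0.125 * x**2,     # stand-in low-fidelity model 1
          lambda x: 1 + 0.5 * x]                    # stand-in low-fidelity model 2
n_samples, costs = [50, 500, 5000], [1.0, 0.1, 0.001]
total_cost = mfmc_cost(costs, n_samples)            # equivalent to 105 high-fidelity runs
mfmc_runs = [mfmc_estimate(models, n_samples, sampler, rng) for _ in range(200)]
plain_runs = [models[0](sampler(rng, 105)).mean() for _ in range(200)]
```

Comparing the spread of `mfmc_runs` with that of `plain_runs` shows the benefit of shifting most of the budget to cheap, well-correlated models at the same total cost.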

The multi-level idea is another extension of the control variate approach, in which a sequence of low-fidelity models at different levels ($\mathbf{v}_i \in \mathbb{R}^m$ with $i = 1, \dots, I$) is used to evaluate approximate statistics of $\mathbf{u}$. First, let the index $i$ encode the accuracy of $\mathbf{v}_i$ with respect to the true solution $\mathbf{u} \in \mathbb{R}^m$. This means that, as $i$ is increased, $\mathbf{v}_i$ approximates $\mathbf{u}$ more accurately. Consequently, $\mathbf{u}$ can be written as a telescoping sum in terms of $\mathbf{v}_i$ with $i = 1, \dots, I$, which takes the form (Müller et al., 2013)

$$\mathbf{u} = \mathbf{v}\_1 - \mathbf{v}\_0 + \mathbf{v}\_2 - \mathbf{v}\_1 + \dots + \mathbf{v}\_I - \mathbf{v}\_{I-1} + \mathbf{u} - \mathbf{v}\_I = \sum\_{i=0}^I \mathbf{Y}\_i \tag{9}$$

where $\mathbf{Y}_i = \mathbf{v}_{i+1} - \mathbf{v}_i$ with $i = 0, \dots, I-1$, $\mathbf{Y}_I = \mathbf{u} - \mathbf{v}_I$, and we set $\mathbf{v}_0 = 0$. Exploiting the linearity of the expected value operator $\mathbb{E}$, the expected value $\mathbb{E}[\mathbf{u}]$ of $\mathbf{u}$ defined in Equation (9) can be written as

$$\mathbb{E}[\mathbf{u}] = \sum\_{i=0}^{I} \mathbb{E}[\mathbf{Y}\_i] \tag{10}$$

The MLMC estimator for the expected value of **u** is obtained by replacing the expected values on the right hand side of Equation (10) by ensemble averages and is defined as

$$\hat{\mathbf{u}}^{ml} = \sum\_{i=0}^{I} \hat{\mathbf{Y}}\_i = \sum\_{i=0}^{I} \frac{1}{M\_i} \sum\_{j=1}^{M\_i} \mathbf{Y}\_i^j \tag{11}$$

The mean square error (mse) of the MLMC estimator $\hat{\mathbf{u}}^{ml}$ is derived as

$$\epsilon^{ml} = \sum\_{i=0}^{I} \mathbb{V} \text{ar}(\mathbf{\hat{Y}}\_i) = \sum\_{i=0}^{I} \frac{1}{M\_i} \mathbb{V} \text{ar}(\mathbf{Y}\_i) \tag{12}$$

It is evident from Equation (12) that the mse ($\epsilon^{ml}$) of the MLMC estimator is the sum of several smaller contributions $\frac{1}{M_i} \mathbb{V}\text{ar}(\mathbf{Y}_i)$ with $i = 0, \dots, I$.
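The telescoping estimator of Equation (11) and its mse of Equation (12) can be sketched in a few lines. The function name `mlmc_estimate`, the Taylor-polynomial low-fidelity models, and the sample counts are assumptions for illustration; independent samples are drawn on every level, and the level variances are replaced by sample variances.

```python
import numpy as np

def mlmc_estimate(low_fidelity_models, high_fidelity_model, sampler, n_samples, rng):
    """MLMC estimate of E[u] (Equation 11) and its estimated mse (Equation 12).

    low_fidelity_models : [v_1, ..., v_I], ordered from coarse to fine
    n_samples           : [M_0, ..., M_I], one count per level difference Y_i
    """
    models = list(low_fidelity_models) + [high_fidelity_model]   # v_1, ..., v_I, u
    estimate, mse = 0.0, 0.0
    for i, m_i in enumerate(n_samples):
        x = sampler(rng, m_i)                    # independent samples on every level
        fine = models[i](x)
        coarse = models[i - 1](x) if i > 0 else np.zeros_like(fine)   # v_0 = 0
        y = fine - coarse                        # realizations of Y_i
        estimate = estimate + y.mean(axis=0)
        mse = mse + y.var(axis=0, ddof=1) / m_i  # one term of Equation (12)
    return estimate, mse

rng = np.random.default_rng(3)
sampler = lambda rng, n: rng.standard_normal(n)
low_fidelity = [lambda x: 1 + 0.5 * x,                   # hypothetical level 1
                lambda x: 1 + 0.5 * x + 0.125 * x**2]    # hypothetical level 2
high_fidelity = lambda x: np.exp(0.5 * x)                # stand-in for the simulator
estimate, mse = mlmc_estimate(low_fidelity, high_fidelity, sampler,
                              [20000, 2000, 200], rng)
```

Only 200 evaluations of the stand-in high-fidelity model are used here, because the last difference $\mathbf{Y}_I$ has a small variance.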

The MLMC method is mainly based on the fact that the contributions $\frac{1}{M_i} \mathbb{V}\text{ar}(\mathbf{Y}_i)$ at low levels are reduced by increasing the number of samples ($M_i$), since low-level samples are computed at low computational cost. At high levels, the level variances $\mathbb{V}\text{ar}(\mathbf{Y}_i)$ are expected to be small, so $M_i$ can be small and hence the MLMC method incurs only a few expensive high-fidelity model simulations. In summary, the MLMC method relies on the following variance hierarchy:

$$\mathbb{V}\text{ar}(\mathbf{Y}_0) > \mathbb{V}\text{ar}(\mathbf{Y}_1) > \mathbb{V}\text{ar}(\mathbf{Y}_2) > \dots > \mathbb{V}\text{ar}(\mathbf{Y}_I) \tag{13}$$

and also expects $C_0 < C_1 < C_2 < \dots < C_I$, where $C_i$ is the computational cost of computing one sample of $\mathbf{Y}_i$. In the MLMC method, the optimal numbers of samples $\{M_0^*, \dots, M_I^*\}$ are computed by solving a constrained minimization problem in which the cost function to be minimized is the total computational cost ($\sum_{i=0}^{I} C_i \cdot M_i$) of the MLMC method and the constraint is set by fixing $\epsilon^{ml}$ to a specific value (say $\epsilon^2/2$) (Müller et al., 2013; Geraci et al., 2015). The optimal numbers of samples are expressed as

$$M_i^* = \frac{2}{\epsilon^2} \left[ \sum_{j=0}^{I} \sqrt{C_j \cdot \mathbb{V}\text{ar}(\mathbf{Y}_j)} \right] \sqrt{\frac{\mathbb{V}\text{ar}(\mathbf{Y}_i)}{C_i}} \qquad i = 0, \dots, I \tag{14}$$
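Equation (14) translates directly into code. In the sketch below, the function name `optimal_samples` and the level variances and costs are hypothetical; in practice the variances would come from pilot runs. The counts are rounded up so that the mse constraint $\epsilon^2/2$ is still met.

```python
import numpy as np

def optimal_samples(variances, costs, eps):
    """Sample counts M_i* of Equation (14) for a target mse of eps**2 / 2."""
    variances = np.asarray(variances, dtype=float)
    costs = np.asarray(costs, dtype=float)
    total = np.sum(np.sqrt(costs * variances))
    m_star = (2.0 / eps**2) * total * np.sqrt(variances / costs)
    return np.ceil(m_star).astype(int)       # round up to keep the constraint satisfied

# Hypothetical level variances (decaying) and costs (growing), as in Equation (13)
variances = np.array([1.0, 0.1, 0.01])
costs = np.array([1.0, 10.0, 100.0])
eps = 0.05
m_opt = optimal_samples(variances, costs, eps)
achieved_mse = np.sum(variances / m_opt)     # Equation (12) with the chosen counts
```

Most samples are allocated to the cheapest level, where the variance is largest and each sample costs least.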

Although MLMC in general refers to a control variate method with a sequence of $I$ geometrical levels (mesh discretization levels), it can also be utilized with a sequence of $I$ reduced basis models (Codina et al., 2016) or POD basis models. More specifically, a sequence of POD basis models can be employed as the sequence of low-fidelity models $\mathbf{v}_1, \dots, \mathbf{v}_I$.

A practical implementation of the MLMC algorithm is the following (Müller et al., 2013)


#### 4. MULTI-FIDELITY-MULTI-LEVEL MONTE CARLO METHOD

In this section, we present a novel variance reduction method called the Multi-Fidelity-Multi-Level Monte Carlo (MFML-MC) method, which addresses the limitations observed in the standard MLMC method with Galerkin projection based ROMs (see section 1 for more details). In the MFML-MC method, we formulate an MLMC framework with $I$ levels and then apply the techniques of the MFMC method on every level of the MLMC framework. **Figure 1** displays the outline of the MFML-MC method, and its detailed formulation is described as five steps in the rest of this section.

The first step of the MFML-MC method is to formulate a sequence of POD approximations of the QoI $\mathbf{u}$ and utilize this sequence as the low-fidelity models $[\mathbf{v}_1, \dots, \mathbf{v}_I]$ in the MLMC framework. More precisely, in this approach, $\mathbf{v}_i$ is the $i$th level POD approximation of $\mathbf{u}$ and is computed from the least-squares reconstruction method defined as

$$\mathbf{u} \approx \mathbf{v}_i = \mathbf{U}_u^{r_i} \,\tilde{\mathbf{u}} = \mathbf{U}_u^{r_i} \left( \left(\mathbf{U}_u^{r_i}\right)^{\top} \mathbf{u} \right) \tag{15}$$

where $\tilde{\mathbf{u}} \in \mathbb{R}^{r_i}$ is the reduced representation of $\mathbf{u}$ and $\mathbf{U}_u^{r_i} \in \mathbb{R}^{m \times r_i}$ is the matrix containing $r_i$ orthonormal basis vectors in its columns. The optimal orthonormal basis vectors are computed from the singular value decomposition (SVD) of the snapshot matrix $\mathbf{X}_u = \left[ (\mathbf{u}_1 \dots \mathbf{u}_T)^1 \; \dots \; (\mathbf{u}_1 \dots \mathbf{u}_T)^L \right]$, where $T$ denotes the number of time steps and $L$ denotes the number of training samples corresponding to different realizations of the stochastic input parameters. The SVD of $\mathbf{X}_u$ is expressed as (Kani and Elsheikh, 2018)

$$\mathbf{X}_u = \mathbf{U}_u \; \boldsymbol{\Sigma}_u \; \mathbf{W}_u \tag{16}$$

where $\mathbf{U}_u \in \mathbb{R}^{m \times m}$ is the left singular matrix and $\sigma_1 > \sigma_2 > \sigma_3 > \dots > \sigma_m \geq 0$ are the singular values of the snapshot matrix $\mathbf{X}_u$. The associated error, termed the least-squares error, in approximating $\mathbf{u}$ by $\mathbf{v}_i$ using only $r_i$ basis vectors is given by (Berkooz et al., 1993; Lucia et al., 2004)

$$\varepsilon\_{i} = \|\mathbf{u} - \mathbf{v}\_{i}\|\_{2} = \sum\_{j=r\_{i}+1}^{m} \sigma\_{j} \tag{17}$$

Please note that the dimension $m$ of the basis vectors in $\mathbf{U}_u$ is much smaller than $n$ (the number of grid points). Hence, for large-scale UQ problems where $m \ll n$, we can easily form many levels with smaller $\varepsilon_i$ in this MLMC framework in comparison to the standard MLMC method. Moreover, $\mathbf{v}_I$ can be obtained using fewer basis vectors ($r_I \approx m$ with $\varepsilon_I \approx 0$) in comparison to the standard MLMC method with a sequence of Galerkin projection based ROMs.
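The first step can be sketched with NumPy as follows. The helper names `pod_basis` and `pod_approximation` and the synthetic snapshot matrix are assumptions for illustration; in the setting of this paper, the columns of the snapshot matrix would be QoI vectors collected over $T$ time steps and $L$ training realizations.

```python
import numpy as np

def pod_basis(snapshots):
    """Left singular vectors and singular values of the snapshot matrix (Equation 16)."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U, s

def pod_approximation(U, u, r):
    """Level-r POD approximation v = U_r (U_r^T u) of Equation (15)."""
    U_r = U[:, :r]
    return U_r @ (U_r.T @ u)

rng = np.random.default_rng(4)
m, n_snap = 8, 400            # QoI dimension and number of snapshot columns (T * L)
# Hypothetical snapshots with a decaying spectrum
X_u = (rng.standard_normal((m, m)) * 0.5 ** np.arange(m)) @ rng.standard_normal((m, n_snap))
U, sigma = pod_basis(X_u)

u_sample = X_u[:, 0]
errors = [np.linalg.norm(u_sample - pod_approximation(U, u_sample, r))
          for r in range(1, m + 1)]
```

Because the basis vectors have only $m$ entries, the SVD and every projection operate on small matrices, and the approximation error decreases with each added basis vector until it vanishes at $r = m$.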

The second step of MFML-MC method is to compute the reduced representation of **Y**<sup>i</sup> over all levels in MLMC framework (see Equation 9). The reduced representation of **Y**<sup>i</sup> is expressed as

$$\tilde{\mathbf{Y}}_i = \left(\mathbf{U}_{Y_i}^{q_i}\right)^{\top} \mathbf{Y}_i \tag{18}$$

where $\tilde{\mathbf{Y}}_i \in \mathbb{R}^{q_i}$ is the reduced representation of $\mathbf{Y}_i$ and $\mathbf{U}_{Y_i}^{q_i} \in \mathbb{R}^{m \times q_i}$ is the matrix containing $q_i$ orthonormal basis vectors in its columns. The optimal orthonormal basis matrix $\mathbf{U}_{Y_i}^{q_i}$ is computed from the singular value decomposition (SVD) of the snapshot matrix $\mathbf{X}_{Y_i} = \left[ (\mathbf{Y}_{i,1} \dots \mathbf{Y}_{i,T})^1 \; \dots \; (\mathbf{Y}_{i,1} \dots \mathbf{Y}_{i,T})^L \right]$, where $\mathbf{Y}_{i,j} = \mathbf{v}_{i+1,j} - \mathbf{v}_{i,j}$ ($i = 0, \dots, I-1$) and $\mathbf{Y}_{I,j} = \mathbf{u}_j - \mathbf{v}_{I,j}$ for all $j = 1, \dots, T$. Since $\mathbf{Y}_i$ is computed from the difference between two consecutive levels of POD based approximations $\mathbf{v}_{i+1}$ and $\mathbf{v}_i$, i.e., $\mathbf{Y}_i = \mathbf{v}_{i+1} - \mathbf{v}_i$, the least-squares error in approximating $\mathbf{Y}_i$ by $\mathbf{U}_{Y_i}^{q_i} \tilde{\mathbf{Y}}_i$ is equivalent to the difference of the errors $\varepsilon$ of two consecutive levels (see Equation 17), which is expressed as $\Delta\varepsilon_i = \varepsilon_i - \varepsilon_{i+1} = \sum_{j=r_i+1}^{r_{i+1}} \sigma_j$.

Now the MLMC estimator (see Equation 11) for the expected value of **u** is expressed as

$$\mathbb{E}[\mathbf{u}] = \sum_{i=0}^{I} \mathbb{E}[\mathbf{Y}_i] \approx \sum_{i=0}^{I} \mathbf{U}_{Y_i}^{q_i} \mathbb{E}[\tilde{\mathbf{Y}}_i] \approx \hat{\mathbf{u}}^{ml} = \sum_{i=0}^{I} \hat{\mathbf{Y}}_i = \sum_{i=0}^{I} \mathbf{U}_{Y_i}^{q_i} \hat{\tilde{\mathbf{Y}}}_i \tag{19}$$

The third step of the MFML-MC method is to set $r_i$ for all $i = 1, \dots, I$. In this framework, we set $r_i = i$. Now, $\Delta\varepsilon_i = \sigma_{i+1}$ and therefore, we expect $\mathbf{Y}_i$ to be attracted to a low-dimensional subspace of dimension $q_i = \Delta r_i = (r_{i+1} - r_i) = 1$ over all the levels.
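The effect of choosing $r_i = i$ can be verified numerically: the difference between two consecutive POD approximations lies along a single basis vector, so one reduced coordinate per level reproduces it. The synthetic snapshot matrix and the helper name `level_approx` in the sketch below are assumptions for illustration, and the check applies to the differences between consecutive POD levels, not to the final residual $\mathbf{Y}_I = \mathbf{u} - \mathbf{v}_I$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n_snap = 6, 300
# Hypothetical QoI snapshots with a decaying spectrum
X_u = (rng.standard_normal((m, m)) * 0.5 ** np.arange(m)) @ rng.standard_normal((m, n_snap))
U, sigma, _ = np.linalg.svd(X_u, full_matrices=False)

def level_approx(X, r):
    """v_r = U_r U_r^T u applied column-wise; r = 0 gives v_0 = 0."""
    return U[:, :r] @ (U[:, :r].T @ X)

i = 2
Y_i = level_approx(X_u, i + 1) - level_approx(X_u, i)   # Y_i = v_{i+1} - v_i with r_i = i
U_Y, s_Y, _ = np.linalg.svd(Y_i, full_matrices=False)   # basis of the difference snapshots
Y_tilde = U_Y[:, :1].T @ Y_i                             # Equation (18) with q_i = 1
Y_back = U_Y[:, :1] @ Y_tilde                            # lift back to R^m
```

Here the only nonzero singular value of the difference snapshots equals `sigma[i]` in zero-based indexing, and `Y_back` reproduces `Y_i` to machine precision.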

In the fourth step, we extend the Multi-Level Multi-Fidelity method introduced in Geraci et al. (2015) by adopting the multi-fidelity approach (see Equation 8) on every level of the MLMC framework to derive an unbiased estimator of $\mathbb{E}[\tilde{\mathbf{Y}}\_i]$ in Equation (19) that takes the form

$$\hat{\tilde{\mathbf{Y}}}\_{i}^{mf} = \hat{\tilde{\mathbf{Y}}}\_{i} + \beta\_{i}^{1} \left( \hat{\tilde{\mathbf{Y}}}\_{i}^{1} - \hat{\tilde{\mathbf{Y}}}\_{i} \right) + \sum\_{f=2}^{F\_{i}} \beta\_{i}^{f} \left( \hat{\tilde{\mathbf{Y}}}\_{i}^{f} - \hat{\tilde{\mathbf{Y}}}\_{i}^{f-1} \right) \tag{20}$$

where $\tilde{\mathbf{Y}}\_i^{1}, \dots, \tilde{\mathbf{Y}}\_i^{F\_i}$ are auxiliary random variables obtained from $F\_i$ level-specific low-fidelity models of $\tilde{\mathbf{Y}}\_i$, $\hat{\tilde{\mathbf{Y}}}\_i^{f}$ estimates the expectation $\mathbb{E}[\tilde{\mathbf{Y}}\_i^{f}]$ using $M\_i^{f}$ samples of $\tilde{\mathbf{Y}}\_i^{f}$ for all $f = 1, \dots, F\_i$, and $\beta\_i^{1}, \dots, \beta\_i^{F\_i} \in \mathbb{R}$ are the coefficients on level $i$, $i = 0, \dots, I$. In this paper, we set $F\_i = 1$ for all $i = 0, \dots, I$. Next, we use the optimization problem formulated in Peherstorfer et al. (2016) to select optimal values $M\_{HF}^{\ast}$ and $M\_i^{1\ast}$ such that the mean square error of the MFML-MC estimator $\hat{\tilde{\mathbf{Y}}}\_i^{mf}$ on every level is lower than that of the Monte Carlo estimator $\hat{\tilde{\mathbf{Y}}}\_i$ for the same computational budget.
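For the case $F\_i = 1$, the idea of correcting a small high-fidelity sample with a large low-fidelity sample can be sketched as follows. This is a hedged illustration in the nested-sample control-variate form commonly associated with Peherstorfer et al. (2016), not a transcription of Equation (20); the function names, the toy models, and the sample sizes are all assumptions.

```python
import numpy as np

def mf_estimate(hf_vals, lf_vals, beta):
    """Two-model control-variate estimate of the high-fidelity mean.

    hf_vals : high-fidelity outputs on the first M0 inputs
    lf_vals : low-fidelity outputs on M1 >= M0 inputs; the first M0 entries
              must use the same inputs as hf_vals
    """
    hf_vals = np.asarray(hf_vals, dtype=float)
    lf_vals = np.asarray(lf_vals, dtype=float)
    m0 = len(hf_vals)
    return hf_vals.mean() + beta * (lf_vals.mean() - lf_vals[:m0].mean())

def control_variate_coefficient(hf_vals, lf_vals):
    """Sample estimate of Cov(HF, LF) / Var(LF) on the shared inputs."""
    m0 = len(hf_vals)
    cov = np.cov(np.asarray(hf_vals, dtype=float),
                 np.asarray(lf_vals[:m0], dtype=float))
    return cov[0, 1] / cov[1, 1]

# Toy models (hypothetical): the low-fidelity model is cheap but biased.
rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
low = 0.5 + 0.9 * z        # 5000 cheap evaluations, mean 0.5
high = 2.0 + z[:50]        # 50 expensive evaluations, mean 2.0

beta = control_variate_coefficient(high, low)
estimate = mf_estimate(high, low, beta)
```

In this toy the two models are perfectly correlated, so the correction removes almost all of the sampling error of the 50 high-fidelity runs while the low-fidelity bias cancels; with imperfect correlation the variance reduction is smaller.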

Now, the MFML-MC estimator for the expected value of **u** (see Equation 19) is expressed as

$$\mathbb{E}[\mathbf{u}] \approx \hat{\mathbf{u}}^{ml} = \sum\_{i=0}^{I} \hat{\mathbf{Y}}\_{i} \approx \sum\_{i=0}^{I} \mathbf{U}\_{Y\_{i}}^{q\_{i}} \hat{\tilde{\mathbf{Y}}}\_{i}^{mf} \tag{21}$$

In the fifth step, we use a data-driven approach to derive a level-specific low-fidelity model $\tilde{\mathbf{Y}}\_i^{1}$ in the MFMC setup. In this data-driven approach, we first consider a discrete nonlinear dynamical system on every level ($i = 0, \dots, I$) that takes the form

$$
\tilde{\mathbf{Y}}\_i^1(t+1) = \tilde{\mathbf{Y}}\_i^1(t) + \mathbf{F}\_i(\mathbf{x}, \tilde{\mathbf{Y}}\_i^1(t)),
\tag{22}
$$

where $\mathbf{F}\_i(\mathbf{x}, \tilde{\mathbf{Y}}\_i^{1}(t))$ is the nonlinear term used to update $\tilde{\mathbf{Y}}\_i^{1}(t+1)$ on level $i$ for all $i = 0, \dots, I$ (Nagoor Kani and Elsheikh, 2017). Next, we use GBTR (Friedman, 2001) on every level to approximate $\mathbf{F}\_i(\mathbf{x}, \tilde{\mathbf{Y}}\_i^{1}(t))$. We use $(\mathbf{x}, \tilde{\mathbf{Y}}\_i^{1}(t))$ as the input to the GBTR and compute $\mathbf{F}\_i(\mathbf{x}, \tilde{\mathbf{Y}}\_i^{1}(t))$ as its output. We fit the GBTRs using the same training samples $(\mathbf{Y}\_{i\_1} \dots \mathbf{Y}\_{i\_T})^{1} \dots (\mathbf{Y}\_{i\_1} \dots \mathbf{Y}\_{i\_T})^{L}$ used in the second step.
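The recursion in Equation (22) can be sketched as a simple rollout loop. The regressor is represented here by an arbitrary callable, and the stand-in increment used in the example is purely hypothetical; it is not a fitted GBTR.

```python
import numpy as np

def rollout(increment, x, y0, n_steps):
    """Iterate y(t+1) = y(t) + increment([x, y(t)]) and return the trajectory.

    increment : callable mapping the feature vector [x, y(t)] to the update F
    x         : 1-D array of static inputs (e.g. model parameters)
    y0        : 1-D array with the initial reduced state
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y0, dtype=float)
    trajectory = [y.copy()]
    for _ in range(n_steps):
        features = np.concatenate([x, y])   # regressor input (x, y(t))
        y = y + increment(features)         # residual update of Equation (22)
        trajectory.append(y.copy())
    return np.stack(trajectory)

# Hypothetical stand-in for a trained regressor: halve the state each step.
def toy_increment(features):
    return -0.5 * features[-1:]

traj = rollout(toy_increment, x=[0.1, 0.2], y0=[1.0], n_steps=3)
```

With scikit-learn, a fitted `GradientBoostingRegressor` could plausibly be plugged in through a small wrapper such as `lambda f: model.predict(f[None, :])`, assuming it was trained on the same feature layout with the increments $\tilde{\mathbf{Y}}\_i^{1}(t+1) - \tilde{\mathbf{Y}}\_i^{1}(t)$ as targets.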

#### 5. NUMERICAL EXPERIMENTS

In this section, we present numerical results to evaluate the performance of the MFML-MC method. The results are based on two UQ tasks involving two-phase flow in a heterogeneous porous media domain. The two test cases are the quarter five-spot problem and the uniform flow problem, both with an uncertain permeability field (Kani and Elsheikh, 2018). Section 5.1 describes the high-fidelity model setup, section 5.2 describes the low-fidelity model setup used to formulate the MFML-MC framework, and section 5.3 describes the setup of MLMC with Galerkin-POD ROMs. Section 5.4 defines the error metrics that we use to compare the MFML-MC method with the standard MC method that uses either the high-fidelity model or the low-fidelity model. Sections 5.5 and 5.6 provide the results for the quarter five-spot problem and the uniform flow problem, respectively.

#### 5.1. High-Fidelity Model Setup

We consider two-phase flow of oil and water in a two-dimensional porous media domain $[0, 1] \times [0, 1]$, where water is injected to displace the residual oil. We use Equation (3) as the high-fidelity model to describe the flow behavior of oil and water. We define the relative permeabilities based on Corey's model, $k\_{rw}(s) = (s^{\ast})^{2}$ and $k\_{ro}(s) = (1 - s^{\ast})^{2}$, where $s^{\ast} = (s - s\_{wc})/(1 - s\_{or} - s\_{wc})$, $s\_{wc}$ is the irreducible water saturation and $s\_{or}$ is the residual oil saturation (Aarnes et al., 2007). We set $s\_{wc} = 0.2$, $s\_{or} = 0.2$, and the initial water saturation to $s\_{wc}$ (0.2). We set the porosity field to a constant value of 0.2 and the viscosity ratio of water to oil to 0.2. The uncertainty comes from the permeability field, which is modeled by a log-normal distribution function with zero mean and an exponential covariance that takes the form

$$\mathbb{C}ov = \sigma\_k^2 \, \exp\left[-\frac{|\mathbf{x}\_1 - \mathbf{x}\_2|}{\iota\_k}\right] \tag{23}$$

where $\sigma\_k^{2}$ is the variance and $\iota\_k$ is the correlation length. We set $\sigma\_k^{2}$ to 1 and $\iota\_k$ to 0.1. Sample realizations of log-permeability values are displayed in **Figure 2**.
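A small sketch of how realizations with the covariance of Equation (23) could be drawn is given below. It makes several assumptions that the text does not state: the distance is taken as the Euclidean norm, the samples are generated with a Cholesky factor of the covariance matrix, and a coarse 8 × 8 grid replaces the 96 × 96 mesh to keep the example cheap.

```python
import numpy as np

def exp_covariance(points, sigma2=1.0, corr_len=0.1):
    """Exponential covariance sigma2 * exp(-|x1 - x2| / corr_len) between all point pairs."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)    # Euclidean distance (assumption)
    return sigma2 * np.exp(-dist / corr_len)

def sample_log_permeability(points, n_samples, rng, sigma2=1.0, corr_len=0.1):
    """Draw zero-mean Gaussian log-permeability fields via a Cholesky factor."""
    C = exp_covariance(points, sigma2, corr_len)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(points)))  # tiny jitter for stability
    z = rng.standard_normal((len(points), n_samples))
    return (L @ z).T                        # shape: (n_samples, n_points)

# Cell centres of a coarse 8 x 8 grid on [0, 1]^2 (the paper uses 96 x 96).
n = 8
centres = (np.arange(n) + 0.5) / n
pts = np.array([(x, y) for y in centres for x in centres])

rng = np.random.default_rng(0)
C = exp_covariance(pts)
log_k = sample_log_permeability(pts, n_samples=5, rng=rng)
k = np.exp(log_k)                           # log-normal permeability values
```

A dense Cholesky factorization scales cubically with the number of cells, so for the full 96 × 96 mesh a truncated expansion or a spectral method may be preferable; that choice is outside what the text specifies.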

As mentioned in section 2, we use a sequential formulation to solve Equation (3) for pressure and water saturation (Aarnes et al., 2007). We first generate a uniform mesh of 96 × 96 blocks in the spatial domain. We use a finite-volume method with two-point flux approximation (Aarnes et al., 2007) to solve for pressure, and an upwind finite-volume method with implicit backward Euler time stepping combined with Newton-Raphson iteration to solve for saturation. We set the time step size to 0.015 and solve Equation (3) for 200 time steps. We solve for pressure and update the velocity field at every 8th time step, because the pressure field changes much more slowly than the saturation field over time. Time is measured in a non-dimensional unit called pore volumes injected (PVI) (Ibrahima, 2016).
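The Corey relative permeability functions defined at the start of this section can be written as a short helper. The clamping of the normalized saturation to [0, 1] is an added safeguard and an assumption of this sketch, as is the function name.

```python
def corey_relperm(s, swc=0.2, sor=0.2):
    """Corey relative permeabilities (krw, kro) for water saturation s.

    swc : irreducible water saturation
    sor : residual oil saturation
    """
    s_star = (s - swc) / (1.0 - sor - swc)   # normalized saturation
    s_star = min(max(s_star, 0.0), 1.0)      # clamp to [0, 1] (assumption)
    krw = s_star ** 2
    kro = (1.0 - s_star) ** 2
    return krw, kro

krw_init, kro_init = corey_relperm(0.2)      # initial state: s = swc
krw_mid, kro_mid = corey_relperm(0.5)        # halfway through the mobile range
```

At the initial saturation the water phase is immobile and the oil phase fully mobile, which matches the setup in which injected water displaces oil.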

As defined in section 2, the QoI is $\mathbf{u} \in \mathbb{R}^{m}$, where $u\_i = s(x\_i, y\_i)$, $i = 1, \dots, m \ll n$, at specific time steps. The first-moment estimate of $\mathbf{u}(t)$ at specific time steps is the desired statistic. The grid points of interest $(x\_i, y\_i)$, $i = 1, \dots, m$, are 6 × 6 grid points ($m = 36$) uniformly selected from the 96 × 96 spatial domain. The time steps of interest are $t = 10, 20, \dots, 200$. We solve Equation (3) for 25,000 random realizations of the permeability field and use the Monte Carlo method to estimate the statistics of $\mathbf{u}$ (Ibrahima, 2016).

#### 5.2. Low-Fidelity Model Setup

We first compute the optimal POD basis matrices $\mathbf{U}\_u$ and $\mathbf{U}\_{Y\_i}$ for all $i = 0, \dots, I$ from the SVD of the snapshot matrices $\mathbf{X}\_u$ and $\mathbf{X}\_{Y\_i}$, $i = 0, \dots, I$. We build the snapshot matrices from 10 random samples of high-fidelity model solution data. To select these 10 samples, we use the K-means clustering algorithm to cluster the 25,000 random permeability realizations into 10 clusters (Ghasemi, 2015). Then, we solve the high-fidelity model for a single permeability realization from each cluster.
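The cluster-based selection of training realizations can be sketched with a plain NumPy version of Lloyd's algorithm. The text does not say which realization is taken from each cluster; picking the member closest to the centroid is an assumption of this sketch, as are the data sizes and function names.

```python
import numpy as np

def kmeans(data, k, rng, n_iter=50):
    """Basic Lloyd's algorithm; returns (centroids, labels)."""
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:             # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    dist = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=-1)
    return centroids, dist.argmin(axis=1)

def representatives(data, centroids, labels):
    """Index of the realization closest to each non-empty cluster centroid (assumption)."""
    reps = []
    for c in range(len(centroids)):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            d = np.linalg.norm(data[idx] - centroids[c], axis=1)
            reps.append(int(idx[d.argmin()]))
    return reps

# Hypothetical data: 200 flattened log-permeability fields with 16 cells each.
rng = np.random.default_rng(0)
fields = rng.standard_normal((200, 16))
centroids, labels = kmeans(fields, k=10, rng=rng)
reps = representatives(fields, centroids, labels)  # realizations to run at high fidelity
```

For 25,000 realizations on a fine mesh, the dense distance array used here would be large, so a library implementation of K-means would likely be the more practical choice.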

Following that, the matrix $\mathbf{U}\_u$ is used to build a sequence of POD approximations of $\mathbf{u}$ (as detailed in section 4) from the collected training data. The matrix $\mathbf{U}\_{Y\_i}^{q\_i}$ is then used to compute training samples of $\tilde{\mathbf{Y}}\_i$ (the reduced representation of $\mathbf{Y}\_i$) for all $i = 0, \dots, I$. We set $I = 18$ and $r\_i = i$, and therefore $q\_i = 1$, as already mentioned in section 4.

Next, we build a level-specific GBTR on every level ($i = 0, \dots, I$) to estimate $\mathbf{F}\_i$, and use the estimated $\mathbf{F}\_i$ in Equation (22) to compute $\tilde{\mathbf{Y}}\_i^{1}(t+1)$. We use Scikit-learn (Pedregosa et al., 2011), a machine learning package for Python, to implement the GBTRs. We use the training samples of $\tilde{\mathbf{Y}}\_i$ to fit the level-specific GBTR.

FIGURE 2 | Sample plots of log-permeability field. Uncertain permeability field is modeled from a log-normal distribution function with zero mean and exponential covariance.

#### 5.3. Standard MLMC With POD-Galerkin ROMs Setup

We first compute optimal POD basis vectors for the pressure and saturation solution vectors from the SVD of the corresponding snapshot matrices. We build the snapshot matrices from the solution vectors (pressure and saturation) collected from the solutions of the high-fidelity model for 45 random realizations of the permeability field. We then build low-fidelity ROMs of different dimensions via Galerkin projection of the discretized system of the high-fidelity model, Equation (3), onto the space spanned by the POD basis vectors.

Following that, one can obtain an MLMC framework using Galerkin-projection POD ROMs as low-fidelity models. However, in the two numerical test cases, namely the quarter five-spot problem (section 5.5) and the uniform flow problem (section 5.6), we obtained accurate and stable POD results only when the dimension of the POD-Galerkin ROMs was of nearly the same order of magnitude as the dimension of the high-fidelity state variable (Xiao et al., 2017; Kani and Elsheikh, 2018). Hence, the computational cost of obtaining the QoI from the POD-Galerkin ROM is higher than the cost of obtaining it from the high-fidelity model for a single realization. It was therefore infeasible to derive an effective MLMC framework with a hierarchy of low-fidelity models based on standard POD ROMs. This is expected because the governing equations of the flow problem, Equation (3), have non-polynomial nonlinearity, which is a well-known issue in reduced order modeling of multi-phase subsurface flow problems (Chaturantabut and Sorensen, 2011; He et al., 2011; Jansen and Durlofsky, 2017; Kani and Elsheikh, 2018). We also note that we conducted an extensive study of reduced order modeling for these two problems in Kani and Elsheikh (2018), where we obtained inaccurate and unstable results when using POD ROMs. We refer readers to the figures in the numerical results section of Kani and Elsheikh (2018), where some of the unstable standard POD ROM results are displayed. Hence, we have not included a comparison of the MFML-MC method with the MLMC method based on standard POD ROMs as low-fidelity models.

#### 5.4. Evaluation Metrics

We evaluate the performance of MFML-MC method using two time specific error metrics defined by

$$\begin{aligned} \hat{\mathbf{e}}\_{t}^{\text{bias}} &= \frac{1}{N\_{e}} \sum\_{j=1}^{N\_{e}} \|\hat{\mathbf{u}}\_{t}^{\text{ref}} - \hat{\mathbf{u}}\_{t}^{(j)}\|\_{2}^{2} \\ \hat{\mathbf{e}}\_{t}^{\epsilon} &= \frac{1}{N\_{e}} \sum\_{j=1}^{N\_{e}} \mathbb{V}\text{ar}(\hat{\mathbf{u}}\_{t}^{(j)}) \end{aligned} \tag{24}$$

where $N\_e$ is the number of runs used to estimate the errors and $\hat{\mathbf{u}}\_t^{\text{ref}}$ is the reference result for $\mathbb{E}[\mathbf{u}\_t]$, obtained from the Monte Carlo estimate $\hat{\mathbf{u}}\_t^{(MC)}$ computed with $N = 25{,}000$ high-fidelity model samples. $\hat{\mathbf{u}}\_t^{(j)}$ is the approximation of $\mathbb{E}[\mathbf{u}\_t]$ that can be obtained from various estimators, including the Monte Carlo estimate that uses only the high-fidelity model, the Monte Carlo estimate that uses only the low-fidelity model, and the MFML-MC estimator. We note that $\hat{\mathbf{u}}\_t^{(j)}$ is obtained for a fixed computational budget $p$. The computational budget is measured in terms of the cost required to run $p$ MC realizations that use only the high-fidelity model. We also note that $\hat{\mathbf{u}}\_t^{(j)}$ is evaluated from a different set of independent samples for each $j = 1, \dots, N\_e$.

… number of MC realizations that use only the high-fidelity model. (Top Row) Estimation of $\mathbb{E}[\mathbf{u}\_t]$ at time t = 0.3 PVI. (Bottom Row) Estimation of $\mathbb{E}[\mathbf{u}\_t]$ at time t = 0.8 PVI.

Additionally, we utilize two global error metrics defined as

$$\begin{aligned} \hat{\mathbf{e}}^{\text{bias}} &= \frac{1}{N\_{e}} \sum\_{j=1}^{N\_{e}} \max\_{t=1}^{T} \|\hat{\mathbf{u}}\_{t}^{\text{ref}} - \hat{\mathbf{u}}\_{t}^{(j)}\|\_{2}^{2} \\ \hat{\mathbf{e}}^{\epsilon} &= \frac{1}{N\_{e}} \sum\_{j=1}^{N\_{e}} \max\_{t=1}^{T} \mathbb{V}\text{ar}(\hat{\mathbf{u}}\_{t}^{(j)}) \end{aligned} \tag{25}$$

where all the time snapshots of $\mathbf{u}$ are used. We set $N\_e = 15$ to evaluate the two time-specific error metrics and the two global error metrics.
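The bias metrics of Equations (24) and (25) can be sketched directly from their definitions. The array layout (runs × time steps × outputs) and the function names are assumptions; the variance metrics follow the same averaging pattern once a per-run variance estimate is available, which the text does not detail.

```python
import numpy as np

def time_specific_bias(u_ref_t, u_runs_t):
    """Equation (24), first line: mean over runs of the squared 2-norm error at one time.

    u_ref_t  : (m,) reference mean at time t
    u_runs_t : (N_e, m) estimates from N_e independent runs
    """
    sq_err = np.sum((u_runs_t - u_ref_t) ** 2, axis=1)
    return float(sq_err.mean())

def global_bias(u_ref, u_runs):
    """Equation (25), first line: mean over runs of the worst squared error over time.

    u_ref  : (T, m) reference means
    u_runs : (N_e, T, m) estimates from N_e independent runs
    """
    sq_err = np.sum((u_runs - u_ref[None]) ** 2, axis=2)   # shape (N_e, T)
    return float(sq_err.max(axis=1).mean())

# Tiny hand-made example: N_e = 2 runs, T = 2 time steps, m = 3 outputs.
u_ref = np.zeros((2, 3))
u_runs = np.array([
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],   # run 1: squared errors 1 and 4
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],   # run 2: squared errors 0 and 3
])

bias_t0 = time_specific_bias(u_ref[0], u_runs[:, 0])
bias_t1 = time_specific_bias(u_ref[1], u_runs[:, 1])
bias_global = global_bias(u_ref, u_runs)
```

Note that the global metric takes the maximum over time inside the average over runs, so it is at least as large as every time-specific value.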

#### 5.5. Numerical Test Case 1

Test case 1 is the two-dimensional quarter five-spot problem, where water is injected at the lower left corner (0, 0) of the porous media domain to produce oil and water at the top right corner (1, 1) (Kani and Elsheikh, 2018). We set $q$, defined in the saturation equation (Equation 3), to 0.05 at (0, 0) and −0.05 at (1, 1). We set a no-flux boundary condition on all four sides of the porous media domain. The left panel of **Figure 3** displays the quarter five-spot problem setup and the right panel of **Figure 3** displays the decay of the singular values of the snapshot matrix $\mathbf{X}\_u$.

**Figure 4** shows the estimates of $\mathbb{E}[\mathbf{u}\_t]$ (the first moment of $\mathbf{u}$) obtained from the reference result (MC estimate with 25,000 samples) and from various MC estimators. In **Figure 4**, the MC estimator that uses only the high-fidelity model is denoted MC-HF and the one that uses only the low-fidelity model is denoted MC-LF. Results in the top row are obtained at time = 0.3 PVI and results in the bottom row at time = 0.8 PVI. As shown in **Figure 4**, the estimate of $\mathbb{E}[\mathbf{u}\_t]$ obtained from MC-LF deviates significantly from the reference result. This clearly shows that using only the low-fidelity model in the MC framework results in a biased estimate with respect to the reference result. Furthermore, **Figure 4** shows that the estimate of $\mathbb{E}[\mathbf{u}\_t]$ obtained from the MFML-MC estimator is almost indistinguishable from the reference result. This result confirms that combining a larger number of low-fidelity model realizations with the high-fidelity model in the MFML-MC framework can improve the estimate of the first moment of the saturation field.

**Figure 5** reports the comparison of $\hat{e}\_t^{\text{bias}}$ and $\hat{e}\_t^{\epsilon}$ (see Equation 24) obtained from various estimators. The left panel of **Figure 5** reports $\hat{e}\_t^{\text{bias}}$ and the right panel reports $\hat{e}\_t^{\epsilon}$ as a function of the computational budget $p = [1, 2, 3, 4, 5] \times 10^{2}$, where $p$ is the number of MC-HF realizations. The $\hat{e}\_t^{\text{bias}}$ results in **Figure 5** show that the MC-LF estimator is a biased estimator of the mean QoI value. The results for the MFML-MC estimator in the left panel of **Figure 5** confirm that the MFML-MC estimator is an unbiased estimator of the expectation. This shows that even though the low-fidelity model is a poor approximation of the high-fidelity model, the error of the MFML-MC estimator can be significantly reduced when the low-fidelity model is combined with the high-fidelity model. The right panel of **Figure 5** shows that the variances of the MFML-MC and MC-LF estimators are at least an order of magnitude lower than that of MC-HF. Nevertheless, while MC-LF is a biased estimator, as shown in the left panel of **Figure 5**, the MFML-MC estimator, which uses the low-fidelity model in combination with the high-fidelity model, is an unbiased estimator of the expectation.

TABLE 1 | Performance chart of MFML-MC estimator for test case 1.

$\epsilon$ defined in Equation (5) is shown as a function of computational budget $p$, where $p$ is the number of MC realizations that use the high-fidelity model only. $\epsilon$ is estimated at time = 0.3 PVI.

**Figure 6** reports the comparison of $\hat{e}^{\text{bias}}$ and $\hat{e}^{\epsilon}$ (see Equation 25) obtained from various estimators. The trends of $\hat{e}^{\text{bias}}$ and $\hat{e}^{\epsilon}$ in **Figure 6** are similar to those observed in **Figure 5**, which confirms that the MFML-MC method leads to variance reduction with unbiased estimation at all time steps.

**Table 1** compares the speedup factors of the MFML-MC method with respect to the Monte Carlo estimator that uses the high-fidelity model only. In **Table 1**, MFML-MC achieves a speedup with respect to MC-HF that ranges from 8 up to 15 for the same specific $\epsilon$.

#### 5.6. Numerical Test Case 2

Test case 2 is a two-dimensional uniform flow problem where water is injected from the left side of the porous media domain to produce oil and water from the right side. We set no-flow boundary conditions on the remaining two sides (top and bottom) of the domain. We set both the inflow rate and the outflow rate to 0.08, as required by the incompressibility constraint of the problem (Kani and Elsheikh, 2018). The left panel of **Figure 7** displays the uniform flow problem setup and the right panel of **Figure 7** displays the decay of the singular values of the snapshot matrix $\mathbf{X}\_u$.

**Figure 8** shows the results for the first moment of the saturation field ($\mathbf{u}$) obtained from the reference result (MC estimate with 25,000 samples) and from various MC estimators. The display settings in **Figure 8** are the same as those in **Figure 4**. In **Figure 8**, we can see that the results obtained from the MFML-MC method are almost indistinguishable from the reference results, whereas MC-LF yields extremely inaccurate results.

FIGURE 9 | Test case 2: Plot of $\hat{e}\_t^{\text{bias}}$ and $\hat{e}\_t^{\epsilon}$ (Equation 24) for the estimation of $\mathbb{E}[\mathbf{u}\_t]$ (water saturation field at 6 × 6 spatial grid) obtained from various estimators. $\hat{e}\_t^{\text{bias}}$ and $\hat{e}\_t^{\epsilon}$ are shown as a function of computational budget $p = [1, 2, 3, 4, 5] \times 10^{2}$, where $p$ is the number of MC realizations $N$ that use only the high-fidelity model. (Left) $\hat{e}\_t^{\text{bias}}$ at time t = 0.3 PVI. (Right) $\hat{e}\_t^{\epsilon}$ at time t = 0.3 PVI.

**Figure 9** reports the comparison of $\hat{e}\_t^{\text{bias}}$ and $\hat{e}\_t^{\epsilon}$ (see Equation 24) obtained from various estimators. The variance reduction can be clearly observed in **Figure 9**, and the trend is similar to the one observed in **Figure 5** (test case 1). The results in **Figure 9** again confirm that combining the high-fidelity model with the low-fidelity model leads to a variance reduction, as was also observed in **Figure 5**.

**Figure 10** reports the comparison of $\hat{e}^{\text{bias}}$ and $\hat{e}^{\epsilon}$ (see Equation 25) obtained from various estimators. As observed in **Figure 6**, the results displayed in **Figure 9** show that the MFML-MC method leads to variance reduction with unbiased estimation.

TABLE 2 | Performance chart of MFML-MC estimator for test case 2.

$\epsilon$ defined in Equation (5) is shown as a function of computational budget $p$, where $p$ is the number of MC realizations that use the high-fidelity model only. $\epsilon$ is estimated at time = 0.3 PVI.

**Table 2** compares the speedup factors of the MFML-MC method with respect to the MC method that uses the high-fidelity model only. In **Table 2**, MFML-MC achieves a speedup with respect to MC-HF that ranges from 10 up to 19 at a specific $\epsilon$.

#### 6. CONCLUSION

In this paper, we proposed an MFML-MC method combining the features of both the MFMC method and the MLMC method. In the MFML-MC method, we formulated the MLMC framework with a sequence of POD approximations of high-fidelity model outputs. Furthermore, we formulated an MFMC setup on every level of the MLMC framework in order to compute an unbiased statistical estimate. Finally, we used GBTR in the MFMC setup to formulate a level-specific low-fidelity model.

We applied the MFML-MC method to two uncertainty quantification problems involving two-phase flows in random heterogeneous porous media where the standard MLMC method with POD-Galerkin ROMs is ineffective. The uncertain permeability field is modeled from a log-normal distribution function with an exponential covariance function. Estimates of the first statistical moment of the water saturation at uniformly selected spatial grid points at specific instants in time were calculated by the MFML-MC, MC-HF, and MC-LF methods. Comparisons between MFML-MC and MC-LF suggested that MC-LF is a biased estimator and MFML-MC is an unbiased estimator of the expectation. Comparisons between the MFML-MC and MC-HF computing times showed speedups of MFML-MC with respect to MC-HF that ranged from 8 up to 19 at equivalent accuracy.

Future work should consider extending the MFML-MC method by using two or more level-specific low-fidelity models in the MFMC setup. In addition, it would be of interest to use the MFML-MC method for history matching (Elsheikh et al., 2012, 2013), where the aim is to minimize the mismatch between field observation data and the data computed from high-fidelity model simulations by adjusting the geological model parameters. Future work should also verify the applicability of the MFML-MC method to large-scale realistic problems with many wells and time-varying injection rates, where the potential of the MFML-MC method to speed up a realistic Monte Carlo simulation can be magnified.

#### AUTHOR CONTRIBUTIONS

NJ developed the algorithm, coded the algorithm in Python, obtained the results, and wrote the manuscript. AE is the Ph.D. supervisor of NJ. NJ carried out this work under the guidance of AE.

#### REFERENCES

… in porous media. Math. Comput. Modell. Dyn. Syst. 17, 337–353. doi: 10.1080/13873954.2011.547660

… Commun. Comput. Phys. 17, 259–286. doi: 10.4208/cicp.021013.260614a

… transport in randomly heterogeneous porous media. Adv. Water Resour. 32, 712–722. doi: 10.1016/j.advwatres.2008.09.003


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Jabarullah Khan and Elsheikh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Using Meta-Models for Tsunami Hazard Analysis: An Example of Application for the French Atlantic Coast

Vito Bacchi<sup>1</sup> \*, Hervé Jomard<sup>1</sup> , Oona Scotti<sup>1</sup> , Ekaterina Antoshchenkova<sup>1</sup> , Lise Bardet<sup>1</sup> , Claire-Marie Duluc<sup>1</sup> and Hélène Hebert<sup>2</sup>

1 Institute for Radiological Protection and Nuclear Safety, Fontenay-aux-Roses, France, <sup>2</sup> CEA, DAM, DIF, Arpajon, France

This paper illustrates how an emulator (or meta-model) of a tsunami code can be a useful tool to evaluate or qualify tsunami hazard levels associated with both specific and unknown tsunamigenic seismic sources. Meta-models are statistical tools that drastically reduce the computational time necessary for tsunami simulations. As a consequence, they can be used to explore the tsunamigenic potential of a seismic zone by taking into account an extended set of tsunami scenarios. We illustrate these concepts by studying the tsunamis generated by the Azores-Gibraltar Plate Boundary (AGPB) and potentially impacting the French Atlantic Coast. We first analyze the impact of two realistic scenarios corresponding to potential sources of the 1755 Lisbon tsunami (when uncertainty on seismic parameters is considered). We then show how meta-models could make it possible to qualify the tsunamis generated by this seismic area. All the results are finally discussed in light of the tsunami hazard issued by the TSUMAPS-NEAM research project, available online (http://ai2lab.org/tsumapsneam/interactive-hazard-curve-tool/). From this methodological study, it appears that the tsunami hazard issued by the TSUMAPS-NEAM research project is an envelope, even when compared to all the likely and unlikely tsunami scenarios generated in the AGPB area.

#### Edited by:

Philippe Renard, Université de Neuchâtel, Switzerland

#### Reviewed by:

Jacopo Selva, Istituto Nazionale di Geofisica e Vulcanologia (INGV), Italy Tipaluck Krityakierne, Mahidol University, Thailand

> \*Correspondence: Vito Bacchi vito.bacchi@gmail.com

#### Specialty section:

This article was submitted to Solid Earth Geophysics, a section of the journal Frontiers in Earth Science

Received: 08 May 2019 Accepted: 04 February 2020 Published: 03 March 2020

#### Citation:

Bacchi V, Jomard H, Scotti O, Antoshchenkova E, Bardet L, Duluc C-M and Hebert H (2020) Using Meta-Models for Tsunami Hazard Analysis: An Example of Application for the French Atlantic Coast. Front. Earth Sci. 8:41. doi: 10.3389/feart.2020.00041

Keywords: kriging surrogate, uncertainty quantification, tsunami modeling, hazard analysis, sensitivity analysis

# INTRODUCTION

The evaluation of tsunami impact requires accurate simulation results for planning and risk assessment purposes because of the severe consequences that could be associated with this kind of event. Considering that tsunami phenomena involve a large span of parameters at different spatial and temporal scales (Behrens and Dias, 2015), even a single run of a tsunami numerical model can be prohibitively long, on the order of minutes to days, depending on the characteristics of the study area and the resolution of the numerical model. Hence, a common practice when the computational code is time-consuming is the use of meta-models (also called surrogate models or emulators). A meta-model is a mathematical approximation of the numerical model, built on a learning basis (Razavi et al., 2012a). Meta-models have been applied, for example, in hydraulics to model physical variables such as flows (Wolfs et al., 2015; Machac et al., 2016) and flood damages (Yazdi and Salehi Neyshabouri, 2014), or in the design of civil flood defenses (Richet and Bacchi, 2019). For the interested reader, a comprehensive review of the use of meta-models in environmental research was proposed by Razavi et al. (2012a).

**Abbreviations:** µ, shear modulus; [N(m(x), s²(x))], Gaussian process of mean "m(x)" and variance "s²(x)"; [M(x)], kriging surrogate; {X,Y}, coordinates of the design simulations used for kriging parameters evaluation; C(.), covariance kernel; CEA, Commissariat à l'énergie atomique et aux énergies alternatives; D [m], average slip along the rupture surface; DTHA, deterministic tsunami hazard assessment; GSA, global sensitivity analysis; L [m], length of the rupture surface; MCS, maximum credible scenario; MCS\_h, tsunami hazard level issued by an exploration of a very wide range of tsunamigenic scenarios; Mo, seismic moment; Mw, seismic moment magnitude; MSE, mean squared error; PTHA, probabilistic tsunami hazard assessment; R², squared correlation coefficient; RMSE, root mean squared error; UQ, uncertainty quantification; W [m], width of the rupture surface.

Meta-models have already been used in the field of tsunamis. For instance, Sraj et al. (2014) investigated the uncertainties in the predicted wave elevation due to the uncertainty in Manning's friction coefficient, using polynomial chaos expansion to build a surrogate model that is a computationally cheap approximation of the computer model. Sarri et al. (2012) used, in a similar way, a statistical emulator of the analytical landslide-generated tsunami model developed by Sammarco and Renzi (2008). More recently, Rohmer et al. (2018) studied the uncertainty related to the source parameters through a Bayesian procedure to infer (i.e., learn) the probability distribution of the source parameters of the earthquake. However, to our knowledge, meta-modeling has never been used for tsunami hazard analysis.

In this paper, we propose to apply meta-modeling techniques in the framework of deterministic tsunami hazard assessment (DTHA) and evaluate how it can be useful in seismic areas with no (or poor) seismotectonic knowledge. In such cases, when seismotectonic parameters are uncertain, it may be of interest to provide a first order idea of the tsunami hazard potential through DTHA, the implementation of DTHA being simpler than the probabilistic method (PTHA). The scenario-based (or DTHA) approach classically relies on the study of "maximum credible scenarios" (MCS). In particular, DTHA tries to explore the potential of the largest scenarios, by selecting some of the extreme ones (i.e., a recorded/reconstructed historical event) and simulating them for the target area through numerical modeling (JSCE, 2002; Lynett et al., 2004), without addressing the likelihood of occurrence of such a big event (Omira et al., 2016). With this approach, MCS is assessed through an expert opinion. The outputs of the deterministic analysis are, in general, tsunami travel time, wave height, flow depth, run-up, and current velocity maps corresponding to the chosen scenario (Omira et al., 2016).

As mentioned above, DTHA relies on a refined knowledge of the seismic sources generating the tsunamis. As a consequence, it can be hampered by the use of specific values of input parameters, which may be subjective depending on the person or group carrying out the analysis (Roshan et al., 2016). A good example is the 1755 Lisbon tsunami, generated by an earthquake in the Azores-Gibraltar Plate Boundary (AGPB). The Great 1755 Lisbon earthquake generated the most destructive historical tsunami near the coasts of Portugal (Santos and Koshimura, 2015). The source location and contemporary effects of this tsunami are not precisely identified, and several earthquake scenarios have been published in the literature in the last decades (Johnston, 1996; Baptista et al., 1998, 2003; Zitellini et al., 1999; Gracia et al., 2003; Terrinha et al., 2003; Gutscher et al., 2006; Grandin et al., 2007; Horsburgh et al., 2008; Barkan et al., 2009; Cunha et al., 2010). All of these studies show how variable the parameters of the seismic source can be and the importance of taking their uncertainty into account.

In DTHA, the classical approach to deal with uncertainties consists in performing a limited number of deterministic simulations with conservative values of the seismic sources (e.g., JSCE, 2002; Lynett et al., 2004; Allgeyer et al., 2013). However, the great M 9.0 Tohoku-Oki subduction earthquake of 2011, the largest ever recorded in Japan (Saito et al., 2011), has clearly shown the limitations of the classical approach focused on the identification of known maximum tsunamigenic sources (MCS approach). Recently, Roshan et al. (2016) improved the DTHA procedure detailed in Yanagisawa et al. (2007) in order to better evaluate the effects of the seismic source uncertainties through Monte-Carlo simulations of a limited number of seismic source parameters (the dip angle, the strike and the source location), leading to around 300 tsunami scenarios. The authors presented an improvement of the classical MCS approach by introducing uncertainties on seismic source parameters.

In this context, the objective of this work is to propose a new methodology to evaluate or qualify tsunami hazard levels associated with both specific and unknown tsunamigenic seismic sources by integrating the uncertainty related to the seismic parameters in the DTHA procedure. The main idea is to develop a meta-model, or emulator, of the tsunami numerical model that makes it possible to perform a large number of tsunami scenarios with reduced computational time and, consequently, to intensively explore a tsunamigenic area for which geological and geophysical datasets may be limited. The constructed meta-models, when exploited with the statistical criteria classically employed in uncertainty quantification (UQ) studies (Saltelli et al., 2000, 2008; Saltelli, 2002; Iooss and Lemaître, 2015), make it possible to go beyond the classical approach for DTHA and to better quantify the uncertainties of a large set of seismic source parameters.

In the next sections we first present the methodology used in this study (see section "Methodology"), and then develop and validate the meta-models for AGPB-related tsunamis impacting the French Atlantic Coast (see section "Application of the methodology to the French Coast"). In section "Potential application of meta-models for tsunami hazard analysis," we (1) evaluate the impact of two realistic scenarios corresponding to potential sources of the 1755 Lisbon tsunami (when uncertainty on seismic parameters is considered), and (2) present the analysis of the tsunamigenic potential of the AGPB zone, considered for the purpose of the exercise as a poorly known tsunamigenic area. Finally, we show how the numerical results obtained with this method can be discussed in light of the tsunami hazard issued by the TSUMAPS-NEAM research project, available online<sup>1</sup>.

<sup>1</sup>http://ai2lab.org/tsumapsneam/interactive-hazard-curve-tool/

# METHODOLOGY


The proposed approach relies on three main steps, as reported in **Figure 1**. STEP 1 consists in the construction of a numerical model able to reproduce the tsunami heights generated by a given seismic area and impacting a target area. In STEP 2, the numerical model simulates a regular set of physical tsunamigenic scenarios (the so-called design data-base) that are used for the construction and the validation of an emulator (the meta-model) able to reproduce the original model results in the target zone. In STEP 3, the validated meta-models may be used for DTHA assessment and/or qualification of other THA results. Performing the UQ with meta-models instead of the original model makes it possible to assess the uncertainty related to a given tsunami scenario (see section "In the case of expert opinion: uncertainty quantification in DTHA") and to explore the tsunamigenic area intensively at nearly zero computational cost (see section "Without expert opinion: exploration of a large tsunamigenic area").

# Step 1: Tsunami Simulations

The first step consists in the construction of a tsunami numerical model of the area to explore. In this study, the tsunami numerical simulations were performed by using the tsunami code reported in Allgeyer et al. (2013) (or CEA-code), which exploits two models, one for tsunami initialization and the other one for tsunami propagation. The initial seabed deformation caused by an earthquake is generated with the Okada model (Okada, 1985) and is transmitted instantaneously to the surface of the water.

This analytical model uses simplistic planar fault parameters with uniform slip, satisfying the expression of the seismic moment M<sub>0</sub>:

$$M_0 = \mu \cdot D \cdot L \cdot W \tag{1}$$

where µ denotes the shear modulus and D [m] the average slip along the rupture of length L [m] and width W [m]. The moment magnitude M<sub>w</sub> is then computed directly through equation 2 (Hanks and Kanamori, 1979), as follows:

$$M_W = \frac{2}{3} \cdot \log_{10}(M_0) - 6.07 \tag{2}$$
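As a worked illustration of equations 1 and 2, the short sketch below computes the seismic moment and the moment magnitude for a single fault. The fault dimensions and slip are hypothetical values chosen for the example; the shear modulus of 30 GPa is the constant used later in this study, and M<sub>0</sub> is assumed to be expressed in N·m.

```python
import math

def seismic_moment(mu_pa, slip_m, length_m, width_m):
    """Seismic moment M0 [N.m] from equation 1: M0 = mu * D * L * W."""
    return mu_pa * slip_m * length_m * width_m

def moment_magnitude(m0_nm):
    """Moment magnitude Mw from equation 2, with M0 expressed in N.m."""
    return (2.0 / 3.0) * math.log10(m0_nm) - 6.07

# Hypothetical fault: 200 km x 80 km rupture, 10 m average slip, mu = 30 GPa.
m0 = seismic_moment(30e9, 10.0, 200e3, 80e3)
mw = moment_magnitude(m0)
print(f"M0 = {m0:.3e} N.m, Mw = {mw:.2f}")
```

For these illustrative values the moment is 4.8 × 10<sup>21</sup> N·m, which gives a magnitude of about 8.4.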

The following parameters are also required for tsunami initiation: longitude, latitude, and depth [km] of the center of the source, strike [degrees], dip [degrees], and rake [degrees]. A conceptual scheme of the input parameters for the tsunami code and the numerical domain used in this study are reported in **Figure 2**. The computation of the tsunami propagation is then based on hydrodynamic equations, under the non-linear shallow water approximation (the Boussinesq equations as reported in Allgeyer et al., 2013). The shallow water equations are discretized using a finite-difference method in space and time (FDTD). Pressure and velocity fields are evaluated on uniform separate grids according to Arakawa's C-grid (Arakawa, 1972). Partial derivatives are approximated using upwind finite differences (Mader, 2004). Time integration is performed using the iterated Crank-Nicolson scheme. No viscosity terms are taken into account in our simulations. The only parameters of this model are the bathymetry (space step and depth resolution) and the time step.

# Step 2: Meta-Model Design and Validation

#### Meta-Models Design

A variety of meta-models have been applied in the water resources literature (Santana-Quintero et al., 2010; Razavi et al., 2012b). Moreover, some examples of applications have already been published in the context of flood management (e.g., Yazdi and Salehi Neyshabouri, 2014; Löwe et al., 2018) and in the field of tsunamis (e.g., Sarri et al., 2012; Sraj et al., 2014; Rohmer et al., 2018). The classical steps for meta-model construction and validation are reported in various studies (e.g., Saltelli, 2002; Saltelli et al., 2008; Faivre et al., 2013), and are briefly summarized in **Table 1**.

For the study presented here, we rely on conditional Gaussian processes, also known as kriging (Roustant et al., 2012), a method derived from Danie Krige's pioneering work in mining (Krige, 1951) and later formalized within the geostatistical framework by Matheron (1963). The kriging meta-model has already shown good predictive capacities in many practical applications (see Marrel et al., 2008, for example), has become a standard meta-modeling method in operational research (Santner et al., 2003; Kleijnen, 2005), and has performed robustly in previous water resource applications (Razavi et al., 2012b; Villa-Vialaneix et al., 2012; Löwe et al., 2018).

A general kriging model "M(x)" (which later provides an estimation of the maximum tsunami height in a given location) can be defined for **x** = (x<sub>1</sub>, ..., x<sub>d</sub>) ∈ D ⊂ ℝ<sup>d</sup> as the following Gaussian process "N(.)":

$$M(\mathbf{x}) = N(m(\mathbf{x}), s^2(\mathbf{x})) $$

Where, for simple kriging:


TABLE 1 | Steps for meta-models construction, validation and evaluation of the uncertainty.



It must be noted that {X,Y} are the coordinates of the design simulations used for kriging parameters evaluation:


$$X = \begin{pmatrix} x_{1,1} & \dots & x_{1,9} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \dots & x_{N,9} \end{pmatrix}$$


$$Y = \begin{pmatrix} h_{\max, 1} \\ \vdots \\ h_{\max, N} \end{pmatrix}$$

Compared with a commonplace deterministic interpolation method (such as splines of any order), this model is much more informative because it provides both a predicted expectation and an associated uncertainty. The fitting procedure includes the choice of a covariance model [here a tensor product of the "Matern52" function (Roustant et al., 2012)], followed by the estimation of the covariance parameters (e.g., range of covariance for each input variable, variance of the random process, nugget effect), either by Maximum Likelihood Estimation (the standard choice, which we made) or by Leave-One-Out minimization [known to mitigate the effect of an arbitrary covariance function choice (see Bachoc, 2013)].
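To make the role of the covariance model concrete, the following minimal sketch implements a simple kriging predictor with a tensor-product Matérn 5/2 covariance in plain Python. It uses the standard simple kriging expressions, mean m(x) = µ + c(x)ᵀC⁻¹(Y − µ) and variance s²(x) = σ² − c(x)ᵀC⁻¹c(x). It is an illustration under simplifying assumptions, not the procedure used in this study: the covariance parameters (`thetas`, `sigma2`) are fixed by hand rather than estimated by maximum likelihood, no nugget is included, the prior mean is assumed known, and the four design points are hypothetical.

```python
import math

def matern52(h, theta):
    """One-dimensional Matern 5/2 correlation for distance h and range theta."""
    a = math.sqrt(5.0) * abs(h) / theta
    return (1.0 + a + a * a / 3.0) * math.exp(-a)

def cov(x, y, thetas, sigma2):
    """Tensor-product covariance: sigma2 times the product of 1-D kernels."""
    c = sigma2
    for xi, yi, t in zip(x, y, thetas):
        c *= matern52(xi - yi, t)
    return c

def solve(a, b):
    """Solve a.z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def simple_kriging(x_new, X, Y, thetas, sigma2, mean=0.0):
    """Return the kriging mean m(x) and variance s^2(x) at x_new."""
    C = [[cov(a, b, thetas, sigma2) for b in X] for a in X]
    c = [cov(x_new, a, thetas, sigma2) for a in X]
    w = solve(C, c)  # kriging weights, i.e. C^{-1} c(x)
    m = mean + sum(wi * (yi - mean) for wi, yi in zip(w, Y))
    s2 = sigma2 - sum(wi * ci for wi, ci in zip(w, c))
    return m, max(s2, 0.0)

# Hypothetical two-input design with four simulated heights [m].
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [0.1, 0.4, 0.3, 0.9]
m_design, s2_design = simple_kriging(X[1], X, Y, (1.0, 1.0), 1.0)
m_mid, s2_mid = simple_kriging((0.5, 0.5), X, Y, (1.0, 1.0), 1.0)
print(m_design, s2_design, m_mid, s2_mid)
```

Without a nugget, the predictor interpolates the design points exactly (zero predicted variance there), while the variance grows away from them; this is the additional information that a deterministic interpolator does not provide.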

It must be noted that the kriging interpolation technique requires computing and inverting the n × n covariance matrix C(X, X) between the observed values Y(X), which leads to a O(n<sup>2</sup> ) complexity in space and O(n<sup>3</sup> ) in time (Rullière et al., 2018). In practice, this computational burden makes Gaussian process regression difficult to use when the number of observation points is in the range [10<sup>3</sup> ,10<sup>4</sup> ] or greater, as in this study. As a consequence, we used in this article the procedure for estimating the parameters of kriging reported in Rullière et al. (2018), by using an adapted R-tool available on line<sup>2</sup> . The full details of this methodology are reported in the abovementioned paper. This approach is proven to have better theoretical properties than other aggregation methods that can be found in the literature, and permitted us to drastically reduce the computational time necessary for metamodels construction and validation.

<sup>2</sup>https://github.com/drulliere/nestedKriging

#### Meta-Models Validation


A general method for meta-model validation is the K-fold cross-validation method (Friedman et al., 2001). The principle of cross-validation is to split the data into K folds of approximately equal size A<sub>1</sub>, ..., A<sub>K</sub>. For k = 1 to K, a model Ŷ<sup>(−k)</sup> is fitted from the data ∪<sub>j≠k</sub> A<sub>j</sub> (all the data except the A<sub>k</sub> fold) and this model is validated on the fold A<sub>k</sub>. Given a criterion of quality L such as the Mean Square Error:

$$L = MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \tag{3}$$

the quantity used for the "evaluation" of the model is computed as follows:

$$L_k = \frac{1}{n/K} \sum_{i \in A_k} L\left(y_i, \hat{Y}^{(-k)}(\mathbf{x}_i)\right), \tag{4}$$

where ŷ<sub>i</sub> and y<sub>i</sub> are, respectively, the meta-model and the model response and n is the number of simulations in the k-th sample. When K is equal to the number of simulations of the training set, the cross-validation method corresponds to the leave-one-out technique, which was not performed in this study. The methodology employed is described in the DiceEval R-package reference manual (Dupuy et al., 2015). In our application case, we considered K = 10.
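The K-fold procedure described above can be sketched in a few lines. The example below is a generic illustration, not the DiceEval implementation: the helper names are hypothetical, the data are synthetic, and a least-squares line stands in for the kriging emulator so that the block stays self-contained.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def mse(y_true, y_pred):
    """Mean square error, as in equation 3."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_validate(xs, ys, fit, k=10, seed=0):
    """Return the list of per-fold MSE values L_1, ..., L_K."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in fold]
        scores.append(mse([ys[i] for i in fold], preds))
    return scores

def fit_line(xs, ys):
    """Stand-in emulator: ordinary least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

# Synthetic data: a noisy linear response, for illustration only.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [0.5 + 0.2 * x + rng.gauss(0.0, 0.05) for x in xs]
scores = cross_validate(xs, ys, fit_line, k=10)
print("mean MSE over folds:", sum(scores) / len(scores))
```

Each simulation is held out exactly once, so the K scores together use the whole design database for validation while never testing a model on data it was fitted to.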

In this study, the accuracy of the meta-model is evaluated through several statistical metrics permitting to quantify the overall quality of regression models. This includes:


# Step 3: Uncertainty Quantification and Global Sensitivity Analysis

Considering the variety and the complexity of the geophysical mechanisms involved in tsunami generation, tsunami hazard assessment is generally associated with strong uncertainties (aleatory and epistemic). In PTHA, uncertainties are classically integrated in a rigorous way (Sørensen et al., 2012; Horspool et al., 2014; Selva et al., 2016) and quantified using the logic-tree approach (Horspool et al., 2014) and/or random simulations performed using Monte-Carlo sampling of the probability density functions of geological parameters (Sørensen et al., 2012; Horspool et al., 2014). An interesting alternative was recently proposed by Selva et al. (2016), consisting in the use of an event-tree approach and ensemble modeling (Marzocchi et al., 2015). Moreover, a new procedure based on the quantification of elementary effects was recently proposed by Molinari et al. (2016) for the quantification of uncertainties related to the construction of a tsunami data-base.

In this work we propose a classical methodology that could also be adapted to analyze tsunamigenic regions with poor (or no) information on crustal characteristics and that is based on the classical uncertainty study steps (Saltelli et al., 2004, 2008; Faivre et al., 2013; Iooss and Lemaître, 2015). This methodology, which has already been tested in other hydraulic contexts in recent years (Nguyen et al., 2015; Abily et al., 2016), relies on Monte Carlo simulations for the UQ steps and on GSA (Global Sensitivity Analysis) approaches for the analysis of the AGPB tsunamigenic potential, by computing Sobol indices (Sobol, 1993, 2001). These methods rely on sampling-based strategies for uncertainty propagation, aiming to fully map the space of possible model predictions from the various uncertain model input parameters and then to rank the contribution of each input parameter's uncertainty to the variability of the model output (Baroni and Tarantola, 2014). The objective of this approach is mostly to identify the parameter or set of parameters that significantly impacts model outputs (Iooss et al., 2008; Volkova et al., 2008). GSA approaches are robust, have a wide range of applicability, and provide accurate sensitivity information for most models (Adetula and Bokov, 2012). Moreover, even if they are theoretically defined for linear mathematical systems, it was demonstrated that they are well suited to models having non-linear behavior and interactions among parameters (Saint-Geours, 2012), as in the present study. For these reasons, these indices have already been adopted for the analysis of bi-dimensional hydrodynamic simulations in urban areas (Abily et al., 2016) or of complex coastal models including interactions between waves, current and vegetation (Kalra et al., 2018), and they seem well suited to the present work.

For the computation of Sobol' indices, a large variety of methodologies are available, such as the so-called "extended-FAST" method (Saltelli et al., 1999), already used in previous studies by IRSN (Nguyen et al., 2015). In this study, we used the methodology proposed by Jansen et al. (1994), already implemented in the open-source R package sensitivity (Pujol et al., 2017). This method estimates first order and total Sobol' indices for all the factors "v" at a higher total cost of "v × (p + 2)" simulations (Faivre et al., 2013).
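As an illustration of how Jansen-type estimators work, the sketch below estimates first-order and total Sobol' indices from two independent input samples A and B, using the estimator formulas as they are commonly stated. It is a standalone Python sketch, not the R sensitivity package used in this study; the function names and the toy model are assumptions made for the example, and inputs are taken as independent and uniform on [0, 1]. With p inputs and a base sample of size n, it requires n × (p + 2) model runs.

```python
import random

def jansen_sobol(model, p, n, seed=0):
    """Estimate first-order and total Sobol' indices with Jansen-type formulas.

    model takes a list of p inputs in [0, 1]; the cost is n * (p + 2) runs.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(p)] for _ in range(n)]
    B = [[rng.random() for _ in range(p)] for _ in range(n)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((v - mean) ** 2 for v in both) / (len(both) - 1)
    first, total = [], []
    for i in range(p):
        # Rows of A with their i-th entry replaced by the i-th entry of B.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        vi = var - sum((yb - yab) ** 2 for yb, yab in zip(fB, fABi)) / (2 * n)
        vti = sum((ya - yab) ** 2 for ya, yab in zip(fA, fABi)) / (2 * n)
        first.append(vi / var)
        total.append(vti / var)
    return first, total

def toy(x):
    """Additive toy model; the third input has no influence."""
    return x[0] + 2.0 * x[1]

S, ST = jansen_sobol(toy, p=3, n=20000, seed=42)
print("first-order:", [round(s, 2) for s in S])
print("total:      ", [round(s, 2) for s in ST])
```

For this additive toy model the analytical indices are 0.2, 0.8 and 0 for both the first-order and the total index, which gives a quick check on the estimator; in a model with interactions, the total index of a factor exceeds its first-order index.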

# APPLICATION OF THE METHODOLOGY TO THE FRENCH COAST

The French Atlantic coast is subjected to two main seismogenic sources that could generate tsunamis, one in the lesser Indies, and a second one from the AGPB. In this application we only consider the AGPB and we only compute water heights offshore for four locations (**Figure 3**), ignoring the necessary refinements for propagation to the coast. Because in this study case the sources are far from the considered gauges, we propose a very simplified approach to characterize the source region. In the following we first present how the meta-model was constructed and validated through a series of statistical tests in comparison with published data from Allgeyer et al. (2013).

#### Numerical Tool and Design Data-Base

All the simulations were performed on the same bathymetric grid with a space resolution of 2' (∼3.6 km). The numerical model was not directly validated by comparison with similar simulations from the literature. In fact, considering the coarse bathymetric grid resolution, the developed numerical model is not suited to the estimation of tsunami run-up and inundation areas, and it cannot be used for a real assessment of the tsunami hazard along the French Atlantic Coast. However, this work being methodological, we consider that the numerical results are consistent with the objectives of the study. Moreover, the tsunami code was extensively validated through benchmarks in the framework of the TANDEM research project (Violeau et al., 2016), which ensured its ability to reproduce tsunami generation and propagation. As a consequence, the order of magnitude of the tsunami heights computed in this study should be realistic and suited to testing the methodology.

In order to perform the numerical simulations needed for the meta-model construction and validation (see section "Step 2: meta-model design and validation"), the CEA-code was coupled with the IRSN Promethee bench. Promethee is an environment for parametric computation that allows carrying out UQ studies when coupled (or wrapped) to a code. This software is freely distributed by IRSN<sup>3</sup>, can be parameterized with any numerical code, and is optimized for intensive computing. Promethee was first linked to the numerical code by means of a set of software links (similar to bash scripts). In this way, numerical simulations were launched directly by the IRSN environment. Then, the statistical analyses, such as the Monte-Carlo simulations used for the meta-model construction (see section "Meta-models design") and UQ (see section "Analysis of the Impact of Tsunamis on a Target Area through UQ") or the Sobol indices computation (see section "Global Analysis of Seismic Source Influence on the Target Area"), were also driven by Promethee, which integrates the R statistical computing environment and thereby permits this kind of analysis (R Core Team, 2016).

In this methodological work, we considered a widened AGPB tsunamigenic area and we chose to explore as widely as possible the potential tsunami height along the French Atlantic Coast generated by earthquakes from 34° to 40°N and from 18° to 7°W, encompassing to the East the southern part of Portugal down to Morocco, and reaching the oceanic sea-floor west of the Madeira-Tore rise (as reported in **Figure 3**). Because the design database is a learning base for meta-modeling, the range of variation of the input parameters (column "Design database" in **Table 2**) needs to be large in order to cover a wide range of earthquake scenarios. Thus, if correctly estimated, meta-models will be able to reproduce the model behavior for a large range of variations of the seismic input parameters, including physical scenarios from geological studies of the zone.

<sup>3</sup>http://promethee.irsn.org/doku.php

TABLE 2 | Summary of the variation range of the seismic source input parameters for the design, the western, the eastern database and for the tsunami scenarios associated to the Gorringe bank and the Horseshoe bank (hypothesis from Duarte et al., 2013; Grevemeyer et al., 2017).

\*Seismic source parameters are assumed uniformly distributed and are randomly sampled for the construction of the Global database. \*\*The global database is composed of the scenarios simulated with the meta-models (more than 50,000 tsunamis scenarios) for the construction of the western and the eastern database. \*\*\*Note that depth is considered to be the depth of the seismic source which is assumed to be in the middle of the fault (see Figure 2).

In order to build the design database, fault parameters as defined in section "Step 1: tsunami simulations" and **Table 2** were sampled randomly and independently with the Monte-Carlo method, assuming uniform distributions. The uniform distribution was chosen in order to build meta-models able to reproduce tsunami heights generated by various tsunamigenic sources with the same accuracy. The resulting earthquake magnitudes are computed from the sampled parameters with equation 2. The shear modulus chosen for the magnitude estimation is a constant value assumed to be equal to 30 GPa. This design database is a matrix that associates each combination of fault parameters with the maximum simulated water height at each point of the numerical grid and also at four selected locations along the French Atlantic Coast called gauges (**Table 3** and **Figure 3**), namely, from North to South, "Saint-Malo," "Brest," "La Rochelle" and "Gastes." The maximum tsunami water height is the relevant parameter when estimating tsunami hazard.

TABLE 3 | Location and water depth of the French Gauges chosen for meta-model construction.

# Meta-Models Construction and Validation

Meta-models were constructed using the NestedKriging procedure described in Rullière et al. (2018), as reported in section "Step 2: meta-model design and validation." The design database contains 5839 scenarios used for the meta-model construction and validation. The water height characteristics associated with these scenarios are reported in **Table 4**. Each meta-model is a function able to compute the maximum tsunami water height at the gauge location for a given set of seismic source parameters (strike, length, dip, rake, width, slip, longitude, latitude and depth). The input parameters must lie within the parameter range used for the meta-model construction and reported in **Table 2**.

For meta-model validation, the design data-base is split into K folds of approximately equal size (K = 10 in this study, i.e., about 584 simulations per fold), and for each fold A<sub>k</sub> a model is fitted from the remaining data and validated on A<sub>k</sub>. K-fold cross-validation is used for two main purposes: (i) to tune the hyperparameters of the meta-model and (ii) to better evaluate the prediction accuracy of the meta-model. In both cases, the choice of K should ensure that the training and testing sets are drawn from the same distribution. In particular, both sets should contain sufficient variation for the underlying distribution to be represented.

TABLE 4 | Maximum water height associated with the design database; µ, σ and Max correspond to the mean, standard-deviation and maximum modeled values.


TABLE 5 | Summary of meta-model evaluations using cross-validation technique (mean values).


The statistical parameters for cross validation are defined in section "Meta-models validation."

From a practical point of view, the value of K is typically chosen as a compromise between the computational time needed for the analysis and its reliability (Hastie et al., 2009). Indeed, there is not, to our knowledge, a well-established methodology for identifying the optimum number of folds for cross-validation. In this methodological work, we considered K = 10 as a robust value with regard to the objectives of the study.

Results in terms of the quality criterion L are reported in **Table 5** and **Figure 4**. The mean values computed from cross-validation are satisfactory considering the large range of parameter variations in the design data-base and the methodological purpose of the study. Indeed, except for the "Gastes" gauge, the mean R<sup>2</sup> is higher than 80%, which is satisfactory according to Marrel et al. (2009) and Storlie et al. (2009), and the mean RMSE is a few centimeters, indicating that the kriging meta-model is a good emulator choice for reproducing the CEA-tsunami code behavior.

However, it must be underlined that, of the four gauges, the results obtained for the "Gastes" gauge are not satisfactory, at least in terms of statistical performance. In fact, the large variation of the RMSE (from 0.03 to 0.25 m) and the low values of R<sup>2</sup> (varying from 0.6 to 0.9) reported in **Figure 4** suggest that further numerical runs would be necessary to improve the accuracy of kriging.
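For reference, the two metrics discussed above can be computed as follows. The sketch assumes the usual definitions (RMSE as the square root of the mean square error, and R<sup>2</sup> as one minus the ratio of the residual to the total sum of squares), and the five height values are hypothetical numbers standing in for one held-out fold.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between model and meta-model responses."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out fold: simulated vs. emulated maximum heights [m].
simulated = [0.10, 0.25, 0.40, 0.55, 0.80]
emulated = [0.12, 0.22, 0.43, 0.50, 0.85]
fold_rmse = rmse(simulated, emulated)
fold_r2 = r_squared(simulated, emulated)
print(f"RMSE = {fold_rmse:.3f} m, R2 = {fold_r2:.3f}")
```

Because R<sup>2</sup> is normalized by the spread of the simulated values, a gauge with a narrow range of heights can show a low R<sup>2</sup> even when its RMSE is small, so the two metrics are best read together.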

It must be noted that the methodology used for the metamodel validation is a "state of the art" methodology permitting to focus on the ability of the meta-model to reproduce the mean model response and to estimate the model variability (represented by the variance). Even if this is common in literature, for hazard studies it would also be of interest to focus in the future on other criteria that account for the behavior in the tails of the distributions of the simulated values (extreme values).

# Validation With Results From Allgeyer et al. (2013)

We perform an additional test in order to evaluate the ability of our meta-models to reproduce state of the art tsunami scenarios generated by the AGPB and impacting the French coast. With this aim, we compare our meta-models results with tsunami height simulated at the same location by Allgeyer et al. (2013). This comparison is of interest in order to confirm the ability of the constructed meta-models to reproduce the order of magnitude of the modeled tsunami height at a given location.

In their study, Allgeyer et al. (2013) analyzed the impact of a Lisbon-like tsunami on the French Atlantic Coast through numerical modeling. They focused on the simulated maximum tsunami water height in the North Atlantic associated with three different sources for the 1755 event derived from Johnston (1996), Baptista et al. (2003), and Gutscher et al. (2006), for a total of five tsunami scenarios (see **Table 6**). The same scenarios were simulated with the constructed meta-models for the four French Atlantic Gauges and with the CEA-tsunami model on a more refined grid with a spacing of 1'. Even if the meta-model predictions slightly overestimate the modeled tsunami height (see **Figure 5** and **Table 7**), these results indicate a good agreement between the meta-modeled and the modeled water height for the "Saint Malo," "Brest" and "La Rochelle" gauges. In contrast, these results confirm that further numerical runs would be necessary to improve the accuracy of kriging for the "Gastes" gauge, where the meta-model largely overestimates the tsunami heights of the original model (**Figure 5**).

Considering the methodological purpose of the study, these results are satisfying. However, for a practical application, a more extended set of physical scenarios should be of interest for the validation of the tsunami height predicted by meta-models.

# POTENTIAL APPLICATION OF META-MODELS FOR TSUNAMI HAZARD ANALYSIS

The objective of this section is to present how meta-models can be employed for (i) the integration of uncertainties of "known" tsunami scenarios (see section "In the case of expert opinion: uncertainty quantification in DTHA") and (ii) the analysis of the tsunamigenic potential of a poorly known tsunamigenic area (see section "Without expert opinion: exploration of a large tsunamigenic area"). Finally, the obtained results are discussed in light of the tsunami hazard issued by a probabilistic analysis (see section "Qualifying these approaches with respect to Probabilistic Tsunami Hazard Assessment").

TABLE 6 | Seismic sources simulated in Allgeyer et al. (2013) for the 1755 Lisbon-tsunami.

FIGURE 5 | Comparison between the meta-modeled and the modeled tsunami height issued by the tsunami scenarios reported in Allgeyer et al. (2013). The black line indicates the perfect match (y = ŷ).

TABLE 7 | Maximum tsunamis height simulated in Allgeyer et al. (2013) for the 1755 Lisbon-tsunami and computed with meta-models.

We recall here that this is not an operational tsunami hazard assessment and that all the presented results are purely methodological.

# In the Case of Expert Opinion: Uncertainty Quantification in DTHA

DTHA is classically assessed by considering particular source scenarios (usually the maximum credible scenario), and the associated maximum tsunami height is generally retained as the hazard level (MCS). A more sophisticated method was recently proposed by Roshan et al. (2016). The authors tested around 300 tsunami scenarios in a [8 – 9.5] magnitude range associated with various faults potentially impacting the Indian coast. Finally, the authors suggested that an appropriate water level for hazard assessment (e.g., mean value or mean plus sigma value) should be retained. They proposed the mean value of the simulated water heights, as a test, considering that this value may need to be revisited in the future.

In the case of the tsunami hazard associated with the AGPB, the 1755 Lisbon tsunami is the classical reference scenario. In order to illustrate how to integrate the evaluation of uncertainties in an MCS approach, we focus on two specific and nearly deterministic scenarios considered as likely sources of the Lisbon 1755 tsunami, namely the Gorringe and Horseshoe structures (Buforn et al., 1988; Stich et al., 2007; Cunha et al., 2012; Duarte et al., 2013; Grevemeyer et al., 2017). Both structures were modeled taking into account available maps (Cunha et al., 2012; Duarte et al., 2013) and fault parameters (Stich et al., 2007; Grevemeyer et al., 2017) summarized in **Table 2**. For the computation of the tsunami heights associated with these scenarios, fault parameters are considered uniformly distributed and are randomly sampled in their range of variation (**Table 2**), for a total of nearly 10,000 tsunami scenarios. The magnitude range associated with the sources of these tsunami scenarios varies from 7.7 to 8.9, which is consistent with the range of estimated magnitudes for the 1755 earthquake (Johnston, 1996; Gutscher et al., 2006). The convergence of statistics (defined as the evolution of the mean modeled tsunami height and of the running standard deviation) is largely achieved, indicating that the number of tsunami scenarios is sufficient to represent the expected variability of tsunami heights.
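The convergence check mentioned above can be sketched as a running mean and running standard deviation updated scenario by scenario. The example below uses Welford's online algorithm, which is one possible way to do this and is not necessarily how the statistics were tracked in this study; the sampled heights are synthetic values drawn from an arbitrary log-normal distribution for illustration only.

```python
import math
import random

def running_stats(values):
    """Yield (count, mean, standard deviation) after each new value (Welford)."""
    mean, m2 = 0.0, 0.0
    for n, v in enumerate(values, start=1):
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
        std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
        yield n, mean, std

# Synthetic maximum heights [m], standing in for meta-model outputs.
rng = random.Random(0)
heights = [rng.lognormvariate(-1.0, 0.5) for _ in range(10000)]
history = list(running_stats(heights))
for n in (100, 1000, 10000):
    _, mean, std = history[n - 1]
    print(f"n = {n:5d}: mean = {mean:.3f} m, std = {std:.3f} m")
```

Convergence is considered reached when the running mean and standard deviation stop changing appreciably as more scenarios are added.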

The distributions of tsunami heights resulting from the Gorringe and Horseshoe sources are reported in **Figure 6** for each gauge, with a summary of the numerical values in **Table 8**. It can be observed that the tsunami heights generated by the Horseshoe sources are globally lower than those generated by the Gorringe sources. At first glance, this result does not seem surprising considering the closer proximity of the French gauges to the Gorringe bank. It can also be observed that both scenarios are affected by strong uncertainties. Indeed, the ratio between the maximum and the mean tsunami height is very large and can vary from 2 to 5, depending on the gauge. Moreover, the variability of the modeled values around the mean value can be higher than 1.0 m for the Gorringe scenarios and it is always higher than 0.24 m (for the Saint Malo gauge).

If we assume that the MCS approach is to take into account the worst possible scenarios, a hazard level corresponding to the maximum modeled water height could be retained for each gauge. However, if we consider the excursion of our numerical results this hazard level could be too high. This is probably why, as previously mentioned, other authors proposed to set a mean level as representative of the tsunami hazard (Roshan et al., 2016). In general, a first methodological conclusion is that the impact of the uncertainty on the source parameters on water height can be high and it should be taken into account by decision makers.

# Without Expert Opinion: Exploration of a Large Tsunamigenic Area

Let us now assume, for a methodological purpose, that the AGPB is a poorly known seismic area and that the objectives of the study are (i) to evaluate the "possible" impact of tsunamis generated by this area on a target location (in this example, the French gauges) and (ii) to better understand the relative influence of the source parameters on these tsunamis. This latter analysis should help to better guide the geological investigations in the area.

With these objectives, UQ and GSA appear as two useful and complementary tools to address objectives (i) and (ii), respectively.

#### Analysis of the Impact of Tsunamis on a Target Area Through UQ

We propose to evaluate the tsunami hazard level, called here MCS\_h, by exploring meta-models results based on a very wide range of tsunamigenic scenarios, beyond those proposed so far in the literature as we suppose that this tsunamigenic area is not well known. As a consequence, we accept that through this approach we explore the effects of both likely and very unlikely scenarios relative to the MCS (the "MCS\_h" scenarios) that could potentially arise in a context of poor seismotectonic knowledge. As in the previous case, depending on the specific hazard target (civil or industrial facilities), and its location with respect to the source zone, the end-user of this methodology needs to decide which level of water height to choose from the obtained distribution. In this exercise, we will consider the maximum simulated water height for MCS\_h.

We build a database (called global database) of tsunami scenarios generated by the considered AGPB area at four French Atlantic Gauges, with the aim of covering a wider range of tsunamigenic scenarios (**Table 2**). For the purpose of this methodological paper, the modeled area, which encompasses different seismotectonic domains, was split in a very simplistic way into two main seismic source zones (**Figure 3**): a western domain where normal to transtensive earthquakes occur within a thin crust and an eastern domain where reverse to transpressive earthquakes mainly occur on a thicker crust. We considered the Western domain to lie west of the 10°W meridian and the Eastern domain east of this meridian, coinciding roughly with the base of the continental slope facing the Portuguese coastline. The main fault characteristics considered to build the tsunami data-base in both eastern and western domains are reported in **Table 2**, following data contained in Buforn et al. (1988), Molinari and Morelli (2011), Cunha et al. (2012).

TABLE 8 | Maximum tsunami height distributions associated with the global database (GD), HorseShoe (HS), and Gorringe (GR) scenarios; µ<sub>m</sub>, σ<sub>m</sub> and Max<sub>m</sub> correspond to the mean, standard-deviation and maximum meta-modeled values.

The considered seismogenic thickness takes into account the depth of the observed seismicity as well as the fact that part of the upper mantle can potentially be mobilized during major earthquakes: the western domain seismogenic crust is considered to be up to 20 km deep (after Baptista et al., 2017), and up to 60 km for the eastern domain (after Silva et al., 2017). The fault parameters are considered uniformly distributed and are randomly sampled in their range of variation. We finally filtered the resulting database according to an aspect-ratio criterion, requiring the ratio between the length and the width of the faults not to exceed the value of 10, which corresponds to an upper bound of what is observed in nature (McCalpin, 2009). The final global database contains nearly 50,000 tsunami scenarios, resulting in earthquake magnitudes varying from 6.7 to 9.3 (**Table 2**), depending on the explored earthquake source characteristics and calculated from equations 1 and 2. This range of magnitudes is consistent with the magnitude range of the design database.
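The sampling and filtering steps described above can be sketched as follows. The parameter ranges in `RANGES` are hypothetical placeholders (the actual ranges are those of Table 2), only three of the nine source parameters are sampled to keep the example short, and the function names are chosen for illustration. The shear modulus of 30 GPa and the aspect-ratio limit of 10 are the values stated in the text.

```python
import math
import random

MU = 30e9  # shear modulus [Pa], the constant value stated in the text

# Hypothetical ranges for illustration only; the actual ones are in Table 2.
RANGES = {
    "length_km": (20.0, 400.0),
    "width_km": (10.0, 100.0),
    "slip_m": (1.0, 20.0),
}

def sample_scenario(rng):
    """Draw each parameter independently from a uniform distribution."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

def aspect_ratio_ok(s, max_ratio=10.0):
    """Keep faults whose length-to-width ratio does not exceed max_ratio."""
    return s["length_km"] / s["width_km"] <= max_ratio

def magnitude(s):
    """Moment magnitude from equations 1 and 2 (M0 in N.m)."""
    m0 = MU * s["slip_m"] * s["length_km"] * 1e3 * s["width_km"] * 1e3
    return (2.0 / 3.0) * math.log10(m0) - 6.07

rng = random.Random(2020)
candidates = [sample_scenario(rng) for _ in range(50000)]
kept = [s for s in candidates if aspect_ratio_ok(s)]
mags = [magnitude(s) for s in kept]
print(len(kept), "scenarios kept; Mw from",
      round(min(mags), 2), "to", round(max(mags), 2))
```

Filtering after sampling, as done here, leaves the retained scenarios uniformly distributed over the admissible region of the parameter space, at the cost of discarding part of the draws.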

The tsunami water height distributions associated with the four gauges are reported in **Table 8** and **Figure 6**. As in the previous section, these results show a very large variability in tsunami height, which is not surprising considering the large range of variation of the source parameters. However, if we compare these results with those obtained in the previous section, we can also observe that the Gorringe and Horseshoe banks are among the major contributors to the tsunami hazard generated along the French Atlantic Coast by seismic sources in the AGPB. Indeed, **Figure 6** clearly shows that even if some isolated tsunami scenarios can generate a hazard level higher than the Gorringe scenarios, overall these sources are representative of the highest tsunamis in the global database.

#### Global Analysis of Seismic Source Influence on the Target Area

A sensitivity analysis is performed hereafter in order to identify the relative influence of the seismic source parameters. Homma and Saltelli (1996) introduced the total sensitivity index, which measures the influence of a variable jointly with all its interactions. If the total sensitivity index of a variable is zero, this variable can be removed because neither the variable nor its interactions at any order have an influence on the results. This statistical index (called the Sobol index "ST" in this paper) is of particular interest here because it highlights the earthquake source parameters that mostly control the tsunami height at each tested gauge. In **Figure 7**, we report the total Sobol index for the four meta-models of the French Atlantic gauges, computed with the methodology proposed by Jansen et al. (1994) using the R package sensitivity (Pujol et al., 2017). The accuracy of Sobol indices computed with Jansen's method depends on the number of model evaluations; in this study, we performed nearly 20,000 simulations using the meta-models. Results show that the slip is globally the most influential parameter for all the French Atlantic gauge meta-models. Concretely, nearly 50% of the variance of the tsunami water height (the uncertainty) could be removed by a better knowledge of this parameter. This result is expected, considering that the fault slip directly conditions the ocean floor deformation and hence the tsunami amplitude.
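To make the total index concrete, the sketch below estimates it with the estimator usually attributed to Jansen, ST<sub>i</sub> ≈ mean[(f(A) − f(AB<sub>i</sub>))<sup>2</sup>] / (2 Var f(A)), where AB<sub>i</sub> is the sample A with input i redrawn from an independent sample B. It is an illustration only: it is not the code of the R package used in the study, and `toy_model` is a hypothetical additive function standing in for a meta-model, chosen because its indices are known analytically (0.2, 0.8, and 0).

```python
import random
from statistics import pvariance


def toy_model(x):
    """Hypothetical stand-in for a meta-model: additive, so indices are known."""
    return x[0] + 2.0 * x[1] + 0.0 * x[2]


def total_sobol_jansen(model, n_inputs, n_samples, seed=0):
    """Estimate total Sobol indices for independent inputs uniform on [0, 1].

    ST_i ~ mean((f(A) - f(AB_i))**2) / (2 * Var(f(A))), where AB_i is sample A
    with input i replaced by the value from an independent sample B.
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    var_y = pvariance(y_a)
    indices = []
    for i in range(n_inputs):
        sq_diff = 0.0
        for row_a, row_b, y in zip(a, b, y_a):
            mixed = list(row_a)
            mixed[i] = row_b[i]  # resample input i only
            sq_diff += (y - model(mixed)) ** 2
        indices.append(sq_diff / (2.0 * n_samples * var_y))
    return indices


st = total_sobol_jansen(toy_model, n_inputs=3, n_samples=20_000)
print([round(value, 3) for value in st])
```

The cost is (number of inputs + 1) × N model evaluations, which is why such an analysis is practical on a meta-model but would be prohibitive on the full tsunami simulations.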

However, this analysis also suggests that the most influencing parameters for the four gauges are slightly different, depending on their location. One can differentiate results obtained for the southern gauges (i.e., La Rochelle and Gastes) from those obtained at northern gauges (i.e., Saint Malo and Brest):


## Qualifying These Approaches With Respect to Probabilistic Tsunami Hazard Assessment

Let us now imagine that a reference hazard level is provided for the target area through a probabilistic study, and that we aim to qualify the robustness of this hazard level with regard to the tsunami heights obtained from expert scenarios (MCS, see section "In the case of expert opinion: uncertainty quantification in DTHA") or from the exploration of a poorly known tsunamigenic area (MCS\_h, see section "Without expert opinion: exploration of a large tsunamigenic area"). The idea here is to compare the results from a probabilistic analysis with the deterministic approach proposed in this study, which should make it possible to cover completely the uncertainty related to seismic source parameters.

With this aim, we compare our results to the tsunami hazard level (**Figure 6**) issued from the TSUMAPS-NEAM research project<sup>4</sup> at four points located near the gauges used in this study. TSUMAPS-NEAM results are provided in terms of Maximum Inundation Height (MIH), which is the estimated maximum flow depth from the envelope of the tsunami wave at all times, as reported in the NEAMTHM18 documentation (Basili et al., 2018, 2019). TSUMAPS-NEAM has developed a long-term PTHA for earthquake-induced tsunamis for the coastlines of the NEAM region (NE Atlantic, the Mediterranean, and connected seas). TSUMAPS-NEAM results largely relied on inputs from the EU FP7 project ASTARTE, the GAR15 (global risk quantification under the HFA), and national PTHAs like those of the USA and Italy. One of the major results of the project is the Interactive Hazard Curve Tool<sup>5</sup>, which provides online hazard maps for different hazard probabilities/average return periods (mean, median, 2nd, 16th, 84th, and 98th percentile hazard curves). In TSUMAPS-NEAM, tsunamis are computed using an approach which relies on the use of unit sources to reproduce tsunami scenarios (e.g., Molinari et al., 2016; Baptista et al., 2017). The main differences with our methodology are that these methods rely on a decomposition of the tsunami waveform recorded in a target area with simplified methods (e.g., Green's law), and on the main assumption that the non-linear terms of tsunami propagation are negligible. The scope of these studies is to develop a fast emulator that can replace a tsunami model at some selected locations, along the same philosophy as the meta-models developed here. As a consequence, according to the authors' purpose, MIH is suitable for a regional, initial screening type of assessment, which is the objective of TSUMAPS-NEAM. These results must be considered only as an input reference study for further site-specific assessment. Considering the methodological objective of this work, it is of interest to compare our results with this previous work.

<sup>4</sup>http://www.tsumaps-neam.eu/

<sup>5</sup>http://ai2lab.org/tsumapsneam/interactive-hazard-curve-tool/

For this exercise, we selected two target return periods of 1,000 and 10,000 years from the hazard curves of TSUMAPS-NEAM. To take into account the uncertainty related to these results, we consider the mean, the 2nd and the 98th percentile hazard curves. These targets, and the associated uncertainty, are typical of classical risk analysis and make it possible to compare the distributions issued by PTHA with our results. For instance, after the Fukushima accident, a return period of 10,000 years is considered as the target for hazard assessment in the field of nuclear safety (WENRA, 2015).
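As a reading aid for these return periods, the short sketch below converts a mean return period into the probability of at least one exceedance during a given exposure time. It assumes that exceedances follow a stationary Poisson process, which is an assumption made here for illustration and is not stated in this study; the 50-year exposure time is likewise a hypothetical choice.

```python
import math


def exceedance_probability(return_period_years, exposure_years):
    """Probability of at least one exceedance during the exposure time,
    assuming exceedances follow a stationary Poisson process."""
    return 1.0 - math.exp(-exposure_years / return_period_years)


# Hypothetical 50-year exposure time applied to the two target return periods.
for return_period in (1_000, 10_000):
    probability = exceedance_probability(return_period, exposure_years=50)
    print(return_period, probability)
```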

**Figure 6** illustrates the comparison of our results (MCS and MCS\_h) with those selected from the TSUMAPS-NEAM project. One can notice that:


Even if these results are purely methodological, it appears that a target hazard level of 10,000 years issued from the TSUMAPS-NEAM project covers all the variability related to seismic sources from the AGPB, even when exploring very unlikely tsunami scenarios (MCS\_h), and most of the variability related to the Gorringe and Horseshoe scenarios. However, concerning the "Brest" gauge, the methodology indicates a high sensitivity of this coastal area to the characteristics of the Gorringe seismic sources.

# DISCUSSION AND CONCLUSION

The research work presented in this paper was performed in order to test the value of UQ for the analysis and the qualification of the DTHA generated by earthquakes. We propose a new methodology for assessing the uncertainty related to tsunami hazard through the analysis of a wide range of tsunami scenarios at a given location. This concept should make it possible to define a hazard level that goes beyond the definition of the Maximum Credible Scenario (MCS) classically reported in the literature (JSCE, 2002; Lynett et al., 2004) and employed for DTHA, and to integrate uncertainty in hazard quantification. Moreover, tsunami hazard evaluated through UQ also allows the exploration of the tsunamigenic potential of a poorly known seismic zone, as well as a qualification of PTHA.

From a methodological point of view, meta-models appear to be a very efficient and viable solution to the problem of generating many computationally expensive tsunami simulations. As reported in Behrens and Dias (2015), the statistical emulator gives perfect predictions at the input points that are used in its generation process (it interpolates). Statistical emulation does not accelerate the model itself. The significant advantage of the emulator is that it is much less computationally demanding to evaluate and can therefore be employed to carry out fast predictions and inexpensive analyses, such as the sensitivity and uncertainty analyses reported in this study. Even if it is the first time, to our knowledge, that meta-models are proposed for THA, they have already been employed for tsunami modeling (Sarri et al., 2012; Sraj et al., 2014; Rohmer et al., 2018). As reported in section "Qualifying these approaches with respect to Probabilistic Tsunami Hazard Assessment," in the field of tsunami hazard, an alternative approach relies on the use of unit sources to reproduce tsunami scenarios (e.g., Molinari et al., 2016; Baptista et al., 2017). Results from these studies are satisfactory for most practical applications, such as probabilistic tsunami hazard analysis, tsunami source inversion, and tsunami warning systems. However, we consider that our methodology can be proposed as an alternative to these studies, as it does not rely on any assumption on tsunami propagation and can be applied anywhere in the model domain, without any limitation. For instance, meta-models could be constructed and validated from detailed simulations in a given target area, including complex physical phenomena such as overtopping, run-up, and breaking. However, we want to stress that meta-model construction is time-consuming and needs an appropriate design database, which requires a good compromise between the range of variation of the inputs and the number of simulations.

Concerning our application at four selected locations, with the exception of the "Gastes" gauge, both the statistical tests we performed and the comparison with results from Allgeyer et al. (2013) suggest that the meta-models are able to reproduce the tsunamis generated by the AGPB. Thus, the constructed meta-models could be employed in a further study to roughly evaluate the impact of other seismic scenarios from the AGPB on the French Atlantic Coast offshore. However, in order to increase the accuracy of the "Gastes" meta-model, the design database should be enriched.

Results from GSA suggest that, beyond earthquake magnitudes, the position and the orientation of the faults are influential parameters, at least for the sites considered along the northern French Atlantic Coast. Indeed, sensitivity analysis can be a useful tool not only to parametrize the design database but also to orient future geological surveys in a specific area.

In conclusion, MCS\_h in the AGPB study region as defined in our study could be implicitly associated with a mean return period of 10,000 years, given the strong hypotheses we made on source characteristics. In this sense, results from UQ show that a hazard level of 10,000 years issued from the TSUMAPS-NEAM project covers a very wide range of uncertainties related to the characterization of seismic sources from the AGPB, even when exploring very unlikely tsunami scenarios (MCS\_h), and most of the uncertainties related to the Gorringe and Horseshoe banks expert opinion scenarios. However, concerning the "Brest" gauge, even a target hazard level of 10,000 years does not seem appropriate to cover all the uncertainties related to the Gorringe source, indicating a high sensitivity of this coastal area to the characteristics and the kinematics of the Gorringe seismic source.

#### PERSPECTIVES


For an operational far-field DTHA, it would first be necessary to improve the current numerical model in order to better represent tsunami run-up and the inundated areas, with a more accurate bathymetric grid. For an operational application to locations closer to the source zones (i.e., Portugal, Spain, or Morocco), it may be challenging to gather the necessary details for a proper establishment of both models and meta-models. However, this point deserves further attention because uncertainties will always remain. Thus, even in such cases, where more refined databases need to be established, the proposed approach should be of interest, at least to explore efficiently and more exhaustively the uncertainties (e.g., different probability distributions of input parameters, non-uniform slip distribution, fault geometries) in the source parameters linked to the MCS approach. Moreover, it must be noted that for this methodological study we chose a very simple but widely used source description (planar faults with homogeneous slip). More robust simulations should take into account a more complex source representation (e.g., 3D geometry, heterogeneous slip distribution), as recently suggested by Davies and Griffin (2019). Meta-models may also be useful to account for these parameters, since they allow for many simulations.

From a methodological point of view, it could be of interest to compare our methodology with the alternative approaches using unit sources to reproduce tsunami scenarios (e.g., Molinari et al., 2016; Baptista et al., 2017), in terms of simulations needed, computational time, accuracy, for instance.

Finally, from a numerical point of view, it would be of interest to (1) introduce recurrence models for each tsunamigenic source to go toward PTHA calculations, and (2) introduce meta-models in systems developed for tsunami early warning, considering the low computational time inherent to this statistical tool.

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the Institute for Radiological Protection and Nuclear Safety (IRSN).

#### AUTHOR CONTRIBUTIONS

VB: development of the methodology, meta-models construction and validation, and sensitivity and hazard analysis. HJ and OS: seismic sources definition, and seismological and hazard analysis. EA: tsunamis modeling (model construction and simulations of the design data-base). LB and C-MD: technical discussions. HH: tsunamis modeling and technical discussions. VB, HJ, and OS wrote the manuscript.

#### FUNDING

This research was funded by the French National Agency for Research (ANR).

#### ACKNOWLEDGMENTS

This study was performed through cooperative research between the Institute for Radiological Protection and Nuclear Safety (IRSN) and the CEA, in the framework of the French ANR-TANDEM research project. We want to thank ENS Paris and particularly the Geology Laboratory for their technical support on this project. We especially acknowledge Eric Calais for making part of the team's cluster available and Pierpaolo Dubernet for his help in installing Promethee. The present work also benefited from the inputs of MSc. Hend Jebali and Dr. Audrey Gailler and from the valuable technical assistance of Dr. Maria Lancieri, Dr. Yann Richet, and Miss Ludmila Provost.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Bacchi, Jomard, Scotti, Antoshchenkova, Bardet, Duluc and Hebert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Inversion Algorithm for Civil Flood Defense Optimization: Application to Two-Dimensional Numerical Model of the Garonne River in France

Yann Richet\* and Vito Bacchi

Institut de Radioprotection et de Sûreté Nucléaire, Fontenay-aux-Roses, France

#### Edited by:

Philippe Renard, Université de Neuchâtel, Switzerland

#### Reviewed by:

Jeremy Rohmer, Bureau de Recherches Géologiques et Minières, France

Rachid Ababou, UMR5502 Institut de Mecanique des Fluides de Toulouse (IMFT), France

Roland Löwe, Technical University of Denmark, Denmark

> \*Correspondence: Yann Richet yann.richet@irsn.fr

#### Specialty section:

This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science

Received: 18 December 2018 Accepted: 30 September 2019 Published: 06 November 2019

#### Citation:

Richet Y and Bacchi V (2019) Inversion Algorithm for Civil Flood Defense Optimization: Application to Two-Dimensional Numerical Model of the Garonne River in France. Front. Environ. Sci. 7:160. doi: 10.3389/fenvs.2019.00160

The objective of this study is to investigate the "inversion approach" for flood defense optimization in an inundated area. This methodology, new within this engineering field, consists in defining a "safety criterion" (for instance, "the water level in a given location must be lower than a given value") and jointly analyzing all the uncertain controlled parameters (i.e., flood defense geometry, location, etc.) that ensure the safety criterion is met for all possible combinations of uncontrolled parameters (i.e., the flow hydrograph parameters) representing the natural phenomenon. To estimate this safety set, a metamodeling approach is used, which significantly reduces the number of model evaluations required. This algorithm relies on a kriging surrogate built from a few model evaluations, sequentially enriched with new numerical model evaluations as long as the remaining uncertainty of the entire safety set remains too high. Also known as "Stepwise Uncertainty Reduction," this algorithm is embedded in the "Funz" engine (https://github.com/Funz) tasked with bridging the numerical model and any design of experiments algorithm. We applied this algorithm to a real two-dimensional numerical model of the Garonne river (France), constructed using the open-source TELEMAC-2D model. We focused our attention mainly on the maximum water depth in a given area (the "safety criterion") when considering the influence of a simplified flood defense during a flooding event. We consider the two safety control parameters describing the slab and dyke elevations of the flood defense system, to design against the full operating range of the river in terms of possible watershed flooding. For this application case, it appears that fewer than 200 simulations are needed to properly evaluate the restricted zone of the design parameters (the "safety zone") where the safety criterion is always met. This provides highly valuable data for full risk-informed management of the area requiring protection.

Keywords: kriging surrogate, Bayesian optimization, inversion, level set, uncertainty, hydraulic modeling

# 1. INTRODUCTION

It is well-known that the world's major lowland rivers (the Rhine, the Po, the Elbe River, and the Loire River) are protected against flooding by embankments or other flood defenses (Ciullo et al., 2019). The embankments and so-called primary flood defenses such as flood walls and dams (Kind, 2014) are aimed primarily at reducing the likelihood of flooding in the protected area, and historically they have been the most commonly-adopted flood risk reduction measure (Ciullo et al., 2019). The design of these flood protection measures is therefore of major importance society-wide, and can have a considerable impact on the economic and demographic development of the alluvial plains (White, 1945).

The classical approach for flood defense systems design was developed in the Netherlands in the wake of the 1953 disaster (see Vrijling, 2001). Since then, reliability-based flood defense design strategies have been developed all over the world (see Vrijling, 2001; van Gelder and Vrijling, 2004; Ciullo et al., 2019). These strategies mainly involve the statistical quantification of the hazard (see Vrijling, 2001; Apel et al., 2004; van Gelder and Vrijling, 2004; Polanco and Rice, 2014) and the "economic" optimization of the flood defense systems (see Vrijling, 2001; van Gelder and Vrijling, 2004; Ciullo et al., 2019). The optimum design is considered to be the value at which the sum of the total cost of investment (which increases with the height of the flood defense) and the present value of the risk (which diminishes with increasing height) takes its minimum (see Vrijling, 2001; van Gelder and Vrijling, 2004; Ciullo et al., 2019). Another good example is the optimization model introduced by van Dantzig (1956) for the embankment height, which was further developed by other authors, as reported by Eijgenraam et al. (2016).
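The trade-off between investment and residual risk can be illustrated with a small numerical sketch. It assumes, in the spirit of the economic optimization described above, an investment cost that grows linearly with the heightening h and a yearly exceedance probability that decays exponentially with h; these functional forms and all numerical values are hypothetical and chosen only to make the example self-contained.

```python
import math

# Hypothetical numbers, for illustration only.
P0 = 0.01       # yearly exceedance probability of the current crest
ALPHA = 2.5     # decay rate of the exceedance probability per metre of heightening
DAMAGE = 5.0e9  # damage if the protected area floods (currency units)
RATE = 0.03     # discount rate used for the present value of the risk
FIXED = 1.0e6   # fixed investment cost
UNIT = 4.0e7    # investment cost per metre of heightening


def total_cost(h):
    """Investment (growing with h) plus present value of the risk (shrinking with h)."""
    investment = FIXED + UNIT * h
    risk = P0 * math.exp(-ALPHA * h) * DAMAGE / RATE
    return investment + risk


# Closed-form minimiser obtained by setting d(total_cost)/dh = 0.
h_closed = math.log(ALPHA * P0 * DAMAGE / (RATE * UNIT)) / ALPHA

# Brute-force check on a 1 mm grid between 0 and 10 m.
grid = [i / 1000.0 for i in range(10_001)]
h_grid = min(grid, key=total_cost)

print(h_closed, h_grid)
```

Under these assumptions the optimum depends only on the ratio between the marginal risk reduction and the marginal investment cost, which is why the fixed cost does not appear in the closed-form expression.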

These statistical studies are well-suited to the definition of a global flood protection strategy (at "country" scale), as they are not supported by intensive numerical modeling. Although the value of these models is indisputable, the flooding probabilities of the protected areas are assumed to be independent of one another, disregarding the change in hydraulic load along the river stretch as a consequence of the state (e.g., failure, increase in safety) of the embankments elsewhere (Ciullo et al., 2019).

At a local scale, in practical engineering applications, the classical method for the design of flooding protections for urban and industrial facilities relies on the development of sophisticated and refined numerical model systems able to reproduce surface flow accurately for a given chosen "design scenario" (such as the 100-year return period flood event). The quantities of major interest for the design of a flood defense (wall, urban structures, drainage network, etc.) are often related to the parameters describing this design scenario, such as the water height at a given location, or the water velocity in the inundated area (Milanesi et al., 2015). Once these quantities are evaluated, the flood defense is accordingly designed.

Although robust, this "classical approach" can appear too simplified considering the variability of natural phenomena and the numerous uncertainties related to natural hazard modeling. Ideally, flood disaster mitigation strategies should be based on a comprehensive assessment of the flood risk, combined with a thorough investigation of the uncertainties associated with the risk assessment procedure (Apel et al., 2004). Specifically, numerous studies have demonstrated the influence of uncertainties on flood hazard assessment (see Apel et al., 2004; Alho and Mäkinen, 2010; Domeneghetti et al., 2013; Maurizio et al., 2014; Nguyen et al., 2015; Abily et al., 2016; Bacchi et al., 2018), sometimes underlining the hard-to-estimate damage caused by flooding (see Apel et al., 2004). These studies are often based on the use of simplified numerical models which reduce the computational time (see Apel et al., 2004; Alho and Mäkinen, 2010; Domeneghetti et al., 2013; Maurizio et al., 2014; Nguyen et al., 2015), and they are an example of how uncertainty quantification techniques could be employed for the assessment of natural hazards. If applied to the design of flood defenses, these studies can be considered an "improvement" on the classical approach, since they make it possible to better evaluate the uncertainties related to the "design" parameters, which in this case are the target values derived from uncertainty quantification (i.e., a target water height).

However, both the "classical" and "improved" approaches for flood defense design suffer from the same limitation, as they do not allow the end-user of the methodology to robustly evaluate the best flood defense configuration (the geometry) for the natural variability of the simulated phenomenon. Within this context, the objective of this work is therefore to investigate the "robust inversion approach," subsequently referred to as RSUR, for the design of a flood defense in a two-dimensional inundated area. This method consists in defining a "safety criterion" (such as "the water level on the slab must remain lower than 25 cm") and analyzing the design parameters (for instance, the elevation of the flood defenses) that ensure this safety objective is met for all possible combinations of the uncertain input parameters (for instance, the flow hydrograph characteristics) describing the natural phenomenon.

To estimate this safety set, a standard surrogate approach will be used (Jones et al., 1998), which significantly reduces the number of model evaluations needed. This algorithm relies on a kriging surrogate built from a few model evaluations, sequentially enriched with new numerical model evaluations as long as the remaining uncertainty of the entire safety set remains too high (Chevalier, 2013). This algorithm therefore appears well-suited to the rigorous study of the uncertainties of very refined numerical models traditionally used in engineering applications. Belonging to a more general class of "Stepwise Uncertainty Reduction," this algorithm is embedded in the "Prométhée" workbench (using the "Funz" engine) tasked with bridging the numerical model and any design of experiments algorithm (further technical information is provided in an **Annex**).
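The notion of "remaining uncertainty of the safety set" can be illustrated with a deliberately simplified sketch. Given the surrogate's Gaussian prediction at a few candidate points, it computes the probability p that each point satisfies the criterion and uses p(1 − p) as a pointwise ambiguity measure. This is a stand-in chosen for brevity: the Stepwise Uncertainty Reduction strategy selects points according to the expected reduction of the set uncertainty, which is more involved, and the code below is not the Funz or Prométhée implementation. All numerical values are hypothetical.

```python
import math


def prob_below(mean, std, threshold):
    """P[M(x) <= threshold] for a Gaussian prediction N(mean, std**2)."""
    if std <= 0.0:
        return 1.0 if mean <= threshold else 0.0
    z = (threshold - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def set_uncertainty(means, stds, threshold):
    """Average of p(1 - p): zero once every point is classified with certainty."""
    probs = [prob_below(m, s, threshold) for m, s in zip(means, stds)]
    return sum(p * (1.0 - p) for p in probs) / len(probs)


def most_ambiguous(means, stds, threshold):
    """Index of the candidate whose safe/unsafe status is least certain."""
    probs = [prob_below(m, s, threshold) for m, s in zip(means, stds)]
    return max(range(len(probs)), key=lambda i: probs[i] * (1.0 - probs[i]))


# Hypothetical surrogate predictions of water height (m) at four candidates.
means = [0.05, 0.24, 0.40, 0.26]
stds = [0.02, 0.05, 0.03, 0.00]
next_index = most_ambiguous(means, stds, threshold=0.25)
print(next_index, set_uncertainty(means, stds, threshold=0.25))
```

In a sequential loop, the detailed model would be run at the selected candidate, the surrogate would be updated, and the process would stop once the uncertainty measure falls below a chosen tolerance.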

In this research work, we first introduce the engineering problem we want to solve (section 2). Our proposed methodology for model resolution is then presented (section 3) with a focus on the numerical tools we develop for the methodology's application to the chosen real case (section 4). Lastly, the results and main conclusions and perspectives are reported (section 5). The results will be introduced as the safety set and will be analyzed in terms of safety control within the river's operating range. Once properly evaluated, this constrained zone provides highly valuable data for full risk-informed management of the river.

# 2. PROBLEM SPECIFICATION

The study area is a 50-km-long reach on the Garonne river between Tonneins and La Réole (**Figure 1**). The area was settled to protect flood plains by organizing flooding and flood storage between 1760 and 1850, when many earthen levees were built to protect the harvest against spring floods (LPCB, 1983; SMEPAG, 1989). The river was canalized to protect residents from flooding after the historic 1875 flood event (SMEPAG, 1989).

In this section of the Garonne river, the flood defense is currently designed for a river flood of nearly 3,500 m<sup>3</sup>/s. Specifically, successive storage areas give the Garonne profiles a particular configuration and allow flooding in the floodplain to be controlled. **Figure 2** shows three flow characteristics on a typical cross-section of the Garonne to illustrate the flooding sequence: (1) base flow 1,100 m<sup>3</sup>/s; (2) bankfull flow 2,400 m<sup>3</sup>/s; and (3) flow before overflow of the levees with the lowest protection level 3,500 m<sup>3</sup>/s. Consequently, flooding of the less protected areas between Tonneins and La Réole occurs with a low return period, i.e., ∼10 years, and only a few levees have a standard of protection higher than 30 years. Due to the flat topography and the presence of a steep floodplain lateral slope, the floodplains are largely inundated even for high-probability floods. The December 1981 flood was one of the largest floods occurring since the most severe flood on record (1875); this flood event was used as a reference for our study. During this 9-day event, the peak discharge measured at Tonneins reached 6,040 m<sup>3</sup>/s, corresponding approximately to a 20-year flood, and the floodplains were fully inundated.

Within this context, it was decided to site an industrial area spanning nearly 1 km<sup>2</sup> in the vicinity of the left bank of the Garonne river (see gray area in **Figure 1**). This zone is protected by a dyke that is nearly 2 km long, 20 m wide and at a constant elevation of nearly 25.8 m NGF<sup>1</sup> (see gray line in **Figure 1**). However, the local topography of this zone varies between 18 and 22 m NGF, and the area is fully inundated during a flooding event characterized by a peak discharge higher than 3,500 m<sup>3</sup>/s according to the current design of the dykes (see **Figure 2**). To protect the new industrial area against flooding, a decision was made to investigate the impact of modifying the current crest of the dyke and the basement of the future industrial area.

More practically, the objective of the study is to identify the dyke elevation and the platform elevation (the slab) that ensure the industrial area remains operational under all possible watershed floods exceeding the current defense (overflow occurring at nearly 3,500 m<sup>3</sup>/s). Specifically, the water height in the vicinity of the industrial area must not exceed 0.25 m above the slab, which is the safety criterion of this study. With this aim, we made the following assumptions for the uncontrolled input parameters (the flow hydrograph) and the controlled input parameters (crest of the flood defense, basement of the industrial area), which have different roles in the RSUR algorithm workflow, as presented in **Table 1** below:

<sup>1</sup>Nivellement Général de la France (Above Ordnance Datum).

TABLE 1 | Qualified parameters of the study: time to peak discharge Tp, peak discharge rate Qp, dyke elevation Z<sup>f</sup> , slab elevation Za.


The methodology used and the numerical chain developed for the study are presented in the next sections.

# 3. METHODOLOGY FOR ROBUST DESIGN OF PROTECTIONS

The intrinsic nature of many engineering problems like flooding protection is a combination of several "canonical" problems (a word used to express a natural orthogonality), mainly within the following mathematical classes:


For instance, here we choose to describe the flood protection problem as a combination of designing protections to avoid exceeding a given water level (i.e., inversion part) when faced with the worst rain conditions for a given return period (i.e., max/optimization part). It should be noted that qualification of these aspects of the problem is somewhat arbitrary and may largely depend on the country's regulation practice and the safety objective considered.

However, regardless of the choice made, the main nature of the problem considered (the identification/inversion of a flood level) is often "tainted" by secondary issues (worst flooding conditions), and all the parameters belong to one of these canonical roles. As an example, we could also mention other common engineering problems like robust optimization (optimizing some parameters and considering others as random) or constraint optimization (optimizing some parameters, keeping others verifying an [in]equality).

This "real-world" engineering practice also brings more complexity than canonical problems and thus requires dedicated algorithms in order to be solved efficiently (Chevalier, 2013). In the following, we will focus on the "robust inversion" problem, where the objective is to restore the civil engineering safety set identified for the worst flooding conditions without any probabilistic assumption.
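Written as a set, robust inversion looks for the controlled parameters whose worst case over the uncontrolled parameters still meets the criterion. The sketch below evaluates this set by brute force on small grids. It is only an illustration of the definition: `toy_height` is a hypothetical response invented for the example (it is not the hydraulic model), the grids are arbitrary, and only the 0.25 m threshold is taken from the text.

```python
from itertools import product

THRESHOLD = 0.25  # safety criterion: water height above the slab (m)


def toy_height(dyke, slab, q_peak, t_peak):
    """Hypothetical toy response (m above the slab); NOT the hydraulic model."""
    river_level = 20.0 + 0.001 * q_peak                # rises with peak discharge
    overflow = max(0.0, river_level - dyke)            # head above the dyke crest
    ponding = 18.0 + 2.0 * overflow * (t_peak / 48.0)  # slower floods pond more
    return max(0.0, ponding - slab)


# Hypothetical grids: controlled (design) and uncontrolled (hydrograph) inputs.
dykes = [25.0 + 0.5 * i for i in range(11)]        # 25.0 ... 30.0 m
slabs = [18.0 + 0.5 * i for i in range(13)]        # 18.0 ... 24.0 m
q_peaks = [3500.0 + 500.0 * i for i in range(12)]  # 3500 ... 9000 m3/s
t_peaks = [12.0, 24.0, 36.0, 48.0]                 # hours


def worst_case(dyke, slab):
    """Largest water height over all combinations of uncontrolled inputs."""
    return max(toy_height(dyke, slab, q, t) for q, t in product(q_peaks, t_peaks))


safe_set = [(d, s) for d, s in product(dykes, slabs) if worst_case(d, s) <= THRESHOLD]
print(len(safe_set), "safe designs out of", len(dykes) * len(slabs))
```

This exhaustive evaluation needs one model run per combination of all four inputs, which is affordable only for a toy function; the surrogate-based strategy discussed in this paper aims at recovering the same set with far fewer runs of the detailed model.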

## 3.1. Bayesian Metamodeling

The fairly common practice of Bayesian metamodeling is now a standard for solving any engineering task requiring numerous CPU-expensive simulations. Indeed, since the seminal paper describing the Efficient Global Optimization (EGO) algorithm (Jones et al., 1998; Roustant et al., 2012), many improvements have been proposed, investigating the algorithms' efficiency (Picheny and Ginsbourger, 2013) or different problems to solve (Chevalier et al., 2014).

The rationale behind this approach consists in replacing most of the costly numerical simulations with an inexpensive surrogate function to investigate the properties that are relevant to our engineering purpose, like possible optimizers, excursion set or its main parameter effects. This surrogate function is designed to interpolate a few known "true" simulation points (**Figure 3**), taken as conditioning events of an initial uncertain/random function. Starting with a largely uncertain metamodel (**Figure 3**), this iterated process leads to a very precise metamodel around "true" simulated points, cleverly chosen in relation to our engineering objective (**Figure 3**):

FIGURE 3 | (Left to right, top to bottom) Identification of a safety set on a synthetic case, based on a kriging metamodel iteratively filled/conditioned to reduce set uncertainty (the dots are detailed simulations performed).

A variety of metamodels have been applied in the water resources literature (Santana-Quintero et al., 2010; Razavi et al., 2012). Moreover, some examples of applications in the context of flood management have already been published (e.g., Yazdi and Salehi Neyshabouri, 2014; Löwe et al., 2018). However, like the previously mentioned EGO algorithm, for the study presented here, we will rely on conditional Gaussian processes (also known as kriging; Roustant et al., 2012), derived from Danie Krige's pioneering work in mining, later formalized within the geostatistical framework by Matheron (1973). Our choice is mainly motivated by the non-parametric nature of this metamodel: although some properties of the considered response surface may be assumed (like continuity, differentiability, etc.), its precise shape cannot be assumed a priori. Kriging has thus become a standard metamodeling method in operational research (Santner et al., 2003; Kleijnen, 2015) and has performed robustly in previous water resource applications (Razavi et al., 2012; Villa-Vialaneix et al., 2012; Löwe et al., 2018). More practically, it is of considerable interest when investigating engineering objectives like max-minima or level sets of such random functions, as it inherits convenient properties of Gaussian processes, and the method is fast provided the dataset does not exceed a few hundred observations. The kriging model (which later provides an estimation of water height in an industrial area) is then defined for x ∈ S (x will then represent study variables like the slab and dyke height or flow hydrograph parameters) as the following Gaussian process:

$$M(\mathbf{x}) = \mathcal{N}(m(\mathbf{x}), s^2(\mathbf{x})) \tag{1}$$

where (for "simple" kriging):
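Under the usual simple-kriging assumptions (a zero-mean process with a known covariance kernel $C(\cdot,\cdot)$; this notation is assumed here for illustration), the conditional mean and variance given the $n$ observations $\{X, Y\}$ take the standard form:

$$m(\mathbf{x}) = C(\mathbf{x}, X)\, C(X, X)^{-1}\, Y$$

$$s^2(\mathbf{x}) = C(\mathbf{x}, \mathbf{x}) - C(\mathbf{x}, X)\, C(X, X)^{-1}\, C(X, \mathbf{x})$$

At a design point the variance vanishes and the mean reproduces the simulated value, which is the interpolation property referred to above.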


Compared with a commonplace deterministic interpolation method (like splines of any order), this model is much more informative, since it predicts both an expectation and an uncertainty. The fitting procedure of this model includes the choice of a covariance model [here a tensor product of the "Matern52" function (Roustant et al., 2012)]. The covariance parameters (e.g., range of covariance for each input variable, variance of the random process, nugget effect, etc.) can then be estimated using Maximum Likelihood Estimation (the standard choice we made) or Leave-One-Out minimization [known to mitigate the arbitrary covariance function choice (see Bachoc, 2013)], or even sampled within a full Bayesian framework (too far from our computational constraints at present). Once such a metamodel is specified, the following criteria will draw on such information to optimize the algorithm's iterative sampling policy.
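As an illustration of the predictor described above, the following is a minimal sketch of simple kriging with a tensor-product Matérn 5/2 covariance. It is not the implementation used in the study: the function names are hypothetical, the mean is assumed known, and the covariance parameters (`theta`, `sigma2`) are fixed by hand rather than estimated by maximum likelihood.

```python
import numpy as np

def matern52(h, theta):
    """1-D Matern 5/2 correlation for distance h and range parameter theta."""
    a = np.sqrt(5.0) * np.abs(h) / theta
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

def kernel(A, B, theta, sigma2):
    """Tensor-product Matern 5/2 covariance between A (n, d) and B (m, d)."""
    K = np.full((A.shape[0], B.shape[0]), float(sigma2))
    for j in range(A.shape[1]):
        K *= matern52(A[:, [j]] - B[:, j][None, :], theta[j])
    return K

def simple_kriging(X, y, Xnew, theta, sigma2, mean=0.0, nugget=1e-10):
    """Return the kriging mean m(x) and variance s^2(x) at the rows of Xnew."""
    K = kernel(X, X, theta, sigma2) + nugget * np.eye(len(X))
    k = kernel(X, Xnew, theta, sigma2)                 # shape (n, m)
    L = np.linalg.cholesky(K)                          # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    v = np.linalg.solve(L, k)
    m = mean + k.T @ alpha
    s2 = np.maximum(sigma2 - np.sum(v ** 2, axis=0), 0.0)
    return m, s2

# Toy data (assumed, for illustration only)
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.3, 0.9], [0.7, 0.2]])
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
theta = np.array([0.5, 0.5])
m_design, s2_design = simple_kriging(X, y, X, theta, sigma2=1.0)
m_far, s2_far = simple_kriging(X, y, np.array([[100.0, 100.0]]), theta, sigma2=1.0)
```

At the design points the sketch returns the simulated values with near-zero variance, while far from any data the prediction falls back to the prior mean and variance.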

#### 3.2. Design of (Numerical) Experiments

Once the Bayesian metamodeling framework is provided, the remaining issue is to define the sampling criterion used to fill the design of experiments: each batch of experiments will be proposed by the algorithm, then evaluated by the numerical simulator, and returned to the algorithm in order to propose the next batch (Algorithm 1). This general iterative process must be defined for each problem addressed: X input space, Y output target, and J criterion of interest (see later canonical or hybrid concrete instantiations):

```
2 Evaluate Y = f(X)
```

Improvements to this fully sequential procedure have been proposed, like asynchronization (Le Riche et al., 2012), which may reduce the servers' idle time between iterations (especially if the simulations have very different computing times). Nevertheless, it is often easier to consider synchronous batching (Ginsbourger et al., 2010) as an efficient workaround to reduce user time, so the algorithm we will actually use becomes (Algorithm 2):

Note that this last algorithm may be greatly improved and likened to the previous one, when the criterion J is computable (in closed form) for a whole batch of points (see Chevalier and Ginsbourger, 2013 for such optimization criterion), which is sadly not often the case.
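The synchronous batch loop can be sketched as follows. This is only an illustration under stated assumptions: the criterion J is replaced by a hypothetical space-filling stand-in (`maximin_distance`), its maximization is done by crude random search, and the temporary outputs are filled with a constant "lie" (the current minimum), which this stand-in criterion does not actually use. None of these names come from the original algorithms.

```python
import numpy as np

def maximin_distance(cand, X_star, Y_star):
    """Hypothetical stand-in for J: distance of each candidate to its nearest design point."""
    d = np.linalg.norm(cand[:, None, :] - X_star[None, :, :], axis=2)
    return d.min(axis=1)

def batch_sequential(f, criterion, bounds, n_init=8, batch_size=4,
                     iterations=3, n_candidates=200, seed=0):
    """Synchronous batch loop: build a batch on a temporary design, then simulate it."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_init, lo.size))      # initial design
    Y = np.array([f(x) for x in X])                      # evaluate Y = f(X)
    for _ in range(iterations):
        X_star, Y_star = X.copy(), Y.copy()              # temporary design X*, Y*
        for _ in range(batch_size):
            cand = rng.uniform(lo, hi, size=(n_candidates, lo.size))
            x_new = cand[np.argmax(criterion(cand, X_star, Y_star))]
            y_lie = Y.min()                              # placeholder output, not a simulation
            X_star = np.vstack([X_star, x_new])
            Y_star = np.append(Y_star, y_lie)
        X_new = X_star[len(X):]                          # points of X* not yet in X
        Y_new = np.array([f(x) for x in X_new])          # the costly simulations
        X, Y = np.vstack([X, X_new]), np.concatenate([Y, Y_new])
    return X, Y

# Toy "simulator" (assumed): sum of squares on the unit square
X, Y = batch_sequential(lambda x: float(np.sum(x ** 2)), maximin_distance,
                        bounds=[(0.0, 1.0), (0.0, 1.0)])
```

The point of the structure is that all points of a batch are chosen before any of them is simulated, so the simulations of one batch can run in parallel.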

#### 3.2.1. Canonical Optimization Problem

The EGO algorithm (Jones et al., 1998) is the common entry point for batch sequential kriging algorithms, as it proposes an efficient criterion (standing for criterion of interest J) called the "Expected Improvement" (EI):

$$J: \mathbf{x} \longrightarrow EI(\mathbf{x}) = E[(\min(Y) - M(\mathbf{x}))^{+}] \tag{2}$$
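For a Gaussian prediction M(x) with mean m(x) and standard deviation s(x), Equation 2 has the well-known closed form EI = (min(Y) − m)Φ(u) + sφ(u), with u = (min(Y) − m)/s, where Φ and φ are the standard normal distribution and density functions. A minimal sketch (function names are illustrative):

```python
import math

def norm_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def norm_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def expected_improvement(m, s, y_min):
    """Closed-form E[(y_min - M)^+] for M ~ N(m, s^2) (minimization)."""
    if s <= 0.0:
        return max(y_min - m, 0.0)      # no uncertainty left at this point
    u = (y_min - m) / s
    return (y_min - m) * norm_cdf(u) + s * norm_pdf(u)

ei_at_best = expected_improvement(m=0.0, s=1.0, y_min=0.0)   # prediction equal to current best
ei_known_worse = expected_improvement(m=5.0, s=0.0, y_min=1.0)
```

The criterion is large either where the predicted mean is below the current minimum (exploitation) or where the predictive uncertainty is large (exploration).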


Computing this criterion (using the previous kriging formulas) is quite simple, so the main issue relates to the optimization of this criterion, whose maximum will define the most "promising" point for the next batch. This strategy will just propose the next point, and although a "multiple expected improvement" criterion (Chevalier and Ginsbourger, 2013) allows many points to be proposed at a time, it remains practically more robust (numerically speaking) to use heuristics, which are usually preferred by practitioners: "Constant Liar," "Kriging Believer" (see Picheny and Ginsbourger, 2013).

**<sup>9</sup>** Append X<sup>∗</sup> = X<sup>∗</sup> ∪ x<sup>∗</sup><sub>new</sub> and Y<sup>∗</sup> = Y<sup>∗</sup> ∪ y<sup>∗</sup><sub>new</sub>

**<sup>12</sup>** Get X<sub>new</sub> = ∁<sub>X∗</sub>X (i.e., new points from X<sup>∗</sup> not yet in X)

**<sup>16</sup> end**

#### 3.2.2. Canonical Inversion Problem

Beyond such criteria dedicated to optimization problems, others are proposed to solve the (also) canonical problem of inversion, like Bichon's criterion (Bichon et al., 2008) which is similar to Expected Improvement, but focuses on proposing points closest to the inversion target:

$$J: \mathbf{x} \longrightarrow EE(\mathbf{x}) = E[(s(\mathbf{x}) - |T - M(\mathbf{x})|)^{+}] \tag{3}$$


Other criteria for inversion have been proposed (Ranjan et al., 2008), but all of these so-called "punctual" criteria have the same intrinsic limitation of searching for a punctual solution to the inversion problem, while the answer should lie in a non-discrete space (usually a union of subsets of S).
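Like the Expected Improvement, the pointwise criterion of Equation 3 can be evaluated in closed form when M(x) is Gaussian. The sketch below integrates (s − |T − M|)<sup>+</sup> against the normal density analytically; the function names are illustrative, and the band half-width is taken equal to s(x), as in Equation 3.

```python
import math

def norm_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def norm_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def expected_feasibility(m, s, T):
    """Closed-form E[(s - |T - M|)^+] for M ~ N(m, s^2)."""
    if s <= 0.0:
        return 0.0                       # no uncertainty: the band has zero width
    t = (T - m) / s                      # standardized target
    t_minus, t_plus = t - 1.0, t + 1.0   # standardized band limits T - s and T + s
    return ((m - T) * (2.0 * norm_cdf(t) - norm_cdf(t_minus) - norm_cdf(t_plus))
            - s * (2.0 * norm_pdf(t) - norm_pdf(t_minus) - norm_pdf(t_plus))
            + s * (norm_cdf(t_plus) - norm_cdf(t_minus)))

ee_on_target = expected_feasibility(m=0.0, s=1.0, T=0.0)
ee_off_target = expected_feasibility(m=3.0, s=1.0, T=0.0)
```

The value is highest where the predicted mean is close to the target T and the prediction is still uncertain, which is why such criteria tend to concentrate points along the estimated frontier.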

Solving the inversion problem more consistently requires defining an intermediate quantity, such as the uncertainty of the excursion set above (or below) the inversion target (Bect et al., 2012). A suitable criterion will then focus on decreasing this quantity, and the final inversion set will be identified through the metamodel instead of its sampling points. This leads to the "Stepwise Uncertainty Reduction" (SUR) family of criteria (a name also loosely applied to the inversion algorithm that follows), which does not have a closed form, unlike the previous punctual ones:

$$\begin{aligned} J: \mathbf{x} &\longrightarrow SUR(\mathbf{x})\\ &= -E\left[\int_{S} P[T < M_{n+\{\mathbf{x}\}}(\mathbf{x'})] \times P[T \ge M_{n+\{\mathbf{x}\}}(\mathbf{x'})] \, d\mathbf{x'}\right] \\ &= -E\left[\int_{S} P[T < M_{n+\{\mathbf{x}\}}(\mathbf{x'})] \times \{1 - P[T < M_{n+\{\mathbf{x}\}}(\mathbf{x'})]\} \, d\mathbf{x'}\right] \end{aligned} \tag{4}$$


At the cost of a significantly more computer-intensive task, the SUR criterion will propose more informative points and avoid the over-clustering defect often encountered with "punctual" criteria (even with the EGO algorithm whose exploration/exploitation trade-off is indeed due to heuristic tuning).
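The quantity inside the expectation of Equation 4 can be illustrated on a grid. The sketch below computes only the current uncertainty measure, i.e., the integral of p(1 − p) for a given kriging mean and standard deviation; the look-ahead expectation over the not-yet-simulated point x, which makes the SUR criterion costly, is deliberately left out. Function names and the grid discretization are assumptions made for illustration.

```python
import math

def norm_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def excursion_probability(m, s, T):
    """P[T < M(x')] for a Gaussian prediction M(x') ~ N(m, s^2)."""
    if s <= 0.0:
        return 1.0 if m > T else 0.0
    return 1.0 - norm_cdf((T - m) / s)

def set_uncertainty(means, sds, T, cell_volume):
    """Riemann-sum approximation of the integral of p(1 - p) over the grid cells."""
    total = 0.0
    for m, s in zip(means, sds):
        p = excursion_probability(m, s, T)
        total += p * (1.0 - p)
    return total * cell_volume

# Four grid cells of volume 0.5 (assumed values)
fuzzy = set_uncertainty([0.25] * 4, [1.0] * 4, T=0.25, cell_volume=0.5)
resolved = set_uncertainty([0.0, 1.0, 0.0, 1.0], [0.0] * 4, T=0.25, cell_volume=0.5)
```

The integrand is maximal (0.25) where the excursion probability is 0.5 and vanishes where the metamodel is certain, so the measure quantifies the volume of the region where the classification is still undecided.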

#### 3.2.3. Robust Inversion Hybrid Problem

A by-product of the SUR algorithm is the extensive formulation used, which allows a flexible expression of the interest expectation. Therefore, it is now possible to use more complex interest values, for instance relying on the process properties of each subspace: S<sub>c</sub> for controlled parameters and S<sub>u</sub> for uncontrolled parameters. The "Robust Inversion" will then integrate into the criterion the statistic of the marginal uncontrolled maximum (on S<sub>u</sub>) of the process (Chevalier, 2013):

$$\begin{split} J: \mathbf{x} &\longrightarrow RSUR(\mathbf{x} = \{\mathbf{x}_c, \mathbf{x}_u\}) \\ &= -E_{n}\left[\int_{S_c} P_{n+1}(\mathbf{x}_c') \times (1 - P_{n+1}(\mathbf{x}_c')) \, d\mathbf{x}_c'\right] \end{split} \tag{5}$$

**where:**

	- x<sub>c</sub> stands for the coordinate of x in the input controlled subspace S<sub>c</sub>,
	- x<sub>u</sub> stands for the coordinate of x in the input uncontrolled subspace S<sub>u</sub>,
	- M<sub>n+{x}</sub> is the kriging metamodel conditioned by the n points {X, Y} (just like "M"), **plus** the new point x,
	- P<sub>n+1</sub>(x<sub>c</sub>) = P[T < max<sub>x<sub>u</sub> ∈ S<sub>u</sub></sub> M<sub>n+{x}</sub>(x<sub>c</sub>, x<sub>u</sub>)] is the probability that the maximum of this metamodel over S<sub>u</sub> exceeds the target at x<sub>c</sub>,
	- T is the target value of inversion.

Such an expression combines non-homogeneous subspaces, here an inversion subspace (for controlled variables) S<sub>c</sub>, and an optimization subspace (for uncontrolled variables) S<sub>u</sub>. Thus, this criterion leads to a hybrid algorithm of optimization "inside" inversion. More generally, this approach may be used to establish other criteria to solve hybrid problems. For instance, just replacing the maximum over x<sub>u</sub> (used to define P<sub>n+1</sub>(x<sub>c</sub>)) by a mean or quantile on S<sub>u</sub> may be useful for solving a probabilistic inversion problem, instead of the present worst-case inversion. Such hybrid algorithms are often much closer to solving real engineering concerns than purely canonical algorithms.

At this point, it is very important to understand that a one-point marginal prediction of any kriging model is not sufficient to fully access the process behavior (and thus integrate it). Indeed, the correlation function C(., .) of the process is strongly related to the functional properties of its sample paths, which may vary greatly depending on the kernel assumption (**Figure 4**). A process statistic such as the maximum used here therefore cannot be computed from independent pointwise evaluations at x ∈ S. A simple, but more costly, approach consists in simulating sample paths of the random process instead of relying on its marginal predictive distribution.
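The simulation approach mentioned above can be sketched as follows: for a fixed controlled coordinate, joint sample paths of the process are drawn over a grid of uncontrolled values, and the probability that their maximum exceeds the target is estimated by counting. The grid, the mean vector, and the Matérn 5/2 covariance values are assumed for illustration, and `prob_max_exceeds` is a hypothetical helper, not a function of the R packages cited in this article.

```python
import numpy as np

def prob_max_exceeds(mean, cov, T, n_sim=2000, seed=0):
    """Estimate P[max_u M(u) > T] from joint Gaussian simulations over a grid of u."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov + 1e-8 * np.eye(len(mean)))   # jitter for stability
    paths = mean + rng.standard_normal((n_sim, len(mean))) @ L.T
    p_max = float(np.mean(paths.max(axis=1) > T))            # uses the joint law
    p_marginal = np.mean(paths > T, axis=0)                  # pointwise probabilities
    return p_max, p_marginal

# Conditional mean and covariance along the uncontrolled axis (assumed values)
u = np.linspace(0.0, 1.0, 11)
a = np.sqrt(5.0) * np.abs(u[:, None] - u[None, :]) / 0.3
cov = 0.04 * (1.0 + a + a ** 2 / 3.0) * np.exp(-a)           # Matern 5/2 covariance
mean = 0.2 + 0.05 * np.sin(2.0 * np.pi * u)
p_max, p_marginal = prob_max_exceeds(mean, cov, T=0.25)
```

The estimated probability for the maximum is never smaller than the largest pointwise probability, and how much larger it is depends on the correlation between grid points, which is exactly the information lost when only marginal predictions are used.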

Using this latter criterion (Equation 5) on our practical case study, we will now ask the algorithm (Algorithm 2) to sample the S space, trying to identify the safety set (dyke and slab height), irrespective of the flooding duration and discharge values.

In a self-supporting form, the batch sequential algorithm using this RSUR criterion on a TELEMAC-2D model of the Garonne river becomes:

**Algorithm 3:** Batch sequential procedure to sample the safety frontier (max<sub>t</sub>(H<sub>p1</sub>) < 0.25 m NGF) in control space {Z<sub>a</sub>, Z<sub>f</sub>}

**Result**: Safety frontier (max<sub>t</sub>(H<sub>p1</sub>) < 0.25 m NGF) in control space {Z<sub>a</sub>, Z<sub>f</sub>}

**Input** : number of iterations, batch size

**Output**: Sampling of safety frontier (max<sub>t</sub>(H<sub>p1</sub>) < 0.25 m NGF) in control space {Z<sub>a</sub>, Z<sub>f</sub>}, corresponding output values from TELEMAC-2D, kriging metamodel

**<sup>5</sup> while** not reached end of iterations **do**

**<sup>8</sup>** Maximize on x<sup>∗</sup><sub>new</sub> = {t<sub>pn</sub>, q<sub>pn</sub>, z<sub>an</sub>, z<sub>fn</sub>} the criterion RSUR<sub>M∗</sub>(x<sup>∗</sup><sub>new</sub>) = −E[∫<sub>[18,25]×[20,28]</sub> P<sub>n+1</sub>(z<sub>a</sub>, z<sub>f</sub>) × (1 − P<sub>n+1</sub>(z<sub>a</sub>, z<sub>f</sub>)) dz<sub>a</sub> dz<sub>f</sub>], where P<sub>n+1</sub>(z<sub>a</sub>, z<sub>f</sub>) = P[0.25 < max<sub>∀tp,qp</sub>(M<sup>∗</sup><sub>n+x∗new</sub>(t<sub>p</sub>, q<sub>p</sub>, z<sub>a</sub>, z<sub>f</sub>))] stands for the probability of exceeding 0.25 for any possible {t<sub>p</sub>, q<sub>p</sub>} at a given coordinate (z<sub>a</sub>, z<sub>f</sub>)

**<sup>12</sup> end**

**<sup>13</sup>** Get X<sub>new</sub> = ∁<sub>X∗</sub>X (i.e., new points from X<sup>∗</sup> not yet in X)

**<sup>14</sup>** Simulate with TELEMAC-2D H<sub>p1</sub>(X<sub>new</sub>) (one simulation for each point of X<sub>new</sub>)

**<sup>18</sup> end**

## 4. SOFTWARE AND NUMERICAL TOOLS

This work lies within a trend of applied research aimed at engineering enhancement. In practice, applying the previous algorithm (Algorithm 3) requires a heterogeneous set of hardware and software. The workbench that drives the simulations according to the algorithm, Funz (Richet, 2019) [which may also be used through its Graphical User Interface (Richet, 2011)], is intended to fill the gap between:


## 4.1. Hydrodynamic Model of the Study Area

The Institut de radioprotection et de sûreté nucléaire (Institute for Radiological Protection and Nuclear Safety, IRSN) has been involved in the "Garonne Benchmark" project instigated by EDF. The aim of the project was to obtain an uncertainty quantification with hydraulic modeling on a stretch of the Garonne river. Earlier work carried out by Besnard and Goutal (2011) and Bozzi et al. (2015) has investigated discharge and roughness uncertainty in 1D and 2D hydraulic models. In order to contribute to the project, versions of the MASCARET 1D and TELEMAC-2D models, as well as the hydraulic and hydrological data required to build these numerical models, were provided to the project's participants. In Besnard and Goutal (2011), the models' capacities to represent a major flood event were compared.

In this study, we use the 2D model TELEMAC-2D from the open TELEMAC-MASCARET system (http://www.opentelemac.org). TELEMAC-2D solves the 2D depth-averaged equations (i.e., the shallow water equations, Equation 6). Disregarding the Coriolis, wind and viscous forces and assuming a vertically hydrostatic pressure distribution and incompressible flow, the 2D depth-averaged dynamic wave equations for open-channel flows can be written in conservative and vector form as:

$$
\partial U / \partial \mathbf{t} + \nabla \cdot \mathbf{F} = \mathbf{S} \tag{6}
$$

where


The roughness is flow and sediment dependent, but for simplicity it is assumed to be constant in each of the numerical runs. In this work, turbulence is modeled using a constant eddy viscosity value.
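For reference, a common way of writing the terms of Equation 6 is given below. This is the textbook conservative form, stated here as an assumption: the symbols ($h$ the water depth, $u$ and $v$ the depth-averaged velocity components, $g$ the gravitational acceleration, $S_{0}$ the bed slopes, $S_{f}$ the friction slopes) and the Manning-type friction law are illustrative and may differ from the exact formulation used in the model.

$$U = \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}, \quad \mathbf{F} = \left( \begin{pmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{pmatrix}, \begin{pmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{pmatrix} \right), \quad \mathbf{S} = \begin{pmatrix} 0 \\ g h (S_{0x} - S_{fx}) \\ g h (S_{0y} - S_{fy}) \end{pmatrix}$$

$$S_{fx} = \frac{n^2 u \sqrt{u^2 + v^2}}{h^{4/3}}, \qquad S_{fy} = \frac{n^2 v \sqrt{u^2 + v^2}}{h^{4/3}}$$

In this form, $n$ is a Manning roughness coefficient, which corresponds to the roughness assumed constant in each numerical run.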

A two-dimensional model of the study area was constructed by EDF within the framework of the "Garonne Benchmark" (Besnard and Goutal, 2011). The model is composed of 82,116 triangular elements with edge lengths varying from 10 m (for the dyke crest, or the main channel of the Garonne River) to 300 m for the inundated areas. The floodplain topography and bathymetry are represented by the interpolation of the triangular mesh on the photogrammetry data (downstream part of the study area) and the national topographic map (upstream part). The model covers nearly 136 km<sup>2</sup> of the Garonne river basin. It is forced upstream (the "Tonneins" section) by a flow hydrograph and downstream by considering steady flow conditions (the "La Réole" section). The model calibration is reported in Besnard and Goutal (2011).

The presented model was modified slightly in this study to introduce the industrial area reported in **Figures 1**, **2**. The industrial platform was inserted into the model by elevating the topography of the corresponding mesh at a constant mean level varying from 18.0 to 25.0 m NGF (**Table 1**).

For the purpose of the study, triangular flow hydrographs were used in the upstream section of the model. At the beginning of each simulation, a steady flow corresponding to a permanent discharge of 2,100 m<sup>3</sup>/s is imposed on the model. The flow discharge is then increased linearly until the maximum water discharge (Q<sub>p</sub>) is reached at the time step corresponding to T<sub>p</sub>. Once the peak discharge Q<sub>p</sub> is reached, the discharge is decreased linearly until the permanent flow of 2,100 m<sup>3</sup>/s is reached at a time step of 2 × T<sub>p</sub>.
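The triangular hydrograph described above can be written as a small function of time. This is a sketch with illustrative names; the 2,100 m<sup>3</sup>/s base flow comes from the text, and the peak values in the usage lines are taken from the ranges studied.

```python
def triangular_hydrograph(t, q_peak, t_peak, q_base=2100.0):
    """Upstream discharge (m3/s) at time t (s): linear rise to q_peak at t_peak, linear fall until 2 * t_peak."""
    if t <= 0.0 or t >= 2.0 * t_peak:
        return q_base                                   # steady flow before and after the flood
    if t <= t_peak:
        return q_base + (q_peak - q_base) * t / t_peak  # rising limb
    return q_peak - (q_peak - q_base) * (t - t_peak) / t_peak  # falling limb

# A flood peaking at 8,000 m3/s after one hour (total duration: two hours)
q_start = triangular_hydrograph(0.0, q_peak=8000.0, t_peak=3600.0)
q_mid_rise = triangular_hydrograph(1800.0, q_peak=8000.0, t_peak=3600.0)
q_top = triangular_hydrograph(3600.0, q_peak=8000.0, t_peak=3600.0)
q_end = triangular_hydrograph(7200.0, q_peak=8000.0, t_peak=3600.0)
```

With this parameterization, the two uncontrolled variables Q<sub>p</sub> and T<sub>p</sub> fully define a flood event, its total duration being 2 × T<sub>p</sub>.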

#### 4.2. Robust Inversion Algorithm

The Robust Stepwise Uncertainty Reduction (RSUR) algorithm was implemented in a dedicated R package (Chevalier et al., 2017). It is used in the same way as other R packages dedicated to Bayesian optimization or inversion like DiceOptim (Picheny and Ginsbourger, 2013) or KrigInv (Chevalier et al., 2012). However, for convenient and simple integration (in the Funz workbench), the standardized wrapper of the MASCOT-NUM research group (Monod et al., 2019) was applied to provide a front API<sup>2</sup> (http://www.gdr-mascotnum.fr/template.html) with the following R functions (https://github.com/Funz/algorithm-RSUR):


#### 4.3. Funz Workbench

Developed by the IRSN and distributed under the BSD<sup>3</sup> license (https://github.com/Funz), Funz is a client-server engine designed to support parametric scientific simulations. A graphical user interface built on top of it for practical engineering use is also available at http://promethee.irsn.fr. Funz can be easily and quickly linked to any computer simulation code through a set of wrapping expressions (a set of regexp-like lines in an ASCII file). It uses the R programming language, which is freely available and widely used by the applied mathematics community. In addition to its use by the research community, this language is also used by many regulatory organizations. It is therefore simple to integrate algorithms developed and validated by the scientific community into Funz, which can then be applied to the previously-linked computer codes. Thus, to perform this study, Funz was used to link Telemac (plugin available at https://github.com/Funz/plugin-Telemac) and the RSUR algorithm (available at https://github.com/Funz/algorithm-RSUR).

Moreover, in order to perform a given set of computations, Funz can bring together independent servers, clusters (above their own queue manager if available), workstations, virtual servers (see **Annex** for Amazon Web Services / EC2 example), and even desktop computers running Windows, MacOS, Linux, Solaris, or other operating systems. While more focused on medium-sized performance computing (usually less than 100 concurrent connected instances), the high performance bottleneck is indeed delegated to each connected simulator instance, able to require dozens of CPUs independently (or even launch a Funz master itself).

## 5. RESULTS, ANALYSIS, AND CONCLUSIONS

#### 5.1. Overview Results of Algorithm

As previously mentioned, the RSUR algorithm is parameterized with:

• Controlled parameters: Z<sub>a</sub> ∈ [18, 25] (slab elevation), Z<sub>f</sub> ∈ [20, 28] (dyke elevation),

<sup>3</sup>Berkeley Software Distribution.

<sup>2</sup>Application Program Interface.


Among these, the computing parameters (20 batches of 8 simulations) are defined arbitrarily, considering that:


The convergence of the algorithm is measured by the remaining uncertainty on the control set volume (numerically, this is the opposite of the RSUR criterion value; see Chevalier et al., 2014 for computing details), shown in **Figure 5**. This quantifies the "fuzzy zone" where it is still unclear whether the safety limit is exceeded or not, or in other words, where the limit is exceeded with a probability which is not 0 or 1 (visually the "gray zone", while the "white zone" contains definite unsafe points, and the "black zone" definite safe points, **Figure 6**).

It should be noted that some intermediate increases of the RSUR criterion occur (e.g., between iterations 10 and 11), when the kriging metamodel changes abruptly because the latest data acquired correct the fit of some kriging parameters. Between iterations 10 and 11, the range of covariance over the Z<sub>f</sub> variable decreases (from 9.1 to 7.43), so the algorithm expects a lower regularity on the Z<sub>f</sub> dimension, which leads us to "discover" an unexpected safe zone for Z<sub>a</sub> > 24 & Z<sub>f</sub> < 23.

on the controlled Z<sub>a</sub> and Z<sub>f</sub> parameters, along some RSUR iterations (last added points as triangles, exceeding the points in red). The fairly dense sampling of the "safe" zone represents a practical safety guarantee.

The safety-controlled set S<sub>c</sub> = {Z<sub>a</sub>, Z<sub>f</sub>} is iteratively identified, with increasing accuracy along the iterations (**Figure 6**):

It should be noted that the sampling density of the three zones is intrinsically different because of the asymmetry of the algorithm concerning the property of "unsafety":


In addition to this raw result on the S<sub>c</sub> subspace, it is also useful to consider some specific coordinates in their S<sub>u</sub> projection. The response surface interpolated (by the kriging mean predictor) is shown at such points of interest, located close to the frontier where the safety criterion may be exceeded (**Figures 7**–**9**; points with max<sub>t</sub>(H<sub>p1</sub>) > 0.25 are in red):


As expected, the maximum duration and discharge turn out to be the most penalizing configuration of this study, which is confirmed by these projections. When such an assumption can be proven prior to a study, the same result could be obtained using just an SUR algorithm, reducing the optimization space to a known value: S<sub>u</sub> = {T<sub>p</sub>, Q<sub>p</sub>} = {max(T<sub>p</sub>), max(Q<sub>p</sub>)}.

#### 5.2. Validation of Results

Validation of the previous results [i.e., the accuracy of the safe/unsafe limit where max<sub>t</sub>(H<sub>p1</sub>) ≃ 0.25] should be performed considering the accuracy of the kriging metamodel obtained at the end of the algorithm iterations. Nevertheless, it is sufficient to verify that the metamodel is sufficiently accurate for max<sub>t</sub>(H<sub>p1</sub>) ≤ 0.25, as the inaccuracy of the metamodel for higher values where max<sub>t</sub>(H<sub>p1</sub>) >> 0.25 has no crippling impact on safety.

The best possible measurements of metamodel inaccuracy should be obtained using a fully-independent test basis of points, which is unattainable in most practical cases, where financial and resource constraints prevail. A cheaper alternative lies in cross-validation inaccuracy estimators, keeping in mind that this estimate is a "proxy" of the metamodel prediction error (see Bachoc, 2013). We will use the standard "Leave-One-Out" (LOO) estimator, which computes the Gaussian prediction at each design point x<sub>i</sub> ∈ X when x<sub>i</sub> is artificially removed to re-build the kriging metamodel (see Equation 7):

$$M_{-i} = M_{n - \{\mathbf{x}_i\}} = \mathcal{N}(m_{-i}(\mathbf{x}), s_{-i}^2(\mathbf{x})) \tag{7}$$

where M<sub>n−{xi}</sub> is the kriging metamodel conditioned by the n points {X, Y} (just like "M" in Equation 1), **minus** the point x<sub>i</sub> (previously belonging to X).

We compare this "blind" prediction y<sub>−i</sub> (expectation and 95% confidence interval; only somewhat blind, considering that x<sub>i</sub> was not really chosen randomly) to the true value y<sub>i</sub>, which we already know (see **Figure 10**):

We observe that:


Moreover, along the algorithm iterations the number and accuracy of the interest points increases (i.e., 95% confidence interval and bias decreasing together), so that the safety limit uncertainty decreases at the same time (see **Figure 11**):
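The leave-one-out check of Equation 7 can be sketched by brute force: each design point is removed in turn, the metamodel is rebuilt on the remaining points, and the prediction at the removed point is compared with the known value. The sketch assumes a one-dimensional input, a zero-mean simple kriging model, and fixed covariance parameters (they are not re-estimated at each removal); the names and toy data are illustrative.

```python
import numpy as np

def matern52(h, theta):
    """Matern 5/2 correlation for distance h and range parameter theta."""
    a = np.sqrt(5.0) * np.abs(h) / theta
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

def krige(X, y, x0, theta, sigma2, nugget=1e-10):
    """Zero-mean simple kriging mean and variance at a single point x0 (1-D inputs)."""
    K = sigma2 * matern52(X[:, None] - X[None, :], theta) + nugget * np.eye(len(X))
    k = sigma2 * matern52(X - x0, theta)
    w = np.linalg.solve(K, k)                       # kriging weights
    return float(w @ y), float(max(sigma2 - w @ k, 0.0))

def leave_one_out(X, y, theta, sigma2):
    """For each i: prediction m_-i(x_i), variance s_-i^2(x_i), and 95% interval coverage."""
    results = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i               # drop the i-th design point
        m_i, s2_i = krige(X[keep], y[keep], X[i], theta, sigma2)
        half_width = 1.96 * np.sqrt(s2_i)           # Gaussian 95% interval
        covered = bool(abs(y[i] - m_i) <= half_width)
        results.append((m_i, s2_i, covered))
    return results

# Toy design (assumed): centered outputs so that the zero-mean assumption is reasonable
X = np.linspace(0.0, 1.0, 8)
y = np.sin(4.0 * X)
y = y - y.mean()
loo = leave_one_out(X, y, theta=0.4, sigma2=1.0)
```

Plotting the predictions and their intervals against the true values gives the kind of diagnostic shown in Figure 10; a large share of uncovered points would indicate overconfident covariance parameters.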

#### 5.3. Engineering Analysis

From an engineering perspective, the numerical tool developed in this study (sections 3.1, 3.2, and 3.3) allows us to identify the possible design parameter combinations (Z<sub>a</sub>, Z<sub>f</sub>) suitable to protect the industrial area against all the possible flooding events identified for the Garonne river (Q<sub>p</sub>, T<sub>p</sub>) (see all hydrographs simulated in **Figure 12**):

In particular, according to the parameters reported in **Table 1**, we study the combination of the elevation of the industrial basement Z<sub>a</sub> and the current dyke height Z<sub>f</sub> for all the possible flood events that could cause flooding of the area of interest, characterized by a peak discharge varying from 2,600 to 8,000 m<sup>3</sup>/s and a total duration varying from 2 to 48 h. This choice of uncontrolled parameters for the study seems robust considering that the historic flood event for this area does not exceed 6,040 m<sup>3</sup>/s (SMEPAG, 1989).

The results presented in **Figure 6** show that the most relevant parameter for the industrial area's defense against flooding is the elevation of the industrial area Z<sub>a</sub>. Specifically, the numerical results show that:


<sup>4</sup>This will remain an hypothesis, as long as this black-box algorithm does not consider any physical knowledge about the case study. A deeper physical study might have rigorously excluded any risk for Z<sup>a</sup> > 24, for instance.

#### 5.4. Perspectives

The general methodology proposed in this study seems quite efficient for designing civil engineering for safety purposes. Although the true penalizing configuration of the worst flood event may have been assumed beforehand (which was at least confirmed by a blind metamodel), it might not be the general case for more complex models (in terms of protection degrees of freedom). It should be noted that the rising number of variables (both controlled and uncontrolled) will have a cost in terms of metamodel predictability, and therefore in terms of the number of simulations needed to achieve enough accuracy for the control set limits. Tuning of such algorithm convergence parameters remains to be done for general cases.

Moreover, the developed numerical tool could be employed in several flooding hazard applications, for instance:

FIGURE 10 | Kriging prediction of max<sub>t</sub>(H<sub>p1</sub>) vs. true values at the last iteration (19). Gray bars are issued from kriging variance, red line is the safety target, and dashed line is unbiased prediction.


controlled and uncontrolled parameters reproducing the historical data).

Lastly, the primary question that remains is the connection of such an engineering practice to the probabilistic framework often adopted in regulation practice. The most direct idea would be to weight the hydrographs by their probability of occurrence (thus inside the integral expression of the RSUR criterion), which would link the protection design to the flood return period as required.

In a probabilistic framework, results from this kind of analysis should indicate which return period the protections are associated with and even which return period is hazardous for the civil or industrial installation.

## AUTHOR CONTRIBUTIONS

YR: algorithmic inversion part and computing process. VB: Garonne model and case study definition.

## ACKNOWLEDGMENTS

The authors want to thank Nicole Goutal and Cédric Goeury (EDF R&D), who provided the data necessary for this study.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Richet and Bacchi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## ANNEX: SUPPLEMENTARY INFORMATION FOR IMPLEMENTATION

As already mentioned, this study relies on the Funz engine (Richet, 2019) which provides the computing back end to distribute calculations according to the RSUR algorithm. Considering that the reproducibility of this study is a standard deliverable, the following supplementary information details both the software and hardware implementation.

The "master" computer hosting Funz is a basic desktop computer (just running the Funz engine and RSUR algorithm), which will also remotely start the 8 EC2 instances (**Script 1**), create the SSH tunnels for protocol privacy and install Telemac on each instance.

```
# Launch 8 Funz daemons on EC2 in the background, one per loop iteration
for i in $(seq 1 8)
do
  FunzDaemon-EC2.sh -d "lib/Funz/scripts/install_OpenTelemac.sh lib/*.slf" \
     -c "bash ./scripts/install_OpenTelemac.sh" -o "$i" &
done
```
**Script 1:** Deploy 8 Funz services on EC2, including the Telemac installation.

All TELEMAC-2D calculations are then performed on the 8 servers (suited to the mesh used in this model) started on the Amazon Web Services EC2 cloud computing platform (**Screenshot 1**).

The main Funz script (**Script 2**) then starts the RSUR algorithm on the Telemac Garonne model, which is compiled (meaning study parameters are inserted in the template model files) and sent to the EC2 instances as required when the RSUR asks for a calculation point.

```
# Inputs 1,2 (Q_p, T_p) are optimized over (xopt); inputs 3,4 (Z_f, Z_a) are inverted (xinv)
Funz.sh RunDesign \
  -m Telemac \
  -if t2d_garonne_hydro.cas t2d_garonne.cli Qmax_Garonne_CMWR3 \
      princi_wall.f poi.txt loihq_Garonne \
  -iv $Q_p$=[2600,8000] $T_p$=[3600,86400] $Z_f$=[20,28] $Z_a$=[18,25] \
  -oe "Numeric:max_t(H_p1)" \
  -d RSUR -do xinv.index='3,4' xopt.index='1,2' ytarget='<0.25' \
              initBatchSize='8' initBatchBounds='true' batchSize='8' \
              iterations='20'
```
**Script 2:** The main Funz command which starts the RSUR algorithm on the Telemac Garonne river model (files t2d\_garonne\_hydro.cas t2d\_garonne.cli Qmax\_Garonne\_CMWR3 princi\_wall.f poi.txt loihq\_Garonne).

Ultimately, this comprehensive study of 20 RSUR iterations requires about 10 h availability for all computing instances, equating to a cost of less than \$100 (at the current standard rates of the main cloud computing platforms).

# Combining Clustering Methods With MPS to Estimate Structural Uncertainty for Hydrological Models

Troels Norvin Vilhelmsen<sup>1</sup> \*, Esben Auken<sup>1</sup> , Anders Vest Christiansen<sup>1</sup> , Adrian Sanchez Barfod<sup>1</sup> , Pernille Aabye Marker<sup>2</sup> and Peter Bauer-Gottwein<sup>3</sup>

<sup>1</sup> HydroGeophysics Group, Department of Geoscience, Aarhus University, Aarhus, Denmark, <sup>2</sup> Sweco, Glostrup, Denmark, <sup>3</sup> DTU Environment, Department of Environmental Engineering, Technical University of Denmark, Lyngby, Denmark

This study presents a novel expansion of the clay-fraction (CF)/resistivity clustering method aiming at developing realizations of subsurface structures based on multiple point statistics (MPS). The CF-resistivity clustering method is used to define a data-driven training image (TI) for MPS simulations. By combining this TI with uncertainty estimates obtained from correlation between the resistivity models and the unique categories in the TI, subsurface realizations are generated honoring geophysical and lithological data. The generated subsurface realizations were calibrated in a steady state groundwater model. Forecasts of well catchment zones were derived based on two wells located in areas with different levels of structural uncertainty. The catchment probability maps derived from the structural realizations were compared with the well catchment forecasted by a deterministic subsurface structure, and we are able to capture this catchment within the estimated uncertainties. We believe that this study is the first to combine MPS methods with a complete data-driven workflow going directly from lithological and geophysical data to realizations of the subsurface structures. The main benefits of this approach are that it is data driven, fast, reproducible, and transparent.

#### HIGHLIGHTS

- Fast, transparent, and data driven uncertainty estimate for subsurface structures in groundwater models.

\*Correspondence: Troels Norvin Vilhelmsen troels.norvin@geo.au.dk

Edited by: Philippe Renard, Université de Neuchâtel, Switzerland

Reviewed by: Liangping Li, South Dakota School of Mines and Technology, United States

Frederic Nguyen, University of Liège, Belgium

Ryan Martin, University of Alberta, Canada

#### Specialty section:

This article was submitted to Hydrosphere, a section of the journal Frontiers in Earth Science

Received: 16 December 2018 Accepted: 24 June 2019 Published: 10 July 2019

#### Citation:

Vilhelmsen TN, Auken E, Christiansen AV, Barfod AS, Marker PA and Bauer-Gottwein P (2019) Combining Clustering Methods With MPS to Estimate Structural Uncertainty for Hydrological Models. Front. Earth Sci. 7:181. doi: 10.3389/feart.2019.00181


Keywords: groundwater modeling, uncertainty analysis, multiple point statistics, structural uncertainty, SkyTEM, SNESIM, MODFLOW

# INTRODUCTION

Groundwater models are routinely used as tools for decision support in water resource related questions (e.g., Boronina et al., 2003; Almasri and Kaluarachchi, 2007; Mylopoulos et al., 2007; Saravanan et al., 2011; Sedki and Ouazar, 2011; Manghi et al., 2012; Enzenhoefer et al., 2014). However, despite being advocated often (e.g., Pappenberger and Beven, 2006; Tartakovsky et al., 2012), uncertainty analysis is still not common practice in practical applications of groundwater modeling (Sanchez-Vila and Fernandez-Garcia, 2016; Delottier et al., 2017). Some arguments for the lack of adoption amongst professionals include limited access to software solutions (Renard, 2007; Tartakovsky et al., 2012), insufficient teaching and training of students who can bring methods into practice (Renard, 2007; Sanchez-Vila and Fernandez-Garcia, 2016), reluctance among clients and decision makers to embrace model results presented with uncertainty (Freeze, 2004), the need for a further push for application of stochastic methods in practice (Renard, 2007), and finally limited access to regional scale, high resolution datasets. Most of the literature cited above refers to uncertainty analysis in a broad context.

In the following, we will limit our focus to the sub-domain that deals with stochastic methods to represent uncertainty on subsurface structures in hydrological models. Traditionally, when performed, these stochastic simulations have been based on 2-point statistics such as sequential indicator simulation or sequential Gaussian simulation (Deutsch and Journel, 1998). However, the simplified assumptions of the 2-point statistics will often be insufficient to describe the complexity of the subsurface that governs groundwater flow (Zinn and Harvey, 2003). An alternative to the 2-point statistics is multiple-point statistics (MPS) (Hu and Chugunova, 2008; Mariethoz and Caers, 2015). In MPS, the statistical model consists of a training image (TI). This TI describes the spatial variability and patterns expected in the subsurface. As the name implies, MPS uses multiple point information to estimate the correlation in structural patterns, and thus yields a more realistic representation of subsurface structure. A detailed review of the various MPS methods available will not be provided here; additional information can be found in e.g., Hu and Chugunova (2008) or Linde et al. (2015).

A common challenge for practical application of MPS methods is the lack of availability of a TI (Bastante et al., 2008; dell'Arciprete et al., 2012). TIs are often conceptual in nature and encapsulate the expert's knowledge of the system being analyzed (Caers, 2001). TIs can also be generated using parametric equations or simple templates (Maharaja, 2008). Using this approach, a set of TIs can easily be developed that encapsulates different conceptual interpretations. This approach was used by Hermans et al. (2015) for a local scale study of an alluvial aquifer. A limitation of this approach is that it is difficult to produce TIs that resemble the full complexity of the system, and the resulting TI will often be an oversimplified version of the field conditions. Another common approach to generating a TI is through manual interpretation of subsurface structures (Huysmans et al., 2008; Hoyer et al., 2017; Barfod et al., 2018). This approach integrates geological expert knowledge about the system, thus making it a more direct representation of it. For practical applications this approach has two limitations. First, it is a more time consuming methodology, resulting in a more expensive end product. Second, the derived TI may be highly non-stationary, therefore requiring additional auxiliary data to constrain the modeling (Strebelle, 2002). A TI can also be generated directly from the datasets collected in the field (Silva and Deutsch, 2014). Similar to manual interpretations, this approach will often result in a non-stationary TI, and the simulations must therefore be made by including additional auxiliary variables. Another limitation to this approach is that the resolution of the TI is limited by the collected data (Linde et al., 2015). Processes resulting in structural variability below this scale of resolution can therefore not be encapsulated in the TI. It is therefore important to acknowledge that model realizations cannot be produced with a higher resolution than the input dataset can provide. This methodology also requires input datasets with a sufficiently high spatial density and areal coverage, which is rarely possible to obtain by e.g., drilling.

Large spatial coverage can be obtained using airborne electromagnetic (AEM) methods. The SkyTEM system (Sørensen and Auken, 2004) is one of the available AEM methods that have been used extensively for groundwater mapping (Møller et al., 2009b; Pryet et al., 2012; Korus et al., 2017; Knight et al., 2018). By inverting the EM dataset a model of the electrical resistivity of the subsurface can be produced. In sedimentary environments where the groundwater is uninfluenced by pore water salinity, structures with high resistivity will often be the water bearing units or aquifers, and structures with low resistivity will be the aquitards or confining units. This link between subsurface hydrological units and resistivity can, however, be both spatially variable and uncertain. To reduce this uncertainty, the interpretations can be assisted using lithological logs from boreholes. This process has been formalized in the clay fraction (CF) inversion method (Christiansen et al., 2014; Foged et al., 2014). Marker et al. (2015) further extended this methodology using k-means clustering to reduce the dimensionality of the CF and resistivity data to derive a set of hydrogeological units. In the following this methodology will be named CF-resistivity clustering.

In this study, we investigate how to utilize the CF-resistivity clustering method to create a 3D TI. A TI can thereby be created specifically for the investigated area using a fast and data driven approach. Subsequently, this TI is used together with an estimate of the subsurface uncertainty and hard data from boreholes to generate multiple realizations of the subsurface structures. Structural realizations are then incorporated into a groundwater flow model and calibrated to available hydrological data. Finally, the model realizations are used to estimate the uncertainty of a well catchment zone. By presenting an approach to deal with structural uncertainty analysis, we show how geostatistical methods can be applied to problems relevant to most groundwater resource professionals.

# MATERIALS AND DATA

## Study Area

The study area used to document the proposed methodology is located north-west of the town of Aarhus, Denmark (**Figure 1**). It is of principal interest to the local authorities and the public water supply company, due to its proximity to Aarhus, and its rich groundwater resources are important for the supply of drinking water. The area has been subject to several geophysical and geological mapping campaigns (Kronborg et al., 1990; Danielsen et al., 2003; Sandersen and Jørgensen, 2003; Jørgensen and Sandersen, 2006; Møller et al., 2009a; Høyer et al., 2015). These studies have revealed a dense network of buried valley structures, which are incised into a thick sequence of Paleogene clays. In most parts of the area, these clays form the lower impermeable bed for the primary groundwater reservoirs located inside these valley structures. However, not all valleys are filled with sand and gravel (thus constituting aquifers); some are filled up with glacial deposits mainly consisting of clay till. These clay-filled valleys can act as hydraulic barriers and clay layers covering the primary aquifers. Besides the valley structures, plateaus between the valleys can hold till deposits consisting of either sand/gravel or clay deposited during the last glaciation. A more detailed description of the geological structure of the area can be found in Høyer et al. (2015).

## Geophysical and Lithological Data

Lithological information from boreholes was extracted from the Danish national well database, Jupiter (Møller et al., 2009a). Within the test site, approximately 400 boreholes were available. The majority of these boreholes are shallow (below 50 m depth), and only a small subset (∼3%) reaches depths of more than 100 m (**Figure 1**). The quality of the lithological information provided by these boreholes varied, based on their drilling method, purpose, age, etc. Therefore, to utilize these boreholes in a modeling context, their quality must be quantified. In the present study, this was done using the weighting scheme presented by He et al. (2015).

The geophysical dataset was collected using the SkyTEM304 system (Sørensen and Auken, 2004) in August 2013. The data consist of 24,600 SkyTEM soundings collected over 330 line kilometers flown with a line spacing of 100 m. Soundings were made with a spacing of 25 m along the flight lines. The geophysical dataset was processed using Aarhus Workbench following the methodology described by Auken et al. (2009). During processing, data biased by EM noise were removed. The resulting data positions after processing are shown in **Figure 1**, where each position represents one 1D EM model. The inversion was carried out as a spatially constrained inversion (SCI) (Viezzoli et al., 2008) with a 1D sharp model formulation (Vignoli et al., 2015) as implemented in the AarhusInv inversion code (Auken et al., 2015). In an SCI, a stratified geological environment is mimicked through horizontal and vertical regularization constraints between proximate 1D resistivity models. The sharp inversion setup ensures that smooth vertical and horizontal transitions in resistivity are penalized in the objective function, thus promoting more homogeneous resistivity models with fewer but sharper boundaries. To avoid over-interpreting resistivity models with low sensitivity at large depths, all models have been individually blanked at their estimated depth of investigation (DOI) calculated as described by Christiansen and Auken (2012).

## Hydrological Data and Forcing

Time series of stream base-flow estimates were determined for three sub-catchments within the regional model domain (outlined in **Figure 1C**), using the automatic filter approach presented by Arnold and Allen (1999) and Arnold et al. (1995).

Estimates of steady state hydraulic heads were obtained for a total of 506 samples within the regional model area (**Figure 1C**). 106 of these samples were collected in the 62 wells that are screened in multiple aquifers, while 336 of the wells only have one screening interval, and thus a single steady state head estimate. Within the local groundwater model area, a total of 94 observations were available. 19 of the samples were collected from the 11 wells screened at multiple depths, while 75 of the wells only have one screening interval. Uncertainty of each head estimate was evaluated, taking into account factors such as the source of xy coordinates (GPS, read from map etc.); the method used to determine the reference level of a borehole; the length and temporal variation of time series used for estimating steady state heads; and the quality of the borehole (ranging from boreholes being part of the national groundwater monitoring program to old wells used for single household water supply).

Recharge rates were estimated using input from the national groundwater resource model of Denmark (Henriksen et al., 2003; Sonnenborg et al., 2003). First, the three dominating soil types near terrain were extracted from the model (sand, clay, and humus). On top of that, four land use types were defined (urban, farmland, forest, and water). By combining these, a total of 11 unique combinations were achieved. Subsequently, information on temperature, daily potential evapotranspiration, and precipitation was extracted from the national water resource model (assumed constant over the domain), and these were mapped onto the 11 unique zones. The recharge time series was then estimated using a simple linear root zone model (Jeppesen, 2001). Since the model used in the present study is steady state, these recharge estimates were averaged to a single value for each of the 11 zones.

## Groundwater Model Setup

The groundwater model used in the study is set up in MODFLOW-USG. This version of MODFLOW was chosen such that the regional groundwater flow could be incorporated into the model setup while still having a fine numerical resolution within the local focus area. Details on the definition of the regional flow model can be found in Vilhelmsen et al. (2018), and the same model setup was used in Marker et al. (2017). The outline of the regional scale model can be seen from **Figure 1C** where it is marked with an orange line. The outline of the local scale model can be seen from **Figure 1A** where it is marked with a gray dotted line. Both the local model and the regional model have 11 model layers, and the local and the regional model have a horizontal discretization of 50 m and 100 m, respectively.

Two pumping wells have been chosen as focus for the present study. Both wells are located in existing well field areas. The locations of the wells are marked with black stars on **Figure 1**. The Kasted well is part of the Kasted well field, and the Ristrup well is located within the Ristrup well field. The two locations were chosen to represent contrasting settings. The Kasted well lies in a subarea with a more complex geological setting, where noise in the geophysical dataset resulted in areas with limited data coverage. The Ristrup well is located in a more homogeneous geological setting, where there was almost full coverage of the EM data.

# METHODS AND THEORY

## Clay Fraction-Resistivity Clustering

To obtain categorical information from lithological logs and resistivity models, we used the clay fraction (CF)-resistivity clustering methodology at borehole locations and sounding sites (Marker et al., 2017). The basic idea of the CF-resistivity clustering is to reduce the dimensionality of the input data (resistivity models and lithological logs), such that these are represented by a few (three or four) categorical variables. These categorical variables form the basis for the structural analysis of the subsurface.

The first step in this analysis is the clay-fraction inversion. Details on this methodology can be found in Christiansen et al. (2014) and Foged et al. (2014), and only a short summary will be given here. In the CF inversion, boreholes are subdivided into equal discrete vertical intervals. Within each interval, the fraction of clay is determined, such that a CF of 1.0 is obtained if the discrete interval only contains clay and a CF of 0.0 if the interval does not have clay layers. The CF estimated at the boreholes, together with the resistivity models (averaged over the same elevation intervals), constitutes the dataset used in the CF inversion. The objective of the inversion is then to estimate a two-parameter error function that translates resistivities into CF such that the optimal translation occurs between CF estimated from boreholes and CF estimated from the resistivity models. Here, the two parameters are the upper threshold, above which all resistivities are translated to 0% clay, and the lower threshold, below which all resistivities are translated to 100% clay. Since the correlation between resistivity and borehole lithology can vary spatially within the survey area, the translation function parameters are defined on a 3D grid, where each grid node holds the two-parameter translation model. To constrain the inversion, the translation functions are regularized such that smooth translation parameters are preferred.
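The two-parameter translation can be illustrated with a minimal sketch. The functional form below (a scaled complementary error function) and the constant `K` are assumptions chosen so that the two thresholds map to roughly 97.5% and 2.5% clay; the exact parameterization in Christiansen et al. (2014) may differ, and the names `clay_weight`, `m_low`, and `m_up` are hypothetical.

```python
import math

# Assumed scaling constant: erfc(1.386) is about 0.05, so the thresholds
# map to clay fractions of roughly 0.975 (lower) and 0.025 (upper).
K = 1.386

def clay_weight(rho, m_low, m_up):
    """Translate a resistivity (ohm-m) into a clay fraction between 0 and 1.

    m_low: lower threshold, where the translation approaches 100% clay.
    m_up:  upper threshold, where the translation approaches 0% clay.
    """
    arg = K * (2.0 * rho - m_up - m_low) / (m_up - m_low)
    return 0.5 * math.erfc(arg)

# Example with hypothetical thresholds of 20 and 70 ohm-m.
for rho in (10.0, 20.0, 45.0, 70.0, 100.0):
    print(rho, round(clay_weight(rho, 20.0, 70.0), 3))
```

In the inversion described above, one such pair of thresholds would be held at every node of the 3D grid and adjusted until the translated resistivities match the borehole CF.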

Once the CF inversion has been completed, each sounding site will have a set of resistivity and CF values with a predefined vertical resolution. To include the information at borehole locations, we also chose to interpolate a resistivity value using kriging from the geophysical soundings onto the boreholes. This provided us with both a resistivity and a CF at borehole locations.

For dimensionality reduction of the multivariable cloud to obtain a categorical dataset, we used K-means clustering (Wu, 2012), as implemented in Scikit-Learn in Python (Pedregosa et al., 2011). The clustering was performed on the principal components of the CF and the normalized resistivities (these were normalized to a scale of 0.0 to 1.0).
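The clustering step can be sketched with Scikit-Learn as follows. This is an illustration on synthetic data, not the authors' script: the use of log10 resistivity before normalization, the function name `cluster_cf_resistivity`, and the synthetic value ranges are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_cf_resistivity(cf, resistivity, n_clusters=3, seed=0):
    """Assign a cluster index to each (CF, resistivity) pair."""
    cf = np.asarray(cf, dtype=float)
    # Assumption: work with log10 resistivity before normalizing.
    log_rho = np.log10(np.asarray(resistivity, dtype=float))
    # Normalize to the range 0.0-1.0, as described in the text.
    rho_norm = (log_rho - log_rho.min()) / (log_rho.max() - log_rho.min())
    features = np.column_stack([cf, rho_norm])
    # Rotate onto the principal components, then cluster.
    components = PCA(n_components=2).fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(components)

# Synthetic example: clay-rich, mixed, and sand-rich intervals.
rng = np.random.default_rng(0)
cf = np.concatenate([rng.uniform(0.8, 1.0, 50),
                     rng.uniform(0.4, 0.6, 50),
                     rng.uniform(0.0, 0.2, 50)])
resistivity = np.concatenate([rng.uniform(5, 15, 50),
                              rng.uniform(30, 50, 50),
                              rng.uniform(80, 150, 50)])
labels = cluster_cf_resistivity(cf, resistivity, n_clusters=3)
```

The cluster indices returned by k-means are arbitrary, so in practice they would need to be reordered (for example by mean CF) before being interpreted as hydrogeological units.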

The outcome of the cluster analysis is a set of categorical values distributed in a three-dimensional space, which we can use to condition the generation of the subsurface realizations. The location of the categorical values is defined by the location and depth of the SkyTEM soundings and the locations and drilling depth of the boreholes. To fill the gaps between these data points, we use simple indicator kriging (IK) (Goovaerts, 1997, p. 293). Here, each cluster was treated as an indicator variable, which was each fitted with a variogram model. IK was then used to estimate the probability of each cluster distributed on a 3D grid. Based on these probabilities, an estimate of the most likely representation of the subsurface structure, given a predefined number of clusters, was created. This model constitutes the TI in the following simulations.
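Taking the most likely cluster per cell from the kriged indicator probabilities amounts to an argmax over clusters. A minimal sketch, assuming the probabilities are stored as one flat list per cluster (the layout and the name `mode_model` are hypothetical):

```python
def mode_model(cluster_probs):
    """Return the most probable cluster index for every grid cell.

    cluster_probs[k][i] is the kriged probability of cluster k in cell i.
    Ties are resolved in favor of the lowest cluster index.
    """
    n_clusters = len(cluster_probs)
    n_cells = len(cluster_probs[0])
    return [
        max(range(n_clusters), key=lambda k: cluster_probs[k][i])
        for i in range(n_cells)
    ]

# Three clusters, three cells.
probs = [
    [0.7, 0.2, 0.1],  # cluster 0
    [0.2, 0.5, 0.3],  # cluster 1
    [0.1, 0.3, 0.6],  # cluster 2
]
training_image = mode_model(probs)
```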

## Multiple-Point Geostatistics

We could have used the estimated distribution of categorical variables together with the estimated variogram models to generate realizations of the subsurface structures with sequential indicator simulation (SIS) (Deutsch and Journel, 1998). This approach was used by Marker et al. (2017) for the same study area. However, SIS is limited by the amount of structural information that can be carried in the variogram model (Journel, 2005; Journel and Zhang, 2006). As the density of conditioning data increases, the dependency of the simulation result on the variogram model is reduced. However, to approximate the required data coverage the geophysical data had to be treated as hard data in the simulation. This constitutes a problem, since the probability of a given cluster will be dependent on the CF and the resistivity value of the pertaining data point. By treating the geophysical data as soft data, this information can be incorporated into the analysis, but it reduces the amount of hard data for the SIS, thus rendering it inapplicable.

To alleviate this problem, we have employed the MPS method single normal equation simulator (SNESIM) (Strebelle, 2002) in the present study. The higher order statistics incorporated in MPS methods allow them to capture and reproduce more complicated structures compared to variogram-based methods. Structural realizations will also create more coherent geological units, often making a better representation of actual field conditions. In SNESIM, the conditional information needed for the geostatistical simulation is obtained by scanning the TI. This scanning is performed once, and the information is stored in a search tree structure. SNESIM is a sequential approach and it will visit each simulation node in the grid based on a random path. To incorporate hard data in the simulation, the data categories are assigned to the most proximate grid node prior to simulation. Soft data are incorporated into the analysis by updating the TI-based probabilities using the soft data probabilities. This update is done using the tau model (Journel, 2002). Conditional soft data must therefore be transformed into probabilities that can be used for conditioning. In the present study, this is done based on the conditional probabilities derived from cluster histograms. Alternatively, this information could be obtained from other statistical correlations between geophysics and lithology (Hermans et al., 2015; Barfod et al., 2016; Christensen et al., 2017b).
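The tau model update can be written compactly for a single category. The sketch below follows the general form given by Journel (2002), in which each probability is converted to a distance (1 − p)/p; the function and argument names are hypothetical, the default exponents of 1.0 are an assumption rather than the settings used in this study, and all inputs must lie strictly between 0 and 1.

```python
def tau_model(prior, p_ti, p_soft, tau_ti=1.0, tau_soft=1.0):
    """Combine P(A | TI) and P(A | soft data) into P(A | TI, soft data).

    prior:  marginal probability of category A.
    p_ti:   probability of A inferred from the training image.
    p_soft: probability of A inferred from the soft (geophysical) data.
    """
    def distance(p):
        # Probability-to-distance transform used by the tau model.
        return (1.0 - p) / p

    a = distance(prior)
    b = distance(p_ti)
    c = distance(p_soft)
    x = a * (b / a) ** tau_ti * (c / a) ** tau_soft
    return 1.0 / (1.0 + x)

# When the TI adds nothing beyond the prior, the soft data decide.
updated = tau_model(prior=0.3, p_ti=0.3, p_soft=0.7)
# When both sources agree on 0.8 against a prior of 0.5, they reinforce.
reinforced = tau_model(prior=0.5, p_ti=0.8, p_soft=0.8)
```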

SNESIM is a complex algorithm with many adjustable parameters. Fine tuning of the algorithm is therefore not always straightforward. In the present study, we derived our settings using a combination of the guidelines defined by Liu (2006) and He et al. (2014), and looping over parameter combinations using the Python interface implemented in SGeMS (Remy et al., 2009). The parameters used in the present study are listed in **Table 1**. These parameters were defined based on a compromise between simulation results and computational burden. Only the parameters that deviate from default values (Remy et al., 2009) are listed.

In the present study we used vertical proportions and soft conditioning to define the target distributions. Both were defined based on the outcome of the cluster analysis. The entire workflow for generating structural realizations from MPS based on CF-resistivity clustering is outlined in **Figure 2**.

## Groundwater Model, Particle Tracking and Inversion

The groundwater model was defined in MODFLOW-USG (Panday et al., 2015). MODFLOW-USG is a control-volume finite difference version of the well-known USGS groundwater flow code MODFLOW (Harbaugh et al., 2000; Harbaugh, 2005). Recharge was incorporated using the recharge package. Major stream segments were simulated using the River package. Trenches and minor stream segments were simulated using the drain package. Groundwater abstraction wells were simulated using the well package, and outer boundary conditions on the regional model edge were defined as either no-flow or constant head boundaries. This was determined based on the local hydrological conditions.

All model realizations, generated using SNESIM, were incorporated into the groundwater flow model and calibrated using PEST (Doherty, 2016). This was done by introducing the local groundwater model in a regional model used to simulate the potential interactions with the regional flow system. The parameters included in the inversion were the horizontal hydraulic conductivities of the determined clusters (except for the largest cluster, which was fixed based on sensitivity analysis in both the three- and the four-cluster model). Vertical hydraulic conductivities were tied to their horizontal counterparts at a ratio of 1:10. The two other parameter groups were the conductance terms of the drains and the streams within the local area, and the recharge multiplier. The objective function was defined as

$$\phi = \sum_{i=1}^{N_h} \left( \frac{h_{sim,i} - h_{obs,i}}{\sigma_{h,i}} \right)^2 + \sum_{i=1}^{N_q} \left( \frac{q_{sim,i} - q_{obs,i}}{\sigma_{q,i}} \right)^2 \tag{1}$$

where $N_h$ is the number of steady state head observations, $h_{sim}$ are the model simulated hydraulic heads, and $h_{obs}$ are their observed counterparts. The residuals on the heads were weighted with the inverse of the observation standard deviation, $\sigma_h$. Similarly, $N_q$, $q_{sim}$, $q_{obs}$, and $\sigma_q$ are the number of observations, the simulated values, the observed values, and the uncertainty of the stream flow data. For the head data we also introduced an arbitrary weight ($w_i$) on the hydraulic heads. This was done to reduce the contribution of individual observations within closely spaced groups. All inversions were well posed, and we did not need to impose any regularization constraints.
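Equation 1 is a weighted sum of squared residuals and can be evaluated directly. The following sketch uses hypothetical observation values; the additional head weight mentioned in the text is left out because Equation 1 does not show how it enters.

```python
def objective(h_sim, h_obs, sigma_h, q_sim, q_obs, sigma_q):
    """Evaluate Equation 1: weighted squared residuals of heads and flows."""
    head_term = sum(
        ((sim - obs) / sd) ** 2
        for sim, obs, sd in zip(h_sim, h_obs, sigma_h)
    )
    flow_term = sum(
        ((sim - obs) / sd) ** 2
        for sim, obs, sd in zip(q_sim, q_obs, sigma_q)
    )
    return head_term + flow_term

# Two head observations (m) and one base-flow observation (m3/s).
phi = objective(
    h_sim=[10.0, 12.0], h_obs=[9.0, 12.5], sigma_h=[0.5, 0.5],
    q_sim=[0.2], q_obs=[0.25], sigma_q=[0.05],
)
# Residuals of 2, 1, and 1 standard deviations give phi = 4 + 1 + 1.
```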

To determine the catchment zones for the two pumping well scenarios we used mod-PATH3DU (Muffels et al., 2014).

TABLE 1 | Parameters used in the SNESIM simulations. Parameters are only defined when they deviate from the default values.



FIGURE 2 | Outline for generating structural realizations based on MPS.

Catchment zones were determined using backward tracking with 16,000 particles from the pumping well to terrain. This number of particles was the minimum required to make the catchment independent of the particle starting locations. Particle starting locations were randomly distributed within a cylinder. The height of this cylinder was equal to the pumping-well screening interval, and the diameter of the cylinder was determined such that the amount of recharge occurring within the well catchment zone balanced the amount of groundwater abstracted by the well. This estimation was done by trial and error based on the calibrated mode model. The catchment was discretized into 50 m by 50 m cells, and a cell was determined to be part of the catchment if at least one particle terminated in the cell. This tracking was performed for each of the stochastically generated subsurface structures. Once completed, the catchment probability maps were generated by averaging over all realizations.
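The catchment probability map follows from two steps: flag each cell that receives at least one particle endpoint in a realization, then average the flags over all realizations. A minimal sketch with hypothetical endpoint coordinates and a small grid (the function names and the grid origin are assumptions):

```python
def catchment_mask(endpoints, x0, y0, cell_size, n_cols, n_rows):
    """Flag cells containing at least one particle endpoint (1) or none (0)."""
    mask = [[0] * n_cols for _ in range(n_rows)]
    for x, y in endpoints:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            mask[row][col] = 1
    return mask

def catchment_probability(realizations, x0, y0, cell_size, n_cols, n_rows):
    """Average the binary catchment masks over all realizations."""
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for endpoints in realizations:
        mask = catchment_mask(endpoints, x0, y0, cell_size, n_cols, n_rows)
        for r in range(n_rows):
            for c in range(n_cols):
                total[r][c] += mask[r][c]
    n = len(realizations)
    return [[value / n for value in row] for row in total]

# Two realizations on a 3 x 2 grid of 50 m cells with origin (0, 0).
realizations = [
    [(10.0, 10.0), (20.0, 30.0), (60.0, 10.0)],
    [(10.0, 10.0), (120.0, 70.0)],
]
prob_map = catchment_probability(realizations, 0.0, 0.0, 50.0, 3, 2)
```

A cell hit in every realization gets probability 1.0, and a cell hit in only one of the two realizations gets 0.5.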

# RESULTS

**Figure 3** shows the results of the k-means cluster analysis. The clustering was performed for three and four clusters, chosen based on two criteria. The three-cluster model was chosen since it is most comparable with the deterministic model available for the area, and the four-cluster model was chosen since it was deemed optimal in Marker et al. (2017), based on an optimal data fit criterion. The number of units is also limited by the resolution of the input variables, in this case resistivity and CF values. The subdivision of clusters in the clouds spanned by the resistivities and the CF can be seen in the bottom subplots (E and F). Each point in these clouds thus represents a single pair of CF and resistivity values. These clusters are then mapped in histograms for the resistivities (subplots A and B). These histograms can also be used to estimate the conditional probabilities based on the kernel density functions (subplots C and D).
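The idea of reading conditional cluster probabilities off the resistivity histograms can be sketched as below. Note the simplification: raw bin counts are used here in place of the kernel density functions mentioned above, and the bin edges, data values, and function name are hypothetical.

```python
def conditional_cluster_probabilities(values, labels, bin_edges, n_clusters):
    """Estimate P(cluster | resistivity bin) from per-cluster bin counts."""
    n_bins = len(bin_edges) - 1
    counts = [[0] * n_clusters for _ in range(n_bins)]
    for value, label in zip(values, labels):
        for b in range(n_bins):
            if bin_edges[b] <= value < bin_edges[b + 1]:
                counts[b][label] += 1
                break
    probabilities = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: an empty bin carries no information.
            probabilities.append([1.0 / n_clusters] * n_clusters)
        else:
            probabilities.append([count / total for count in row])
    return probabilities

# Resistivities (ohm-m) with their cluster labels; cluster 0 is the sandiest.
resistivities = [12, 15, 18, 35, 38, 45, 70, 90]
clusters = [2, 2, 1, 1, 1, 0, 0, 0]
cond = conditional_cluster_probabilities(
    resistivities, clusters, bin_edges=[10, 20, 40, 100], n_clusters=3
)
```

In the lowest bin, two of the three samples belong to cluster 2, so the conditional probability of cluster 2 given a resistivity in that bin is 2/3.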

Each point in the data clouds represents a specific discrete interval in a resistivity sounding or a borehole. This means that the clusters can be mapped back as a point distribution in three dimensions. The fitted variogram models used for IK can be seen from **Figure 4**. Very importantly, the model obtained with IK (the mode model) is used as a TI in the MPS simulations. Cross sections through the mode model are shown in **Figure 5**. From the three-dimensional distribution of clusters, we can also calculate the vertical fractions of each cluster with depth. The result of this calculation can be seen from **Figure 5**. This information is used as additional conditional information in the SNESIM algorithm.
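The vertical fractions are the share of each cluster within each depth layer of the three-dimensional cluster distribution. A minimal sketch, assuming each layer is stored as a flat list of cluster indices (the layout and function name are hypothetical):

```python
def vertical_proportions(layers, n_clusters):
    """Return the fraction of each cluster in every layer.

    layers[j] is a flat list of cluster indices for depth layer j.
    """
    return [
        [layer.count(k) / len(layer) for k in range(n_clusters)]
        for layer in layers
    ]

# Two layers with four cells each.
layers = [
    [0, 0, 1, 2],
    [1, 1, 2, 2],
]
proportions = vertical_proportions(layers, n_clusters=3)
```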

**Figure 5** contains the information needed for the MPS simulations for the three-cluster model. A similar plot can be made for the four-cluster model, but this was considered redundant in the present case. The figure shows cross sections through the TI in the top left, and the vertical fractions in the top right. In the bottom, probability maps for each cluster are shown. These probability maps are determined from the conditional probabilities estimated using the histograms shown in **Figures 3A,B**. For each data point in space, we calculated the conditional probability of a given cluster based on the histograms. Subsequently, a variogram model was fitted to the probabilities for each cluster, and the probabilities were interpolated in three dimensions to fill out the model grid. Finally, the probabilities over all clusters were scaled such that they summed to one within each cell in the model grid. These probability grids are the ones shown in the bottom of **Figure 5**. The same methodology was used to estimate the probabilities in the four-cluster model. A similar methodology to estimate the probabilities was used by Hermans et al. (2015). The only conditional data not presented in **Figure 5** are the hard data used in the analysis. Hard data were only used for screen intervals of pumping wells in the groundwater model. In the MPS simulations, these intervals were conditioned to be part of cluster 0, namely the cluster with the highest sand fraction. This is a reasonable assumption, since most of these wells have been active for several years and must therefore be screened in sandy deposits.
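The final scaling step, applied per grid cell, can be sketched as follows. Clipping interpolated values to the range 0 to 1 and falling back to a uniform distribution when all values are zero are assumptions added for robustness; they are not described in the text.

```python
def rescale_to_unit_sum(cell_probs):
    """Rescale the interpolated cluster probabilities of one cell to sum to one."""
    # Assumption: clip interpolation overshoot before rescaling.
    clipped = [min(max(p, 0.0), 1.0) for p in cell_probs]
    total = sum(clipped)
    if total == 0.0:
        # Assumption: no information in this cell, use a uniform distribution.
        return [1.0 / len(clipped)] * len(clipped)
    return [p / total for p in clipped]

# Independently interpolated probabilities rarely sum to exactly one.
scaled = rescale_to_unit_sum([0.6, 0.3, 0.3])
```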

Besides using the estimated probabilities for conditioning the MPS simulations, they can also be used to illustrate the uncertainty of the underlying structures. Such plots have been generated in **Figures 6**, **7**. **Figure 6** shows a cross section through the Kasted well field (the location of the cross section can be seen from **Figure 1**). **Figure 6A** shows the cross section through the three-cluster model, **Figure 6C** shows the cross section through the four-cluster model, and **Figure 6D** shows the cross section through the deterministic model structure. Each of these models has then been overlaid with a transparency filter based on the uncertainties determined using the approach described above. Here transparent parts indicate high uncertainty. For the deterministic model we used the uncertainties derived from the three-cluster model. Similar plots can be seen for the cross section through the Ristrup well field in **Figure 7**. Based on the cross sections, some interesting aspects can be observed. First, many of the dissimilarities between the deterministic and the auto generated models are smeared out. Second, the plots conform with the background information from the area, namely that the heterogeneity and the uncertainty in the structure appear larger in the Kasted well field compared to the Ristrup well field. In making this comparison, it should of course be taken into account that the Kasted profile is three times as long as the Ristrup profile.

**Figure 8** shows a horizontal cross-section through the same depth interval of the deterministic model, the TI, three of the 100 model realizations generated using SNESIM, and three model realizations generated using SISIM.

The model realizations generated with SNESIM resemble both the manual interpretation and the TI much more closely. In contrast, the realizations generated with SISIM show patterns with limited correlation to any of the other models presented in the figure. It should also be noted that the three realizations generated with SISIM are quite similar. This is caused by using the geophysical data as hard data in the simulations.

The model realizations generated with SNESIM were all incorporated into the groundwater model and calibrated to hydraulic heads and stream flows. The result of this calibration with respect to simulated and observed hydraulic heads can be seen from **Figure 9**. Comparing the three plots shows that the performance of the models is similar. There is a slight scatter about the identity line, but it appears similar for the three setups, both with respect to distance and outliers. Taking the structural uncertainty into account, most simulations estimated using the deterministic modeling approach fall within the simulated uncertainties of the structural realizations.

Turning to the estimated parameter values, the log-transformed hydraulic conductivity (with conductivity in m/s) for the three-cluster model is −4.97 for cluster 0 with a standard deviation of 0.14 and −5.21 for cluster 1 with a standard deviation of 0.27. Cluster 2 was fixed in the inversion, due to the very low hydraulic conductivity of this unit. The value for cluster 1 may be a bit high compared to the expected value, but it is not unreasonable, taking the uncertainty into account. It is also in line with expectations that the uncertainty of the more clay dominated cluster 1 is larger than that of the sand dominated cluster 0. For the four-cluster model the parameters are (uncertainty in brackets) −5.00 (0.14) for cluster 0, −5.02 (0.75) for cluster 1, and −5.82 (0.59) for cluster 2. Cluster 3 was fixed for the same reasons as outlined above. These parameter values are also in line with expectations, except maybe for cluster 1, which is a bit high, but its pertaining uncertainty is also high due to its limited presence in the model. This can also be seen from the cross sections on **Figures 6C**, **7C**. The data fits obtained in this study are very similar to those obtained by Marker et al. (2017).
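For orientation, the three-cluster estimates can be converted back to hydraulic conductivities. This assumes the log transform is base 10, which is the convention in PEST; the text itself says only "log transformed".

$$K_0 = 10^{-4.97} \approx 1.1 \times 10^{-5}\ \text{m/s}, \qquad K_1 = 10^{-5.21} \approx 6.2 \times 10^{-6}\ \text{m/s}$$

Under the same assumption, a standard deviation of 0.14 log units corresponds to a multiplicative factor of about $10^{0.14} \approx 1.4$ on $K_0$, and 0.27 log units to a factor of about $10^{0.27} \approx 1.9$ on $K_1$.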

the cluster probabilities derived from the conditional probabilities estimated based on the histograms in Figure 3A.

All the calibrated models were subsequently used to forecast the location of the catchment zones for the wells at the two different well fields. The result for the Kasted well using the three-cluster models can be seen in **Figure 10**. The figure has been subdivided into two plots, one containing the estimated probability using the structural realizations (subplot A), and one containing the catchment probability overlaid with the particle endpoints obtained from the deterministic structure (subplot B). Based on the model realizations, the majority of the deterministic catchment is contained within the uncertainty of the stochastic simulations. The only difference between the two is in the tail stretching toward the west, where the tail of the stochastic realizations is offset toward the north. Subplot B also shows the outline of the catchment estimated by Marker et al. (2017). The general trends in the estimated catchments using the SISIM and the SNESIM approach are similar. However, the SNESIM approach is closer to the catchment of the model based on the deterministic structure.
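A catchment probability map of the kind discussed here can be understood as the fraction of structural realizations in which a given cell belongs to the well catchment. The sketch below illustrates this idea only; the array layout, the function names, and the random stand-in data are assumptions and do not reproduce the particle-tracking workflow used in the study.

```python
import numpy as np

def catchment_probability(catchment_masks):
    """Fraction of realizations in which each cell belongs to the catchment.

    catchment_masks: boolean array of shape (n_realizations, ny, nx).
    """
    masks = np.asarray(catchment_masks, dtype=bool)
    return masks.mean(axis=0)

def fraction_contained(deterministic_mask, probability, threshold=0.0):
    """Share of deterministic catchment cells whose probability exceeds threshold."""
    det = np.asarray(deterministic_mask, dtype=bool)
    return float((probability[det] > threshold).mean())

rng = np.random.default_rng(0)
# Hypothetical stand-in for catchments delineated from 50 structural realizations
masks = rng.random((50, 20, 30)) < 0.3
prob = catchment_probability(masks)
# Hypothetical stand-in for the catchment of the deterministic structure
det_mask = rng.random((20, 30)) < 0.3
print(prob.shape, fraction_contained(det_mask, prob))
```

The second function gives a simple number for the statement that the deterministic catchment is "contained within" the stochastic results, although the threshold used for such a comparison is a modeling choice.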

The catchment probability for the Kasted well estimated using the four-cluster model is presented in **Figure 11**. Only small differences can be observed between the catchments estimated with the three- and four-cluster models. These are all observed in the low-probability region, and almost the entire catchment estimated by the deterministic model structure is contained by the stochastic models.

For the Ristrup well, we have only presented the well catchment using the four-cluster model. These results can be seen in **Figure 12**. Here, the solution is dominated by the effect of the better resolved geology. This is caused by full coverage with geophysics and higher resistivity contrasts. Combined, this results in an overall decrease in uncertainty, which can be observed by comparing **Figures 7B,D** with **Figures 6B,D**. In the plot this is also apparent from the estimated catchment probabilities, especially when these are compared to the catchments estimated using the deterministic structure. These are much better aligned than for the Kasted well (**Figure 11**).

# DISCUSSION

and minus 5 m offsets from the identity line.

In this study we have documented a methodology to automatically generate subsurface structures directly from geophysical data and borehole descriptions. The methodology is fast and reproducible, meaning that realistic subsurface structures for hydrological models can be generated in a few days, once the input data have been prepared. For the geophysical data this includes processing and inversion, and for the lithological data this includes quality control. In the study, we have extended the CF-resistivity clustering method (Foged et al., 2014; Marker et al., 2015, 2017) to include MPS. This has a major benefit with respect to reproducing the structural patterns observed in the TI, which cannot be resolved using the two-point statistics contained in a variogram model. In the present study this is evident when comparing structural realizations generated using SISIM and SNESIM (**Figure 8**). Here, the SISIM realizations have limited resemblance to both the TI and the manual interpretations of the subsurface structures. Moreover, we argue that the similarities between the SIS-generated models are much larger than those between the models generated with MPS. This is an artifact of applying the geophysical data as hard constraints, and of applying two-point statistics that cannot preserve the expected connectivity of the subsurface structures. To avoid over-conditioning the MPS simulations, we have updated the framework such that CF estimates and resistivity data are not considered as hard data in the analysis. This is an important contribution, since the uncertainty of the cluster assignment is highly dependent on the estimated resistivity. This can easily be seen from the conditional probabilities in **Figure 3B**. A resistivity value of approximately 40 Ωm would result in cluster 1, but the probability that this is actually cluster 0 is approximately 30%. In the present study we have utilized the SNESIM algorithm, but the proposed methodology could be used with other MPS methods such as direct sampling (Mariethoz et al., 2010) or image quilting (Hoffimann et al., 2017).
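The role of the conditional probabilities can be illustrated with a small sketch that estimates the probability of each cluster given a resistivity value from per-cluster histograms on shared bins. This is a generic histogram-based estimate under our own assumptions (bin edges, synthetic lognormal samples, and the function name are hypothetical); it is not the exact procedure used to produce Figure 3.

```python
import numpy as np

def conditional_cluster_probability(resistivity, cluster, bin_edges, n_clusters):
    """Estimate P(cluster | resistivity bin) from per-cluster histograms on shared bins."""
    counts = np.stack([
        np.histogram(resistivity[cluster == c], bins=bin_edges)[0]
        for c in range(n_clusters)
    ]).astype(float)                          # shape (n_clusters, n_bins)
    totals = counts.sum(axis=0)
    safe_totals = np.where(totals > 0, totals, 1.0)
    # Bins without any samples fall back to a uniform probability
    return np.where(totals > 0, counts / safe_totals, 1.0 / n_clusters)

rng = np.random.default_rng(0)
# Hypothetical samples: cluster 0 (sand-dominated, resistive), cluster 1 (clay-dominated)
res = np.concatenate([rng.lognormal(np.log(80.0), 0.5, 400),
                      rng.lognormal(np.log(25.0), 0.5, 600)])
lab = np.concatenate([np.zeros(400, dtype=int), np.ones(600, dtype=int)])
edges = np.logspace(0, 3, 16)                 # 1 to 1000 ohm-m, log-spaced bins
p = conditional_cluster_probability(res, lab, edges, n_clusters=2)

i = np.searchsorted(edges, 40.0, side="right") - 1   # bin containing 40 ohm-m
print(f"P(cluster 0 | ~40 ohm-m) = {p[0, i]:.2f}, P(cluster 1 | ~40 ohm-m) = {p[1, i]:.2f}")
```

Using such probabilities as soft data, rather than assigning the most likely cluster as hard data, is what allows an ambiguous resistivity value to contribute uncertainty to the structural realizations.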

FIGURE 10 | Probability plots of well catchment zones for the Kasted well estimated from the three cluster model realizations. (A) Catchment probability (B) Probability map overlaid with catchment from deterministic model and outline from Marker et al. (2017).

One of the challenges faced when including geophysical data in automated structural modeling is that the smoothness constraints often imposed as regularization in the geophysical inversion lead to smooth transitions between different subsurface units. Most often, these transitions are not directly related to changes in lithology, since the smoothing effect caused by the regularization results in a smearing of layer boundaries (Jørgensen et al., 2013). To avoid this unwanted effect, we have adopted the sharp inversion formulation in the present study (Vignoli et al., 2015). This regularization imposes a penalty on gradients in resistivity, thus producing fewer, but sharper, transitions between layers in the geophysical model. This methodology has already proven advantageous in groundwater modeling studies (Christensen et al., 2017a) and structural interpretations (Barfod et al., 2018). Despite having sharp layer boundaries in individual 1D resistivity models, these have to be interpolated if a full 3D representation of the subsurface resistivity is required. Traditionally this is done using kriging, thus resulting in smooth transitions in between the sharp resistivity models. Similar to the limitations outlined above, these smooth transitions are not representations of the subsurface lithology, but an effect of the interpolation methodology (Jørgensen et al., 2013; Høyer et al., 2015). Utilizing such resistivity grids in automated structural modeling will unavoidably lead to unwanted artifacts that cannot be related to structures. Such effects can be seen in Foged et al. (2014), Marker et al. (2015), and Jørgensen et al. (2013). Similar to Marker et al. (2017), we therefore chose to manage the geophysical data interpretations at the sounding sites instead of at the interpolated grids. This was done by transforming resistivity values into clusters directly at the resistivity and borehole data locations. Using this methodology, we have limited the smoothing effects to those already present in the resistivity models, while at the same time acknowledging the uncertainty of the underlying structure at the sounding sites.

Based on the results, we argue that the use of MPS methods compared to SIS as applied by Marker et al. (2017) has several advantages. First, we have not treated the geophysical data as hard data in the simulation. Second, we can easily incorporate supplementary information into the framework, such as the vertical proportions or other auxiliary variables. Third, we are not limited to the information that can be carried in the variogram model. The latter is particularly important in areas where the data density is low. In a more general context, similar methodologies could be applied using SIS (Hadavand and Deutsch, 2017); however, this methodology would still be limited to the information content that can be carried in the variogram model.

Compared to previous studies employing CF-resistivity clustering, the proposed methodology is less limited by the ability of the EM methods to resolve the major lithological units. Limited resolution could occur in areas where clay units have high resistivities or where salinity reduces the resistivity of aquifers. In such areas this would result in non-uniqueness, visualized as overlapping probability density functions, thereby increasing the conditional uncertainties. The result would be a relative increase in the uncertainty of the derived structural realizations. In studies where the majority of the structural modeling is performed based on EM data, this could potentially be an important supplement to the derived model or models. To make the modeler aware of such potential issues, it is recommended to perform an analysis of the EM resolution capabilities. This could be done using probability density functions such as the ones shown in this study, by utilizing a resistivity atlas (Barfod et al., 2016), or based on background geological knowledge of the area. The proposed method could also be restricted to a subdomain of the field site where the assumption is valid, and other modeling strategies could be utilized elsewhere, similar to Høyer et al. (2016) or He et al. (2015).

The suggested methodology differs from most applied approaches to MPS modeling, where the TI is derived from expert knowledge (Linde et al., 2015). This study is not meant to neglect such background or process-specific information. We do, however, argue that the strength of the methodology lies in the short time it takes to get from data to model, and it does not exclude incorporation of expert knowledge into the TI in cases where this is deemed necessary to obtain a satisfactory modeling result.

In the present study, we limited our analysis to three and four categorical units. This is a simplification of the true complexity in large-scale aquifer systems. However, it is still common practice to parameterize subsurface structures in groundwater models with only a few distinct units. In the present study we set this limit based on two criteria, namely the comparison to the deterministic structure and the comparison to the study by Marker et al. (2017). However, the optimal number of clusters is also limited by the input data. Here these consist of clay fractions, which to a large extent are a binary input: 13% of the clay fractions have values below 0.05 and 73% have values above 0.95. The geophysical data applied in the present case have the highest sensitivity to subsurface units with high electrical conductance. In this area these will mainly be Paleogene clays and clay tills. The method will also be able to distinguish between sand and clays, but due to the limited sensitivity to units with low electrical conductivity, it is more difficult to separate, e.g., units of sand and gravel, which both have low electrical conductance at the present field site.

Non-stationarity in the TI is a challenge normally faced when performing MPS. In the present case we accounted for non-stationarity by conditioning structural simulations to a probability distribution of the different units. In some areas, this may not be sufficient to address non-stationarity. Under such circumstances the proposed methodology may not be applicable, and other methodologies may have to be applied. Alternatively, if the area can be subdivided into more stationary subparts, a TI can be estimated for each subdomain. These subareas can then either be managed individually using the proposed methodology, or they can be simulated using multiple TIs. Such capabilities are not available in the implementation of SNESIM applied here, but they are available in the Direct Sampling algorithm (Mariethoz et al., 2010).

Finally, the present study is limited to an analysis of a method to quantify the contribution from structural uncertainty to the forecast of a well catchment zone. Several other sources of uncertainty exist in groundwater models, including hydraulic conductivities, heterogeneity within the distinct units, boundary conditions, forcing data, etc. We did not include these sources, as we wanted to single out the contribution from the uncertain structures. The proposed methodology could, however, easily be combined with these other contributions to obtain a more complete estimate of the forecast uncertainty.

#### CONCLUSION AND PERSPECTIVES

This study documents a methodology in which the recently developed CF-resistivity clustering method is combined with MPS to estimate the contribution from large-scale subsurface structural uncertainty. The derived structures were used to estimate the contribution to groundwater model forecast uncertainty for two different well catchment zones: one screened in a highly resolved and simpler geological setting and one in a noisy EM environment with a more complex geology.

Compared to the previous studies using the CF-resistivity clustering method, the present combination with MPS methods provides an improved framework for handling uncertainties, and it addresses the limitations of two-point variogram models.

Similar to previous uses, the main advantage of the methodology lies in the fact that it is fast, data driven, reproducible, and transparent. We therefore believe that the potential for the method is large with respect to model screening and analysis, either as a direct input to groundwater flow models, or as a support tool for structural modeling tasks.

The methodology is limited by the direct resolution capabilities of the EM methods. Effectively this restricts the resolution to a few distinct units. In the present study, this resulted in modeling of three and four categorical variables. We found limited differences in the performance of the respective groups of models, both with respect to their ability to fit the hydraulic data and their forecast uncertainty. It is therefore not expected that the proposed methodology can provide higher meaningful resolutions than this, except if the model domain is subdivided into separate regimes, which have distinctly different geophysical responses or hydrogeological structures. To introduce more units or variability in the models, this must be done in subsequent steps, e.g., by using MPS methods to simulate small-scale heterogeneity in the large-scale structures (Strebelle, 2002) or using pilot points in the groundwater model calibration (Doherty, 2003). The SkyTEM dataset and the borehole descriptions used in the study can be downloaded from the national Danish databases (Møller et al., 2009a). Borehole and lithological information is stored in the Jupiter database, and the geophysical data are stored in the Gerda database.

#### AUTHOR CONTRIBUTIONS

TV: main contributor to the manuscript and primary developer of the presented content. EA: expert on airborne geophysical methods, leader in the interpretations of geophysical data, and codeveloper of the methods. AC: developer of the clay fraction inversion technique. AB: introduced TV to MPS methods, and assisted in the development of the structural models. PM: assisted in the development of the groundwater flow model. PB-G: original supervisor in the development of the clustering methods used to derive the structures.

## FUNDING

This manuscript was supported by the HyGEM, Integrating geophysics, geology, hydrology for improved groundwater and environmental management, Project No. 11-116763 and rOPEN, Open landscape nitrate retention mapping, Project No. 6150-00006B. Both projects are funded by Innovation Fund Denmark.

#### REFERENCES

Arnold, J. G., Allen, P. M., Muttiah, R., and Bernhardt, G. (1995). Automated base-flow separation and recession analysis techniques. Ground Water 33, 1010–1018. doi: 10.1111/j.1745-6584.1995.tb00046.x

Auken, E., Christiansen, A. V., Fiandaca, G., Schamper, C., Behroozmand, A. A., Binley, A., et al. (2015). An overview of a highly versatile forward and stable inverse algorithm for airborne, ground-based and borehole electromagnetic and electric data. Exploration Geophys. 2015, 223–235. doi: 10.1071/eg13097

Jeppesen, J. (2001). Vandbalancen for Rodzonen på Als. Aarhus: Aarhus University.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Vilhelmsen, Auken, Christiansen, Barfod, Marker and Bauer-Gottwein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Hydrogeophysical Parameter Estimation Using Iterative Ensemble Smoothing and Approximate Forward Solvers

#### Corinna Köpke<sup>1\*†</sup>, Ahmed H. Elsheikh<sup>2</sup> and James Irving<sup>1</sup>

<sup>1</sup> Institute of Earth Sciences, University of Lausanne, Lausanne, Switzerland, <sup>2</sup> School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, United Kingdom

#### Edited by:

Philippe Renard, University of Neuchâtel, Switzerland

#### Reviewed by:

Jaime Gomez-Hernandez, Universitat Politècnica de València, Spain

Thomas Mejer Hansen, Aarhus University, Denmark

> \*Correspondence: Corinna Köpke corinna.koepke@dlr.de

†Present Address:

Corinna Köpke, Deutsches Zentrum für Luft- und Raumfahrt, Institut für den Schutz Maritimer Infrastrukturen, Bremerhaven, Germany

#### Specialty section:

This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science

Received: 14 November 2018 Accepted: 27 February 2019 Published: 20 March 2019

#### Citation:

Köpke C, Elsheikh AH and Irving J (2019) Hydrogeophysical Parameter Estimation Using Iterative Ensemble Smoothing and Approximate Forward Solvers. Front. Environ. Sci. 7:34. doi: 10.3389/fenvs.2019.00034

In iterative ensemble smoother approaches and ensemble methods in general, the ensemble size governs the accuracy of the parameter estimates obtained. However, employing large ensembles may be computationally infeasible in applications with expensive forward solvers. Here, we reduce the computational cost of using large ensembles in iterative ensemble smoothing through the use of a proxy solver. To correct the proxy response for the corresponding model error, the latter of which can bias posterior parameter estimates if left untreated, we propose a local basis approach. With this approach, the discrepancy between the detailed and proxy solvers is learned for a subset of the ensemble and collected in a dictionary that grows with each iteration. For each ensemble member, the K-nearest neighbors in the dictionary are employed to build an orthonormal basis which is used to identify the model-error component of the residual by projection. The proposed methodology reduces the effects of overfitting the data with the proxy solver, but may lead to underfitting of the data in the absence of a sufficient number of dictionary entries, meaning that the number of ensemble members relative to the number of detailed-solver runs cannot be inflated arbitrarily. We present our approach in the context of the ensemble smoother with multiple data assimilations (ES-MDA) algorithm, and show its successful application to a high-dimensional synthetic example that involves crosshole ground-penetrating radar (GPR) travel-time tomography.

#### Keywords: ensemble methods, ES-MDA, proxy model, model error, inversion, uncertainty quantification

# 1. INTRODUCTION

Inverse problems commonly involve computationally expensive forward solvers and large numbers of unknown parameters that are spatially distributed. For risk assessment and effective environmental decision making, parameter uncertainties are required. These can be obtained through, for example, Bayesian stochastic inversion whereby the corresponding posterior distributions are typically sampled using Markov-chain-Monte-Carlo (MCMC) methods. The Bayesian-MCMC framework offers the advantages of providing a natural quantification of parameter uncertainties, as well as the flexibility to incorporate probabilistic information about priors and measurement errors into the inverse problem (Robert and Casella, 2004). However, depending on the forward solver and the dimensionality of the model-parameter space involved, it can be extremely computationally expensive. In many real-world applications, for example, millions of forward model executions may be required to obtain meaningful posterior statistics with Bayesian-MCMC methods (Ruggeri et al., 2015). Although several recent modifications to the standard Metropolis-Hastings algorithm have significantly improved the computational efficiency of MCMC (e.g., Haario et al., 2001; Hansen et al., 2012; Cotter et al., 2013; Chen et al., 2016; Vrugt, 2016; Beskos et al., 2017), these modifications are often still not enough to make such methods practically feasible for many inverse problems.

One way to significantly reduce the computational cost of stochastic parameter estimation is to employ ensemble-based methods. With such methods, an initial ensemble of model parameter sets, drawn from the Bayesian prior distribution, is updated into posterior samples taking into account the available data. The most popular ensemble-based method is the ensemble Kalman filter (EnKF) (Evensen, 1994, 2007), which was developed as a robust sequential data-assimilation technique. A modification of the EnKF for solving parameter-estimation problems is the ensemble smoother (ES), whereby all available data are assimilated in one global update step rather than sequentially. The underlying equations for both the EnKF and ES may be derived from Bayesian statistics (e.g., van Leeuwen, 2001; Evensen, 2007). To deal with non-linear problems, iterative ensemble techniques have been proposed (e.g., Reynolds et al., 2006; Emerick and Reynolds, 2012b; Elsheikh et al., 2013; Stordal and Elsheikh, 2015). The ensemble smoother with multiple data assimilation (ES-MDA) is one such technique, in which the single update step of ES is replaced with a number of smaller updates (Emerick and Reynolds, 2012a). The major advantage of ES methods over MCMC for stochastic parameter estimation is that the executions of the forward solver can be parallelized in a straightforward manner.

Despite the computational advantages of ensemble methods over Bayesian-MCMC approaches, it is well known that large ensembles are required for the most accurate parameter estimates and predictions (e.g., Buizza and Palmer, 1998; Chen and Zhang, 2006; Evensen, 2007). As a result, even with ensemble methods, accurately sampling from the posterior distribution may remain computationally prohibitive for high-dimensional inverse problems involving expensive forward solvers. In such cases, the only solution is to employ an approximate forward solver or proxy. Generating such a proxy can be achieved by simplifying the physics of the problem (e.g., Scholer et al., 2012; Josset et al., 2015a,b), by coarsening the forward model discretization (e.g., Arridge et al., 2006; Calvetti et al., 2014), or by constructing a surrogate model based on, for example, polynomial chaos expansion, Gaussian processes, or neural network techniques (e.g., Khu and Werner, 2003; Rasmussen and Williams, 2006; Marzouk and Xiu, 2009; Goh et al., 2013). However, using a proxy forward solver in the inversion introduces model error, which has the potential to strongly bias posterior statistics (Laloy et al., 2013) and can lead to highly overconfident estimates of the wrong parameters (i.e., posterior collapse) if not accounted for.

To address the issue of model error arising from the use of proxy models in stochastic inversion, researchers have typically focused on two general approaches, both of which rely upon pairs of detailed and proxy solver runs corresponding to different sets of model parameters. In the first approach, these pairs are used to construct a global error model, whose statistics are incorporated into the estimation procedure through, for example, the Bayesian likelihood function (e.g., Kaipio and Somersalo, 2007; Lehikoinen et al., 2010; Schoups and Vrugt, 2010; Evin et al., 2014; Hansen et al., 2014; Smith et al., 2015; Piccolo and Cullen, 2016; Oliver and Alfonzo, 2018). Although this can be highly effective in some cases, we have found that the model errors for many inverse problems exhibit complex behavior that cannot be described in the same way over the entire parameter space. With the second approach, the aim is to construct a local error model, which is generally accomplished through some kind of interpolation between known model-error realizations (e.g., Kennedy and O'Hagan, 2001; Xu et al., 2014; Josset et al., 2015a; Cui et al., 2018). Although doing this effectively addresses the non-global nature of the model errors, it is implicitly assumed that the model-response surface is smooth enough for interpolation to be effective, and problems may arise in regions of the model parameter space that are not well-sampled by the model-error realizations.

Recently, Köpke et al. (2018) presented a new approach to account for model error arising from the use of proxy forward solvers in Bayesian-MCMC inversion, whereby information about the error is gathered during the inversion procedure through occasional runs of the proxy and detailed solvers together, the results of which are stored in a dictionary. In contrast to the existing methods mentioned above, the approach of Köpke et al. (2018) focuses on identifying by projection the model-error component of the residual through the construction of a local orthogonal model-error basis, rather than on the development of a global or local error model. In this paper, we adapt this methodology for use with ES parameter-estimation methods. In particular, we incorporate the related ideas into the ES-MDA algorithm, where for each ensemble member, the local basis is created using the K-nearest-neighbor (KNN) entries in the model-error dictionary. Doing this enables us to accurately solve the parameter-estimation problem using large ensembles, while at the same time reducing computational costs through the use of a proxy solver.

The paper is organized as follows: In section 2 we begin with a short review of ensemble methods followed by the presentation of our approach to account for model error. In section 3, we then show the results of applying this methodology to the geophysical inverse problem of estimating spatially distributed radar-wave slowness from synthetic crosshole ground-penetrating radar (GPR) travel-time data. In this regard, results are compared with inversions based on the standard ES-MDA procedure for reference. Based on these findings, we discuss in section 4 how our results compare with standard MCMC sampling, the choice of parameters in our algorithm needed to provide an optimal balance between computational efficiency and accuracy, as well as how the inversion results progress as a function of ES-MDA iteration. Finally, in section 5, we conclude with some general comments on the methodology and provide directions for future research in this domain.

# 2. METHODOLOGY

#### 2.1. Ensemble Methods

In a generic formulation of the ensemble Kalman filter (EnKF), the model state vector $\mathbf{y}^{n}$ at data assimilation time step $n$ is updated after the state forecast step. The update from forecast $f$ to analysis $a$ is carried out using the following equation (Emerick and Reynolds, 2012a):

$$\mathbf{y}\_{j}^{n,a} = \mathbf{y}\_{j}^{n,f} + \mathbf{K}^{n} (\mathbf{d}\_{pert}^{n} - H(\mathbf{y}\_{j}^{n,f})),\tag{1}$$

with Kalman matrix

$$\mathbf{K}^n = \mathbf{C}\_{\mathrm{YD}}^{n,f} (\mathbf{C}\_{\mathrm{DD}}^{n,f} + \mathbf{C}\_{\mathrm{D}}^n)^{-1}. \tag{2}$$

Here, $j = 1, 2, \ldots, n\_e$, where $n\_e$ is the number of ensemble members; $\mathbf{C}\_{\mathrm{YD}}^{n,f}$ is the calculated cross-covariance matrix between the forecast state vector $\mathbf{y}\_{j}^{n,f}$ and the predicted data $\mathbf{d}\_{j}^{n} = H(\mathbf{y}\_{j}^{n,f})$ obtained through the observation operator $H(\cdot)$; $\mathbf{C}\_{\mathrm{DD}}^{n,f}$ is the calculated auto-covariance matrix of the predicted data; $\mathbf{C}\_{\mathrm{D}}^{n}$ is the covariance matrix of the observed-data measurement errors; and $\mathbf{d}\_{pert}^{n}$ is the vector of perturbed observations. The latter is obtained using $\mathbf{d}\_{pert}^{n} \sim \mathcal{N}(\mathbf{d}\_{obs}^{n}, \mathbf{C}\_{\mathrm{D}}^{n})$, where $\mathbf{d}\_{obs}^{n}$ denotes the observed data.

The ensemble smoother (ES) is a variation of the EnKF update formula presented in Equations (1) and (2) that is specifically formulated for parameter estimation problems. The general forward problem

$$\mathbf{d}\_{obs} = F(\mathbf{m}\_{true}) + \epsilon\_d \tag{3}$$

links a set of observed data $\mathbf{d}\_{obs}$ to a set of "true" model parameters $\mathbf{m}\_{true}$ through the forward operator $F(\cdot)$ with measurement errors $\epsilon\_d \sim \mathcal{N}(\mathbf{0}, \mathbf{C}\_{\mathrm{D}})$. The corresponding ES update equation is given by (Emerick and Reynolds, 2012a)

$$\mathbf{m}\_{j}^{a} = \mathbf{m}\_{j}^{f} + \mathbf{K}(\mathbf{d}\_{pert} - F(\mathbf{m}\_{j}^{f})),\tag{4}$$

with

$$\mathbf{K} = \mathbf{C}\_{\mathrm{MD}}^{f} \left( \mathbf{C}\_{\mathrm{DD}}^{f} + \mathbf{C}\_{\mathrm{D}} \right)^{-1}. \tag{5}$$

Here, $\mathbf{m}\_{j}^{f}$ and $\mathbf{m}\_{j}^{a}$ denote the forecast and analyzed model-parameter vectors, respectively, which correspond to an update from prior to posterior; $\mathbf{C}\_{\mathrm{MD}}^{f}$ is the cross-covariance matrix between $\mathbf{m}\_{j}^{f}$ and the predicted data $\mathbf{d}\_{j} = F(\mathbf{m}\_{j}^{f})$; $\mathbf{C}\_{\mathrm{DD}}^{f}$ is the auto-covariance matrix of the predicted data; and $\mathbf{d}\_{pert} \sim \mathcal{N}(\mathbf{d}\_{obs}, \mathbf{C}\_{\mathrm{D}})$ is again a vector of perturbed observations. The idea with Equations (4) and (5) is that, after defining an initial parameter ensemble by drawing from the Bayesian prior distribution, the ensemble members are updated to represent samples from the posterior distribution in a single analysis step that incorporates all of the available data.
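As an illustration of Equations (4) and (5), the following minimal NumPy sketch performs a single ES analysis step using ensemble estimates of the covariance matrices. The function name `es_update`, the column-wise ensemble layout, and the small linear test problem (matrix `G`, ensemble size, noise level) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def es_update(M, D, d_obs, C_D, rng):
    """One ensemble-smoother analysis step following Equations (4) and (5).

    M: (n_m, n_e) forecast parameters, one ensemble member per column.
    D: (n_d, n_e) predicted data F(m_j) for each member.
    d_obs: (n_d,) observed data; C_D: (n_d, n_d) measurement-error covariance.
    """
    n_e = M.shape[1]
    dM = M - M.mean(axis=1, keepdims=True)      # parameter anomalies
    dD = D - D.mean(axis=1, keepdims=True)      # predicted-data anomalies
    C_MD = dM @ dD.T / (n_e - 1)                # cross-covariance (n_m, n_d)
    C_DD = dD @ dD.T / (n_e - 1)                # auto-covariance (n_d, n_d)
    # Perturbed observations: one draw from N(d_obs, C_D) per ensemble member
    D_pert = rng.multivariate_normal(d_obs, C_D, size=n_e).T
    # K = C_MD (C_DD + C_D)^{-1}, computed with a linear solve (matrix is symmetric)
    K = np.linalg.solve(C_DD + C_D, C_MD.T).T
    return M + K @ (D_pert - D)

rng = np.random.default_rng(1)
G = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])   # hypothetical linear forward operator
m_true = np.array([1.0, -1.0, 0.5])
C_D = 0.01 * np.eye(2)
d_obs = G @ m_true + rng.multivariate_normal(np.zeros(2), C_D)
M_prior = rng.standard_normal((3, 500))             # draws from a standard normal prior
M_post = es_update(M_prior, G @ M_prior, d_obs, C_D, rng)
print(np.linalg.norm(G @ M_post.mean(axis=1) - d_obs))
```

Because the forward solver is applied to each column independently, the evaluation of `D` is the part that can be parallelized across ensemble members.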

#### 2.2. The ES-MDA Algorithm

ES offers an efficient tool to solve parameter-estimation problems under the assumptions that the prior parameter distribution is Gaussian and the forward operator $F(\cdot)$ is linear. If these conditions are not satisfied, then ES can lead to unacceptable data matches and unphysical results (Aanonsen et al., 2009). To deal with this issue, we focus in this paper on a recent development by Emerick and Reynolds (2012a), namely the ensemble smoother with multiple data assimilation (ES-MDA). With this approach, one standard ES step, which is comparable to a single Gauss-Newton iteration when maximizing the posterior probability of the model parameters (Tarantola, 2005), is replaced by a number of smaller update steps (or assimilation iterations) based on a Kalman matrix and perturbed data vector that are recalculated at each iteration. In order to correctly sample from the posterior distribution, the measurement-error covariance matrix $\mathbf{C}\_{\mathrm{D}}$ must be inflated in this procedure. Typically this is done by scaling $\mathbf{C}\_{\mathrm{D}}$ by the number of assimilation iterations; however, more generalized inflation coefficients may be used (Emerick and Reynolds, 2012a). For linear forward solvers, the ES-MDA algorithm is theoretically equivalent to standard ES (Emerick and Reynolds, 2012b). For non-linear problems, it can be shown that the methodology has links to annealed importance sampling (Stordal and Elsheikh, 2015).

Algorithm 1 outlines the steps involved in the ES-MDA procedure where, for simplicity, the measurement-error covariance matrix is assumed diagonal with entries $\sigma^{2}$ and the corresponding inflation coefficient $\alpha$ is set equal to the number of assimilation iterations $n\_{iter}$. To estimate the inverse of the matrix $(\mathbf{C}\_{\mathrm{DD}} + \alpha \cdot \mathbf{C}\_{\mathrm{D}})$, we use the truncated singular value decomposition (TSVD) and retain 99% of the total energy of the singular values (Emerick and Reynolds, 2012a).
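The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under the stated simplifications ($\mathbf{C}\_{\mathrm{D}} = \sigma^{2}\mathbf{I}$ and a constant inflation coefficient equal to the number of iterations), not a reproduction of Algorithm 1. Two points are our own assumptions: "energy" is interpreted as the cumulative sum of the singular values, and the forward solver, ensemble size, and noise level in the toy problem are hypothetical.

```python
import numpy as np

def tsvd_inverse(A, energy=0.99):
    """Pseudo-inverse of A from the leading singular values holding `energy` of their sum."""
    U, s, Vt = np.linalg.svd(A)
    keep = int(np.searchsorted(np.cumsum(s) / s.sum(), energy)) + 1
    keep = min(keep, s.size)
    return (Vt[:keep].T / s[:keep]) @ U[:, :keep].T

def es_mda(forward, M, d_obs, sigma, n_iter, rng):
    """ES-MDA with C_D = sigma^2 I and constant inflation alpha = n_iter."""
    n_d, n_e = d_obs.size, M.shape[1]
    alpha = float(n_iter)
    C_D = sigma**2 * np.eye(n_d)
    for _ in range(n_iter):
        D = np.column_stack([forward(M[:, j]) for j in range(n_e)])
        dM = M - M.mean(axis=1, keepdims=True)
        dD = D - D.mean(axis=1, keepdims=True)
        C_MD = dM @ dD.T / (n_e - 1)
        C_DD = dD @ dD.T / (n_e - 1)
        # Observations are re-perturbed at every iteration with the inflated noise level
        D_pert = d_obs[:, None] + np.sqrt(alpha) * sigma * rng.standard_normal((n_d, n_e))
        K = C_MD @ tsvd_inverse(C_DD + alpha * C_D)
        M = M + K @ (D_pert - D)
    return M

rng = np.random.default_rng(2)
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def forward(m):
    """Hypothetical, mildly non-linear forward solver used only for this example."""
    lin = G @ m
    return lin + 0.05 * lin**2

m_true = np.array([0.5, -0.3])
sigma = 0.05
d_obs = forward(m_true) + sigma * rng.standard_normal(3)
M_prior = rng.standard_normal((2, 300))
M_post = es_mda(forward, M_prior, d_obs, sigma, n_iter=4, rng=rng)
print(M_post.mean(axis=1))
```

With a constant coefficient, the inflation factors satisfy the usual ES-MDA requirement that their reciprocals sum to one over the iterations; non-constant schedules would need to respect the same condition.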


#### 2.3. Model Error

When working with a perfectly known forward solver F(·) in the ES-MDA procedure outlined above, the residual **r**<sup>j</sup> corresponding to the jth ensemble member **m**<sup>j</sup> , which quantifies the misfit between the perturbed observations and the predicted (forward-calculated) data, is given by

$$\begin{split} \mathbf{r}\_{j} &= \mathbf{d}\_{\text{pert}} - F(\mathbf{m}\_{j}) \\ &= \underbrace{F(\mathbf{m}\_{\text{true}}) - F(\mathbf{m}\_{j})}\_{\text{parameter-error}} + \tilde{\epsilon}\_{d}, \end{split} \tag{6}$$

where ε̃<sub>d</sub> denotes the sum of the measurement errors and perturbation noise. In the case where **m**<sub>j</sub> = **m**<sub>true</sub>, we see from Equation (6) that the parameter-error term, which represents the component of the residual related to being at the wrong set of model-parameter values, will be zero and that the residual energy will tend to be minimized. In the case where a proxy forward solver F̂(·) is used in the ES-MDA algorithm, however, the latter does not generally hold true because

$$\begin{split} \mathbf{r}\_{j} &= \mathbf{d}\_{\text{pert}} - \hat{F}(\mathbf{m}\_{j}) \\ &= F(\mathbf{m}\_{\text{true}}) - \hat{F}(\mathbf{m}\_{j}) + \tilde{\epsilon}\_{d} \\ &= \underbrace{F(\mathbf{m}\_{\text{true}}) - F(\mathbf{m}\_{j})}\_{\text{parameter-error component}} + \underbrace{F(\mathbf{m}\_{j}) - \hat{F}(\mathbf{m}\_{j})}\_{\text{model-error component}} + \tilde{\epsilon}\_{d}. \end{split} \tag{7}$$

Indeed, the presence of an additional model-error component in Equation (7) compared to Equation (6) means that the residual energy may be minimized for model-parameter vectors **m**<sub>j</sub> that are substantially different from **m**<sub>true</sub>, as such parameter sets will tend to compensate for the model errors. As mentioned previously, this can lead to strongly biased and overconfident posterior statistics.

In order to deal with model error in the ES-MDA procedure arising from use of a proxy solver, we build on the methodology presented in Köpke et al. (2018) for Bayesian-MCMC inversion, which focuses on identifying the model-error component of the residual using a projection-based method. We refer the reader to that paper for details beyond those given here. Algorithm 2 outlines the steps involved in our modified ES-MDA methodology, again assuming that α = n<sub>iter</sub> and **C**<sub>D</sub> = σ<sup>2</sup>**I** for simplicity, where **I** is the identity matrix. In addition, we introduce n<sub>d</sub>, defined as the number of detailed solver runs used to learn about the model error, and set it to a value less than or equal to the number of ensemble members n<sub>e</sub>.

In the modified ES-MDA algorithm, initial ensemble members **m**<sub>j</sub> are drawn from the prior parameter distribution and the corresponding predicted data **d̂**<sub>j</sub> = F̂(**m**<sub>j</sub>) are computed using the proxy solver. In each assimilation iteration, a subset of the ensemble members having size n<sub>d</sub> is randomly chosen, for which the detailed forward responses **d**<sub>j</sub> = F(**m**<sub>j</sub>) are also calculated. The resulting n<sub>d</sub> model-error vectors (i.e., **d**<sub>j</sub> − **d̂**<sub>j</sub>) and corresponding parameter sets **m**<sub>j</sub> are stored in the dictionaries **D**<sub>E</sub> and **D**<sub>M</sub>, respectively. As **D**<sub>M</sub> and **D**<sub>E</sub> are further enriched with n<sub>d</sub> entries in each ES-MDA iteration, more detailed information about the model error around the posterior solution is gathered.

For each ensemble member **m**<sub>j</sub>, the model-error component of the residual is identified and used to correct the proxy response in order to mimic the detailed forward solver. To this end, the current model-parameter dictionary **D**<sub>M</sub> is searched for the K-nearest-neighbor (KNN) parameter sets to **m**<sub>j</sub> using a standard Euclidean distance measure (e.g., Hastie et al., 2009). An orthonormal basis **B**<sub>j</sub> for the model error at **m**<sub>j</sub> is then constructed from the model-error vectors in **D**<sub>E</sub> associated with these parameter sets using the Gram-Schmidt technique (e.g., Strang et al., 1993). We assume in our work that the data-measurement-error and parameter-error components of the residual are orthogonal to the model-error component, and therefore cannot be well represented by the basis. An estimate of the model error **ẽ**<sub>j</sub> is thus obtained by projecting the residual onto **B**<sub>j</sub>:

$$
\tilde{\mathbf{e}}\_{j} = \mathbf{B}\_{j} \cdot \mathbf{B}\_{j}^{T} \cdot \mathbf{r}\_{j}. \tag{8}
$$

This result, which represents the details missing in the proxy solution, is then added to the proxy response to obtain a corrected forward response

$$
\tilde{\mathbf{d}}\_{j} = \hat{\mathbf{d}}\_{j} + \tilde{\mathbf{e}}\_{j} \tag{9}
$$

with corresponding corrected residual

$$
\tilde{\mathbf{r}}\_{j} = \mathbf{d}\_{\text{pert}} - \tilde{\mathbf{d}}\_{j}.\tag{10}
$$

The corrected forward responses for all of the ensemble members are used to compute the corrected Kalman matrix

$$\tilde{\mathbf{K}} = \mathbf{C}\_{\text{MD}} \cdot (\mathbf{C}\_{\text{D}\tilde{\text{D}}} + \alpha \cdot \mathbf{C}\_{\text{D}})^{-1} \tag{11}$$

which is used with the corrected residuals to update the ensemble.
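The correction in Equations (8)–(10) can be illustrated with a short NumPy sketch. The function names `model_error_basis` and `correct_proxy`, the column-wise dictionary layout, and the toy numbers are assumptions made for illustration only. A reduced QR factorization is swapped in for the Gram-Schmidt step: it yields an orthonormal basis for the same span when the selected error vectors are linearly independent, and is generally better behaved numerically.

```python
import numpy as np


def model_error_basis(m_j, D_M, D_E, K):
    """Orthonormal basis built from the K dictionary error vectors whose
    parameter sets lie closest (Euclidean distance) to m_j.

    D_M : (n_m, n_dict) stored parameter sets, one per column
    D_E : (n_data, n_dict) stored model-error vectors, one per column
    """
    dist = np.linalg.norm(D_M - m_j[:, None], axis=0)
    nearest = np.argsort(dist)[:K]
    # Reduced QR used in place of explicit Gram-Schmidt (same span).
    B, _ = np.linalg.qr(D_E[:, nearest])
    return B


def correct_proxy(d_pert, d_hat, B):
    """Project the residual onto B and correct the proxy response."""
    r = d_pert - d_hat             # residual, as in Equation (7)
    e_tilde = B @ (B.T @ r)        # model-error estimate, Equation (8)
    d_tilde = d_hat + e_tilde      # corrected response, Equation (9)
    r_tilde = d_pert - d_tilde     # corrected residual, Equation (10)
    return d_tilde, r_tilde


# Toy usage: three stored parameter sets and their model-error vectors.
D_M = np.array([[0.0, 1.0, 5.0],
                [0.0, 1.0, 5.0]])
D_E = np.zeros((6, 3))
D_E[0, 0], D_E[1, 1], D_E[2, 2] = 1.0, 2.0, 3.0
m_j = np.array([0.4, 0.4])
B = model_error_basis(m_j, D_M, D_E, K=2)

d_hat = np.ones(6)                                   # proxy response
true_error = 2.0 * D_E[:, 0] - 0.5 * D_E[:, 1]       # lies in span of the 2 NN
noise = np.array([0.0, 0.0, 0.0, 0.0, 0.1, 0.0])     # orthogonal to that span
d_pert = d_hat + true_error + noise
d_tilde, r_tilde = correct_proxy(d_pert, d_hat, B)
```

In this constructed case the model error lies exactly in the span of the two nearest dictionary entries and the noise is orthogonal to it, so the corrected response recovers the proxy response plus the model error and the corrected residual contains only the noise. When the orthogonality assumption holds only approximately, part of the parameter-error and noise components will leak into the estimate.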

Under the stated assumptions and with appropriate choices of ne, nd, and K, Algorithm 2 allows us to effectively reduce the computational cost of ES-MDA when considering large ensembles through the use of a proxy solver. The dimensionality of the parameter-estimation problem and the difference in computational cost between the proxy and detailed forward solutions determine how much computational benefit is derived from this methodology. We refer the reader to Köpke et al. (2018) for a detailed discussion of the orthogonality assumption between the model-error and other components of the residual.

#### 3. APPLICATION TO CROSSHOLE GPR TOMOGRAPHY

#### 3.1. Experimental Design and Forward Models

As an example, we now apply our modified ES-MDA algorithm with model-error correction to the crosshole GPR traveltime tomography inverse problem. A transmitter and receiver antenna, located in two adjacent boreholes, are used to obtain the travel times of radar energy between the holes for different antenna positions. These times are linked to the spatial distribution of subsurface radar-wave velocity, the estimation of which is the goal of the inverse problem. Crosshole GPR traveltime tomography represents an excellent test problem for our purposes because (i) it has been extremely well-studied, most notably from a stochastic inverse standpoint (e.g., Giroux et al., 2007; Looms et al., 2008; Scholer et al., 2012; Hansen et al., 2013; Linde and Vrugt, 2013); (ii) it involves a high-dimensional and spatially distributed set of model parameters that must be estimated; and (iii) the forward problem can be solved in a variety of different ways using different physical approximations.

GPR travel times are linked to the spatial distribution of electrical properties between the two boreholes, predominantly the dielectric permittivity, through Maxwell's equations. Numerical solution of these equations represents the most accurate means of calculating the travel times, but it is also highly computationally expensive. To reduce the computational cost, the physics of the electromagnetic wave propagation can be approximated using ray theory, whereby the effects of frequency are ignored and we solve the eikonal equation (e.g., Nowack, 1992). To decrease the computational cost even further, the straight-ray approximation may also be considered, which means that the ray paths that connect transmitter and receiver locations are assumed to be straight lines (e.g., Cordua et al., 2008). The latter approximation is typically applied in cases where contrasts in velocity do not exceed 10%; however, it is only truly valid when the subsurface is homogeneous. Here, we consider the eikonal equation to be our detailed forward solver F(·) and the straight-ray approach to be our proxy solver F̂(·). This choice was made for demonstration purposes, as it allows us to compare the results of ES-MDA inversions obtained using our approach to those obtained using standard ES-MDA, as well as MCMC, based on the detailed solver alone. That is, the eikonal solution is fast enough to allow it to be used in the standard ES-MDA algorithm with a large number of ensemble members, as well as for MCMC posterior sampling. Note that, instead of directly estimating velocity from GPR travel times in our analysis, we focus on the estimation of subsurface slowness (the reciprocal of velocity), which makes the straight-ray forward problem linear.
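To make the linearity of the straight-ray problem concrete, the following sketch builds a ray-length matrix **G** such that the travel times are **t** = **G** **s** for a slowness vector **s**. It is an illustration under assumptions, not the solver used in the paper: the function name `straight_ray_matrix`, the row-major cell ordering, and the placement of the antennas at cell-center depths are all hypothetical choices.

```python
import numpy as np


def straight_ray_matrix(tx, rx, nx, nz, h):
    """Ray-length matrix G with travel times t = G @ s for slowness s.

    tx, rx : arrays of shape (n, 2) with (x, z) antenna positions [m]
    nx, nz : number of cells in x and z; h : cell side length [m]
    Cells are ordered row-major, index = iz * nx + ix.
    """
    G = np.zeros((len(tx) * len(rx), nx * nz))
    row = 0
    for p in tx:
        for q in rx:
            d = q - p
            # Ray parameters in (0, 1) at which a grid line is crossed.
            cuts = [0.0, 1.0]
            for k, n in ((0, nx), (1, nz)):
                if d[k] != 0.0:
                    t = (np.arange(n + 1) * h - p[k]) / d[k]
                    cuts.extend(t[(t > 0.0) & (t < 1.0)])
            cuts = np.unique(cuts)
            # The midpoint of each segment identifies the cell it crosses.
            mid = p + 0.5 * (cuts[:-1] + cuts[1:])[:, None] * d
            ix = np.clip((mid[:, 0] // h).astype(int), 0, nx - 1)
            iz = np.clip((mid[:, 1] // h).astype(int), 0, nz - 1)
            np.add.at(G[row], iz * nx + ix, np.diff(cuts) * np.linalg.norm(d))
            row += 1
    return G


# Usage with an assumed geometry: boreholes at x = 0 m and x = 4 m,
# 40 antenna depths at the centers of 0.2 m cells.
z = 0.1 + 0.2 * np.arange(40)
tx = np.column_stack([np.zeros(40), z])
rx = np.column_stack([np.full(40, 4.0), z])
G = straight_ray_matrix(tx, rx, nx=20, nz=40, h=0.2)
t = G @ np.full(20 * 40, 10.0)     # travel times [ns] for a uniform 10 ns/m
```

Because **G** depends only on the antenna geometry and the grid, it can be computed once and reused for every ensemble member, which is what makes the straight-ray proxy so cheap compared to an eikonal solve.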

The survey configuration for our synthetic experiments consists of two boreholes that are 8-m deep and 4-m apart (**Figure 1**). Transmitter and receiver antenna positions are distributed equally in depth every 0.2 m down the left and right boreholes, respectively. Sending a radar pulse from all transmitter positions to all receiver positions yields 1,600 travel-time data. We consider a pixel-based parameterization of the subsurface whereby the region between the boreholes is discretized into 20 × 40 square cells of constant slowness and side length 0.2 m. The synthetic "true" subsurface and initial prior ensemble members are generated by sequential Gaussian simulation using the GSLIB software package (Deutsch and Journel, 1992). The mean slowness is set to 10 ns/m and an exponential autocovariance kernel having a standard deviation of 1.7 ns/m is assumed, with horizontal and vertical correlation lengths of 6 m and 1.5 m, respectively. The corresponding synthetic observed data are generated by solving the eikonal equation and adding measurement errors, the latter of which are simulated as Gaussian random noise having covariance matrix **C**<sub>D</sub> = σ<sup>2</sup>**I** with standard deviation σ = 0.2 ns.
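For readers who wish to experiment with a comparable prior, the sketch below draws Gaussian slowness fields on the 20 × 40 grid using a Cholesky factorization of the covariance matrix. This is a substitute technique, not the sequential Gaussian simulation in GSLIB used in the paper, and it relies on an assumed covariance convention, σ<sup>2</sup> exp(−r) with r the lag scaled by the correlation lengths; GSLIB's range convention may differ. The function names are hypothetical.

```python
import numpy as np


def exponential_covariance(nx, nz, h, std, len_x, len_z):
    """Covariance between cell centers, std^2 * exp(-scaled lag distance).
    The scaling convention is an assumption made for this sketch."""
    x = (np.arange(nx) + 0.5) * h
    z = (np.arange(nz) + 0.5) * h
    X, Z = np.meshgrid(x, z)              # shape (nz, nx), row-major cells
    xs, zs = X.ravel(), Z.ravel()
    dx = (xs[:, None] - xs[None, :]) / len_x
    dz = (zs[:, None] - zs[None, :]) / len_z
    return std**2 * np.exp(-np.hypot(dx, dz))


def sample_prior(C, mean, n_e, rng):
    """Draw n_e Gaussian fields (columns) from N(mean, C) via Cholesky."""
    # A tiny diagonal jitter guards against round-off in the factorization.
    L = np.linalg.cholesky(C + 1e-10 * np.eye(C.shape[0]))
    return mean + L @ rng.standard_normal((C.shape[0], n_e))


C = exponential_covariance(nx=20, nz=40, h=0.2, std=1.7, len_x=6.0, len_z=1.5)
prior = sample_prior(C, mean=10.0, n_e=20, rng=np.random.default_rng(0))
```

For an 800-cell grid the dense factorization is inexpensive; for much larger grids a sequential or spectral simulation method would be the more practical choice.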

#### 3.2. ES-MDA Results

Our goals in this analysis are to (i) study the effects of model error on ES-MDA inversions; (ii) investigate the influence of the ensemble size on the accuracy of the results obtained; and (iii) explore how the parameters of our modified ES-MDA procedure with model-error correction can be chosen to provide an optimal balance between computational efficiency and accuracy. To this end, we compare parameter-estimation results for different numbers of ensemble members when (i) there is no model error, meaning that the detailed (eikonal-equation) forward solver is used within the standard ES-MDA procedure (Algorithm 1); (ii) model error is present but not accounted for, meaning that the proxy (straight-ray) forward solver is used within the standard ES-MDA procedure; and (iii) model error is present and accounted for through the use of Algorithm 2. In each case, we examine the combined results from 10 ES-MDA inversions obtained using different initial ensembles and n<sub>iter</sub> = 8 assimilation iterations. Ensemble sizes of n<sub>e</sub> = 20, 40, 80, 160, 320, and 640 are considered in our analysis.

To assess the quality of the inversion results, we consider two metrics. The average root-mean-square (RMS) travel-time misfit, which quantifies globally the ability of the posterior ensemble to represent the observed data, is defined for ES-MDA run i (i = 1, 2, ..., 10) as follows:

$$M\_i^T = \frac{1}{n\_e} \sum\_{j=1}^{n\_e} \frac{1}{\sqrt{n\_T}} \left\| \mathbf{d}\_{obs} - \mathbf{d}\_{i,j} \right\|\_2, \tag{12}$$

where n<sub>T</sub> is the number of travel-time data. For the case where model error is absent and data errors are zero-mean and Gaussian distributed with covariance matrix **C**<sub>D</sub> = σ<sup>2</sup>**I**, the expected value of M<sub>i</sub><sup>T</sup> will be σ. Note that, in the case where model error is present but not accounted for, the detailed forward response **d**<sub>i,j</sub> in Equation (12) is replaced with the proxy response **d̂**<sub>i,j</sub>.

We also consider in our analysis the average RMS slowness misfit, defined by

$$M\_i^S = \frac{1}{n\_e} \sum\_{j=1}^{n\_e} \frac{1}{\sqrt{n\_S}} \left\| \mathbf{m}\_{true} - \mathbf{m}\_{i,j} \right\|\_2,\tag{13}$$

where n<sub>S</sub> is the number of slowness cells. This metric quantifies how well the posterior ensemble captures the true underlying model parameters, and can only be employed in the case of synthetic data where the true subsurface slowness distribution is known. In addition to the two metrics in Equations (12) and (13), we plot the mean slowness fields over all ensemble members and all ES-MDA runs in order to visually compare them with the true slowness distribution.
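Both metrics are ensemble averages of per-member RMS misfits and can be computed as in the following sketch. The function names and the column-wise ensemble layout are assumptions; the slowness misfit is normalized by the number of slowness cells, as defined in the text.

```python
import numpy as np


def rms_traveltime_misfit(d_obs, D):
    """Average RMS travel-time misfit for one ES-MDA run.

    d_obs : observed data, shape (n_T,)
    D     : predicted data, shape (n_T, n_e), one ensemble member per column
    """
    n_T = d_obs.size
    return np.mean(np.linalg.norm(d_obs[:, None] - D, axis=0) / np.sqrt(n_T))


def rms_slowness_misfit(m_true, M):
    """Average RMS slowness misfit for one ES-MDA run.

    m_true : true slowness field, shape (n_S,)
    M      : posterior ensemble, shape (n_S, n_e)
    """
    n_S = m_true.size
    return np.mean(np.linalg.norm(m_true[:, None] - M, axis=0) / np.sqrt(n_S))


# Small check: two members that are off by 2 and 4 everywhere.
d_obs = np.zeros(4)
D = np.column_stack([np.full(4, 2.0), np.full(4, 4.0)])
misfit = rms_traveltime_misfit(d_obs, D)
```

In this check the per-member RMS values are 2 and 4, so the average misfit is 3.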

**Figure 2** summarizes the parameter-estimation results obtained for the case where there is no model error. In **Figure 2A** we observe that the average RMS travel-time misfit decreases consistently with larger numbers of ensemble members toward the expected value of σ = 0.2 ns which reflects the prescribed data errors. After around 320 ensemble members, adding more members is seen to only slightly further improve the results. **Figure 2B** shows that the slowness misfit also decreases consistently as a function of ensemble size. This is supported by **Figure 2C**, which shows that the mean slowness fields become increasingly detailed and similar to the true subsurface distribution as the number of ensemble members increases. The increasing overall accuracy of the ES-MDA results with larger ensemble size is based on the reduction of sampling errors following the central limit theorem (Evensen, 2007). We can conclude that larger ensembles combined with the detailed forward solver enable us to obtain more reliable posterior parameter estimates, but at the cost of significantly greater computational effort when the detailed solver is computationally expensive.

**Figure 3** summarizes the parameter-estimation results obtained for the case where model error is present but not accounted for. In **Figure 3A** we observe that, in accordance with **Figure 2A**, the travel-time misfit consistently decreases with larger ensemble size. However, it approaches a stable value that is well above the target value of σ = 0.2 ns, because the presence of model error does not allow data fitting to a level that is in accordance with the prescribed data errors. In addition, unlike in **Figure 2B**, the slowness misfit now decreases only until n<sup>e</sup> = 40, after which it increases again (**Figure 3B**). For n<sup>e</sup> ≤ 40, we do not have enough ensemble members to resolve the details of the posterior distribution and therefore only the posterior mean can be represented in the parameter estimation results (Chen and Zhang, 2006). Conversely, when n<sup>e</sup> > 40, the solution moves toward a biased posterior distribution. That is, with more ensemble members the parameters have the ability to compensate for the model error and the data become over-fitted, meaning that a better data match is achieved but the parameters do not represent the true subsurface model. The mean slowness fields in **Figure 3C** allow us to see how the model error introduces bias in the parameter-estimation results as n<sup>e</sup> increases; strong artifacts are clearly observed in these fields for ensemble sizes larger than 160.

Finally, we examine the inversion results obtained for the case where model error is present and accounted for through our modified ES-MDA approach. We first consider inversions where n<sub>d</sub> = 20 detailed solver runs per iteration are used to build the model-error dictionary and K = 20 KNN from this dictionary are used to construct the model-error basis for each ensemble member. The corresponding results are shown in **Figure 4**. Note that, in the case with n<sub>e</sub> = 20, the detailed solver is executed for each ensemble member, which corresponds to the standard ES-MDA procedure in Algorithm 1. In **Figures 4A,B** we see that both the travel-time and slowness misfit decrease from n<sub>e</sub> = 20 until they reach a minimum at around 80–160 ensemble members. This demonstrates that the consideration of larger ensembles through use of a proxy solver combined with our model-error correction can lead to more accurate results compared to standard ES-MDA based on small ensembles and the detailed forward solver. These results are confirmed in **Figure 4C**, where we observe that the bias is largely removed from the mean slowness fields for n<sub>e</sub> ≤ 160 in comparison to **Figure 3C**. However, for ensemble sizes larger than around 160, the travel-time and slowness misfit are seen to again increase, meaning that the data become under-fitted. That is, the ensemble size becomes too large compared to the number of detailed solver calculations for the model error to be well represented in the dictionary, meaning that projection onto the model-error basis will not properly identify the model-error component of the residual. This has the effect of introducing bias into the inversion results, which is clearly seen in the mean slowness fields in **Figure 4C** when n<sub>e</sub> > 160.

To explore the latter findings, we consider again Algorithm 2, but this time using n<sub>d</sub> = 40 detailed solver runs per iteration and K = 40 KNN to build the model-error dictionary and construct the model-error basis, respectively. Here, when n<sub>e</sub> = 40, the detailed solver is executed for each ensemble member, which again corresponds to the standard ES-MDA procedure. Similar to before, we observe in **Figures 5A,B** that the travel-time and slowness misfit decrease from n<sub>e</sub> = 40 until a minimum is reached. Although it is difficult to determine the exact position of this minimum due to the limited discretization, we see that it falls somewhere around 160 ensemble members. After the minimum value, the travel-time and slowness misfit are again seen to increase and the data become under-fitted. This behavior is well reflected in the mean slowness fields in **Figure 5C**, which show good agreement with the true field for n<sub>e</sub> ≤ 320, but clearly contain model-error-related artifacts when n<sub>e</sub> = 640.

[Figure caption, beginning missing] … the number of ensemble members, n<sub>e</sub>, are (A) box plots of the average RMS travel-time misfit [ns]; (B) box plots of the average RMS slowness misfit [ns/m]; and (C) the mean posterior slowness fields [ns/m]. Added to the box plots are the mean (circles), and minimum and maximum values (crosses). The dashed horizontal line in (A) represents the expected value of the travel-time misfit assuming that the residuals follow the prescribed Gaussian distribution for the data errors.

#### 4. DISCUSSION

We saw above that use of the modified ES-MDA approach described in Algorithm 2 can allow for a significant reduction in posterior bias when employing a proxy forward solver compared to the standard ES-MDA procedure. This offers the possibility of considering large ensemble sizes within ES-MDA, which can be computationally prohibitive in the context of an expensive detailed forward solver. One issue requiring further discussion, however, is the balance between (i) the number of ensemble members considered n<sub>e</sub>, which in the case of no model error controls the accuracy of the results obtained; and (ii) the number of detailed solver runs n<sub>d</sub>, which determines the success of the model-error correction. We saw in **Figure 4** that, when n<sub>d</sub> = 20 detailed solver runs per iteration were considered in the modified ES-MDA procedure, use of ensemble sizes between 40 and 160 allowed for an improvement in parameter estimates compared to standard ES-MDA based on the detailed solver with n<sub>e</sub> = 20. When n<sub>d</sub> = 40 detailed solver runs per iteration were considered, on the other hand, a corresponding improvement was found for ensemble sizes between 80 and 320. These results suggest that, at least for the application presented in this paper, Algorithm 2 can be successfully applied only for ensembles whose size is no larger than approximately 8 times the number of detailed solver runs per iteration. Past this number, there will not be enough entries in the model-error dictionary to allow for an accurate correction of the model error for all ensemble members, and the benefits of using an approximate solver with model-error correction will be compromised. Further exploration of these findings in the context of other inverse problems is required.
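The trade-off discussed here can be made explicit with a small bookkeeping sketch. It assumes that the detailed solver is run once per ensemble member per iteration in standard ES-MDA and n<sub>d</sub> times per iteration in the modified scheme, ignoring proxy runs and any additional forward runs made after the last update. The function names and the default ratio of 8, taken from the empirical observation above, are illustrative only.

```python
def detailed_runs(n_e, n_iter, n_d=None):
    """Number of detailed-solver runs over all assimilation iterations.

    n_d=None : standard ES-MDA, one detailed run per ensemble member
    n_d=int  : modified ES-MDA, n_d detailed runs per iteration
               (proxy runs are not counted)
    """
    per_iteration = n_e if n_d is None else n_d
    return per_iteration * n_iter


def max_ensemble_size(n_d, ratio=8):
    """Largest ensemble size suggested by the empirical ratio for this
    crosshole GPR example (not a general rule)."""
    return ratio * n_d


standard = detailed_runs(n_e=160, n_iter=8)            # detailed solver only
modified = detailed_runs(n_e=160, n_iter=8, n_d=20)    # proxy plus correction
largest = max_ensemble_size(n_d=20)
```

Under these assumptions, an ensemble of 160 members with 8 iterations requires 1,280 detailed runs in standard ES-MDA but only 160 with n<sub>d</sub> = 20, and 160 is also the largest ensemble size for which the correction remained effective in this example.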

Another issue in need of some discussion is how the results of using the standard and modified ES-MDA algorithms presented in **Figures 2**–**5** compare with samples from the "true" posterior distribution, the latter of which we assume to be available through MCMC sampling based on the detailed forward solver. To this end, we show in **Figure 6** five randomly chosen posterior slowness realizations obtained via MCMC sampling based on the eikonal equation (**Figure 6A**); standard ES-MDA based on both the eikonal equation and straight-ray approximation (**Figures 6B,C**); and our modified ES-MDA procedure with n<sup>d</sup> = 20 and n<sup>d</sup> = 40 (**Figures 6D,E**). The point-wise posterior mean and standard deviation, computed over all available samples, are also shown for reference. The results in **Figure 6A** were obtained using the sequential geostatistical simulation technique (e.g., Ruggeri et al., 2015), where after burn-in, the results of 140,000 MCMC iterations were thinned to provide 140 posterior samples. For **Figures 6B–E**, the number of ensemble members considered was chosen to be the maximum investigated value (n<sup>e</sup> = 640) for standard ES-MDA, whereas for our modified ES-MDA procedure it was set equal to 8 times the number of detailed solver runs, as discussed above.

In comparing the posterior realizations in **Figures 6A,B**, we see that they are highly similar, which suggests that ES-MDA based on the detailed forward solver and using a large number of ensemble members allows for adequate sampling of the Bayesian posterior distribution. The corresponding standard deviation images generally show a pattern that reflects the degree of ray coverage; regions of higher slowness contain a smaller ray density. However, the ES-MDA solution is seen to contain more variability, which may arise because the 140,000 MCMC iterations utilized were not enough to adequately explore the posterior space. In examining the stochastic realizations in **Figure 6C**, the proxy-related bias in this solution is clearly apparent. Here, the symmetric pattern of variability reflects variations in ray density that are controlled solely by the antenna locations in the straight-ray case. Finally, in comparing the results in **Figures 6D,E** with those in **Figure 6A**, we see that our modified ES-MDA algorithm largely removes the proxy-related bias and allows for the generation of posterior samples that are close in appearance to the MCMC solution, which again validates our approach. These samples do, however, show a slightly higher degree of variability with less correlation compared to **Figures 6A,B**, with the n<sup>d</sup> = 40 solution providing a better match than the n<sup>d</sup> = 20 solution. As discussed above, with a fixed number of detailed solver runs per iteration, there is an upper limit to the number of ensemble members that can be effectively considered in our procedure, which in turn may not be enough to characterize exactly the posterior distribution (see **Figure 2**). More accurate results would thus likely require greater numbers of detailed solver runs to allow for an increase in the ensemble size. 
The higher degree of variability in these results may also reflect imperfect removal of the model error, or the lesser number of samples used to compute the point-by-point mean and standard deviation.

Lastly, we wish to elaborate on the number of internal ES-MDA assimilation iterations considered in our approach, which was held constant at a value of n<sub>iter</sub> = 8 for all of the results presented in section 3.2. To this end, we study in **Figure 7** the travel-time and slowness misfit as a function of iteration for an ensemble size of n<sub>e</sub> = 160 when (i) no model error is present; (ii) model error is present but not accounted for; and (iii) model error is present and accounted for using n<sub>d</sub> = 40 detailed solver runs per iteration and K = 40 KNN. We observe overall that the first assimilation iteration has the largest influence on reducing the travel-time and slowness misfit from prior to posterior. This arises because of the linearity of the proxy (straight-ray) solver and the weak non-linearity of the detailed (eikonal) solver in our travel-time tomography application. Indeed, Emerick and Reynolds (2012b) proved that, for linear problems, one ES update using the measurement noise covariance matrix is equivalent to multiple ES-MDA updates using the inflated covariance matrix. When using the detailed solver in the inversion where there is no model error, for example, **Figures 7A,D** show a large decrease in travel-time and slowness misfit after one iteration and a slow decrease from iterations 2–8. In this case 4 iterations would be enough to obtain similar parameter-estimation results to those obtained using 8 iterations. When model error is present but not accounted for, we see in **Figure 7B** that a good travel-time data match is achieved and no further improvement is observed after one iteration. However, **Figure 7E** shows that the corresponding slowness misfit is still large after one iteration compared to **Figure 7D**, which arises because of over-fitting; that is, the inversion attempts to fit the model error. Applying our proposed method, we see in **Figure 7C** that the travel-time misfit is again primarily reduced in the first iteration and behaves similarly to the case where no model error is present (**Figure 7A**). More importantly, over-fitting is significantly reduced (**Figure 7F**) and the slowness misfit after only 4 iterations is similar to that seen in **Figure 7D**. This again confirms that employing our proposed approach can effectively remove proxy-related bias and allow the ES-MDA procedure to yield results that are comparable to inversions when no model error is present. Although it may be possible to arrive at these results in fewer iterations than the 8 considered in this paper, it is difficult to know in advance how the combined approach of proxy solver and model-error correction behaves in terms of the internal ES-MDA iterations.

#### 5. CONCLUSIONS

We have presented in this paper an approach that builds on the work of Köpke et al. (2018) in order to remove the bias associated with the use of proxy forward solvers in ES-MDA inversions. This allows for the consideration of larger ensemble sizes, which help to improve the accuracy of the parameter estimates obtained. Instead of constructing a local or global error model, our approach importantly aims to identify the model-error component of the residual during the ES-MDA procedure, which is used to correct the proxy forward response. This is accomplished through construction of an orthonormal model-error basis for each ensemble member and at each iteration based on a prescribed number of KNN entries selected from a model-error dictionary. The latter is created as the inversion proceeds, and thus no prior information about the model error is required before running the procedure.

With regard to the considered example problem of estimating the spatial distribution of subsurface slowness from crosshole GPR travel times, we saw that our modified ES-MDA approach allows us to obtain accurate posterior estimates characteristic of large ensembles with a computational cost comparable to a small number of runs of the detailed forward solver. The results did show, however, that the success of the approach depends on the ratio between the number of ensemble members and the number of detailed solver runs per iteration used to learn about the model error. In particular, for the crosshole GPR tomographic example considered, this ratio should not exceed a value of approximately 8.

Despite the successful application of our model-error approach, there remain a number of topics that should be investigated further. For example, in the work presented here, we set the number of KNN equal to the number of detailed solver runs per ES-MDA iteration used to learn about the model error. In this case, the same model-error basis is constructed in the first iteration for each ensemble member. In subsequent iterations, as the dictionary grows, the KNN of each ensemble member are used to extract local information about the model error from the dictionary, and a local model-error basis is calculated for that member. However, this choice remains to be validated and could be optimized to improve the identification of the model-error component of the residual. Further, we assume with our method that the latter is approximately orthogonal to the data-measurement and parameter-error components, which allows for its identification using a projection approach. Although we have found this assumption to yield acceptable results for a range of test problems examined so far, it requires further investigation.

#### AUTHOR CONTRIBUTIONS

CK: developing ideas, setting up the codes and writing the manuscript; AE: discussing ideas and revising the manuscript; JI: discussing ideas, revising the manuscript, and obtaining funding for the research.

#### FUNDING

This work was supported by a research grant to JI from the Swiss National Science Foundation (number 200021\_140864).

#### REFERENCES


prior information: part 2—application to crosshole GPR tomography. Comput. Geosci. 52, 481–492. doi: 10.1016/j.cageo.2012.10.001


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Köpke, Elsheikh and Irving. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
