Comparative analysis of the structures and outcomes of geophysical flow models and modeling assumptions using uncertainty quantification

We present a new statistically driven method for analyzing the modeling of geophysical flows. Many models have been advocated by different modelers for such flows incorporating different modeling assumptions. Limited and sparse observational data on the modeled phenomena usually does not permit a clean discrimination among models for fitness of purpose, and, heuristic choices are usually made, especially for critical predictions of behavior that has not been experienced. We advocate here a methodology for characterizing models and the modeling assumptions they represent, using a statistical approach over the full range of applicability of the models. Such a characterization may then be used to decide the appropriateness of a model and modeling assumption for use. We present our method by comparing three different models arising from different rheology assumptions, and the data show unambiguously the performance of the models across a wide range of possible flow regimes. This comparison is facilitated by the recent development of the new release of our TITAN2D mass flow code that allows choice of multiple rheologies The quantitative and probabilistic analysis of contributions from different modeling assumptions in the models is particularly illustrative of the impact of the assumptions. Knowledge of which assumptions dominate, and, by how much, is illustrated in two different case studies: a small scale inclined plane with a flat runway, and the large scale topography on the SW slope of Volc\'{a}n de Colima (MX). A simple model performance evaluation completes the presentation.


Models and assumptions
This paper presents a systematic approach to the study of models of complex systems, motivated by and applied to the case of geophysical mass flow models.(Gilbert, 1991) defines science as a process of constructing predictive conceptual models where these models represent consistent predictive relations in target systems.A simpler though not necessarily comprehensive definition of a model that is more appropriate for the context of this study is that: A model is a representation of a postulated relationship among inputs and outputs of a system, usually informed by observation and based on a hypothesis that best explains the relationship.
The definition captures two of the most important characteristics • models depend on a hypothesis, and, • models use the data from observation to validate and refine the hypothesis.
Errors and uncertainty in the data and limitations in the hypothesis (usually a tractable and computable mathematical construct articulating beliefs like proportionality, linearity etc.) are immediate challenges that must be overcome, to construct useful and credible models.
A model is most useful in predicting the behavior of a system for unobserved inputs, and, interpreting or explaining of the system's behavior.Since, models require a hypotheses, it follows that the model is a formulation of a belief about the data.The immediate consequence of this is that the model may not be reliable about such prediction, since, the subjectivity of the belief can never be completely eliminated (Kennedy and O'Hagan, 2011;Higdon et al., 2004) even when sufficient care is taken to use all the available data and information.Secondly, the data at hand may not provide enough information about the system to characterize its behavior at the desired prediction.This data inadequacy is rarely characterized even for verified and validated models.What makes this problem even more acute is that we are often interested in modeling outcomes that are not observed and often not observable.
The consequence of this lack of knowledge and limited data is the multiplicity of beliefs about the complex system being modeled and a profusion of models based on different modeling assumptions and data use.These competing models lead to much debate among scientists.Principles like "Occam's razor" and Bayesian statistics (Farrell et al., 2015) provide some guidance but simple robust approaches that allow the testing of models for fitness need to be developed.We present in this paper a simple data driven approach to discriminate among models and the modeling assumptions implicit in each model, given a range of phenomena to be studied.We illustrate the approach by work on geophysical mass flows.
An assumption is a simple concept -any atomic postulate about relationships among quantities under study.Models are compositions of many such assumptions.The study of models is, thus, implicitly a study of these assumptions and their composability and applicability in a particular context.Sometimes a good model contains a useless assumption that may be removed, sometimes a good assumption could be added to a different model -these are usually subjective choices, not data driven.Moreover, the correct assumptions may change through time, making model choice more difficult.
The 4 th release of TITAN2D1 offers multiple rheology options in the same code base.The availability of three distinct models for similar phenomena in the same tool provides us the ability to directly compare outputs and internal variables in all the three models and control for usually difficult to quantify effects like numerical solution procedures, input ranges and computer hardware.Given a particular problem for which predictive analysis is planned, the information generated here may be used to guide model choice and/or processes for integrating information from multiple models.
2 Analysis of modeling assumptions and models

Analysis Process
Let us define M (A), P M (A) , where A is a set of assumptions, M (A) is the model which combines those assumptions, and PM is a probability distribution in the parameter space of M .While the support of PM can be restricted to a single value by solving an inverse problem for the optimal reconstruction of a particular flow, this is not desirable if we are interested in the general predictive capabilities of the model, where we are interested in the outcomes over a whole range.
Stage 1: Parameter Ranges In this study, we assume: where NM is the number of parameters of M .This is not restrictive, and in case of correlation between the commonly used parameters (pj) j=1, NM , or non-uniform distributions, we can always define a function g such that g[(pi)] = (pj), and the (pi) are independent and uniformly distributed.In particular, we choose these parameter ranges using information gathered from the literature about the physical meaning of those values, together with a preliminary testing for physical consistency of model outcomes and range of inputs/outcomes of interest.This step is critical, because if the total friction of the models does not cover a similar span, then the statistical comparison is dominated by trivial macroscopic differences, and cannot focus on the rheology details.In the preparation of hazard analysis, expert elicitation processes can be used to ensure that the studies correctly account for all anticipated and possible flow regimes.
Stage 2: Simulations and Data Gathering The model inputs of M , the contribution variables include quantities in the model evaluation that are ascribable to specific assumptions Ai.These are usually not observed as outputs from the model.For example in momentum balances of complex flow calculations these could be values of different source terms, dissipation terms and inertia terms.Finally, the model outputs include explicit outcomes e.g., for flow calculations these could be flow height, lateral extent, area, velocity, acceleration, and derived quantities such as Froude number F r.In general, we use a Monte Carlo simulation, sampling the model inputs and obtaining a family of graphs plotting the expectation of the contribution variables and model outputs.We also include their 5 th and 95 th percentiles.Our sampling technique of the input variables is based on the Latin Hypercube Sampling (LHS) idea, and in particular, on the improved space-filling properties of the orthogonal array-based Latin Hypercubes (see Appendix A).The volume of data generated is likely to be large but modern computing and data handling equipment readily available to most modeling researchers2 in university and national research facilities are more than adequate.
Stage 3: Results Analysis These and other statistics can now be compared to determine the need for different modeling assumptions and the relative merits of different models.Thus, analysis of the data gathered over the entire range of flows for the state variables and outcomes leads to a quantitative basis for accepting or rejecting particular assumptions or models for specific outcomes.

Monte Carlo Process and Statistical Analysis
In our study, the flow range is defined by establishing boundaries for inputs like flow volume or rheology coefficients characterizing the models.Latin Hypercube Sampling is performed over [0, 1] d where d depends on the number of input parameters (see Appendix A).Those dimensionless samples are linearly mapped to fill the required intervals.Figure 10a,b,c displays examples of LHS design in the three models that are targets of this study, with respect to their commonly used parameters.
Following the simulations, we generate data for each sample run and each outcome and contribution variable f (x, t) calculated as a function of time on the elements of the computational grid.This analysis generates tremendous volume of data which must then be analyzed using statistical methods for summative impact.The contribution variables in this case are the mass and force terms in the conservation laws defined above.
We devise many statistical measures for analyzing the data.For instance, let (Fi(x, t))i=1,...,4 be an array of force terms, where x ∈ R d is a spatial location, and t ∈ T is a time instant.The degree of contribution of those force terms to the flow dynamics can be significantly variable in space and time, and we define the dominance factors [pj(x, t)] j=1,...,k , i.e., the probability of each Fj(x, t) to be the dominant force.Those probabilities provide insight into the dominance of a particular source or dissipation term on the model dynamics.Each term is identified with a particular modeling assumption.We remark that we focus on the modulus of the forces and hence we cope with scalar terms.It is also important to remark that all the forces depend on the input variables, and they can be thus considered as random variables.Furthermore, these definitions are general and could be applied to any set of contributing variables, and not only to the force terms.
Definition 1 (Dominance factors) Let (Fi)i∈I be random variables on (Ω, F, PM ).Then, ∀i, the dominant variable is defined as: In particular, for each j ∈ I, the dominance factors are defined as: Moreover, we define the random contributions, an additional tool that we use to compare the different force terms, following a less restrictive approach than the dominance factors.They are obtained dividing the force terms by the dominant force Φ, and hence belong to [0, 1].
Definition 2 (Expected contributions) Let (Fi)i∈I be random variables on (Ω, F, PM ).Then, ∀i, the random contribution is defined as: where Φ is the dominant variable.Thus, ∀i, the expected contributions are defined by E [Ci].
In particular, for a particular location x, time t, and parameter sample ω, we have Ci(x, t, ω) = 0 if there is no flow or all the forces are null.The expectation of Ci is reduced by the chance of Fi being small compared to the other terms, or by the chance of having no flow in (x, t).The meaning of the random variable Φ(x, t) is explained by the dominance factors, and hence they can be used to define a further statistical decomposition of the random contributions, as detailed in the Appendix B.

Modeling of geophysical mass flows
Dense large scale granular avalanches are a complex class of flows with physics that has often been poorly captured by models that are computationally tractable.Sparsity of actual flow data (usually only a posteriori deposit information is available), and large uncertainty in the mechanisms of initiation and flow propagation, make the modeling task challenging, and a subject of much continuing interest.Models that appear to represent the physics well in certain flows, may turn out to be poorly behaved in others, due to intrinsic physical, mathematical or numerical issues.Nevertheless, given the large implications on life and property, many models with different modeling assumptions have been proposed.For example in (Iverson, 1997;Iverson and Denlinger, 2001;Denlinger and Iverson, 2001;Pitman et al., 2003b;Denlinger and Iverson, 2004;Iverson et al., 2004), the depth-averaged model was applied in the simulation of test geophysical flows in large scale experiments.Several studies were specifically devoted to the modeling of volcanic mass flows (Bursik et al., 2005;Kelfoun and Druitt, 2005;Charbonnier and Gertisser, 2009;Kelfoun et al., 2009;Procter et al., 2010;Kelfoun, 2011;Charbonnier et al., 2013).In fact, volcanoes are great sources for a rich variety of geophysical flow types and provide field data from past flow events.
Modeling in this case proceeds by first assuming that the laws of mass and momentum conservation hold for properly defined system boundaries.The scale of these flows -very long and wide with small depth led to the first most generally accepted assumption -shallowness (Savage and Hutter, 1989).This allows an integration through the depth to obtain simpler and more computationally tractable equations.This is the next of many assumptions that have to be made.Both of these are fundamental assumptions which can be tested in the procedure we established above.Since, there is a general consensus and much evidence in the literature of the validity of these assumptions we defer analysis of these to future work.
The depth-averaged Saint-Venant equations that result are: Here the Cartesian coordinate system is aligned such that z is normal to the surface; h is the flow height in the z direction; hū and hv are respectively the components of momentum in the x and y directions; and k is the coefficient which relates the lateral stress components, σxx and σyy, to the normal stress component, σzz.The definition of this coefficient depends on the constitutive model of the flowing material we choose.Note that 1 2 kgzh 2 is the contribution of hydrostatic pressure to the momentum fluxes.Sx and Sy are the sum local stresses: they include the gravitational driving forces, the basal friction force resisting to the motion of the material, and additional forces specific of rheology assumptions.
The final class of assumptions are the assumptions on the rheology of the flows -in particular in this context assumptions used to model different dissipation mechanisms embedded in Sx, Sy that lead to a plethora of models with much controversy on the most suitable model.We focus here on three models derived from different assumptions for essentially the same class of flows.

Overview of the models
In the three following sections, we briefly describe Mohr-Coulomb (MC), Pouliquen-Forterre (PF) and Voellmy-Salm (VS) models.Models based on additional heterogeneous assumptions are possible, either more complex (Pitman and Le, 2005;Iverson and George, 2014) or more simple (Dade and Huppert, 1998).We decided to focus on these three because of their historical relevance.Moreover, if the degree of complexity in the models is significantly different, model comparison should take into account that, but this is outside the purpose of this study (e.g.(Farrell et al., 2015)).

Mohr-Coulomb
Based on the long history of studies in soil mechanics (Rankine, 1857;Drucker and Prager, 1952), the Mohr-Coulomb rheology (MC) was developed and used to represent the behavior of geophysical mass flows (Savage and Hutter, 1989).
Shear and normal stress are assumed to obey Coulomb friction equation, both within the flow and at its boundaries.In other words, where τ and σ are respectively the shear and normal stresses on failure surfaces, and φ is a friction angle.This relationship does not depend on the flow speed.We can summarize the MC rheology assumptions as: • Basal Friction based on a constant friction angle.
• Internal Friction based on a constant friction angle.
• Earth pressure coefficient formula depends on the Mohr circle (implicitly depends on the friction angles).
• Velocity based curvature effects are included into the equations.
Under the assumption of symmetry of the stress tensor with respect to the z axis, the earth pressure coefficient k = kap can take on only one of three values {0, ±1}.The material yield criterion is represented by the two straight lines at angles ±φ (the internal friction angle) relative to horizontal direction.Similarly, the normal and shear stress at the bed are represented by the line τ = −σ tan(δ) where δ is the bed friction angle.

MC equations
As a result, we can write down the source terms of the Eqs.(1): Where, ū ∼ = (ū, v), is the depth-averaged velocity vector, rx and ry denote the radii of curvature of the local basal surface.The inverse of the radii of curvature is usually approximated with the partial derivatives of the basal slope, e.g., 1/rx = ∂θx/∂x, where θx is the local bed slope.
PF rheology assumptions can be summarized as: • Basal Friction is based on an interpolation of two different friction angles, based on the flow regime and depth.
• Internal Friction is neglected.
• Earth pressure coefficient is equal to one.
• Normal stress is modified by a hydrostatic pressure force related to the flow height gradient.
• Velocity based curvature effects are included into the equations.
Two critical slope inclination angles are defined as functions of the flow thickness, namely φstart(h) and φstop(h).The function φstop(h) gives the slope angle at which a steady uniform flow leaves a deposit of thickness h, while φstart(h) is the angle at which a layer of thickness h is mobilized.They define two different basal friction coefficients.
An empirical friction law µ b ( ū ∼ , h) is then defined in the whole range of velocity and thickness.The expression changes depending on two flow regimes, according to a parameter β and the Froude number where γ is the power of extrapolation, assumed equal to 10 −3 in the sequel (Pouliquen and Forterre, 2002).
The functions µstop and µstart are defined by: The critical angles φ1, φ2 and φ3 and the parameters L, β are the parameters of the model.
In particular, L is the characteristic depth of the flow over which a transition between the angles φ1 to φ2 occurs, in the µstop formula.In practice, if h L, then µstop(h) ≈ tan φ2, and if h L, then µstop(h) ≈ tan φ1.

Voellmy-Salm
The theoretical analysis of dense snow avalanches led to the VS rheology (VS) (Voellmy, 1955;Salm et al., 1990;Salm, 1993;Bartelt et al., 1999).Dense snow or debris avalanches consist of mobilized, rapidly flowing ice-snow mixed to debris-rock granules (Bartelt and McArdell, 2009).The VS rheology assumes a velocity dependent resisting term in addition to the traditional basal friction, ideally capable of including an approximation of the turbulence-generated dissipation.Many experimental and theoretical studies were developed in this framework (Gruber and Bartelt, 2007;Kern et al., 2009;Christen et al., 2010;Fischer et al., 2012).
The following relation between shear and normal stresses holds: where, σ denotes the normal stress at the bottom of the fluid layer and g = (gx, gy, gz) represents the gravity vector.The two parameters of the model are the bed friction coefficient µ and the velocity dependent friction coefficient ξ.
We can summarize VS rheology assumptions as: • Basal Friction is based on a constant coefficient, similarly to the MC rheology.
• Internal Friction is neglected.
• Earth pressure coefficient is equal to one.
• Additional turbulent friction is based on the local velocity by a quadratic expression.
• Velocity based curvature effects are included into the equations, following an alternative formulation.
The effect of the topographic local curvatures is addressed with terms containing the local radii of curvature rx and ry.In this case the expression is based on the speed instead of the scalar components of velocity (Pudasaini and Hutter, 2003;Fischer et al., 2012).
VS equations Therefore, the final source terms take the following form: In our study, sampled input parameters are µ, and ξ, on ranges depending on the case study.In particular, ξ uniform sampling is accomplished in log-scale.In fact, values of ξ between 250 and 4,000 m/s 2 have been described for snow avalanches (Salm, 1993;Bartelt et al., 1999;Gruber and Bartelt, 2007).

Contributing variables
For analysis of modeling assumptions we need to record and classify the results of different modeling assumptions.In our case study, we focus on the right-hand side terms in the momentum equation and we call them RHS forces.They are contributing variables since internal to the computation and rarely visible as a system output.
it is the gravitational force term, it has the same formulation in all models.
The expression of basal friction force RHS2 depends on the model: The expression of the force related to the topography curvature, RHS3, also depends on the model: All the three models have an additional force term, having a different expressions and different meaning in the three models: These contributing variables can be analyzed locally and globally for discriminating among the different modeling assumption.
Finally, we also study the spatial integrals defined by where dx is the area of the mesh elements.This provides a global view of the results and is complementary to the observations taken locally.For instance, by integrating the scalar product of source terms in the momentum balance and velocity we can compare the relative importance of modeling assumptions when we seek accuracy on global quantities.

Small scale flow on inclined plane and flat runway
Our first case study assumes very simple boundary conditions, and corresponds to a laboratory experiment fully described in (Webb, 2004;Bursik et al., 2005;Webb and Bursik, 2016).It is a classical flow down an inclined plane set-up, including a change in slope to an horizontal plane (Fig. 1).Modeling flow of granular material down an inclined plane was explored in detail by several studies, both theoretically and experimentally (e.g.(Ruyer-Quil and Manneville, 2000;Silbert et al., 2001;Pitman et al., 2003a)).
In our setting, four locations are selected among the center line of the flow to accomplish local testing.These are: the initial pile location L1 = (−0.7,0) m, the middle of the inclined plane L2 = (−0.35,0) m, the change in slope L3 = (0, 0) m, the middle of the flat runway L4 = (0.15, 0) m.

Preliminary consistency testing of the input ranges
Addressing a similar case study, (Dalbey et al., 2008) assumed φ bed = [15 • , 30 • ], while (Webb and Bursik, 2016) performed a series of laboratory experiments and found φ bed = [18.2• , 34.4 • ].We relied on those published parameter choices to select a comprehensive parameter range.Figure 1b displays the screenshots of flow height observed in the extreme cases tested.The Digital elevation Map (DEM) has a 1mm cell size.Simulation options are -max time = 2 s, height/radius = 1.34, length scale = 1 m, number of cells across axis = 10, order = first, geoflow tiny = 1e-4 (Patra et al., 2005;Aghakhani et al., 2016).Initial pile geometry is cylindrical.We remark that small changes in the parameter ranges did not change significantly the results.
• Rheology models' parameters: Even if maximum and minimum runout are both matching, the shape and lateral extent of the flow are different between the three models.In particular, MC model can produce the largest lateral extent, and VS model displays an accentuated bow-like shape, due to the increased friction in the lateral margins.

Observable outputs
We express the flow height and acceleration as a function of time, measured in the four locations L1, . . ., L4 displayed in Fig. 1a.Uncertainty quantification (UQ) is performed, accordingly to the parameter ranges described above.We always show the mean values and the corresponding 5 th and 95 th percentile values, defining an uncertainty range.Estimates of the Froude Number are included in Supporting Information S1.

Flow height
Figure 2 displays the flow height, h(L, t), at the points (Li)i=1,...,4.Given a particular type of flow and collected data we can clearly distinguish model skill in capturing not only that flow but also other possible flows.Past work (Webb, 2004) allowed us to conclude that MC rheology is adequate for modeling simple dry granular flows.We must note the effect of neglecting the flow when its height is < 1 mm, which is at the scale of the smallest granular size (Aghakhani et al., 2016).In fact, continuity assumption would not be valid below this scale, and the 5 th and 95 th percentile plots are vertically cut to zero when they decrease over that threshold.The mean plot is not cut to zero but it is dulled by this cutoff.In plot 2a, related to point L1 placed on the initial pile, the values of ∼ 6 ± 1 cm are equal and express the assigned pile height.The flow height decreases slightly faster in PF model, and slower in MC, compared to VS. Differences are more significant in plot 2b, related to point L2, placed in the middle of the slope.Maximum flow height on average is greater in VS, 4.1 ± 0.2 mm, but more uncertain in MC, 3.9 ± 0.4 mm, and generally smaller in PF model, 3.0 ± 0.1 mm.After the peak, PF decreases significantly slower than the other models.These height values are about 15 times smaller than initial pile height.None of the models leaves a significant material deposit in L1 or L2, and hence the 95 th percentile of the height is null at the ending-time.In contrast, a deposit is left at points L3 and L4, i.e. plot 2c placed at the change in slope, and plot 2d in the middle of the flat runway.At L3 MC's deposit, 2 mm with uncertainty [-2,+8] mm, is higher than the other models' deposits.The plot profile is bimodal, showing a first peak at ∼ 0.6 s, and then a reduction until 1 s, before the final accumulation.At L4, deposit it is not significantly different between the three models.It measures ∼ 3 mm on average, slightly more than this in VS, with uncertainty [-3,+7] mm.

Flow acceleration
Figure 3 shows the flow acceleration, a (L, t), at the points (Li)i=1,...,4.Acceleration is the link between force terms and observable motion.We calculated it from the left-hand-side of the dynamical equation, but using the RHS terms produces very similar results.In plot 3a, related to point L1, MC and VS show a plateau before ∼ 0.4 sec, at ∼ 2.5 m/s 2 and ∼ 3.5 m/s 2 , respectively, while PF linearly decreases between those same values.Instead in plot 3b, related to point L2, MC and PF show a plateau, at ∼ 2.2 m/s 2 , while VS has a more bell-shaped profile.UQ tells us that PF is affected by a smaller uncertainty than the other models.In plot 3c, related to point L3, all the models show a bimodal profile, with peaks at ∼ 0.5 sec and 0.8 sec.This is more accentuated in MC and VS, whereas the second peak is almost absent from PF's profile.The second peak is motivated by an increase of velocity in the down-slope direction after its reduction due to lateral spreading of material.At the first peak, acceleration values are significant, with average peaks in MC and PF both at ∼ 15 m/s 2 , and 95 th percentile plot reaching ∼ 50 m/s 2 and ∼ 55 m/s 2 , respectively.VS shows about halved acceleration peak values.At the second peak, average acceleration values are similar in MC and VS, at ∼ 5 m/s 2 .In contrast, 95 th percentile plot is > 50 m/s 2 for MC, while ∼ 30 m/s 2 in VS.In plot 3d, related to point L4, the acceleration has a first peak at ∼ 4 m/s 2 , and a final asymptote at ∼ 2 m/s 2 in MC and VS, ∼ 1 m/s 2 for PF.These values indicate flow deceleration, and uncertainty is more relevant in MC and PF than in VS.Each dominance factor is the probability of a force term to be the greatest one, and hence belongs to [0, 1].The plots include also the probability of no-flow being observed at the considered point.

Statistical analysis of contributing variables
The plots 4a,b,c are related to point L1, placed on the initial pile.Only RHS1 can be the dominant variable, and no-flow probability is (1 − P1).Same thing in the plots 4d,e,f related to point L2, placed in the middle of the slope.Then, plots 4g,h,i are related to point L3, placed at the change in slope.In L3, RHS3 can be the dominant term for a short time, with a peak probability of ∼ 30%.Plots 4j,k,l are related to point L4, placed in the middle of the flat runway.In L4 only RHS2 can be the dominant term, except in PF where there is a ∼ 10% chance that RHS4 is the dominant term at the ending-time.Dominance factors describe the main dynamics of the flow, but they cannot tell anything on the notdominant variables.The expected contributions complete the statistical description of the contributing variables.They scale the force terms by the dominant dynamics, and represent the degree of relevance of the secondary assumptions with respect to the dominant one.They are averaged with respect to PM on the parameter range, and change as a function of time.In VS, C4 resembles C2, but it is bimodal instead than trapezoidal.The plots 5g,h,i are related to point L3.C1 and C2 are still the largest, but their profiles are bell-shaped.In VS, C4 is almost equal to C2.In all the models, C3 becomes significant, with a peak similar to C2.It shows different profiles -triangular for MC and PF, bimodal for VS.In MC the decrease occurs in two stages.Due to the presence of deposit, all the contributions are small (particularly small in PF), but not zero at the ending time.The plots 5j,k,l are related to point L4.Only C2 has a major role, with a bell shaped profile faster to wax than to wane.Contribution C4 has a minor role in VS and PF.

Flow extent and spatial integrals
Figure 6 shows the volumetric average of speed and Froude Number.Moreover, it shows the lateral extent and inundated area of flow, as a function of time.These global quantities have smoother plots than the local measurements describe above.However, most of the details observed in local measurements are not easy to discern anymore.In plot 6a the speed has a bell-shaped profile in all the models, with an average peak at ∼ 1.4m/s and uncertainty range of ±0.4m/s for PF and VS.VS is slightly slower, reaching ∼ 1.3m/s on average.MC shows a larger uncertainty range, of ±0.6m/s.The maximum speed is reached first by VS and PF at ∼ 0.55s, and last by MC at ∼ 0.65s.In plot 6b, also the Froude Number has a bell-shaped profile.F r peaks are temporally aligned with speed peaks, and are ∼ 10 in VS, ∼ 11 in MC, ∼ 13.5 in PF, on average.Uncertainty range is about ±4 in all models.In plot 6c inundated area shows similar maximum values in PF and VS, at ∼ 0.6m 2 on average, and uncertainty of ±0.15m 2 .MC is lower, at ∼ 0.45m 2 on average, and less uncertain, ±0.10m 2 .VS does not decrease significantly after reaching the peak, whereas the other models contract their area to approximately half its maximum extent.In plot 6d the lateral extent starts equal to the pile diameter ∼ 15cm, and then rises in two stages in MC and PF.The second and greater rise starts at ∼ 0.6s, and corresponds with the time of arrival at the change in slope (see Fig. 2c).In contrast, VS rises without showing two phases.At ∼ 0.6s, average lateral extent is ∼ 50cm in PF and VS, and ∼ 43cm in MC.Uncertainty range is ±7cm for all models at that time.Final extent is ∼ 75cm in PF, ∼ 65cm in MC, ∼ 55cm in VS.Uncertainty range is ±5cm in VS, but rises to ±10cm in MC and PF.

Power integrals
Figure 7 shows the spatial integral of powers (scalar product of force and velocity terms).The spatial integration is performed on half spatial domain, due to the symmetry with respect to the flow central axis.In particular, the power estimates assume a material density ρ = 805kg/m 3 .This a constant scaling factor, and the plots are not further affected by its value.Corresponding plots of the force terms are included in Supporting Information S2.The scalar product with velocity imposes a bell-shaped profile, as observed in Fig. 6a.In plot 7a the power of RHS1 represents the effect of the gravity in all the models.It starts from zero and rises up to ∼ 1.5W at ∼ 0.55s, then decreases to zero after the material crosses the change in slope.Uncertainty range of ±0.5W affects the peak values.MC decreases slower than the other models, and has a more significant uncertainty after the change in slope.PF decreases faster.In plot 7b the power of RHS2 represents the basal friction.It is negative and peaks to ∼ 1.1 ± 0.2W in MC, ∼ 1.0 ± 0.2W PF, ∼ 0.7 ± 0.3W in VS.A similar bell-shaped profile is shared by the three models, and basal friction power becomes negligible at the ending-time.In plot 7c the power of RHS3 is related to the curvature effects, and it is not null only at the change in slope.It is always dissipative, i.e. opposed to flow velocity, because it is equivalent to the friction due to the additional weigh generated by centrifugal forces.It is weaker than −0.1W on average, ten times smaller than the previous powers, although MC lower percentile reaches ∼ −0.25W .VS displays a bimodal profile, with a second and weaker peak at ∼ 0.75s.In plot 7d the power of RHS4 is related to the additional forces of the models, differently characterized.This term is really relevant in VS, although also in PF has a very short lasting positive peak up to 0.3W before to become null at ∼ 0.1s.This power in VS is a speed dependent term, always dissipative.It is bell shaped and null before ∼ 0.1s and after ∼ 1s.At the time of change in slope it is ∼ −0.7W , ±0.3W .
We assume the flow to be generated by the gravitational collapse of a material pile placed close to the summit area -at 644956N, 2157970E UTM13 (Rupp et al., 2006;Aghakhani et al., 2016).The volcano has produced several pyroclastic flows generated by lava dome collapse, called Merapi style flows (Macorps et al., 2018).A dome collapse occurs when there is a significant amount of recently-extruded highlyviscous lava piled up in an unstable configuration around a vent.Further extrusion and/or externals forces can cause the still hot dome of viscous lava to collapse, disintegrate, and avalanche downhill (Bursik et al., 2005;Wolpert et al., 2016;Hyman and Bursik, 2018).The hot, dense blocks in this "block and ash" flow (BAF) will typically range from centimeters to a few meters in size.Our computations were performed on a DEM of 5m-pixel resolution, obtained from Laser Imaging Detection and Ranging (LIDAR) data acquired in 2005 (Davila et al., 2007;Sulpizio et al., 2010).We placed 51 locations along the flow inundated area to accomplish local testing.Six of them are then adopted as preferred locations, being representative of different flow regimes.

Preliminary consistency testing of the input ranges
In this same setting, (Dalbey et al., 2008) assumed φ bed = [15 • , 35 • ], while (Capra et al., 2011) adopted φ bed = 30 • .Moreover, (Spiller et al., 2014;Bayarri et al., 2015;Ogburn et al., 2016) found a statistical correlation between flow size and effective basal friction inferred from field observation of geophysical flows.A BAF at the scale of our simulations would possess φ bed = [13 • , 18 • ] according to their estimates.Small changes in the parameter ranges do not change significantly the results.Figure 9 displays the maps of maximum flow height observed in the extreme cases tested.Simulation options are -max time = 7200 s (2 hours), height/radius = 0.55, length scale = 4e3 m, number of cells across axis = 50, order = first, geoflow tiny = 1e-4 (Patra et al., 2005;Aghakhani et al., 2016).Initial pile geometry is paraboloid.The models display different macroscopic features.In particular, VS lateral spreading is significant lower and the material reaches higher thickness, whereas PF model seems to stop more gradually than MC with more complex inundated area boundary lines.• Material Volume: [2.08, 3.12] × 10 5 m 3 , i.e. average of 2.6 × 10 5 m 3 and uncertainty of ±20%.
• Rheology models' parameters:  evolving, and that the contributions can reveal a large amount of information about it.We remark that the calculation of E[Ci] with respect to PM removes the effects of data discontinuity, and hence this is a fundamental step in our further analysis.We note that the above choices are easily changed, and if we are interested for instance in the performance of the models for very large or very small flows, a suitable volume range can be chosen and the procedure re-run.

Observable outputs
The number of spatial locations is significantly high.We placed 51 points to span the entire inundated area, in search of different flow regimes, as displayed in Fig. 8.We remark that all the distances reported in the following are measured in vertical projection, thus without considering the differences in elevation.2a.In this case the height decreases from the initial value to zero in ∼ 15s.In plots 11b,c,d,e, the locations are are set at less than ∼ 1 km radius from the initial pile.Their profiles are similar to point L2 in Fig. 2b.The height profile is bell-shaped, starting from zero and then waning back to zero in ∼ 20 s.All the dynamics occurs during the first minute.In plots 11f,g,h,i,j, points are set where the slope reduces, and the flow can channelize, and typically leaves a deposit.The distance from the initial pile is ∼ 2 − 3 km.The profiles are sometimes similar to L3 of Fig. 2c, other times to L4 of Fig. 2d, in a few cases showing intermediate aspects.In general is either observed an initial short-lasting bulge followed by a slow decrease lasting for several minutes and asymptotically tending to a positive height, or a steady increase of material height tending to a positive height.In both cases it is sometimes observed a bimodal profile in the first 5 minutes.Finally, plots 11k,l focus on three points set at about the runout distance of the flow, in the most important ravines, at ∼ 4 − 5 km from the initial pile.Profiles are similar to what observed in point L4 of Fig. 2d.

Flow height in six locations
We select six preferred locations, illustrative of a range of flow regimes.They are [L8, L10, L17, L39, L43, L46], as displayed in Figure 12.The first two points, L8 and L10, are both proximal to the initiation pile.Points L17 and L43 are placed where the slope is reducing and the ravines start, and L39 and L46 are placed in the channels, further down-slope.In particular, L8, L43, and L46 are at the western side of the inundated area, whereas L10, L17, and L39 are at the eastern side.In plots 13c,e, we show the flow height in points L17 and L43, both at ∼ 2km from the initial pile.All the models show a fast spike during the first minute, followed by a slow decrease.There is still material after 1800s.VS has a secondary rise peaking at ∼ 450s, which is not observed in the other models.This produces higher values for the most of the temporal duration, but similar deposit thickness after more than 1 hour.Maximum values are 1m for MC, 2m for PF, and 1.5m for VS, in both locations.The 5 th percentile is zero in all the models, meaning that the parameter range does not always allow the flow to reach these locations.The 95 th percentile is above 5m, except for VS in point L17.In plots 13d,f, we show the flow height in points L39 and L46, both placed at more than 3km from the initial pile.The three models all show a monotone profile except for MC in point L39, which instead displays an initial spike and a decrease before to rise again.A similar thing is observed in the 95 th percentiles of all the models.It is significant that the 5 th percentile of PF becomes positive after ∼ 5400s, meaning that almost surely the flow has reached that location.Deposit thickness in point L39 is ∼ 0.5m for all the models, whereas in point L46 it is 1.7m in VS, 1.6m in PF, and 1.2m in MC.

Three locations proximal to the initial pile
Figure 14 shows the dominance factors (Pi)i=1,...,4 of the RHS terms modulus, in the three proximal points L8, L10, and L17, all closer than 1 km to the initial pile.The plots 14a,b,c and 14d,e,f are related to point L8 and L10, respectively.They are significantly similar.The gravitational force RHS1 is the dominant variable with a very high chance, P1 > 90%.In MC and PF there is a small probability, P3 = 5% − 30%, of RHS3 being the dominant variable for ∼ 5s.In VS it is observed a P4 = 5% chance of RHS4 being dominant, just for a few seconds.Plots 14g,h,i concern the relatively more distal point L17.They are split in two sub-frames at different time scale.In all the models, RHS2 is the most probable dominant variable, and its dominance factor has a bell-shaped profile.In all the models, also RHS1 has a small chance of being the dominant variable.In MC this chance is more significant, at most P1 = 30% for ∼ 20s, and again P1 = %2 in [100, 7200]s.In PF P1 = 15% in two peaks, one short lasting at about 55s, and the second extending in [100,500]  In all the models C1 is significantly larger than C2 and C3, which are almost equivalent in MC and PF, while C2 > C3 in VS.C4 always gives a negligible contribution, except in VS, where it is comparable to C2.In L8, following PF, C3 is bimodal, whereas it is unimodal in MC and VS.This is not observed in L10.In L8, C3 is greater than in L10, compared to the other forces.VS always shows a slower decrease of the plots.In plots 15g,h,i are shown the expected contributions in L17.The plots are split in two sub-frames at different time scale.Initial dynamics is dominated by C2, except for in MC, and only for a short time, [30,35]s.In MC there is an initial peak of C2 which is not observed in the other models.C3 has a significant size, in MC and PF, and unimodal profile.In PF, after C3 wanes, at about 60s also C4 becomes not negligible for ∼ 40s.The second part of the temporal domain is characterized by a slow decrease of C2 > C1.

Three locations distal from the initial pile
Figure 16 shows the dominance factors (Pi)i=1,...,4 in the three distal points L39, L43, and L46, all more than 2 km far from the initial pile.Plots 16a,b,c and L39 are dominated by RHS1.In all the models P1 is increasing and P1 > 90% at the end of the simulation.In MC, P1 shows a plateau at ∼ 40% in [90,2000]s preceded and followed by steep increases, while in the other models it rises gradually.P2 > 0 after ∼ 500s and 3600s, respectively, but is never greater than 2%.In MC P3 =∼ 10% at [50, 70]s.No-flow probability becomes zero in PF and VS, while stops at 20% in MC.Plots 16d,e,f are related to point L43, and are remarkably complex.In MC, either P1 and P2 are ∼ 35% in the first 200s.Then, P2 increases, and RHS2 becomes the only dominant variable after 3600s.The no-flow probability is never below 30%.P3 = 35% in [40, 60]s.Instead, in PF P1 > 90% until 3600s, and P2 rises only in the very last amount of time, reaching P2 = P1 = 40%.The no-flow probability is very low during the most of the temporal window, rising at 20% only at 7200s.Both P3 and P4 show short peaks, ∼ 10%, at [50, 60]s.In VS the no-flow probability is never below 20%, and the dominance factors are broadly equivalent to MC, although P1 is the greatest up to 4000s, and P3 ≡ 0. Plots 16g,h,i are related to point L46 and they are similar to those recorded at point L17, but P2 > 90% and the no-flow probability decreases to zero in the second half of the simulation.Moreover, in all the models P1 does not show any initial peak and instead increases slowly, reaching P1 = 10% after more than 3600s.Figure 17 shows the expected contributions in the distal points.All the plots are dominated by C1 and C2, and the remarkable differences observed in the dominance factors depend on which contribution is the greatest.In general these two contributions have similar profiles.Plots 17a,b,c are related to point L39 and C1 > C2.In MC, also is C3 > 0 for a short time.In MC and PF also C4 > 0, but it is significantly lower than the previous contributions, almost negligible in MC.Plots 17d,e,f concern point L43.In MC C1 < C2, in PF C1 > C2, in VS they C1 decreases and crosses C2 at ∼ 3600s.The two contributions form a plateau in MC, in [90,200]s.In MC and PF C3 > 0 for a few seconds, and also C4 > 0 with an initial spike at ∼ 60s.In PF it shows a long lasting plateau, while becomes negligible in MC.Plots 17g,h,i are related to point L46.In all the models C1 < C2, and these force contributions are monotone increasing.Only in MC C3 > 0 shortly, and C4 > 0, but almost negligible.

Flow extent and spatial integrals
Figure 18 shows the volumetric average of speed and Froude Number.It also shows the inundated area as a function of time.Like in Fig. 6, spatial averages and inundated area have smoother plots than local measurements, and most of the details observed in local measurements are not easy to discern.In plot 18a the speed shows a bell-shaped profile in all the models, but whereas the all values were close in the inclined plane experiment, in this case the maximum speed is ∼ 60m/s in MC, ∼ 50m/s in PF, ∼ 20m/s in VS, on average.Uncertainty is ±18m/s in MC, similar, but skewed, in VS, and ±10m/s in PF.In plot 18b, the F r profile is very similar to the speed, but the difference between VS and the other models is accentuated.Maximum values are ∼ 50 in MC, ∼ 38 in PF, ∼ 5 in VS, whereas uncertainty is ±10 in MC, ±7 in PF, and skewed [−5, +10] in VS.In plot 18c, inundated area has a first peak in MC and PF, both at ∼ 1.15km 2 , followed by a decrease to 0.55km 2 and 0.7km 2 ,respectively, and then a slower increase up to a flat plateau at 0.9km 2 and 1.5km 2 , respectively.Uncertainty is ∼ ±0.2km 2 in both MC and PF until ∼ 100s, and then it increases at ±0.3km 2 and [−0.5, +0.4]km 2 , respectively.In MC this increase in uncertainty is concentrated at ∼ 100s, while it is more gradual in PF.VS has a different profile.The initial peak is only significant in the 95 th percentile values, and occurs later, at ∼ 100s.The peak is of ∼ 1km 2 on the average, but up to ∼ 1.8km 2 in the 95 th percentile.The decrease after the peak is very slow and the average inundated area never goes below 0.85km 2 , and eventually reaches back to ∼ 1km 2 .Uncertainty is [−0.3, +0.2]km 2 .

Power integrals
Figure 19 shows the spatial sum of the powers.The estimates in this section assume ρ = 1800kg/m 3 as a constant scaling factor.Corresponding plots of the force terms are included in Supporting Information S6.The scalar product of force with velocity imposes the bell-shaped profile already observed in Fig. 7a.In plot 19a the power of RHS1 starts from zero and rises up to ∼ 1.4e11W in MC, ∼ 1.2e11W in PF, ∼ 6.5e10W in VS.Uncertainty is ±4e10W in MC, ±3e10W in PF, [−4e10, +5e10]W in VS.The decrease of gravitational power is related to the slope reduction, and this decrease is more gradual in VS than in the other models.In plot 19b the power of RHS2 is always negative and peaks to ∼ −6.5e10W in MC, ∼ −5e10W in PF, ∼ −2e10W in VS.In VS this dissipative power is significantly more flat than in the other models.MC and PF show negligible powers after ∼ 100s, VS after ∼ 200s.Uncertainty is ±2e10W in MC, ±1.5e10W in PF, [−2e10, +1e10]W in VS.In PF, the plot starts from stronger values than in the other models, but it is also the faster to wane.In plot 19c the power of RHS3 shows a negative peak at ∼ −7e10W in MC, ∼ −4.5e10W in PF, ∼ −5e9W in VS.Uncertainty on the peak value is [−4.5e10,+3.5e10]W in MC, [−2.5e10, +2e10]W in PF, [−1e10, +5e9]W in VS.The three models all show a bell-shaped profile, MC and PF waning to zero at 90s, VS at ∼ 30s.In plot 19d the power of RHS4 has a different meaning in the three models.In MC it is the internal friction term, and it only has almost negligible ripple visible in the first second.In PF it is a depth averaged correction in the hydrostatic pressure, and has an almost negligible effect only during the first second of simulation, at 5e9W .It becomes null at ∼ 10s.In VS, instead, it is a speed dependent term, and has a very relevant effect.The plot shows a bell-shaped profile, with a peak of ∼ −3.5e10W , [−2e10, +1e10]W .After that, this dissipative power gradually decreases, and becomes negligible at 200s.6 Discussion on the comparative performance of geophysical flow models and contributing variables The statistically driven method introduced in this study for analyzing complex models provides extensive and quantitative information.Geophysical flow modeling usually compares simulation to observation, and fits the model parameters coping with inverse problems.Nevertheless, this is not always sufficient to solve forecasting problems, in which the range of possible flows might not be limited to a single type and scale of flow.Our approach is different, and evaluates the statistics of a range of flows, produced by the couple (M, PM ) -i.e. a model M and a probability distribution for its parameters PM .
New quantitative information can solve classical qualitative problems, either model-model or modelobservation comparison.The mean plot represents the average behavior of flows in the considered range, and provides the same type of data that is provided by a single simulation.Moreover, the uncertainty ranges stem additional pieces of information that often exalt the differences between the models.
In the following, we summarize first the general features which differ between the model outputs.This is a traditional type of analysis, but performed by us in a new probabilistic framework that is likely more useful for extrapolating and forecasting.After that, we describe and compare the general features of contributing variables.This is a new type of analysis enabled by our approach and allows us to evaluate modeling assumptions and their relative importance.Finally, we give an example of model performance evaluation of the couple (M, PM ) according to a specific observation.

Characteristic features and their motivations
The different features in the models macroscopically affect the shape of the inundated area, both in the small flow on the inclined plane, and in the large scale geophysical flow.In Fig. 1b the flow runout displays a larger lateral extent in PF, and a bent shape in VS -the lateral wings remain behind the central section of the flow.Also in Fig. 9, the three models look clearly different.Even if the maximum runout is matched between the models, MC displays a further distal spread before entering the ravines, PF shows a larger angle of lateral spread at the initiation pile, VS is less laterally extended and generally looks significantly channelized, and displays several not-inundated spots due to minor topographical coulées.

Flow height and acceleration
Flow height gives additional insight on the model features.In Fig. 2, MC is more distally stretched, but starts to deposit material earlier and closer to the initial pile compared to the other models.PF height is generally shorter, and displays a small temporal anticipation in its arrival at the sample points.These features are probably due to the hydrostatic correction term, which additionally pushes forward the material during the pile collapse.A linear cut in the flow height profile of PF is also observed when the flow thins on the slope.That is probably generated by the interpolation between the two basal friction angles as a function of flow height and speed.VS tends to be higher than the other models, if observed at the same instant, because of the reduced lateral spreading of material.
In Fig. 13 the flow height plots allow us to classify the local flow regimes according to their similarity with what observed in the four sample points of the small scale case study.There is a new feature which was not present in the small scale flow -VS is temporally stretched, and material arrives later and stays longer in all the sample points.This is a consequence of the speed dependent term reducing flow velocity.
Flow acceleration brings our analysis from the mechanics to the dynamics of the flow.In Fig. 3, at the point on the slope, MC has a flat plateau, while PF is linearly decreasing, and VS is bell shaped.These differences are a consequence of the assumptions behind the models -double bed friction angle in PF, and speed dependent term in VS.At the slope change point, VS and MC display a bimodal profile in the acceleration.This is not a statistical effect, and it is also observed in single simulations.The first maximum is when the head of the flow hits the ground, while the second maximum is when the accumulating material in the tail arrives there.In VS the maxima are equal, because the tail is not laterally spread and hits the ground compactly.In contrast, PF does not show such a second peak, due to the accentuated lateral spreading in the tail.On the flat runway, the average acceleration in PF is lower than in the other models, probably because the material stops more suddenly.

Statistical analysis of contributing variables
The new concepts of dominance factors and expected contributions are adopted in the study of the contributing variables.We focus on the force decomposition, but a similar procedure can be applied to any set of contributing variables.The dominance factors illustrate the predominant force term through time, while the expected contributions provide a full comparison of the mean force terms.
In the inclined plane case study, Fig. 4 shows minor differences in the dominance factors between the models.In general, there is always a single dominant variable, and its profile is complementary with the no-flow probability.In the slope change point the differences between the models are more significant.Curvature term dominance probability is bimodal in MC and VS, and in VS the two peaks are equivalent.On the flat runway, in PF the hydrostatic correction can be the dominant variable with a small chance.In Fig. 5 the expected contributions illustrate the relative scale of all the forces, and show that the second strongest force is never above 60% of the dominant.
In the Volcán de Colima case study, the dominance factors and expected contributions are more complex.In the proximal points to the initiation, Fig. 14, gravitational force is dominant with a very high chance until the no-flow probability becomes large.In MC and PF curvature related forces can also be dominant for a short time.In VS gravitational force is dominant for a larger time span than in the other models, because of the longer presence of the flow.The speed dependent friction can be dominant with a small probability at the beginning of the dynamics.The expected contributions in Fig. 15 confirm these features.
In the distal points, 16, the gravity or the basal friction are dominant with high probability.Some of the points have a deposit at the end with a high chance, some other not, depending on the slope.In general, in MC the no-flow probability tends to be larger than in the other models, because some flow samples stops earlier, or completely leaves the site.Again, curvature can have a small chance to be dominant in MC and PF, particularly when the speed is high.The expected contributions, Fig. 17, show that the remarkable diversity between the different locations is a consequence of small unbalance between gravity and basal friction.
Point L43 is particularly interesting.It is not proximal to the initiation, but the no-flow probability is increasing at the end, meaning that all the material tends to leave the site.Moreover, the dominating force can be the gravity or the basal friction depending on the time and the model.In MC and VS both the two forces have similar chances to be dominant for the most of the time of the simulation.In PF, only the gravitational force is dominant with a high chance.This is probably because point L43 is situated downhill of a place where a significant amount of material stops.

Flow extent and spatial integrals
In Fig. 6 the spatially averaged speed and Froude Number are significantly similar between the models.The differing features appear to be mostly localized in space.However, VS is confirmed to be significantly slower than the other models after the initial collapse.Moreover, it is the only model which presents a significant amount of long lasting and slowly moving material.Inundated area in PF has a greater maximum value, because of the accentuated lateral spread.In VS the inundated area almost does not decrease from its peak, because of the strictly increasing lateral spread.Vice versa, lateral spread in MC and PF has a temporary stop when the bulk of the flow hits the ground.This is a consequence of the interplay of accumulating material and the push of new material, which is stronger in the middle than in the lateral wings.In PF lateral extent weakly decreases for a short time.The hydrostatic correction term may generate the pull reducing the lateral extent.In Fig. 18 average speed and Froude Number display that VS slower speed is not a local feature.Inundated area is again significantly larger in PF than in the other models.

Power integrals
Figure 7 and 19 produce a clear explanation of the source (gravity) and the dissipation of power.In both cases the dissipation lags the source (gravity) slightly as expected and additional terms RHS4 only has a major impact in VS.Similarly the curvature based term RHS2 has a minimal impact on VS.
Spatial integrals of power terms have several features in common with the corresponding forces, and provide a decomposition of the acceleration sources.Main dissimilarity between forces and powers is that gravitational and basal friction powers have a profile starting from zero when the flow initiates, because the flow speed starts from zero.In Fig. 7 the differences between models are particularly relevant in term RHS4.Speed dependent power in VS is at least one order of magnitude larger than the maximum values of the corresponding terms in MC and PF.Those are decreasing to zero after a short time from the initiation.Hydrostatic correction in PF is clearly positive in the speed direction, and hence contributing to push the flow ahead.The effects of internal friction in MC are almost negligible, and initially positive, then negative.This is motivated by an initial compression of the material during the pile collapse, followed by its stretching.It is worth remarking that RHS2 and RHS3 are both smaller in VS, due to the lower basal friction angles involved.In Fig. 19, the differences are accentuated, because of the topography complexity.Gravity term is larger in VS, because a portion of the flow lingers on the higher slopes for a long time.Basal friction has a higher peak in PF compared to the other models, due to the interpolation of the two basal friction angles.

Example of model performance calculation
Analysis of model suitability have been conducted.In past work (Patra et al., 2005), MC rheology was tuned to match deposits for known block and ash flows, but a priori predictive ability was limited by inability to tune without knowledge of flow character.The new procedure developed in this study enables an enhanced quantification of model performance, i.e. the similarity of the outputs and real data.We remark that the measured performance refers to the couple (M, PM ), and that different parameter ranges can produce different performances.Our example concerns the Volcán de Colima case study, and in particular we compare the inundated region in our simulations to the deposit of a BAF occurred 16 April 1991 (Saucedo et al., 2004;Rupp, 2004;Rupp et al., 2006).The inundated region is defined as the points in which the maximum flow height H is greater than 10cm.A similar procedure may be applied to any observable variable produced by the models, if specific data become available.Let M : P(R 2 ) → [0, 1] be a similarity index defined on the subsets of the real plane.An equivalent definition can be based on the pseudo-metric 1 − M. For example, we define where S ⊂ R 2 is the inundated region, and D ⊂ R 2 is the recorded deposit.In particular, MI is the area of the intersection of inundated region and deposit over the area of the deposit, MU is the area of the deposit over the area of the union of inundated region and deposit, J is the product of the previous, also called Jaccard Index (Jaccard, 1901).
Figure 20 shows the probability distribution of the similarity indices, according to the uniform probability PM on the parameter ranges defined in this study.Different metrics can produce different performance estimates, for example MC inundates most of the deposit, but overestimates the inundated region, while VS relatively reduces the inundated region outside of the deposit boundary, but also leaves several not-inundated spots inside it.

Conclusions
In this study, we have introduced a new statistically driven method for analyzing complex models and their constituents.We have used three different models arising from different rheology assumptions in geophysical mass flows to illustrate the approach.The data shows unambiguously the performance of the models across a wide range of possible flow regimes and topographies.The analysis of contributing variables is particularly illustrative of the impact of modeling assumptions.Knowledge of which assumptions dominate, and, by how much, will allow us to construct efficient models for desired inputs.Such model composition is the subject of ongoing and future work, with the purpose to bypass the search for a unique best model, and yet to go beyond a simple mixture of alternative models.In summary, our new method enabled us to break down the effects of the different physical assumptions in the dynamics, providing an improved understanding of what characterizes each model.The procedure was applied to two different case studies: a small scale inclined plane with a flat runway, and a large scale DEM on the SW slope of Volcán de Colima (MX).In particular, we presented: • a short review of the assumptions characterizing three commonly used rheologies of Mohr-Coulomb, Poliquenne-Forterre, Voellmy-Salm.This included a qualitative list of such assumptions, and a breaking down of the different terms in their differential equations.
• an new statistical framework, processing the mean and the uncertainty range of either observable or contributing variables in the simulation.The new concepts of dominance factors and expected contributions enabled a simplified description of the local dynamics.These quantities were analyzed at selected sites, and spatial integrals were also performed, illustrating the characteristics of the entire flow.
• a final discussion, explaining all the observed features in the results on the light of the known physical assumptions of the models, and the evolving flow regime in space and time.This included an example of model performance estimation method, depending on the metric and the cost function adopted.
Our statistical analysis based on UQ depicted the following main features: • Compared to a classical MC model, PF lacks of internal friction and this produces an accentuated lateral spread.This is increased by the hydrostatic correction, which briefly pushes the flow ahead and laterally during the initial collapse.That force can also have some minor effects in the final deposit accumulation.The interpolation of the smaller bed friction angle φ1 with the larger value φ2, suddenly stops the flow if it thin compared to its speed.This mechanism does not allow for large speed peaks.• Instead in VS, the speed dependent friction has a great effect in reducing lateral spread and producing channeling features even due to minor ridges in topography.The flow tends to be significantly slower and more stretched in the slope direction.The effects of the different formulation of the curvature term are less impacting than the effects of the lower basal friction and speed.
Additional research concerning other case studies, and different parameter ranges, might reveal other flow regimes, and hence differences in the consequences of the modeling assumptions under new circumstances.
For each column of Q we are replacing the λs r−1 elements with entry k by a random permutation of (k − 1)λs r−1 + h h∈1,...,λs r−1 .After the replacement procedure is done, the newly obtained matrix U is equivalent to a LHS which inherits from Q the property of fully covering s r equal r-dimensional hypercubes in every r-dimensional projection.Each hypercube contains λ points.In other words, inside each r-dimensional projection, the design associated to U fills the space like a regular grid at the scale of those s r hypercubes, but it is still an LHS at a finer scale, i.e. the λs r−1 one dimensional bins.A complete proof can be found in (Tang, 1993) and it is a straightforward verification of the required properties.
However, even in an LHS based on an OA(n, m, s, r), if r < m what happens in the projections with dimension r > r is not controlled, and randomizing procedures are made more difficult by the additional structure imposed by the OA.Moreover, the total number of points necessary to achieve a full design increases with r, and hence is affected by dimensionality issues.
Dealing with relatively small d, i.e. d ∈ {3, 4}, we adopt a LHS U created by a OA(s d , d, s, d).The strength is equal to the dimension d, hence the design fills the entire space like a d dimensional grid, but it is a LHS as well.In this case there is one point in each hypercube, and λ = 1.We take s = 8 for the 3-dimensional designs over the parameter space of Mohr-Coulomb and Voellmy-Salm models, i.e. 512 points; we took s = 6 for the 4-dimensional designs over the more complex parameter space of the Pouliquen-Forterre model, i.e. 1296 points.

B Conditional decomposition of expected contributions
Expected contributions are obtained after diving the force terms by the dominant variable Φ, which is an unknown quantity depending on time, location, and input parameters.Thus we provide an additional result, further explaining the meaning of those contributions through the conditional expectation.
Proposition 6 Let (Fi)i∈I be random variables on (Ω, F, P ).For each i, let Ci be the random contribution of Fi.Then we have the following expression: where pj := P {Φ = |Fj|}.
Proof.Let Z be a discrete random variable such that, for each j ∈ N, (Z = j) ⇐⇒ (Φ = |Fj|).Then, by the rule of chain expectation: Moreover, by definition, pj = P {Z = j}.This completes the proof.Consequently, we define the conditional contributions.

Figure 1 :
Figure 1: (a) Inclined plane overview, including samples sites (red stars).Pile location is marked by a blue dot.(b) Contours of h = 1.0 mm at last simulated snapshot (t = 1.5 sec) for simulated flows with minimum runout obtained from min.volume -max.resistance, and maximum runout obtained from max. volume -min.resistance.-: MC, -: PF, -: VS.

Figure 2 :
Figure 2: Flow height in four locations.Bold line is mean value, dashed/dotted lines are 5 th and 95 th percentile bounds.Different models are displayed with different colors.Plots are at different scale, for simplifying exposition.

Figure 3 :
Figure 3: Flow acceleration modulus in four locations.Bold line is mean value, dashed lines are 5 th and 95 th percentile bounds.Different models are displayed with different colors.Plots are at different scale.

Figure 4
Figure4shows the dominance factors (Pi)i=1,...,4, obtained from the modulus of the RHS terms in the slope direction.

Figure 4 :
Figure 4: Dominance factors of RHS forces on the slope direction in four locations.Different models are plotted separately: (a,d,g,j) assume MC; (b,e,h,k) assume PF; (c,f,i,l) assume VS.Different colors correspond to different force terms.No-flow probability is also displayed with a green dashed line.

Figure 5 :
Figure 5: Expected contributions of RHS forces on the slope direction in four locations.Different models are plotted separately: (a,d,g,j) assume MC; (b,e,h,k) assume PF; (c,f,i,l) assume VS.Different colors correspond to different force terms.

Figure 5
Figure5shows E[Ci]i=1,...,4, for the three rheology models.The plots 5a,b,c are related again to point L1.C1 and C2 give the major contributions, with a minor contribution from C4 in VS.Contributions profiles are flat plateaus that start to wane after 0.4s.The plots 5d,e,f are related to point L2.The major contributions are C1 and C2, and have a trapezoidal profile.In VS, C4 resembles C2, but it is bimodal instead than trapezoidal.The plots 5g,h,i are related to point L3.C1 and C2 are still the largest, but their profiles are bell-shaped.In VS, C4 is almost equal to C2.In all the models, C3 becomes significant, with a peak similar to C2.It shows different profiles -triangular for MC and PF, bimodal for VS.In MC the decrease occurs in two stages.Due to the presence of deposit, all the contributions are small (particularly small in PF), but not zero at the ending time.The plots 5j,k,l are related to point L4.Only C2 has a major role, with a bell shaped profile faster to wax than to wane.Contribution C4 has a minor role in VS and PF.

Figure 6 :
Figure 6: Comparison between spatial averages of (a) flow speed, and (b) Froude Number in addition to the flow (c) lateral extent, and (d) inundated area, as a function of time.Different models are displayed with different colors.

Figure 7 :
Figure 7: Spatial integral of the RHS powers.Bold line is mean value, dashed lines are 5 th and 95 th percentile bounds.Different models are displayed with different colors.

Figure 8 :
Figure 8: (a) Volcán de Colima (México) overview, including 51 numbered locations (stars) and major ravines.Initial pile is marked by a blue dot.Coordinates are in UTM zone 13N.(b) Regional geology map.(c) Digital elevation map.Six preferred locations are colored in yellow.Elevation isolines are displayed in blue, elevation values in red.

Figure
Figure 10d,e,f,g,h,i show examples of the contributions obtained assuming parameter values at the extremes of their range.

Figure 10 :
Figure 10: (a,b,c) Example of LHS design, Volcán de Colima case study.Colored stars mark the values producing the minimum and maximum friction.The parameter values are projected with respect to their volume value.Plots of random contributions are included for (d,e) MC, (f,g) PF, and (h,i) VS model.Each plot includes two graphs.One refers to a proximal location to the initial pile (left), and the other to a more distal location (right).Point numbers refer to Fig.12.Different colors correspond to different force terms.

Figure 11 :
Figure 11: MC model, mean flow height h(L, t) in 51 numbered locations (Fig. 8).Different plots have different scales on either time and space axes.

Figure 11
Figure11shows the mean flow height, h(L, t), at the 51 spatial locations of interest, according to MC. Plots of the Froude Number at the 51 points are included in Supporting Information S3.In plot 11a, the only location is set on the center of the initial pile, and the profile is similar to what observed in point L1 of the inclined plane case study, in Fig.2a.In this case the height decreases from the initial value to zero in ∼ 15s.In plots 11b,c,d,e, the locations are are set at less than ∼ 1 km radius from the initial pile.Their profiles are similar to point L2 in Fig.2b.The height profile is bell-shaped, starting from zero and then waning back to zero in ∼ 20 s.All the dynamics occurs during the first minute.In plots 11f,g,h,i,j, points are set where the slope reduces, and the flow can channelize, and typically

Figure 12 :
Figure 12: Volcán de Colima (México) overview, including six numbered locations (stars).In (a), (b), (c), (d) are enlarged the proximal topographic features to those locations.Initial pile is marked by a blue dot.Reported coordinates are in UTM zone 13N.Elevation isolines at every 10m are displayed in black, at every 100m in bold blue.Elevation values in red.

Figure 13
Figure13shows the flow height, h(L, t), at the points (Li)i=8,10,17,39,43,46.Froude Number and flow acceleration are included in Supporting Information S4 and S5.Distances from the initial pile are in vertical projection.In plots 13a,b, we show the flow height in points L8 and L10, ∼ 200m and ∼ 500m from the initial pile, respectively.Models MC and PF display similar profiles, positive for less than 15s and bell-shaped.VS requires a significantly longer time to decrease, particularly in point L10, where the average flow height is still positive after ∼ 200s.Peak average values in L8 are 3.4m in P F , 4.3m in MC, 4.7m in VS.Uncertainty is about ±2m, halved on the lower side in M C, and P F .In L10, models MC and PF are very similar, with peak height at 1.4m and uncertainty ±0.5m.Model VS, in contrast, has a maximum height of 1.1m lasting for 50s, and 95 th percentile reaching 3.7m.

Figure 13 :
Figure 13: Flow height in six locations.Bold line is mean value, dashed/dotted lines are 5 th and 95 th percentile bounds.Different models are displayed with different colors.Plots are at different scale, for simplifying lecture.

Figure 14 :
Figure 14: Dominance factors of RHS forces in three locations in the first km of runout.Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS.Different colors correspond to different force terms.No-flow probability is displayed with a green dashed line.

Figure 15
Figure15shows the corresponding expected contributions E[Ci]i=1,...,4.The contributions in points L8 and L10 are shown in 15a,b,c and 15d,e,f, respectively.The plots related to the same model are similar.In all the models C1 is significantly larger than C2 and C3, which are almost equivalent in MC and PF, while C2 > C3 in VS.C4 always gives a negligible contribution, except in VS, where it is comparable to C2.In L8, following PF, C3 is bimodal, whereas it is unimodal in MC and VS.This is not observed in L10.In L8, C3 is greater than in L10, compared to the other forces.VS always shows a slower decrease of the plots.In plots 15g,h,i are shown the expected contributions in L17.The plots are split in two sub-frames at different time scale.Initial dynamics is dominated by C2, except for in MC, and only for a short time,[30, 35]s.In MC there is an initial peak of C2 which is not observed in the other models.C3 has a significant size, in MC and PF, and unimodal profile.In PF, after C3 wanes, at about 60s also C4 becomes not negligible for ∼ 40s.The second part of the temporal domain is characterized by a slow decrease of C2 > C1.

Figure 15 :
Figure 15: Expected contributions of RHS forces in three locations in the first km of runout.Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS.Different colors correspond to different force terms.

Figure 16 :
Figure 16: Dominance factors of RHS forces in three locations after 2 km of runout.Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS.Different colors correspond to different force terms.No-flow probability is displayed with a green dashed line.

Figure 17 :
Figure 17: Expected contributions of RHS forces in three locations after 2 km of runout.Different models are plotted separately: (a,d,g) assume MC; (b,e,h) assume PF; (c,f,i) assume VS.Different colors correspond to different force terms.

Figure 18 :
Figure 18: Comparison between spatial averages of (a) flow speed, and (b) Froude Number, in addition to the (c) inundated area, as a function of time.Bold line is mean value, dashed/dotted lines are 5 th and 95 th percentile bounds.Different models are displayed with different colors.

Figure 19 :
Figure 19: Spatial integrals of the RHS powers.Bold line is mean value, dashed lines are 5 th and 95 th percentile bounds.Model comparison on the mean value is also displayed.Different models are displayed with different colors.

Figure 20 :
Figure 20: Volcán de Colima.Pdf of the similarity index of inundated regions and a real BAF deposit.(a) is based on M I , (b) on M U , (c) on J .The models are MC (red), PF (blue), VS (black).Data histograms are displayed in the background.Global 5 th and 95 th percentile values are indicated with dashed lines.Plots (d,e,f) display the average inundated region {x : E[H(x] > 10cm}.The boundary of the real deposit is marked with a colored line. Let g : [a, b] → [0, 1] be a score function defined over the percentile range of the similarity index.The global 5 th and 95 th percentile values[a, b]  are defined assuming to select the model randomly with equal chance, and are also shown in Fig.20a,b,c.Then the performance score Gg of model (M, PM ) is defined as:Gg (M, PM ) = [a,b] g(x)dfM (x),where fM is the pdf related to the model.Possible score functions include a step function at the global median, a linear or quadratic function, a sigmoid function.Supporting Information S7 reports a Table of alternative performance scores, according to changing similarity indices and score functions.