The relative effectiveness of empirical and physical models for simulating the dense undercurrent of pyroclastic flows under different emplacement conditions

High concentration pyroclastic density currents (PDCs) are hot avalanches of volcanic rock and gas and are among the most destructive volcanic hazards due to their speed and mobility. Mitigating the risk associated with these ﬂows depends upon accurate forecasting of possible impacted areas, often using empirical or physical models. TITAN2D, VolcFlow, LAHARZ, and ∆ H/L or energy cone models each employ different rheologies or empirical relationships and therefore differ in appropriateness of application for different types of mass ﬂows and topographic environments. This work seeks to test different statistically- and physically-based models against a range of PDCs of different unchannelized. The simple rheological models of TITAN2D and VolcFlow are not able to recreate all features of these more complex ﬂows. LAHARZ is fast to run and can give a rough approximation of inundation, but may not be appropriate for all PDCs and the designation of starting locations is difﬁcult. The ∆ H/L cone model is also very quick to run and gives reasonable approximations of runout distance, but does not inherently model ﬂow channelization or directionality and thus unrealistically covers all interﬂuves. Empirically-based models like LAHARZ and ∆ H/L cones can be quick, ﬁrst-approximations of ﬂow runout, provided a database of similar ﬂows, e.g., FlowDat, is available to properly calculate coefﬁcients or ∆ H/L . For hazard assessment purposes, geophysical models like TITAN2D and VolcFlow can be useful for producing both scenario-based or probabilistic hazard maps, but must be run many times with varying input parameters. LAHARZ and ∆ H/L cones can be used to produce simple modeling-based hazard maps when run with a variety of input volumes, but do not explicitly consider the probability of occurrence of different volumes. For forward modeling purposes, the ability to derive potential input parameters from global or local databases is crucial, though important input parameters for VolcFlow cannot be empirically estimated. Not only does this work provide a useful comparison of the operational aspects and behavior of various models for hazard assessment, but it also enriches conceptual understanding of the dynamics of the PDCs themselves.


INTRODUCTION Pyroclastic Density Currents
High concentration pyroclastic density currents (PDCs) are hot avalanches of volcanic rock and gas which, due to their ability to travel great distances at high speeds, are among the most destructive volcanic hazards. The transport of these flows has been a topic of much debate and research (e.g., Sparks, 1976;Francis and Baker, 1977;Dade and Huppert, 1998;Burgisser and Bergantz, 2002, etc.). They are conceptualized as complex dense-dilute coupled flows (Denlinger, 1987;Valentine, 1987;Burgisser and Bergantz, 2002;Sulpizio et al., 2014), with a dense, coarse-grained, basal undercurrent overridden by a dilute, turbulent cloud of ash and gas, called a surge, an ash cloud, or herein, an ash-cloud surge, which can detach from the basal flow and travel in unexpected directions (e.g., Yamamoto et al., 1993;Fisher, 1995;Bourdier and Abdurachman, 2001;Loughlin et al., 2002;Ogburn et al., 2014). An intermediate zone has also been proposed (Ishida et al., 1980;Denlinger, 1987;Fujii and Nakada, 1999) and observed in large scale experiments (Breard et al., 2016;Breard and Lube, 2017). Other conceptual models posit that end-members exist between forced-convection and inertia-dominated flows (Doronzo, 2012). Complexity, unsteadiness, and non-uniformity of PDCs also arise from variability of the collapse source through time (e.g., Mellors et al., 1988;Sato et al., 1992;Calder et al., 2002;Cole et al., 2002;Woods et al., 2002); the interaction of the flows with complex topography (e.g., Stinton, 2007;Charbonnier and Gertisser, 2012;Doronzo et al., 2012;Ogburn et al., 2014); erosion and sedimentation processes (e.g., Cole et al., 2002;Doronzo, 2012;Sulpizio et al., 2014); among other factors. Additionally, the rheology of PDCs changes through space and time (e.g., Iverson and Vallance, 2001;Dartevelle, 2004;Bursik et al., 2005;Iverson and George, 2012;Sulpizio et al., 2014;Dufek, 2016) due to the many of the factors that contribute to the unsteadiness of the flows, as listed previously. The complexity and limited understanding of many aspects of the physics of PDCs make predicting the behavior of these flows using empirical or physical models challenging, as every model must make simplifications. Determining which of these simplified models are useful for simulating different volume PDCs, in different environments, and with different operational constraints remains a problem.

Model Validation and Benchmarking
Mitigation of risk associated with PDCs depends upon accurate forecasting of possible flow paths or inundation areas, often using the distribution of past deposits, empirical models using various mobility metrics, or the application of geophysical flow models. Different models vary in their ability to reproduce the runout, inundated area, velocity, and local emplacement features of natural PDCs. Geophysical mass-flow models use simplified rheological laws that attempt to capture the bulk flow behavior in both time and space (e.g., TITAN2D, Pitman et al., 2003;Patra et al., 2005;VolcFlow, Kelfoun and Druitt, 2005;Kelfoun et al., 2009). Alternatively, empirical models use statistical relationships between flow volume and runout or area in order to forecast inundation (e.g., LAHARZ, Iverson et al., 1998;energy cone, Sheridan, 1979;Malin and Sheridan, 1982). While these popular models have been used to reproduce a variety of specific PDCs at different volcanoes (e.g., Widiwijayanti et al., 2008;Charbonnier and Gertisser, 2009;Kelfoun et al., 2009;Tierz et al., 2016), little work has been undertaken to compare models in a systematic way. The work that does exist compares a few models to a single, well-studied PDC (e.g., Charbonnier and Gertisser, 2012;Charbonnier et al., 2013).
This work expands upon single-deposit comparisons of models by validating and benchmarking popular models against each other to recreate all 13 mapped, dome-collapse PDCs that remained on land at Soufrière Hills Volcano (SHV), Montserrat, and were emplaced during the most recent eruption, with dates ranging from 1996 to 2010. Model validation is the comparison of model results to reality, while benchmarking involves the comparison of models that simulate the same physical processes using identical initial and boundary conditions. The purpose of this undertaking is not to rank or judge models, but rather to understand the extent to which the different modeling approaches can replicate reality in certain conditions, and to explore the dynamics of PDCs themselves (e.g., Charbonnier, 2013). Benchmarking exercises can inform users of the strengths and weaknesses of different models for their use in research and hazard management. In particular for the generation of hazard maps, model choices need to be defendable for each set of flow scenarios considered. Benchmarking can also constrain the circumstances under which each of these models performs best. The work presented herein is an example of benchmarking against natural cases, wherein the input parameters are estimated from any existing observations and field data and then adjusted in order to fit the inundation footprint of the actual PDC (Charbonnier, 2013). Importantly, this work also compares these best-fit, "tuned" model outputs and the input parameters used to achieve them with results and input parameters based on field measurements and empiricallyderived relationships (e.g., friction-volume relationship) from the global FlowDat database of mass flows (Ogburn, 2012(Ogburn, , 2014Ogburn et al., 2016). This is an important advance because it avoids only evaluating highly "tuned" model results, and instead strives to also compare models' ability to forecast inundation using field-or statistically-derived input parameters.

PDC Modeling at Soufrière Hills Volcano
PDCs at SHV have been modeled using a variety of tools, including TITAN2D (Widiwijayanti et al., 2007;Hidayat et al., 2008;Ogburn, 2008), LAHARZ/PFZ , PDAC (Esposti Ongaro et al., 2008), PYROFLOW (Wadge et al., 1998), and energy cones (Wadge and Isaacs, 1988). The Montserrat Volcano Observatory (MVO) and the Scientific Advisory Committee (SAC) have also used some of these models to simulate PDCs, usually for forward modeling and hazard management purposes, and multiple-model analysis of hazards. The SAC performed a limited comparison of these models for the simulation of the 8 January 2007 PDC (Scientific Advisory Committee, 2007), and found that PYROFLOW underestimated runout by 20%, while LAHARZ/PFZ and TITAN2D were able to replicate the runout. A follow-up study compared the performance of these models in replicating the 8 January 2010 and 11 February 2010 PDCs Scientific Advisory Committee, 2011;Wadge et al., 2014). The models produced relatively similar results, but were highly tuned (i.e., the input parameters were adjusted to produce the best outputs). Recent work has developed an advanced methodology for modeling PDCs, by using fast emulators of geophysical flow models in order to effectively simulate the ongoing hazard posed by PDCs on Montserrat as the situation at the volcano evolves (Bayarri et al., 2009;Spiller et al., 2014;Calder et al., submitted). Effectively combining multiple modeling techniques or the use of statistical emulators for hazard mapping requires that the limitations and strengths of each model be well understood.

SHV Deposits
The ongoing eruption (1995-present) at SHV is used as a case study throughout this work, due to the number and variety of PDC events, and the wealth of detailed studies of deposits and real-time observations of PDCs in motion. The eruption is characterized by the extrusion of a series of large lava domes and Vulcanian explosions. Volcanic mass flows have included dome-collapse block-and-ash flows (BAFs) Cole et al., 2002), column-collapse pumice flows Cole et al., 2002Cole et al., , 2014, ash-cloud surges Cole et al., 2002Cole et al., , 2014Ogburn et al., 2014), surge-derived PDCs Druitt et al., 2002), hydro-volcanic flows generated by PDCs interacting with seawater (Edmonds and Herd, 2005), debris avalanches , and frequent lahars Wadge et al., 2014). The eruption has produced many thousands of rockfalls and PDCs; however, there are 20 mapped, dome-collapse BAFs, ranging in volume from 0.2 to 210 × 10 6 m 3 , 13 of which remained on land. These 13 PDCs (Table 1) are modeled using TITAN2D, VolcFlow, LAHARZ, and ∆H/L cones. While modeling was performed using all 13 events, three main BAFs will serve as examples: • 5 June 1997 ( Figure 1A, Table 1) was a smaller-volume, discrete, channelized BAF with a minimal ash-cloud surge component . • 25 June 1997 ( Figure 1B, Table 1) was a larger-volume, sustained dome collapse, and the only PDC to cause fatalities at SHV. It was well documented by eyewitnesses, seismic signals, photographs and video footage . It consisted of three main flow pulses, successively overrunning the deposits of previous pulses and produced an extensive, detached ash-cloud surge. This PDC also produced a surge-derived flow (included in the figures herein to avoid distortion of the deposit maps), which exhibited a unique generation mechanism and different flow dynamics to the rest of the surge Loughlin et al., 2002). • 8 January 2010 ( Figure 1C, Table 1) was a medium-volume PDC, which was recorded in detail via HD and thermal video (Molle, 2012), and which entered the Belham Valley (Molle, 2012;Cole et al., 2014). This PDC was included despite being associated with a Vulcanian explosion because it exhibited characteristics that were transitional in nature between typical  . ‡ The 8 January 2010 flow entered multiple valleys, but the main 1.3 × 10 6 m 3 portion entered the BV.
pumice flows from Vulcanian fountain collapse and typical dome-collapse BAFs: while an eruption column reached 8.3 km, the PDC was largely sourced from the dome summit , possibly as a result of a small blast or boiling-over dome-collapse, and pumice clasts accounted for only 5 wt% by volume of the deposits. Furthermore, the morphology of the flow deposit resembled the BAFs from other dome-collapse events. This PDC also represents one of only three mapped flows to inundate the Belham Valley, which, from a hazards perspective, is especially important to investigate, as it is the only valley that directs PDCs toward currently inhabited areas.
Three other flows, 3 April 1996, 3 August 1997, and 21 September 1997, are examined more briefly and represent a wide range of emplacement characteristics . The 3 April 1996 PDC was a smaller-volume, discrete collapse, which propagated down the steepest valley on Montserrat, the Tar River Valley, with an average slope of >10 • . The 3 August 1997 PDC was a larger-volume dome-collapse, which traveled down the wide Fort Ghaut. This flow represents one of the only flows on Montserrat to have a significant portion of the flow that traveled unchannelized. The 21 September 1997 PDC was a larger-volume event, which mainly traveled down Tuitt's Ghaut, but with some volume also traveling down White's Ghaut, and thus represents the only flow with significant volume in multiple valleys. These factors (high slope, lack of channelization, and multiple valley emplacements) are useful conditions under which to test the behavior of the models.

MASS FLOW MODELS AND METHODS
Energy Cone or ∆H/L Models Early modeling of PDCs was based on the Heim coefficient, or ∆H/L ratio, where ∆H is the fall height traversed by a flow and L is the runout length, a mobility metric established by the debris avalanche community (Heim, 1932). ∆H/L is equivalent to the coefficient of friction (or basal friction angle) in a sliding-block or Coulomb friction model (Shreve, 1968;Scheidegger, 1973), in which shear stress at the initiation of failure is proportional to the normal stress. ∆H/L exhibits a roughly linear inverse relationship with PDC volume on a log-log plot (Hayashi and Self, 1992). To apply the concept to PDC runout forecasting, Sheridan (1979) used the ∆H/L vs. volume relationship to create an "energy line, " which intersects topography and predicts the stopping location of flows. Malin and Sheridan (1982) expanded this concept to an "energy-cone" and incorporated computer image-processing techniques to make simple volcanic hazard maps at Mount St. Helens. The energy-cone or ∆H/L cone concept has been used successfully at a variety of volcanoes (e.g., Sheridan and Malin, 1983;Sheridan and Macías, 1995;Alberico et al., 2002;Sheridan et al., 2004;Tierz et al., 2016), including SHV (Wadge and Isaacs, 1988). Validation tests of the energy cone model at Vesuvius and Campi Flegrei volcanoes (Italy) showed that the energy cone model performed as well as or better than other mass flow models (Tierz et al., 2016). An ∆H/L cone can be drawn by hand using nothing but a paper contour map, but this can be time consuming. Many software packages (Vhub Energy Cone, Palma, 2013; LAHARZ) have built-in tools to automatically generate ∆H/L cones over a digital elevation model (DEM).

TITAN2D
TITAN2D is a geophysical model that simulates dry granular mass-flows over DEMs of natural terrain in time and space. The model assumes that mass flows behave as shallow, incompressible flows . Following Hutter (1989, 1991), Iverson and Denlinger (2001) and Denlinger and Iverson (2001), TITAN2D employs depth-averaged shallowwater equations for the conservation of mass and momentum. The newest, two-phase, version of TTIAN2D (4.0.0) now allows the user to select from various material models (Mohr-Coulomb, Voellmy, or Pouliquen-Forterre). In this work, the Mohr-Coulomb friction terms are used for both the graingrain interactions within the flow (internal friction) and grainsurface interactions (basal friction). TITAN2D outputs flow depth, velocity, and momentum through time. These products can be used to map inundation area, run-up height, and velocity of the modeled flow using a GIS. A full description of the governing equations, and solution and computational methods employed by TITAN2D can be found elsewhere (Pitman et al., 2003;Patra et al., 2005). More information can also be found on the TITAN2D Users group (https://vhub.org/groups/ titan2dusers) and User's Guides (https://vhub.org/resources/ titan2d/supportingdocs).
Comparisons of TITAN2D with both experimental granular flows and real PDCs have shown success. Patra et al. (2005) and Bursik et al. (2005) performed extensive verification of the TITAN2D code, as well as validation against experimental sand flows and the 1991-1999 PDCs at Colima Volcano, Mexico. Since its development, TITAN2D has been used by numerous workers to simulate PDCs (Procter et al., 2004a;Bursik et al., 2005;Rupp et al., 2006;Widiwijayanti et al., 2007;Hidayat et al., 2008;Macías et al., 2008;Ogburn, 2008Ogburn, , 2014Charbonnier and Gertisser, 2009;Murcia et al., 2010;Charbonnier et al., 2013); debris avalanches Grieco and Capra, 2007;Groppelli et al., 2008); and lahars (Munoz et al., 2004;Procter et al., 2004b;Delaite et al., 2005;Darnell et al., 2008). However, most of these studies involve modeling PDCs in a forecasting capacity; only a few have validated the model against known deposits: at Colima Patra et al., 2005;Rupp et al., 2006) and Merapi, Indonesia (Charbonnier andGertisser, 2009, 2012;Charbonnier et al., 2013). Charbonnier and Gertisser (2009) used TITAN2D to simulate both discrete and sustained dome-collapse BAFs from the June 2006 eruption of Merapi, using a collapsing pile and a variable bed friction. The simulated results were comparable to the actual deposits in terms of runout distance, inundated area, and thickness. They also found that TITAN2D closely reproduced local features such as run-up over obstacles and super-elevation around bends. Most importantly for modeling hazards, TITAN2D predicted areas where the flows debouched from main channels and formed overbank deposits. Flow velocity and duration of runout, however, were not well replicated by TITAN2D.
TITAN2D can be run as a stand-alone Linux program or as a web-app through Vhub.org, using collapse-pile or volume flux source geometries. In this work, pile parameters are taken from observations of collapse scar location and shape, where available. No initial velocity is used, unless the particular flow had a recognized blast component. Volume flux sources are also used, with fluxes estimated from the deposit volume and any available information on active collapse duration.
In TITAN2D, the internal and basal friction angles must be supplied by the user. TITAN2D is fairly insensitive to changes in internal friction angle, as long as the value falls within a reasonable range (25-45 • ) Ogburn, 2008;Dalbey, 2009). Larger values provide resistance to lateral spreading. Basal friction, however, has a major effect on modeled flows (Stinton et al., 2004;Rupp et al., 2006). The ∆H/L values of the deposits themselves, converted to degrees, are used for the basal friction angles for deposit-derived model runs. Then, the basal friction parameter is varied in order to produce the best-fit results ( Table 1). Both the deposit-derived and best-fit input parameters are then compared to basal friction angles derived from empirical ∆H/L vs. volume relationships from FlowDat (i.e., Equations 1 and 2), which can be used for forward modeling.

LAHARZ and PFZ
Other researchers (Dade and Huppert, 1998;Iverson et al., 1998;Iverson, 2003;Kelfoun and Druitt, 2005) have disputed the appropriateness of the constant friction models described by ∆H/L ratios and used for energy cones and TITAN2D for simulating mass flows. Iverson et al. (1998) derived a simple physical scaling argument for the observed relationship between the area and volume of lahars, where planimetric area, A plan , is proportional to volume to the 2/3 power, V 2/3 .  applied this metric to PDCs at Montserrat. Schilling (1998) and Iverson et al. (1998) developed LAHARZ, a method of predicting lahar inundation zones that uses the A ∝ V 2/3 relationship directly using two equations with a physical basis derived from scaling analysis: where A plan is planimetric area, A xs is cross-sectional area, V is volume, and c and C are empirical calibrating coefficients derived from statistical analysis of field data. LAHARZ has been used mainly for simulating lahars, for which it was originally designed (Iverson et al., 1998). However, Griswold and Iverson (2007) performed the same analysis for debris flows and rockfalls and recalibrated the coefficients. LAHARZ is implemented through an ArcGIS toolbox, consisting of tools that create surface hydrology grids, define a proximal hazard zone boundary, and model the inundation area. The proximal-hazard zone boundary is created by specifying parameters for an ∆H/L energy cone. The user chooses the coefficients C and c (Equation 3) by selecting the type of mass flow (lahar, debris flow, or rock avalanche) they wish to model. In modified versions of LAHARZ, such as PFZ , and in this work, additional options have been added to this menu in order to model PDCs. Starting from either a user-selected upstream cell (usually the head of a valley), or the intersection of the stream and the proximal hazard zone boundary, LAHARZ fills in topography along the drainage, using the calculated cross-sectional area for a userspecified volume (Equation 3). The resultant planimetric area filled is added to a running tally, and LAHARZ uses the flow direction grid to move to the next downstream cell along the stream until the appropriate planimetric area is reached (from Equation 3). Widiwijayanti et al. (2008) collected planimetric area and cross-sectional area measurements for the dense-basal avalanche component of BAFs. They recalibrated coefficients for BAFs and for a subset of data from SHV, Montserrat, calling the modified code "PFZ." For start locations, they chose cells at what they deemed to be prominent breaks in slope of the valleys. In this way, they were able to recreate BAF deposit inundation areas at SHV (1996 and Merapi (2006 PDCs) with modest success. Lee et al. (2015) applied the same method to Merapi 2010 PDCs and achieved a 56% agreement between model results and actual inundation.
In this work, analysis of variance (ANOVA) was performed following Iverson et al. (1998) on parameters from the FlowDat mass flows database (Ogburn, 2012(Ogburn, , 2014 to test and calibrate the relationship between A plan and A xs and V 2/3 (Equation 3) for PDCs. In addition to the FlowDat coefficients for all global BAFs, coefficients were also calculated for several subsets for comparison ( Table 2). These subsets include BAFs from SHV only, BAFs < 2 × 10 6 m 3 in volume, and BAFs > 2 × 10 6 m 3 in volume. Coefficients used by Widiwijayanti et al. (2008) are also included for comparison. Coefficients from Table 2, and the exact coefficients for specific flows, are then used to run the model and these results are compared to produce best-fit results. Flows are started at the head of appropriate valleys, and at prominent breaks in slope, after Widiwijayanti et al. (2008). The best-fit results are compared to results using the globally-derived FlowDat coefficients.
While the relationship between planimetric area and volume of PDCs has been explored, it is important to note that measurements of the cross-sectional area A xs for PDCs are somewhat problematic. Unlike lahars or debris flows for which LAHARZ is intended, the thickness and therefore A xs of PDCs in motion is much more difficult to ascertain from the field, and PDC deposit thickness is poorly correlated with flow thickness. The appropriateness of using LAHARZ, an empirical (but physically based) topography-filling algorithm for PDCs is debatable, even if the resultant inundation footprints are similar to the real flows. Dade and Huppert (1998) derived an energy balance argument for the area and runout of debris flows that also used the TABLE 2 | Coefficients c and C for A plan and A xs vs. V 2/3 relationship calculated from the 2/3-slope regression model and ANOVA analysis of FlowDat data, compared to coefficients calculated by Widiwijayanti et al. (2008).

Dataset c (A plan ) C (A xs )
All A ∝ V 2/3 relationship, and argued that their analysis implied a constant resisting shear stress. Kelfoun and Druitt (2005), Kelfoun (2008), and Kelfoun et al. (2009) have argued that constant friction models are ill-suited for reproducing pyroclastic flows and instead built on the conceptual model of Dade and Huppert (1998) to create VolcFlow, a model that can use a constant retarding stress (CRS), which is independent of flow velocity or volume. Under this model, the ratio of the driving to retarding stresses is not constant as in a frictional model, but rather decreases with increasing thickness . Flows are able to move when thicker than a certain threshold as the driving stresses dominate; and when below this thickness threshold, CRS dominates and flows slow down . VolcFlow runs inside MATLAB and solves depth-averaged equations of mass and momentum using a topography-linked coordinate system in time and space, much like TITAN2D. The stress parameter in VolcFlow is not fixed to a frictional model, but instead can be used to simulate a variety of rheologies. Internal and basal friction, cohesion, viscosity, and coef_u2 (a dimensionless parameter that defines turbulent or collisional stress) form the basis of the rheological law chosen (frictional, viscous, Bingham, Voellmy, etc.) in the input files, which must be modified by the user. Cohesion is defined by the CRS in kPa. The CRS condition is a rheological law derived from the energy balance analysis of Dade and Huppert (1998) that states that the retarding stress is constant and independent of any other flow parameter (Kelfoun and Druitt, 2005;Kelfoun et al., 2009). Reasonable values for CRS were empirically determined by Kelfoun and Druitt (2005) and Kelfoun et al. (2009), and ranged from 5 to 50 kPa for PDCs, with larger-volume, thicker flows requiring higher CRS values . Kelfoun et al. (2009) could not provide a physical basis for this relationship however, and cited this as the main weakness of the VolcFlow CRS model. More information can also be found on the VolcFlow model webpage (Kelfoun, 2009; http://wwwobs.univ-bpclermont.fr/lmv/perso/ Kelfoun_Karim/VolcFlow/VolcFlow.html) VolcFlow has shown success in replicating experimental results, including shocks, rarefaction waves, and granular jumps for debris avalanche deposits using the CRS condition (Kelfoun and Druitt, 2005). Kelfoun and Druitt (2005) reproduced the inundated area, surface features, and thickness of the Socompa avalanche in Chile using VolcFlow with a CRS between 50 and 100 kPa. Kelfoun et al. (2009) also used VolcFlow (CRS 5 kPa) to replicate the inundated area, thickness, lobe shape, and velocity of the 2006 PDCs at Tungurahua volcano, Ecuador. These pyroclastic flows were sustained currents formed by the gravitational collapse of material being continuously deposited on the upper flanks of the volcano by volcanic fountaining , making these flows somewhat unusual in their source conditions. The emplacement of the VolcFlow simulations progressed as a series of pulses, as mass accumulated at the crater rim until reaching a critical thickness, at which point it began to flow, and then accumulate again , behavior not usually observed during other dome-collapse events. While VolcFlow has also been used to successfully replicate three PDCs from Merapi Volcano, (Charbonnier and Gertisser, 2012;Charbonnier et al., 2013), it has only been validated using a small sample of natural deposits.
In this work, VolcFlow is used to replicate the selected SHV PDCs using a CRS parameter and a bulk density estimate. CRS is estimated using SHV deposit parameters from FlowDat following Dade and Huppert (1998), where: where λ is a plan shape factor, equal to the ratio of the average width to the length of a flow; ρ is flow bulk density; g is gravity; H is the fall height traversed by a flow; V is volume; and A plan is planimetric area. CRS values calculated from deposit parameters are used as a starting point, and varied to produce best-fit simulation results. VolcFlow bulk density is roughly estimated, taking deposit bulk density and values used by Kelfoun et al. (2009) into account. Minimum, maximum, and average estimates of bulk density are used for different model runs, but only average estimates are discussed herein as the final results did not differ greatly.
VolcFlow outputs flow depth, velocity, and position of center of mass for each time step. The output files can be imported into GIS software for further analysis.

Digital Elevation Models
It is important that the DEM be as close to a "pre-mass flow" condition as possible when recreating past deposits, and as close to current topography when forward modeling, as changes to valley carrying capacity can have profound effects on modeled flows (Ogburn, 2014;Ogburn et al., 2014). Four DEMs were used for this work (Supplementary Table 1), depending on the date of the flow to be modeled. The dome areas of the DEMs were especially important for the TITAN2D, VolcFlow, and ∆H/L cone model runs starting from the dome summit.

Model Output Processing and Analysis
Two major post-processing considerations are required for TITAN2D and VolcFlow: (1) determining the stopping criteria and thus the stop time, and (2) creating a map that can be suitably compared to real flow deposit maps. Neither TITAN2D nor VolcFlow can stop a flow independent of a user defined maximum time limit, as the velocities will never reach zero. Instead, the user must decide upon reasonable criteria to consider the flows stopped. For this work, the stopping criteria of Yu et al. (2008) were applied in conjunction with other measures, including visual examination of maximum velocity plots. While the problem of unrealistically thin material at simulated flow margins propagating with high velocities in TITAN2D, i.e., the "thin layer problem, " has been largely reduced by Dalbey (2009), results still indicate high-velocity, unrealistically thin areas. Filtering the simulated thicknesses to exclude flows less than 10 cm eliminates these areas, but removes less than 5% of the total flow volume. This method of thickness filtering was employed by Charbonnier andGertisser (2009, 2012) and Charbonnier et al. (2013).
Goodness of fit between model runs and PDC deposits was determined using a combination of methods. Inundated area and runout were compared to PDC maps, along with specific features, such as areas where a flow over-spilled a channel. Velocity, flow thickness, and duration were also compared to field measurements and observations where available.
Not all of these parameters can be compared between models, or directly between models and actual flow parameters ( Table 3). For instance, although VolcFlow and TITAN2D are both time domain models, VolcFlow does not output total maximum velocities reached in each cell, while TITAN2D does. LAHARZ and ∆H/L only compute inundated area or runout limits respectively, with no velocity or thickness information. Both TITAN2D and VolcFlow produce a continuous record of maximum velocities at each time-step, but not flow front velocities. Flow front velocities are most commonly recorded for actual PDCs (e.g., Calder, 1999;Loughlin et al., 2002;Molle, 2012), and there can be significant differences between flow front and maximum velocities within a flow. Simpson (1999) reported that experimental flow front velocities were only 60% of maximum velocities reached by other parts of the flow. For these reasons, it is necessary to compare the models quantitatively wherever possible, but also to qualitatively examine circumstances under which each model performs well, or fails to reproduce the natural examples. It is also important to reiterate that these models only simulate the dense-basal avalanche component of the PDCs, and not the coupled, dilute, ash-cloud surge. Additionally, none of the models simulate deposition and thus do not produce deposit maps; final timestep inundation and thickness maps are compared with actual deposits in the figures.

MODEL COMPARISON RESULTS
To produce best-fit model runs, basal friction is varied in TITAN2D, CRS is varied in VolcFlow, and the coefficients C and c and starting locations are varied for LAHARZ. For the best-fit ∆H/L cones, the SHV-only ∆H/L vs. volume regression was used. Best-fit results are compared with "non-calibrated" deposit-derived results obtained using only observed PDC data as input parameters; namely, deposit-derived basal friction (∆H/L), deposit-derived CRS, globally-derived LAHARZ coefficients, and the global ∆H/L vs. volume regression. This allows a comparison of TITAN2D, VolcFlow, LAHARZ, and ∆H/L cones from both a fully calibrated or "best-fit" perspective, and from the perspective of a modeler using only past deposits (or empirically-derived global mobility relationships from FlowDat) to estimate model inputs. For TITAN2D and VolcFlow, best-fit, deposit-derived, and both global and local empirically-derived input parameters from FlowDat are later compared.
In the following figures, only TITAN2D flux sources are compared to the other models, although TITAN2D collapsingpile sources were also investigated. In general, simulations using TITAN2D with a collapsing pile often show considerable spread outside of the intended direction, with significant volumes entering other valleys. This is caused by the problem of the collapse pile geometry itself, as a pile of the appropriate volume can be quite large (both planimetrically and in height) relative to the dome summit. Tall pile sources also produce extremely high velocities on steep proximal slopes. These factors tend to produce decreased runout as compared to the actual deposit, as the volume is subsequently partitioned between multiple valleys. This issue poses significant problems when modeling smaller-volume flows, which are usually completely channelized in nature. This can be partially compensated for by slicing the summit topography in such a way as to direct the collapsing pile more readily (Ogburn, 2008), carefully selecting starting coordinates and initial directions through trial and error, or introducing a varying basal friction component for different slopes in order to slow the simulation on steep slopes (Charbonnier and Gertisser, 2012;Charbonnier et al., 2013). Alternatively, a volume flux starting geometry can be used instead, and is used here for the model comparisons in this work.

June 1997
Figure 3 and Table 4 shows the comparison between TITAN2D (Figures 3A,B), VolcFlow (Figures 3C,D), LAHARZ (Figures 3E,F), and ∆H/L cone (Figures 3G,H) model outputs of the 5 June 1997 event. The left column shows the best-fit results, while the right column shows the non-calibrated deposit-derived   (TITAN2D and VolcFlow) or globally-derived (LAHARZ and ∆H/L cone) results. All models were able to replicate the runout of the 5 June 1997 event reasonable well (<100 m error, within 3% for best-fit model runs). TITAN2D best-fit results ( Figure 3A) simulate some material entering White's Ghaut, and VolcFlow best-fit results ( Figure 3C) simulate material that travels down the Tar River Valley to the east, and Mosquito Ghaut to the northwest, which was not observed in reality. TITAN2D fails to model any final thickness in the proximal area of the channel, but simulated flows did travel through this area. Conversely, the final time-step of VolcFlow unrealistically shows material on the slopes of the dome. VolcFlow final time-step thicknesses do a better job in replicating the morphology of the deposit front, with a more sharply defined, steeper margin; while TITAN2D final time-step thicknesses show more gradual tapering than was seen in the field. Deposit-derived TITAN2D input parameters underestimate the runout of the flow ( Figure 3B) by 380 m (12%), while deposit-derived VolcFlow input parameters severely underestimate runout by 1.43 km (45%). LAHARZ also replicates this flow well (runout within 100 m, or 3%), using the values for Montserrat PDCs ( Table 2) as calculated by Widiwijayanti et al. (2008) (Figure 3E), while globally-derived FlowDat input coefficients slightly overestimate runout ( Figure 3F) by 260 m (8%). The SHV-only ∆H/L cone estimates runout within 100 m (3%) (Figure 3G), while the globally-derived ∆H/L cone underestimates flow runout by 650 m (20%) (Figure 3H). Plots of maximum velocities from TITAN2D flux and pile sources and VolcFlow ( Figure 4A) are over 50 m·s −1 , whereas the highest observed flow front velocity was about 30 m·s −1 . Again, maximum velocities can be up to 60% higher than flow front velocities (Simpson, 1999), so this may not represent a serious mismatch in simulated velocity.

June 1997
Because the 25 June 1997 event was comprised of 3 distinct flow pulses, with known volumes, run-outs, velocities, and deposit footprints , TITAN2D simulations were performed using both 1 flux source and 3 different flux sources from the head of the valley using the known pulse volumes and initial velocities. However, the results did not show significant differences when varying the number of pulses, probably because the subsequent simulated flow pulses do not interact with previous simulated pulses in the way they do in reality, with the first pulses smoothing the topography as they deposit, and subsequent pulses bulking up as they traveled over and possibly entrained the earlier pulses. In fact, the 3 flux source simulation resulted in even more material escaping the channel and traveling north into uninvolved areas, due to the higher initial velocities used for the second and third flow pulses. For these reasons, only the single pulse source TITAN2D simulations were compared with VolcFlow. All models produced the runout of the 25 June 1997 event well (Figure 5, Table 4) (<240 m error, within 3% for best-fit model runs), except the TITAN2D flux source (Figures 5A,B), which underestimated the runout by 0.6 km (9%). The TITAN2D pile source performed better in this regard (replicated the exact runout), but for comparison with VolcFlow (which uses flux sources), only the TITAN2D flux source is displayed. For both TITAN2D and VolcFlow, the non-calibrated, deposit-derived input parameters produced the best-fit model runs (Figures 5A-D), thus an identical simulation is shown for the best-fit and deposit-derived simulations for each model. Both the LAHARZ best-fit and globally-derived model runs produced similar results (Figures 5E,F), capturing the inundated area and runout well (within 0 and 36 m or 0 and 0.5%, respectively). The ∆H/L cone produced from SHV-only data slightly overestimated runout ( Figure 5G) by 240 m (3%); while the ∆H/L cone derived from the complete global FlowDat dataset underestimated the flow runout ( Figure 5H) by 1.6 km (23%).
As the 25 June 1997 flow debouched from the valley, it spread out onto the plain and gradually sheared out, forming poorlydefined margins. TITAN2D final time-step thicknesses capture this gradual thinning better than VolcFlow, which retains the well-defined, steep snout seen only in smaller-volume flows on Montserrat. TITAN2D simulations of the 25 June 1997 event sent material toward Streatham and into the Belham Valley, recreating the deposit map of the surge-derived PDC. The surgederived flow is thought to have formed from rapid sedimentation from the ash-cloud surge as it traversed a break in slope and topographic barriers at the head of the Belham Valley Druitt et al., 2002), and thus had a very finegrained dense basal avalanche. While VolcFlow (Figures 5C,D) reproduced the runout within 30 m (0.4%) and the proximal to medial morphology well, it failed to recreate the lateral spreading and tapering of the flow in the distal reaches where the flow spilled out of the valley onto the plain. LAHARZ is also able to capture some of this spreading, as it continues to follow the stream network across the plain, filling a wider planimetric area as the channel shallows and widens. Similar to the 5 June 1997 results, the final time steps of TITAN2D fail to maintain thicknesses for the proximal 2 km of the flow, while VolcFlow unrealistically retains final time-step thicknesses on the slopes of the dome. Velocities (Figure 4B) of the TITAN2D pile source were very high, approaching 90 m·s −1 . However, seismic analysis of the flow front arrival times did point to relatively high velocities (∼75 m·s −1 ) for the second flow pulse (Calder, 1999), so this may not be entirely unrealistic. The TITAN2D flux source and the VolcFlow simulation both produce similar maximum velocity records (Figure 4), with long plateaus that indicate the approximate duration that the modeled flux was active.

January 2010
It is important to note that the very thin distal end of the actual 8 January 2010 deposit is somewhat ambiguous in origin, representing either a very sheared portion of the basal avalanche, a portion of ash-cloud surge that overran the main flow, or possibly a surge-derived flow. With this in mind, each model performs reasonably well (within 5%) at reproducing the runout of the main flow (Figure 6, Table 4). It is also of note that TITAN2D, VolcFlow, and the ∆H/L cones were set to simulate the entire 3.4 × 10 6 m 3 dense-basal avalanche component of the BAF, which entered multiple valleys; whereas LAHARZ was set to model only the main 1.3 × 10 6 m 3 portion of the flow in the Belham Valley. In order to start LAHARZ runs in each of the inundated valleys, we would need to know how the volume was partitioned between those valleys; however, only the total flow volume and the volume that entered the Belham Valley are known.  All best-fit model runs are able to replicate this flow within 280 m, or 5% (Figure 6 left column). TITAN2D and LAHARZ both perform poorly when only using depositderived or globally-derived variables (Figures 6B,F), greatly underestimating runout by 1.3 km (22%) and 1.6 km (26%), respectively, perhaps because this flow was unusual in nature due to its somewhat explosive initiation mechanism. The same is true for the globally-derived ∆H/L cone (Figure 6H), which underestimates runout by 1.4 km (22%). The SHV-only ∆H/L cone reproduces the runout well, however, within 90 m (1.5%). Non-calibrated deposit-derived input parameters provided the best-fit model run for VolcFlow (Figures 6C,D, which are identical) and replicated the PDC runout (within 80 m or 1.3%) and inundated area well.
Velocities for both TITAN2D and VolcFlow ( Figure 4C) are similar, with VolcFlow having slightly lower velocities. The observed velocities do not include the blast component, which was estimated at 70-100 m·s −1 . The detailed flow front map produced by Molle (2012) for the 8 January 2010 event allows for the extraction of TITAN2D and VolcFlow flow front velocities ( Figure 4C) by measuring the location of the simulated flow at the same time-steps recorded by Molle (2012). TITAN2D flow front velocities are higher than observed in proximal areas at early time-steps, but slightly lower than observed velocities in distal reaches and later time-steps ( Figure 4C). VolcFlow flow front velocities match remarkably well in proximal areas and early time-steps, but are higher than observed flow front velocities in some distal locations and later time-steps ( Figure 4C). These flow front velocities can also be compared to the maximum flow front velocities from each model. TITAN2D flow front velocities are about 42% of the maximum velocities, while VolcFlow flow front velocities are ∼47% of the maximum velocities ( Figure 4C), both of which are less than the 60% of maximum velocity recorded by Simpson (1999).

Other Selected Examples
While few PDC deposits have the associated level of recorded dynamical information (velocity, video, etc.) as the previous example events, it is useful to compare the models' ability to reproduce PDCs under different emplacement conditions. Figure 7A shows TITAN2D, Figure 7B VolcFlow, and Figure 7C LAHARZ model outputs of the 3 April 1996, 3 August 1997, the 21 September 1997 events. ∆H/L cone results are not compared here because they do not model flows down channels and therefore are not sensitive to these different emplacement conditions. TITAN2D and VolcFlow replicate the very small-volume 3 April 1996 event well (runout within 150 and 40 m or 9 and 2%, respectively), although the TITAN2D simulation exhibits more lateral spread than the actual flow. LAHARZ globally-derived coefficients greatly underestimate the runout of this flow by 450 m (28%), and to produce a better fit, the simulation must be run from a break in slope using coefficients calculated for largervolume flows ( Table 4). This PDC was moderately confined, but traveled down the steepest valley on Montserrat (Tar River Valley), making its planimetric area vs. volume ratio somewhat unusual compared to similar volume PDCs. Because the deposit map is also somewhat cruder than other maps, we consider the proximal lateral extent of the flow to be overestimated, perhaps contributing to this effect.
The 3 August 1997 PDC was channelized in the upper reaches of Fort Ghaut, was strongly deflected by St. George's Hill, and spread out in more distal areas to form a broad deposit. Both TITAN2D and VolcFlow reproduce the runout of this flow well (within 160 and 80 m or 3 and 1%, respectively), but both fail to completely capture the true amount of lateral spreading. VolcFlow captures the proximal spreading very well, whereas TITAN2D captures more of the deflection and spreading around St. George's Hill. VolcFlow, however, replicates the deposit front more accurately than TITAN2D. VolcFlow produces a moderate amount of material that travels to the south into the White River Valley, which was not observed for the actual event. LAHARZ is unable to capture any of the proximal spreading, as it can only infill the stream network. It overestimates the distal spreading, but replicates the flow runout within 90 m (2%).
The results for the 21 September 1997 PDC illustrate one of the major problems with using LAHARZ to model PDC inundation. While the empirical coefficients are often able to reproduce channelized flow runout reasonably well, LAHARZ cannot automatically diverge from a single stream network. In this case, LAHARZ (using globally-derived coefficients) greatly overestimates the runout of the 21 September 1997 event (by 1.3 km or 19%) because the entire modeled volume must travel down one valley, instead of two. Usually overspill into one stream network from another is handled manually in LAHARZ; that is, if a model run from one stream network intersects a previously unselected stream network, the user would start an additional model run down the intersected stream network. Volume partitioning can only be estimated in these cases, however. TITAN2D and VolcFlow do a better job of simulating this flow, although only TITAN2D is able to capture the full runout (within 180 m or 3%) and the lateral spreading as the flow exits the valley in the most distal areas. VolcFlow fails to reproduce the extent of the runout (by 560 m or 8%), as well as the distal spreading. TITAN2D, however, unrealistically sends a large amount of material to the north down Mosquito Ghaut.

Implications for Hazard Modeling
The results of this modeling are important for hazard assessment in a number of ways. They illustrate the importance of appropriate selection of both models and model inputs, and demonstrate the utility of deriving model input parameters from global databases, such as FlowDat (Ogburn, 2012(Ogburn, , 2014Ogburn et al., 2016).
This work has focused on retrospective modeling of known PDCs, where the best-fit model runs are "tuned, " that is, input parameters are adjusted until the results replicate real flows. The need to fit parameters is a weakness of many geophysical models and is especially problematic for forward modeling. For this reason, the models were also tested using non-calibrated, deposit-derived or globally-derived input parameters. To this end, Figure 8 compares the deposit-derived and best-fit input parameters for TITAN2D and VolcFlow, and provides the equations for calibrating deposit-derived input parameters.
The well-characterized relationship of ∆H/L with PDC volume makes it easy to estimate empirically-derived basal friction angles for given simulation volumes for TITAN2D forward modeling purposes ( Figure 8A). Empirically-derived ∆H/L values (converted to basal friction) can be calculated from either the entire global FlowDat dataset (Figure 8A, blue; Equation 2) or from the SHV-only subset (Figure 8A, red;  Equation 1). Deposit-derived and empirically-derived input parameters work especially well for smaller-volume flows, but for larger-volume flows, best-fit TITAN2D simulations required a lowering of the basal friction angle (Figure 8A). This is because TITAN2D generally tends to overestimate the spread of flows, often inundating ("splashing" into) uninvolved valleys. If too much of the simulated volume enters other valleys, the overall runout is reduced.
The CRS parameter for VolcFlow can be calculated from existing PDC deposits using data from FlowDat ( Figure 8C), but requires more calibration than TITAN2D to produce best-fit results ( Figure 8D). CRS has been shown to relate to flow thickness , which is difficult to estimate for forward modeling applications; and, in contrast to PDC volumes, statistical distributions of flow thicknesses are unknown. CRS must therefore be varied within a reasonable range of values (i.e., 5-50 kPA, Kelfoun et al., 2009) for forward modeling purposes Iverson (2003) and Sparks and Aspinall (2004) assert that the limitations and uncertainties involved with physicallybased models may make statistically calibrated models, such as LAHARZ or ∆H/L cones, an alternative and complimentary approach to multi-parameter physical models. However, for PDCs, the LAHARZ calibration coefficients for the planimetric and cross-sectional area-volume relationships are based on relatively few observations, and should ideally represent the total inundated areas rather than just associated deposit areas, as were used in Widiwijayanti et al. (2008). There are also uncertainties introduced when selecting the starting location for LAHARZ model runs (Table 4). LAHARZ cannot diverge from a single stream network, which is problematic from a forecasting perspective, although new ∆H/L cones worked reasonably well at recreating deposit runout, but only when using the SHV subset of the data. This is due to the fact the most PDCs in FlowDat are less mobile than those at SHV (see Figures 2, 8A), possibly because of channelization and topography (Ogburn, 2014;Ogburn et al., 2016). Thus, when the entire global dataset of PDCs is used, the runout of the more mobile SHV PDCs is underestimated. This reliance on local data poses challenges for forward modeling using ∆H/L cones or empirically-derived basal friction angles for TITAN2D at volcanoes with sparse data. However, Ogburn et al. (2016) investigated the use of hierarchical Bayesian models for ∆H/L estimation from PDC volume that allow one to "borrow strength" from the global record of PDCs while retaining characteristics of local data (namely the intercepts), and greatly reduce the uncertainty inherent in traditional linear regression models of sparse data.
Other workers have investigated the sensitivity of these models (included in Table 5) in detail Kelfoun et al., 2009;Charbonnier and Gertisser, 2012). Charbonnier and Gertisser (2012) showed that TITAN2D was most sensitive to initial collapse volume, starting coordinates, and basal friction angle while VolcFlow was most sensitive to initial collapse volume, CRS, and bulk density. For TITAN2D, small changes in input parameters affected larger-volume simulations more than smaller-volume simulations, while for VolcFlow, the opposite was true (Charbonnier and Gertisser, 2012).
However, when simulating PDCs over natural terrain (DEMs), the sensitivity analysis, as discussed above, is not only testing the sensitivity of the model itself to input parameters, but also the sensitivity of the topography of individual channels to input parameters. For example, when modeling the 25 June 1997 PDC, the runout increases as basal friction is lowered from 20 • to 8 • . However, when basal friction is lowered below 7 • , runout actually decreases, because the low basal friction leads to higher velocities in proximal areas, which causes material to spill out of the channel at the first channel bend. Subsequently, there is less volume in the relevant channel, and the runout decreases. Therefore, simulations in each valley would have different sensitivities to input parameters, which could prove useful for hazard analysis. The sensitivity analysis performed by other workers (e.g., Sheridan et al., 2005;Charbonnier and Gertisser, 2012), involved simulations in only one channel, and thus did not encounter this.
The models also differ in their sensitivity to the resolution and error of the DEM. Capra et al. (2011) andStefanescu et al. (2012) showed that the type of DEM (e.g., TOPSAR, ASTER, SRTM), the interpolation scheme, and the resolution of the DEM (e.g., 10 m vs. 90 m) had significant impacts on geophysical mass flow simulations. DEMs of 30 m resolution or better are usually sufficient for modeling purposes, though 10 m resolution or better is preferred (Capra et al., 2011). Very high resolution DEMs (10 m or better) can significantly increase simulation time, however. DEMs are also rarely accurate for long at active volcanoes as strong depositional and erosional processes 5 | Operational comparison of TITAN2D, VolcFlow, LAHARZ, and the ∆H/L cone model.

Model
Advantages Disadvantages TITAN2D (pile) • Input parameters obtainable from observations or empirically-derived mobility-volume relationships • Reasonable reproduction of inundation and velocity • Reproduces gradually thinning margins of larger-volume deposits • Small file size • Can be run using only open-source software, either online at Vhub.org or on Linux • Well documented and distributed • Sensitive to pile shape/volume, geometry, basal friction, DEM* • High velocities cause unrealistic inundation of interfluves • Doesn't reproduce well-defined margins of smaller-volume deposits very well • Slow (4-8 h) • For probabilistic hazard mapping, requires many model runs (10,000+ ideal) TITAN2D (flux) • Input parameters obtainable from observations or empirically-derived mobility-volume relationships • Reproduces inundation, thickness, and velocity very well • Reproduces gradual thinning of larger-volume deposits • Small file size • Can be run using only open-source software, either online at Vhub.org or on Linux • Some input parameters (velocity at head of valley) unavailable for many flows, but can be estimated • Sensitive to volume, flux, initial velocity, geometry, basal friction, DEM* • Slowest (8-24+ h) • For probabilistic hazard mapping, requires many model runs (10000+ ideal) VolcFlow • Reproduces inundation, thickness, and velocity very well • Reproduces well-defined margins of smaller-volume flows • Faster than TITAN2D (1-2 h) • CRS related to thickness, but poorly understood, and cannot be empirically estimated for forward modeling • Sensitive to volume, flux, CRS, bulk density, DEM † • Doesn't reproduce gradually thinning margins of larger-volume deposits very well • Extremely large total file size (up to 100 GB) • Requires MATLAB, Surfer • Less well-documented • For probabilistic hazard mapping, requires many model runs rapidly change the landscape. TITAN2D and VolcFlow are extremely sensitive to initial geometries and starting locations. Rapidly changing volcanic summits and domes thus pose a significant challenge for using geophysical models. LAHARZ is sensitive to channel geometry as it is essentially a valley-filling algorithm; DEM resolution greatly affects the cross-sectional areas of channels. However, because LAHARZ is started within valleys, it is easier to avoid issues with DEM errors at the summit. ∆H/L cones are perhaps the least sensitive to DEM issues, as the start height of the cone does not need to be located on the DEM surface; and, thus, the cone height can be easily adjusted to reflect the most recent measurement of dome height, for example.
Another concern for using these models for hazard mapping is the ease of use. The operational aspects of these models are summarized in Table 5. For example, TITAN2D is available as a Linux program, or through a web interface, and uses open-source software, while VolcFlow uses proprietary software. LAHARZ requires the proprietary ArcGIS software, but has a (soon to be public) online interface. ∆H/L cones can be produced using free standalone tools such as the Vhub Energy Cone tool (Palma, 2013) or within the LAHARZ toolbox. ∆H/L cones take seconds to produce, LAHARZ takes 5-20 min, while VolcFlow averages around 5 h, and TITAN2D pile sources take around 8-10 h using a single processor on a personal computer. Unfortunately, some TITAN2D flux sources can take a long time (24 h+) to complete, unless super-computing resources are available. The computing time per model run and the output file sizes are important considerations when using modeling for both scenario-based hazard assessment or implementing a probabilistic modeling scheme that samples input parameter space for forecasting purposes.
In light of these results, it is clear that LAHARZ should be used cautiously for PDC hazard mapping, perhaps as a very quick initial investigation, and with a range of input coefficients. TITAN2D could be considered as a maximum estimate of lateral spreading of the dense-basal avalanche and a good estimate of runout, especially if there are abundant local data on which to base the regression between basal friction (∆H/L) and volume (see Figures 2, 8A,B), or if a hierarchical Bayesian approach is used to improve upon global regressions (Ogburn et al., 2016). VolcFlow may be considered a minimum estimate of spreading and a good estimate of runout, provided many model runs are produced with different CRS values, as these cannot be estimated from volume. ∆H/L cones provide quick estimates of runout that, provided abundant local data exists from which to derive the inputs, match deposits as well as the other models (similar to the findings of Tierz et al., 2016 for Campi Flegrei and Vesuvius). Relying solely on global data, however, led to poor results for the mobile SHV PDCs. While ∆H/L cones cannot model flows down valleys, and thus potentially provide over-estimates of hazards on interfluves or in unlikely flow directions, common sense approaches exist to modify the cone into more reasonable maps of inundation. Finally, none of these models is able to explicitly capture the behavior of the accompanying ash-cloud surge, which is able to detach from the dense-basal avalanche and travel in unexpected directions, primarily at channel constrictions and bends (Ogburn, 2014;Ogburn et al., 2014). The overestimation of inundation on interfluves by ∆H/L cones may, however, provide a rough estimate of the total proximal hazard zone, including overriding ash-cloud surges in interfluves, ballistics, and other proximal hazards. Some simulation tools exist for modeling coupled dense-dilute flows using Lagrangian methods (e.g., Dobran et al., 1993;Takahashi and Tsujimoto, 2000;Todesco et al., 2002;Neri et al., 2003;Dartevelle, 2004;Esposti Ongaro et al., 2007;Valentine et al., 2011a), but these models are complex and too computationally expensive to use for operational hazard mapping purposes at this time.

Implications for PDC Behavior
In general, all the models reproduce smaller-volume flows better than larger-volume flows, but differ in their ability to reproduce specific deposit features in different emplacement environments ( Table 6). TITAN2D final time-step inundation and thickness maps are able to replicate the lateral spread (controlled largely by the internal friction angle) and gradual thinning of largervolume PDCs as they exit valleys onto broad plains. TITAN2D simulations require basal friction angles lower than those measured for materials themselves (typically ∼31 • for granular material; Bagnold, 1966), and instead correspond roughly to the ∆H/L of measured deposits (Figures 8A,B). Because of these low basal frictions, TITAN2D simulations often produce very high Same as pile sources.
Recreates steeper, well-defined snouts of smaller-volume flows. Does not recreate the distal, gradually thinning margins of larger-volume flows.
Only models inundation, not flow thickness. Margins of modeled flows display unnatural shaped margins.

See above
Frontiers in Earth Science | www.frontiersin.org initial velocities when sourced from the steep topography of the dome summit, especially when using a pile source, which adds to the initial height. As a result, the final time-steps of TITAN2D fail to retain material on higher slopes, whereas thin deposits are often recorded in these areas. Flux sources positioned on the dome summit or the heads of valleys with a slower release of material remain more channelized. TITAN2D recreates smallervolume, channelized flows very well, but can underestimate the runout of larger-volume flows by sending material into valleys that were uninvolved in reality, thus reducing the volume available in the channel of interest. Charbonnier and Gertisser (2012) and Charbonnier et al. (2013) were not able to reproduce deposits using a single basal friction angle, but the use of a varying bed friction was able to reproduce flows at Merapi. While TITAN2D velocities were also found to overestimate those calculated from super-elevation measurements, TITAN2D reproduced the overall inundation area and runout well, although it could not produce the steep, narrow flow front of the smallervolume 2006 Merapi BAFs. TITAN2D was, however, able to simulate the four areas where the dense-basal avalanche escaped the main channel and traveled down adjacent ones, a process of extreme concern for hazard mapping at Merapi (Charbonnier and Gertisser, 2012). The use of a variable bed friction was not necessary to produce flows in this work perhaps because Montserrat has less-extreme slope variations than Merapi, or because the deep channels on Montserrat prevent even very highvelocity simulations from escaping channels. Ogburn (2014) and Ogburn et al. (2014Ogburn et al. ( , 2016 explored the effects of channel geometry on PDC mobility, and showed that Merapi flows were generally less channelized than those on Montserrat, which may exacerbate the effects of overly-high TITAN2D simulated velocities. VolcFlow final time-step inundation and thickness maps tend to exhibit margins that are steeper and more defined, and the model recreates smaller-volume flows better than larger-volume ones. Steep, well-defined margins are more common for smallervolume or valley-confined flows on Montserrat . These steep, defined margins in VolcFlow thickness maps are generated by the CRS rheology, in that only flows thicker than a certain threshold continue to move, while those thinner than this threshold come to an abrupt stop ). Both Kelfoun et al. (2009) and Charbonnier and Gertisser (2012) noted that VolcFlow reproduced the margins of PDC deposits better than TITAN2D, but they were simulating flows that remained largely channelized. Charbonnier and Gertisser (2012) also found that VolcFlow did not simulate the overbanking of the large volume 2006 Merapi PDCs as well as TITAN2D. However, VolcFlow performed better than TITAN2D in replicating the larger-volume 2010 Merapi PDCs . Simulations of larger-volume flows using VolcFlow tend to reproduce the runout, but not the expansive distal thinning and spreading seen for larger-volume flows on Montserrat. Kelfoun and Druitt (2005) note that VolcFlow does not necessarily describe an alternative rheology to the frictional model employed by TITAN2D and other frictional geophysical mass flow models. Instead, the constant retarding shear stress is inferred to represent an average, approximately constant value of retarding stresses of all types that individually can vary through time and space, although the behavior may remain frictional.
In essence, all these models are averaging a number of variables that are actually highly unsteady and nonuniform in nature, especially in sustained, larger-volume flows with significant ash-cloud surge components. Smaller-volume, discrete PDCs may, therefore, be more readily replicated by models that average aspects of flow behavior. This unsteadiness and non-uniformity results from a number of factors including: 1. The variability from the source through time. Dome-collapse PDCs, especially if larger in volume, are often produced by the retrogressive collapse of pressurized domes. Initial pulses of material are often smaller in volume and consist of largely degassed dome carapace, with successive pulses resulting from the excavation of the pressurized dome interior Woods et al., 2002). This can result in the explosive fragmentation, or auto-brecciation, of particles, generating large amounts of ash and gas, and has been proposed as a major factor in the generation of large ash-cloud surges (Sato et al., 1992;Fink and Kieffer, 1993;Woods et al., 2002). Autobrecciation has been observed directly at Montserrat Woods et al., 2002), Unzen (Sato et al., 1992), and Mt. St. Helens (Mellors et al., 1988). 2. The interaction of the dense-basal undercurrent with the overriding ash-cloud surge. Segregation processes within the flow mean that flows become density stratified, with a highconcentration, dense-basal undercurrent and a dilute ashcloud surge (Denlinger, 1987;Valentine, 1987;Burgisser and Bergantz, 2002). The mass partitioning between these layers may change in space and time (Fisher and Heiken, 1982;Denlinger, 1987;Fisher, 1995;Takahashi and Tsujimoto, 2000;Burgisser and Bergantz, 2002;Doyle et al., 2010). Indeed, recent large-scale experimental work by Breard et al. (2016) and Breard and Lube (2017) identified the existence of an intermediate zone between the dense and dilute layers that contains mesoscale turbulence structures, which are known to reduce locally drag and counter aerodynamic resistance, leading to a high particle settling rate and considerable mass flux from the ash-cloud to the dense undercurrent. This kinematic coupling between layers causes the flow to enter a state of "forced super-criticality, " but with reduced entrainment, sedimentation, and dilution, and is thus proposed be a major contributor to the high mobility of granular PDCs (Breard and Lube, 2017). 3. The interaction of the PDC with topography. Largervolume flows are more likely to be affected by topographic constrictions and to surmount barriers, and can also overwhelm topography. Topography can induce changes in the elutriation, entrapment, ingestion of air, and comminution of ash (Branney and Kokelaar, 2002;Dufek and Manga, 2008;Doronzo et al., 2012). Charbonnier and Gertisser (2012) justified the use of a spatially varying basal friction for TITAN2D simulations of the 2006 Merapi PDCs because of field data showing a decrease in median diameter and increase in ash content with larger-volume PDCs and longer runout distances, indicating increased effects of fragmentation and fluidization during emplacement (Charbonnier and Gertisser, 2012). These observed changes in deposit granulometry were closely linked with abrupt changes in the topography, including slope breaks, and changes in channel cross-sectional area and morphology (Charbonnier and Gertisser, 2012). Stinton (2007) posited that channel constrictions can help keep flows energized by increasing the frequency of grain collisions as grain concentration increases. 4. Erosion and sedimentation. Larger-volume flows on Montserrat are highly erosive, and result in both flow bulking and the lowering of flow temperature . The 25 June 1997 flow traveled over areas not inundated by the eruption to date, and eroded into the hydrothermally altered prehistoric deposits . The 21 September 1997 flow eroded and incorporated pumice from the August 1997 column-collapse flows, trees, and material from houses. Larger-volume flows remained erosive on slopes as low as 6-8 • . Other workers have also investigated sedimentological differences between PDCs of different sizes (e.g., Charbonnier and Gertisser, 2011;Doronzo, 2012). 5. The polydispersity of grain sizes, shapes, and densities. Clasts are supported differently within a flow, which leads to segregation and stratification of flows, and thus variations in flow rheology. Grain size variations also affect sedimentation and erosion. The polydispersity of PDCs has also been shown to greatly affect runout (Phillips et al., 2006). Calder et al. (2002) identified a number of flow properties that differed between passively generated (gravitational collapse of the degassed dome carapace) and more actively generated (collapse of the growing lava dome) PDCs. Similarly, Charbonnier and Gertisser (2011) classified PDCs at Merapi into smallmedium runout BAFs and long-runout BAFs. Combining these classifications results in Table 7: Larger-volume flows are characterized by factors that promote high degrees of unsteadiness, including: sustained, actively generated collapse sources often lasting tens of minutes to hours and high velocity, highly erosive fine ash-rich flows that are often unconfined and accompanied by large ash-cloud surges. This may indicate why TITAN2D and VolcFlow tend to simulate smaller-volume flows more readily than larger-volume flows, as smaller-volume, discrete, PDCs may be easier to replicate with models that utilize flow-average behavior through time and space. While both models can simulate pulses as a series of fluxes through time, they cannot capture the effects of increasing excavation of pressurized dome material, the smoothing effects of initial pulses on the topography, or the erosion and incorporation of previous pulses by subsequent ones during a sustained collapse event, which can last for hours . TITAN2D simulations using multiple fluxes do not differ in runout from those using single fluxes. Charbonnier et al. (2013) showed similar results using both TITAN2D and VolcFlow.
Additionally, TITAN2D and VolcFlow are single-phase and depth-averaged, and thus only simulate the dense-basal undercurrent. None of the interplay between dense-basal undercurrent and ash-cloud surge (e.g., Breard et al., 2016;Breard and Lube, 2017) can be captured. Creating coupled two-phase flow models with an ash-cloud surge component and simple sedimentation laws is an active area of development for both TITAN2D (Pitman et al., 2013;Sheridan et al., 2013) and VolcFlow (Kelfoun, 2017).
It is well-known that mass flows such as PDCs and debris flows do not behave as simple materials with fixed rheologies (e.g., Iverson and Vallance, 2001;Dartevelle, 2004;Bursik et al., 2005;Iverson and George, 2012;Sulpizio et al., 2014;Dufek, 2016). Instead, the rheology of the flows changes through space and time due to many factors that contribute to the unsteadiness of the flows, as listed previously. Iverson and George (2012) propose that Coulomb friction forces may vary with the pore-fluid pressure and granular temperature (a measure of granular vibration), which would better capture the complex rheologies of these flows. The resultant numerical model has shown success in replicating both experimental and observed debris flows without calibration (George and Iverson, 2011;Iverson and George, 2012) but has not yet been tested for PDCs.
While the models compared herein have been tested against natural examples from SHV, Montserrat, and previously against small bench-top experiments (Kelfoun and Druitt, 2005;Patra et al., 2005), it may also be pertinent to test them against other comprehensive field-based datasets at other volcanoes. Ogburn et al. (2016) showed that differences in mobility of PDCs at different volcanoes may be due to differences in channelization of flows, which could, in turn, affect model selection. It would also be interesting to compare these models against more controlled and measured, large-scale experimental results using natural materials (Iverson, 2003). Large-scale granular flow experiments at the PELE experimental apparatus (Lube et al., 2015) have revealed important insights into the dynamic behavior of PDCs (Lube et al., 2015;Breard et al., 2016;Breard and Lube, 2017); and alongside other proposed endeavors (e.g. Valentine et al., 2011b) could provide an important benchmarking case for geophysical mass flow models. With more benchmarking studies, PDC modeling best-practices for hazard assessment purposes could be developed by the PDC modeling community and could provide an important resource for using these models operationally.

CONCLUSIONS
The geophysical and empirical models investigated here are able to reproduce the inundated area and runout of smaller-and larger-volume, confined and unconfined dense-basal avalanche components of PDCs with varying degrees of accuracy.
• TITAN2D and VolcFlow are both able to recreate a variety of PDCs, although they perform better when simulating discrete, single pulse, smaller-volume flows. • Larger-volume flows are often sustained, pulsatory and thus highly unsteady, and density stratified, and the simple rheological models of TITAN2D and VolcFlow are not able to recreate all features of these flows.  Calder et al. (2002) and Charbonnier and Gertisser (2011).

COLLAPSE MECHANISMS
Volume <1-2 × 10 6 m 3 >1-2 × 10 6 m 3 Generation mechanism Discrete collapses, passively generated Sustained collapses with multiple pulses, actively generated • Empirically-based LAHARZ and ∆H/L cones can be quick, first-approximations of flow runout, provided a database of similar flows, e.g., FlowDat, is available to properly calculate ∆H/L or the coefficients for planimetric and cross-sectional area. • For TITAN2D and VolcFlow, field measurements of real PDCs can provide estimates for input parameters, with the best-fit model runs only requiring a slight and systematic calibration of input parameters (Figure 8), and thus avoiding the necessity of using completely fitted parameters, a weakness of many geophysical models (Iverson, 2003). For many PDCs, the deposit-derived model run results were roughly comparable to the best-fit results. • For forward modeling purposes, TITAN2D basal friction can be estimated from the empirical relationship between the basal friction angle (∆H/L) and volume (Figures 2, 8). However, VolcFlow CRS cannot be empirically-estimated, and instead must be varied within a reasonable range (5-50 kPA, Kelfoun et al., 2009). • Operational aspects of these models should also be taken into account when deciding on a choice of models for use in hazard mapping (e.g., Table 5). For example, geophysical models like TITAN2D and VolcFlow can be useful for producing both scenario-based or probabilistic hazard maps, but must be run many times (indeed, many thousands of times for true probabilistic maps) with varying input parameters. LAHARZ and ∆H/L cones can be used to produce simple modelingbased hazard maps when run with a variety of input volumes, but do not explicitly consider the probability of occurrence of different volumes.

AUTHOR CONTRIBUTIONS
SO performed the modeling, analyzed and interpreted the data, and drafted the work. SO and EC designed the work, acquired the data, revised the work, and approved of the final version.

FUNDING
SO and EC were supported by NSF grants EAR-0809543 and NSF DMS-0757367 during the completion of this research. EC was additionally supported by NERC RiftVolc NE/L013932/1. SO was supported by the USGS/USAID Volcano Disaster Assistance Program during the revision and submission of this work.