Perspectives on Open Access High Resolution Digital Elevation Models to Produce Global Flood Hazard Layers

Global flood hazard models have recently become a reality thanks to the release of open access global digital elevation models, the development of simplified and highly efficient flow algorithms, and the steady increase in computational power. In this commentary we argue that although the availability of open access global terrain data has been critical in enabling the development of such models, the relatively poor resolution and precision of these data now limit significantly our ability to estimate flood inundation and risk for the majority of the planet’s surface. The difficulty of deriving an accurate ‘bare-earth’ terrain model due to the interaction of vegetation and urban structures with the satellite-based remote sensors means that global terrain data are often poorest in the areas where people, property (and thus vulnerability) are most concentrated. Furthermore, the current generation of open access global terrain models are over a decade old and many large floodplains, particularly those in developing countries, have undergone significant change in this time. There is therefore a pressing need for a new generation of high resolution and high vertical precision open access global digital elevation models to allow significantly improved global flood hazard models to be developed.

Around the turn of the millennium, high quality two dimensional hydraulic models capable of simulating the dynamics of flood inundation became a reality at the reach scale as a result of faster computers, improved algorithms (Bates and De Roo, 2000;Bradford and Sanders, 2002;Bradbrook et al., 2004), and new forms of rapidly-collected remotely sensed digital elevation models (DEMs; Marks and Bates, 2000;Cobby et al., 2001;Bates et al., 2003;Bates, 2004;Sanders, 2007). Of particular value to hydraulic modelers in developed countries was the commencement of routine LIDAR collection due to its high horizontal and vertical precision and accuracy, its ability to penetrate vegetation cover and its reduced susceptibility to scatter and shadowing relative to other forms of remotely sensed elevation data such as Interferometric Synthetic Aperture Radar (InSAR; Bates, 2004). These three key properties made it ideally suited to the creation of "bare-earth" Digital Terrain Models (DTMs), a type of DEM in which surface features such as vegetation and built structures are removed to leave, as the name suggests, a three dimensional representation of the bare-earth surface. Such data are ideally suited for the purposes of flood hazard simulation using hydraulic models, and form the basic datasets from which developed world flood hazard layers, such as the Federal Emergency Management Agency (FEMA) flood maps in the USA, and the Environment Agency Flood Maps in the UK, are produced.
Whilst there have been significant advances in the models and data available for relatively small scale modeling of flood inundation where high quality terrain data exist, the computational and data costs associated with such models tend to restrict their application to populated areas in wealthier nations. Furthermore, due to the potential impact on property prices and local economies, local or national authorities may be reluctant to release the results of such models even where they do exist. However, flood risk is very clearly a global problem and, consequently, a number of research and commercial groups are currently working on the development of flood hazard models at the global scale (Hallegatte et al., 2013; Hirabayashi et al., 2013; Winsemius et al., 2013; Sampson et al., 2015; Ward et al., 2015). This work is motivated by projections of rapidly escalating economic losses due to flooding (Hallegatte et al., 2013) and has been made possible by increases in computational power (Lamb et al., 2009; Neal et al., 2010), algorithmic improvements, and emerging global datasets (Elvidge et al., 2007; Jarvis et al., 2008; Lehner et al., 2008; Andreadis et al., 2013; Yamazaki et al., 2014; Smith et al., 2015). The data challenges are particularly onerous because, whereas at the reach scale most of the required "secondary" spatial data other than the DEM (such as river locations, channel geometries, and flood defenses) can viably be obtained using manual survey or are contained in the data produced by national mapping agencies, at the global scale all such data must be derived in an automated or semi-automated manner from remotely sensed data. The DEM is the core dataset from which many of these secondary datasets are derived and, as we argue in this perspective, it is the limited quality of the present generation of global DEMs that presents the greatest challenge to flood inundation modelers today.
Although a number of free and commercial global DEMs exist, two in particular have received the majority of attention from flood modelers: the Shuttle Radar Topography Mission (SRTM; Rabus et al., 2003; Farr et al., 2007) DEM and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER; Abrams, 2000) DEM, and their respective derivatives (Fujisada et al., 2012; Kobrick, 2013). These data sets are popular because they are open access and offer greater levels of detail than the previous generation of open access DEMs [such as ACE GDEM (Berry et al., 2000), GLOBE and GTOPO30] due to their greatly increased resolutions. For example, ASTER and SRTM have ground spatial resolutions of 1 and 3 arc-seconds (∼30 and ∼90 m at the equator), respectively, compared to ∼30 arc-seconds (∼1 km) for the previous generation DEMs. A number of studies (Hirt et al., 2010; Jing et al., 2013; Rexer and Hirt, 2014; Jarihani et al., 2015) have compared the SRTM and ASTER DEMs across a range of locations globally to assess their applicability to hydraulic models (e.g., Sanders, 2007), and despite its lower nominal resolution it is SRTM, particularly the void-filled CGIAR-CSI version 4 variant, that has emerged as the favored choice. This is due to SRTM's greater feature resolution, reduced number of artifacts and lower noise than ASTER, particularly in the flatter areas of concern to flood modelers (Jing et al., 2013; Rexer and Hirt, 2014). The prohibitive cost and restricted rights associated with commercial DEMs (such as the Intermap NEXTMap® World 10™ and World 30™, and Airbus WorldDEM™, data sets) significantly restrict the application of such products. This results in limited (or no) public and independent validation of commercial DEMs, a lack of independent studies comparing them to other DEMs, and a lack of the types of derived datasets, such as global hydrography data, that have emerged from their open access counterparts.
User generated "secondary" datasets derived from global topography offer a valuable resource for a range of activities, and their existence can be directly attributed to the production of open access global DEMs. From a flood modeling perspective perhaps the most valuable example is the Hydrosheds global hydrography dataset (Lehner et al., 2008). This dataset was produced by executing a number of hydrology-based GIS operations over a suitably void-filled SRTM dataset, and contains layers such as flow direction maps, river networks (with upstream accumulation areas) and catchment masks. The Hydrosheds data have been used as the basis for a number of large scale hydrology and river routing models (Gong et al., 2011; Wood et al., 2011; Alfieri et al., 2013; Schumann et al., 2013; Yamazaki et al., 2013; Sampson et al., 2015) because, in conjunction with the SRTM DEM, they provide a framework within which hydraulic model structures can be assembled. The availability of such datasets significantly reduces the total workload for groups attempting to construct global models, making previously intractable problems manageable for the first time and allowing developers the time to focus on other critical aspects such as efficient numerical schemes and automation.
However, significant as these achievements may be, the current generation of global DEMs has serious limitations that heavily restrict the skill of models developed around them. Taking SRTM as an example, the critical limitations of the dataset are: (a) poor vertical accuracy due to noise or "speckle" (Rodriguez et al., 2006); (b) the difficulty in obtaining a bare-earth DTM due to radar reflection from the top of the vegetation canopy; (c) the inability to resolve street-scale features in urban areas, resulting in large positive elevation biases in urban areas; (d) other systematic errors, such as "striping," that are a result of the pitch and yaw of the spacecraft during the data collection phase (Rodriguez et al., 2006); and (e) the inability of SRTM to resolve the bathymetry of water bodies due to radar reflection from the water surface. These limitations have a highly detrimental effect on both derived hydrography datasets and the simulated flow dynamics of flood hazard models. There is a tendency amongst many users to fixate on the nominal horizontal resolution of DEMs, but for flood modeling it is the vertical accuracy and precision that are critical. This is because the dominant control on the flow of water in a hydraulic model, as in the real world, is the change in elevation of the topography; after all, it is gravity that moves water downslope. The five critical limitations of the SRTM dataset outlined above all concern the vertical accuracy and precision of the DEM, and all can affect simulated flow dynamics adversely. Vertical noise within a DEM will fundamentally affect the propagation of a floodwave because pixels will serve as blockages or sinks. Where noise is random, it can be reduced by resampling the DEM grid to a coarser resolution as the positively and negatively biased pixels cancel when aggregated onto the larger grid.
This approach reduces noise, but also reduces the resolution of the DEM and limits its ability to represent small scale features. More challenging still are the elevation biases imparted by vegetation and urban areas. Such biases can be 10s of meters and, if left uncorrected, forests and urban areas act as walls or islands that block the flow of water across a floodplain and (erroneously) never flood themselves within the model. As many flood hazard models are used to help assess flood risk, a model that identifies urban areas as always being safe is of little value. Finally, the systematic "striping" caused by the pitch and yaw of the Space Shuttle itself creates false wave-like artifacts on the DEM that can corrupt the modeled flow of water across the DEM. It also needs to be noted that SRTM is now quite old (the data were collected in February 2000) and many of the world's floodplains have undergone dramatic change since, mostly because of human development. This is particularly true in developing countries, and there is an increasingly pressing need for a new global topographic mapping mission producing open data.
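The trade-off between noise and resolution noted above, where a DEM is resampled to a coarser grid, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the flat synthetic floodplain, the 5 m noise level, the aggregation factor of 3, and the helper name `aggregate_dem` are not taken from SRTM or from any cited study. The noise is also assumed to be uncorrelated between pixels; where real DEM noise is spatially correlated, the reduction would be expected to be smaller.

```python
import numpy as np


def aggregate_dem(dem: np.ndarray, factor: int) -> np.ndarray:
    """Coarsen a DEM by averaging non-overlapping factor x factor blocks."""
    rows, cols = dem.shape
    # Trim trailing rows/columns so both dimensions divide evenly by the factor.
    rows -= rows % factor
    cols -= cols % factor
    trimmed = dem[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)

# Synthetic flat floodplain at 10 m elevation (illustrative only).
true_elevation = 10.0
true_surface = np.full((300, 300), true_elevation)

# Add uncorrelated random vertical noise with a 5 m standard deviation (assumed).
noisy_dem = true_surface + rng.normal(0.0, 5.0, true_surface.shape)

# Aggregate 3 x 3 pixels into one, i.e. nine samples per coarse cell.
coarse_dem = aggregate_dem(noisy_dem, factor=3)

noise_fine = float(np.std(noisy_dem - true_elevation))
noise_coarse = float(np.std(coarse_dem - true_elevation))
print(f"noise std on native grid: {noise_fine:.2f} m")
print(f"noise std after 3 x 3 aggregation: {noise_coarse:.2f} m")
```

Averaging nine independent samples reduces the noise standard deviation by roughly a factor of three, but the coarse grid has one ninth as many cells, so features narrower than the new cell size are smoothed away.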
The effects of systematic elevation errors on derived hydrography datasets are equally severe. When a flow direction map is calculated from the DEM, erroneously elevated surfaces caused by areas of vegetation or urbanization cause errors in the calculated flow directions. This in turn leads to incorrect flow accumulation calculations and stream network locations. The effect can be severe in the case of large forests and cities, leading to grossly misplaced river channels and even missing or invented connections between channels and resultant errors in catchment delineation. The most obvious of these errors can be rectified by painstaking manual editing, as was done for the Hydrosheds dataset (Lehner et al., 2008), but many errors remain that can be hard to identify in a systematic manner. These errors impart structural errors on models that rely upon them for their construction, compounding the DEM-induced errors in flow dynamics discussed above.
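How a positive elevation bias corrupts a flow direction map can be sketched with a single-flow-direction (D8) calculation, in which each cell drains to its steepest downslope neighbor. D8 is assumed here purely for illustration, since the text does not specify the scheme used to derive any particular hydrography dataset. The 5 x 5 grid, the 15 m bias and the function name `d8_direction` are likewise hypothetical.

```python
import numpy as np

# Row/column offsets of the eight neighboring cells.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_direction(dem: np.ndarray, row: int, col: int):
    """Return the (d_row, d_col) offset of the steepest downslope neighbor.

    Returns None when no neighbor is lower (the cell is a pit or flat).
    Ties are resolved in favor of the first neighbor examined.
    """
    best_offset, best_slope = None, 0.0
    for d_row, d_col in NEIGHBOURS:
        r, c = row + d_row, col + d_col
        if 0 <= r < dem.shape[0] and 0 <= c < dem.shape[1]:
            distance = np.hypot(d_row, d_col)  # 1 for edges, sqrt(2) for diagonals
            slope = (dem[row, col] - dem[r, c]) / distance
            if slope > best_slope:
                best_offset, best_slope = (d_row, d_col), slope
    return best_offset


# Bare-earth surface: a plane falling 1 m per cell toward the east.
bare_earth = 10.0 - np.tile(np.arange(5.0), (5, 1))

# Same surface with one cell raised by 15 m to mimic a vegetation or urban bias.
biased = bare_earth.copy()
biased[2, 2] += 15.0

print("bare-earth direction at (2, 1):", d8_direction(bare_earth, 2, 1))
print("biased direction at (2, 1):   ", d8_direction(biased, 2, 1))
```

On the bare-earth plane the cell at (2, 1) drains due east, whereas with the raised cell in its path the flow is deflected diagonally around the obstacle. Accumulated over many cells, deflections of this kind are what displace derived stream networks.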
The errors discussed above have such a marked effect on flood hazard simulations that it has been necessary for practitioners to develop methods that attempt to reduce their severity. One example of this involves attempts to remove vegetation bias from SRTM to produce a bare-earth DTM in forested areas (Baugh et al., 2013). This poses a substantial challenge as the necessary data content is not present in the SRTM data itself, meaning that other datasets are required to quantify the height and location of the vegetation (Simard et al., 2011). Furthermore, because the extent to which the radar pulse penetrates the canopy depends on the density of the vegetation (it is not sufficient to assume the return is always from the top of the canopy), a spatial measure of vegetation density is required. Finally, elevation control points (e.g., ICESat laser altimeter data) are necessary for calibration and validation of the algorithm. Such algorithms can offer significant improvement, as demonstrated in Figure 1. However, their effectiveness is limited by the accuracy and precision of the vegetation datasets, which are themselves uncertain, and non-negligible residual errors in the resultant bare-earth DTM are unavoidable; examples of such errors are provided in Figure 2 below. Figure 1 shows reduced vertical error following the systematic removal of vegetation bias by comparing corrected and uncorrected SRTM DEMs to a high precision bare-earth DTM produced using 1 m aerial LIDAR data resampled to SRTM resolution. The algorithm employs satellite vegetation height and density datasets (Schwarz et al., 2004; Simard et al., 2011) that estimate vegetation location, height and density to produce an estimated bias layer, which is then removed from the SRTM DEM and yields a change in bias from 15.8 to -0.1 m. Figure 2 demonstrates the effect of this correction on a simulation of a category five storm surge event along the Belize coast.
In the uncorrected DEM, the vegetation acts as a virtual "sea wall," preventing the surge waters from penetrating inland to flood areas known to be at risk such as the Belizean coastal mangroves. With the vegetation removed, the coastal wetlands flood, providing a far more plausible realization of the inundation that one would expect for an event of this magnitude. However, while the improvement is obvious, the transects in Figure 1 show that significant differences still exist between the corrected SRTM DEM and the LIDAR-derived DEM at the local scale due to limitations in the correction method. One key limitation is the resolution of the vegetation datasets (∼1 km for the vegetation heights and ∼250 m for the vegetation density). The yellow circles in Figure 2 show areas where the vegetation removal tool failed to resolve and remove ∼100 m wide strips of mangroves from the SRTM DEM. While the overall removal still allowed water behind the mangrove "wall," this is an example of typical residual vegetation artifacts. It is also known that most of Belize City should be flooded (Belize government engineers and planners, personal communication); however, dry areas remain due to the residual urban artifacts even after the urban filter is applied to the SRTM DEM (purple circle in Figure 2).
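The general shape of such a vegetation correction, in which a bias layer is estimated from canopy height and density and then subtracted from the DEM, can be sketched as follows. This is not the algorithm of Baugh et al. (2013): the linear relation between density and the fraction of canopy height seen by the radar, the bounds of 0.3 and 0.9, and all function names are hypothetical placeholders. In practice such parameters would be calibrated against elevation control points, as described above.

```python
import numpy as np


def estimate_vegetation_bias(canopy_height: np.ndarray,
                             canopy_density: np.ndarray,
                             min_fraction: float = 0.3,
                             max_fraction: float = 0.9) -> np.ndarray:
    """Estimate the elevation bias (m) caused by vegetation.

    The radar return is assumed to come from a fraction of the canopy height
    that rises linearly with canopy density (0 = sparse, 1 = closed canopy).
    Both fractions are illustrative values, not calibrated ones.
    """
    density = np.clip(canopy_density, 0.0, 1.0)
    fraction = min_fraction + (max_fraction - min_fraction) * density
    return canopy_height * fraction


def remove_vegetation_bias(dem: np.ndarray,
                           canopy_height: np.ndarray,
                           canopy_density: np.ndarray) -> np.ndarray:
    """Subtract the estimated vegetation bias from the DEM."""
    return dem - estimate_vegetation_bias(canopy_height, canopy_density)


# One-row transect: open ground, a dense 20 m mangrove strip, open ground.
dem = np.array([[2.0, 20.0, 2.0]])            # radar-derived elevations (m)
canopy_height = np.array([[0.0, 20.0, 0.0]])  # vegetation height layer (m)
canopy_density = np.array([[0.0, 1.0, 0.0]])  # vegetation density layer (0-1)

corrected = remove_vegetation_bias(dem, canopy_height, canopy_density)
print(corrected)
```

Because the height and density layers are much coarser than the DEM (∼1 km and ∼250 m, respectively), a strip of vegetation narrower than their pixels may be absent from them, in which case the subtraction leaves that strip in place. This is the mechanism behind the residual mangrove artifacts described above.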
There is therefore a clear need for an improved open-access global DEM for global flood hazard modeling. The value of high resolution terrain data with good vertical precision has long been recognized at the local scale by the hydraulic modeling community (Marks and Bates, 2000; Horritt and Bates, 2001; Bates et al., 2003; Lane, 2006, 2011; Fewtrell et al., 2008), and the benefits for global scale models may be even greater. This is because reach scale models often rely upon manual correction of the DEM using secondary data sources such as surveyed river cross sections; such corrections are not possible on a systematic basis at the global scale because suitable secondary data do not exist for most rivers, and because the scale of the task would render it unfeasible. The DEM is therefore the only source of data used to determine river locations and river bank elevations for most locations within a global model, and it is reasonable to expect any improvements to this dataset to yield substantial improvements to model performance. For example, the representation of flood defenses within flood hazard models is known to be critically important (te Linde et al., 2011; Brandimarte and Di Baldassarre, 2012; Wesselink et al., 2013), but current large scale models are either forced to assume total failure of defenses, or adopt heavily simplified approaches such as masking off urban areas for event scales below a "defense standard" inferred from socioeconomic data (Feyen et al., 2012). A global DEM of increased horizontal resolution and vertical precision would offer improved representation of micro-topography; if the quality is able to approach that of an aerial LIDAR DEM (<5 m spatial resolution and 1 m vertical precision), features such as large river levees could be resolved directly. This would lead to the explicit representation of major defense features in large scale models, allowing an improved representation of the flood hazard in protected areas.
As even the finest aerial LIDAR DEMs fail to completely capture smaller defense features such as narrow defense walls, it is unlikely that any foreseeable global DEM could capture all of the detail necessary for ultra-fine models (Gallien et al., 2014). However, a high quality global DEM could act as a "base layer" onto which local detail (potentially collected through crowd-sourced platforms such as OpenStreetMap) could be added. Defenses are not the only consideration either, as previous studies have shown a step change in model skill for urban areas when the DEM becomes able to resolve individual streets due to correct representation of floodplain connectivity (Fewtrell et al., 2008). A final topic that should be mentioned is cost. According to the Sampson et al. (2015) model, the African 1 in 100 year floodplain covers ∼7% of the continental area. Scaled to the globe, this gives an approximate 1 in 100 year floodplain area of 35 million km². Assuming some economies of scale, a collection cost of $200 per km² is plausible and yields a global cost estimate of approximately $7 billion. As the benefit of the highest resolution data would be most strongly felt in cities, which constitute <0.5% of the Earth's land area (Schneider et al., 2009) but a much larger proportion of the flood risk, one way to significantly reduce the cost of producing such a DEM would be to adopt a hybrid resolution approach where the highest resolution data are collected in urban areas and a lower resolution adopted for rural areas. However, in the context of future annual flood loss estimates that exceed a trillion dollars (Hallegatte et al., 2013), the cost of collecting a high quality global DEM may be justifiable on the basis of its applicability to flood risk modeling alone.
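The cost arithmetic above can be reproduced in a few lines. The floodplain fraction, floodplain area and unit cost are the figures quoted in the text. The reference area of roughly 510 million km² (the Earth's total surface) is an assumption introduced here, because the text does not state which area the 7% was scaled by, and this value happens to reproduce the quoted 35 million km².

```python
# Figures quoted in the text.
FLOODPLAIN_FRACTION = 0.07        # African 1 in 100 year floodplain, ~7% of area
FLOODPLAIN_AREA_KM2 = 35e6        # quoted global 1 in 100 year floodplain area
COST_PER_KM2_USD = 200.0          # assumed collection cost per square kilometer

# Assumption (not stated in the text): approximate total surface of the Earth.
REFERENCE_AREA_KM2 = 510e6

# Area implied by scaling the 7% fraction to the assumed reference area.
implied_area_km2 = FLOODPLAIN_FRACTION * REFERENCE_AREA_KM2

# Global collection cost from the quoted area and unit cost.
global_cost_usd = FLOODPLAIN_AREA_KM2 * COST_PER_KM2_USD

print(f"implied floodplain area: {implied_area_km2 / 1e6:.1f} million km^2")
print(f"global collection cost:  ${global_cost_usd / 1e9:.1f} billion")
```

A hybrid resolution strategy would lower the total by applying the highest unit cost only to urban areas, but the text gives no unit cost for lower resolution collection, so that case is not computed here.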
To conclude, high accuracy and precision DEM data are critical for skillful flood hazard modeling, and the limitations of current open access DEM data sets significantly limit our ability to estimate flood inundation and risk for the majority of the planet's surface. There is a clear need (cf. Schumann et al., 2014) for a concerted global effort to collect or collate a new open access DEM with ∼10 m resolution and sub-meter scale vertical accuracy for use in a variety of applications. Flood modeling is one such task, but better global DEM data would have wide value for governments, humanitarian organizations, NGOs and industry.