Toward Urban Water Security: Broadening the Use of Machine Learning Methods for Mitigating Urban Water Hazards

Due to the complex interactions of human activity and the hydrological cycle, achieving urban water security requires comprehensive planning processes that address urban water hazards using a holistic approach. However, the effective implementation of such an approach requires the collection and curation of large amounts of disparate data, and reliable methods for modeling processes that may be co-evolutionary yet traditionally represented in non-integrable ways. In recent decades, many hydrological studies have utilized advanced machine learning and information technologies to approximate and predict physical processes, yet none have synthesized these methods into a comprehensive urban water security plan. In this paper, we review ways in which advanced machine learning techniques have been applied to specific aspects of the hydrological cycle and discuss their potential applications for addressing challenges in mitigating multiple water hazards over urban areas. We also describe a vision that integrates these machine learning applications into a comprehensive watershed-to-community planning workflow for smart-cities management of urban water resources.


INTRODUCTION
A recent United Nations report projects that 60% of the world's total population will live in cities by the year 2030 (U.N., 2018). This highly urbanized population will be vulnerable to water-related hazards in many ways. For example, the combined effect of natural changes and human intervention on the landscape can lead to flooding, drought, and morphologic instabilities (e.g., stream erosion and instability, and erosion and sedimentation at structures) in and around urban areas, as well as deterioration of water quality, riverine ecology, and natural habitats (Crossman et al., 2013; Krajewski et al., 2016). Because of the accelerated pace of anthropogenic activity, hazard frequency and intensity are exacerbated, requiring immediate delivery of science-based solutions for mitigation, resilience, and adaptation that can be quickly deployed in any hazard-prone area. Mitigating these urban water hazards is challenging for watershed management and the urban planning community (Eriksson et al., 2015) due to the following hydro-complexities. First, these hazards exist in a variety of forms (e.g., floods, droughts, increased soil erosion, and water pollution) and are associated with multiple urban risks (e.g., property inundation and infrastructure failure, water shortage, landslide, and eco-habitat deterioration) (Carson et al., 2018). Second, these urban water hazards may occur separately or in a multi-hazard chain (Kappes et al., 2012; Komendantova et al., 2014), in which the occurrence of one hazard (e.g., urban flooding) may trigger another hazard (e.g., bank erosion and landslide). Third, the occurrences of different urban water hazards are connected through the flow of water and watershed processes over a range of spatial scales (Souchère et al., 2010; Santelmann et al., 2019), pressing the need for multiscale mitigation strategies that target hazard drivers at both the watershed and the urban neighborhood scale (Bertolotto et al., 2007; Xu et al., 2019b).
Given these challenges, a holistic approach to water security is articulated by Ait Kadi and Arriens (2012) as one that produces a world in which each community has access to enough water for social and economic development, and for ecosystems in and beyond those communities; and where those communities are protected from floods, droughts, landslides, erosion, and waterborne diseases (Carson et al., 2018; Aboelnga et al., 2019). Additionally, ensuring urban water security is a complex endeavor, as it involves dynamic processes and requires the interaction and participation of multiple planning actors (stakeholders, resource managers, and policy makers) to safeguard the integrity and security of urban water systems and assets in a continuous, physical, and legal manner. Subsequently, these actors must formulate policies and make investments using robust, adaptive, and accessible strategies that balance the socioeconomic and ecological benefits and urban sustainability with the cost of mitigation measures and management practices, and increase the resilience and preparedness of urban communities against extreme weather and natural disasters (Medema et al., 2014; Carson et al., 2018).
Fundamentally, these methods must have the capability of identifying and assessing the risk of multiple interconnected urban water hazards simultaneously (Kappes et al., 2012; Komendantova et al., 2014). Further, these methods must include system-based techniques for providing generalized predictions and acquiring unseen data in order to obtain reliable and accurate depictions of both current and future states of water resources in both urban areas and their associated watersheds. The projections and updates provided through these techniques must be easy to interpret and to understand, so that researchers, decision makers, and communities can readily obtain useful insights that support the planning of urban water resources, including the mitigation of existing hazards and the prevention of future hazards (Carson et al., 2018; Zaidi et al., 2018).
To fulfill these management needs, comprehensive disaster management frameworks have been proposed to promote the collaborative planning and management of water, land, and related resources (Selin and Chevez, 1995; Emerson et al., 2012). These frameworks are developed to reduce the risk of multiple water hazards equitably without compromising the sustainability of vital ecosystems. Examples of these frameworks include Integrated Water Resources Management (IWRM), Adaptive Management (AM), and the Ecosystem Approach (EA) (Cardwell et al., 2009; Dörendahl, 2013; Palmer et al., 2013; Carson et al., 2018). In general, these frameworks entail a series of planning processes that can be categorized into four major stages (Sun and Scanlon, 2019):
1. Long-term planning and mitigation
2. Early warning and prediction of hazards
3. Rapid response and rescue
4. Recovery and restoration
Within the long-term planning and mitigation stage, we summarize here a list of common planning processes from several planning frameworks (Yoe and Orth, 1996; NRCS, 2003; USEPA, 2012), and we address machine learning (ML) methods for application to these processes throughout the paper. These steps are as follows:
1. Identification and assessment of multi-hazard risk in urban water systems.
2. Determination of the objectives of urban water planning and hazard mitigation.
3. Inventory of useful data resources that can define urban water hazards and risks, indicate the performances of existing urban water systems, and reflect the current state of the urban water system and the watershed to which it pertains.
4. Identification, evaluation, and selection of Best Management Practices (BMPs) from a variety of planning alternatives for water quality improvement, stormwater management, and erosion controls (NRCS, 2011; USEPA, 2018).
5. Evaluation of the performance and effectiveness of the implemented plan by examining information and monitoring data collected from pilot studies.
6. Identification, evaluation, and selection of proposed modifications for ongoing or existing plans and implementation schedules based on the future scenarios of urban water.
Despite the usefulness of these planning directives, the implementation of these processes is complex and faces both methodological and technical challenges. Methodological challenges are associated with the long-term planning and mitigation processes and include: (a) assessing the multi-hazard risk and vulnerability of a municipal water system (Kappes et al., 2012; Jetten et al., 2014; Lambert, 2014), and (b) optimizing the selection of the BMPs from a variety of mitigation alternatives based on multiple criteria and objectives (FHWA, 2000). Technical challenges are associated with the implementation of multiple planning processes. One of the major technical challenges is related to the discovery and integration of a large volume of interdisciplinary data and simulation models (Adamala, 2017), which is essential for supporting the multi-hazard risk assessment in the long-term planning and mitigation process, as well as for informing rapid response and rescue during a hazardous event. These information resources can provide data-driven and model-driven insights for informing the current and future state of urban water systems and watersheds. Another major technical challenge is related to the accurate and timely prediction of hazardous events, which helps facilitate early warning and prevention of hazards.
Conventionally, these challenges are approached using domain models and the judgment of decision-makers, and therefore require computation- and labor-intensive efforts for coupling multiple models and investigating the underlying physical processes of different hazards. In recent decades, developments in advanced ML techniques have offered a more time-efficient way of overcoming these challenges. Many review papers have enumerated ML and big data applications for enhancing water resources management and hydrological analysis (Adamala, 2017; Holzbecher et al., 2019) and for mitigating a specific water hazard, such as flooding (Mosavi et al., 2018), water pollution (Haghiabi et al., 2018), and erosion (Abdulkadir et al., 2019). In this paper, we explore and discuss the benefits and potential opportunities of ML applications for enhancing the mitigation of multiple urban water hazards. Herein, we review a selection of successful studies that apply various ML techniques and hybrid modeling techniques (i.e., the fusion of ML methods with process-based domain models) to overcome challenges encountered by different planning processes for integrated urban water management. Hybrid models are a mixture of inductive (data-driven) and deductive (process-based) approaches (Goldstein and Coco, 2015; Hajigholizadeh et al., 2018; Frame, 2019) and are referred to by Goldstein and Coco (2015) as the use of empiricisms built from ML in process-based models. Other researchers (e.g., Karpatne et al., 2016) approach hybrid modeling from the opposite direction, as "theory-guided data science," in which data analysis, given sufficient grounding in physical principles, can represent causative relationships among parameters.
Additionally, we provide a vision for ways in which ML techniques can be used to facilitate different processes in the planning framework for the future. Unlike previous review articles that focus on machine learning applications in the water management sector (Sun and Scanlon, 2019; Chen et al., 2020), we review innovative and application-ready machine learning solutions to facilitate urban water hazard mitigation from the practical aspect of addressing technical and methodological challenges in water resources and disaster management frameworks. The target audience of this paper includes watershed management authorities (WMAs), urban and regional planners, and research professionals in the water resources management sectors. To retrieve the relevant literature in this field that applies various ML techniques for urban water management, we conducted searches using tools such as Google Scholar (https://scholar.google.com) and Scopus (https://www.scopus.com). Figure 1 shows the result of the query: ("Random Forest" OR "Artificial Intelligence" OR "ANN" OR "Support Vector Machine" OR "ANN" OR "Artificial Neural Network" OR "Neural Network" OR "SVM" OR "Machine Learning") AND ("water management" OR "water resources management" OR "watershed management" OR "watershed planning" OR "urban water systems" OR "multi-hazard" OR "water hazard" OR "flood disaster" OR "water pollution") AND [EXCLUDE (PUBYEAR, 2020)]. We executed the query for years 1999-2019, and excluded year 2020. The above query retrieved a total of 46,145 documents from Scopus for which the article title, list of keywords, or abstract satisfies the query. It is clear from Figure 1A that there is a significant growth in ML-based approaches for water-related areas such as water management and urban water hazards. Figure 1B shows the top four scientific journals that publish research on ML applications to water-related areas.
The graph in Figure 1B also confirms the increasing trends in the applications of ML techniques in water management and hazards.
Among the thousands of documents identified from Scopus, we selected a handful of studies that are either published in recent years or are most relevant to and practical for improving specific processes and steps in the generic hazard mitigation stages and long-term water planning frameworks discussed earlier in this introduction. We also considered the diversity and novelty of the machine learning techniques when selecting studies for more detailed review and discussion. Based on the challenge and planning process targeted by these studies, we divide our review into the following sections. Section 2 reviews the predictive data analytics powered by various ML techniques that help planners predict water-related hazards (e.g., flood, drought, water quality, and soil erosion and sediment transport). Multiple applications of hybrid modeling are also discussed in this section. Additionally, a subsection reviewing innovative combinations of ML and remote sensing technologies for disaster management is included, as remote sensing technologies are increasingly applied for improving the discovery and extraction of useful information and features (e.g., land use and land cover, flood inundation extent, and reservoir storage from satellite imagery) that are critical for early warning of hazards and rapid response and rescue during hazardous events (Hodgson et al., 2010). Section 3 presents the ML applications for the identification and assessment of water-related multi-hazard risks and vulnerability (e.g., building inundation, infrastructure failure, and economic loss) in urban water systems. In section 4, we review a few case studies that utilize ML algorithms to optimize the selection of urban BMPs, which can improve long-term planning and mitigation and recovery and restoration processes.
Finally, in section 5, we present our vision for the application of next-generation ML techniques to efficient generation of mitigation strategies in response to urban water hazards. ML methods and their performance as applied to each issue are summarized in Table 1.

EARLY WARNING AND PREDICTION OF URBAN WATER HAZARDS
The capability to predict the occurrence, intensity, and frequency of natural hazards in a timely and accurate manner is essential to every planning process that develops disaster preparedness and response to ensure public safety and mitigate unfavorable consequences associated with hazardous events (de Goyet et al., 2006). Traditionally, hydrological processes that contribute to water-related hazards have been analyzed using probabilistic modeling and physics-based modeling approaches. The probabilistic approaches are devised to estimate the available stock over relatively short future time horizons (Philbrick and Kitanidis, 1999). However, since the overall global climate is changing, rainfall data in any given area are non-stationary; thus the past does not necessarily predict the future, and the information given in recent data points may be more predictive than that of the data points from the more distant past (Tay and Cao, 2002). Limitations of probabilistic methods in producing realistic and specific results for water security planning have required the employment of physics-based models for these predictions. Modeling hazardous events using physics-based approaches requires a theoretical understanding of the atmospheric, land, and human processes and their interconnections, along with the dynamics behind multiple hazards. However, many physics-based models are designed to simulate pristine watersheds where hydrology is assumed to behave in a "pure" way, untainted by human interference (Joslin, 2016); therefore these physics-based models are not suitable alone for predicting water-related hazards in urban watersheds. In addition, physics-based models require large parallel machines and long periods of time for computation, neither of which may be available to water managers.
Compared with the traditional modeling approaches, predictive data analytics powered by ML models can directly extract knowledge of natural disaster processes based on previous disaster occurrences and geoenvironmental factors without prior knowledge (Pham et al., 2016; Rahmati et al., 2019). Unlike physics-based modeling approaches, ML techniques can provide a bridge between physics-based and probabilistic models because they can highlight patterns, trends, and regularities in data without requiring detailed understanding of the physical processes (Dibike and Solomatine, 2000; Rahmati et al., 2019), even when data are sparse, and with less complexity of construction and at relatively low computational cost (Mekanik et al., 2013).
Based on the scientific reasoning behind them, ML applications for predicting water-related parameters can be categorized either as inductive, whereby classifications are made based on statistical similarity in the hydrologic data directly; or deductive, whereby environmental variables (e.g., watershed characteristics) are analyzed as key drivers of hydrology to create classifications (Wagener et al., 2007, 2010; Olden et al., 2012; Auerbach et al., 2015). Because the inductive approach requires abundant hydrologic data (although all watersheds are ungauged at some point with unavailable or insufficient measurements; Joslin, 2016), many studies have favored the deductive approach, which classifies rivers and watersheds based on readily available environmental data that reflect the main drivers of hydrologic processes (Auerbach et al., 2015). Many researchers have utilized the deductive approach to relate stream condition (e.g., flow regimes, biodiversity, streamflow) with upstream watershed characteristics for different water resource management purposes (Poff and Allan, 1995; Snelder and Biggs, 2007; Carlisle et al., 2008; Reidy Liermann et al., 2012; Rice et al., 2015). The rationale for deductive classification methods, such as hydrologic regionalization, environmental regionalization, and environmental classification, is to group river hydrological characteristics by spatial representation (e.g., river basin, region, catchment) based on environmental, hydrological, physical, and climatic similarity (Olden et al., 2012) to develop reliable classes and empirical relationships between predictors and watershed characterizations.

Floods
Long-term processes of change, including changes in climate, shifts in population, and increases in urbanization, will likely increase future urban flood risk, changing the assumptions upon which flood risk analysis and management has long been based (Gangrade et al., 2019), and requiring new tools for risk assessment (Milly et al., 2008). In order to understand how to predict floods and to mitigate their effects on urban areas using new tools, it is important to understand the events that lead to flooding. The locations and processes that contribute to floods include atmospheric processes, catchment-level floods, river flooding, and accumulation of water in flood-prone urban areas (Merz et al., 2010). We discuss next the ML methods applied to each of these processes.

Table 1. ML methods and their performance as applied to each issue.

| Hazard or planning process | ML method | Summary | References |
| --- | --- | --- | --- |
| Atmospheric processes (section 2.1.1) | SVM, DSVM | The performance of SVMs for forecasting regional rainfall varies across the geographic area. For non-stationary time series forecasting, DSVMs generalize better than the standard SVMs. The evaluation of these methods is conducted using both real data and simulated data. | Cao and Gu, 2002; Mohanty and Mohapatra, 2018 |
| | Anomaly detection | Various anomaly detection algorithms have been proposed for detecting point anomalies to improve hydrological and climate data quality, as well as to mine potentially meaningful pattern anomalies within a given time series or spatio-temporal data. | Chandola et al., 2009a; Das and Parthasarathy, 2009; Sun et al., 2017 |
| Catchment (section 2.1.2) | Evolutionary algorithms | Genetic programming approach performs better than the traditional hydrological models during scenarios where surface water movement and water losses are poorly understood. | Whigham and Crapper, 2001 |
| | Cellular automata (CA) | The CA technique provides a versatile approach for modeling complex physical systems using a simplified 5-feature cell-based system. Compared with physically-based models, CA can dramatically reduce computational load, while providing a minimum required accuracy for rapid flood analysis in large-scale applications. | |
| Water quality | XGBoost, RF | Lu and Ma (2020) evaluated the prediction performances of two novel hybrid decision tree-based ML models (based on XGBoost and RF) using the absolute percentage errors. The RF-based model has the best performance for predicting temperature, dissolved oxygen, and specific conductance, and the XGBoost-based model is best for predicting the pH value, turbidity, and fluorescent dissolved organic matter. | Lu and Ma, 2020 |
| | Random forests, M5P, RT, REPT | Bui et al. (2020) demonstrated the capability of hybrid algorithms to improve the predictive power of several standalone ML models. Among these models, the Hybrid BA-RT showed the best performance. | Bui et al., 2020 |
| Soil erosion (section 2.4) | Tree-based ML methods | Rahmati et al. (2017) found that many tree-based models (e.g., RF, RBF-SVM, BRT, and P-SVM) performed excellently both in the degree of fit and in performance for predicting gully headcuts. Hosseinalizadeh et al. (2019) proved that random forests were the most effective of these models for predicting and mapping gully headcuts in the future. | |
| | ANN | The performance of ANN varies based on the training dataset (e.g., the time span and data quality) and the type of sediment for prediction. These ANN predictions are often tested against domain models and theories. | Tayfur, 2002; Lin and Montazeri Namin, 2005; Bhattacharya et al., 2007; Yang et al., 2009 |
| | Adaptive-network-based fuzzy inference system (ANFIS) | Wieprecht et al. (2013) demonstrated that the ANFIS approach could be a useful alternative technique for predicting both bedload and total bed-material load. Lin and Montazeri Namin (2005) found that the method can be used to model both uniform and non-uniform suspended sediment. Bakhtyar et al. (2008) revealed that the ANFIS model provides higher accuracy and reliability for longshore sediment transport techniques than other methods, such as Fuzzy Inference System and CERC. | |
| | Genetic algorithms (GA) | Yadav et al. (2019b) suggested that GA models outperform other models, such as ANN and SVM, for estimating suspended sediment yield. Altunkaynak (2009) found that GA models outperform the regression method for predicting sediment loads. | Altunkaynak, 2009; Yadav et al., 2019b |
| | Unsupervised techniques | These methods are self-organizing, and their results are often validated using domain models and knowledge. For example, Xu et al. (2019a) used the concept of geological landform regions to verify the clustering results of sedimentation potential from a self-organizing map. | |
| Remote sensing | SVM, ANN | Govedarica and Jakovljević (2019) found that the SVM algorithm worked better than ANN with Landsat 8 data and ANN worked better than SVM when Sentinel-2 data was used for water quality monitoring. | |
| Multi-hazard assessment (section 3) | BRT, GAM, SVM | Rahmati et al. (2019) investigated and mapped multi-hazard exposure using a combination of ML models. They found that the different ML models differed in their accuracy in predicting the different hazards, but that the applied ML models were nevertheless useful and generalizable for multi-risk mapping. | Rahmati et al., 2019 |
| | Random forests, RBF neural network | Chen et al. (2019) evaluated the risk of regional flood disaster in the Yangtze River Delta (YRD) region. They discovered that the level of urban flood disaster is closely related to rainfall, topography, economic development, land use, soil erosion, urban flood control investment, and disaster emergency response capability. | Chen et al., 2019 |
| | Random forests, SOM | Xu et al. (2019a) showed that ML application can be used not only for multi-risk assessment and hazard prediction but also for exploring the complex and interconnected processes behind multiple hazards. | Xu et al., 2019a |
| | Random forests | Pourghasemi et al. (2020) developed the Sendai framework, which used random forests to produce a reasonable understanding of the factors controlling flood, forest fire, and landslide occurrence, and to produce a multi-hazard probability map for facilitating integrated and comprehensive watershed management and land use planning. | |
| Best management practices (BMP) | GA/adaptive search | Hadka and Reed (2013) developed a high-performance adaptive search "Borg" algorithm, which was shown to be the most scalable and the best performing of five best performing multi-objective optimization algorithms applied to rainfall-runoff calibration, long-term groundwater monitoring, and risk-based water supply portfolio planning. Others applied GA-based optimization models to find solutions to water quality problems for several watersheds in the United States by connecting non-point pollution reduction models with economic components. | Srivastava et al., 2002; Hadka and Reed, 2013; Limbrunner et al., 2013; Chen et al., 2015 |

Frontiers in Water | www.frontiersin.org

2.1.1. Atmospheric Process Methods
One ML method that is used to capture the underlying relationship between independent and dependent variables in atmospheric processes is Artificial Neural Networks (ANNs). ANNs are interconnected networks comprising an input layer, some number of hidden layers, and an output layer. Each layer contains several processors, or nodes, referred to as artificial neurons. The neurons in each layer are connected to the neurons in the previous and next layers, and they transfer information from one layer to the next. Synaptic weights and biases, along with activation functions applied to the input layer, modulate the input signals sent from one layer to the next. The processed information is then sent as output to the connected neurons in the output layer (Zounemat-Kermani et al., 2020). The power of ANNs is their ability to learn functional relationships, with minimal empirical error, between these variables.
Additionally, the use of activation functions with ANNs allows them to handle non-linear data effectively. In fact, many water-related studies (e.g., Sahoo et al., 2017) using ANNs have shown that complex, reproducible, non-linear relationships exist among, for example, precipitation, temperature, streamflow, climate indices, irrigation demand, and groundwater levels.
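As an illustration of the layered structure described above, the following minimal sketch passes a feature vector through one hidden layer with a tanh activation and a linear output layer. It is not taken from any of the cited studies: the layer sizes, the random initial weights, and the three input features are hypothetical, and no training step is shown.

```python
import math
import random


def tanh_layer(inputs, weights, biases):
    """Fully connected layer followed by a tanh activation (the non-linear step)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def linear_layer(inputs, weights, biases):
    """Fully connected layer with no activation, used here as the output layer."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def init_layer(n_out, n_in, rng):
    """Random synaptic weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, hidden, output):
    """Propagate one input vector: input layer -> hidden layer -> output layer."""
    hidden_signal = tanh_layer(x, *hidden)
    return linear_layer(hidden_signal, *output)


rng = random.Random(0)
hidden = init_layer(4, 3, rng)   # 3 inputs feeding 4 hidden neurons
output = init_layer(1, 4, rng)   # 4 hidden neurons feeding 1 output neuron

# Hypothetical scaled inputs: precipitation, temperature, climate index.
features = [0.2, 0.7, 0.1]
prediction = forward(features, hidden, output)
```

In practice the weights and biases would be fitted to observations (for example by backpropagation) rather than left at their random initial values.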
Another ML method that has been used for predicting average rainfall is a classification algorithm known as Support Vector Machines (SVM) (e.g., Mohanty and Mohapatra, 2018). This method, developed by Vapnik (1995), is based on Structural Risk Minimization, which, rather than minimizing empirical error, as ANNs do, minimizes an upper bound of the generalization error. Dynamic Support Vector Machines (DSVMs), a modified version of the SVM, can be used to accommodate the structural changes in non-stationary rainfall data because they use, instead of a static tube size ε and static regularization constants, an exponentially decreasing ε and exponentially increasing regularization constants (Cao and Gu, 2002) to allow room for analysis of changing patterns in the data.
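The idea of a time-varying ε and regularization constant can be illustrated with a short sketch. The exponential forms below are an assumption chosen for illustration and are not the exact schedules of Cao and Gu (2002); the function names and the `rate` parameter are hypothetical. The sketch only evaluates the weighted ε-insensitive empirical risk and does not solve the full SVM optimization problem.

```python
import math


def dsvm_schedules(n, c0=1.0, eps0=0.1, rate=2.0):
    """Per-sample regularization constants and tube sizes, oldest sample first.

    Assumed form (illustrative only): with t_i = i / (n - 1),
    C_i = c0 * exp(rate * t_i) grows and eps_i = eps0 * exp(-rate * t_i)
    shrinks toward the most recent sample.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    times = [i / (n - 1) for i in range(n)]
    c_values = [c0 * math.exp(rate * t) for t in times]
    eps_values = [eps0 * math.exp(-rate * t) for t in times]
    return c_values, eps_values


def eps_insensitive_loss(residual, eps):
    """Zero inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def weighted_risk(y_true, y_pred, c_values, eps_values):
    """Empirical risk in which each sample has its own C_i and eps_i."""
    return sum(
        c * eps_insensitive_loss(y - p, e)
        for y, p, c, e in zip(y_true, y_pred, c_values, eps_values)
    )


c_values, eps_values = dsvm_schedules(5)
observed = [1.0, 1.2, 0.9, 1.4, 1.1]      # hypothetical rainfall values
predicted = [0.85, 1.05, 0.75, 1.25, 0.95]  # hypothetical model output
risk = weighted_risk(observed, predicted, c_values, eps_values)
```

Under these assumed schedules, the same forecast error is penalized more heavily on recent samples than on old ones, which is the mechanism by which recent data carry more weight in non-stationary series.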
The probabilities of hydrological extreme events such as floods and drought are modeled using different distributions from those that predict future average values. Traditionally, these events and their return periods are estimated with distributions associated with Extreme Value Theory (e.g., Kao and Ganguly, 2011). However, ML techniques for anomaly detection have begun to be applied to hydrological extremes problems. Anomaly detection is the identification of outliers in the data, or items that differ significantly from the overall trend of the data. Typically, anomalous data are related to issues such as measurement equipment failure or an extreme hydrological event. For example, Das and Parthasarathy (2009) used unsupervised spatio-temporal distance-based and neighborhood-based anomaly detection methods with global climate data to identify extreme drought and heavy rainfall at specific locations. Characterization of short-term and long-term future extreme events has also been made with anomaly detection using trends found in historical time series. For these analyses, techniques such as kernel-based (rule-based classification), window-based (examination of the data in smaller "windows" in space or time), predictive, and segmentation (partitioning data into even smaller, possibly unequal, segments) algorithms are employed along with anomaly detection for locating extremely low and extremely high temperature and precipitation events (Chandola et al., 2009b). In the case of the research by Sun et al. (2017), a density-based method was applied to anomaly detection in a hydrological time series. That is, the data were transformed to a piecewise linear representation through the important feature points of the data before mapping their slope, length, and mean to three-dimensional space for examination.
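The window-based idea mentioned above can be sketched generically as follows. This is not the method of any cited study: it is a simple rule, assumed here for illustration, that flags a value when it lies more than a chosen number of standard deviations from the mean of the preceding window. The window length, threshold, and rainfall values are hypothetical.

```python
import statistics


def window_anomalies(series, window=5, threshold=3.0, min_spread=1e-9):
    """Return indices whose value departs strongly from the preceding window.

    Each value is compared with the mean and population standard deviation
    of the `window` values immediately before it.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.mean(history)
        # Floor the spread so a perfectly flat window does not divide by zero.
        spread = max(statistics.pstdev(history), min_spread)
        if abs(series[i] - mean) / spread > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily rainfall (mm) with one extreme event at index 6.
rainfall = [2.0, 2.5, 1.8, 2.2, 2.1, 2.4, 30.0, 2.0, 2.3, 1.9, 2.2]
extreme_days = window_anomalies(rainfall)
```

A flagged index still has to be interpreted by the analyst, since, as noted above, it may reflect either an equipment failure or a genuine extreme event.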

Catchment-Level Methods
Flood models at the catchment level analyze mainly issues of runoff generation and concentration leading to flood discharge. Because flood flow predictions are complex, non-linear, and not well understood, ML may be required to evolve algorithms to derive characteristics of a particular flow. One way of evolving these algorithms is with the use of genetic programming, or genetic algorithms (GA), which produce, using routines imitating Darwin's "natural selection," algorithms directed to perform tasks defined by a set of training examples. Whigham and Crapper (2001) applied a type of genetic programming system to discover rainfall-runoff relationships for two meteorologically and topographically different catchments, one in Wales and one in Australia, and compared the results to those obtained with a traditional deterministic lumped parameter model. While both models did well when rainfall and runoff were correlated, the genetically programmed model performed better on the more poorly correlated data because it did not have to assume any underlying relationships, only to demonstrate its "fitness" to solve the problem. Guidolin et al. (2016) used a two-dimensional cellular-automata-based model employing simple transition rules and a weight-based system to model catchment-level runoff. This diffusive-like method is designed to work with various general grids (rectangular, hexagonal, triangular) and with different neighborhood types (e.g., Moore or von Neumann). It also allows for model parallelization to increase its efficiency in large compute environments. To propagate a flood using this method, ratios of water to be transferred from a central cell to downstream neighbor cells are calculated using a weight-based system, with the water volume transferred limited by Manning's formula (Manning et al., 1890) and the critical flow equation. Water velocity and an adaptive time step are evaluated within a larger updated timestep.
The results of the emergent behavior of this process show good agreement with much more computationally intensive physical methods.
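The weight-based transfer between cells can be illustrated with a heavily simplified sketch on a rectangular grid with a von Neumann neighborhood. This is not the model of Guidolin et al. (2016): the limits imposed by Manning's formula and the critical flow equation, the velocity calculation, and the adaptive time step are all omitted, and the `fraction` parameter that caps the outflow is a hypothetical stand-in for them.

```python
def ca_step(elevation, depth, fraction=0.5):
    """One synchronous update of water depth on a rectangular grid.

    Each wet cell sends water to its lower von Neumann neighbors, with
    shares weighted by the drop in water surface level. Outflow is capped
    by the water available in the cell.
    """
    rows, cols = len(depth), len(depth[0])
    delta = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] <= 0.0:
                continue
            level = elevation[r][c] + depth[r][c]
            drops = {}
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    diff = level - (elevation[nr][nc] + depth[nr][nc])
                    if diff > 0.0:
                        drops[(nr, nc)] = diff
            if not drops:
                continue
            total_drop = sum(drops.values())
            outflow = min(depth[r][c], fraction * max(drops.values()))
            for (nr, nc), diff in drops.items():
                share = outflow * diff / total_drop  # weight-based split
                delta[nr][nc] += share
                delta[r][c] -= share
    return [[depth[r][c] + delta[r][c] for c in range(cols)] for r in range(rows)]


# Hypothetical sloping channel: water released at the high end moves downhill.
elevation = [[2.0, 1.0, 0.0]]
depth = [[1.0, 0.0, 0.0]]
for _ in range(2):
    depth = ca_step(elevation, depth)
```

Because every transfer is subtracted from one cell and added to another, the total water volume is conserved at each step, and the cell updates are independent of one another within a step, which is what makes this class of model straightforward to parallelize.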

Machine Learning for Analyzing River Floods
Flood hazard in rivers can be characterized by the probability and intensity of large river flows and their consequent inundations, and it depends on the atmospheric and catchment processes preceding river flood generation (Merz et al., 2010). In fact, river floods are generally defined in hydrological terms by their water level or amount of discharge. Thus, Shamseldin (2010) explored the use of ANN for forecasting discharge from the Blue Nile river in Sudan. The type of neural network chosen was a multi-layer perceptron (MLP) feedforward network, a non-linear input-output model consisting of a network of interconnected neurons, or computational units, linked together by connection pathways. The input layer is essentially a set of vectors of independent variable values, whereas the output layer is a set of possible dependent variable vectors of values. Between these two layers is a hidden layer containing an unknown number of neurons, which is usually estimated by a trial-and-error procedure based on a mathematical non-linear transfer function (Shamseldin, 2010). Input variables in this case were weighted historical rainfall estimates, weighted seasonal rainfall estimates, and seasonal expectation of discharge; and the output variables were the river discharge values. Results showed strong correlation with observations for the river.
In addition to the multilayer perceptron ANN approach, other types of ANNs have been used to analyze river floods. For example, Tayyab et al. (2016) applied and compared three different types of ANNs to predict stream discharge for the Jinsha River Basin in China. The methods included feedforward back propagation neural networks (FFBPNN), generalized regression neural networks (GRNN), and radial basis function neural networks (RBFNN). The differences among these approaches lie in the hidden layer functions and activation functions that are applied to the problem. Badrzadeh et al. (2013) expanded on these ANN approaches by coupling wavelet multiresolution analysis (wavelet transforms identify trends in the data that are normally not revealed by signal analysis approaches and also help to de-noise a dataset) and adaptive neuro-fuzzy inference system (ANFIS) techniques (an integration of neural networks and fuzzy logic) as preprocessing techniques to the ANN, and showed improved daily river flow forecasting over the use of ANNs alone, especially for long lead times. Mosavi et al. (2018) demonstrated the application of ANNs, neuro-fuzzy, SVM, and support vector regression (SVR; SVM applied to regression only) in forecasting river floods and predicting the runoff hydrograph. The robustness of these techniques was evaluated, and their results were found to be in good agreement with the observations.

Methods for Addressing Flood-Prone Urban Areas
Building resilience to natural disasters is one of the most pressing challenges for achieving sustainable urban development in flood-prone regions. River flooding in urban areas can cause high levels of damage, and while a relationship between hydrological characteristics and damaging floods may exist, knowing an area's hydrological characteristics does not always indicate an understanding of its vulnerability to damaging floods (Pielke, 2000). This understanding is imperative for hazard-mitigation planning for urban areas because these areas' responses to rainfall extremes tend to be faster than those of natural surfaces (Rodriguez et al., 2003). Thus, strategies for flood mitigation in these areas, such as detention ponds, soakaways, permeable concrete, and green spaces, or upstream solutions such as river training and construction of dams and levees (Shamseldin, 2010), should be evaluated and implemented based on a thorough understanding of the flood risks and responses of the area. For example, for predicting urban floods for the city of Pattani in southern Thailand, Noymanee et al. (2017) examined the entire Pattani basin, which includes two dams for water management: a diversion-type dam, Pattani Dam, and a hydropower plant, Bang Lang Dam. It is known that the most frequent floods are a result of overflow from flash flooding of the Pattani Dam rushing toward the city. The researchers acknowledge that a comprehensive approach to controlling floods in the area must include both structural and non-structural measures, such as the development of improved technology for data management of the drainage network and an increase in the sensors' frequency and extent of coverage. Thus, Noymanee et al. (2017) tested five different ML methods using open data pertaining to the area hydrology, the dam structures, the drainage network, and the technological components of the dams to explain the occurrence of extreme floods, estimating dam water levels and cumulative precipitation amounts to forecast flood peaks in the urban area. The five methods tested were an ANN, Bayesian linear regression (statistical inference using Bayes' theorem), boosted decision tree regression and decision forest regression (both similar to the random forest analysis discussed in section 2.2.1), and linear regression. Bayesian linear regression showed the lowest error and the highest correlation with the observations in the urban area. This favorable result may have occurred because the method was informed by probability distributions drawn from prior data.
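To illustrate how prior information enters a Bayesian linear regression, the sketch below computes the closed-form posterior for a single slope parameter under a zero-mean Gaussian prior and known noise variance. This is a textbook one-parameter case with invented data, assumed here only for illustration; it is not the configuration used by Noymanee et al. (2017).

```python
def bayes_slope_posterior(xs, ys, noise_var, prior_var):
    """Posterior mean and variance of the slope in y = slope * x + noise,
    with a zero-mean Gaussian prior on the slope and known noise variance."""
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical scaled data: cumulative precipitation vs. flood peak.
rain = [1.0, 2.0, 3.0]
peak = [2.0, 4.0, 6.0]
weak_prior = bayes_slope_posterior(rain, peak, noise_var=1.0, prior_var=1e9)
tight_prior = bayes_slope_posterior(rain, peak, noise_var=1.0, prior_var=1.0)
```

With a nearly flat prior the posterior mean approaches the least-squares slope, whereas a tighter prior pulls the estimate toward the prior mean and reports the remaining uncertainty as a posterior variance.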
Often, in order to understand and manage risks of urban flooding beyond purely hydrological considerations, integration of decision support tools with predictive models is instructive. For example, one study (Rozos, 2019) combined a hydrological model, a demand management model called a network flow programming model (NFP), and a Feed Forward Neural Network (FFNN) to simulate a water supply system in Athens, Greece. The NFP optimizes and simulates the operation of a water supply system given hydrological inputs. FFNNs are the simplest type of ANN, whereby information moves in a forward direction from input nodes to the hidden layer to the output nodes (Mosavi et al., 2018), and they lend themselves to multimodel coupling. In this case, the NFP used synthetic data of a length capable of capturing the risk of each policy. The penalty functions of the NFP were then selected to reflect operating policies with different levels of risk acceptance. This process provided a large set of training data over a long period of time that was then used as input to the FFNN, which allowed optimal decisions to be identified for the Athens system.

Predicting Indirect Flood Effects in Urban Areas
Indirect flood effects are those that cause damage to assets outside the flooded area. These assets can be physical, economic, social, or ecological in nature, with impacts lasting for days, months, or even years after a large flooding event (Costello et al., 2019). In order to evaluate the extent of these effects, multi-agent-based simulations have been applied. Agent-based models simulate actions and interactions of autonomous agents, which can be individual actors or groups of actors, to assess the effects of these individual actions on the system as a whole. In one study, reinforcement learning, in which software agents learn to choose actions that maximize their cumulative reward, was used with an agent-based simulation to optimize post-disaster recovery for both individual companies and supply chains in Tokyo, Japan. That study showed improved indirect damage estimation accuracy and mitigation potential over statistical methods and rough empirical models.

Drought
Drought is a prolonged period of precipitation deficit that may occur at varying spatiotemporal scales ranging from local to regional, lasting for weeks, months, multiple years, or even decades (Pendergrass et al., 2020; Hao et al., 2018). Drought may be exacerbated by extreme heat, soil moisture deficit, land-atmosphere feedbacks, sea surface temperature anomalies, atmospheric circulation, and human activities such as land use and land cover changes and increased water demand (Cook et al., 2007; Dai, 2011; Kam et al., 2014). Droughts are high-impact weather hazards that affect agriculture, the economy, ecosystems, water supply, and human lives (Hao et al., 2018). Over the past two decades, the total cost associated with drought is estimated to be billions of dollars (Huntingford et al., 2019). In a warming climate, the duration and intensity of drought are projected to increase further (Pagán et al., 2016; Pendergrass et al., 2020). Therefore, advances in timely prediction and in the development of early warning systems are crucial for drought risk management and strategic planning.

Advancement in the Use of Machine Learning Techniques for Drought Prediction
Drought is a complex weather hazard (Van Loon, 2015); therefore, a comprehensive understanding of the physical mechanisms that drive drought is essential to improving drought prediction (Huang et al., 2016). Numerous studies have been conducted to understand the intricate physical processes that lead to the extreme low moisture conditions of drought. Scientists have employed dynamical methods that involve climate and hydrological model simulations, statistical models using a suite of predictors and drought indices, as well as hybrid models for drought prediction (Fernández et al., 2009; Dutra et al., 2014; AghaKouchak, 2015; Mo and Lyon, 2015; Wood et al., 2015; Hao et al., 2017, 2018). During the last decade, there has been an increase in the use of ML techniques to improve drought predictability (Hao et al., 2018). For instance, random forest ML algorithms have been increasingly used in drought prediction studies (Park et al., 2016; Kuswanto and Naufal, 2019; Rahmati et al., 2020). Random forests are extensions of decision tree analysis that start with classification trees, types of decision trees that can be grown together as a "forest" in a computational system. They provide highly accurate classification and characterization of complex predictor variable interactions while maintaining flexible analytical technique selection. Random forests are also better able to deal with overfitting and multicollinearity than traditional linear regression models (Konapala and Mishra, 2020). Park et al. (2016) employed random forests, boosted regression trees, and Cubist ML algorithms (rule-based model trees in which the terminal leaves contain linear regression models) for meteorological and agricultural drought monitoring using 16 remote-sensing-based drought factors over arid and humid regions in the United States. Their findings suggest that among the three approaches, random forests provide the best performance for Standardized Precipitation Index (SPI) prediction. Similarly, Kuswanto and Naufal (2019) found the performance of random forests to be optimal when using SPI derived from the Modern-Era Retrospective analysis for Research and Applications (MERRA-2) for drought prediction over the East Nusa Tenggara Province in Indonesia. In a more recent study, Rahmati et al. (2020) compared the performance of six different ML techniques [classification and regression trees (CART), boosted regression trees (BRT), random forests, multivariate adaptive regression splines (MARS), flexible discriminant analysis (FDA), and SVM] for mapping agricultural drought hazard in the southeast region of Queensland, Australia. Similar to Park et al. (2016) and Kuswanto and Naufal (2019), they found that random forests had the best goodness-of-fit and predictive performance among the six models. Zaniolo et al. (2018) contributed the FRIDA (FRamework for Index-based Drought Analysis) for the automatic design of basin-customized drought indexes across different types of basins by applying an ML-powered variable selection algorithm. The algorithm is based on a Wrapper for Quasi-Equally Informative Subset Selection (W-QEISS), which applies a multi-objective evolutionary algorithm to identify Pareto-efficient subsets of variables. This technique maximizes the wrapper accuracy, minimizes the number of selected variables, and optimizes the relevance and redundancy of the subset. As a result, the framework builds an index that serves as a surrogate of the drought conditions in a basin by computing and combining all the relevant available information about the water cycle in the system, as identified by the feature selection algorithm.
ANN ML techniques (see section 2.1.1) have also been used for drought forecasting (Mishra et al., 2007; Morid et al., 2007; Belayneh and Adamowski, 2012; Belayneh et al., 2014). Belayneh et al. (2016) coupled a wavelet transform data processing technique (see section 2.1.3) and bootstrapping and boosting ensemble approaches with ANN and Support Vector Regression (SVR) (see section 2.1.1) for drought prediction in the Awash river basin of Ethiopia. Bootstrapping is a technique of resampling with replacement; it was used to create bootstrap ANN and SVR ensemble models in order to reduce model prediction uncertainty. Boosting techniques improve the performance of an algorithm by producing a series of models that focus on training cases that were not well predicted previously. The researchers found that the coupled models showed improved performance and provided more robust SPI predictions than either ANN or SVR alone.
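The bootstrapping step can be sketched as follows. For brevity, the example fits an ordinary least-squares line to each resample rather than an ANN or SVR, which is a deliberate simplification; the data, function names, and ensemble size are assumptions made for illustration only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_predictions(xs, ys, x_new, n_models=50, seed=0):
    """Predictions at x_new from models fitted to resampled data."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    while len(predictions) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # a line cannot be fitted to a single distinct x value
        slope, intercept = fit_line(sample_x, [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return predictions


# Hypothetical predictor/SPI pairs lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
preds = bootstrap_predictions(xs, ys, x_new=3.0)
ensemble_mean = sum(preds) / len(preds)
spread = max(preds) - min(preds)
```

The ensemble mean serves as the forecast, and the spread of the individual predictions gives a simple indication of model uncertainty. With noise-free data as used here the spread is essentially zero; with real observations it would widen.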
ANN models can be limited by poor interpretability, local minima traps, and low computational efficiency. As an alternative, XGBoost has been gaining popularity due to its high execution speed and improved model performance as compared to other ML techniques such as SVM, ANN, and random forests (Fan et al., 2018; Shimoda et al., 2018; Zhang R. et al., 2019). XGBoost is an ensemble technique that implements a gradient-boosted decision tree algorithm to produce an ensemble of weak prediction models. Models are added sequentially to correct the errors of the existing ensemble until optimum performance is achieved. Zhang R. et al. (2019) compared the performance of XGBoost with a traditional statistical model and an ANN model for Standardized Precipitation Evapotranspiration Index (SPEI) prediction with a lead time of 1-6 months for 32 weather stations in the Shaanxi Province of China. In their study, the XGBoost model showed the best performance for SPEI prediction, achieved the highest user's and producer's accuracies, and was much faster than the ANN model.
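The sequential error-correction idea behind gradient boosting can be sketched with one-split regression trees (stumps) and a squared-error loss. This is plain gradient boosting written for illustration, not the XGBoost library or its regularized objective; the data and parameter values are invented.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (squared error).
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Hypothetical predictor values and a step-like drought index response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
```

Each round fits a weak learner to what the current ensemble still gets wrong, so the training error shrinks as rounds are added; the learning rate controls how much of each correction is accepted.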

Water Quality
The deterioration of water quality in both groundwater and surface water has become a major concern, causing negative impacts on human well-being, ecosystems, water supply, and infrastructure around the world (UN, 2012; Khan and See, 2016). According to the United Nations (UN), more than 880 million people are living in water scarcity without adequate safe drinking water, and 2.6 billion people lack access to basic sanitation due to water shortage (UN, 2010, 2012). Effective management of water supply systems and watersheds often requires reliable and timely approaches for predicting water quality and forecasting future water quality trends (Wang et al., 2017; Bui et al., 2020). Based on established water quality standards (Nowell and Resek, 1994; EPA, 2012), water quality is often estimated using a combination of water quality parameters that reflect the physical, biological, or chemical characteristics of the air, watershed hydrology, soils, and sediment transported in the aquatic system (Hou et al., 2013; EPA, 2019). Developing accurate and timely predictions of water quality is challenging. Traditional approaches use water quality models to analyze and predict water quality parameters. Most of these models consist of mathematical representations of physical mechanisms that determine (a) the fate, transport, and degradation of pollutants within a water body, and (b) the movement of pollutants from land-based sources to a water body (Refsgaard and Henriksen, 2004). Despite their usefulness for modeling specific scenarios, water quality models can only provide one line of evidence that serves as an imperfect approximation of reality (Kebede, 2009). This is because water quality problems are complex in two respects: (1) they involve a large number of interconnected multi-domain processes (e.g., physical transport, hydrological, chemical, and biological); and (2) many underlying mechanisms that may affect water quality are still unknown.
Complex water quality models often involve time-consuming and labor-intensive processes (Ahmed et al., 2019), rendering them costly and ineffective for supporting many time-critical water resources management tasks that have limited budgets. Compared with process-based (mechanistic) models, the newly emerging data-driven approaches for water quality prediction often rely on a large volume of water quality and hydrological data from various sources (Khan and See, 2016). Examples of these data sources include the United States Geological Survey (USGS) online resource, the National Water Information System (NWIS), and the United States Environmental Protection Agency's (USEPA) STORET Data Warehouse (Beran and Piasecki, 2008). These analyses normally consider the combined effect of multiple water quality parameters, such as ammoniacal nitrogen (NH3-N), suspended solids (SS), dissolved oxygen (DO), pH, and salinity. As many of these parameters are dynamic and affected by natural watershed hydrology, their influences on water quality may vary across watersheds (EPA, 2019). In different watersheds, some parameters may have greater and more noticeable influences on water quality than others (Khan and See, 2016). In response to this challenge, the water quality index (WQI) has been proposed as a single representation of several water quality variables considered simultaneously. However, calculating the WQI using traditional approaches is time-consuming and often error-prone during the derivation of sub-indices (Bui et al., 2020). To address these limitations and improve water quality analysis and prediction, researchers have applied many ML techniques (Khan and See, 2016; Ahmed et al., 2019; Bui et al., 2020), as well as developed a few hybrid approaches that combine various traditional methods with ML techniques (Taskaya-Temizel and Casey, 2005; Wang et al., 2017). We discuss the application of some of these approaches next. Palani et al. (2008) and Singh et al. (2009) applied ANN models to predict coastal and river water quality in Singapore and India, respectively. Each found that the ANN-computed values of water quality indicators were in close agreement with their respective measured values. García-Alba et al. (2019) developed an ANN model to estimate bathing water quality in estuaries and found that ANN models are able to estimate Escherichia coli concentrations comparable to those estimated by process-based models, and at much lower computational cost. In more recent studies, combinations of multiple ML and data analytic techniques are preferred to analysis with a single ML technique. For example, Lu and Ma (2020) proposed coupling two ML models to improve water quality prediction: XGBoost (section 2.2.1) and a random forest algorithm (section 2.2.1). They found that the hybrid XGBoost model performed better for pH, turbidity, and fluorescent dissolved organic matter predictions, and the random forest model performed better for temperature, dissolved oxygen, and specific conductance predictions, but the combined performance of the two models was the best for optimizing the calculation of a water quality index. Barzegar et al. (2020) applied two standalone deep learning (DL) models, a convolutional neural network (CNN; an ANN built from convolutional layers) and a long short-term memory (LSTM) model (a recurrent network that includes feedback connections in addition to feedforward ones), as well as a combined CNN-LSTM model, to predict two water quality variables, dissolved oxygen (DO; mg/L) and chlorophyll-a (Chl-a; µg/L), in the Small Prespa Lake in Greece. Assessment of model performance using statistical metrics showed that LSTM outperformed the CNN model for DO prediction, but the standalone DL models yielded similar performances for Chl-a prediction. The combined CNN-LSTM model, however, outperformed the standalone models for predicting both DO and Chl-a.
By coupling the LSTM and CNN models, both the low and high levels of water quality parameters were successfully captured, particularly for the DO concentrations (Barzegar et al., 2020). Similar successful approaches involving the coupling of multiple ML algorithms for the short-term prediction of water quality parameters include Li et al. (2018) and Lu and Ma (2020). Bui et al. (2020) applied four standalone algorithms [random forests and three variants: M5P (similar to Cubist, section 2.2.1), random tree (RT), and reduced error pruning tree (REPT)] and developed 12 algorithm combinations among these methods to predict water quality in northern Iran. They found fecal coliform concentrations to have the most effect and total solids to have the least effect on the predictions. Finally, Read et al. (2019) integrated theory with state-of-the-art ML techniques to improve predictions of water-quality-related parameters guided by physical laws. The study presented a use case for a Process-Guided Deep Learning (PGDL) hybrid modeling framework for predicting depth-specific lake water temperature, which serves as an important water quality parameter. The PGDL consisted of three primary components: a deep learning (many-layered neural network) model with temporal awareness (long short-term memory recurrence), theory-based feedback (model penalties for violating conservation of energy), and model pre-training to initialize the network with synthetic data (water temperature predictions from a process-based model) (Read et al., 2019). Through the use case, the researchers demonstrated that the integration of scientific knowledge into deep learning tools shows promise for improving predictions of many important environmental variables.
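The theory-based feedback component can be pictured as an extra term in the training loss. The sketch below adds a penalty whenever the modeled change in stored energy departs from the net energy flux; the variable names, the absolute-value form of the penalty, and the tolerance are assumptions made for illustration and do not reproduce the loss function of Read et al. (2019).

```python
def process_guided_loss(predicted, observed, energy_in, energy_out,
                        energy_change, penalty_weight=1.0, tolerance=0.0):
    """Mean squared error plus a penalty on energy-budget violations."""
    n = len(predicted)
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    # Imbalance between the modeled change in stored energy and the net flux.
    imbalances = [
        abs(change - (e_in - e_out))
        for change, e_in, e_out in zip(energy_change, energy_in, energy_out)
    ]
    penalty = sum(max(imb - tolerance, 0.0) for imb in imbalances) / len(imbalances)
    return mse + penalty_weight * penalty


# Hypothetical two-step example in arbitrary units.
consistent = process_guided_loss(
    predicted=[10.0, 12.0], observed=[10.0, 11.0],
    energy_in=[5.0, 6.0], energy_out=[3.0, 4.0], energy_change=[2.0, 2.0],
)
violating = process_guided_loss(
    predicted=[10.0, 12.0], observed=[10.0, 11.0],
    energy_in=[5.0, 6.0], energy_out=[3.0, 4.0], energy_change=[2.0, 4.0],
)
```

Two predictions with the same data misfit thus receive different losses, and the optimizer is steered toward the physically consistent one.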

Soil Erosion and Sediment Transport
Erosion and sedimentation are naturally occurring processes that include the detachment, transportation, and deposition of soil particles through the action of wind, water, and ice (NRCS, 2008). However, excessive soil erosion and sedimentation rates result from anthropogenic activities (e.g., urbanization and agriculture) in which soil surfaces are exposed and not initially revegetated (e.g., construction sites). Without proper mitigation, erosion and sedimentation in urban areas can cause a series of adverse impacts to the environment and urban areas (Guy, 1970; Hewett et al., 2018), which include water pollution, degradation of aquatic habitat, infrastructure damage (e.g., sediment blockage in urban waterways, storm sewers, and stream crossings, as well as silting of roadways, utility supply networks, and fences), increases in water-treatment costs, and stream bank instabilities (e.g., gullying and landslides) (NRCS, 2008).

Machine Learning Techniques for Sediment Research
To tackle sediment-related problems, predictions of sediment production and transport are required to inform urban planning and watershed management communities of the major sources of sediment and of erosion-prone areas. Conventionally, these predictions are addressed through a wide variety of erosion and sediment transport models (Merritt et al., 2003; Nearing et al., 2005). Despite the usefulness and maturity of these traditional approaches, the prediction of sediment-related parameters (e.g., soil losses, in-stream sediment load, and sediment delivery ratio) is still challenging because of the following model limitations: (a) running many physically-based erosion and sediment transport models is time- and resource-intensive and requires the consideration of more physical processes in addition to the hydrological process, making these models less applicable to sediment-related predictions in large watersheds and areas (Abaci and Papanicolaou, 2009); (b) most models are designed to simulate a specific type of erosion (e.g., rill, gully, and stream bank erosion) and sediment transport (e.g., suspended load and bed load) (Wischmeier and Smith, 1978; Ganasri and Gowda, 2015), while sediment-related problems in urban areas and urban waterways often entail multiple types of erosion and sediment transport, therefore requiring the integration of a variety of models; and (c) most erosion and sediment transport models do not cover sediment transport and deposition at manmade structures (Rowley, 2014) in urban areas. A comparative study conducted by Liang et al. (2019) showed that data-driven models can effectively inform and complement the simulations conducted with physics-based models. Currently, many studies use ML methods to address issues in sediment research. We summarize a list of example studies by their application areas and their applied ML methods:
1. Modeling sediment transport: (a) Artificial neural networks (Tayfur, 2002; Lin and Montazeri Namin, 2005; Bhattacharya et al., 2007; Yang et al., 2009), (b) Adaptive-Network-based Fuzzy Inference System (ANFIS) (Lin and Montazeri Namin, 2005; Bakhtyar et al., 2008; Wieprecht et al., 2013), (c) M5 model trees (Onderka, 2012; Goyal, 2014).
2. Sediment-related impacts on urban infrastructure: (a) Random forest (Xu et al., 2019a), (b) Adaptive-Network-based Fuzzy Inference System (ANFIS) (Azamathulla et al., 2011, 2012).
In general, erosion and sediment research is a broad subject that provides numerous opportunities for ML applications. From the example studies listed above, we conclude that (a) compared with traditional erosion and sediment transport models, ML methods are easier and cheaper to apply (Cigizoglu, 2002; Tayfur and Guldal, 2006; Yadav et al., 2019a) and can be readily applied to solve complex sediment problems that entail human factors and multiple erosion and sediment transport processes (Xu et al., 2019a), and (b) ML models that rely on field data generally produce better and more reliable results than those obtained from experimental models (Kitsikoudis et al., 2014).

Hybrid Modeling Techniques for Sediment Research
In addition to its application to previously described hydrological studies, hybrid modeling has also been applied to sediment research (Merritt et al., 2003; Hajigholizadeh et al., 2018). Through the fusion of inductive data-driven models and deductive process-based models (Goldstein and Coco, 2015), hybrid models inherit the strengths of both ML methods and physics-based models in a single model with increased performance in terms of speed (Babovic et al., 2001; Hall, 2004), accuracy (Krasnopolsky and Fox-Rabinovitz, 2005; Goldstein and Coco, 2015), and the capability of addressing soil-water problems with complex and multi-scale physical processes (Hajigholizadeh et al., 2018). An additional benefit of hybrid modeling is that ML models and data can be directly coupled to improve the calibration of process-based models (Knaapen and Hulscher, 2003; Ruessink, 2005; Mekonnen et al., 2012). Hajigholizadeh et al. (2018) tabulated hybrid modeling applications that integrate statistical models with process-based models in sediment research, including:
• Modified Morgan-Morgan-Finney (MMMF) (Morgan et al., 1984)
• Sediment river network model (SEDNET) (Prosser et al., 2001)

Application of Machine Learning to Remotely-Sensed Data for Water Hazard Prediction and Mitigation
Remotely-sensed (RS) data, due to its wide spatial coverage, provides a synoptic view of disaster-affected areas. It is also frequently available during the disaster response phase, providing a temporal overview of the disaster situation. Due to recent advancements in satellite sensor technology, RS data is now available at various spatial resolutions (i.e., low, medium, and high), affording local, regional, and global coverage, and at various spectral resolutions, from a few spectral bands in optical sensors to several hundred spectral bands in hyperspectral sensors. Additionally, advancements in the RS field have resulted in continuous growth in Earth Observation (EO) data archives. Due to these characteristics, RS data is a potential data source for each stage of hydrological pre-event planning and post-event countermeasures (Ge et al., 2020). Nevertheless, it is not always possible, and is often dangerous, to conduct ground surveys of disaster-affected areas. Often the disaster destroys transportation and communication facilities, making ground-based surveys impossible. In such time-critical situations, the proper selection of the sensor type, spatial resolution, and satellite revisit period is crucial, as pre-disaster and ancillary data can provide wide coverage of the disaster-affected area (Ge et al., 2020). Despite these occasional limitations, various powerful approaches have been developed recently in the context of advanced ML and computer vision to exploit the wealth of information that can be found in RS data to address events related to various urban water hazards (Kurte et al., 2017).

Flood Management
Over the last two decades, RS data have successfully contributed to various stages of flood management (Rahman and Di, 2017), such as flood risk assessment and flood emergency planning and management. Flood risk assessment requires flood hazard assessment, exposure risk assessment, and vulnerability assessment. As a part of flood hazard assessment, RS data have been analyzed for flood forecasting and evaluation of flood inundation. As a part of flood emergency planning and management, RS data have been widely used in flood early warning systems, rescue and relief operations, post-flood damage assessment, and policy making. Various recent approaches have used advanced ML techniques and RS during various stages of flood management. Flood forecasting requires accurate estimation of rainfall. Although satellite RS has limited direct applicability to flood forecasting, it has been widely used for precipitation estimation, which is an important input for flood forecasting models. In the late 1990s, Tsintikidis et al. (1997) used a shallow neural network with one hidden layer to estimate rainfall from passive microwave radiometer (SSM/I) data. The network took brightness temperature and associated polarization information as inputs and output the rainfall rates. Kühnlein et al. (2014) used a random-forest-based ML algorithm to estimate precipitation from satellite-derived information on cloud-top height, cloud-top temperature, cloud phase, and cloud water path retrieved from the Meteosat Second Generation (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI). Later, Shi et al. (2015) proposed a spatiotemporal sequence forecasting approach using Convolutional Long Short-Term Memory (ConvLSTM) with 2D echo data from a ground-based RADAR, in which precipitation nowcasting is performed by forecasting the RADAR echo data. Pan et al. (2019) proposed a Convolutional Neural Network (CNN) based approach to improve the precipitation estimates from numerical weather prediction (NWP) models. The authors stated that the method outperformed reanalysis precipitation products as well as statistical downscaling (SD) products using linear regression, nearest neighbors, random forests, or fully connected deep neural networks. In another recent work, Hayatbini et al. (2019) proposed a precipitation estimation framework using a fully convolutional neural network and the Advanced Baseline Imager data from GOES-16, a multispectral geostationary satellite. Specifically, they proposed that the U-net CNN architecture could perform rain/no-rain classification using satellite imagery. The study was based on the earlier work of Hong et al. (2004) on precipitation estimation using remote sensing data and an ANN.
Flash flood susceptibility mapping is another important process in flood risk assessment. Recently, Costache et al. (2019) used a Digital Elevation Model (DEM) with 30 m spatial resolution, obtained from the Shuttle Radar Topography Mission (SRTM) and developed using the technique called SAR interferometry, to derive seven flash-flood-related conditioning factors such as slope angle, aspect, profile curvature, and other factors. In addition, the authors used aerial imagery from Google Earth to delineate the torrential areas, along with the CORINE land use/cover data, which were derived from Sentinel-2 and Landsat-8 RS images. K-nearest neighbors (kNN), K-Star (KS), and Analytical Hierarchy Process (AHP) algorithms were then applied to obtain the flash-flood susceptibility mapping. Thus, RS techniques played a crucial role in obtaining eight out of 10 flash-flood conditioning factors. In a similar work, Shahabi et al. (2020) used an ML ensemble method with four different k-nearest neighbor (kNN) algorithms for flood detection and susceptibility mapping. The authors used Sentinel-1 images to generate the flood inventory and the SRTM DEM to obtain various flood-related conditioning factors. These two works show that ML ensemble methods are gaining traction in flood susceptibility mapping.
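A k-nearest-neighbor classifier of the kind used in these studies can be sketched as a majority vote among the closest training samples. The conditioning factors, their values, and the class labels below are invented for illustration, and the sketch omits the ensemble step and the feature scaling that a real application would need.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_points[i], query)),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical conditioning factors: (slope angle in degrees, scaled curvature),
# with made-up susceptibility classes.
points = [(2.0, 0.1), (3.0, 0.2), (4.0, 0.1), (25.0, 0.8), (30.0, 0.9), (28.0, 0.7)]
labels = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
first_site = knn_predict(points, labels, query=(3.5, 0.15))
second_site = knn_predict(points, labels, query=(27.0, 0.8))
```

Applying such a classifier to every grid cell of the conditioning-factor rasters yields a susceptibility map.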
Mapping flooded areas is important for performing damage assessment, deploying rescue and relief operations, and developing policies. An example of applying RS and ML to this undertaking is Feng et al. (2015), who developed a random forest based approach to accurately map a flooded area using high-resolution (0.2 m) imagery obtained from an Unmanned Aerial Vehicle (UAV). The data were obtained for Yuyao City of Zhejiang Province in Eastern China during the flooding that occurred due to the extreme rainfall event on October 7, 2013. Additionally, Jain et al. (2020) developed a hybrid approach to combine the strength of the traditional water indices from RS imagery and the generalization capability of Convolutional Neural Networks (CNN). The authors proposed a new water index which minimized cloud interference in the RS image and used it with a pre-trained VGG-16 model (Simonyan and Zisserman, 2014) and a transfer learning based approach to re-train the model for the new task of flood water detection. In a similar work, Potnis et al. (2019) used an Encoder-Decoder Neural Network based on the Efficient Residual Factorized Convnet (ERFNet) architecture for multi-class segmentation of WorldView-2 satellite imagery of the urban floods in Srinagar, India during September 2014. Recently, Jiang et al. (2020) proposed an approach to obtain waterlogging depth from video images using a CNN. The approach generated synthetic images from a set of images of reference objects and the flood surface, which were then used to train the CNN model to obtain the waterlogging depth. This method can also be employed to obtain waterlogging depth from images of a flooded area taken using recent drone-based video surveillance. Cervone et al. (2017) added to these techniques a methodology to fuse social media data with RS data during a flood situation to improve the flood mapping capability.
Recently, a few approaches to model the semantics in RS images were proposed for flood detection and mapping. Kurte et al. (2017) proposed a semantics-enabled framework to model the spatial relationships among various regions in RS images to enable spatial-relationships-based queries such as "Retrieve all images in the ALI repository having Built Up region externally connected to the Stagnated Flood Water." Later, this work was extended to accommodate the temporal aspect to enable spatio-temporal semantic queries such as "Show road segments which were completely submerged during 9th September 2014 to 22nd September 2014." In a similar semantics-based approach, Potnis et al. (2018) developed a flood scene ontology (FSO) which formally defines complex classes such as Flooded_Residential_Buildings, Accessible_Residential_Buildings, and Operational_Roads. After detecting various objects in the RS imagery using any supervised classification approach, the ontology can be used to infer complex classes which are very important for flood mapping.

Water Quality Monitoring
RS data has been used over the past 50 years to monitor water quality. For instance, RS data can be used to measure water turbidity, or lack of transparency, which is a good measure of water quality. Clear water shows high absorptivity in the infrared and near-infrared wavelength regions. It also shows some reflectivity in the visible regions. Reflectivity in this application can reveal variations in water quality due to salinity, temperature, and turbidity. In the past decade, much research has been published in which remote sensing and ML approaches are used to estimate additional water quality parameters. For example, Dogan et al. (2009) explored the non-linear capability of ANN to improve the accuracy of biological oxygen demand (BOD) estimation. Wu et al. (2014) compared multiple regression (MR) with ANN for total suspended solid (TSS) turbidity estimations using data measured with a hyperspectral spectroradiometer and found that the non-linear transformation function of ANN performed better than MR. Wang et al. (2011) used the support vector regression (SVR) method to retrieve various water quality estimators from SPOT-5 satellite data. SVRs showed potential in solving problems with small sample size, non-linearity, or high dimension (Vapnik, 1995). Huo et al. (2014) stated that the lakes near urban areas or inside urban areas are becoming eutrophied or even hypereutrophied due to excessive urbanization and a fast growing economy. The authors used genetic algorithms combined with support vector machines (GA-SVM) to build an inversion model for eutrophic indicators such as Chl-a from Landsat ETM imagery. They showed that the GA-SVM based method had better prediction accuracy than the traditional statistical regression methods and ANN based approaches.
According to Sharaf El Din et al. (2017), modeling water quality using satellite data is a complex problem, and conventional regression-based approaches cannot perform well when modeling such complex relationships between water quality and RS data. The authors claimed that their proposed Landsat 8-based back propagation neural network (BPNN) for estimating water quality (both optical and non-optical) worked better than SVM-based methods. Moreover, the authors mentioned that, compared to the BPNN-based methods, the SVM-based methods could produce very different results due to differences in parameter selection and kernel selection, and that they have high algorithmic complexity and extensive memory requirements. The developed model showed R² > 0.9 for water quality indicators such as turbidity, total suspended solids (TSS), chemical oxygen demand (COD), biological oxygen demand (BOD), and dissolved oxygen (DO). Recently, Hafeez et al. (2019) compared several ML techniques including artificial neural networks, random forests, cubist regression, and support vector regression for estimating the concentrations of suspended solids (SS), Chl-a, and turbidity using Landsat data. The results showed that the ANN-based model achieved the highest accuracy in estimating the above-mentioned water quality indicators. In another recent study, Govedarica and Jakovljević (2019) used 4 years of time-series data from in-situ monitoring of surface water bodies for the calibration and validation of a water quality estimation based on SVM and ANN algorithms using Landsat 8 data. The work also compared the estimations based on Landsat 8 with those based on Sentinel-2 data and found that, due to higher spatial and spectral resolution, Sentinel-2 data is a better alternative for water quality monitoring. Interestingly, the results showed that SVM produced more accurate results than ANN when used with Landsat data, whereas with Sentinel-2 data ANN provided better estimation accuracy than SVM for turbidity and TSS but lower accuracy for TN and TP. Finally, Wang et al. (2017) conducted a study that combined an ML algorithm and remote sensing spectral indices [difference index (DI), ratio index (RI), and normalized difference index (NDI)] through fractional derivative methods to establish a model for estimating and assessing the water quality index (WQI) (2.3). For this study, the WQI was calculated using sensitive wave bands and a spectral index of hyperspectral data, and particle swarm optimization (Kennedy and Eberhart, 1995;Shi and Eberhart, 1998)-support vector regression models (PSO-SVR), which deploy a population of candidate solutions over the SVR search space. Through comparisons of the predictive effects of the 22 water quality index estimations determined by the PSO-SVR, Wang et al. (2017) demonstrated that the model based on RI, DI, and NDI values of the 1.6 order performed better than the others for predicting the water quality index of the semiarid area of central Asia [R² = 0.92, RMSE = 58.4, RPD = 2.81, and a curve-fitting slope of 0.97].
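For readers less familiar with the spectral indices named above, the following minimal Python sketch shows the conventional two-band forms of DI, RI, and NDI. The function names and the reflectance values are hypothetical and chosen only for illustration; the fractional-derivative preprocessing and the band selection used by Wang et al. (2017) are not reproduced here.

```python
def difference_index(r_i, r_j):
    """DI: difference between the reflectances of two bands."""
    return r_i - r_j


def ratio_index(r_i, r_j):
    """RI: ratio of the reflectances of two bands."""
    if r_j == 0:
        raise ValueError("r_j must be non-zero for the ratio index")
    return r_i / r_j


def normalized_difference_index(r_i, r_j):
    """NDI: band difference normalized by the band sum."""
    if r_i + r_j == 0:
        raise ValueError("r_i + r_j must be non-zero for the normalized index")
    return (r_i - r_j) / (r_i + r_j)


# Hypothetical reflectances for two wavebands (illustrative values only).
band_a, band_b = 0.30, 0.10
indices = {
    "DI": difference_index(band_a, band_b),
    "RI": ratio_index(band_a, band_b),
    "NDI": normalized_difference_index(band_a, band_b),
}
print(indices)
```

In a regression setting such as PSO-SVR, index values like these would typically serve as candidate predictors, computed for many band pairs before the most sensitive ones are retained.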

Impervious Surface Detection
Urban impervious surfaces such as roads, driveways, sidewalks, and parking lots prevent water from infiltrating into soil, which has impacts on urban hydrology, groundwater, and water quality. Impervious surfaces facilitate the movement of pollutants to nearby water bodies during heavy rain and urban flooding (Hall and Hossain, 2020). In the context of ML, identifying impervious surfaces from RS data is fundamentally a classification problem. Although many index-based approaches exist for detecting impervious surfaces using RS (e.g., Weng, 2012), here we focus on the developments in this area that use ML algorithms. Recently, Yao et al. (2017) adopted a one-class classification approach to detect impervious surfaces using high-resolution GF-1 satellite images, and found that Presence and Background Learning (PBL) and Positive Unlabeled Learning (PUL) outperformed SVM models in detecting impervious surfaces. Miao et al. (2019) also used a one-class classification technique and Landsat-8 imagery for impervious surface classification. In a similar study, Bian et al. (2019) used a random forest algorithm and time-series data from multiple satellites (HJ-1A/B and GF-1/2) to estimate the changes in the impervious surface percentage over the years 2009-2017. Lin et al. (2019) addressed the challenges in detecting impervious surfaces due to the diversity of land use and shadow effects in high-resolution satellite imagery using a dictionary sparse representation classification and data fusion approach with WV-2, GeoEye-1, TerraSAR-X, and LiDAR. Zhang H. et al. (2019) addressed similar issues by using a deep CNN approach with data fusion from optical and SAR satellites WV-3, Sentinel-2, and Radarsat-2. Similar other works,

IDENTIFICATION AND ASSESSMENT OF MULTI-HAZARD RISK
Multi-hazard identification and compound risk assessment inform effective planning activities and strategies (FEMA, 2015), and help water managers prioritize attention, investment, and recourse (Dickson-Anderson et al., 2016) to target the most urgent and the highest impact risks. Risk is defined as a combination of hazard, exposure, and vulnerability (Garrick and Hall, 2014). Because exposure in urban areas is relatively high due to the high density of population and man-made structures (Hoekstra et al., 2018), cities without proper preparedness and adaptation strategies are vulnerable to a wide variety of urban water hazards (Shaw et al., 2016;Eldho et al., 2018;Hoekstra et al., 2018;Gangrade et al., 2019;Rahmasary et al., 2019) that are often causally linked to further hazards. Additionally, coincidental hazards may occur, resulting in a compounding effect overwhelming the ability of local or national governments to respond (Liu and Huang, 2014). For example, a specific urban water hazard such as flooding can lead to multiple risks (Dai et al., 2017;Cook et al., 2019) that include inundation of building structures, damage to infrastructure, and/or the spread of water-borne diseases (Gangrade et al., 2018;Pereira, 2018). Consequently, multi-hazard risk assessment techniques must be conducted in the urban water management sector in a manner that considers the combined effects and interactive reactions of multiple urban water hazards in urban areas (Garcia-Aristizabal and Marzocchi, 2013;Gruber and Mergili, 2013;FEMA, 2015;Karlsson et al., 2017).
Despite its usefulness for hazard mitigation planning, multi-hazard risk assessment has been under-emphasized in natural disaster management and planning (Rahmati et al., 2019) due to the difficulty of analyzing the risk for more than one hazard in the same area, and of analyzing their interaction. In the past, studies have focused primarily on forecasting and controlling hazards and their physical processes in natural areas, without considering the social and economic impacts of these hazards in urban areas (e.g., hazard effects on buildings, infrastructures, and agriculture). Previous studies that intended to analyze hazard risk and social vulnerabilities analyzed only the risks of single hazards separately (Bühler et al., 2013;Statham et al., 2017) using physical or statistical models [e.g., flood impact using the HEC-FIA model (Lehman and Light, 2016) or economic damage to fisheries caused by surface water pollution using the AQUATOX model (Park et al., 2008)]. In general, most past studies do not consider the multi-hazard chain (hazard interaction) and the combined risk of coupled hazard events (Garcia-Aristizabal and Marzocchi, 2013;Rahmati et al., 2019). Although a few studies (Freeman and Warner, 2001;Newman et al., 2017) analyze the components of different types of vulnerability and risk by evaluating physical, social, and economic consequences of a chain of urban hazards, developing a systematic approach for multi-hazard risk assessment using conventional modeling methods faces multiple challenges. These challenges are primarily associated with (a) integrating multiple physical or statistical models and domain data that only target single hazards to simulate a multi-hazard chain and predict the combined effect of multiple urban water hazards, and (b) gaining an in-depth understanding of hazards, including the interconnections between different hazards and the dynamics behind multiple hazards. In the presence of hydro-complexities, many underlying mechanisms of urban water hazards remain unknown. Therefore, conventional methods based on physical modeling alone may not be the best way to assess multi-hazard risk in urban water systems.
In recent years, advanced ML methods have been used to develop innovative multi-hazard risk assessment frameworks and workflows, which are able to address the challenges associated with conventional risk assessment techniques. The feasibility of applying ML to multi-hazard risk assessment is shown by the following: (a) ML is a subfield of artificial intelligence and data-driven analysis where ML models can easily identify trends, patterns, and empirical relationships in a large volume of data without considering detailed physical processes behind a phenomenon, such as the interactive reactions between multiple water hazards (Dibike and Solomatine, 2000;Rahmati et al., 2019), and (b) ML models are capable of handling data that are multi-dimensional and multi-domain (Anzai, 2012). In this section, we review several ML workflows and applications that are designed to support the analysis of multi-hazard risk for mitigating water-related hazards.
For example, Rahmati et al. (2019) investigated and mapped multi-hazard exposure using several ML models including BRT (Boosted Regression Trees), GAM (Generalized Additive Model, a regression which can include linear or non-linear predictor variables and predicted values potentially following any of a variety of probability distribution functions), and SVM (Support Vector Machines), and they evaluated the performance of these ML models using threshold-dependent and threshold-independent methods. The study consists of several steps: (1) selection of predictive factors for modeling multiple hazards (e.g., flood, landslide, soil erosion, and debris flow), (2) creation of a Multi-Hazard Inventory using records from the road organization and the regional water company (RWC) to document the occurrence of various hazards, (3) application of ML models to predict and map the exposure of multiple hazards, and (4) evaluation of the accuracy of these models. The results of this study indicate that (a) the ML models differed in their accuracy in predicting the different hazards (Rahmati et al., 2019), and (b) the applied ML models are useful and generalizable for multi-risk mapping around the world.
Another example of a multi-hazard multi-model approach is Chen et al. (2019), in which the researchers evaluate the risk of regional flood disaster in the Yangtze River Delta (YRD) region. Based on the driving force, pressure, state, impact, and response (DPSIR) conceptual framework, the study first applies a random forest algorithm to screen important indices of flood risk. They then construct a radial basis function (RBF) neural network to evaluate the flood risk level. In this study, the radial basis function is the activation function for the ANN. The study approaches the urban flood risk assessment as a multi-classification problem using ML methods and indicates that only a few of the previous studies use ML theory to assess the urban flood disaster risks that are complex and associated with multiple sources and contributing factors. The study concludes that the level of urban flood disaster is closely related to rainfall, topography, economic development, land use, soil erosion, urban flood control investment, and disaster emergency response capability, shedding light on effective regulation measures for improving flood prevention in urban environments.
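To make the role of the radial basis function concrete, the sketch below shows a Gaussian RBF used as the hidden-layer activation of a small network that scores flood risk classes. It is an illustrative assumption rather than the configuration of Chen et al. (2019): the Gaussian form, the function names, the centers, the width, and the output weights are all hypothetical, and in practice the centers and weights would be learned from data.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian radial basis activation: depends only on the distance to the center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_network_scores(x, centers, width, weights):
    """Forward pass of a one-hidden-layer RBF network.

    weights[k][j] is the (hypothetical) weight from hidden unit j to risk class k.
    """
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


# Hypothetical example: two normalized flood-risk indicators, two hidden units,
# and two risk classes ("low", "high").
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [[1.0, 0.0],   # class "low" listens to the first hidden unit
           [0.0, 1.0]]   # class "high" listens to the second hidden unit
scores = rbf_network_scores([0.9, 0.8], centers, width=0.5, weights=weights)
predicted = ["low", "high"][scores.index(max(scores))]
print(scores, predicted)
```

The input lies close to the second center, so the second hidden unit responds most strongly and the "high" class receives the larger score.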

Exploration of Complex and Interconnected Hazards and Risks
To explore complex and interconnected hazards and risks, Xu et al. (2019a) present a visual analytics framework that combines various types of ML applications (e.g., feature selection, classification, and multivariate clustering analysis) with different geo-visualization techniques to analyze multi-hazard risk at culverts due to flooding and sedimentation. ML models applied in this study include the classification schemes, random forests and Self-Organizing Maps (SOM), and are used for exploratory data analysis, aiming to improve the understanding of the factors and interconnected hazards (e.g., flooding, excessive erosion, and sediment transport in rivers) that contribute to the sedimentation and flood over-topping of culverts (transportation infrastructure). The results of the study show that ML applications can be used not only for multi-risk assessment and hazard prediction but also for exploring the complex and interconnected processes behind multiple hazards. Additionally, the same framework can be readily extended to analyze multiple hazards at other hydraulic structures, such as bridges and weirs. Pourghasemi et al. (2020) presented an ML workflow, debuted as the Sendai framework, for assessing and mapping multi-hazard risk susceptibility, with an overall objective of reducing hazard risk and increasing sustainable development in urban areas. The workflow entails three main steps: (1) data preparation for obtaining the location of various hazards (floods, forest fires, and landslides), (2) recognition of the most important factors contributing to the occurrence of different hazards using the Boruta algorithm (a wrapper around random forest classification that iteratively removes irrelevant features from the data), and (3) construction of multi-hazard susceptibility maps along with validation processes using the random forest model and the preparation of a Multi-hazard Probability Index (MHPI) for the study area. The significance of the Sendai framework is that it (a) creates a reasonable understanding of the factors controlling flood, forest fire, and landslide occurrence through ML-powered variable ranking, and (b) produces a multi-hazard probability map for facilitating integrated and comprehensive watershed management and land use planning.

Hybrid Modeling for Multi-Hazard Risk Assessment
A few researchers have applied hybrid models to water-related multi-hazard risk assessment. For example, Yang T. et al. (2019) used long short-term memory units (LSTM) to improve the timing component of the amplitude of peak discharge for flood simulations produced with global hydrological models over different climate zones. Hajigholizadeh et al. (2018) used hybrid models for predicting and assessing water erosion vulnerability and risks, as well as for the optimization of management strategies for agricultural or soil and water conservation practices. Application of hybrid models to these multi-hazard hydrological risks is still emerging within the domain, but the utility of this approach continues to be demonstrated across a variety of hydrological applications.

SELECTION OF BEST MANAGEMENT PRACTICES
The proper selection and placement of Best Management Practices (BMPs) is a critical planning process that helps many watershed and urban planning communities effectively mitigate water-related hazards and manage urban water resources (e.g., stormwater management, water pollution reduction, and erosion controls) (Cheng et al., 2006;NRCS, 2011;USEPA, 2018). These BMPs are carefully selected from a pool of planning and mitigation alternatives that exist in various forms. Based on their spatial scales, these alternatives can be categorized as either localized alternatives, which are city-scale practices for protecting the municipal water supply and infrastructure through structural and non-structural actions, or watershed alternatives, which represent the management of land cover and land use at the watershed scale (Carson et al., 2018). The selection of BMPs is a complex multi-objective optimization problem that requires the consideration of multiple planning objectives and criteria, which aim to maximize the environmental and social benefits for multiple urban communities while minimizing the economic cost of implementing these management practices (Maringanti et al., 2008;Rodriguez et al., 2011). The development and advancement of GA (section 2.4.1) have provided watershed management communities with a method for solving complicated optimization problems that are associated with the selection of BMPs. GA are capable of handling complex and irregular solution spaces when searching for a global optimum (Chambers, 2000;Rodriguez et al., 2011) in a multi-objective optimization. Multi-objective optimization has been defined as "vector optimization" (Cohon and Marks, 1975) for which the objective function is a vector containing scalar objectives subject to a set of constraints, and for which Pareto optimal solutions show the best performance.
evaluated a variety of multi-objective optimization GA as applied to rainfall-runoff calibration, long-term groundwater monitoring, and risk-based water supply portfolio planning. They found five best performing algorithms, of which their high-performance adaptive search Borg algorithm was the most scalable and the best performing, and has shown particular stakeholder usefulness in its incorporation into a visual and interactive decision support framework.
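The notion of Pareto optimality used in multi-objective BMP selection can be illustrated with a short Python sketch. This is a brute-force dominance filter over a handful of made-up candidates, not a genetic algorithm and not the Borg algorithm; a GA would be one way of generating the candidates that such a filter then screens. The function names and the (cost, pollutant load) values are hypothetical, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Both tuples hold objectives to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical BMP portfolios as (implementation cost, remaining pollutant load).
portfolios = [(10, 50), (20, 30), (30, 35), (40, 10), (15, 50)]
front = pareto_front(portfolios)
print(front)
```

Here (30, 35) is dropped because (20, 30) is cheaper and leaves a lower load, and (15, 50) is dropped because (10, 50) achieves the same load at lower cost. The remaining portfolios represent trade-offs among which planners would still have to choose.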
In the water quality management sector, several studies applied GA-based optimization models to find optimal solutions to water quality problems for several watersheds in the United States by connecting non-point pollution reduction models with economic components (Srivastava et al., 2002;Chen et al., 2015). In the stormwater management sector, Limbrunner et al. (2013) applied classic optimization techniques to stormwater and non-point source pollution management at the watershed scale, and compared their effectiveness for finding optimal solutions to that of genetic algorithms, and linear and dynamic programming. Dynamic programming proved to find the most efficient solution to the sediment-management optimization problem.
In addition to the optimization of planning alternatives, ML methods can enable selection for optimal management practices (Savic, 2019). AI-driven applications are envisioned to learn from the human decision-making process, during which best management practices are selected by planners and watershed managers based on their past experiences.

VISION: NEW APPLICATIONS OF MACHINE LEARNING TO URBAN WATER SECURITY
In order to ensure high-quality and timely water availability in the right quantities for urban areas, water resources must be managed well. For water resources to be managed well, a planning system leading to actions that promote sustainability and urban water security must be in place at the municipal level. We have shown that ML can help with this system as it applies to every stage of disaster management and planning, as outlined sequentially on the left-hand side of Figure 2 and shown as an interconnected and cyclical process on the right-hand side. That is, we have outlined a variety of ML applications for facilitating the individual disaster management stages and planning processes. For long-term planning and mitigation, we have presented studies that use ML methods to identify and assess multi-hazard risks and vulnerability in urban water systems, taking into account socio-economic factors and the multi-hazard chain. We have also discussed how ML can help optimize the selection of urban best management practices for reducing water pollution and supporting stormwater management. For early warning and hazards prediction, we have examined a range of ML applications for supporting the prediction of various water-hazard related parameters. We included studies that combine ML methods with process-based models (e.g., conceptual and physics-based hydrological and sediment transport models) into hybrid models to increase the accuracy and speed of the predictions for water hazard-related parameters. We have also discussed how innovative combinations of ML and remote sensing technologies can improve the discovery and extraction of useful hazard information and features that are critical to early warning, rapid response and rescue, and recovery and restoration.
FIGURE 2 | Potential ML opportunities for improving both the generic hazard mitigation stages (left) and detailed long-term planning steps (right).
Our vision is that these methods can be combined into ML water management workflows that build on those already in use for characterizing and predicting multi-hazard hydrological events. By weaving together the ML methods we have described, long-term management processes including the six steps shown on the right-hand side of Figure 2 and outlined in the introduction can be captured. For example, risks associated with flood, drought and water quality can be identified using genetic algorithms, artificial neural networks, support vector machines, random forests, and other types of regression and hybrid models. Then planning objectives can be determined by weighing social risk and adaptive capacity using agent-based models, boosted regression trees, generalized additive models, and support vector machines. To inventory data, ground-based and satellite-based data can be reckoned, cataloged, and formatted for use in spatial-relationships-based queries, k-nearest neighbors, analytical hierarchy processes, and convolutional neural networks. To select mitigation approaches, classification schemes can be used along with multi-criteria decision methods. Uncertainty estimates can be used to evaluate the mitigation approaches selected. Finally, the insight gained from the ML results may be discussed by the planners to modify and implement the approaches determined.
ML is often not the first choice of analytical tools for planners for a variety of reasons. The first is that reasonably robust methods with known uncertainty for analyzing water risks are well established and accepted in the water management community. ML methods are less proven even if they often can perform better on data than the traditional methods. To address the uncertainty in ML methods, some researchers (e.g., Morrison et al., 2003;Duncan, 2014) use metrics such as Receiver Operating Characteristic Curves for scoring the diagnostic ability of a binary (or higher dimensional) classifier system, or alternative goodness-of-fit measures for evaluating the reliability of ML output. Others (e.g., Munafò and Smith, 2018) suggest a method of investigation called triangulation, in which multiple approaches (at least 3) are used to address one question. The uncertainty associated with a complete model chain is large, especially at the required level of decision-making under climate change, urbanization (Dessai et al., 2009), and the accumulation of uncertainty at each level of the assessment (Merz et al., 2010). However, while each ML method may have its own strengths, weaknesses, and unrelated assumptions, uncertainty quantification can help assign some degree of confidence to results obtained.
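As a concrete illustration of the ROC-based scoring mentioned above, the following Python sketch computes the area under the ROC curve (AUC) for a binary classifier using its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name, labels, and scores are hypothetical, and this is only one of several equivalent ways to summarize a ROC curve.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth values (1 = hazard occurred).
    scores: iterable of classifier scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical flood / no-flood observations and model scores.
observed = [1, 1, 0, 1, 0, 0]
predicted_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(observed, predicted_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a classifier no better than chance and 1.0 to perfect ranking; in this example one positive case is ranked below one negative case, so the value falls between the two.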
We observe that many aspects of urban water security and hazard modeling are still underrepresented as ML problems, in particular those pertaining to the prediction of indirect effects of water-related hazards and their associated risks. Additionally, the use of ML techniques often requires additional mathematical and computational training (and often large high-performance computing resources) beyond traditional statistical methods, and the time constraints of working water managers may not allow for this additional training. Nevertheless, by understanding the development of sustainable urban water management planning, we can draw lessons from history and devise sensible approaches for the future that include ML. If we view hydrological systems as "structurally co-constituted of natural, engineered, and social elements" (Brelsford et al., 2020), we may more readily employ ML to integrate disparate data and discover new perspectives on management practices based on the new patterns these methods reveal. In the near future, we also envision an increase in the applications of hybrid modeling approaches (i.e., theory-guided ML) (Mekonnen et al., 2012;Karpatne et al., 2017;Frame, 2019) in the urban water management sector through the integration of data-driven ML methods and conventional process-based domain models.

AUTHOR CONTRIBUTIONS
MA-D suggested the focus on urban hydrology, co-wrote the introduction and vision sections, and wrote the text on machine learning for flooding. HX developed the outline and organization of the paper based on urban water management practices, co-wrote the introduction and vision, and wrote the text for multi-hazard risk, soil erosion and sediment transport, and selection of best management practices. KK co-wrote the introduction, provided the research trend analysis on the historical application of machine learning to water management and hazard, and wrote the text on applications of machine learning to disaster management using remote sensing. DR wrote the sections on machine learning application to drought and water quality characterization and prediction. All authors edited and revised the document throughout.