Prediction of River Pollution Under the Rainfall-Runoff Impact by Artificial Neural Network: A Case Study of Shiyan River, Shenzhen, China

Climate change and rapid urbanization have made it difficult to predict the risk of pollution in cities under different types of rainfall. In this study, a data-driven approach to quantify the effects of rainfall characteristics on river pollution was proposed and applied in a case study of Shiyan River, Shenzhen, China. The results indicate that the most important factor affecting river pollution is the dry period followed by average rainfall intensity, maximum rainfall in 10 min, total amount of rainfall, and initial runoff intensity. In addition, an artificial neural network model was developed to predict the event mean concentration (EMC) of COD in the river based on the correlations between rainfall characteristics and EMC. Compared to under light rain (< 10 mm/day), the predicted EMC was five times lower under heavy rain (25–49.9 mm/day) and two times lower under moderate rain (10–24.9 mm/day). By converting the EMC to chemical oxygen demand in the river, the pollution load under non-point-source runoff was estimated to be 497.6 t/year (with an accuracy of 95.98%) in Shiyan River under typical rainfall characteristics. The results of this study can be used to guide urban rainwater utilization and engineering design in Shenzhen. The findings also provide insights for predicting the risk of rainfall-runoff pollution and developing related policies in other cities.


INTRODUCTION
Rapid urbanization has adverse effects on the natural environment, especially in aquatic environments. Due to changes in the hydrological cycle and the high diversity of pollutants, urban rainfall-runoff pollution has become a major problem (Kammen and Sunter, 2016). Especially in the initial stage of rainfall, the river pollutant content is the highest in the entire runoff process, which is referred to as the first flush effect (Gnecco et al., 2005;Feng et al., 2017). Common contaminants of river mainly include suspended solids, nutrients and heavy metals which have a major effects on the water quality of urban rivers (Perera et al., 2019;Yang et al., 2021).
Rapid urbanization has increased the impervious areas in cities, thereby reducing rainwater infiltration and increasing the total amount of runoff into urban rivers and the pollution load of urban surface runoff (Chen et al., 2017;He et al., 2018). Li et al. (2021) found that human activity has contributed to long-term reductions in the total amount and frequency of weak precipitation and the significant increases in the total amount and frequency of heavy precipitation in China. Human activities such as land use and land cover change, the construction of dams and irrigation canals, and mining have altered urban river runoff (Adeyeri et al., 2020) and significantly affected the rainfall characteristics. Urban rainfall-runoff pollution has become the main cause of global urban water pollution (Wang et al., 2021). Thus, identifying the factors affecting this type of pollution is critically important for controlling urban river pollution.
To make timely predictions related to rainfall-runoff pollution with minimal data, researchers have begun to apply machine learning methods. These methods do not require a comprehensive understanding of the mechanism underlying the interactions between various parameters. These methods are also effective for simulating nonlinear and non-stationary hydrological environmental processes (Wang and Yao, 2013;Badrzadeh et al., 2015). Machine learning-based methods have shown advantages for the analysis of rainfall-runoff pollution, and with the development of data science, various machine learning methods have been explored and developed to predict rainfall-runoff pollution in urban rivers (Jeung et al., 2019). These methods including random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost) methods, which have been applied to analyze the relationships between rainfall characteristics and runoff pollution (Wu et al., 2014;Wang et al., 2015). RF algorithms have been used to rank the importance of multiple rainfall characteristics affecting the initial scouring effect of river runoff, revealing the following six most important characteristics: total rainfall amount; maximum rainfall intensity in 5 min; rainfall duration; total amount of runoff; peak runoff; and average rainfall intensity (Alias et al., 2014;Perera et al., 2019). By using the boosting method, GBDT will integrate multiple decision trees (DT) for analysis, which has shown a good prediction performances (Liang et al., 2020). Huan et al. (2020) applied a GBDT method to select characteristic factors with strong effects on dissolved oxygen and used these factors as input data to reduce the time needed for calculations. Joslyn (2018) evaluated the performance of XGBoost in predicting nine water quality factors (each factor was separately predicted using the other eight factors) and obtained success rates ranging from 80% to 90%.
Rainfall characteristics can also significantly affect the concentrations of river pollutants (Feng et al., 2015;Zhang et al., 2021), and rainfall duration and rainfall intensity are two of the most important factors affecting hydrological processes (Ran et al., 2012). Rainfall-runoff pollution is also affected by other rainfall characteristics such as the total amount of rainfall, which can affect the scouring effect of runoff , and the dry period (Zhang, 2011;Pang et al., 2012). Jeung et al. (2019) assessed the effects of rainfall characteristics on water quality parameters in urban rivers and found that different water quality parameters were affected by different rainfall characteristics; for example, biochemical oxygen demand and chemical oxygen demand (COD) were closely related to rainfall intensity, whereas total organic carbon and total phosphorus were strongly affected by the dry period. Gnecco et al. (2005) analyzed the event mean concentrations (EMCs) of various pollutants in urban rivers and found a strong correlation between maximum rainfall intensity and EMC.
Although several previous studies have analyzed rainfall characteristics and their qualitative effects on rainfall-runoff pollution, few studies have quantitatively analyzed the effects of rainfall characteristics on its pollution. Neural network models including artificial neural network (ANN), convolutional neural network, and back-propagation neural network models can consider multiple rainfall features together to predict rainfallrunoff pollution (Wu et al., 2014;Wang et al., 2015;Chau, 2017;Fotovatikhah et al., 2018). Some researchers have used neural network models to generalize the complex relationships between rainfall characteristics and water quality parameters to enhance the accuracy of rainfall-runoff simulation and prediction (Fernandes et al., 2020). For example, ANN models can be used to accurately determine whether the surface water quality meets the criteria set by national regulations or quantify the characteristics of water bodies (Palani et al., 2008;Shi et al., 2018). Using rainfall characteristics such as rainfall duration and confluence area as inputs and EMC as a training target, a back-propagation neural network model showed high accuracy for evaluating the total amount of pollutants in rainwater runoff (Tian, 2016). Ye et al. (2020) summarized the characteristics of neural network models used in environmental pollution research and found that ANN models can significantly improve the efficiency of pollutant prediction in rivers.
The objectives of this study were to analyze the CODs of Shiyan River and Shiyan Reservoir and explore the relationships between COD and rainfall characteristics. To achieve these objectives, we: 1) ranked the importance of different rainfall characteristics in terms of their effects on rainfall-runoff pollution using various integrated learning methods; 2) quantified the relationships between rainfall characteristics and runoff pollution in Shiyan River using an ANN model; and 3) estimated the non-point-source pollution load based on the typical EMC in the Shiyan River. As verified in this paper, the data-driven method presented herein can quickly predict the COD of the river. The findings provide a reference for water quality analysis in other fields.

Study Area
Over the past 40 years, precipitation and extreme precipitation in the western urban area of Shenzhen have increased. Changes in the underlying surface and rainfall characteristics have also affected the temporal and spatial distributions of non-pointsource pollution. In this study, Shiyan River in Shenzhen, China is taken as a research case, as can be seen in Figure 1. Shiyan River is located in Shiyan Street, Bao'an District and is a first-level tributary in the Maozhou River Basin. The total length of Shiyan River is 10.44 km, and the catchment area of the basin is 27.05 km 2 . The Shiyan River eventually merges into the Shiyan Reservoir, which is one of the four major reservoirs in Shenzhen and one of the largest sources of drinking water in Bao'an District. With the rapid economic development of Bao'an District, Shiyan River and Shiyan Reservoir will play increasingly important roles in water supply.

Data
The following data were used in this study. First, the hourly/daily/ annual river discharge (m 3 /s) and COD data (mg/L) for Shiyan River from 2009 to 2012 were obtained from Qin et al. (2013). We also used data from Qin et al. (2013) to select the most influential rainfall characteristics for EMC and build the ANN model based on rainfall characteristics to predict the EMC value. Rainfall-runoff data (mm/min) for 2013-2018 were obtained from Li (2020).
Due to the uncertainty of rainfall events, there is currently no uniform and clear criteria for classifying rainfall events. Based on recent research considering the effects of rainfall confluence time and rainfall duration, 180 min was used as the minimum time interval between two rainfall events, and the cumulative rainfall of each event had to be greater than 3 mm (Huang et al., 2021). According to the amount of rainfall in 24 h, all rainfall events were divided into six categories: light rain (< 10 mm); moderate rain (10-24.9 mm); heavy rain (25-49.9 mm); torrential rain (50-99.9 mm); rainstorm (100-249.9 mm); and extraordinary rainstorm (> 250 mm). Based on the above definitions of rainfall event and rainfall type, we obtained the typical rainfall characteristics (total amount of rainfall, rainfall duration, maximum rainfall per minute, maximum rainfall in 10 min, rainfall intensity, and dry period) for the study area and used  (Li, 2020 them in the subsequent verification of the pollution load of Shiyan River (Table 1).

Methods
As shown in Figure 2, we first processed the minute-level rainfall data (2013-2018) into different rainfall characteristics (total amount of rainfall, rainfall duration, maximum rainfall per minute, maximum rainfall in 10 min, rainfall intensity, and dry period). Subsequently, we identified the rainfall characteristics that most strongly affect EMC using mathematical statistical methods and three integrated learning methods (RF, GBDT, and XGBoost). Next, we developed an ANN model to predict the EMC values of Shiyan River by inputting typical rainfall characteristics under different rainfall types in terms of rainfall intensity per 24 h (namely, light rain, moderate rain, heavy rain, and torrential rain). Finally, we calculated and verified the runoff pollution load of the Shiyan Reservoir using the predicted EMC value.

Integrated Learning Methods
The integrated learning methods mainly completes research tasks by building and combining multiple different approaches, which can have a high accuracy rate. In this study, three widely used integrated learning algorithms (RF, GBDT, and XGBoost) were used to analyze the importance of various rainfall characteristics. RF is one of the most popular algorithms for solving classification and regression problems in recent years, with extremely high accuracy. GBDT can process a wide range of data types, and tuning parameter is relatively easy. XGBoost is efficient and flexible, which prevent overfitting and reduce model complexity.

Random Forest
RF is a method for accurately classifying large amounts of data by creating multiple decision trees. The RF algorithm consists of a combination of tree classifiers where each classifier is generated using a random vector sampled independently from the input vector, and each tree casts a unit vote for the most popular class to classify an input vector. The decision trees use the CART algorithm to select variables based on the splitting criteria of the root node and make judgments based on the characteristic evaluation standard; the root node recursively generates child nodes through the internal node. The internal nodes represent the judgments of the characteristics, and each child node represents a regression result. Random attributes are introduced into the training process of decision trees, and the results are determined by the predicted mean values of multiple decision trees. Averaging can alleviate the problem of high variance and high deviation by finding a natural balance between the two extremes. Because RFs are often used as black-box models, they can generate reasonable predictions for data without configuration.

Gradient Boosting Decision Tree
GBDT is a characteristics selection method with high interpretability. GBDT has high nonlinear processing ability when considering the interactions of multiple groups of characteristics (Huan et al., 2020). GBDT is a powerful machine learning tool consisting of three parts: regression decision tree, gradient boosting, and shrinkage. GBDT is based on the linear combination of basic functions; multiple rounds of iteration are performed, and each round of iteration produces a weak classifier (regression decision tree). Each classifier is trained based on the gradient of the previous classifiers, and the accuracy of the result is continuously improved by reducing the deviation. The algorithm aims to obtain a set of decision rules using the original characteristics as input to create a new decision tree (He et al., 2014). Frontiers in Environmental Science | www.frontiersin.org June 2022 | Volume 10 | Article 887446

Extreme Gradient Boosting
XGBoost is an integrated machine learning algorithm based on decision trees that uses a gradient boosting framework. XGBoost has been widely used for regression, classification, and other applications. The core idea of XGBoost is to increase the number of decision trees by continuously splitting characteristics. Every time a tree is added, a new function f(x) is learned to fit the residual of the last prediction. When the training is completed, resulting in k trees, the score of a sample is predicted based on the characteristics of this sample, and the score corresponding to each tree is added to the predicted value of the sample. This method uses normalization in the objective function to prevent overfitting and reduce the complexity of the model.

Artificial Neural Network Models
ANN models are networks of parallel distributed information processing systems that link input vectors to output vectors. They are composed of many information processing elements called neurons or nodes (Bisht et al., 2013). ANNs are mainly composed of three parts: the input layer, hidden layer, and output layer. The input layer primarily provides input data for the ANN model, and the hidden layer performs various transformations of the data (fitting the data by adjusting the function type and the number of neurons in the hidden layer), thus enhancing the network's ability to simulate complex functions. The output layer is considered to be a summary of the parallel calculation results performed by the hidden layer. The result of each neuron is the input of neurons existing in the next layer of the network, and the result of the output layer can be compared with the observed result (Haghiabi et al., 2018). Because the model is relatively simple and convenient for practical applications and prediction, it is a powerful tool for modeling many nonlinear hydrological processes; for example, ANN models have proven effective for use in the fields of water quality analysis and prediction.

Definition of Event Mean Concentration
Rainfall-runoff pollution events are characterized by uncertainty, and river runoff pollution is affected by factors such as the rainfall characteristics and underlying surface types. Instantaneous FIGURE 4 | PCA results between rainfall characteristics and EMC. PC1 and PC2 represent 42% and 20.5% of the total variance of the data, respectively. Red dots represent the scores of rainfall events. The red circle represents the 95% confidence interval. The blue arrows represent the proportions contributed by rainfall characteristics to PC1 and PC2. pollution concentrations do not accurately capture the characteristics of runoff pollution (Lee et al., 2011), it is necessary to find a variable to describe the pollution concentration in each rainfall-runoff pollution event. Therefore, in each event, COD was used to characterize water quality, and the degree of rainfall-runoff pollution was analyzed based on the event mean concentration (EMC) (Kim et al., 2007), which was calculated as the following equation (EMC formula): Where: EMC = event mean concentration, mg/L; M = amount of pollutant, mg/L; V = runoff volume, m 3 ; T = rainfall duration, s; C t = concentration of pollutant over time, mg/L; and Q t = flow rate over time, m 3 /s.

Model for Estimating Rainfall-Runoff Pollution
Due to the randomness of surface runoff discharge, the annual pollution load is usually to estimate the pollution load concentration of urban surface runoff, that is, the total amount of pollutants discharged from surface runoff caused by multiple rainfall events in a year (Li et al., 2010). The annual non-point-source runoff pollution load of Shiyan River was estimated based on the EMC values under typical rainfall characteristics. This estimation method has been widely used both within and outside of China (Wang, 2015). The annual runoff pollution load based on the EMC of the site was calculated using the following formula: where: L y = annual pollution load, t; C F = proportion of rainfall events that produce runoff, usually taken when data are lacking, 0.9 was the empirical coefficient (Wang, 2015); Ψ = runoff coefficient; A = catchment area, km 2 ; P = average annual rainfall, mm; and C = EMC, mg/L.

Effects of Rainfall Characteristics on Event Mean Concentration
In this study, Shiyan River and Shiyan Reservoir were taken as typical research areas to explore the correlation between rainfall characteristics and runoff pollution. COD was used as a typical metric for pollution analysis. For convenience, the EMC formula was used to convert COD to EMC. Due to the low frequency of rainstorms and extraordinary rainstorms (0.03 and 0.01, Prediction of Rainfall-Runoff Pollution respectively), these rainfall types were combined with torrential rainfall events in this analysis. Figure 3 shows the relationships between rainfall type and EMC based on rainfall data from 26 rainfall events (2009)(2010)(2011)(2012). The vertical axis of the box plot represents the degree of data dispersion, and the center represents the typical distribution probability of EMC. Therefore, the typical value of EMC represents the location corresponding to the maximum distribution probability. In general, the typical EMC value is largest during light rain; the EMC during light rain (based on the median value) was almost four times higher than that during moderate rain, which was associated with the lowest EMC. FIGURE 6 | Relationship between EMC and rainfall characteristics. Each black dot represents the EMC value of a rainfall event. The dark orange range represents the 95% confidence interval, while the light orange range represents the 95% prediction interval. Meanwhile, the EMC values corresponding to heavy rain and torrential rain were much lower than that for light rain.
Evaluating the effect of rainfall type on EMC requires analyzing the effects of different rainfall characteristics on EMC. In the case of light rain, the initial stage of rainfall will wash urban surface pollutants into the river, increasing the EMC. During moderate rain, rainfall runoff gradually increases, and the pollutants washed into the river channel change little, which reduces the EMC to a large degree. For heavy rain and torrential rain, the erosion of the river channel may wash pollutants from the deep layers of the ground into the river channel, resulting in a gradual increase in EMC.
Principal component analysis (PCA) was used to identify the correlations between rainfall characteristics and EMC. Figure 4 shows the results of the PCA of rainfall characteristics and EMC. PC1 and PC2 explained 42% and 20.5% of the total variance of the data, respectively. The proportion contributed by each rainfall characteristic to PC1 and PC2 can be obtained from the directions and lengths of the blue arrows in Figure 4. Most of the scattered points in Figure 4 are within the 95% confidence interval (shown by the red circle), which indicates that there were no obvious extreme values. A correlation matrix was used to calculate the correlation coefficients between EMC and the rainfall characteristics ( Figure 5). In Figure 5, red and blue represent positive and negative correlations, respectively, and the intensity of the color indicates the strength of the correlation. Among the rainfall characteristics, dry period showed the strongest correlation with EMC (correlation coefficient = 0.5). Average rainfall intensity was negatively correlated with EMC (correlation coefficient = −0.13). The initial, average, and maximum runoff intensities were highly correlated. If the angle between the two rainfall eigenvalues in Figure 4 was less than 30°, and the correlation coefficient was greater than 0.8 in Figure 5, we considered the correlation between two variables to be significant. Using these criteria, no significant correlations were observed between EMC and any single rainfall characteristic. Thus, it was necessary further analyze the data to reveal the relationships between multiple rainfall characteristics and EMC.
Using six rainfall characteristics (dry period, total amount of rainfall, average rainfall intensity, rainfall duration, maximum rainfall intensity in 10 min, and initial runoff intensity), we performed multiple linear regression fitting for EMC. Figure 6 shows the relationships between EMC and various rainfall characteristics. Each black dot represents the EMC value of a rainfall event, and each figure contains 26 rainfall events. By constructing a linear fitting curve, we can observe the effects of different rainfall characteristics on EMC. In Figure 6, the dark orange range represents the 95% confidence interval between different rainfall characteristics and EMC, while the light orange range represents the 95% prediction interval between them. A positive correlation was observed between the dry period and EMC, while EMC gradually decreased as the initial runoff intensity increased. As the total amount of rainfall increased, EMC first decreased and then increased; a similar trend in EMC was observed for the maximum rainfall intensity. Thus, EMC showed different relationships with the various rainfall characteristics.

Analysis of the Relative Importance of Different Rainfall Characteristics in Determining Event Mean Concentration
Three widely used machine learning algorithms (RF, GBDT, and XGBoost) were used to analyze the relative importance of rainfall characteristics affecting runoff pollution (Figure 7). The most important rainfall characteristic affecting EMC was the dry period followed by the average rainfall intensity, the maximum rainfall in 10 min, the total amount of rainfall, and the initial runoff intensity. The high importance of the dry period may be  related to the accumulation of surface pollutants during dry periods. During rainfall after an extended dry period, large quantities of pollutants are washed into the river, causing the EMC in the river to increase. However, due to the large differences in rainfall duration observed in the data, the effect of rainfall duration on EMC was relatively weak. The risk of runoff pollution was quantified by selecting the rainfall characteristics with the strongest effects on EMC based on the above analysis.

Construction and Analysis of the Prediction Model
Analyzing the input data improves the interpretability of the input data for neural network models. We established a model to predict runoff pollution from rainfall characteristics based on the relative importance of different rainfall characteristics and the availability of actual forecast data. The model considered the five rainfall characteristics with the strongest effects on EMC (i.e., dry period, average rainfall intensity, maximum rainfall in 10 min, total amount of rainfall, and initial runoff intensity), and the corresponding EMC values were used as the training targets. A total of 70% of the original data was used to train the ANN model, while the remaining 30% of the original data was used to verify the ANN model. Figure 8 shows the training performance of the ANN model. The model training performance was 0.976, the test performance was 0.989, and the verification performance was 0.883. It also can be found that most of these scattered points are clustered near the fitted line from Figure 8, which show that the response results of this model are good; thus, the model can be used to predict EMC. The frequencies of rainstorms and extraordinary rainstorms were extremely low; thus, these rainfall types were ignored in the analysis of runoff pollution. Therefore, the typical EMC values for each rainfall characteristic under the other four rainfall types were used to predict EMC.
Under typical rainfall characteristics, the predicted EMC value was highest under the light rain scenario (2612.20 mg/L), and this EMC was approximately two times the predicted EMC under moderate rain ( Table 2). The EMC prediction values for heavy rain and torrential rain were almost the same (both were one-fifth of the predicted EMC value for light rain). This may be explained by the fact that the dry period under the light rain scenario was typically longer than that under other rain types; thus, the dilution effect of pollutants washing into the river channel during light rain was weak. The dilution effect of heavy and moderate rain was more pronounced; under the torrential rain scenario, the predicted EMC value increased slowly because of the stronger effects of rainfall duration and runoff area. Rainfall events following long dry periods and light rain events require additional attention, and measures should be enacted in a timely manner to prevent water pollution.
The annual pollution load of non-point-source runoff for Shiyan River was estimated based on the predicted EMC values under typical rainfall characteristics (Table 1). EMC was used to approximate the annual runoff pollution load. Given that the traditional method of EMC value selection is relatively random, the predicted EMC values under typical rainfall events should be used. Hence, the estimation of rainfall-runoff pollution load considered the predicted values of EMC under typical rainfall events along with the probabilities of different rainfall patterns. The proportion of runoff events was 0.9, and the multi-year runoff depth of Shiyan River was 860 mm (average runoff depth is the product of the runoff coefficient and annual average rainfall). The EMC under different rainfall pattern was predicted based on the typical rainfall characteristics of the model, then the actual EMC value of this river can be obtained by weighted average of the probability values of different rainfall pattern, which was 1460 mg/L. Thus, according to the conversion of Eq. 2, the annual non-pointsource COD pollution load was calculated to be 497.6 t. The annual non-point-source pollution load of COD in the built-up area of Shiyan Reservoir was previously reported to be 477 t (Yang et al., 2013), indicating an estimation accuracy for our model of 95.98%. Integrated learning methods were used to analyze the relationships between rainfall characteristics and EMC for Shiyan River. Rainfall characteristics were then used to predict the water quality by constructing an ANN model. The main findings are summarized below. PCA of the rainfall characteristics revealed no significant correlations between rainfall characteristics. A positive correlation was observed between rainfall dry period and EMC, while EMC was negatively correlated with initial runoff intensity. Using mathematical statistical analysis and a variety of machine learning algorithms, we qualitatively described and ranked the effects of different rainfall characteristics on EMC. Among all rainfall characteristics, the dry period was the most important factor influencing EMC. This can be attributed to the build-up of surface pollutants as the dry period becomes longer. When rainfall occurs after a long dry period, large quantities of pollutants are washed into the river, causing EMC to increase. After dry period, the next most important rainfall characteristics were average rainfall intensity, maximum rainfall in 10 min, total amount of rainfall, and initial runoff intensity.
A model to predict EMC based on rainfall characteristics was constructed using the above five most influential rainfall characteristics as inputs, which greatly improved the interpretability of the neural network and the accuracy of the ANN model. The model training performance was 0.976, the test performance was 0.989, and the verification performance was 0.883. The prediction results under typical rainfall characteristics revealed that the runoff pollution caused by light rain is approximately two times that under moderate rain and five times that under heavy rain. Based on the predicted EMC values under typical rainfall characteristics, the annual nonpoint-source runoff pollution load of Shiyan River was estimated to be 497.6 t. The accuracy of the estimation method was 95.98%, indicating the robustness of the model.
We acknowledge that there are several limitations of this study. The relatively short time scale of the high-precision rainfall data in Shenzhen precludes a more in-depth study of future rainfall characteristics and trends under the influence of climate change. Because of the difficulty in monitoring rainfallrunoff pollution, the total number of samples used in this study was only 26 rainfall events. In order to prove the credibility of the results, we use RF and XGBoost methods to study this data. As can be seen in Figure 9, the simulation results of these two methods are good. The values of the coefficients of determination (r 2 ) obtained by RF and XGBoost methods are 0.879 and 0.914, which are similar to the results we predicted by ANN model (0.9). These results also prove that our findings are still reliable. Of course, subsequent studies should collect data over a longer time scale and consider the effects of land use and other factors on water quality/runoff pollution. The focus of this study was the relationships between rainfall characteristics and runoff pollution. Machine learning models can also be used to integrate rainfall and runoff data in future research.
The results of this study have implications for the utilization of urban rainwater and resources, engineering design, predicting rainfall-runoff pollution risk, and policy development.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.