A Proof-of-Concept Study for Hydraulic Model-Based Leakage Detection in Water Pipelines Using Pressure Monitoring Data

It is estimated that about 20% of treated drinking water is lost through distribution pipeline leakages in the United States. Pipeline leakage detection is a top priority for water utilities across the globe as leaks increase operational energy consumption and could also develop into potentially catastrophic water main breaks, if left unaddressed. Leakage detection is a laborious task often limited by the financial and human resources that utilities can afford. Many conventional leak detection techniques also only offer a snapshot indication of leakage presence. Furthermore, the reliability of many leakage detection techniques on plastic pipelines that are increasingly preferred for drinking water applications is questionable. As part of a smart water utility framework, this paper proposes and validates a hydraulic model-based technique for detecting and assessing the severity of leakages in buried water pipelines through monitoring of pressure from across the water distribution system (WDS). The envisioned smart water utility framework entails the capabilities to collect water consumption data from a limited number of WDS nodes and pressure data from a limited number of pressure monitoring stations placed across the WDS. A popular benchmark WDS is initially modified by inducing leakages through addition of orifice nodes. The leakage severity is controlled using emitter coefficients of the orifice nodes. WDS pressure data for various sets of demands is subsequently gathered from locations where pressure monitoring stations are to be placed in that modified distribution network. An evolutionary optimization algorithm is subsequently used to predict the emitter coefficients so as to determine the leakage severities based on the hydraulic dependency of the monitored pressure data on various sets of nodal demands. Artificial neural networks (ANNs) are employed to mimic the popular hydraulic solver EPANET 2.2 for high computational efficiency. 
The goals of this study are to: (1) validate the proof of concept of the proposed modeling approach for detecting and assessing the severity of leakages and (2) evaluate the sensitivity of the prediction accuracy to the number of pressure monitoring stations and the number of demand nodes at which consumption data are gathered and used. This study offers new value for prioritizing pipes for rehabilitation by predicting leakages through a hydraulic model-based approach.

Keywords: pipeline condition assessment, pipeline leak detection, smart utilities, pipeline monitoring system, evolutionary optimization

INTRODUCTION
As water resources grow scarcer and water demands grow rapidly (Gupta and Kulat, 2018), the operationally and financially sustainable maintenance of water distribution systems (WDSs), a critical societal infrastructure, is of the utmost importance (Gupta and Kulat, 2018; Momeni et al., 2018; Zhang K. et al., 2019; Al Qahtani et al., 2020; Shukla and Piratla, 2020). Specifically, leakage in WDSs reportedly makes up between 5 and 50 percent of the total freshwater losses in developed countries, depending on the condition of the pipelines (Gupta and Kulat, 2018; Sophocleous et al., 2019; Shukla and Piratla, 2020; Yazdekhasti et al., 2020). It is also estimated that a significant portion of catastrophic pipe breaks stems from undetected, and thus unaddressed, minor or moderate leaks as well as poor fittings (Grigg, 2017; Gupta and Kulat, 2018; Xie et al., 2019). Besides, the reliability of conventional techniques for detecting and addressing leakages in metallic and plastic pipelines is disputed, for instance, due respectively to the difficulty of localizing welded joint failures (Zhang W. et al., 2018) and the inaccuracies of low-frequency detection in plastic materials, which act as low-pass filters (Gao et al., 2017). Moreover, conventional leakage detection techniques involve cumbersome, labor-intensive tasks that incur massive operational costs (Liu et al., 2019; Ma et al., 2019). Hence, systematic, data-driven background leakage detection that offers high accuracy and cost-effectiveness plays an integral part in pinpointing and addressing leak sources, both to optimize energy consumption and to prevent major future pipe breaks across a network (Gupta and Kulat, 2018; De Marchis and Milici, 2019).
Recently, data-driven schemes for detecting leaks and measuring their severity have been proposed to offer a paradigm shift. For instance, the life-cycle cost and energy consumption of a sensor-based, network-wide leakage monitoring system have been estimated (Yazdekhasti et al., 2020). Also, multiscale neural networks as well as various multi-objective optimization methods have been leveraged to employ consumption data for localization of leaks in a WDS (Creaco and Haidar, 2019; Zhang K. et al., 2019; Shukla and Piratla, 2020; Hu et al., 2021). However, what these methods seem to share is that they (i) rely partly on either human intervention or expensive tools and (ii) focus mostly on detecting leaks rather than measuring their severity with high accuracy. A hydraulic model-based scheme for leakage detection and, most importantly, severity assessment could offer promise given the growing adoption of smart water meters and continuous hydraulic monitoring of WDSs. As a result, building upon previous studies (Momeni et al., 2018, 2020; Piratla and Momeni, 2019; Momeni and Piratla, 2021), this paper (i) offers a preliminary proof-of-concept study of a fully data-driven hydraulic model-based prediction paradigm leveraging pressure monitoring data, in which leak sources are not only detected but their severities are also captured with reasonable accuracy, and (ii) conducts a series of sensitivity analyses of the prediction model with respect to the number and placement of smart meters and pressure monitoring stations. The novelty of this preliminary study lies in shedding light on how consumption data can be leveraged to minimize the risk of major pipe breaks due to difficult-to-detect leaks without relying entirely on manual inspection techniques, consequently imposing lower maintenance costs on municipalities.

MATERIALS AND METHODS
The fundamental methodology in this paper is to predict the presence and severity of leakage using a reverse-engineering, data-driven condition assessment scheme that employs artificial neural networks (ANNs) and genetic algorithms (GA) on a modified version of the Hanoi benchmark WDS (Fujiwara and Khang, 1990; Piratla and Momeni, 2019). Consumption data (nodal demands from smart meters) and pressure monitoring data from the WDS are fed into neural networks in MATLAB 2020a to circumvent the time-consuming EPANET 2.2 hydraulic simulator toolkit. The trained networks are then used within genetic algorithms to predict the induced leakage, which is mimicked through emitter nodes.

Hanoi Water Distribution Network Demonstration
The Hanoi benchmark WDS, a metallic three-looped network, is modified by including emitter nodes in the middle of pipes to characterize the leakage at each pipe. Since this paper studies leakages at some of the pipes in Hanoi, emitter nodes are randomly placed on six and 12 pipes to establish two cases of actual leakage induction. Figure 1 shows the placement of these emitters for the two cases in the Hanoi WDS. Table 1 lists the geometric and hydraulic specifications of the original Hanoi network.

Leakage Induction Model
Emitters in EPANET 2.2 function as nodes that characterize the outflow through a nozzle or orifice discharging to the atmosphere (Muranho et al., 2014; Sebbagh et al., 2018). Such emitter nodes are associated with emitter coefficients that can be used to model the severity of these outflows to the atmosphere (i.e., leakage). It is hypothesized that leakage correlates directly with pressure and can be characterized as the sum of background and burst leakage (Muranho et al., 2014; Soldevila et al., 2016; Adedeji et al., 2017b; Zhou et al., 2019). The popular pressure-leakage relationship for a given pipe j can be written as follows (Georgescu et al., 2017):

$$q_j^{leak} = \beta_j \, l_j \, P_j^{\alpha_j} + C_j \, P_j^{\gamma_j} \qquad (1)$$

where $q_j^{leak}$ is the total leakage discharge along pipe j in cubic meters per hour (cmh); $l_j$ is the length of pipe j in meters; $\alpha_j$ and $\beta_j$ are parameters associated with the background leakage model; $C_j$ and $\gamma_j$ are parameters of the burst leakage model (EPANET 2.2 orifice formula); and $P_j$ is the average pressure in pipe j in meters, which equals the pressure at the emitter node k placed in the middle of pipe j. The background leakage term is considered zero in this paper, so the simplified EPANET-based leakage equation is as follows (Adedeji et al., 2017a):

$$q_j^{leak} = C_j \, P_j^{\gamma_j} \qquad (2)$$

where $q_j^{leak}$ is the total leakage discharge along pipe j in cmh; $C_j$ is the emitter coefficient at the emitter node placed in the middle of pipe j; $\gamma_j$ is the emitter exponent, which equals 0.5 by default in EPANET 2.2; and $P_j$ is the average pressure in pipe j in meters, which equals the pressure at the emitter node k placed in the middle of pipe j.
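The simplified burst-leakage relation described above (discharge equals the emitter coefficient times pressure raised to the emitter exponent) can be illustrated with a minimal Python sketch. The function name, the zero-flow guard for non-positive pressure, and the numerical values are assumptions made for illustration only; they are not taken from the study or from the EPANET toolkit.

```python
def emitter_leak_flow(coefficient, pressure_m, exponent=0.5):
    """Leak discharge q = C * P**gamma for an orifice-type emitter.

    coefficient : emitter coefficient C (hypothetical units: cmh per m**gamma)
    pressure_m  : pressure head P at the emitter node, in meters
    exponent    : emitter exponent gamma (0.5 is the EPANET 2.2 default)
    """
    if pressure_m <= 0:
        # Assumption for this sketch: no outflow without positive pressure.
        return 0.0
    return coefficient * pressure_m ** exponent


# Hypothetical example: C = 10 and a pressure head of 36 m.
print(emitter_leak_flow(10.0, 36.0))  # 10 * sqrt(36) = 60.0
```

Because the exponent is below one, doubling the pressure head raises the leak flow by less than a factor of two, which is why the emitter coefficient, rather than pressure alone, is the natural severity parameter to estimate.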
As the atmospheric discharge at the emitter node ($q_j^{leak}$) is deduced from Equation 2, the emitter actual demand in EPANET 2.2 equals the induced leakage discharge to the atmosphere:

$$D_k = q_j^{leak} \qquad (3)$$

where $D_k$ is the actual demand at emitter node k in cmh, and $q_j^{leak}$ is the atmospheric discharge in cmh. To characterize the leakage at a given pipe, the absolute value of the ratio of the atmospheric discharge at the emitter to the flow rate in the pipe containing the emitter yields the leakage percentage at that pipe. Equation 4 shows the leakage severity in percentage at pipe j:

$$L_j^{loc} = \left| \frac{D_k}{F_j^{inflow}} \right| \times 100 \qquad (4)$$

where $L_j^{loc}$ is the local leakage severity in percentage at pipe j; $D_k$ is the actual demand at emitter node k in cmh; and $F_j^{inflow}$ denotes the inflow of pipe j to emitter node k in cmh. It is postulated that the actual local leakage percentage at a given pipe does not exceed a maximum value of 20%, so that the induced leaks represent major leaks rather than pipe breaks in the Hanoi WDS.
Finally, the total leakage in the network is expressed through the network-wide leakage as a proportion of the total supply:

$$L_j^{net} = \frac{q_j^{leak}}{Q^{sup}} \times 100 \qquad (5)$$

where $q_j^{leak}$ denotes the leakage discharge to the atmosphere in cmh (derived from Equations 2 and 3) at emitter node j; j is the index for the emitter nodes; $Q^{sup}$ is the total supply of water to the WDS in cmh; and $L_j^{net}$ is the network-wide leakage in percentage at emitter node j.
While inducing leakages in the two cases illustrated in Figure 1, it is hypothesized that the total amount of network-wide leakage ($\sum_j L_j^{net}$) must not exceed a maximum of 10% to characterize a real-world scenario.
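The two leakage percentages defined above (local leakage relative to the pipe inflow, and network-wide leakage relative to total supply) can be computed as in the following sketch. All flows and the total supply are hypothetical illustrative numbers, not values from the Hanoi case study, and the function names are assumptions of this sketch.

```python
def local_leak_percent(emitter_demand, pipe_inflow):
    """Local leakage severity: |D_k / F_inflow| * 100, in percent."""
    return abs(emitter_demand / pipe_inflow) * 100.0


def network_leak_percents(leak_flows, total_supply):
    """Network-wide leakage share of each emitter, in percent of total supply."""
    return [q / total_supply * 100.0 for q in leak_flows]


# Hypothetical leak discharges and pipe inflows (cmh) for three emitters.
leak_flows = [60.0, 45.0, 30.0]
pipe_inflows = [400.0, 900.0, 250.0]
total_supply = 20000.0  # hypothetical total supply in cmh

local = [local_leak_percent(d, f) for d, f in zip(leak_flows, pipe_inflows)]
network = network_leak_percents(leak_flows, total_supply)

# Check the two induction limits stated in the text: 20% local, 10% network-wide.
print(all(p <= 20.0 for p in local))
print(sum(network) <= 10.0)
```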

Prediction Model Formulation
To implement an optimization procedure for prediction purposes, genetic algorithms (GA) have been selected for (i) their robustness in meta-heuristically converging on a set of solutions rather than a single solution point, (ii) their high capacity for fine-tuning thanks to a sizable number of algorithmic parameters, and (iii) a built-in constraint function, which structurally distinguishes them from Harmony Search or Particle Swarm algorithms. The prediction model predicts the emitter coefficients (E) at the given locations in the two cases of actual leakage induction presented in Figure 1 by minimizing the objective function in the GA optimization framework. The objective function is the mean squared error (MSE) of pressure values at the pressure monitoring locations.

Decision Variables
Emitter coefficients (E) constitute the decision variables of the optimization framework, and the set E is as follows:

$$E = \{e_1, e_2, \ldots, e_x\} \qquad (6)$$

where e is the emitter coefficient for each of the given emitter nodes, and x is the number of considered emitter nodes in the WDS.

Objective Function
The pressures ($P_k$) measured at the pressure monitoring stations in the Hanoi water distribution network for a given set (k) of nodal demands ($Q_k$) are characterized as follows:

$$Q_k = \{q_1, q_2, \ldots, q_y\}, \qquad P_k = \{p_1, p_2, \ldots, p_m\} \qquad (7)$$

where q is the nodal demand, y is the number of nodes in the WDS, p is the pressure measured at monitoring stations located in the WDS, and m is the number of pressure monitoring stations (PMSs) placed in the WDS. The genetic algorithm optimization framework is used to predict the set E using j sets of $Q_k$ and $P_k$. For candidate solution sets (i) of E in the optimization process, pressures at the monitoring stations can be estimated as follows, assuming all the dynamic condition parameters are known except for the emitter coefficients:

$$P_{k,i} = g(Q_k, E_i) \qquad (8)$$

where g() denotes the hydraulic simulations that would usually be conducted through software applications such as EPANET 2.2, $E_i$ is a candidate solution set of emitter coefficients, i is the candidate solution reference in the optimization algorithm, and $P_{k,i}$ is the estimated set of pressure values at all the monitoring stations for the corresponding candidate solution set $E_i$. The objective of the optimization algorithm is to minimize Z, whereby

$$Z = \frac{1}{j \, m} \sum_{k=1}^{j} \sum_{a=1}^{m} \left( p_{a,k,i} - p_{a,k} \right)^2 \qquad (9)$$

where a is the index for the pressure monitoring station, $p_{a,k,i}$ is the estimated pressure at PMS a for set of nodal demands k and candidate solution i, and $p_{a,k}$ is the actual measured pressure at PMS a for set of nodal demands k (obtained from set $P_k$).
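The objective described above, a mean squared error between estimated and measured pressures over all demand sets and monitoring stations, can be sketched as follows. The `simulate` argument stands in for the hydraulic simulation g() (EPANET 2.2 or an ANN surrogate in the study); here it is replaced by `toy_simulate`, a purely hypothetical linear function with two stations that has no hydraulic meaning. Averaging over all station and demand-set pairs is an assumption of this sketch.

```python
def pressure_mse(candidate_e, demand_sets, measured_sets, simulate):
    """Mean squared error between estimated and measured PMS pressures."""
    squared_errors = []
    for demands, measured in zip(demand_sets, measured_sets):
        estimated = simulate(demands, candidate_e)  # stands in for g(Q_k, E_i)
        squared_errors += [(p_est - p_obs) ** 2
                           for p_est, p_obs in zip(estimated, measured)]
    return sum(squared_errors) / len(squared_errors)


def toy_simulate(demands, emitters):
    """Hypothetical linear stand-in for a hydraulic solver (two stations)."""
    base = 100.0 - 0.01 * sum(demands)
    return [base - 0.5 * emitters[0], base - 0.3 * emitters[1]]


# "Measured" pressures are generated from known (hypothetical) coefficients.
true_e = [8.0, 3.0]
demand_sets = [[500.0, 700.0], [450.0, 900.0], [620.0, 610.0]]
measured_sets = [toy_simulate(q, true_e) for q in demand_sets]

print(pressure_mse(true_e, demand_sets, measured_sets, toy_simulate))      # 0.0
print(pressure_mse([6.0, 3.0], demand_sets, measured_sets, toy_simulate))  # about 0.5
```

The true coefficients give zero error by construction, while any other candidate yields a positive value, which is the property the optimizer exploits.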

Constraints
Three different constraints are employed in the proposed optimization model: (1) the minimum pressure head at all the nodes has to be >30 m for any candidate solution to be considered feasible; (2) none of the leakage flows at any of the emitter nodes may exceed a maximum value of 600 m³/h (cmh) based on Equations 2 and 3 (so as to avoid solutions with excessively high leakage flows); and (3) the local leakage ($L_j^{loc}$ from Equation 4) at each of the emitter nodes is limited to a maximum value of 20% (so as to avoid unrealistically large leaks).
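The three constraints listed above can be expressed as a single feasibility check, sketched below. The function name and the input lists are hypothetical; in the study, the corresponding quantities would come from the hydraulic model or its ANN surrogates, and the way the GA handles infeasible candidates is not specified here.

```python
MIN_HEAD_M = 30.0           # constraint 1: pressure head must exceed 30 m
MAX_LEAK_CMH = 600.0        # constraint 2: leak flow cap per emitter node
MAX_LOCAL_LEAK_PCT = 20.0   # constraint 3: local leakage cap per pipe


def is_feasible(node_heads, leak_flows, pipe_inflows):
    """Return True if a candidate solution satisfies all three constraints."""
    if min(node_heads) <= MIN_HEAD_M:
        return False
    for leak, inflow in zip(leak_flows, pipe_inflows):
        if leak > MAX_LEAK_CMH:
            return False
        if abs(leak / inflow) * 100.0 > MAX_LOCAL_LEAK_PCT:
            return False
    return True


# Hypothetical candidate: heads in m, flows in cmh.
print(is_feasible([45.2, 38.7, 31.5], [60.0, 45.0], [400.0, 900.0]))  # True
print(is_feasible([45.2, 29.0, 31.5], [60.0, 45.0], [400.0, 900.0]))  # False
```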

Algorithmic Parameters
Efforts were made to tune the GA parameters according to the number of decision variables, complexity of the prediction model, constraint features, and time-efficiency. Table 2 shows these GA parameters specified for the proposed model in this study.
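For orientation, a minimal real-coded genetic algorithm is sketched below in Python. It is not the MATLAB implementation used in the study, and the parameter values (population size, number of generations, mutation rate and spread) are hypothetical placeholders rather than the tuned values of Table 2. The operators chosen here (binary tournament selection, arithmetic crossover, Gaussian mutation, single-individual elitism) are one common combination, assumed for illustration.

```python
import random


def run_ga(cost, n_vars, bounds, pop_size=40, generations=60,
           mutation_rate=0.2, mutation_sigma=1.0, seed=0):
    """Minimize cost(x) over a box; returns (best solution, best-cost history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_vars)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=cost)
            p2 = min(rng.sample(pop, 2), key=cost)
            # Arithmetic crossover with a random blending weight.
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped to the bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, mutation_sigma)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history


# Hypothetical target: recover three emitter coefficients from a toy cost.
target = [8.0, 3.0, 15.0]
toy_cost = lambda e: sum((a - b) ** 2 for a, b in zip(e, target))
best, history = run_ga(toy_cost, n_vars=3, bounds=(0.0, 50.0))
print(best, history[-1])
```

In the study's setting, `toy_cost` would be replaced by the pressure MSE evaluated through the trained surrogate, with infeasible candidates handled by the constraint function.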

Characterization of Artificial Neural Networks (ANNs)
Since there are two cases of actual leakage induction (Figure 1), two separate but identically structured series of neural networks are trained to predict pressures at all the PMS locations for given sets of nodal demands and emitter coefficients, so as to bypass the time-consuming application of the EPANET 2.2 simulator toolkit in MATLAB. Table 3 displays the properties of the ANNs trained in MATLAB on the simulated data, using the resilient backpropagation function (Riedmiller and Braun, 1993), for use in the optimization framework.

Formulation for Accuracy Measurement
This section offers accuracy metrics to analyze the performance of both trained neural networks and the prediction model numerically.

Neural Networks Accuracy Metric
The accuracy of the trained neural networks (ANNs) is measured using the mean absolute percentage error (MAPE) (de Myttenaere et al., 2016; Khair et al., 2017), characterized as (Momeni and Piratla, 2021):

$$MAPE = \frac{100}{l \, y} \sum_{i=1}^{l} \sum_{j=1}^{y} \left| \frac{sim_{i,j} - pr_{i,j}}{sim_{i,j}} \right| \qquad (10)$$

where $pr_{i,j}$ is the value of pressure predicted by the trained ANN model for node j in validation scenario i, $sim_{i,j}$ is the simulated value of pressure calculated using EPANET 2.2 for node j in validation scenario i, y is the number of nodes in the WDS, and l is the number of validation scenarios.
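As a small worked illustration of the MAPE metric described above, the following sketch averages the absolute relative error between surrogate-predicted and simulated pressures over all validation scenarios and nodes. The pressure values are hypothetical, and the function name is an assumption of this sketch.

```python
def mape_scenarios(predicted, simulated):
    """MAPE in percent over a scenarios-by-nodes table of pressures."""
    total = 0.0
    count = 0
    for pr_row, sim_row in zip(predicted, simulated):
        for pr, sim in zip(pr_row, sim_row):
            total += abs((sim - pr) / sim)
            count += 1
    return 100.0 * total / count


# Two hypothetical validation scenarios with two nodes each (pressures in m).
simulated = [[50.0, 40.0], [60.0, 80.0]]
predicted = [[49.0, 41.0], [63.0, 76.0]]

# Relative errors: 0.02, 0.025, 0.05, 0.05 -> mean 0.03625 -> 3.625%
print(mape_scenarios(predicted, simulated))
```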

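The accuracy of the prediction model is measured using Pearson's correlation coefficient (PCC) and MAPE, computed between the actual and predicted values:

$$PCC = \frac{\sum_{j=1}^{y} (pr_j - \overline{pr})(act_j - \overline{act})}{\sqrt{\sum_{j=1}^{y} (pr_j - \overline{pr})^2} \, \sqrt{\sum_{j=1}^{y} (act_j - \overline{act})^2}} \qquad (11)$$

$$MAPE = \frac{100}{y} \sum_{j=1}^{y} \left| \frac{act_j - pr_j}{act_j} \right| \qquad (12)$$

where $pr_j$ is the predicted value of either the emitter coefficient for emitter j or the leakage severity ($q^{leak}$) (see Equations 2 and 3) for the associated pipe in the middle of which emitter j is placed; $act_j$ is the actual value of either the emitter coefficient for emitter j or the leakage severity ($q^{leak}$) (see Equations 2 and 3) for the associated pipe in the middle of which emitter j is placed; $\overline{pr}$ and $\overline{act}$ are the means of the predicted and actual values; and y is either the number of considered emitter nodes or the number of considered leakage-induced pipes in the WDS.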
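The two prediction-accuracy metrics used in this paper, Pearson's correlation coefficient and MAPE between actual and predicted values, can be computed as in the sketch below. The standard textbook definitions are assumed, and the emitter coefficient values are hypothetical.

```python
import math


def pearson(predicted, actual):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent, relative to actual values."""
    return 100.0 * sum(abs((a - p) / a)
                       for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical actual and predicted emitter coefficients for four emitters.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(pearson(predicted, actual))
# Relative errors: 0.2, 0.1, 0.1, 0.025 -> mean 0.10625 -> 10.625%
print(mape(predicted, actual))
```

The two metrics are complementary: PCC captures whether the ranking and relative pattern of leak severities are reproduced, while MAPE captures how far the predicted magnitudes are from the actual ones.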

Proof-of-Concept Demonstration
This section demonstrates leakage prediction through the proposed data-driven asset management scheme using two separate cases of actual leakage locations: (i) leakage at six orifice nodes and (ii) leakage at 12 orifice nodes. For each of these two cases, a single baseline scenario is established with partial input-output (I/O) data, comprising 70% of nodal demands and eight randomly placed pressure monitoring stations (PMSs) in the Hanoi WDS, in order to analyze the accuracy, robustness, and reliability of the proposed leakage model. In other words, it is assumed that nodal demands can be obtained from 70% of the Hanoi WDS nodes through smart water meters and that eight pressure monitoring sensors placed in the Hanoi WDS gather and relay pressure data synchronously with the nodal demand data. After generating 200 (j in Equation 9) demand scenarios to represent data from smart meters and the corresponding pressure data from the eight pressure monitoring stations, artificial neural networks (ANNs) are trained to mimic and replace the EPANET 2.2 hydraulic simulator in MATLAB within the genetic algorithm optimization. The input data for these neural networks include 70% of the actual demand data (nodal demands) along with emitter coefficients (representing the leakage), and the output data include the pressure values gathered from the eight pressure monitoring stations across the Hanoi network, which are used to establish the objective function (mean squared error of actual and simulated pressures) in the optimization framework. Moreover, another set of neural networks is trained for the leakage constraint, as mentioned in the methodology section.
The input data for the ANN in this case include the aforementioned partial demand sets and emitter coefficients, and the ANN target data include the actual demands at the emitter nodes as well as the inflow rates of the pipes preceding the emitter orifices (see Equation 4). Table 4 shows the specifics of the ANN models along with the selected input and output data for the baseline scenario according to the Hanoi WDS depicted in Figure 1.

Neural Network Accuracy and Performance Analysis
The accuracy of the trained neural networks for pressure and leakage (composed of actual demands at emitter nodes and inflow rates, which is consequently calculated by Equation 4) using MAPE metric can be observed in Table 5. Figures 2, 3 also demonstrate the performance analyses of training the neural networks for Case #1 and Case #2 respectively.
According to the MAPE values in Table 5, these neural networks are reasonably accurate and appropriate alternatives to the EPANET 2.2 simulator toolkit in MATLAB. They provide much higher time-efficiency for the execution of the prediction model, which allows the inclusion of thousands of scenarios in the sensitivity analyses of the placement and number of consumption data points later in the paper.

Prediction Model Accuracy
The accuracy of the prediction model is measured using both Pearson's correlation coefficient (see Equation 11) and MAPE metric (see Equation 12) by considering the actual and predicted emitter coefficients (E) as well as the actual and predicted leakage severities (q leak ). Figures 4, 5 illustrate the variation of actual and predicted emitter coefficients as well as actual and predicted leakage severities in cubic meters per hour (CMH) in Hanoi WDS for the two cases of leakage induction presented earlier.
The Pearson's correlation coefficient (PCC) and MAPE values for the emitter coefficients and leakage severities in both cases presented in Figures 4, 5 are given in Table 6. As shown in Table 6, the actual and predicted values of the emitter coefficients are reasonably correlated. As the number of emitter coefficients increases from 6 to 12, the accuracy of the model declines and the model becomes more sensitive to the variations of pressure in the objective function, which emphasizes the importance of the sensitivity analyses presented later in the paper.

Sensitivity Analyses
The sensitivity of the proposed leakage model to the placement and number of selected consumption nodes and pressure monitoring stations (PMSs) is measured by including 4,000 scenarios in which various numbers of partial nodal demand datasets (i.e., ANN input data) and pressure monitoring stations (i.e., ANN output data) are randomly generated. Four categories of scenarios are developed for the sensitivity analyses, as identified in Table 7: (a) consumption data from 70% of demand nodes and pressure monitoring stations placed at eight locations, characterized as (70%, 8), which is consistent with the baseline scenario discussed in the previous section; (b) (70%, 5); (c) (50%, 8); and (d) (50%, 5). For each of these four categories, 1,000 scenarios were randomly generated, out of which 100 scenarios were selected for the sensitivity analyses based on the best ANN MAPE for the WDS pressure output data.
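The scenario generation described above (random choice of metered demand nodes and PMS locations, followed by selection of the 100 scenarios with the lowest surrogate pressure MAPE) can be sketched as follows. The node count of 31, the node labels, and the random scores are placeholders assumed for illustration; in the study the scores would be the pressure MAPE of the ANN trained for each scenario.

```python
import random


def sample_scenario(demand_nodes, pms_candidates, demand_fraction, n_pms, rng):
    """Randomly pick metered demand nodes and pressure monitoring locations."""
    n_metered = round(demand_fraction * len(demand_nodes))
    return {
        "metered_nodes": sorted(rng.sample(demand_nodes, n_metered)),
        "pms_nodes": sorted(rng.sample(pms_candidates, n_pms)),
    }


def keep_best(scenarios, scores, keep=100):
    """Keep the scenarios with the lowest score (e.g., ANN pressure MAPE)."""
    ranked = sorted(range(len(scenarios)), key=lambda i: scores[i])
    return [scenarios[i] for i in ranked[:keep]]


rng = random.Random(42)
nodes = list(range(1, 32))  # assumption: 31 demand nodes labeled 1..31

# Category (70%, 8): 1,000 random scenarios.
scenarios = [sample_scenario(nodes, nodes, 0.70, 8, rng) for _ in range(1000)]

# Placeholder scores; these stand in for per-scenario ANN pressure MAPE values.
scores = [rng.random() for _ in scenarios]

best_100 = keep_best(scenarios, scores, keep=100)
print(len(best_100), len(best_100[0]["metered_nodes"]), len(best_100[0]["pms_nodes"]))
```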

Neural Networks Performance Analysis
As mentioned before, four separate categories of partial consumption data are studied to render the presented model more representative of the real-world data that can be gathered for modeling purposes. Table 7 gives the specifics of the generated scenarios for the four different sets of input and output data associated with the two actual leakage cases in the Hanoi WDS discussed previously. As can be observed in Table 7, the MAPE values of the trained neural networks, for pressures as target data and for leakage as data for the constraint function, averaged across the 4,000 scenarios of each of the two presented cases, are reasonably low, making the networks suitable for use in the optimization framework. According to Table 7, the average pressure MAPE of the ANN output tends to decrease as the available data increase from 50% of nodal demands and five pressure meters to 70% of nodal demands and eight pressure meters. The same holds for the average inflow rate MAPE, which drops from 3.32 to 2.09% in Case #1 and from 2.99 to 2.07% in Case #2. However, the average actual demand MAPE at emitter nodes increases slightly for both cases. Comparing the average of all three averages for each combination of consumption data shows that Case #2 demonstrates a slightly better accuracy than Case #1.

Prediction Model Accuracy Analysis
As mentioned before, the prediction model is applied to the best 100 scenarios (according to the best ANN MAPEs for pressure output) out of the 1,000 trained scenarios for each of the four categories described in the previous section for both of the cases. The accuracy of the prediction model in the sensitivity analyses section is also measured using MAPE metric and Pearson's Correlation Coefficient (PCC) (see Equations 11 and 12). Figures 6, 7 represent the emitter coefficient and leakage severity MAPE and PCC variations for the baseline scenarios along with each of the four sensitivity analyses categories (see Table 7). According to Figures 6, 7, it is noteworthy that the PCC and MAPE values in each of the cases are almost similar for emitter coefficients and leakage severity, since they are directly correlated according to Equation 2. Two types of comparisons are made according to Figures 6, 7: case-wise and pairwise.

Case Wise Comparison
This section compares the accuracy metrics in Case #1 to those in Case #2, as the number of emitter nodes increases. Firstly, according to Figures 6, 7, it can be inferred that Case #2 displays a lower average prediction accuracy than Case #1 (average MAPE of ∼12% in Case #1 compared to that of ∼37% in Case #2 in all combinations) as the number of emitter nodes increases. This observation is consistent with the baseline scenario analysis in the proof-of-concept demonstration as well. It can also be concluded that the overall variations of both accuracy metrics range from 0.5306 to 0.9985 for PCC and from 1.98 to 30.94% for MAPE in Case #1 as well as from 0.1561 to 0.9666 for PCC and 12.72 to 63.47% for MAPE in Case #2. These relatively large ranges are indicative of the high sensitivity of the model to the number and locations of the selected smart meters for pressure and nodal demands across the network.

Pairwise Comparison
In this section, comparisons are made within each of the two cases in terms of the four categories (combinations) of partial nodal demands and pressure monitoring stations (Combinations #1 through #4 according to Table 7) for both emitter coefficient and leakage severity predictions.
Regarding the Case #1 emitter coefficient predictions, according to Figures 6A,B, the average PCC and average MAPE values across all the combinations vary within ranges of only 0.0099 and 0.22%, respectively, which suggests that increasing the number of smart meters for wider consumption data, combined with an increased number of pressure monitoring stations, does not necessarily contribute to greater average accuracy of the model for Case #1 with six leak sources across the network. Furthermore, comparing the prediction accuracy of Combinations #1 and #2 for Case #1 in Figure 6, the following observations can be made for when the number of pressure monitoring stations is increased from 5 to 8 while keeping the percentage of nodal input at 50%: (1) the average PCC value drops from 0.9127 to 0.9060, which suggests that there is slightly less correlation between actual and predicted leakages although the number of pressure meters has increased, whereas the average MAPE improves slightly from 12.08% to 12.04%; (2) the maximum PCC value increases very marginally from 0.9976 to 0.9985, which is not a considerable improvement; (3) the lowest MAPE declines from 3.34% to 2.32%, which is also not greatly significant; and (4) the range of variation in PCC shrinks going from Combination #1 to Combination #2, whereas it expands for MAPE. Similar observations can also be made for the comparison of Combinations #3 and #4, where the number of pressure monitoring stations is increased from 5 to 8 while the percentage of nodal input is 70%. It can be concluded from these observations that increasing the number of pressure monitoring stations does not necessarily result in considerably better prediction accuracy of leakage severity assessment. On the other hand, by comparing Combination #1 with Combination #3, it can be observed that (i) [...] the sensitivity of the model.
Similarly, the average values and variation ranges of the accuracy metrics for leakage severity in Figures 6C,D are identical to those of the emitter coefficients across the four combinations of sensitivity analyses. Based on the variation range of the prediction for all the categories studied in the sensitivity analyses, it can be concluded that optimizing the locations for placement of smart water meters and pressure monitoring stations in the WDS would yield the best prediction (i.e., highest PCC and lowest MAPE) of pipeline condition assessment as envisioned through the proposed approach. Considering Case #2, as per Figures 7A,B, the average PCC and MAPE values with 12 emitter nodes for all four combinations vary within ranges of 0.1358 and 9.64%, respectively, which suggests that increasing the number of PMSs and the percentage of consumption data makes a more significant contribution than in Case #1. It can thus be concluded that as the emitter nodes increase in number, the prediction model shows more sensitivity to the selected locations and numbers of the smart meters. Also, having the smallest MAPE variation range, Combination #2 demonstrates the lowest sensitivity of the model as well as the highest average accuracy (average MAPE equals 33.12%) out of all four combinations. Furthermore, by comparing Combinations #1 and #2 from Figure 7

CONCLUSION AND FUTURE WORK
This study aimed to demonstrate the validity of a data-driven water pipeline leakage prediction scheme using artificial neural networks and genetic algorithms, as demonstrated on the Hanoi WDS. Using data from pressure monitoring stations and a set of partial nodal demands, neural networks were trained and incorporated into a genetic algorithm optimization framework in MATLAB to predict emitter coefficients for two cases of actual leakage induction at six and 12 pipes, respectively. The results indicate that (i) this preliminary prediction scheme offers promise for predicting leakage severities based on cybermonitoring data with reasonable accuracy and (ii) the prediction model does not show improvements when both consumption at more demand nodes and more pressure stations are considered, while on average the increase in the number of pressure stations at 50% nodal demands showed better accuracy and relatively higher correlation of parameters in the model for both cases. Some of the limitations of the study include (i) the consideration of leakage induction at some rather than all of the pipes, (ii) the assumption that leaking pipelines are known (locations of leaks were predefined), (iii) the assumption of the availability of consumption data collected synchronously with pressure data, (iv) the assumption that all other dynamic pipeline condition parameters (e.g., pipeline roughness, effective pipeline diameter) are known, and (v) the consideration of leaks with high outflows given the sizes of pipes in the Hanoi WDS.
Future work should focus on: (1) prediction of leakages in all the pipelines without assuming that the leaking pipelines are known; (2) co-prediction of a variety of dynamic pipeline condition parameters including leakages, roughness values, effective hydraulic diameters, etc.; (3) a wider validation campaign covering WDSs of varying layouts and pipe sizes to test the ability of the proposed model to detect smaller leakages; and (4) optimizing the locations for placement of smart water meters and pressure monitoring stations in WDSs for the best pipeline condition prediction accuracy.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation and upon reasonable request.