Improving Estuarine Hydrodynamic Forecasts Through Numerical Model Ensembles

Numerical models are essential tools for the study and analysis of the hydrodynamics of estuarine systems. However, the model results contain uncertainties, which need to be minimized to increase the accuracy of predictions. In this work, the ensemble technique is proposed as a solution to improve hydrodynamic forecasts for estuarine regions. Two numerical models, openTELEMAC-MASCARET and Delft3D, were considered for the application of this technique to two Portuguese estuaries. Superensembles for three scenarios (summer, winter, and extreme event) were built to assess the effectiveness of the technique in improving water level prediction. Various weighing techniques were tested in the construction of the ensembles. Weighing techniques that consider the previous performance of each model alone outperformed other techniques. This was observed for all scenarios considered, at all sampling points and in both studied estuaries. The effect of the ensemble size was also analyzed. It was found that the size of the set is directly related to the prediction accuracy, with the best results provided by the superensembles with the highest number of elements. It is concluded that the combined use of several hydrodynamic models reduces the uncertainty of the results and increases the reliability and consistency of predictions for estuarine regions.


INTRODUCTION
Coastal regions, particularly the estuarine areas, are of strategic importance from an environmental, economic, and social point of view. They provide multiple ecosystem services, such as fishery and other food resources, leisure and tourism, energy, water, and raw materials. Coasts are generally densely populated and highly urbanized. Approximately 40% of the world population lives at a distance less than 100 km from the coast, and that percentage is increasing. This implies an increase in economic activities, coastal infrastructures, and urban intensification (Dangendorf et al., 2012;IPCC, 2012;Hallegatte et al., 2013;Moser et al., 2014;Bell et al., 2018). Estuaries are particularly dynamic coastal systems, with strong variations in salinity, currents, stratification, and water quality. They are subject to the influence of meteo-oceanographic and geomorphological phenomena such as wind, tides, waves, currents, river flows, transport of sediments, erosion, and accretion. In terms of ecosystems, they are highly productive, serving as habitats for numerous species. They usually present vast urban areas due to their privileged situation in terms of resources and accessibility. The massive occupation entails the artificialization of the banks, which leads to a loss of biodiversity, affecting both the physical and environmental properties of the water/land interface. This increases the vulnerability of the system and, therefore, the risk to populations and their assets, with adverse effects on the economy (Peixoto, 2016). Considering the current context of climate change, an increase in the frequency and intensity of extreme events is expected (IPCC, 2012), with serious consequences for society and the environment, and with impacts on populations, infrastructures, habitats, and ecosystem services (Vose et al., 2014;Bell et al., 2018).
There is, therefore, an urgent need for accurate scientific and technical information to support sustainable coastal management, to reduce the exposure and vulnerability of estuarine areas, to mitigate the risks associated with different climate scenarios and to promote the adaptation and the resilience of communities to potential adverse impacts (Coelho et al., 2009;Veloso-Gomes, 2016). To provide this accurate information, it is necessary to understand, in a comprehensive way, all the hydrodynamic processes that take place in estuaries, and their interactions with anthropogenic activities and ecosystems. In situ measurements can help to understand these phenomena, enabling the representation of the current state of the systems and their main historical evolutionary trends. However, obtaining this type of data is not simple, as it requires detailed planning and constant monitoring, which is not always feasible. As a consequence, there is a huge lack of continuous and long-term observations. The limited knowledge, related to a lack of systematic monitoring, in combination with the complexity of estuarine systems, leads to a high degree of uncertainty about future scenarios and the effects they will have on populations and ecosystems (Iglesias et al., 2020b).
Numerical models are valuable tools that make it possible to overcome shortcomings in information (Horritt and Bates, 2002;Liu et al., 2008;Chen et al., 2015a,b;van Maren et al., 2015;Pereira, 2016). The efforts of the scientific community and the developments in computational techniques allowed to increase the simulation capacity, improve the accuracy and resolution of the numerical models and reduce the computational processing time. Numerical simulation of hydrodynamic processes in estuarine systems is essential to evaluate the effects of forcing terms, through the imposition of dynamic boundary conditions, the inclusion of topo-bathymetric characteristics, and the presence of anthropic structures (Bastos et al., 2016;Teng et al., 2017). However, each numerical model has its own strengths and simplifications, and its results and uncertainties are dependent on many factors, such as its conceptual framework, approach to physical processes, initial and boundary conditions, and horizontal and vertical spatial resolutions. The assumptions made to define and quantify those factors are also a relevant source of the uncertainty associated with a model (Feng et al., 2011).
Next to the natural randomness associated with the temporal and spatial variation of natural processes, there are other sources of numerical model uncertainties. They can result from model initialization, due to, for example, incomplete required data, measurement errors or errors in data processing, or reside in the model itself, due to imprecise parametrization of physical processes, inaccuracies in the definition of parameters, the existence of unresolved scales, or errors in the boundary conditions (Palmer, 2003;Buizza et al., 2005;Weigel et al., 2008). Uncertainties in the construction and application of these numerical models can be grouped into: initial conditions uncertainties, model boundary conditions and model parameters uncertainties, and structural uncertainties (Tebaldi and Knutti, 2007). Furthermore, it must not be forgotten that any attempt to represent nature in a set of equations will always be a simplification of reality that inevitably introduces uncertainties. Although the partial differential equations that govern estuarine hydrodynamic phenomena are well known, they must be truncated to a finite-dimensional set of linear equations to be computationally integrated and solved. The uncertainties associated with this process can propagate and produce effects across the entire spectrum of scales predicted by the model (Palmer et al., 2004;Hagedorn et al., 2005). To produce a perfect forecast, a sound knowledge of each process and state of the system would be needed, and this should be perfectly implemented in the model. However, this can only be done in an approximate way, as even the actual quantification of the system is accompanied by inherent uncertainties. All uncertainties, even minor ones, propagate through the model processes, adding up and eventually leading to a large forecast error (Mohan Das et al., 2017;Deutscher Wetterdienst, 2021). The need for accurate forecasts, where errors of this type are minimized, highlights the importance of finding and implementing new solutions to deliver better numerical results. In this context, ensemble modeling is considered as one of the best solutions because it can minimize the combined uncertainty in input data, model parameters and model structure, improving the performance of the models (Viney et al., 2009;Suh et al., 2012;Mohan Das et al., 2017).
Given that the ensemble technique has not yet been fully implemented for the assessment of estuarine hydrodynamics, the main objective of this study is the implementation and testing of an ensemble technique for estuarine regions to improve hydrodynamic forecasts and demonstrate its effectiveness in reducing the uncertainty of the results. The technique will be selected and tested considering various scenarios, different estuarine regions, diverse locations inside the estuaries and several statistical techniques for ensemble construction. In the following sections we describe the approach used for modeling the hydrodynamics of the Douro and Minho rivers estuaries using an ensemble technique.

THE ENSEMBLE TECHNIQUE
A numerical model ensemble is the combination of several numerical model simulations using statistical methods. This technique consists of running different numerical models of the same natural system, or running the same model considering changes in key input parameters. The obtained results are synthesized in a single result that presents smaller bias and variance than the individual solutions, thus improving the accuracy, reliability and consistency of the final prediction (Tebaldi and Knutti, 2007;Re and Valentini, 2010). It is a technique widely applied in atmospheric and climate sciences for weather, seasonal, interannual and climate change forecasting (Gneiting and Raftery, 2005;Weigel et al., 2008;Feng et al., 2011;WMO, 2012;IPCC, 2021). Other scientific domains where ensembles techniques have been applied are hydrology (Ajami et al., 2006), morphodynamics (Thornhill et al., 2012), geography and remote sensing (Benediktsson et al., 2007), wave and coastal forecasting (Yang-Ming et al., 2013;Kourafalou et al., 2015), public health (Thomson et al., 2006), agriculture (Cantelaube and Terres, 2005), information security (Menahem et al., 2009), astronomy and astrophysics (Bazell and Aha, 2001), finance (Leigh et al., 2002), medicine (Polikar et al., 2008) and bioinformatics (Re and Valentini, 2010). In estuarine hydrodynamics, this technique has not been fully adopted and most modeling studies are based on a single model. To the best of our knowledge, few works were carried out in this area. However, a recent interest is emerging (Mohan Das et al., 2017;Iglesias et al., 2019b;Taeb and Weaver, 2019;Dinápoli et al., 2021;Khanarmuei et al., 2021).
The uncertainty in forecasts was first introduced by Lorenz (1963Lorenz ( , 1965, who examined the uncertainties of initial states, their effect on predictions, and the so-called butterfly effect. Centered on the atmosphere, Lorenz observed that this system is chaotic due to the non-linearity of its processes. He demonstrated that, no matter how good the observations or forecasting techniques are, there will be a non-transposable limit to how far into the future it is possible to make a forecast (Roy Bhowmik and Durai, 2008). After Lorenz's studies, the idea of combining predictions from multiple models was explored for more than 30 years in econometrics and statistics (Bates and Granger, 1969;Dickinson, 1973;Newbold and Granger, 1974). The leap into the earth sciences was taken by Epstein (1969);Leith (1974) and Thompson (1977), who included the concept of ensembles in weather forecasting, considering the perspective of the randomness of atmospheric motion. However, it was not until the 1990s that the first ensemble prediction systems were implemented at the ECMWF and at the NCEP (Feng et al., 2011).
Therefore, for the case of predictions of future behavior, an ensemble generates a forecast based on a set of forecasts, minimizing the systematic bias that occurs when a single model solution is used (Roy Bhowmik and Durai, 2010). However, a combination of different forecasts is only useful if they disagree on some inputs or if they are generated by different models. There is no gain of information from a combination of identical models with identical results (Krogh and Vedelsby, 1995). The numerical models that are members of the set should therefore include different initial conditions, boundary conditions, parameterizations, model structures, or some combination of them, to reflect the uncertainty (Buizza et al., 2005;Demeritt et al., 2007). The ensemble average of these different solutions yields a more accurate forecast than the individual forecasts of each ensemble member, while the ensemble dispersion provides quantitative information about forecast uncertainty (Wandishin et al., 2001).
Two main types of ensembles can be distinguished: those that use results from a single model and those that are computed with multiple models. Ensembles composed of a single model combine several runs with different initial and/or boundary conditions. It has long been accepted that running an ensemble of numerical forecasts from perturbed initial conditions can have a beneficial impact on prediction ability (Yang-Ming et al., 2013). The multimodel ensemble uses several numerical models with different complexities and structures that are run using similar initial and boundary conditions (Iglesias et al., 2019b(Iglesias et al., , 2020b. Using several models in an ensemble is a way to take into account our lack of knowledge about the system that is being modeled, as different models make different assumptions and perform differently (Palmer et al., 2004;Doblas-Reyes et al., 2010). If multiple initial conditions are used for each model, the multimodel ensemble is a superensemble (Tebaldi and Knutti, 2007). A superensemble presents advantages when compared with a multimodel ensemble. A multimodel ensemble reduces the random component in model errors but does not provide any reduction in the initial condition uncertainty, which is taken into account in the superensemble (Doblas-Reyes et al., 2010). Furthermore, some models may provide better predictions for some conditions than for others. For example, a model might provide better predictions for summer conditions than for winter conditions. Thus better overall forecasts can be obtained using combinations of different models for different scenarios, also known as conditional ensembling (Viney et al., 2009).
Although the results of past studies are dependent on ensemble size and region, one can state that ensembles forecasts that use multiple models usually outperform single model deterministic forecasts and often outperform any combination of solutions from a single model (Krishnamurti et al., 1999;Georgakakos et al., 2004;Hagedorn et al., 2005;Palmer et al., 2005;Tebaldi and Knutti, 2007). However, and although it seems that increasing the sample size of the ensemble by pooling information from different models presumably offers a reduction in the variance of the mean of the forecast, it is not clear that such a strategy will always be effective. This is due to the linear and non-linear biases in each model (Rajagopalan et al., 2002). The ensemble's So, its suitability has thus to be tested for estuarine hydrodynamic models.
There is a general agreement on the fact that the combination of several different forecasts provides significant improvements in the solutions. After running the models, the computed forecasts are usually collapsed into a consensus forecast or ensemble mean. There are many possible methods to obtain an ensemble result (Clemen, 1989). However, it is not clear which is the ideal method for different forecasting problems (Thomson et al., 2006;Rozante et al., 2014). Several statistical techniques can be used that can be categorized into three types: simple composite methods, weighted ensemble methods, and synthetic methods (Suh et al., 2012). The simplest technique is to use the median or the arithmetic mean, which has the effect of filtering out features of the forecast that are less predictable while retaining those features that show agreement among the members of the ensemble (Yang-Ming et al., 2013). This technique is usually selected when no in situ observations exist for comparison. However, different numerical models often have different simulation capabilities, and a variable may be better represented by one model and worse by another. To improve the ensemble solution, the predictive capability of each model should be considered as it makes sense to attribute more weight to the better performing models (Tebaldi and Knutti, 2007). Nevertheless, there is some controversy about the best way to combine the results of the models, and some previous studies in the literature found no evidence that the weighted average system provides results closer to observation than the establishment of equal or random weights (Déqué and Somot, 2010). In a weighted average method, a weighting coefficient is allocated to each model according to its forecasting performance as a single model (Feng et al., 2011). These coefficients are obtained by comparing numerical results with field observations, and generally evolve from metrics such as mean squared errors, correlation coefficients or multiple linear regressions (Woodcock and Engel, 2005;Chakraborty and Krishnamurti, 2006;Roy Bhowmik and Durai, 2010). Other more complex methods include linear regressions, non-linear regressions, principal component regression, singular value decomposition, composites, linear programming, multimodel superensembles, Supra Bayesian method or neural networks (Krishnamurti et al., 1999;Ajami et al., 2006;Durai, 2008, Roy Bhowmik andDurai, 2010;Viney et al., 2009;Feng et al., 2011;Kumar et al., 2012;WMO, 2012;Rozante et al., 2014;Mohan Das et al., 2017;Bernhofen et al., 2018). However, as stated by Clemen (1989), simple combinations often outperform more complex methods.

GEOGRAPHICAL SETTINGS
For the application of the ensembles technique, the estuaries of the rivers Minho and Douro in northern Portugal were chosen (see Figure 1). Despite being geographically close and having similar seasonal flow regimes (minimum in summer and maximum in winter), these estuaries have distinct dynamics and environmental conditions. They differ in the river average and peak discharges, as well as in terms of morphology, bathymetry, length, and banks configuration and level of urbanization (Iglesias et al., 2020b).
The river Minho reaches the Atlantic Ocean between A Guarda (Spain) and Caminha (Portugal). Its 40 km-long shallow estuary has an average depth of 4 m, although depths around 20 m can be found along stretches where the narrowing of the main channel increases current velocities and, consequently, erosion (Freitas et al., 2009;Reis et al., 2009). The freshwater inflow is controlled by the Frieira dam, located 80 km upstream from the mouth. Compared to the river Douro, the river Minho has lower flow rates, which increase the water residence time up to 1.5 days (Ferreira et al., 2005). This long residence time, in conjunction with the small estuarine area and relatively low water volumes, make the estuary vulnerable to pollutants, which can endanger its great diversity of habitats (Domínguez García et al., 2013;Ribeiro et al., 2016). It is a partially mixed estuary, where vertical stratification can occur due to a salt wedge configuration (Baeta et al., 2017). Its dynamics is strongly influenced by the tide, with a direct effect on sediment transport. The lower part of the estuary, near the mouth, presents a widening that results in a decrease in current velocity, creating favorable conditions for the deposition of sediments (Delgado et al., 2011;Portela, 2011;Melo et al., 2020). The morphodynamic patterns due to silting have caused restrictions to navigation and the appearance of islands and sandbanks during low tide (Zacarias, 2007;Reis et al., 2009;Santos et al., 2021). This fact is most notable in spring low-tide conditions, when the connection between the estuary and the sea is restricted to two shallow channels (Iglesias et al., 2019a, Iglesias et al., 2020b. The river Douro flows into the Atlantic Ocean through an estuary surrounded by two large cities: Porto and Vila Nova de Gaia. The estuary is relatively narrow and highly dynamic, with torrential regimes that produce strong currents and recurrent floods (Bastos et al., 2012;Iglesias et al., 2020a). It has an irregular bathymetric configuration, with depths between 0 and 10 m, and some deeper regions, up to 28 m-deep, located at narrower sections, outer bends, and former sites of sediments extraction (Portela, 2008). Its dynamics is conditioned by the river flow, being highly dependent both on natural conditions and on the hydroelectric power generation schedules at the upstream Crestuma-Lever dam, which limits the estuary to 21 km in length. On the southern bank of the estuarine mouth, there is a wetland and a sand spit that partially obstructs the entrance and protects the estuary. This sand spit, formed by fluvial and maritime sediments, is conditioned by natural (wind, rain, river flows, waves, tides, and storms) and anthropic (construction of breakwaters and dams, sand extraction, and dredging) processes (Santos et al., 2010;Granja et al., 2011). In the past, this sand spit suffered erosion and a slow migration toward the interior and to the north of the estuary, partially occupying the navigation channel. To ensure safe navigation, the northern breakwater at the inlet was extended and a new detached breakwater was built. These structures, completed in 2008, interfere with the hydromorphodynamic patterns, silting up the wetland and significantly increasing the sand spit area and volume in a relatively short period (∼10 years) (Bastos et al., 2012). Historical records show ruptures of this spit during flood episodes, allowing for a rapid water discharge and reducing the risk of urban flooding. Its current configuration, more stable and robust, reduces the probability of rupture so that more severe effects are expected in terms of economic losses and structural damage during floods (Iglesias et al., 2019b).

MATERIALS AND METHODS
The numerical models for the ensemble were implemented using the openTELEMAC-MASCARET (OTM) and the Delft3D (D3D) modeling suites. They solve similar formulations of the shallow water equations, considering several physical processes such as tidal forcing, tidal flats, river discharges, the rotation of the Earth, bottom friction, turbulence, sub-and supercritical flows, and water density effects (Jones and Davies, 2010;Robins and Davies, 2010;Monteiro et al., 2011;van Maren et al., 2015). These two packages were selected because they have shown to be able to accurately simulate the hydrodynamic behavior in estuarine areas, to assess the flooding risk, and to quantify the effects of climate change (Corti and Pennati, 2000;Horritt and Bates, 2002;Gomes et al., 2015;Putra et al., 2015). These OTM and D3D models were already calibrated for the Minho and Douro estuaries, and successfully demonstrated their ability to accurately represent these estuaries' characteristic hydrodynamic patterns for both frequent and extreme events (Iglesias et al., 2019b(Iglesias et al., , 2020b(Iglesias et al., , 2021Melo et al., 2020;Weber de Melo et al., 2022).
The ensembles were based on simulations performed with OTM and D3D modeling suites. These ensemble hindcasts, performed for a training period, were compared with historical in situ data, indicating which statistical technique is the best. An ensemble obtained as a combination of two single outputs, one from each modeling suite, would produce little improvement compared to the outputs of the individual models, as they are insufficient to resolve the inherent uncertainty. Thus, superensembles are proposed to maximize the number of members of the ensembles A set of M modified initial conditions were considered to produce a superensemble hindcast for each estuary. As two modeling suites are used, there will be a total of 2 M ensemble members, and the superensemble hindcast will be the combination of the results of all these members.

Superensembles
The models were run for historical conditions to calibrate the ensemble with in situ measurements of the water elevation at several locations, namely: stations M1, M2, M3, and M4 for the Minho estuary, and stations D1, D2, D3, and D4 for the Douro estuary (see Figure 1). The consideration of all these stations will allow to inspect whether the technique's performance varies spatially, i.e., depending on the measuring location. The simulations, run with both modeling suites, OTM and D3D, considered three scenarios that represent three different conditions: summer, winter, and extreme event (flooding) conditions (Table 1). Measuring station M2 was only available for the winter scenario. A deterministic approach was adopted for this study. Numerical modeling simulations were run for a single day of 2006 to proceed with the calibration of the ensemble results with in situ data. This allows the maintenance of the characteristics of each selected scenario. Each model run consisted of a 24 h-long simulation with constant river flow and water elevation boundary conditions, preceded by a spin-up period of 3 h to avoid numerical instabilities. The water elevation  modeling results for ensemble construction were extracted at the end of each performed simulation when the models are stable. Given the predominant wind direction, the small size of the estuaries and the close configuration of their mouths, atmospheric (wind and sea level pressure) and wave forcing were neglected. The scenarios and respective runs are summarized in Supplementary Tables 1-3. A superensemble was built for each of the proposed scenarios. For each scenario, runs with different combinations of river flow and oceanic water elevation were considered because they are the principal hydrodynamic drivers in the considered estuaries. The river flow rate was included as: (a) registered value, (b) registered value plus standard deviation of the simulated month, and (c) registered value minus standard deviation of the simulated month, to include possible deviations associated with inaccuracies in the river flow measurements (cf. Table 1). Standard deviations were computed based on daily mean river flows measured at Foz do Mouro (1973Mouro ( -2020, for the Minho estuary, and hourly mean river flows at Crestuma-Lever (1998-2020, for the Douro estuary (cf. Table 1). The water level at the ocean boundary considered the (i) maximum and (ii) minimum tide levels registered on the simulation day, as well as the (iii) annual highest tide, (iv) highest spring tide, (v) highest neap tide, (vi) annual lowest tide, (vii) lowest spring tide, and (viii) lowest neap tide (cf. Table 2). Considering these different water levels in the ensemble construction will increase the number of ensemble members to avoid inaccuracies associated with tidal amplification and asymmetry inside the estuarine regions, as well as sudden changes in depth or inaccuracies in the bathymetry included in the numerical grids. For the extreme event, since extreme events are rare, with percentiles of non-exceedance above 95% (Iglesias et al., 2021), the standard deviation of the flow was not taken into account. An additional ocean elevation of 1.10 m, associated with a storm surge, was included for both estuaries (Gama et al., 1994;Almeida et al., 2009).

Conditional Ensembling
Besides the winter and summer superensembles with their 48 members (models' results), conditional ensembling was considered to assess the effect of the ensemble size. The numerical solutions for summer and winter were grouped into the 24 highest and 24 lowest water elevation predictions. The 48-member superensembles were calibrated with the mean water elevation observed during the selected simulation day,    whereas the results of the 24-member superensembles were compared with the maximum and minimum values registered at the measuring stations. The superensembles characteristics are summarized in Table 3.

Weighting Techniques
To construct the ensembles solutions, the following techniques were selected and evaluated: the simple model average (SMA), the members' median (Med), the trimmed mean (Trim), and weighted average methods (WAM). For the WAM technique, three weightings were analyzed: a random weighting (WAM-Rand), the absolute error (WAM-AE) and the squared error (WAM-SE).
The SMA and the Med techniques are the simplest ones and will be used as benchmarks for comparison with more sophisticated techniques. They consist of the arithmetic average and of the median of the ensemble members. These techniques are widely used for ensembles construction because there is no need for comparisons between the models' results and the observations (Ajami et al., 2006;Viney et al., 2009;Déqué and Somot, 2010;Weigel et al., 2010;Feng et al., 2011;Suh et al., 2012). If the SMA is applied to a single forecast of each model, the technique it is called a "poor man's" ensemble (Du et al., 2018). In both SMA and Med techniques, all model outcomes have the same weight in the final forecast. This means that they are likely to include several poor models' results, degrading the overall results (Roy Durai, 2008, Roy Bhowmik andDurai, 2010). For the Trim technique, the outliers, which are the largest and smallest values of the dataset that might affect the arithmetic mean, are removed. Subsequently a simple average of the remaining values is calculated. Several Trim combinations were computed, considering the 36-46 centermost predictions for summer and winter scenarios, the 20-30 centermost predictions for the extreme event scenario, and the 12-22 centermost predictions for the higher and lower predictions for the summer and winter scenarios. It is expected that Trim predictions lie somewhere between SMA and Med so that the Trim ensembles with more members approach SME forecasts while the Trim ensembles with fewer members approach Med forecasts (Viney et al., 2009).
In a WAM technique, the weighting coefficients are constrained to be always positive and their sum must be equal to one. They shall not require large quantities of data for their estimation and should distinguish between better and poorer models (Woodcock and Engel, 2005). The weighting coefficients need to be derived from comparisons with the observations, taking into consideration the simulation performance of each model through some statistical property of the model's calibration predictions (Viney et al., 2009;Suh et al., 2012). Different models have in generally different simulation capacities. At the same time, they forecast capacity may vary depending on the analyzed variables, the seasons and the geographic regions. If the different capacities of the models are taken into account through an assignment of weights (as done in WAM), the ensemble results can be closer to the observations than those obtained with the arithmetic mean method (Feng et al., 2011).
Since the SMA technique can be seen as a WAM technique with an equal weight applied to each ensemble member, random weights were considered to investigate the effectiveness of the WAM methods. In the first random method, WAM-Rand1, the run with index m gets the weight: Frontiers in Marine Science | www.frontiersin.org being M the number of simulations that integrate the ensemble (Casanova and Ahrens, 2009). For the second random method, WAM-Rand2, the weights are random numbers with a uniform distribution between 0 and 1 that are then normalized (Déqué and Somot, 2010). For the knowledge on the performance of the models to be incorporated in the construction of the ensemble, two more techniques were considered. The WAM-AE technique, based on the absolute error metric, L AE , and the WAM-SE technique based on the squared error metric, L SE .
The absolute error metric is defined as: Where X f and X o are the forecasted and the observed values, respectively, of the variable X. In a similar way, the squared error metric is defined as In the WAM-AE technique, the weight w m is given by: Where M is the number of members in the ensemble, whereas in the WAM-SE technique, the weight w m is given by: The performance of each model and ensemble was then evaluated comparing the results with the 2006 in situ observations of water elevation. Therefore, the AE (Equation 2) was selected as the evaluation score. A flow diagram outlining the methodology performed in this study is presented in Figure 2.

RESULTS AND DISCUSSION
Prior to the superensemble construction, the results of the individual numerical models were analyzed. The scatter plots in Figure 3 showed a strong dispersion of results for all runs and for all scenarios. This was expected, given the wide range of variation imposed in the boundary conditions (Tables 1, 2 and Supplementary Tables 1-3). Stronger dispersion was observed for the summer and winter scenarios (cf. Figures 3A,B,D,E) due to the wider range of adopted conditions when compared to the extreme event scenario (cf. Figures 3C,F), but also because, during extreme flooding events, the water level inside the estuaries is only weakly affected by any variation of the ocean tide level. Even so, the dispersion is larger for the summer scenarios than for the winter scenarios.
The summer scenario runs were forced with low river flows, causing the tide to become the main driver of the estuarine dynamics. For each run, the measured water levels show little variability along the length of the estuaries (cf. Figures 3A,D). The modeled water levels at both estuaries show a much wider variation due to the variations in the tide level among the various runs. Nevertheless, none of the runs provides a good solution, given that no solution is on the line of the perfect hindcast.
For the winter scenario, the differences between the upstream and downstream sampling stations are more pronounced (cf. Figures 3B,E). The highest elevations for the Minho estuary were not recorded at the upstream station M3, but at the intermediate station M2, probably due to the configuration of the estuary. Again, the modeled solutions for both estuaries feature a wide range of water levels, and for every station, there is at least one close-to-ideal solution. This means that the river flow mechanism that drives the hydrodynamic circulation during winter scenarios is better represented by the numerical models than the tide. Inaccuracies in the tidal water elevation could be related with uncertainties in the model grid, due to imprecisions in bathymetry and topography data, or inaccuracies in the definition of the models coefficients, affecting the representation of the tidal amplification and asymmetry inside the estuarine regions.
For the extreme event scenario (Figures 3C,F), the highest water levels were registered at the upstream sampling stations (M4 and D4), reaching 4 m at the Minho estuary and 8 m at the Douro estuary. As expected, the modeled solutions feature a wider range of water levels due to variations in river flow between the various runs. For the Minho estuary, there are closeto-ideal solutions for all stations. A different situation is found for in the Douro estuary, at stations D2 and D3, where the water level is always underestimated. This could be related with uncertainties in the model grid because D2 and D3 are located on the estuarine margins.
D3D hindcasts displayed higher errors, with a global average minimum error of 0.383 m, while the global average minimum error for OTM hindcasts was 0.299 m (cf. Table 4  respectively. Global average minimum errors for the Douro were 0.543 and 0.420 m ( Table 4). Although small, the differences between the OTM and D3D errors could be related with the numerical grid implemented for each modeling suite. While D3D uses a regular curvilinear gird, OTM uses an unstructured irregular computation grid of triangular elements.
This unstructured grid suits the shape of the estuaries better, providing better results. The highest minimum errors were calculated for the extreme event scenario at D3. Extreme events can be difficult to model due to the complexity of their dynamics, namely the strong erosive processes they give rise to, and the flooding of the riverbanks, which present a hydrodynamic behavior distinct from that of the main estuarine channel. We assumed that measured data is representing the reality. However, it cannot be ruled out that these large differences between measured and hindcasted water levels may be related to errors in the field campaigns, or to inaccuracies in the numerical models for that specific locations. Similarly to Figure 3 and Table 4 also reveals a strong dispersion in the modeling results. The average maximum absolute error for D3D was 2.352 m, but it reached a maximum value of 4.627 m for winter scenarios at D4 station. For OTM, the average Frontiers in Marine Science | www.frontiersin.org 13 February 2022 | Volume 9 | Article 812255 maximum absolute error was 2.057 m, reaching 3.388 m for the winter scenario at M2. Models results of the summer and winter scenarios were divided into two groups, one group for the 24 highest water level ocean boundary conditions and the other for the 24 lowest ones. The results for these two groups were compared with the highest and lowest field recorded water levels, and the absolute errors were computed and presented in Table 5. It is noticeable that OTM performs slightly better than D3D and that dispersion is now lower for both modeling suites, with most maximum absolute errors below 1.6 m and minimum dispersion values below 0.3 m.
The results of the techniques selected for the 48-member superensembles are displayed in Figure 4 and Table 6. The scatter plots showed that, despite the existing variability for all considered scenarios, estuaries and measuring stations, some superensembles results lie on the perfect hindcast line. The techniques that present the best hindcasts are WAM-AE and WAM-SE and the ones that present the worst hindcasts are WAM-Rand1 and WAM-Rand2. This revealed the importance of selecting a WAM method that considers the previous performance of the numerical models. The Trim, SMA and Med techniques showed an average performance, with Trim between SMA and Med, as expected. Table 6 presents the best technique for each scenario, estuary and measuring station. Notice that, for some ensembles, the absolute error is null, i.e., below 1 mm, and the hindcast can be considered perfect. WAM-SE stood out as the best technique for the high river flow scenarios (winter and extreme event), while WAM-AE performed the best for the summer scenarios. The results seem to be independent of the chosen measuring stations and the estuary under study. For all but the Douro extreme event simulations at D2 and D3, the simulations presented lower absolute errors than those obtained for the single numerical modeling runs (see Table 4). Measuring stations D2 and D3 displayed the highest absolute errors  . Table 6). Nevertheless, for some ensembles, the second-best technique displays an absolute error larger than those computed for the single models ( Table 4). These techniques are thus not recommended for ensembles construction. The effect of the size of the ensemble can be analyzed in Figures 5, 6 and Tables 7, 8. The scatter plot constructed with the results considering the 24-highest water elevation predictions revealed a good performance of the superensembles (cf. Figure 5), with the best techniques being those that consider weights depending on the performance of the models. However, not all superensembles produced results on the perfect prediction line, revealing that the size of the ensemble affects its predictive ability. This fact is confirmed by the AEs ( Table 7). The best technique was WAM-AE for Douro summer and WAM-SE for Minho and Douro winter. The best Minho summer ensemble technique depends on the location, which could be related with errors in the observations or numerical causalities (as, for example, WAM-Rand2). AEs were generally below 0.1 m. However, in 5 of the 15 scenarios/sampling points, ensembling did not improve the hindcast skills of the individual models, demonstrating that the ensemble size is crucial in producing the best hindcast. The second-best technique revealed even worse hindcasts, with AE values surpassing those obtained for the comparison between models results and measurements.
For the 24-member superensemble for the lower water elevation predictions, the scatter plot in Figure 6 shows a vaster dispersion of ensembles results compared to the previously analyzed superensembles (cf. Figures 4, 5). Most solutions are very distant from the perfect hindcast, which could be due to uncertainties in the modeling of low tide conditions. This suggests that the larger superensembles were able to absorb these uncertainties, providing better results, which is confirmed by the absolute error values (cf. Tables 4-6, 8). The 24-member superensembles outperform the single model hindcasts in 7 out of the 15 runs/measuring stations, with values higher than those obtained for the 48-and 32-member superensembles and for the 24-member superensembles for higher water elevation conditions. The technique that presented the best results was the WAM-SE (cf.

CONCLUSION
Numerical models are essential tools for a better understanding of estuaries. They can anticipate and predict simultaneously the effects of anthropogenic interventions, extreme events, and climate change, providing the basis for an efficient estuarine management. However, as modeling results present uncertainties, mainly related to inaccuracies or assumptions in the initial and forcing conditions, we need to increase the forecasting accuracy by developing and implementing new solutions that avoid or mitigate such errors.
The approach used in this work, based on the integration of the D3D and OTM modeling suites, proved to efficiently map the response of two Portuguese open water systems -the Douro and Minho estuaries. Three scenarios were considered: summer, winter and extreme event conditions. For each estuary and scenario, a superensemble was constructed including modified conditions of river inflows and ocean surface elevation. The single model's and the superensembles' results were compared with in situ measurements of water elevation to assess their accuracy. It was demonstrated that the superensembles were able to circumvent the models' inaccuracies and produce better results than the individual models, providing solutions that were closer to the observed values than those resulting from the individual models.
Several techniques of ensemble construction were implemented to find the one that yields the best results for each estuary, scenario, and measuring station. The results revealed that an adequate ensemble technique effectively improves the forecasting results of the individual models. Among the applied techniques (SMA, Med, Trim, WAM-Rand1, WAM-Rand2, WAM-AE, and WAM-SE), the best results were obtained with those that considered the hindcast capabilities of the single models (WAM-AE and WAM-SE). Namely, WAM-SE stood out as the best technique for scenarios with high river inflow (winter and extreme event scenarios), whereas WAM-AE was the best for summer scenarios. No relevant differences were found between the two estuaries or associated with the location of the field measurements, demonstrating that this technique can be widely applied to any estuarine system. However, the ensemble size affected the predictability, and the most accurate solutions were obtained with the superensemble constructed with more members.
It was demonstrated that, even for the scenarios presenting more difficulties in being hindcasted by a single model, such as extreme event scenarios, the application of the ensemble technique reduced the inaccuracy of the forecasts and improved their results. This is of upmost importance, especially for predicating future conditions under the effect of climate change, when stronger and more frequent extreme events are expected. Therefore, hydrodynamic modeling ensembles have the potential to provide accurate information for estuarine management activities and to reduce the risk and the vulnerability of populations, infrastructures, habitats and environment.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
II, JP, and PA-V: conceptualization, methodology, and formal analysis. II, WM, and AB: computation. II and WM: validation. JP, PA-V, JV, AB, LB, and FV-G: resources. II, WM, and JP: data curation. II, JP, WM, PA-V, JV, AB, AG, LB, and FV-G: writingreview and editing. FV-G: supervision. II, JP, and FV-G: project administration. II, JP, PA-V, JV, AB, LB, and FV-G: funding acquisition. All authors contributed to the article and approved the submitted version.

FUNDING
This research was partially supported by the Strategic Funding UIDB/04423/2020 and UIDP/04423/2020 through national funds provided by FCT-Foundation for Science and Technology and European Regional Development Fund (ERDF). This work has also been funded by the project EsCo-Ensembles (PTDC/ECI-EGC/30877/2017), co-financed by NORTE 2020, Portugal 2020 and the European Union through the ERDF, and by FCT through national funds.