Water table depth assimilation in integrated terrestrial system models at the larger catchment scale

As an important source of water for human beings, groundwater plays a significant role in human production and life. However, different sources of uncertainty may lead to unsatisfactory simulations of groundwater hydrodynamics with hydrological models. The goal of this study is to investigate the impact of assimilating groundwater data into the Terrestrial System Modeling Platform (TSMP) for improving hydrological modeling in a real-world case. Daily groundwater table depth (WTD) measurements from the year 2018 for the Rur catchment in Germany were assimilated by the Localized Ensemble Kalman Filter (LEnKF) into TSMP. The LEnKF is used with a localization radius so that the assimilated measurements only update model states in a limited radius around the measurements, in order to avoid unphysical updates related to spurious correlations. Due to the mismatch between groundwater measurements and the coarse model resolution (500 m), the measurements need careful screening before data assimilation (DA). Based on the spatial autocorrelation of the WTD deduced from the measurements, three different filter localization radii (2.5, 5, and 10 km) were evaluated for assimilation. The bias in the simulated water table and the root mean square error (RMSE) are reduced after DA, compared with runs without DA [i.e., open loop (OL) runs]. The best results at the assimilated locations are obtained for a localization radius of 10 km, with an 81% reduction of RMSE at the measurement locations, and slightly smaller RMSE reductions for the 5 and 2.5 km radius. The validation with independent WTD data showed the best results for a localization radius of 10 km, but groundwater table characterization could only be improved for sites <2.5 km from measurement locations. In case of a localization radius of 10 km the RMSE-reduction was 30% for those nearby sites. Simulated soil moisture was validated against soil moisture measured by cosmic-ray neutron sensors (CRNS), but no RMSE reduction was observed for DA-runs compared to OL-run. However, in both cases, the correlation between measured and simulated soil moisture content was high (between 0.70 and 0.89, except for the Wuestebach site).

As an important source of water for human beings, groundwater plays a significant role in human production and life. However, di erent sources of uncertainty may lead to unsatisfactory simulations of groundwater hydrodynamics with hydrological models. The goal of this study is to investigate the impact of assimilating groundwater data into the Terrestrial System Modeling Platform (TSMP) for improving hydrological modeling in a real-world case. Daily groundwater table depth (WTD) measurements from the year for the Rur catchment in Germany were assimilated by the Localized Ensemble Kalman Filter (LEnKF) into TSMP. The LEnKF is used with a localization radius so that the assimilated measurements only update model states in a limited radius around the measurements, in order to avoid unphysical updates related to spurious correlations. Due to the mismatch between groundwater measurements and the coarse model resolution ( m), the measurements need careful screening before data assimilation (DA). Based on the spatial autocorrelation of the WTD deduced from the measurements, three di erent filter localization radii ( . , , and km) were evaluated for assimilation. The bias in the simulated water table and the root mean square error (RMSE) are reduced after DA, compared with runs without DA [i.e., open loop (OL) runs]. The best results at the assimilated locations are obtained for a localization radius of km, with an % reduction of RMSE at the measurement locations, and slightly smaller RMSE reductions for the and . km radius. The validation with independent WTD data showed the best results for a localization radius of km, but groundwater table characterization could only be improved for sites < . km from measurement locations. In case of a localization radius of km the RMSE-reduction was % for those nearby sites. Simulated soil moisture was validated against soil moisture measured by cosmic-ray neutron sensors (CRNS), but no RMSE reduction was observed for DA-runs compared to OL-run. However, in both cases, the correlation between measured and simulated soil moisture content was high (between . and . , except for the Wuestebach site). KEYWORDS groundwater, data assimilation, integrated model, real data, soil moisture, cosmic-ray neutron sensors

. Introduction
As a widespread and highly used resource, groundwater provides globally 50% of the drinking water, with higher values for inhabitants of dry regions, and 2.5 billion people depend entirely on groundwater resources for their basic daily water needs (UNWWAP, 2015). Either groundwater level or water table depth is a significant variable related to groundwater and can vary between 0 m in wetland areas to depth of hundreds of meters from the land surface in arid regions. Shallow groundwater is crucial in terrestrial ecosystems as it strongly influences the soil water content in the root zone and thus exerts an important control on water and energy fluxes between the subsurface, land surface, and the atmosphere (Koster et al., 2004;Vereecken et al., 2016). To understand the influence of temporal and spatial variations of groundwater level on terrestrial ecosystems, models like the integrated Terrestrial System Modeling Platform (TSMP) (Shrestha et al., 2014) are used, which can model the groundwater-soil-vegetation-atmosphere system in a physical manner.
However, the accuracy of modeling is often affected by uncertain model forcings, parameters, and initial conditions (Freeze, 1975;Baatz D. et al., 2017). Especially for groundwater systems, the extreme spatial heterogeneity of hydraulic parameters is challenging (De Marsily, 1986). To quantify and reduce the uncertainties of model predictions, DA can be used to correct model predictions with observations and improve the estimation of hydrological variables (Reichle et al., 2002(Reichle et al., , 2008. One of the most commonly used DA algorithms is the Ensemble Kalman Filter (EnKF) (Evensen, 1994(Evensen, , 2003. EnKF uses a Monte Carlo approach to forecast model error statistics (Evensen, 1994). DA with EnKF is often affected by a poor quality of the estimated model error covariances and spurious correlations between grid cells which are separated far in space. Therefore, the Localized Ensemble Kalman Filter (LEnKF) approach (Houtekamer and Mitchell, 2001) is used in this study, which can improve the effectiveness of the EnKF analysis (Hu et al., 2012).
Past studies have proven the effectiveness of DA in improving real-time hydrological modeling and forecasting (e.g., Han et al., 2014;Zhang et al., 2018;Yu et al., 2020), and some studies investigated the assimilation of groundwater data with hydrological models. Camporese et al. (2009a) assimilated both pressure head and soil moisture data with the EnKF, and the results showed that assimilation of either pressure head or soil moisture can improve the characterization of subsurface states in the vicinity of the measurement locations. Camporese et al. (2009b) assimilated synthetic observations of pressure head and streamflow for a v-tilted catchment, and the results suggested that streamflow prediction can be improved by assimilation of pressure head and streamflow, either individually or simultaneously. Kurtz et al. (2014) assimilated jointly piezometric heads and groundwater temperatures with EnKF to update uncertain hydraulic subsurface parameters (i.e., hydraulic conductivities and leakage coefficients) for an area near the river Limmat in Switzerland, and found that the joint assimilation of the two kinds of data with updating of uncertain hydraulic parameters gives the best characterization. Zhang et al. (2016) assimilated soil moisture and groundwater head measurements with the MIKE SHE hydrological model for catchments of different complexities and using different assimilation settings (observation types, ensemble sizes, and localization schemes) and found that the ensemble transform Kalman filter (ETKF) method improved the model performance compared to the OL run. But the average difference between observations and model simulations was subtracted from the original data when comparing in situ head measurements with predictions. The proposed scheme by Zhang et al. (2016) with both distance localization and variable localization was shown to be more robust than only using one localization scheme and provided better results. However, these experiments on groundwater assimilation have only been conducted by hydrological models in synthetic experiments or over-simplified real-world cases. No studies demonstrated the potential of assimilating real groundwater observations into integrated terrestrial system models to improve groundwater estimates at the regional scale. This is therefore still an emerging research topic.
The integrated model TSMP, which is composed of an atmospheric, land surface, and subsurface model, was used in this work, in combination with the Parallel Data Assimilation Framework (PDAF) (Nerger et al., 2005;Kurtz et al., 2016). TSMP has been applied in a series of studies (e.g., Shrestha et al., 2015;Keune et al., 2016;Furusho-Percot et al., 2019). The combination of PDAF and TSMP has been used for the assimilation of different hydrological variables (e.g., soil moisture and groundwater) at different scales (e.g., hillslope, catchment, and continental scale) (Kurtz et al., 2016;Zhang et al., 2018;Gebler et al., 2019;Naz et al., 2019Naz et al., , 2020Hung et al., 2022). Zhang et al. (2018) demonstrated in synthetic experiments with only four grid cells that the joint assimilation of groundwater level and soil moisture data has great potential to improve root zone soil moisture characterization. Hung et al. (2022) assimilated groundwater levels at the large catchment scale in a synthetic study that mimicked the Neckar catchment, and the results showed that groundwater level assimilation can lead to a large scale improved characterization of groundwater levels, also between groundwater wells, but the impact of groundwater level assimilation on other compartments of the terrestrial system was limited, except for the deep vadose zone. However, there is still a lack of studies with integrated land surface-subsurface models to investigate assimilation of real groundwater measurements at the larger catchment scale. The objective of this study was to investigate whether assimilating groundwater data into the integrated terrestrial systems model TSMP at the larger catchment scale for a real-world case is able to achieve a better characterization of groundwater levels (and other terrestrial system states and fluxes) than an OL run and identify the main limitations and complications in practice. Furthermore, soil moisture measured by cosmic-ray neutron sensors was used to verify the model simulation accuracy and evaluate whether assimilating WTD data can improve soil moisture characterization. This is a novel contribution, as the assimilation of groundwater measurements in integrated land surface-subsurface models at . Materials and methods

. . Study area and data
The simulation domain is the Rur catchment (2,354 km 2 ) which is situated in western Germany and includes a small part of Belgium and the Netherlands. The Digital Elevation Model (DEM) for the area was acquired from SRTM 90 m Version 4 (Jarvis et al., 2008) and is shown in Figure 1. The altitude ranges from 15 to 690 m a.s.l., decreasing from south to north, and the Rur river flows from the Eifel hills in the south to the northern flat terrain. From the northern to the southern part of the catchment, long-term average annual precipitation ranges from 650 to 1,300 mm, mean annual air temperature decreases from 10 to 7 • C and mean annual potential evapotranspiration varies between 850 and 450 mm (Montzka et al., 2008;Bogena et al., 2018). The land use types were taken from the CRC/TR32 Database (Waldhoff and Lussem, 2015) and are mainly agriculture (corn, sugarbeet and wheat in the north), grassland, and coniferous and deciduous forest (southern mountainous areas).
The high-resolution (1:50,000) regional soil map BK50 (Nordrhein-Westfalen, 2009) (see Figure 2) and Pano (2006) were used to obtain the soil characteristics and to calculate the soil hydraulic properties. Bulk density was obtained from ESDB.
Based on the thickness of the BK50 soil layers, we treat the layers below the soil layers as aquifer layers. The upper aquifer hydraulic conductivity (see Figure 3) was obtained from the Information System Hydrogeological Map of North Rhine-Westphalia with a resolution of 1:100,000 (https://www. opengeodata.nrw.de/produkte/geologie/geologie/HK/ISHK100/). The permeability of the aquifer is based on different classes of rock types.
The high-resolution reanalysis dataset COSMO-REA6 developed with the numerical weather prediction (NWP) model COSMO (Baldauf et al., 2011) is used as atmospheric forcing in this work (Bollmeyer et al., 2015;Wahl et al., 2017). Currently, the reanalysis covers the period 1995-2019 at a high spatial resolution of 0.055 • (6 km) and is continuously extended by the German Weather Service (Deutscher Wetterdienst; DWD). The forcing data include precipitation, air temperature, air pressure, wind velocity, specific humidity, incoming shortwave radiation, and incoming longwave radiation.
The measured WTD data from the monitoring network Geoportal NRW (www.geoportal.nrw) were used for assimilation and some as independent verification data for the model simulations. For the year 2018, there were 865 sites located in shallow and deep aquifers of the Rur catchment that monitored the WTD (see Figure 1), and most measurement sites are distributed along the river. The observation frequency varies for each site, and can be daily, weekly, or monthly. In 2018, there were 575 sites with WTD between 0 and 20 m. We only used the sites with WTD between 0 and 20 m to be sure that only measurements from the upper aquifer were included for assimilation or verification, as our model only considered the upper 20 m. CRNS is a precise method to measure soil moisture at the field scale (Zreda et al., 2008;Baatz et al., 2014;Köhli et al., 2015). The Rur catchment CRNS network comprises 13 CRNS stations (see Table 1) (CRS1000, HydroInnova LLC, 2009) (Baatz R. et al., 2017;Bogena et al., 2022) and these observation sites are relatively evenly distributed over the study area (see Figure 1). CRNS measures the fast neutron intensity, and the measured number of neutron counts shows an inverse correlation with soil moisture content. Fast neutrons originate from the collisions of secondary cosmic particles from outer space with terrestrial atoms. Fast neutrons are moderated most effectively by hydrogen, since the mass of the neutron is similar to the mass of a nucleus of the hydrogen atom. Thus, the corresponding fast neutron intensity measured by CRNS strongly depends on the amount of hydrogen within the CRNS footprint, allowing for a continuous non-invasive soil moisture estimate at the field scale (Baatz R. et al., 2017). The horizontal footprint of this measurement matches the 500 m horizontal model resolution quite well. It can measure soil moisture until 83 cm depth under very dry conditions, and to 15 cm depth under very wet soil conditions (Köhli et al., 2015;Schrön et al., 2017).

. . Model description (TSMP)
The coupled terrestrial system model used consists of three compartments integrated under the framework TSMP, the 3D variably saturated groundwater flow model ParFlow for the subsurface (Kollet and Maxwell, 2006), the land surface model CLM version 3.5 (Community Land Model) from the National Center for Atmospheric Research (Oleson et al., 2007(Oleson et al., , 2008, and the numerical weather prediction model COSMO (Consortium for Small Scale Modeling) (Baldauf et al., 2011). These three models are two-way coupled by the Ocean Atmosphere Sea Ice Soil coupling Model Coupling Toolkit (OASIS-MCT, version 3) (Valcke, 2013). The integrated modeling platform TSMP can run with different combinations of the component models (Shrestha et al., 2014). In this study, CLM-ParFlow was used without COSMO.

. . . Land surface model CLM
The biophysical processes simulated by CLM3.5 include solar and longwave radiation interactions with vegetation canopy and soil, momentum and turbulent fluxes from canopy and soil, canopy hydrology (e.g., interception processes), soil hydrology, and stomatal physiology and photosynthesis (Oleson et al., 2007). The mass and energy balance solved by CLM include soil evaporation, evaporation from intercepted water, transpiration from plants, infiltration of water in the soil, sensible and ground heat fluxes, and freeze-thaw processes (Oleson et al., 2007(Oleson et al., , 2008.
The nested subgrid hierarchy is used to represent spatial land surface heterogeneity (Oleson et al., 2008). Each grid cell is divided into a variety of land units (glacier, lake, wetland, urban, and vegetated), where each land unit can have a different number of snow/soil columns, and each column can have multiple plant functional types (PFTs) (Bonan et al., 2002;Oleson et al., 2008). In CLM, the soil column and snow column are discretized into 10 and 5 vertical layers, respectively (Oleson et al., 2007(Oleson et al., , 2008. Each PFT is characterized by distinct plant physiological parameters, which could capture the biogeophysical and biogeochemical differences between the different vegetation types (Oleson et al., 2007(Oleson et al., , 2008.

. . . Subsurface hydrological model ParFlow
In TSMP, the soil hydrology of CLM is substituted by the soil hydrology of ParFlow (Kollet and Maxwell, 2008) and also surface runoff and groundwater flow are calculated by ParFlow. ParFlow is a three-dimensional variably saturated groundwater flow model improved with a two-dimensional overland flow simulator (Ashby and Falgout, 1996;Kollet and Maxwell, 2006). It combines the kinematic wave equation (Lighthill and Whitham, 1955) and the 3D Richards' equation (Richards, 1931) to describe the dynamic coupling of surface-subsurface flow under overland flow boundary conditions (Kollet and Maxwell, 2006). In ParFlow, the threedimensional Richards' equation can be written as follows (Maxwell, 2013): and, Frontiers in Water frontiersin.org . /frwa. .
; q r is a general source/sink term that represents transpiration, wells, and other fluxes [LT −1 ]; x is the length along the x-axis specified as horizontal direction [L]; z is the elevation along the z-axis specified as upward to be positive [L]; v is the subsurface flow velocity [LT −1 ]; K s (x) is the saturated hydraulic conductivity tensor [LT −1 ]; k r is the relative permeability.
ParFlow requires input soil hydraulic parameters like saturated hydraulic conductivity, porosity, and Mualem-van Genuchten parameters (van Genuchten, 1980;Kollet and Maxwell, 2008). For our study, saturated hydraulic conductivities of soil layers were calculated from sand, silt and clay contents and bulk density using the software rosettav3 H3, which is based on an artificial neural network analysis coupled with a bootstrap resampling method (Schaap et al., 2001;Zhang and Schaap, 2017). To keep hydraulic consistency between CLM and ParFlow, porosity (θ s ) for both models is calculated on the basis of the sand fraction via the following pedotransfer function in CLM (Oleson et al., 2007): The van Genuchten formulation (van Genuchten, 1980) is employed to evaluate the pressure head from soil moisture data: where Ψ is subsurface pressure head [L]; θ s is porosity; θ r is residual soil moisture content; α is a measure of the first moment of the pore size density function [L −1 ]; n is an inverse measure of the second moment of the pore size density function; and m = 1-1/n.
The Newton Krylov solution technique is applied in ParFlow and acts as a non-linear solver (Jones and Woodward, 2001). The coupled partial differential equations for subsurface flow and surface water flow are solved by the Newton-Krylov method with multigrid preconditioning, which is good at handling subsurface flow problems at large-scales in highly heterogeneous media and under variably saturated conditions Maxwell, 2006, 2008;Maxwell, 2013). A prominent advantage of ParFlow is that it was designed for parallel computer systems, so that it can efficiently compute large-scale problems at high resolution, which has been demonstrated in many studies (Jones and Woodward, 2001;Maxwell, 2006, 2008).

. . . Coupling interface OASIS-MCT
The external coupler OASIS-MCT (Valcke, 2013) is used to couple CLM and ParFlow, and control the exchange of fluxes between the different component models (Shrestha et al., 2014). When the fluxes correspond to different spatial and temporal scales, OASIS-MCT uses time integration/averaging and spatial interpolation operators to keep the scales consistent . In TSMP, ParFlow provides CLM with the upper 10 subsurface layers pressure and saturation, and in turn, CLM provides ParFlow with the upper boundary condition, which is net infiltration or exfiltration. The net infiltration includes precipitation, interception, total evaporation, and total transpiration (Zhang et al., 2018).

. . LEnKF methodology
DA consists of a forecast and an analysis step. For the forecast step, the state estimation is only based on past data (McLaughlin, 2002). For the analysis step, the information from current measurements and from a prior short-term forecast (which .
is based on past data) is used to produce a current state estimate. Then, the estimate will be used to initialize the next short-term forecast, which is subsequently used in the next analysis, and so on (Hunt et al., 2007). The EnKF sequentially performs a model forecast and a filter analysis. The efficiency of the filter relies on the accurate determination of the forecast error covariance from the ensemble, and the main sources of forecast errors are initial conditions, forcing data, model parameters, and model equations (Turner et al., 2008). Perturbation approaches can take these error sources for the ensemble generation into account. For each ensemble member j at time step i, the state vector x j,i in the forecast step is updated by observations and model predictions and is given by: where j is the ensemble member, x j,i is the model forecast state vector at time step i (pressure head in our study), M is the model TSMP, x j,i−1 is the earlier model analysis with state vector at time step i-1, q j,i is the vector with (perturbed) model forcings (perturbed forcings are precipitation, incoming shortwave radiation, incoming longwave radiation, and air temperature in this study) and p j,i denotes the model perturbation vector with parameters (porosity and saturated hydraulic conductivity in this study). In summary, in this work, the ensemble of model realizations is generated by different initial conditions, forcings, and parameters. Model forecasts are updated according to: Where, y j,i is the vector with (perturbed) observations, and the superscripts a and f refer to the updated state vector (the analysis) and the model predicted state vector, respectively. The observation operator H i is used to map model forecasts into the observation space, which is here assumed to be linear, and K i denotes the Kalman gain that is calculated as: where R i is the measurement error covariance matrix, and P i is the model covariance matrix, which is calculated from the forecasted ensemble of model simulations at time step i according to: where x f is a vector with ensemble mean values for the model states at time step i. N is the number of ensemble members.
The estimation of the covariances with a limited ensemble is affected by strong sampling fluctuations, and the estimated covariances might be affected by spurious correlations Mitchell, 1998, 2001). Houtekamer and Mitchell (1998) suggested a localization approach to remove spurious correlations to avoid filter divergence, limiting the updates to the surroundings of observations. Based on the localization of the error covariances proposed by Houtekamer and Mitchell (2001), in the evaluation of the Kalman gain in Equation 7, P i is replaced by ρ • P i , ρ • P i represents the Schur product of the correlation matrix ρ and covariance matrix P i , where ρ is a correlation matrix containing correlations between the grid cells (which are set to zero for grid cell combinations that are separated beyond a certain threshold). And the ρ and P i should have the same dimensions, so the LEnKF analysis scheme can be expressed as: Here ρ is determined by using a fifth-order piecewise function, as given by Gaspari and Cohn (1999). The correlation ω between a grid point and an observation, i.e., an element in ρ, can be approximated as Hu et al. (2012): Where, l is the defined localization radius and e is the Euclidean distance between an analyzed grid point and an observation location. The correlations ω are distance-dependent and vary between 1 at observation locations and 0 at distances greater than twice the influence radius l. Only the observations located within the localization radius from an analyzed grid point can contribute to the analysis for this grid point (Hu et al., 2012). The cutoff radius can filter out small and noisy correlations associated with remote observations (Houtekamer and Mitchell, 2001). A larger radius may contain more spurious correlation, resulting in less effective assimilation. In contrast, a radius that is too small limits the influence of observations too much to update neighboring grid cells. Therefore, determining an appropriate assimilation localization radius is crucial.

. . Assimilation methodology
To be able to assimilate WTD measurements into TSMP, WTD data need to be transferred into pressure accordingly (see Figure 4). At locations with WTD measurements, the pressure head in the saturated zone is calculated from the measured WTD assuming a hydrostatic pressure distribution, according to Zhang et al. (2018): Where Ψ i is the pressure head at the i th soil layer [L], D i is the depth from land surface to the i th soil layer [L], and WTD obs is the observed WTD [L].
In our study, in order to ensure stability and avoid the occurrence of anomalous pressure values in the unsaturated zone  related to updating pressure in the DA step, a weakly coupled approach was followed, which implies that only pressure in saturated layers is updated during assimilation. Hung et al. (2022) found that the weakly coupled approach outperformed the fully coupled approach for assimilating WTD measurements in TSMP. In the OL run, the vertical division between the unsaturated and saturated zones will differ among ensemble members. But as stated in Zhang et al. (2018), every grid cell should be updated consistently in DA, so the definition of the state vector should be the same for all ensemble members. The saturated and unsaturated zones are defined by the deepest WTD among the ensemble members, following Zhang et al. (2018). In the analysis step, only the pressure head values for the defined saturated zone will be directly updated via LEnKF.

. Experimental setup . . Ensemble generation and simulations
The simulation domain was discretized with a horizontal spatial resolution of 500 m. The study domain has a vertical extension of 20 m, which is discretized into 20 soil and aquifer layers with variable thicknesses. The thicknesses of the 10 uppermost layers increase exponentially with depth and extend to a total of 3 m. The deeper ten subsurface layers have thicknesses of 1 m (three layers) or 2 m (seven layers).
It is expected that the assimilation performance improves with increasing ensemble size (number of realizations), as found, for example, in studies with groundwater flow models (Chen and Zhang, 2006). An increasing ensemble size also implies higher computational costs. Hendricks Franssen and Kinzelbach (2008) indicated that 100 realizations should be sufficient for real-time groundwater flow modeling problems with state updating only. For combining state and parameter estimation, the ensemble size needs to be larger. As a compromise between accuracy and available compute time and data storage, we established an ensemble with 128 members for WTD assimilation in this work.
Meteorological forcings, hydraulic conductivities, and porosity were perturbed to generate the ensemble. Four atmospheric variables were perturbed: precipitation, incoming shortwave radiation, incoming longwave radiation, and air temperature. The meteorological forcings were perturbed without spatial correlation, while temporal correlations were induced by a firstorder autoregressive model (Reichle et al., 2010;Han et al., 2015). Since the four meteorological variables are correlated, random values were drawn from a multivariate normal distribution. The statistics of the perturbed atmospheric variables are summarized in Table 2. The temporal correlations and standard deviations of the perturbations were chosen based on previous catchment-scale and regional-scale DA experiments (Reichle et al., 2010;Han et al., 2013Han et al., , 2015Baatz R. et al., 2017).
Precipitation and shortwave radiation were multiplied by lognormally distributed noise (Han et al., 2013). A direct back transformation would induce a bias (resulting typically in a larger mean precipitation and larger mean incoming shortwave radiation), and therefore a correction is applied (Yamamoto, 2007): Where, Z * j,i is the bias corrected perturbed variable of ensemble member j at day i, Z i is the original variable at day i, K is the corrective factor, and x j,i is the random perturbation of ensemble member j at day i from the multivariate normal distribution. N is the number of ensemble members (128 in this study).
Not only atmospheric forcing, but also hydraulic conductivity was perturbed in this study as the uncertainty of this parameter is in general large with an important effect on the groundwater flow prediction. We use different K s data for the upper soil layers and lower aquifer layers. Hence, the K s values of the soil and aquifer layers were perturbed separately ( Table 3). The K s values for soil layers were perturbed by perturbing the sand and clay contents first, and then applying the Rosetta pedotransfer functions (Schaap et al., 2001;Zhang and Schaap, 2017) to obtain the perturbed K s . Sand and clay content were perturbed by calculating a field of spatially correlated perturbation values with geostatistical simulation and mean zero. A spherical variogram model was used, with nugget 0, sill of 50% 2 , and range 25 km. In order to avoid unphysical values for the soil textures, the sum of the sand and clay content were constrained between 0 and 100%. The K s of the bottom aquifer layers were perturbed for each hydrogeological unit by taking a value from a univariate uniform distribution with values between −0.5 and −0.5 and adding this to the mean K s of the unit [log 10 (m/s)]. The porosity for the upper soil layers was determined according to Equation 3 and on the basis of the perturbed sand contents, while for the bottom aquifer layers, the constant value of 0.15 was used, without perturbation.
It is known from previous studies that the spin-up for the model TSMP significantly influences the simulated WTD. The 100 year spin-up for 128 ensemble members departed from a WTD of 0 m, and was forced by 30-year average recharge values (derived from gridded data of precipitation and actual

. . Selection criteria for assimilated sites
There is a spatial mismatch between the point-scale groundwater measurements and the TSMP grid cell size of 500 m. In order to compare the measured WTD data with the model simulated values, each groundwater observation site was assigned to the nearest grid cell center. It is therefore common to have several measurement sites located in the same grid cell. We kept the groundwater measurement site which had the median value of all measurement sites in the grid cell for the year 2018, while the rest of the measurement sites were excluded from assimilation.
In addition, due to the relatively coarse model resolution, some measurement sites were located in model river grid cells. If all soil layers for each ensemble member in a grid cell were saturated for the complete year, the grid cell was considered to be a river grid cell. The river grid cells were eliminated from the analysis as groundwater measurements are not informative for the pressure values in river grid cells. Grid cells directly next to rivers were also excluded from the DA procedure, as these grid cells were also saturated most of the time and sometimes became part of the river. In this study, within the localization radius, we assimilated only observations from one site, with measurement values that are median values considering all sites in the localization radius.
In our study, the impact of the localization radius on the assimilation results was investigated, and three different localization radii were considered for assimilating groundwater measurements: 10, 5, and 2.5 km. According to the assimilation site selection criteria, three different localization radii resulted in 10 groundwater sites being selected for each of the DA experiments. Also, we used the groundwater data from unassimilated locations to investigate whether the localized assimilation could also improve the groundwater estimation at locations without assimilating data.

. . Evaluation of model performance
The root mean square error (RMSE) and bias (BIAS) were calculated to evaluate the performance of the WTD assimilation. The RMSE of WTD at each time step is calculated as: where N is the total number of observation sites, WTD sim n,i is the ensemble average groundwater table depth of the grid cell where the observation site is located at the time step i (either from an OL run or a DA run), and WTD obs n,i is the observed WTD at the n th site and time step i.
The bias is also specified to quantify systematic differences between simulated and measured WTD: In addition, simulation results were also compared with measured soil moisture content by CRNS. We follow the approach by Schrön et al. (2017), where weighted soil moisture content from the simulations was compared with CRNS measurements. The indicators, including BIAS, correlation coefficient (R), RMSE and unbiased root mean square difference (ubRMSD), are used to evaluate simulated soil moisture compared with the CRNS . /frwa. . measurements. For each CRNS site, the above indicators were calculated individually and aggregated over time.
where T is the total number of time steps, θ sim i is the simulated (either from an OL run or a DA run) ensemble average soil moisture content at the time step i, and θ obs i is the observed soil moisture by CRNS at the time step i. The overbars in equations 16 and 18 indicate the temporal mean over the entire time period.
. Results and discussion . . Water table depth . . . Spatial autocorrelation analysis For localized assimilation, the selection of the appropriate radius of localization is important. The localization radius should not be too small in order not to neglect positive correlations, and it should not be too long so that areas with spurious correlations are excluded. Therefore, we calculated the spatial autocorrelation of groundwater level measurements and simulated groundwater tables in the OL run (see Figure 5) to identify the appropriate radius. The spatial autocorrelations for different distance classes (0-0.5, 0.5-2.5, 2.5-5, 5-10, 10-20, 20-30, 30-40, 40-50, 50-60, and 60-70 km) were determined. The spatial correlation functions for the measured and WTD are quite close, implying that the model represents quite well the spatial correlation of the real groundwater levels in the Rur catchment. The largest differences are found for shorter distances, where the model autocorrelation is higher than the measured autocorrelation.

. . . Di erent localization radius assimilation strategies
Based on the spatial autocorrelation analysis, the localization radius could be up to 10 km. We tested 10, 5, and 2.5 km as localization radii, including 10 groundwater measurement sites for the assimilation.
For all scenarios, the RMSEs of WTD after 1 year of assimilation were lower than those of OL at the measurement locations (see Figure 6). The histogram of WTD errors at measurement locations also illustrates the improved WTD characterization after DA, compared with the OL. It can also be observed that the OL results on average in WTD values were larger than the measured ones, implying that the simulated WTDs were deeper than the measured ones. DA resulted in a reduction of the bias, and the peak of the histogram is closer to zero than OL. Thus, in all cases, LEnKF strongly reduced the bias and RMSE of WTD, compared to the scenario without assimilation of groundwater data, and the simulation improvement is best when using 10 or 5 km localization radius, and slightly worse for 2.5 km radius.
Since the groundwater assimilation results for the three radii were similar at the assimilation sites, and the best results were obtained for the 10 km radius, only the simulated WTD from the 10 km localization radius DA run is shown in Figure 7, and compared with the WTD from the OL run and the measurements. The changes in groundwater levels during assimilation show that once assimilation starts, the WTD gets closer to the measurements.
The impact of the DA is similar for the different localization radii, with the same regions affected by increases or decreases in WTD (see in Figure 8). The main difference is that for a larger localization radius, the area updated by assimilation is .
/frwa. . larger, with a stronger reduction of ensemble standard deviations. However, for some areas, the ensemble standard deviation was larger for the DA run than for the OL run. This occurred when the measurements deviated strongly from the ensemble of OL runs. With DA, some ensemble members became closer to the observations, but others were not, resulting in an increase in the ensemble dispersion.

. . . Data assimilation verification
To explore the impact of the assimilation of WTD measurements, we also evaluated the WTD characterization at verification locations (555 sites in total) which were not included in the assimilation, for the three different localization radii. We only show results for verification locations within the localization radius and only if enough measurement data were available for assimilation at a given time step (see Figure 9). Table 4 shows the RMSE for the OL and DA simulations, averaged for the period of 1year. At verification locations, the RMSE of the WTD also decreased, especially closer to the assimilation location, with verification locations separated <2.5 km from assimilation locations. DA could improve the groundwater simulation around measurement locations, which is consistent with the results by Hung et al. (2022).

. . Soil moisture
The impact of WTD assimilation was also evaluated with in-situ soil moisture measurements from CRNS networks. Simulated soil moisture in OL and DA runs (for 10 and 5 km localization radius) was compared with CRNS measurements. The OL results indicate that simulated soil moisture contents have similar temporal variations as measured soil moisture contents (see Figure 10). Assimilation of WTD measurements did not result in an obvious improvement for soil moisture estimation (see Table 5). Hung et al. (2022) also found that assimilating groundwater table data only slightly improved soil moisture characterization with RMSE reductions between 1 and 6%, and the improvements were limited to a relatively small area around observation locations. This is related to the fact that soil moisture is only indirectly updated by the propagation of the pressure below the groundwater table. Therefore, when the groundwater table is deep, the impact of WTD assimilation on the upper soil moisture is small.

. . Discussion
In all DA experiments, the estimation of the WTD improved, and also close to observation sites an improved groundwater characterization was found. This shows that for real-world cases, the localized EnKF could merge the integrated model TSMP with data to more accurately simulate the groundwater table.
There are some caveats regarding the use of in-situ groundwater observations to do assimilation and validate model estimates. Since the spatial representativeness of model and measurements are different, it is non-trivial to assimilate the in-situ groundwater measurements into the integrated model and to evaluate the coarse resolution model results against in-situ measurements. In our study, the model has a grid resolution of 500 m, while the groundwater measurements are obtained from points. Many observation sites were located in the same grid cell and were not included in the assimilation in this work. In future work, these measurements should be assimilated by modifying the measurement operator. However, this will not resolve all issues regarding scale mismatches. As the coarser model resolution flattens the topography, and therefore the gradients for surface and subsurface water flow, a systematic bias in the simulated groundwater table can be expected and is also observed in this study. In theory, for data assimilation, we should not have systematic differences between simulated and measured values, and prior bias correction would be a strategy to consider. In practice, we normally have to deal with systematic biases in data assimilation and if the ensemble spread is large enough, the model states can still be corrected toward the measurements. Removing the systematic bias in simulated groundwater levels with TSMP is not trivial as it depends on the model resolution. An extensive effort is needed to remove the systematic bias, which is a research question in itself and beyond the scope of this study. We argue that in the future better results can be obtained if a higher model resolution of 100 m instead of 500 m is used so that groundwater bodies related to narrow valleys can be better represented.
In addition, the model TSMP in this study only considers a vertical depth up to 20 m, and only one upper unconfined aquifer is better modeled. However, the real situation is much more complicated, as typically multiple unconfined and confined aquifer layers exist. As our model only models the 20 m subsurface, measurements relating to deeper aquifers were also excluded. In future work, an extension of the vertical depth could provide more realistic simulations, but for this, it would be important to have more detailed 3D geological information.
The spatial autocorrelation analysis indicates that groundwater levels were correlated for separation distances up to 10 km. However, groundwater level characterization was only improved in a smaller area for locations separated by <2.5 km from measurement sites. Hung et al. (2022) used a dense observation .
/frwa. .   illustrate that for a real-world application, the improvement is more limited, which will be related to model structural errors like inadequate grid resolution and missing information on pumping activities. Theoretically, only grid cells within the localization radius can be updated in the analysis step (Houtekamer and Mitchell, 2001). However, as the assimilation proceeds over time, updates around measurement locations can laterally propagate through the working of the physical equations, and this effect could be particularly strong in the saturated zone given the importance of lateral flow in the saturated zone. In the assimilation experiments with 10 and 5 km localization radius, there were no obvious improvements in the characterizations of soil moisture content by TSMP. Though the groundwater bias was corrected after assimilation, soil moisture does not change significantly with the change in deep groundwater tables. Also, Hung et al. (2022) discovered topography variations and lateral groundwater flow greatly influence groundwater levels, making soil moisture data probably less informative for groundwater levels, which also supports the findings of this study. Hung et al. (2022) found a slight improvement for soil moisture characterization related to groundwater level assimilation, which was not found in this study. The worse performance in this real-world study might be related to model structural errors, as Hung et al. (2022) simulated a catchment of similar complexity, but in a synthetic version that mimicked that catchment. A further reason might be the limited number of groundwater assimilation and soil moisture validation sites used in this study. We assimilated only groundwater level data, but soil moisture was not measured at the same locations, and soil moisture verification locations were separated from the groundwater monitoring sites. Improved results can be expected for more ensemble members and/or a higher spatial resolution, which was not feasible in this work, as only one single DA experiment with 128 ensemble members required 73,728 core hours (the spin up not included) and 1.75 TB of computer storage for 1 year of simulation at a daily timescale.
Furthermore, although updating saturated hydraulic conductivities in Hung et al. (2022) only marginally improved the simulation of subsurface states, compared with only state updating, we will explore in future work the role of this important parameter in groundwater modeling. This should be evaluated for simulations at higher spatial resolution and larger ensemble sizes.

. Conclusions
The localized Ensemble Kalman Filter was used to assimilate groundwater level measurements into the integrated terrestrial system model TSMP for the ∼2,000 km 2 Rur catchment. This is the first application of the assimilation of observed WTD data into the integrated land surface-subsurface model TSMP for a real-world case. Earlier work focused on a synthetic case, mimicking the Neckar catchment in southwest Germany. For the Rur catchment, 128 ensemble members were generated by perturbing four atmospheric forcing variables, saturated hydraulic conductivities and porosity. The perturbed ensemble was used as input in the TSMP-PDAF data assimilation framework and assimilation experiments were done for different localization radii (10, 5, and 2.5 km). The performance of WTD assimilation was assessed by comparing results from OL and DA experiments, and using groundwater observations and soil moisture measurements from cosmic ray neutron sensors as verification data. The main findings are: 1. The WTD simulated by the integrated model TSMP could be improved by localized EnKF, with more than 75% RMSE reduction at the assimilated locations for 3 different localization radii. The positive impact of assimilation is limited to the vicinity of the assimilated locations. The localized WTD assimilation is greatly affected by the unevenly distributed groundwater observations.
2. Simulated soil moisture generally reproduced the observed temporal fluctuations of soil water content, but soil moisture characterization was not improved after WTD assimilation. This can be related to the fact that only the saturated zone was directly updated via assimilation (and the unsaturated zone only indirectly), and the presence of model structural errors like a relatively coarse grid resolution of 500 m and missing information on groundwater pumping activities, for example.
3. Systematic differences between simulated and measured WTD might be related to the too coarse model resolution and model structural errors. Future work should focus on DA with integrated land surface-subsurface models at a higher spatial resolution and with more ensemble members, which would allow parameter estimation. In addition, the measurement operator needs to be considered for multiple groundwater level observations in a grid cell.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions
FL and H-JHF designed the experiments. FL prepared the data for the experiments with the help from WK, conducted all the experiments and analyzed the results with feedback from CPH, HV, and H-JHF, and prepared the manuscript with contributions from all co-authors. All authors contributed to the article and approved the submitted version.

Funding
This research was funded by China Scholarship Council (CSC), grant number. 201904910448.