
ORIGINAL RESEARCH article

Front. Remote Sens., 23 January 2026

Sec. Microwave Remote Sensing

Volume 6 - 2025 | https://doi.org/10.3389/frsen.2025.1718353

ASCAT soil moisture retrieval using deep learning: a focus on localization strategy

  • Estellus, Paris, France

This study compares machine learning approaches for retrieving daily soil moisture (SM) from ASCAT (Advanced SCATterometer) observations. Exploiting spatial structure through convolutional neural networks (CNNs) significantly enhances retrieval performance compared to a standard multilayer perceptron (MLP): the spatial correlation with the target ERA5 SM increases from 0.55 to 0.91, and the temporal correlation from 0.61 to 0.73. Incorporating “localization” (i.e., a strategy to adjust the neural network (NN) behavior to local conditions) into the model is key to improving retrieval quality, yielding more accurate SM estimates, reduced regional biases, improved temporal dynamics, and more realistic representations of extreme SM events. Our NN-based retrievals show strong agreement with in situ SM measurements in the contiguous United States (CONUS) during 2019, with temporal correlations of 0.60 and 0.68 for the MLP and CNN models, respectively. These findings underscore the critical role of spatial learning and localization in SM retrieval from remote sensing data such as ASCAT.

1 Introduction

Soil moisture (SM) is a vital component of the Earth system, regulating the exchange of water and energy between the land surface and the atmosphere. It serves as a critical factor in numerous applications, including agricultural monitoring, drought and flood forecasting, water resource management, and climate prediction (Seneviratne et al., 2010; Ochsner et al., 2013; Vereecken et al., 2015). Recognizing its importance, extensive efforts have been made to obtain SM estimates across different temporal and spatial scales. These methods include ground-based in situ measurements (Dorigo et al., 2013), satellite-based retrievals (Liu et al., 2012; Aires et al., 2021b), and land surface model simulations (Guo et al., 2006). While each method has its strengths, they also face limitations related to coverage, representativeness, and accuracy.

In the last two decades, data-driven methods, particularly neural networks (NNs), have shown strong potential for SM retrieval (Aires et al., 2001; Aires et al., 2005; Rodríguez-Fernández et al., 2015; Aires et al., 2021b). Multilayer perceptrons (MLPs) provide a robust and interpretable pixel-scale baseline, aligning with current operational retrieval frameworks (Rodríguez-Fernández et al., 2019; Aires et al., 2021b). More advanced convolutional neural networks (CNNs) extend these models by exploiting the satellite data at the image scale. Unlike physically based inversion methods, these approaches are purely data-driven: they learn statistical relationships between remote sensing observations and soil moisture states, enabling flexible, scalable, and potentially more accurate retrievals (Zhang et al., 2025). However, their performance is highly dependent on the choice of input data. The inclusion of relevant predictor variables, whether physical (e.g., brightness temperature, vegetation indices, land surface temperature) or geographical (e.g., land cover, soil texture, topography), directly influences retrieval accuracy and generalization.

Previous retrieval studies have improved physical realism by integrating auxiliary variables to better constrain the radiative transfer models. For instance, Aires et al. (2001) included surface emissivities along with microwave satellite observations to improve surface temperature retrieval, effectively accounting for local variations such as humidity and vegetation. Due to the lack of reliable, real-time emissivity products at that time, first-guess emissivities from climatologies were used to provide a seasonal and location-specific constraint. In another example, as SM influences various satellite observations (Prigent et al., 2005), early efforts such as Aires et al. (2005) developed global NN retrieval models that leveraged synergies across visible/infrared, active microwave, and passive microwave sensors. This global approach was pursued in later studies, such as Kolassa et al. (2016), and has evolved to include a broader range of inputs. More recently, Greifeneder et al. (2021) integrated data from a diverse collection of datasets (including synthetic aperture radar, Landsat, MODIS, land cover, and other sources), with a total of 62 input features for a global NN-based SM retrieval. NN-based SM retrievals have thus trended toward increasingly large and diverse input sets. While this may help capture complex environmental interactions, it can sometimes obscure the role of localized factors. To address this, some studies have begun to explicitly introduce geolocation information to improve localization in NN models (Mahara and Rishe, 2023; Singh and Gaurav, 2023; Han et al., 2023). Incorporating both physical (i.e., dynamic) and geographical (i.e., static) inputs has thus become a common strategy.

In this study, we evaluate the performance of NN-based models for SM retrieval from Advanced SCATterometer (ASCAT) observations, with a particular focus on the role of “localization,” i.e., the incorporation of region-specific information that allows NN models to adapt more effectively to local conditions (Boucher et al., 2023; Pellet et al., 2025). We compare MLPs and CNNs with varying levels of localization, ranging from simple models based only on physical or geographical inputs to highly localized approaches such as pixel-wise MLPs and locally connected CNNs. Other modern architectures, such as convolutional long short-term memory (ConvLSTM) models, which account for both spatial and temporal dependencies, have demonstrated strong performance in forecasting SM (Wang et al., 2024; Rabiei et al., 2025b; Rabiei et al., 2025a). However, as this study focuses on retrieval rather than forecasting, such models were excluded from our analysis. Temporal modeling could nonetheless benefit retrieval, so future work will investigate temporal architectures better suited to retrieval tasks than ConvLSTM, such as transformers or embedding-based methods. Here, we assess the ability of MLPs and CNNs to retrieve daily SM and to capture extreme values (minimum and maximum), using 2019 as an independent test year. Model performance is evaluated against in situ measurements from the International Soil Moisture Network (ISMN; Dorigo et al., 2013; Dorigo et al., 2021) and compared with the ASCAT-derived SM product. The analysis focuses on the contiguous United States (CONUS; 125°W–70°W, 25°N–50°N), a region characterized by dense in situ measurements and pronounced spatial variability (Bernhardt et al., 2018). While CONUS provides an ideal testbed for assessing model skill, the methodology is generalizable to other well-instrumented regions, such as Europe.

In the following, Section 2 describes the datasets employed in this study and presents a preliminary information content analysis relating the target SM to the other candidate predictors. Section 3 details the NN-based architectures, localization strategies, and evaluation metrics. Section 4 assesses model performance at the daily scale and investigates the effects of localization. Section 5 focuses on extreme SM retrievals, comparing localized and non-localized NN-based models. Section 6 reports the evaluation against in situ measurements. Finally, Section 7 summarizes the findings, highlights practical implications, and discusses the potential of deep learning and localization approaches for advancing remote sensing–based SM retrievals.

2 Datasets

2.1 ERA5 database

We employed the ERA5 reanalysis dataset (Hersbach et al., 2023), the fifth-generation reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF), which provides consistent land–atmosphere variables at hourly temporal resolution and 0.25° spatial resolution, based on the ECMWF coupled assimilation system. We chose ERA5 SM as the training target because it is currently one of the most accurate and spatially coherent large-scale SM datasets available (Soci et al., 2024). Using such a reanalysis SM as the target is a well-established practice in previous developments of machine-learning SM retrievals (Aires et al., 2005; Rodríguez-Fernández et al., 2019; Aires et al., 2021b; Pellet et al., 2025).

The following ERA5 variables relevant to SM dynamics were considered in this study:

Soil moisture (SM) – Volumetric SM (m³/m³) is available at multiple depths. We selected the 0–7 cm layer to match the penetration depth of ASCAT observations.

Soil temperature (ST) – ST is informative for SM retrieval because soil thermal properties depend on water content (Campbell, 1985), which indirectly links ST and SM. We used the topmost layer (0–7 cm), which is most relevant to ASCAT observations.

Leaf area index (LAI) – LAI was included as a vegetation descriptor due to its impact on the backscatter coefficient in microwave remote sensing (Prigent et al., 2005; Petchiappan et al., 2022).

Antecedent precipitation evaporation index (APEI) – APEI is a well-established predictor of SM, representing the balance between precipitation and evaporation integrated over a decaying temporal window (Shaw et al., 1997; Han et al., 2023). It is defined as:

$$\mathrm{APEI}_t = \sum_{i=0}^{N} k^{i}\left(p_{t-i} - e_{t-i}\right),$$

where $k$ is an empirical factor describing the decay of the rainfall effect over time, $p_{t-i}$ and $e_{t-i}$ are the precipitation and evaporation on the $i$th day before the reference day $t$, and $N$ is the maximum number of days considered prior to day $t$. Following Han et al. (2023), we adopted $k = 0.91$ and $N = 33$, as this configuration showed a strong correlation with ERA5 SM in the preliminary analysis (see Section 2.5). Although Han et al. (2023) presented these values as globally optimized parameters, they were validated against a long-term in situ SM network across the United States, the United States Climate Reference Network (USCRN). Additional tests with alternative parameter combinations (not shown) did not yield any significant improvement. We therefore use the same values of $k$ and $N$ as Han et al. (2023), which are appropriate for our study region.

Amplitude of the diurnal surface temperature cycle (dT) – The diurnal temperature amplitude is closely linked to soil thermal inertia and, consequently, to SM content (Prigent et al., 2005). Therefore, alongside the aforementioned variables, dT serves as an auxiliary predictor in our SM retrieval framework. We computed dT as the difference between the daily maximum and minimum hourly ERA5 surface temperatures.
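The APEI definition given above can be sketched in a few lines of Python to make the indexing explicit. This is a minimal illustration, not the study's code: the function name `apei` is hypothetical, inputs are assumed to be daily totals in the same units, and evaporation is assumed to be expressed as a positive loss.

```python
def apei(precip, evap, t, k=0.91, n=33):
    """Antecedent precipitation evaporation index for day index t.

    Computes APEI_t = sum_{i=0}^{n} k**i * (p[t-i] - e[t-i]).
    precip, evap: daily totals in the same units (e.g., mm/day).
    Evaporation is assumed to be a positive loss; ERA5 stores evaporation
    with a downward-positive convention (negative values = evaporation),
    so its sign would need flipping first.
    """
    if t < n:
        raise ValueError("at least n days of history are needed before day t")
    return sum(k**i * (precip[t - i] - evap[t - i]) for i in range(n + 1))


# Toy example: constant 2 mm/day rain and 1 mm/day evaporation over 40 days.
precip = [2.0] * 40
evap = [1.0] * 40
print(apei(precip, evap, t=39))
```

Because the weights $k^i$ decay geometrically, recent days dominate the index, while older days still contribute a small memory of past wetness.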

The original hourly ERA5 variables were aggregated to a daily scale. Precipitation and evaporation were converted to cumulative daily values, while all other variables were averaged over a 24-h period.
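The daily aggregation rules above (sums for accumulated fluxes, 24-h means otherwise, and dT as the daily maximum minus minimum of hourly surface temperature) can be sketched as follows. The variable names (`precip`, `evap`, `surface_temp`) and the assumption that each hourly value is the amount accumulated over that hour are illustrative, not taken from the study.

```python
from statistics import fmean

# Hypothetical variable names; each list holds the 24 hourly values of one day.
ACCUMULATED = {"precip", "evap"}


def aggregate_day(hourly, temp_key="surface_temp"):
    """Aggregate one day of hourly ERA5-like values to daily values.

    Accumulated fluxes are summed (assuming each hourly value is the amount
    accumulated over that hour); all other variables are averaged over 24 h.
    dT is the daily max minus min of the hourly surface temperature.
    """
    daily = {}
    for name, values in hourly.items():
        daily[name] = sum(values) if name in ACCUMULATED else fmean(values)
    daily["dT"] = max(hourly[temp_key]) - min(hourly[temp_key])
    return daily


day = {
    "precip": [0.5] * 6 + [0.0] * 18,                         # mm per hour
    "evap": [0.1] * 24,                                       # mm per hour
    "surface_temp": [285.0 - abs(h - 13) for h in range(24)],  # K, peak at 13 h
}
print(aggregate_day(day))
```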

2.2 ASCAT database

ASCAT is a C-band radar onboard the Metop-A, B, and C satellites, providing near-global coverage every 12 h from a sun-synchronous orbit at 817 km altitude. We used the Metop ASCAT Climate Data Record (CDR, H120 v7), which offers consistent SM-related products at a resampled 12.5 km resolution (from native 25–34 km, depending on swath position) (H SAF, 2021a; H SAF, 2021b). Two ASCAT variables were employed:

Backscatter (σ40) – Normalized radar backscatter (σ40, in dB) was adjusted to a 40° incidence angle. Backscatter is highly sensitive to soil’s dielectric properties and thus widely used in SM retrieval (Wagner et al., 1999; El Hajj et al., 2016).

ASCAT soil moisture (SM) – Relative SM (%) derived by rescaling σ40 between dry and wet reference conditions (H SAF, 2021b). To ensure comparability with ERA5 and retrieval results, we converted ASCAT relative SM into volumetric units (m³/m³) using soil porosity data from the Global Land Data Assimilation System [GLDAS; Rodell et al. (2004)], following the approach of Saxton and Rawls (2006). This variable is hereafter referred to as the H120 SM product.
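The exact conversion is not spelled out here, so the sketch below assumes the simplest mapping consistent with the description: relative SM scales linearly from 0 m³/m³ (0 %) to saturation, i.e., a volumetric content equal to the soil porosity (100 %). The function name is hypothetical, and the study's implementation may differ.

```python
def relative_to_volumetric(sm_relative_pct, porosity):
    """Convert relative SM (0-100 %) to volumetric SM (m3/m3).

    ASSUMPTION: 0 % corresponds to completely dry soil (0 m3/m3) and
    100 % to saturation, i.e. a volumetric content equal to porosity.
    """
    if not 0.0 < porosity < 1.0:
        raise ValueError("porosity must be a volume fraction in (0, 1)")
    return sm_relative_pct / 100.0 * porosity


print(round(relative_to_volumetric(60.0, 0.45), 4))  # 0.27
```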

Both σ40 and H120 SM data were collocated with ERA5 variables by regridding to 0.25° resolution and aggregating to daily values. The analysis covers a 4-year period (2016–2019).

2.3 In situ data from the international soil moisture network

In situ SM measurements from the International Soil Moisture Network [ISMN; Dorigo et al. (2013); Dorigo et al. (2021)] were used to assess retrieval performance in 2019. The ISMN provides a globally accessible archive of harmonized SM measurements, established through international collaboration and supported by European Space Agency (ESA) funding and community contributions. This database serves as an essential reference for validating SM retrieval products (Rodríguez-Fernández et al., 2015; Batchu et al., 2023). For CONUS, we selected data from three main networks: (1) the Soil Climate Analysis Network [SCAN; Schaefer et al. (2007)], (2) the SNOwpack TELemetry network [SNOTEL; Leavesley et al. (2010)], and (3) the United States Climate Reference Network [USCRN; Bell et al. (2013)]. We used measurements from the top layer (0–7 cm) to match the depth sensitivity of ASCAT and ERA5. The measurements, originally reported at hourly intervals in volumetric units (m³/m³), were aggregated into daily averages. Stations with more than 100 missing records were excluded.

2.4 Soil texture and land-cover data

To further evaluate how retrieval performance varies across different environmental conditions, we incorporated soil texture and land-cover information into our analysis. Soil texture classes at the surface level (0 cm) were obtained from the United States Department of Agriculture (USDA) soil texture classification system provided by Hengl (2018). Land-cover information was derived from the Annual National Land Cover Database (NLCD) Collection 1 Land Cover 2019 product (U.S. Geological Survey, 2024).

We extracted the local soil texture and dominant land-cover type associated with each ISMN site based on its geographic coordinates. These ancillary datasets allow us to stratify the evaluation of SM retrievals across major soil texture classes and land-cover categories for all selected in situ sites across CONUS during 2019.

2.5 Preliminary information content analysis

To assess the spatial and temporal dependencies among the variables described in Sections 2.1 and 2.2, we performed a sensitivity analysis using ERA5 and ASCAT data aggregated at the daily scale across all pixels in the study domain from 2016 to 2019. Geographic coordinates (latitude and longitude) were also included. Figure 1 summarizes the results as correlation matrices of total, temporal, and spatial correlations. Here, the “total correlation” is computed on data pooled over all pixels and days, the “spatial correlation” denotes the temporal average of the spatial correlation coefficients computed over the domain on each day, and the “temporal correlation” corresponds to the spatial average of the temporal correlations computed at each grid point.
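The three correlation types defined above can be made concrete with a short Python sketch. It assumes complete (gap-free) daily fields stored as nested lists indexed `[day][pixel]`; `pearson` and `total_temporal_spatial` are illustrative helpers, not the study's code.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def total_temporal_spatial(a, b):
    """a, b: 2-D lists indexed [day][pixel], with no missing values.

    total    : correlation of all (day, pixel) samples pooled together
    temporal : correlation over days at each pixel, averaged over pixels
    spatial  : correlation over pixels on each day, averaged over days
    """
    n_days, n_pix = len(a), len(a[0])
    total = pearson([v for row in a for v in row], [v for row in b for v in row])
    temporal = fmean(
        pearson([a[t][p] for t in range(n_days)], [b[t][p] for t in range(n_days)])
        for p in range(n_pix)
    )
    spatial = fmean(pearson(a[t], b[t]) for t in range(n_days))
    return total, temporal, spatial
```

Because pooling mixes spatial and temporal variability, a high total correlation can be driven mainly by spatial contrasts between pixels even when within-pixel temporal correlations are weak, which is why the three quantities are reported separately.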

Figure 1. Correlations between all the considered variables: ERA5 soil moisture (SM), ASCAT backscatter (σ40), soil temperature (ST), leaf area index (LAI), antecedent precipitation evaporation index (APEI), amplitude of the diurnal cycle of surface temperature (dT), latitude (lat), and longitude (lon), based on pooled data across all pixels in the study domain for the 2016–2019 period. The panels show total correlations (left), temporal correlations (middle), and spatial correlations (right) among variables; colors range from −1.0 (blue) to 1.0 (red).

The analysis reveals that ASCAT σ40 backscatter and APEI exhibit moderate positive total correlations with ERA5 SM (0.39 and 0.45, respectively), whereas ST and dT show moderate and strong negative correlations (−0.37 and −0.67, respectively). Longitude also correlates positively with SM (0.38), suggesting the presence of large-scale geographic gradients. These results identify dT, APEI, σ40, and longitude as particularly informative predictors in terms of their total and spatial association with SM. Temporal correlations, on the other hand, emphasize the relevance of ST (−0.44) and LAI (−0.30), underscoring their role in capturing day-to-day SM variability. Because these correlations aggregate all locations and seasons, they offer only a general overview of variable interactions. Still, examining total, spatial, and temporal correlations together provides a more complete picture of variable dynamics and helps inform predictor selection. There is a trade-off between capturing temporal and spatial dynamics that should be balanced when designing retrieval models.

Importantly, correlations also reflect confounding influences. For example, σ40 correlates with LAI (0.40), indicating sensitivity to vegetation as well as SM. This mixed signal complicates large-scale retrieval, reinforcing the need to incorporate auxiliary (e.g., ERA5) and ancillary (independent from ERA5) data to improve SM estimates (Aires et al., 2021a).

Finally, the links between auxiliary variables and geographic coordinates also offer insights into spatial patterns, for example, between σ40 and longitude (0.36) or between ST and latitude (−0.49). Recognizing such spatial and temporal dependencies should help in designing models and selecting inputs, since the underlying correlations can either support or limit retrieval. These impacts are discussed further in the following sections.

3 Retrieval methods and localization strategies

3.1 Neural network models

NN retrieval models have been extensively applied in the field of remote sensing (Goïta et al., 1994; Atkinson and Tatnall, 1997; Maggiori et al., 2017). In this study, we investigated two widely used NN architectures—MLPs and CNNs—for the SM retrieval.

3.1.1 Multilayer perceptrons

An MLP, also known as a feedforward NN, consists of multiple layers of neurons (perceptrons) organized sequentially (Rumelhart et al., 1986). Each neuron performs a weighted summation of its inputs, followed by the application of a nonlinear activation function. In a fully connected architecture, every neuron in a given layer is connected to all neurons in the preceding layer. The output of neuron $j$ is computed as $y_j = f\left(b_j + \sum_{i=1}^{N} w_{ij} x_i\right)$, where $x_i$ ($i = 1, \ldots, N$) are the inputs to neuron $j$, $w_{ij}$ are the synaptic weights associated with each input, $b_j$ is the bias term, and $f$ denotes an activation function (Bishop, 1995). Typically, neurons in the hidden layers (i.e., all layers except the input and output layers) use a sigmoid activation, $f(x) = \frac{1}{1 + e^{-x}}$, while neurons in the output layer employ the linear activation function, $f(x) = x$.
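The neuron equation above can be illustrated with a minimal forward pass of a one-hidden-layer MLP (sigmoid hidden neurons, linear output). This is a sketch with random, untrained weights; the function names and input values are hypothetical, and training is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP.

    Hidden neurons: y_j = sigmoid(b_j + sum_i w_ij * x_i).
    Output neuron: linear activation f(z) = z.
    w_hidden[j][i] stores w_ij (weight from input i to hidden neuron j).
    """
    hidden = [
        sigmoid(b_j + sum(w_ij * x_i for w_ij, x_i in zip(w_j, x)))
        for w_j, b_j in zip(w_hidden, b_hidden)
    ]
    return b_out + sum(w * h for w, h in zip(w_out, hidden))


# Untrained weights for 3 inputs (e.g., scaled sigma40, ST, LAI) and 10 hidden neurons.
rng = random.Random(0)
n_in, n_hidden = 3, 10
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
print(mlp_forward([0.2, -0.5, 1.0], w_hidden, b_hidden, w_out, 0.0))
```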

3.1.2 Convolutional neural networks

Unlike MLPs, which process inputs as flattened vectors (e.g., a single pixel or feature at a time), CNNs are specifically designed for grid-like data such as images. A CNN processes inputs as multidimensional tensors, e.g., $X \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ denote the spatial dimensions (image height and width) and $C$ represents the number of channels (or input depth). In our case, $H = 100$ and $W = 200$ pixels covering the CONUS domain, while $C$ ranges from two to seven, including σ40 and other auxiliary variables.

As suggested by the term convolutional, CNNs consist of multiple convolutional layers that transform the input tensor X into successively higher-level feature representations. Each convolutional layer applies a set of learnable kernels (or filters) across the input. As the kernel slides over the tensor, it computes localized weighted sums that collectively form a feature map. Mathematically, this can be expressed as:

$$x = W * X + b,$$

where $W$ denotes the convolutional kernel, $*$ represents the convolution operation, $X$ is the local input, $b$ is the bias term, and $x$ is the output feature. Unlike the fully connected layers of MLPs, in which every input has its own weight, CNN kernels exploit spatial locality by applying the same small set of weights to each neighborhood, making them effective for capturing spatial patterns. Typically, multiple kernels are used in each layer to extract diverse features.
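The convolution equation can be illustrated with a minimal single-channel sketch using stride 1 and zero “same” padding. As in most deep-learning libraries, the kernel is not flipped, so this is strictly a cross-correlation; the function name is hypothetical and multi-channel inputs are not handled.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2-D convolution, stride 1, zero 'same' padding.

    As in most deep-learning libraries, the kernel is not flipped
    (strictly, this is a cross-correlation). Kernel sides must be odd.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # pixels outside count as zeros
                        acc += kernel[i][j] * image[rr][cc]
            row.append(acc)
        out.append(row)
    return out


ones = [[1.0] * 4 for _ in range(4)]
box = [[1.0] * 3 for _ in range(3)]
for row in conv2d_same(ones, box):
    print(row)
# [4.0, 6.0, 6.0, 4.0]
# [6.0, 9.0, 9.0, 6.0]
# [6.0, 9.0, 9.0, 6.0]
# [4.0, 6.0, 6.0, 4.0]
```

The smaller values along the borders show the effect of zero padding: edge pixels see fewer real neighbors than interior pixels.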

The behavior of a convolutional layer is influenced by several architectural parameters:

Padding extends the input tensor by adding extra pixels (commonly zeros) around the borders, which allows control over the spatial size of the output. Padding is often used to preserve the input dimensions after convolution.

Stride specifies the step size with which the kernel slides across the input. A stride greater than one reduces the output resolution.

Dilation expands the receptive field of the convolutional kernel without increasing the number of parameters, by skipping pixels between kernel elements at a given rate (typically 2 or 3), allowing the model to capture long-range dependencies efficiently (Yu and Koltun, 2016).
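The combined effect of padding, stride, and dilation on the output size follows a standard formula: a kernel of size $k$ with dilation $d$ spans $d(k-1)+1$ input pixels. The sketch below applies it along one axis, using the $H = 100$ image height and 5×5 kernels as illustrative values.

```python
def conv_output_size(n, kernel, padding=0, stride=1, dilation=1):
    """Output length along one spatial axis of a convolution.

    A kernel of size k with dilation d spans d*(k-1)+1 input pixels.
    """
    span = dilation * (kernel - 1) + 1
    return (n + 2 * padding - span) // stride + 1


print(conv_output_size(100, 5, padding=2))              # 100: "same" size
print(conv_output_size(100, 5, padding=2, dilation=2))  # 96: padding too small for the dilated span
print(conv_output_size(100, 5, padding=4, dilation=2))  # 100: same size restored with padding d*(k-1)/2
print(conv_output_size(100, 5, padding=2, stride=2))    # 50: stride 2 halves the resolution
```

This makes explicit that preserving the input size with a dilated kernel requires padding of $d(k-1)/2$ pixels on each side (for odd $k$ and stride 1).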

After convolution, activation functions are applied element-wise to introduce nonlinearity. Common activation functions include:

Rectified Linear Unit (ReLU), $\mathrm{ReLU}(z) = \max(0, z)$, is widely used due to its simplicity and effectiveness in mitigating the vanishing gradient problem (Goodfellow et al., 2016).

Tanh, $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, maps inputs to $(-1, 1)$ and is centered around zero, but can still suffer from vanishing gradients.

Softmax converts a vector of outputs into a probability distribution, typically applied only in the final layer for multi-class classification.
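The three activation functions listed above can be written directly in Python. This is a minimal sketch; the softmax subtracts the maximum score before exponentiating, a common trick that leaves the result unchanged but avoids overflow.

```python
import math


def relu(z):
    return max(0.0, z)


def tanh(x):
    # Same value as (e^x - e^-x) / (e^x + e^-x); math.tanh avoids overflow for large |x|.
    return math.tanh(x)


def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(values)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))      # 0.0 3.0
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive math.exp(1000.0) would overflow
```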

Beyond the core convolutional layers, a CNN model can include additional specialized layers or processing steps that enhance its ability to learn the input–target relationship:

Dropout layer temporarily deactivates a fraction of neurons during each training iteration, introducing randomness into the learning process and helping to mitigate overfitting. This technique promotes more uniform learning across neurons by preventing reliance on any single neuron. During inference, all neurons are active, and their outputs are scaled to account for the dropout applied during training; in the common “inverted dropout” implementation, this rescaling is instead applied to the retained activations during training.

Batch normalization layer normalizes the inputs of each layer using the mean and variance of the current training mini-batch, followed by a learnable scale and shift. This process helps stabilize and accelerate training by reducing internal covariate shift across layers.
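Both layers can be sketched on plain Python lists. This is a minimal illustration, assuming the inverted-dropout variant and training-mode batch normalization of a single feature; the function names are hypothetical, and at inference a real batch-normalization layer would use running statistics collected during training instead of batch statistics.

```python
import random
from statistics import fmean


def dropout(values, rate, training, rng=random):
    """Inverted dropout.

    During training, each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so the expected activation is
    unchanged; at inference the input passes through untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization of one feature over a mini-batch.

    At inference, running averages of the mean and variance collected
    during training would replace the batch statistics.
    """
    mu = fmean(batch)
    var = fmean((v - mu) ** 2 for v in batch)
    return [gamma * (v - mu) / (var + eps) ** 0.5 + beta for v in batch]


rng = random.Random(42)
print(dropout([1.0] * 10, rate=0.3, training=True, rng=rng))  # zeros and values of about 1.43
print(batch_norm([0.10, 0.20, 0.30, 0.40]))                    # approximately zero mean, unit variance
```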

3.2 Localization approaches

NNs are statistical models typically trained to minimize global errors over large datasets. While effective on average, their performance can degrade in regions with complex surface conditions, such as areas with dense vegetation or significant topographic variability, leading to localized regional biases. These biases are often due not to insufficient data samples but to a lack of input features that adequately represent the local surface conditions.

To address this challenge, the concept of “localization” has been introduced (Boucher et al., 2023; Pellet et al., 2025). Localization refers to strategies that enable NNs to adapt their behavior to local conditions. This can be achieved in several ways: by augmenting input features (e.g., adding physical or geographical variables) or by modifying the model architecture to account for spatial heterogeneity. In the following, we discuss several localization strategies for the two types of NN architectures considered: MLPs and CNNs.

3.2.1 For multilayer perceptrons

MLPs are commonly applied in remote sensing as pixel-wise regressors, where a single model is trained over all spatial locations. This imposes a compromise solution that does not account for regional variability in land surface characteristics, often leading to regional biases. To mitigate these issues, several localization strategies have been proposed (Boucher et al., 2023):

Physical variable augmentation: Inputs such as ST or vegetation indices (e.g., LAI) are introduced to better represent the surface conditions. This approach enhances the model’s physical realism by better constraining the inverse radiative transfer problem and allows for more generalized models with global applicability (Aires et al., 2001; Aires et al., 2005; Kolassa et al., 2013).

Geographic coordinate addition: Static spatial information such as latitude and longitude [or other coordinate-derived variables (Mahara and Rishe, 2023)] is introduced as proxy features to localize the model. This strategy helps partition the input space and simplifies the underlying relationships by implicitly capturing location-dependent processes. While less physically interpretable (latitude and longitude do not enter the radiative transfer model linking satellite observations to the variables to retrieve), this statistical localization can be effective in improving regional accuracy in a global model (Madadikhaljan and Schmitt, 2025) by modulating the NN behavior across different parts of the spatial domain.

The choice between these strategies depends on the intended application. For example, purely statistical approaches [e.g., CDF-matching at the site level (Drusch et al., 2005)] are commonly used in assimilation frameworks, where improving dynamics may outweigh physical interpretability. However, it is important to note that even physically motivated inputs may be exploited by the NN in a purely statistical way—leveraging correlations rather than causal relationships. This is evident when inspecting Jacobians (i.e., sensitivities) of trained models, which may show unrealistic patterns. Addressing this challenge may require the integration of physical constraints during training (Aires et al., 1999; Aires et al., 2004) or the development of hybrid models where physical knowledge provides structural constraints and NNs learn the unknown parameters.

In practice, especially when the information content of the observations is not enough to fully characterize the variable to monitor, some form of localization is necessary to reduce regional biases. One can use only physical variables, but incorporating both physical and geographical inputs has become a common strategy (Singh and Gaurav, 2023; Han et al., 2023).

Pixel-scale modeling: At the most localized level, fully independent MLPs can be trained per pixel, allowing the model to specialize in the local observation-to-variable relationships. This “pixel-specific” modeling simplifies the retrieval task but sacrifices generalization and scalability. There is also a risk of overfitting due to the smaller sample size per model. Such a strategy may be appropriate in cases where auxiliary physical variables are unavailable or unreliable. It is worth noting that approaches such as CDF matching (Aires et al., 2021a) or a linear regression (Wagner et al., 2013) of an SM index to a physical SM range (e.g., from ECMWF) are also examples of pixel-specific models.
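The pixel-specific idea can be illustrated with its simplest instance mentioned above: a per-pixel linear regression mapping an SM index onto a physical SM range, fitted against a reference such as ERA5. This is a minimal sketch with hypothetical pixels and values; the pixel-wise MLPs used in this study replace the linear fit with a small NN but follow the same one-model-per-pixel logic.

```python
from statistics import fmean


def fit_linear(x, y):
    """Least-squares fit y ≈ a + b * x for one pixel's time series."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b


def fit_pixel_models(sm_index, sm_reference):
    """Train one independent model per pixel: {pixel_id: (a, b)}."""
    return {pix: fit_linear(sm_index[pix], sm_reference[pix]) for pix in sm_index}


# Two hypothetical pixels whose SM index (0-100) maps to different physical ranges.
index = {"p1": [0.0, 50.0, 100.0], "p2": [0.0, 50.0, 100.0]}
reference = {"p1": [0.05, 0.20, 0.35], "p2": [0.10, 0.15, 0.20]}
models = fit_pixel_models(index, reference)
print(models)  # p1: a ≈ 0.05, b ≈ 0.003; p2: a ≈ 0.10, b ≈ 0.001
```

The same index value thus maps to different SM values at each pixel, which is exactly the local adaptation (and the loss of generality) described above.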

3.2.2 For convolutional neural networks

Grid information: CNNs can inherently incorporate spatial information through convolutional layers that apply filters or kernels across the input images. When trained on a fixed domain (e.g., CONUS), the pixel location in the image is directly equivalent to the geographic coordinates information previously cited for MLPs. As such, traditional CNNs already embed a form of localization. However, standard CNNs rely on “weight sharing” (Lecun et al., 1998), where the same set of filters is applied across all spatial locations. This can limit their capacity to capture region-specific behaviors in large, heterogeneous domains. To address this, CNNs typically use multiple stacked convolutional layers to develop hierarchical features that generalize spatial variations. Nonetheless, this approach may require a large number of filters to approximate diverse land surface conditions.

Local connection: Locally connected layers were first proposed by Chen et al. (2015) and later adopted in remote sensing [e.g., Boucher et al. (2023); Pellet et al. (2025)]. These layers resemble convolutional layers but do not share weights across space. Instead, each spatial location has its own filter set, allowing the network to learn location-specific patterns. This architectural approach is analogous to the pixel-specific strategy used for MLPs, while benefiting from the local neighborhood (within the kernel window) to enhance the retrieval for the central pixel.
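The difference from weight sharing can be sketched with a single-channel locally connected layer, in which every output pixel has its own filter and bias. The sketch also compares parameter counts, assuming “same” output size and one filter set (plus bias) per output location; the function names are hypothetical, and exact counts depend on the implementation.

```python
def locally_connected_same(image, kernels, biases):
    """Single-channel locally connected layer with zero padding and stride 1.

    kernels[r][c] is a k x k filter used only for output pixel (r, c), and
    biases[r][c] is its bias; a standard convolution would reuse one filter
    at every position (weight sharing).
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            k = kernels[r][c]
            p = len(k) // 2
            acc = biases[r][c]
            for i in range(len(k)):
                for j in range(len(k)):
                    rr, cc = r + i - p, c + j - p
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding
                        acc += k[i][j] * image[rr][cc]
            row.append(acc)
        out.append(row)
    return out


def parameter_counts(h, w, k, c_in, c_out=1):
    """(shared-weight conv, locally connected) parameter counts for 'same' output."""
    per_position = c_out * (k * k * c_in + 1)  # weights + one bias per output channel
    return per_position, h * w * per_position


# 100 x 200 grid, 5x5 kernels, 3 input images (e.g., sigma40, ST, LAI), 1 output map
print(parameter_counts(100, 200, 5, c_in=3))  # (76, 1520000)
```

Under these assumptions, removing weight sharing multiplies the number of parameters by the number of pixels, which is the price paid for location-specific filters.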

To comprehensively assess the impact of localization strategies, we explored a set of model architectures with varying input configurations for SM retrieval. The following section introduces these models and the experimental setups used to investigate how different forms of localization influence the spatial and temporal accuracy of the retrievals.

3.3 Proposed model architectures

We considered four families of model architectures for SM retrieval.

Standard MLP: A simple MLP consisting of a single hidden layer of 10 neurons with sigmoid activation functions, followed by a linear output layer. The input can be limited to σ40 alone or can include auxiliary inputs, such as physical variables and geographic coordinates. Different configurations are explored, denoted $\mathrm{MLP}_{a,b}$, where $a$ is the number of physical variables and $b$ the number of geographic variables.

Localized MLP (MLP-l): A pixel-specific MLP in which, instead of pooling all pixels together, a separate MLP is trained for each individual pixel, allowing highly localized predictions. We tested $\mathrm{MLP}_{3,0}$-l, with three physical inputs (σ40, ST, and LAI) and no geographic input, as a single model for each pixel.

Standard CNN: The proposed CNN consists of multiple convolutional layers, each followed by a ReLU activation function. The architecture employs dilated convolutions to expand the receptive field without reducing spatial resolution. The network begins with a standard convolutional layer of 16 kernels of size 5×5 to extract low-level spatial features, followed by three convolutional layers with 5×5 kernels (32, 64, and 32 kernels, respectively) and dilation rates of 1, 2, and 1 to capture broader contextual information. An additional standard convolutional layer (16 kernels of size 5×5) is included before the final output layer. Batch normalization is applied after each layer to improve convergence, and dropout (30%) is introduced to mitigate overfitting. “Same” padding is used to preserve the input dimensions. The input consists of 2D images of σ40 and, optionally, additional auxiliary variables. As for the standard MLP, the notation $\mathrm{CNN}_{a,b}$ indicates a CNN with $a$ physical variables and $b$ geographic variables (although geographic inputs are of limited relevance for CNNs, whose pixel positions already encode location; see Section 3.2.2).

Localized CNN (CNN-l): A localized variant that uses a locally connected convolutional layer, allowing the filter weights to differ across spatial locations. The architecture includes zero-padding to preserve the spatial dimensions, followed by a single locally connected 5×5 layer, without weight sharing, to capture location-specific patterns. We considered in this case a $\mathrm{CNN}_{3,0}$-l model with three input images: σ40, ST, and LAI.
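The context seen by each output pixel of the standard CNN described above can be checked with a short receptive-field calculation. The sketch assumes stride-1 layers and, because the output layer's kernel size is not given in the text, treats it as 1×1 (an assumption); each layer then widens the field by $d(k-1)$ pixels.

```python
def receptive_field(layers):
    """Receptive field (in pixels) of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation). Each layer widens the field by
    dilation * (kernel_size - 1) pixels.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += dilation * (kernel_size - 1)
    return rf


standard_cnn = [
    (5, 1),  # standard 5x5 conv, 16 kernels
    (5, 1),  # 5x5 conv, 32 kernels, rate 1
    (5, 2),  # 5x5 dilated conv, 64 kernels, rate 2
    (5, 1),  # 5x5 conv, 32 kernels, rate 1
    (5, 1),  # standard 5x5 conv, 16 kernels
    (1, 1),  # output layer: ASSUMED 1x1 (kernel size not stated in the text)
]
rf = receptive_field(standard_cnn)
print(rf, "pixels, i.e.", rf * 0.25, "degrees at 0.25° resolution")
# 25 pixels, i.e. 6.25 degrees at 0.25° resolution
```

Under these assumptions, the single dilated layer adds four pixels to the field compared with an undilated stack (25 instead of 21 pixels) at no extra parameter cost.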

All models use consistent hyperparameters: the Adam optimizer (initial learning rate of 0.001), Glorot uniform initialization with a fixed seed, and early stopping after 10 epochs without validation loss improvement (maximum of 200 epochs), restoring the best weights. The only variation is in the batch size, which depends on the model type: MLPs use a batch size of 64 and CNNs 16, while the localized models use 16 (MLP-l) and 8 (CNN-l). Since the dataset covers the entire CONUS domain, computationally intensive optimization strategies (e.g., Bayesian tuning) were infeasible, consistent with observations in previous large-sample remote sensing studies (Rabiei et al., 2025a). A structured manual tuning strategy was therefore adopted: key hyperparameters were first selected within standard ranges reported in the literature (e.g., Pellet et al., 2025; Rabiei et al., 2025a) and then iteratively refined based on validation performance. Further adjustments around the final configurations produced only marginal improvements, suggesting that the selected configurations are adequate for SM retrieval.
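The early-stopping rule above (patience of 10 epochs, at most 200 epochs, best weights restored) can be sketched as a framework-agnostic loop. The interface is hypothetical: `model.weights` holds the parameters, and the two callables perform one training epoch and return the validation loss.

```python
import copy


def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              patience=10, max_epochs=200):
    """Stop after `patience` epochs without validation improvement; restore best weights.

    Hypothetical interface: `model.weights` holds the parameters, and the two
    callables perform one training epoch and return the validation loss.
    Returns (best validation loss, number of epochs run).
    """
    best_loss = float("inf")
    best_weights = copy.deepcopy(model.weights)
    stale_epochs = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        epochs_run = epoch + 1
        loss = validation_loss(model)
        if loss < best_loss:
            best_loss, best_weights = loss, copy.deepcopy(model.weights)
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    model.weights = best_weights  # restore the best weights seen during training
    return best_loss, epochs_run
```

If the models were built with Keras (the framework is not stated in the text), the same behavior corresponds to `keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)`.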

In all experiments, the dataset was split temporally rather than spatially: the 3-year 2016–2018 period was divided into 80% for training and 20% for validation, while 2019 was reserved for testing and performance evaluation. This approach ensures that the models are evaluated on a truly unseen period, thus providing a realistic assessment of their generalization ability. Other temporal combinations (e.g., using 2016 as the test year) are also possible, provided the test set remains independent of the training period. A Monte Carlo-like test, in which each of the four years is used in turn as the test year and the resulting scores are averaged, will be evaluated in future work.
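A minimal sketch of this year-based split follows, assuming the 80/20 division of the development years is also chronological (the text does not say whether it is chronological or random). The `test_year` parameter shows how the alternative configurations and the Monte Carlo-like test could be generated; all function names are hypothetical.

```python
from datetime import date, timedelta


def days_of_year(year):
    """All calendar days of a year, in chronological order."""
    start = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - start).days
    return [start + timedelta(days=i) for i in range(n_days)]


def temporal_split(years=(2016, 2017, 2018, 2019), test_year=2019, train_frac=0.8):
    """Hold out one full year for testing; split the remaining days 80/20.

    The 80/20 split is ASSUMED chronological (first 80% of days for training).
    """
    dev_days = [d for y in years if y != test_year for d in days_of_year(y)]
    n_train = round(train_frac * len(dev_days))
    return dev_days[:n_train], dev_days[n_train:], days_of_year(test_year)


train, val, test = temporal_split()
print(len(train), len(val), len(test))  # 877 219 365
```

Looping over `test_year` in 2016–2019 and averaging the test scores would give the Monte Carlo-like evaluation mentioned above.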

Separate models were trained for the ascending and descending satellite orbits.

3.4 Evaluation metrics

To evaluate the performance of the SM retrieval models, we used several statistical metrics: Pearson’s correlation coefficient (r, unitless), root mean square error (RMSE, m3/m3), and bias (Bias, m3/m3). The standard deviation (Std, m3/m3) of the difference between two datasets is also reported.
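Table 1 and Figure 4 also report total, temporal, and spatial correlations, whose aggregation is not spelled out here. The NumPy sketch below assumes the usual definitions: total correlation over all pixel-days pooled, temporal correlation per pixel averaged over pixels, and spatial correlation per day averaged over days. The averaging (mean rather than median), the bias sign convention (prediction minus reference), and the omission of missing-data handling are all assumptions, and the data are synthetic.

```python
import numpy as np


def basic_scores(pred, ref):
    """Pearson r, RMSE, bias and Std of the difference (pred - ref) for 1-D arrays."""
    diff = pred - ref
    r = np.corrcoef(pred, ref)[0, 1]
    rmse = np.sqrt(np.mean(diff**2))
    bias = np.mean(diff)
    std = np.std(diff)  # population Std, so rmse**2 == bias**2 + std**2
    return r, rmse, bias, std


def correlation_scores(pred, ref):
    """Total, temporal and spatial correlation for (time, pixel) arrays without NaNs.

    total:    r over all pixel-days pooled together
    temporal: r over time at each pixel, averaged over pixels
    spatial:  r over pixels on each day, averaged over days
    """
    total = np.corrcoef(pred.ravel(), ref.ravel())[0, 1]
    temporal = np.mean([np.corrcoef(pred[:, j], ref[:, j])[0, 1] for j in range(ref.shape[1])])
    spatial = np.mean([np.corrcoef(pred[t], ref[t])[0, 1] for t in range(ref.shape[0])])
    return total, temporal, spatial


# Synthetic example: 365 days x 200 pixels of SM (m3/m3), retrieval = truth + noise + 0.01.
rng = np.random.default_rng(1)
ref = rng.uniform(0.05, 0.45, size=(365, 200))
pred = ref + rng.normal(0.0, 0.03, size=ref.shape) + 0.01
r, rmse, bias, std = basic_scores(pred.ravel(), ref.ravel())
total, temporal, spatial = correlation_scores(pred, ref)
```

Reporting Std alongside Bias is informative because, with this convention, RMSE² = Bias² + Std², so RMSE splits into a systematic and a random part.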

We also evaluated the dampening and inflating effects (Boucher and Aires, 2023) by comparing the range of values in the predicted SM dataset ($\mathrm{Range}_{\mathrm{pred}}$) with that of the reference ERA5 dataset ($\mathrm{Range}_{\mathrm{ref}}$). A dampening effect occurs when $\mathrm{Range}_{\mathrm{pred}} < \mathrm{Range}_{\mathrm{ref}}$, indicating a reduced range in the predictions. Conversely, an inflating effect occurs when $\mathrm{Range}_{\mathrm{pred}} > \mathrm{Range}_{\mathrm{ref}}$. These effects are quantified as:

$$E\% = 100 \times \frac{\mathrm{Range}_{\mathrm{pred}} - \mathrm{Range}_{\mathrm{ref}}}{\mathrm{Range}_{\mathrm{ref}}}$$

Negative values indicate dampening, and positive values indicate inflation. At each pixel, the range is the difference between the average SM of the upper 10% and that of the lower 10% of the values (Boucher and Aires, 2023).
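The per-pixel computation of E% can be sketched as follows. "Upper and lower 10%" is interpreted here as the values at or above the 90th percentile and at or below the 10th percentile of each pixel's time series (NumPy's default quantile interpolation); this interpretation and the function names are assumptions. The synthetic check shrinks or stretches the anomalies around each pixel's mean, which should give E% of -50 and +20, respectively.

```python
import numpy as np


def sm_range(series, q=0.10):
    """Mean of the top q fraction of values minus mean of the bottom q fraction."""
    lo, hi = np.quantile(series, [q, 1.0 - q])
    return series[series >= hi].mean() - series[series <= lo].mean()


def dampening_inflating(pred, ref, q=0.10):
    """E% = 100 * (Range_pred - Range_ref) / Range_ref for each pixel (column).

    Negative values indicate dampening, positive values inflation.
    """
    e = np.empty(ref.shape[1])
    for j in range(ref.shape[1]):
        range_ref = sm_range(ref[:, j], q)
        e[j] = 100.0 * (sm_range(pred[:, j], q) - range_ref) / range_ref
    return e


# Synthetic check on 3 pixels with 365 daily values each.
rng = np.random.default_rng(2)
ref = rng.uniform(0.05, 0.45, size=(365, 3))
mean = ref.mean(axis=0)
damped = mean + 0.5 * (ref - mean)    # anomalies halved
inflated = mean + 1.2 * (ref - mean)  # anomalies stretched by 20%
print(dampening_inflating(damped, ref))    # each value close to -50
print(dampening_inflating(inflated, ref))  # each value close to +20
```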

4 Analysis of daily soil moisture retrievals

4.1 Overall performance of MLPs and CNNs

As described in Section 3.3, we investigated various MLP and CNN model configurations that incorporate different combinations of physical and geographic input variables. These configurations were designed to assess the models’ sensitivity to varying degrees of localization and input diversity. Table 1 summarizes the evaluation metrics of all MLP and CNN retrieval models on the 2019 test dataset, covering eight scenarios for each model type. Detailed results, including spatial maps and time series plots, are presented below for selected representative models.

Table 1

Table 1. Performance of MLP and CNN retrieval models over the 2019 test dataset, for scenarios with different combinations of input parameters. Physical variables: ASCAT backscatter (σ40), soil temperature (ST), leaf area index (LAI), antecedent precipitation evaporation index (APEI), and amplitude of the diurnal cycle of surface temperature (dT); geographic coordinates: latitude (lat) and longitude (long).

The performance of MLP models improves with the addition of auxiliary features. For example, as shown in Table 1, adding ST and LAI substantially increases the total correlation, from 0.37 for MLP1,0 (which uses only the ASCAT σ40 backscatter as input) to 0.59 for MLP3,0. Corresponding improvements are observed in temporal correlation (from 0.24 to 0.61) and spatial correlation (from 0.41 to 0.55). The total correlation continues to improve with the inclusion of additional features such as APEI, dT, and geographic coordinates. This trend is further illustrated in Figure 2, which shows SM retrievals from different models for the ascending orbit on 20 January 2019. Although MLP1,0 captures some SM patterns in the southeastern United States, MLP3,0 yields improved retrievals that, while not perfect, are closer to the ERA5 reference. This improvement can be attributed to the MLP’s ability to treat each input feature as an independent predictor, thereby effectively integrating the additional information.

Figure 2

Figure 2. From top to bottom: the soil moisture (SM) retrievals from different MLP (left) and CNN (right) models, and the target ERA5 SM on the ascending orbit of 20 January 2019.

CNNs, on the other hand, behave differently. Although performance improves from CNN1,0 to CNN1,2 or CNN3,0, the model already achieves a high total correlation of 0.91 with only three physical inputs (i.e., σ40, ST, and LAI), suggesting that it is close to its performance ceiling. Adding more features yields only marginal gains and, in some instances, slight degradation. This behavior can be attributed to the architecture of CNNs, which are designed to extract spatial hierarchies and local patterns through convolutional layers. As shown in Figure 2, even with a single σ40 input, CNN1,0 captures meaningful SM patterns: it identifies very dry conditions (in red) over the Rocky Mountains and the Southwest and wetter conditions (in blue) across the Southeast, closely resembling the ERA5 reference for 20 January 2019. The CNN3,0 model is therefore recommended here.

CNNs consistently outperform the equivalent MLP models, as shown by the statistical metrics in Table 1. This superiority is further illustrated in Figure 3, which compares the spatial correlation of various MLP and CNN models over time. Across all configurations, CNNs achieve higher spatial correlation than classical MLP models. This improvement can be attributed to the ability of CNNs to exploit spatial patterns, which is particularly advantageous when such patterns are strongly expressed in the target variable, as is the case for SM. However, both model types still have difficulty capturing abrupt events (i.e., sudden SM peaks). For example, during a pronounced wetting event in the eastern United States around late May 2019, all models show a lower spatial correlation than in the surrounding periods (see Figure 3).

Figure 3

Figure 3. Comparison of spatial correlations over the 2019 test dataset for six MLP and CNN models (as detailed in Table 1), relative to the target ERA5 soil moisture (SM). The left y-axis shows the spatial correlation of each model. The right y-axis shows the average ERA5 SM over the western United States (all pixels from 125° W to 97.5° W) and the eastern United States (all pixels from 97.5° W to 70° W).

Increasing the degree of localization (MLP3,0 → MLP3,2 → MLP5,0 → MLP5,2) increases the total correlation (from 0.59 to 0.74) at the expense of the temporal correlation (from 0.61 to 0.53). In other words, localization (by adding geographic or auxiliary features) better constrains the overall SM pattern but degrades the local SM dynamics. This is a genuine concern: even though a correct spatial pattern of SM is needed, the SM dynamics are essential information, for instance, for data assimilation. A trade-off must therefore be found between a better spatial pattern with reduced local biases and satisfactory local dynamics. We strongly recommend that any new product be evaluated using both diagnostics (spatial pattern and temporal dynamics).

4.2 Impacts of the localization information

We examined a simplified case in which only geographic coordinates (latitude and longitude) were used as inputs. Although not practical for real-world SM retrieval, this setup highlights the role of spatial information in model performance. The results show non-negligible skill: latitude and longitude alone yield a total correlation of 0.44 for MLP0,2. Incorporating coordinate information substantially improves spatial correlation, which increases from 0.37 for MLP1,0 (σ40 only) to 0.51 for MLP1,2 (σ40, latitude, and longitude). In contrast, the temporal correlation remains unchanged at 0.24. Adding geographic features therefore improves only the spatial correlation, by reducing local biases, without improving the temporal correlation. This is expected: latitude and longitude are static, so they can adjust the mean SM level at each pixel but carry no information about its day-to-day variations.
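A synthetic illustration (not the paper's data or models) of why static inputs act on spatial but not temporal correlation: Pearson correlation is unchanged when a constant is added to a series, so removing a fixed per-pixel bias, which is at most what a static input such as latitude or longitude can provide, leaves every pixel's temporal correlation intact while raising the day-by-day spatial correlation. All array sizes and noise levels are arbitrary assumptions.

```python
import numpy as np


def temporal_r(pred, ref):
    """Per-pixel correlation over time for (time, pixel) arrays."""
    return np.array([np.corrcoef(pred[:, j], ref[:, j])[0, 1] for j in range(ref.shape[1])])


def spatial_r(pred, ref):
    """Per-day correlation across pixels for (time, pixel) arrays."""
    return np.array([np.corrcoef(pred[t], ref[t])[0, 1] for t in range(ref.shape[0])])


rng = np.random.default_rng(3)
n_days, n_pix = 365, 100
ref = rng.uniform(0.1, 0.4, size=(n_days, n_pix))   # synthetic "target" SM
local_bias = rng.normal(0.0, 0.1, size=n_pix)       # static, pixel-specific bias
noise = rng.normal(0.0, 0.05, size=ref.shape)
pred_biased = ref + local_bias + noise               # retrieval with local biases
pred_corrected = pred_biased - local_bias            # biases removed by a static term

print(temporal_r(pred_biased, ref).mean(), temporal_r(pred_corrected, ref).mean())  # equal
print(spatial_r(pred_biased, ref).mean(), spatial_r(pred_corrected, ref).mean())    # second is higher
```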

On the other hand, using physical variables instead of coordinates (e.g., MLP3,0, with σ40, ST, and LAI as inputs, versus MLP1,2, with σ40, latitude, and longitude) improves both temporal and spatial correlations. These physical features carry inherent spatiotemporal information, allowing the model to better capture variations in both dimensions. This shows that physical variables carry more SM information than latitude and longitude alone.

While coordinate information benefits MLPs, it has minimal impact on CNNs. For example, for the CNN1,2 model, which uses σ40, latitude, and longitude as inputs, all correlation metrics (total, temporal, and spatial) increase by less than 0.02 compared with the standard CNN1,0 model, which uses only σ40. This limited improvement likely arises because, when the CNN is trained on a fixed domain such as CONUS, the grid position of each pixel in the input image already implicitly provides latitude and longitude information. The CNNa,2 configurations are therefore not discussed further.

It is not surprising that the two most extreme localization cases, MLP3,0-l and CNN3,0-l, outperform all other experiments, with high total correlations of 0.92 and 0.93, respectively (Table 1). Figure 2 also shows that the SM patterns retrieved by MLP3,0-l and CNN3,0-l align more closely with the ERA5 target for the ascending orbit on 20 January 2019 than those of the classical models. Both architectures show similar spatial and temporal performance, as illustrated in Figures 3 and 4, respectively, and they far surpass classical NN models such as MLP3,0 and CNN3,0. Local biases are notably reduced in these localized models, with near-zero bias and very low standard deviations in most regions of the study area (see Figure 4).

Figure 4

Figure 4. Comparison of retrieval statistics, namely, temporal correlation (left), bias (middle), and standard deviation (right), over the 2019 test dataset, relative to the target ERA5 SM; from top to bottom: MLP3,0, CNN3,0, MLP3,0-l, and CNN3,0-l.

Despite these promising results, the limitations of such localization strategies must be acknowledged. By design, these models primarily learn local statistical relationships, which may limit their ability to generalize or to represent broader physical processes. Consequently, they may inadvertently propagate biases inherent in the target dataset. Furthermore, their reliability depends strongly on the quality of the training data at each location; areas with sparse observations may experience instability, leading to overfitting or underfitting.

5 Evaluation of extreme cases

5.1 Minimum and maximum soil moisture retrievals

We further assessed the models’ ability to capture SM extremes, under both dry and wet conditions, by comparing their minimum and maximum retrieved values. Figure 5 shows the corresponding maps for several NN-based models alongside the ERA5 reference. Results from the ASCAT H120 product are also presented for comparison.

Figure 5

Figure 5. Minimum soil moisture (SM) (left), maximum SM (middle), and dampening (in magenta) and inflating (in green) effects (right) during 2019; from top to bottom: the H120 product, MLP3,0, CNN3,0, MLP3,0-l, CNN3,0-l, and ERA5.

Both non-localized models, MLP3,0 and CNN3,0, tend to underestimate SM extremes. In particular, they struggle to represent dry conditions in the western region and high SM values in the eastern part, because these conditions are anomalous relative to the typical cases that dominate the training data.

In contrast, the localized models, MLP3,0-l and CNN3,0-l, demonstrate improved estimates of extreme values. Localization enables the models to adapt more effectively to regional variability, allowing them to detect local anomalies with higher accuracy. The enhanced ability of localized models to capture extreme wet and dry events highlights their potential value for targeted applications such as drought monitoring or flood forecasting.

5.2 Dampening and inflating effects

In general, statistical models (e.g., NNs) tend to dampen the range of values: because regression-based models approximate the conditional mean of the target, their predictions typically capture only part of the total variability of the original data (Hastie et al., 2009; Skafte et al., 2019). However, under certain conditions, particularly when training data from different locations are pooled, regression models can also exhibit inflation (Boucher and Aires, 2023). Inflating effects are more commonly associated with non-statistical models.
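The dampening mechanism can be made concrete with ordinary least squares, used here as a simple stand-in for a statistical retrieval (it is not the NN used in this study). For an OLS fit with an intercept, the standard deviation of the fitted values equals |r| times the standard deviation of the target, so any imperfect fit (|r| < 1) shrinks the range. The synthetic data below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=5000)                         # a single predictor
y = 0.6 * x + rng.normal(0.0, 0.8, size=x.size)   # target with unexplained variability

# Ordinary least-squares fit: the prediction estimates the conditional mean E[y | x].
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

r = np.corrcoef(x, y)[0, 1]
ratio = y_hat.std() / y.std()
print(f"std(y_hat) / std(y) = {ratio:.3f}, |r| = {abs(r):.3f}")  # the two numbers match
```

Nonlinear regressors trained to minimize squared error also target the conditional mean, which is why they tend to show a similar, though not identical, shrinkage of extremes.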

As illustrated in the third column of Figure 5, the ASCAT H120 product shows widespread inflation (in green), largely due to an excessive number of pixels with a minimum SM of zero. Note that H120 is not trained on ERA5 and is therefore more independent of the reference than the NN-based results presented here. In contrast, our NN-based retrievals (MLPs and CNNs) rarely show inflation; they mainly exhibit dampening (in magenta), consistent with their statistical nature. Among the NN-based models, the localized versions (MLP3,0-l and CNN3,0-l) show reduced dampening, as also seen in Table 1, indicating a better characterization of SM extremes. By tailoring the model to regional characteristics, localization improves the representation of both minima and maxima, further enhancing retrieval realism.

6 Evaluation of daily soil moisture retrievals against in situ measurements

In this section, we evaluate our daily NN-based SM retrievals against in situ measurements from the ISMN dataset at 638 sites in the study area during the 2019 test period. For this assessment, we selected two representative models, MLP3,0 and CNN3,0, and compared them, together with the ERA5 and ASCAT H120 SM products, against the measurements. Each ISMN site corresponds to a specific geographic point, whereas the SM retrievals are provided at 0.25° resolution. Therefore, for each site, the nearest grid cell was selected to represent the corresponding gridded data. We first present time series examples from randomly selected sites, followed by a comprehensive statistical evaluation across all 638 sites.
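The nearest-grid-cell matching can be sketched as below. The grid bounds and the cell-centre convention are hypothetical placeholders (the study's exact 0.25° grid definition is not given here), and the site coordinates in the example are invented. On a regular latitude/longitude grid, the closest centre can be found independently along each axis.

```python
import numpy as np

# HYPOTHETICAL 0.25-degree grid of cell centres over CONUS; the real grid
# bounds and origin used in the study are not given here.
GRID_LATS = np.arange(25.0, 50.0 + 1e-9, 0.25)
GRID_LONS = np.arange(-125.0, -70.0 + 1e-9, 0.25)


def nearest_cell(site_lat, site_lon, lats=GRID_LATS, lons=GRID_LONS):
    """Indices (i, j) of the grid-cell centre closest to a point site.

    On a regular lat/lon grid the nearest centre can be found independently
    along each axis.
    """
    i = int(np.argmin(np.abs(lats - site_lat)))
    j = int(np.argmin(np.abs(lons - site_lon)))
    return i, j


# Hypothetical ISMN site location.
i, j = nearest_cell(40.1, -100.05)
print(GRID_LATS[i], GRID_LONS[j])  # 40.0 -100.0
```

The site's comparison series would then be the gridded product sampled at `(i, j)` for each day of 2019.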

6.1 Time series analysis

Figure 6 presents examples of daily SM time series in 2019 at six randomly selected in situ sites, along with the correlation between each product and the ground measurements. At these six sites, the ASCAT H120 product shows a relatively good correlation with the in situ data; however, it exhibits a pronounced bias in absolute values, as shown by its consistently offset range (orange lines) in Figure 6.

Figure 6

Figure 6. Soil moisture (SM) time series in 2019 at six in situ sites: in situ measurements (ISMN), the ASCAT-derived retrievals (H120 product, MLP3,0, and CNN3,0), and ERA5. The temporal correlation (rtem, denoted as r in each panel for brevity) between the in situ data and each SM product is given in each panel.

The other three products, our two NN-based retrievals and ERA5, show more reasonable SM ranges and better capture the observed temporal dynamics. ERA5 shows the highest agreement with ISMN data, with correlation coefficients exceeding 0.80 at all six sites, consistent with its assimilation of extensive in situ and satellite observations (Aires et al., 2005). CNN3,0 also performs well, closely following the ERA5 time series, as expected given that ERA5 SM served as the training target for the models. Interestingly, at one of the six sites (last panel of Figure 6), CNN3,0 surpasses ERA5, with a correlation of 0.83 against the in situ data. At most sites, CNN3,0 outperforms MLP3,0, further highlighting the advantages of convolutional architectures in capturing the spatial and temporal patterns of SM.

6.2 Statistical comparison

Table 2 presents the statistical performance of four SM products (H120, MLP3,0, CNN3,0, and ERA5) evaluated against in situ measurements at 638 sites during the 2019 test period. To complement this tabular summary, Figure 7a shows the probability density functions (PDFs) of three temporal metrics: correlation (rtem), bias (Biastem, m3/m3), and standard deviation (Stdtem, m3/m3).

Table 2

Table 2. Performance metrics of soil moisture (SM) retrievals compared to the ISMN in situ measurements across 638 sites for the year 2019.

Figure 7

Figure 7. (a) Probability density functions (PDFs) of temporal correlation (rtem), bias (Biastem, m3/m3), and standard deviation (Stdtem, m3/m3) for four SM products (the ASCAT-derived retrievals H120, MLP3,0, and CNN3,0, and ERA5), evaluated against 638 in situ sites during 2019. (b) Map of the temporal correlation (rtem) between MLP3,0 and in situ measurements, and the corresponding improvement map showing the correlation difference between CNN3,0 and MLP3,0 (Δrtem = rtem(CNN3,0) - rtem(MLP3,0)). Red (gray) indicates an increase (decrease) in correlation.

Across the 638 sites, the ASCAT H120 SM product shows a moderate median temporal correlation (rtem = 0.61), comparable to MLP3,0 (0.60) but lower than CNN3,0 (0.68) and ERA5 (0.75). Although it achieves the lowest median bias of all products (Biastem = 0.024 m3/m3), its standard deviation is relatively high (Stdtem = 0.080 m3/m3), indicating larger random errors than the alternatives. Additionally, Figure 7a shows that the H120 product has a longer negative tail in its bias distribution, and its standard deviation values extend beyond the typical range of the other products.

In contrast, ERA5 performs best overall, with the highest median correlation (rtem = 0.75) and relatively low errors (RMSEtem = 0.106, Biastem = 0.061, Stdtem = 0.060 m3/m3). This performance can be attributed to its nature as a reanalysis, which combines model simulations with observations and thereby enhances consistency with ground-based measurements (Aires et al., 2005). Our NN-based retrievals, MLP3,0 and CNN3,0, also show competitive performance. In particular, CNN3,0 achieves a temporal correlation of 0.68, approaching ERA5, whereas MLP3,0 reaches 0.60. Notably, the bias and standard deviation distributions of the CNN3,0 retrieval closely resemble those of ERA5 (Figure 7a), indicating that it provides more stable and reliable predictions than the other ASCAT-derived products.

Figure 7b provides spatial insight by mapping the temporal correlation (rtem) between MLP3,0 and the in situ measurements at the 638 sites. Lower correlations, indicated by yellow markers, are found primarily in the Rocky Mountains. The corresponding improvement map shows the change in temporal correlation from MLP3,0 to CNN3,0 (Δrtem = rtem(CNN3,0) - rtem(MLP3,0)). The increases in correlation (in red) over mountainous areas underscore the ability of CNN architectures to better capture spatial variability, outperforming MLPs in complex environments such as varied topography and heterogeneous mountain landscapes.

6.3 Effects of soil texture and land-cover types

To further investigate environmental controls on model performance, we evaluated the SM retrievals against in situ measurements within individual soil texture groups and land-cover classes. Figures 8 and 9 present the comparison metrics for the major soil texture classes and dominant land-cover types, respectively. Overall, ASCAT H120 exhibits smaller biases but higher standard deviations than both NN-based retrievals and ERA5 across most soil texture and land-cover categories, likely due to the retrieval’s calibration to local conditions (Aires et al., 2021b). Between the two NN models, CNN3,0 consistently outperforms MLP3,0, with higher correlations and lower RMSE, bias, and standard deviation across all classes. Across soil texture groups (Figure 8), CNN3,0, although trained on ERA5, performs better than ERA5, particularly for clay loam and silty clay loam in terms of correlation, and for most texture classes in terms of RMSE. Across land-cover types (Figure 9), CNN3,0 also outperforms ERA5 in correlation over mixed forest, and in RMSE over evergreen and deciduous forests, cropland, grassland, pasture, and developed areas.
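The boxplots in Figures 8 and 9 summarize a per-site score within each class. A standard-library sketch of that grouping is shown below with hypothetical data. The site-count threshold for omitting rare classes is not stated in the text, so `min_sites=5` is a placeholder, and quartile conventions may differ slightly from those of the plotting library used for the figures.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_class(site_scores, min_sites=5):
    """Median and quartiles of a per-site score, grouped by class.

    site_scores: iterable of (class_name, score) pairs, e.g. (soil texture, r_tem).
    Classes with fewer than `min_sites` sites are dropped (threshold hypothetical;
    keep it >= 2 because quartiles need at least two values).
    """
    groups = defaultdict(list)
    for cls, score in site_scores:
        groups[cls].append(score)
    summary = {}
    for cls, values in groups.items():
        if len(values) < min_sites:
            continue
        q1, _, q3 = quantiles(values, n=4)
        summary[cls] = {"n": len(values), "q1": q1, "median": median(values), "q3": q3}
    return summary


# Hypothetical per-site temporal correlations.
scores = [("loam", 0.70), ("loam", 0.60), ("loam", 0.80), ("loam", 0.50),
          ("loam", 0.65), ("sand", 0.40)]
result = summarize_by_class(scores)
print(result)  # only "loam" is kept; its median is 0.65
```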

Figure 8

Figure 8. Boxplots of performance metrics for soil moisture (SM) retrievals (i.e., ASCAT-derived retrievals: H120 product, MLP3,0, CNN3,0, and ERA5 reanalysis) compared with ISMN measurements, stratified by soil texture class (clay loam, silty clay loam, sandy clay loam, loam, silty loam, sandy loam, loam sandy). The number of sites within each class is shown in parentheses. Soil texture types represented by only a few sites (e.g., sand, silt, clay) are omitted.

Figure 9

Figure 9. Same as Figure 8, but stratified by major land-cover categories, including forests (evergreen, deciduous, mixed), croplands, shrubs, grasslands, pasture, and developed areas. Land-cover types represented by only a few sites (e.g., wetlands, barren) are omitted.

7 Conclusions and perspectives

This study presented a comprehensive evaluation of various NN-based models (MLPs and CNNs) for daily SM retrieval, with a particular focus on the role of localization. Overall, NN-based models demonstrate strong performance in capturing both temporal and spatial dynamics of SM, showing high correlations with the target ERA5 SM as well as in situ measurements.

Of the two model types evaluated, CNNs exploit spatial patterns more effectively and outperform MLPs. Our investigation of localization strategies suggests that physical variables should be preferred over purely geographic ones:

• Latitude and longitude, while useful for capturing the overall structure, do not fully convey finer details. Specifically, they tend to smooth variations even when the variable displays spatial heterogeneity, as observed in surface variables such as SM.

• Physical variables, particularly those linked to radiative transfer processes, preserve spatial heterogeneity and promote more physically consistent retrievals. Moreover, when CNNs are used, coordinate information can be implicitly encoded in input imagery, especially in localized CNN configurations.

Localized NN-based models (MLP-l and CNN-l) show improved performance in retrieving extreme values, highlighting their potential for specialized applications such as drought monitoring or flood forecasting. However, caution is needed concerning the risks of overfitting—especially for pixel-specific models like MLP-l—due to the limited sample size available for each localized model.

The study highlights several methodological considerations relevant to future SM retrieval research. ERA5 remains one of the most reliable large-scale reference datasets for SM, as it integrates physical modeling with the assimilation of diverse observational inputs (e.g., multiple satellite products, various precipitation datasets, and in situ measurements). Consequently, NN-based SM retrievals cannot be expected to surpass ERA5 skill globally. It is nevertheless important to acknowledge that ERA5 exhibits location-dependent biases, which NNs trained on ERA5 inevitably inherit. However, our results suggest that localization can mitigate some of these local errors. In certain cases, our retrieval (based on ASCAT backscatter and auxiliary variables) agrees more closely with in situ measurements than ERA5 does, and such improvements were observed for several soil texture and land-cover classes. These improvements may arise from various factors: for example, the models may extract observational signal components not fully exploited in the reanalysis, correct specific local biases present in ERA5, or better capture high-frequency variability at certain sites. This finding is consistent with previous work (Rodríguez-Fernández et al., 2019).

Overall, although a single-sensor retrieval cannot comprehensively outperform ERA5, our findings demonstrate that deep learning can extract high-quality SM information from ASCAT alone. This contributes to the advancement of independent satellite-based SM retrievals, which remain essential for climate analysis, model evaluation, long-term consistency assessments, and data assimilation (Aires et al., 2005). Furthermore, in many practical applications, the retrieved SM will ultimately be assimilated into a land surface model whose climatology is already based on ERA-type reanalyses. This further supports the use of ERA5 as an appropriate and consistent training target for the NN retrieval.

In future work, these deep learning approaches can be extended to other regions or even to the global scale. This could ultimately support the development of a long-term SM record by leveraging observations from multiple satellite sensors. However, integrating data across sensors poses notable challenges, including inter-sensor calibration and varying retrieval uncertainties; addressing these complexities is essential for generating a coherent multi-sensor SM product. Incorporating in situ measurements into training (e.g., via regional fine-tuning) is a promising approach, although technically challenging because of the scale mismatch between point and satellite measurements. Furthermore, deep learning techniques can exploit not only spatial patterns, as demonstrated in this study with CNNs, but also temporal information. We therefore plan to test transformers and embedding representation methods that perform retrieval over a full sequence of images rather than a single image at a specific time. Another direction is forecasting rather than retrieval alone: specifically, short- and medium-term SM forecasting using ConvLSTM models (Wang et al., 2024; Rabiei et al., 2025a). Moreover, given that SM at different depths is strongly interdependent, with dependencies tied to soil texture and geology, our retrieval methods could also be extended to estimate moisture in layers deeper than the surface layer considered here.

The methods proposed in this study are general and hold potential for a wide range of applications beyond SM retrieval. For example, localized CNN approaches could be used to retrieve surface water extent at high spatial resolution. These methods can also be applied in forward mode, using surface parameters such as land surface temperature, vegetation cover, and surface roughness to estimate land surface emissivities or even satellite-observed brightness temperatures. Such a forward model could be integrated into data assimilation frameworks to support numerical weather prediction systems. Finally, although this study focused on methodological development at 0.25° and daily resolution, higher-resolution near-real-time SM maps would benefit practical applications such as precision agriculture (for optimal irrigation), hydrological modeling, fire risk management during dry periods, and the identification of flood-prone areas (Huang et al., 2025). Higher resolution could be achieved by using higher-resolution sensors (e.g., Sentinel) or by directly downscaling the retrieved SM fields within a transformer-based retrieval scheme.

Data availability statement

Publicly available datasets were analyzed in this study. The ASCAT data supporting this study can be obtained from the Metop ASCAT SSM CDR (H SAF, 2021b) at https://hsaf.meteoam.it/ (last access: 22 January 2025). Porosity data (Rodell et al., 2004) are available from https://ldas.gsfc.nasa.gov/gldas/soils (last access: 31 May 2025). The ERA5 reanalysis dataset can be downloaded from https://cds.climate.copernicus.eu (Hersbach et al., 2023) (last access: 15 March 2025). Finally, the International Soil Moisture Network data are available at https://ismn.earth/en/data/ (Dorigo et al., 2021) (last access: 3 March 2025).

Author contributions

LD: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. The CERISE project (grant agreement no. 101082139) is funded by the European Union.

Acknowledgements

The author thanks Filipe Aires (LIRA) and Victor Pellet (LMD) for valuable discussions and for their comments and suggestions on the original draft of the manuscript. She also thanks Patricia de Rosnay and Peter Weston (ECMWF) for interesting discussions.

Conflict of interest

Author LD was employed by Estellus.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. Generative AI tools, including Grammarly and ChatGPT, were used solely for language editing and refinement. All ideas, interpretations, and original text were created by the author. AI assistance was limited to enhancing clarity, grammar, and style, without contributing to the conceptual or intellectual content of the work.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Author disclaimer

Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the Commission. Neither the European Union nor the granting authority can be held responsible for them.

References

Aires, F., Schmitt, M., Chedin, A., and Scott, N. (1999). The “weight smoothing” regularization of MLP for Jacobian stabilization. IEEE Trans. Neural Netw. 10, 1502–1510. doi:10.1109/72.809096

Aires, F., Prigent, C., Rossow, W. B., and Rothstein, M. (2001). A new neural network approach including first guess for retrieval of atmospheric water vapor, cloud liquid water path, surface temperature, and emissivities over land from satellite microwave observations. J. Geophys. Res. Atmos. 106, 14887–14907. doi:10.1029/2001JD900085

Aires, F., Prigent, C., and Rossow, W. B. (2004). Neural network uncertainty assessment using Bayesian statistics: a remote sensing application. Neural Comput. 16, 2415–2458. doi:10.1162/0899766041941925

Aires, F., Prigent, C., and Rossow, W. B. (2005). Sensitivity of satellite microwave and infrared observations to soil moisture at a global scale: 2. Global statistical relationships. J. Geophys. Res. Atmos. 110. doi:10.1029/2004JD005094

Aires, F., Boucher, E., and Pellet, V. (2021a). Convolutional neural networks for satellite remote sensing at coarse resolution. Application for the SST retrieval using IASI. Remote Sens. Environ. 263, 112553. doi:10.1016/j.rse.2021.112553

Aires, F., Weston, P., de Rosnay, P., and Fairbairn, D. (2021b). Statistical approaches to assimilate ASCAT soil moisture information—I. Methodologies and first assessment. Q. J. R. Meteorological Soc. 147, 1823–1852. doi:10.1002/qj.3997

Atkinson, P. M., and Tatnall, A. R. L. (1997). Introduction neural networks in remote sensing. Int. J. Remote Sens. 18, 699–709. doi:10.1080/014311697218700

Batchu, V., Nearing, G., and Gulshan, V. (2023). A deep learning data fusion model using Sentinel-1/2, SoilGrids, SMAP, and GLDAS for soil moisture retrieval. J. Hydrometeorol. 24, 1789–1823. doi:10.1175/jhm-d-22-0118.1

Bell, J. E., Palecki, M. A., Baker, C. B., Collins, W. G., Lawrimore, J. H., Leeper, R. D., et al. (2013). U.S. Climate Reference Network soil moisture and temperature observations. J. Hydrometeorol. 14, 977–988. doi:10.1175/jhm-d-12-0146.1

Bernhardt, J., Carleton, A. M., and LaMagna, C. (2018). A comparison of daily temperature-averaging methods: spatial variability and recent change for the CONUS. J. Clim. 31, 979–996. doi:10.1175/JCLI-D-17-0089.1

Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press. doi:10.1093/oso/9780198538493.001.0001

Boucher, E., and Aires, F. (2023). Improving remote sensing of extreme events with machine learning: land surface temperature retrievals from IASI observations. Environ. Res. Lett. 18, 024025. doi:10.1088/1748-9326/acb3e3

Boucher, E., Aires, F., and Pellet, V. (2023). Towards a new generation of artificial-intelligence-based infrared atmospheric sounding interferometer retrievals of surface temperature: Part I – Methodology. Q. J. R. Meteorological Soc. 149, 1180–1196. doi:10.1002/qj.4447

Campbell, G. S. (1985). Soil physics with BASIC: transport models for soil-plant systems. Amsterdam: Elsevier.

Chen, Y., López-Moreno, I., Sainath, T. N., Visontai, M., Álvarez, R., and Parada, C. (2015). Locally-connected and convolutional neural networks for small footprint speaker recognition. Dresden, Germany: Interspeech.

Dorigo, W., Xaver, A., Vreugdenhil, M., Gruber, A., Dostálová, A., Sanchis-Dufau, A. D., et al. (2013). Global automated quality control of in situ soil moisture data from the international soil moisture network. Vadose Zone J. 12, 1–21. doi:10.2136/vzj2012.0097

Dorigo, W., Himmelbauer, I., Aberer, D., Schremmer, L., Petrakovic, I., Zappa, L., et al. (2021). The International Soil Moisture Network: serving Earth system science for over a decade. Hydrology Earth Syst. Sci. 25, 5749–5804. doi:10.5194/hess-25-5749-2021

Drusch, M., Wood, E. F., and Gao, H. (2005). Observation operators for the direct assimilation of TRMM microwave imager retrieved soil moisture. Geophys. Res. Lett. 32. doi:10.1029/2005GL023623

El Hajj, M., Baghdadi, N., Zribi, M., Belaud, G., Cheviron, B., Courault, D., et al. (2016). Soil moisture retrieval over irrigated grassland using X-band SAR data. Remote Sens. Environ. 176, 202–218. doi:10.1016/j.rse.2016.01.027

Goïta, K., Gonzalez-Rubio, R., Bénié, G. B., Royer, A., and Michaud, F. (1994). Literature review of artificial neural networks and knowledge-based systems for image analysis and interpretation of data in remote sensing. Can. J. Electr. Comput. Eng. 19, 53–61. doi:10.1109/CJECE.1994.6592069

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning. MIT Press. Available online at: http://www.deeplearningbook.org.

Greifeneder, F., Notarnicola, C., and Wagner, W. (2021). A machine learning-based approach for surface soil moisture estimations with Google Earth Engine. Remote Sens. 13, 2099. doi:10.3390/rs13112099

Guo, Z., Dirmeyer, P. A., Hu, Z.-Z., Gao, X., and Zhao, M. (2006). Evaluation of the second global soil wetness project soil moisture simulations: 2. Sensitivity to external meteorological forcing. J. Geophys. Res. Atmos. 111. doi:10.1029/2006JD007845

H SAF (2021a). Algorithm theoretical baseline document (ATBD) Metop ASCAT surface soil moisture climate data record v7 12.5 km sampling (H119) and extension (H120).

H SAF (2021b). ASCAT surface soil moisture climate data record v7 12.5 km sampling - Metop. doi:10.15770/EUM_SAF_H_0009

Han, Q., Zeng, Y., Zhang, L., Wang, C., Prikaziuk, E., Niu, Z., et al. (2023). Global long term daily 1 km surface soil moisture dataset with physics informed machine learning. Sci. Data 10, 101. doi:10.1038/s41597-023-02011-7

Hastie, T., Tibshirani, R., and Friedman, J. (2009). Neural networks. New York, NY: Springer, 389–416. doi:10.1007/978-0-387-84858-7_11

Hengl, T. (2018). Soil texture classes (USDA system) for 6 soil depths (0, 10, 30, 60, 100 and 200 cm) at 250 m (v0.2). doi:10.5281/zenodo.2525817

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., et al. (2023). ERA5 hourly data on single levels from 1940 to present. doi:10.24381/cds.adbb2d47

Huang, J., Sehgal, V., Alvarez, L. V., Brocca, L., Cai, S., Cheng, R., et al. (2025). Remotely sensed high-resolution soil moisture and evapotranspiration: bridging the gap between science and society. Water Resour. Res. 61, e2024WR037929. doi:10.1029/2024WR037929

Kolassa, J., Aires, F., Polcher, J., Prigent, C., Jimenez, C., and Pereira, J. M. (2013). Soil moisture retrieval from multi-instrument observations: information content analysis and retrieval methodology. J. Geophys. Res. Atmos. 118, 4847–4859. doi:10.1029/2012JD018150

Kolassa, J., Gentine, P., Prigent, C., and Aires, F. (2016). Soil moisture retrieval from AMSR-E and ASCAT microwave observation synergy. Part 1: satellite data analysis. Remote Sens. Environ. 173, 1–14. doi:10.1016/j.rse.2015.11.011

Leavesley, G. H., David, O., Garen, D. C., Nrcs-Usda, N., Goodbody, A. G., Lea, J. K., et al. (2010). A modeling framework for improved agricultural water-supply forecasting

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324. doi:10.1109/5.726791

Liu, Y., Dorigo, W., Parinussa, R., de Jeu, R., Wagner, W., McCabe, M., et al. (2012). Trend-preserving blending of passive and active microwave soil moisture retrievals. Remote Sens. Environ. 123, 280–297. doi:10.1016/j.rse.2012.03.014

Madadikhaljan, M., and Schmitt, M. (2025). Geolocation-aware deep coding. PFG – J. Photogrammetry, Remote Sens. Geoinformation Sci. 93, 3–18. doi:10.1007/s41064-024-00328-5

Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2017). Convolutional neural networks for large-scale remote-sensing image classification. IEEE Trans. Geoscience Remote Sens. 55, 645–657. doi:10.1109/TGRS.2016.2612821

Mahara, A., and Rishe, N. (2023). Integrating location information as geohash codes in convolutional neural network-based satellite image classification. IPSI Trans. Internet Res. 19, 24–30. doi:10.58245/ipsi.tir.2302.04

Ochsner, T. E., Cosh, M. H., Cuenca, R. H., Dorigo, W. A., Draper, C. S., Hagimoto, Y., et al. (2013). State of the art in large-scale soil moisture monitoring. Soil Sci. Soc. Am. J. 77, 1888–1919. doi:10.2136/sssaj2013.03.0093

Pellet, V., Aires, F., Boucher, E., and Volden, E. (2025). Enhancing soil moisture statistical retrieval from SMOS using partial convolutions and localization strategies. J. Appl. Meteorology Climatol. 64, 1951–1965. doi:10.1175/JAMC-D-25-0041.1

Petchiappan, A., Steele-Dunne, S. C., Vreugdenhil, M., Hahn, S., Wagner, W., and Oliveira, R. (2022). The influence of vegetation water dynamics on the ASCAT backscatter–incidence angle relationship in the Amazon. Hydrology Earth Syst. Sci. 26, 2997–3019. doi:10.5194/hess-26-2997-2022

Prigent, C., Aires, F., Rossow, W. B., and Robock, A. (2005). Sensitivity of satellite microwave and infrared observations to soil moisture at a global scale: relationship of satellite observations to in situ soil moisture measurements. J. Geophys. Res. Atmos. 110. doi:10.1029/2004JD005087

Rabiei, S., Babaeian, E., and Grunwald, S. (2025a). Deep learning-based short- and mid-term surface and subsurface soil moisture projections from remote sensing and digital soil maps. Remote Sens. 17, 3219. doi:10.3390/rs17183219

Rabiei, S., Babaeian, E., and Grunwald, S. (2025b). Surface and subsurface soil moisture estimation using fusion of SMAP, NLDAS-2, and SOLUS100 data with deep learning. Remote Sens. 17, 659. doi:10.3390/rs17040659

Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., et al. (2004). The global land data assimilation system. Bull. Am. Meteorological Soc. 85, 381–394. doi:10.1175/bams-85-3-381

Rodríguez-Fernández, N. J., Aires, F., Richaume, P., Kerr, Y. H., Prigent, C., Kolassa, J., et al. (2015). Soil moisture retrieval using neural networks: application to SMOS. IEEE Trans. Geoscience Remote Sens. 53, 5991–6007. doi:10.1109/TGRS.2015.2430845

Rodríguez-Fernández, N., de Rosnay, P., Albergel, C., Richaume, P., Aires, F., Prigent, C., et al. (2019). SMOS neural network soil moisture data assimilation in a land surface model and atmospheric impact. Remote Sens. 11, 1334. doi:10.3390/rs11111334

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature 323, 533–536. doi:10.1038/323533a0

Saxton, K. E., and Rawls, W. J. (2006). Soil water characteristic estimates by texture and organic matter for hydrologic solutions. Soil Sci. Soc. Am. J. 70, 1569–1578. doi:10.2136/sssaj2005.0117

Schaefer, G. L., Cosh, M. H., and Jackson, T. J. (2007). The USDA Natural Resources Conservation Service Soil Climate Analysis Network (SCAN). J. Atmos. Ocean. Technol. 24, 2073–2077. doi:10.1175/2007jtecha930.1

Seneviratne, S. I., Corti, T., Davin, E. L., Hirschi, M., Jaeger, E. B., Lehner, I., et al. (2010). Investigating soil moisture–climate interactions in a changing climate: a review. Earth-Science Rev. 99, 125–161. doi:10.1016/j.earscirev.2010.02.004

Shaw, B. L., Pielke, R. A., and Ziegler, C. L. (1997). A three-dimensional numerical simulation of a Great Plains dryline. Mon. Weather Rev. 125, 1489–1506. doi:10.1175/1520-0493(1997)125<1489:ATDNSO>2.0.CO;2

Singh, A., and Gaurav, K. (2023). Deep learning and data fusion to estimate surface soil moisture from multi-sensor satellite images. Sci. Rep. 13, 2251. doi:10.1038/s41598-023-28939-9

Skafte, N., Jørgensen, M., and Hauberg, S. R. (2019). “Reliable training and estimation of variance networks,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems. Vancouver, BC, Canada, December 8–14, 2019. Editors H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (NeurIPS). 6323–6333.

Soci, C., Hersbach, H., Simmons, A., Poli, P., Bell, B., Berrisford, P., et al. (2024). The ERA5 global reanalysis from 1940 to 2022. Q. J. R. Meteorological Soc. 150, 4014–4048. doi:10.1002/qj.4803

U.S. Geological Survey (USGS) (2024). Annual NLCD Collection 1 science products. doi:10.5066/P94UXNTS

Vereecken, H., Huisman, J. A., Hendricks Franssen, H. J., Brüggemann, N., Bogena, H. R., Kollet, S., et al. (2015). Soil hydrology: recent methodological advances, challenges, and perspectives. Water Resour. Res. 51, 2616–2633. doi:10.1002/2014WR016852

Wagner, W., Noll, J., Borgeaud, M., and Rott, H. (1999). Monitoring soil moisture over the Canadian prairies with the ERS scatterometer. IEEE Trans. Geoscience Remote Sens. 37, 206–216. doi:10.1109/36.739155

Wagner, W., Hahn, S., Kidd, R., Melzer, T., Bartalis, Z., Hasenauer, S., et al. (2013). The ASCAT soil moisture product: a review of its specifications, validation results, and emerging applications. Meteorol. Z. 22, 5–33. doi:10.1127/0941-2948/2013/0399

Wang, Y., Shi, L., Hu, Y., Hu, X., Song, W., and Wang, L. (2024). A comprehensive study of deep learning for soil moisture prediction. Hydrology Earth Syst. Sci. 28, 917–943. doi:10.5194/hess-28-917-2024

Yu, F., and Koltun, V. (2016). “Multi-scale context aggregation by dilated convolutions,” in International Conference on Learning Representations (ICLR).

Zhang, X., Sun, X., and Lin, Z. (2025). Improving soil moisture prediction using gaussian process regression. Smart Agric. Technol. 11, 100905. doi:10.1016/j.atech.2025.100905

Keywords: ASCAT, deep learning, localization, neural network, soil moisture retrieval

Citation: Dinh LA (2026) ASCAT soil moisture retrieval using deep learning: a focus on localization strategy. Front. Remote Sens. 6:1718353. doi: 10.3389/frsen.2025.1718353

Received: 03 October 2025; Accepted: 15 December 2025;
Published: 23 January 2026.

Edited by:

Amen Al-Yaari, Université Paris-Sorbonne, France

Reviewed by:

Xiangzhuo Liu, INRA Centre Provence-Alpes-Côte d’Azur, France
Saman Rabiei, University of Florida, United States

Copyright © 2026 Dinh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lan Anh Dinh, lan-anh.dinh@obspm.fr

Present address: Lan Anh Dinh, LIRA, Observatoire de Paris, Université PSL, Sorbonne Université, CNRS, Paris, France
