CLOINet: Ocean state reconstructions through remote-sensing, in-situ sparse observations and Deep Learning

Combining remote-sensing data with in-situ observations to achieve a comprehensive 3D reconstruction of the ocean state presents significant challenges for traditional interpolation techniques. To address this, we developed the CLuster Optimal Interpolation Neural Network (CLOINet), which combines the robust mathematical framework of the Optimal Interpolation (OI) scheme with a self-supervised clustering approach. CLOINet efficiently segments remote sensing images into clusters to reveal non-local correlations, thereby enhancing fine-scale oceanic reconstructions. We trained our network using outputs from an Ocean General Circulation Model (OGCM), which also facilitated various testing scenarios. Our Observing System Simulation Experiments aimed to reconstruct deep salinity fields using Sea Surface Temperature (SST) or Sea Surface Height (SSH), alongside sparse in-situ salinity observations. The results showcased a significant reduction in reconstruction error up to $40\%$ and the ability to resolve scales $50\%$ smaller compared to baseline OI techniques. Remarkably, even though CLOINet was trained exclusively on simulated data, it accurately reconstructed an unseen SST field using only glider temperature observations and satellite chlorophyll concentration data. This demonstrates how deep learning networks like CLOINet can potentially lead the integration of modeling and observational efforts in developing an ocean digital twin.


Introduction
Nowadays, there is an increased consciousness of the role played by the ocean in many crucial aspects of human safety, health, and well-being due to the cumulative impacts of climate change, unsustainable exploitation of marine resources, pollution, and uncoordinated development (UNESCO, 2019,Pascual et al. [2021]).In response to these challenges, which UNESCO has encapsulated in 10 objectives for the Ocean Decade (2021)(2022)(2023)(2024)(2025)(2026)(2027)(2028)(2029)(2030), the European Union is endeavoring to develop a digital twin of the ocean.The concept of digital twins involves creating a digital representation of real-world entities or processes, based on both real-time and historical observations, to depict the past and present and to model potential future scenarios.
In the ocean case and especially to address climate change-related concerns, one major challenge is understanding the state and evolution of the ocean's interior.Its stratification significantly influences large-scale integrated variables like ocean heat content, acidification, and oxygenation [Wang et al., 2018, Durack et al., 2014].Moreover, numerous studies have highlighted the importance of resolving submesoscale dynamics to account for the majority of vertical ocean transport, which is vital for carbon export, fisheries, nutrient availability, and pollution displacement [Pascual et al., 2017].These challenges underscore the need for high-resolution, three-dimensional representations of the ocean state.High-resolution numerical models and data assimilation techniques, which align model outputs with actual observations, are currently the most common solutions [Carrassi et al., 2018, Mourre et al., 2004].
Operational simulations now assimilate near-real-time observations, including in-situ (ship-based observations, underwater gliders, and floats) and remote sensing data.CIT Satellite observations provide frequent global snapshots of the sea surface, for instance Sea Surface Temperature and Chlorophyll concentration images offer resolutions as fine as 1 km on a daily basis.In contrast, the current capabilities of remote altimeters are limited to a 200 km wavelength for the global ocean at mid-latitudes and about 130 km for the Mediterranean Sea [Ballarotta et al., 2019], though significant advancements are upcoming with the Surface Water and Ocean Topography (SWOT) mission successfully launched in December 2022 [Morrow et al., 2019].Notably, Sea Surface Height (SSH) data are unaffected by cloud cover.However, the uncertainties regarding the ocean interior remain significant due to the sparse distribution of in-situ observations in time and space [Siegelman et al., 2019].As a result, while data-assimilating models adhere to physical balances, they still lack accuracy [Arcucci et al., 2021].
The ocean twin strategy proposes data-driven approaches as a complementary method for revealing the ocean state.In previous oceanographic studies, multivariate methods allowed to elaborate three-dimensional hydrographic fields relying on their vast in-situ measurements collected during ocean campaigns [Cutolo et al., 2022, Gomis et al., 2001].However, these methods are not easily scalable to a global observing system due to the sheer number of parameters involved, such as correlation lengths.Machine learning techniques offer a solution to these scalability issues, as the models are directly learned from the data.A key challenge for these techniques is the need for a substantial quantity of realistic training data.General circulation and process study models play a new role here, providing a cost-effective way to generate large datasets that adhere to ocean physics.Even datasets that only approximately match the true ocean state can be valuable, provided they encompass a wide range of scenarios.This last point is especially crucial in preventing the risk of deep networks memorizing the input climatology rather than capturing the actual ocean dynamics.Such a focus ensures that the networks can understand and adapt to scenarios that significantly deviate from the average, rather than being confined to repetitive patterns.To effectively generalize beyond their training data, neural networks require careful design to preserve relevant input features across their layers.In this context, explainable AI aims to advance beyond the black-box applications typical in ocean remote sensing studies, promoting a deeper understanding of the model workings ([Zhu et al., 2017].Despite these difficulties, recent studies have demonstrated the potential of deep-learning methods for various dynamical system tasks.These range from idealized situations [Fablet et al., 2021] to realistic case studies, such as interpolating missing data in satellite-derived observations of sea surface dynamics [Barth et al., 2020, Manucharyan et al., 2021, Fablet et al., 2020].With regard to reconstructing hydrographic profiles from satellite data, there's a spectrum of approaches: from proof-of-concept studies using self-organizing maps (SOMs) and neural networks (Charantonis et al. [2015], Gueye et al. [2014]) and feed-forward or long short-term memory (LSTM) neural networks [Contractor and Roughan, 2021, Sammartino et al., 2020, Jiang et al., 2021, Fablet et al., 2021] as well as [Pauthenet et al., 2022] relying instead on multilayer perceptron.Even considering these past works the interpolation of temperature and salinity profiles given some in-situ and sea surface information is an open challenge.
In this study, we introduce an innovative modular neural network designed to seamlessly integrate remote-sensing images with in-situ observations for a complete 3D reconstruction of the ocean state.This integration is underpinned by the Optimal Interpolation (OI) scheme's mathematical principles [Gandin, 1966].Unlike traditional applications of OI, which typically use Euclidean distance to estimate the correlation between points, our approach involves computing distances within a specially designed latent space.A specific module within our neural network transform all our input information into this latent space made of 'clusters'.Within these clusters, non-local correlations become more easily identifiable and can be effectively applied to enhance the correlation matrix.Like attention mechanisms in advanced neural models [Vaswani et al., 2017], which focus on key aspects in large datasets for tasks such as language processing or image recognition, our neural network module similarly identifies crucial correlational patterns through the latent space of clusters.
We privileged a network structure composed of independent nested modules to facilitate the understanding and analysis of its internal information flow from the input data to the covariance structure.To the best of our knowledge, this is the first work in which neural networks achieve the most optimal combination of remote-sensing and in-situ observations without previous knowledge of the study area's climatology.This study is structured as follows: section 2 presents the main synthetic dataset that we used for the training and testing and some real observations for some preliminar use case scenario.All the details regarding the network architecture are in section 3 while the results are shown in section 4.

Data
Neural networks need large amounts of data to be trained appropriately.A common choice in oceanography where such a significant quantity of actual observations are unavailable is relying on numerical models.In our case, we chased NATL60, a simulation based on the Nucleus for European Modelling of the Ocean described.We used the fields of this model to simulate both remote-sensing and in-situ observations in a so-called Observing System Simulation Experiment (OSSE).The model output is sampled in these experiments to replicate the different types of partial observations available.The advantage is that we can quickly check the obtained improvements since the model output also provides the ground truth we aim to reconstruct.The danger of what is usually called called "supervised learning" only aiming to minimize the discrepancy with the provided ground truth is that the network weights memorize the "right answers" so in our context the model climatology.We faced this problem, including two self-supervised terms in our loss function as we describe later but also accurately selecting a highly varying training and test dataset as presented here in subsection 2.1.
Finally, we proved the generalization capabilities of our network, testing it with actual multi-platform observations.In particular, we used the remote-sensing products of Sea Surface Temperature (SST) and Chlorophyll-a concentration (CHL) from CMEMS, together with temperature observations from gliders, as described in subsection 2.2.

eNATL60 based OSSE
Our primary experiments utilized the eNATL60 configuration of the Nucleus for the European Modelling of the Ocean (NEMO) model [Gurvan et al., 2022], featuring a 1/60 • horizontal resolution and 300 vertical levels across the North Atlantic.This high-resolution configuration is essential for understanding ocean dynamics, particularly for surface oceanic motions down to 15 km, which aligns with SWOT observations (Ajayi et al. [2020]).We direct readers to this work for a detailed understanding of NATL60's capabilities.Additionally, numerous studies have employed the non-extended version of NATL60 for resolving fine-scale dynamical processes ( [Metref et al., 2019, 2020, Fresnay et al., 2018, Amores et al., 2018]).
For our training and testing data, we utilized daily averages of Sea Surface Temperature (SST) and Sea Surface Height (SSH), both individually and combined, from the eNATL60 simulation spanning an entire year.Alongside these, we gathered in-situ salinity observations at three specific depths: 5m, 75m, and 150m.Our focus was then to reconstruct the 2D salinity fields at these depths.In particular our analysis predominantly focused on the 5m and 150m depths, selected to assess the robustness of our model both within and beyond the mixed-layer depth.To ensure that our network's training and testing in-situ observations mirrored real oceanographic conditions, we adopted two distinct sampling strategies: random and regular.This approach allowed us to evaluate the network's performance in various realistic observational scenarios.The random strategy selects N domain points based on a uniform distribution, while the regular strategy uses a homogeneous grid sampling with a fixed spacing of δx.By varying N and δx, we conducted different experiments to observe metric variations.
Our focus was on two marine areas: the subpolar northwest Atlantic for training, and the Western Mediterranean Sea for testing.Both regions are notable for SWOT passages during its rapid-sampling phase (see Figure 1).The Mediterranean region, in particular, is known for its dynamic oceanographic characteristics and has been extensively studied through in-situ and remote-sensing methods ( Ruiz et al. [2009]).Using different regions for training and testing helps prevent overfitting in the neural network.Overfitting occurs when a model learns the specifics and noise in the training data to an extent that interfere with the model's performance on new data.Since the climatology of the northwest Atlantic differs significantly from that of the Western Mediterranean Sea, we ensure that our network is not just memorizing patterns from the training data but is effectively learning to generalize across different oceanographic contexts.Additionally, we diversified the dataset by sampling the same day with varying N or δx values.
For simplicity, our approach assumes a synoptic scenario, where all observations occur simultaneously.Future work will address the non-synoptic nature of actual sampling and explore how the network accommodates this.Furthermore, in this study, we did not incorporate simulated noise or measurement errors into our data, opting to explore these aspects in subsequent research.Despite this, the practical effectiveness of our network is demonstrated through tests using actual observational data, details of which are provided in the following subsection.

Remote-sensing observations
In our study, we have used Sea Surface Temperature (SST) and Ocean Color (CHL) imagery from the 18th of February, 2022, distributed by CMEMS.The CHL has 1 km spatial resolution, and it is a level-3 product obtained by multi-Sensor processing from OceanColor [Volpe et al., 2019].The SST also has a 1 km spatial resolution and it is basedd on level-2 product based on multi-channel sea surface temperature (SST) retrievals, which it has generated in real-time from the Infrared Atmospheric Sounding Interferometer (IASI) on the European Meteorological Operational-A (MetOp-A) satellite.

Glider Observations
Gliders are autonomous underwater vehicles that allow sustained collection at high spatial resolution (1 km) and low costs compared to conventional oceanographic methods.Many studies confirmed the feasibility of using coastal and deep gliders to monitor the spatial and low-frequency variability of the coastal ocean [Zarokanellos et al., 2022, Heslop et al., 2012, Ruiz et al., 2019, Alvarez et al., 2007]).In this work we used the observations from two gliders in the Balearic Sea as a part of the Calypso 2022 experiment.The two gliders carried out a suite of sensors that measure temperature, conductivity and pressure (CTD), dissolved oxygen (oxygen optode), Chlorophyll fluorescence and Turbidity (FNLTU).The two gliders were programmed to profile from the surface up to 700 m with a vertical speed of 0.18 ± 0.02 m/s and moved horizontally at approximately 20˘24 km per/day.Data were processed following the methodology described in Troupin et al. [2015].In this study, we have used the temperature data at 15 m from the 10th of February until the 18th of February.

Methods
When sparse observations are available, the most common technique that has been adopted in oceanography and in different fields of science using a gridded product is Optimal Interpolation (OI) [Gandin, 1966].The technique relies on a solid mathematics basis and has been the state-of-the-art approach for many geophysical products until now.Since the proposed neural approach and specifically our prior builds over the OI framework we reviewed it in subsection 3.1.Then, we introduce CLOINet our neural approach and its submodules in subsection 3.2.Lastly, we present the metrics we used for bench-marking purposes.

Baseline: OINet
A common approach to explain the OI math start considering y as the vector containing all the observations we have of the true state x, which is unknown.We can relate them with the following observation model: where H is the observation (or masking) operator, and ϵ is the observation error.Under Gaussian hypotheses for ϵ and the prior on x, we can obtain the best possible estimation of true state x s given the observations y through a linear operator K (the Kalman gain) (Welch, G., Bishop, G. 1995): where R is the observation error covariance matrix and B is the error covariance specific of the analysis.In Equation 3, we are assuming an a-priori knowledge of both R and B, which could be theoretically obtained by repeating the same experiments many times.Practically, a parameterized covariance matrix is often used to substitute the complete climatology covariances (Wu, 1995;Gaspari Cohn, 1999).The most common parametrization for this matrix is a Gaussian-shaped correlation, depending only on the points' distances and pre-determined correlation lengths.So for two generic position vectors r i and r j , we have: where the sum for dimension n considers the squared difference of the components of the position vectors r i,n and r j,n divided by the squared nth correlation length c n .Regarding the observation error matrix, we asssume from now on that it is diagonal: A different case where the observation errors are correlated is possible.However, R is often assumed diagonal to reduce computational costs [Miyoshi and Kondo, 2013].Finally, inserting B and R in Equation 3 and then in Equation 2we can compute our estimated field x s .
In our experiments, we established a baseline method with OINet, a simple neural network, that automatically discover the OI correlation lengths among different variables (SST, SSH, and salinity) and dimensions.OINet is then provided with the same input data as CLOINet, including surface fields (SST and/or SSH) and in-situ salinity observations.It operates in a two-step process: the first step involves transforming the multivariate surface fields into a unified field, making it compatible for being used with the salinity observations.The second step is to estimate the three correlation lengths specific to the current set of observations.While the first step involves 2 convolutional layers the second one is a simple feed-forward neural networks able to process a generic number of O observations (see the bottom part of Figure 2).
Beyond the parameter estimation this module is simply realizing an OI using the formula in Equation 4 to calculate the covariances.Notably this approach not only automates the tuning of parameters but also leverages GPU power for more efficient interpolation computations

CLuster enhanced Optimal Interpolation Network
Ocean dynamics often display non-local and anisotropic patterns, which traditional Optimal Interpolation (OI) methods struggle to account for effectively.The main challenge with OI lies in its correlation function assumptions, which may not accurately reflect the actual physical conditions of the ocean.For instance, as seen in Equation 4OI typically presumes that points in close proximity are strongly correlated, while distant points are not.However, oceanographic phenomena can exhibit the opposite behavior.For example, ocean fronts, characterized by narrow zones with strong horizontal density gradients, act as boundaries between water masses with distinct physical and bio-optical properties.
Conversely, in dynamic ocean features like meanders and eddies, water masses can remain similar over vast distances.
Here, we aim to benefit from the wealth of information from remote sensing regarding the shape of the ocean features, whether they belong to the mesoscale or the submesoscale.The key idea is grouping a set of objects in such a way that each object is more similar to the objects belonging to its same group (called a cluster) than the rest.This procedure in statistics is called clustering.Applying this concept to reconstructing the ocean state, our approach is to reveal non-local correlations by clustering grid points that are part of the same oceanic features.This led us to develop CLOINet (Cluster-enhanced Optimal Interpolation Net), an end-to-end system designed to optimally interpolate sparse in-situ observations using available remote-sensing images.CLOINet is able to process any kind of surface fields (2D images) and in-situ observations (2D masks and observation values).Its main submodule is CLuNet, which transforms 2D fields into fuzzy clusters.While satellite images could directly been passed to this module in-situ observation profiles are initially processed by OINet, which serves as a prior, converting them into images.Finally a further submodule, RefiNet, module merges the fuzzy clusters from both surface fields and observation priors into a final cluster set.Within this latent cluster space an alternative distance could replace the euclidean distance allowing a better estimation of B and consequently obtains the reconstructed field x s .
Our network structure allows a joint training of all modules, minimizing their specific loss function terms summed up in a global loss function.Convolutional Neural Networks (CNN) layers.Here following, we describe the details of the network submodules and how we obtained the interpolation in an end-to-end scheme also summarized in Figure 2).

Clusters space transformation: CluNet
The first module of our scheme, called CluNet is in charge of transform any images into a set of clusters.Piratically speaking it segments 2D images (like the remote-sensing one or the observations priors) into C clusters of similar points.In this context, we consider two points similar according to their positions (as in Equation 4) but also their values in the input 2D fields.In particular, we worked within the so-called "fuzzy logic", where the membership function m jk , which expresses how much the j point belongs to the k cluster, could assume every value between 0 and 1. Considering this continuous range means that each grid point could be part of more than one cluster as long as the following normalization holds: For its non-binary logic, this clustering technique is called "Fuzzy Clustering" or soft k-means.being this last the simpler binary case in which m jk could be just 1 or 0. CluNet takes the remote-sensing images as input and gives the tensor composed by all the m jk through various CNN and finally a softmax layer to guarantee Equation 6.The associated training loss, referred to as a Robust fuzzy C-means [Chen et al., 2021] loss, is composed of two terms: y is the vector containing the surface field that we want to cluster with y j its value at point j in our domain Ω. q is a parameter that satisfies q ≥ 1 and controls the amount of fuzzy overlap between clusters.Minimizing the first term achieves that points with high membership function for the k cluster should be similar to its center v k defined as follows: The second term guarantees the membership function's spatial smoothness, forcing the j point to have a similar value to its neighborhood N j .The parameter β controls the intensity of this constraint.
In summary, to obtain the clustering, we minimize Equation 7with respect to the parameters of the CNN layers included in CluNet, which stand in the θ vector.Since in this loss term, we do not directly provide any ground truth (i.e., the best way of clustering the inputs), this part of the network could be considered self-supervised since it learns indirectly from the rest of the loss term.As it show in Figure 2 we used this module twice, firstly for clustering surface input fields and secondly for clustering the 2D fields coming from the observations priors described hereafter.Consequently in the global loss there are two terms like Equation 7.

Observations Priors
We have outlined the process by which CluNet segments any set of 2D fields into distinct clusters.To handle in-situ observations, which are essentially vectors of observations at different depths, we utilize OINet to convert them into a series of images that can then be clustered.As previously mentioned, OINet has the capability to autonomously determine the appropriate correlation lengths for a given set of observations and then perform a canonical Optimal Interpolation (OI).In our approach, we generate four different versions of these interpolations, each initiated with correlation lengths that are submultiples of the domain sizes.These parameters, among others, are then fine-tuned during the learning phase.This process results in four fields that CluNet subsequently clusters into areas exhibiting similar values, despite being derived using different correlation lengths.The clusters formed from these observations provide insights into the certainty we have about specific regions and the extent to which a particular depth is influenced by surface conditions.Essentially, this method allows us to address potential anisotropy in the uncertainties without having to rely on fixed length scales.

Data fusion in the clusters space: RefiNet
We now have a set of clusters derived from the surface fields, and an additional set for each depth of the in-situ observations.For each depth, the corresponding sets of surface and observation clusters are processed through RefiNet.The resulting clusters, along with their membership vectors, are used to compute the covariance matrix as follows: In this equation, we sum the differences in the membership functions of points ii and jj across all clusters.This process, while bearing similarities to Equation 4 deviates by using subtraction instead of an exponential function since m ik and m jk are already bounded within the 0-1 range.This summation represents a non-local distance in the cluster space, replacing the classical Euclidean distance.Consequently, two points within the same cluster (i.e., with similar membership vectors) will be correlated, regardless of their spatial distance.
Using parametrization (9), we then compute the associated optimal interpolation as Equation 3 and then Equation 2. This forms an end-to-end architecture that uses remote sensing images and in-situ data to output regularly-gridded vertical profiles (see Figure 2 for the data flow).
The training loss combines three components: two clustering-based losses Equation 7(one for the surface fields and one for the observations priors) and a supervised reconstruction term.So globally we minimize: where α, β and γ are the weights of the three loss terms, and L M SE is given by: This last term is just the mean squared error with respect to the ground truth x.Within the considered supervised training strategy, the rfcm self-supervised losses Equation 7 act as regularization terms to improve generalization performance and explainability.We maintain equal weights of α, β, and γ at 1, as no significant differences were observed with other values.Our network also shows relative insensitivity to other hyperparameters, such as the number of clusters.However, our cross-validation tests indicated that setting this number to 20 yielded the best results

Performance Metrics
To understand how the clusters sets were changing according to the input data we computed the associated entropy fields.In fact, given that the membership vector is normalized and thus it can be seen as a distribution, its entropy definition is: To assess the performance of the proposed approach, we first define the error between the ground truth and the estimated field value: then we easily obtain our first performance metric: the Root Mean Squared Error (RMSE): We will present this metric in percentage of the standard devation of the ground truth fields.Now considering the standard deviation of the error over the whole N snapshot: we can compute the explained variance score dividing by the standard deviation of the ground truth To highlight the effective resolution of the different reconstruction methods we use the noise-to-signal ratio NSR [Ballarotta et al., 2019]: the effective resolution is in fact given by the wavelength λ s where the NSR(λ s ) is 0.5.

Results and Discussion
This section first reports numerical experiments using NATL60 OSSEs to evaluate the proposed approach quantitatively.The concluding subsection presents an application to real observations.

Clusters Entropy
The initial part of our analysis focuses on understanding how CLOINet, via CluNet and subsequently RefiNet, organizes clusters based on different data inputs: SST, SSH, and various sets of randomly located in-situ salinity observations.To illustrate this, we plotted some example entropy fields in Figure 3 along with statistics on how entropy changes with an increasing number of observations N .In the four panels on the left side of Figure 3 we display two clusters' entropy fields (panels a and e) and their corresponding input fields for SSH (panel b) and SST (panel f) from a selected snapshot.In the four panels on the right side, we present the entropy associated with the in-situ observations' clusters at two different depths z = 5 (panel c) and z = 150m (panel g) together with the correspondent refined clusters entropy (panel d and h) along with the refined clusters' entropy (panels d and h).
This particular snapshot was chosen for its submesoscale features.The differences between SSH and SST-based entropy are noticeable; the SSH clusters highlight more prominent features, while SST forms smaller clusters that extend to deeper depths.The correspondence between the surface fields and their cluster entropy is relatively straightforward, but differences in other sets are more subtle.For observation clusters' entropy, we observe lower entropy (blue regions) near points with similar observations.Areas of higher entropy occur between two observation points with differing values.This behavior varies at different depths, explaining the differences between panels c and d.The refined clusters, influenced by both observations and surface fields, exhibit subtler changes, but we can still see an increase in entropy with depth, particularly noticeable in the northeast region of panels d and h.
Beyond this specific snapshot, panel i shows the percentage change in entropy between the two depths, averaged across the entire test dataset as a function of the number of observations.When only SST data is available, the changes in clusters are more pronounced, as SST information is less directly related to the ocean's interior compared to SSH or combined SST and SSH data.As expected, all deltas increase with the number of observations, eventually reaching a saturation point where they decrease.This occurs because the clusters' information becomes less critical, and the field can be reconstructed relying primarily on in-situ observations.
Figure 3: The entropy of the cluster sets, resulting from the input SSH and SST, is depicted in panels a and e, respectively, while the corresponding fields themselves are shown in panels b and f.Panels c and d (and g and h) display the entropy of the observation and refined cluster sets at a depth of Z = 5m and Z = 150m, respectively.The dots in these panels represent salinity observations at these depths, with their colors indicating the magnitude of salinity (scale not shown).
Panel i illustrates the variation in the entropy of the refined cluster sets along the vertical axis, corresponding to different numbers of observations.The varying colors in this panel represent different networks.

RMSE and Correlation
WWe present the outcomes of the random sampling OSSE in Figure 4.The first two rows illustrate a ground truth salinity example at two different depths, alongside the reconstructions by the baseline OINet and CLOINet with various surface input fields.Again, we chose the same snapshot from Figure 3 for its distinct submesoscale features.
OINet can effectively use surface information for reconstructing the surface layer, but it struggles to propagate this information to deeper layers.We also experimented forcinga bigger correlation length in the z axis but we ended up with a reversed scenario: a well-reconstructed bottom layer but a poorly reconstructed (not shown).This limitation arises because the simple network cannot determine which surface fields to prioritize based on the in-situ observations.
In the case of CLOINet, we observe different results based on the input fields provided.SST leads to better surface interpolations, while SSH is more effective for deeper fields.This outcome aligns with our expectations, as SSH data is depth-integrated and thus more informative than SST for understanding the shape of water masses at depth.Notably, when both SST and SSH are used as inputs, the network effectively leverages their shape information to enhance both surface and interior reconstructions, leading to a reduction in RMSE by about 40% at both depths.
The results across the entire testing set show similar patterns.On panel l (m) we show how for all methods, the RMSE (correlation) decreases (increases) in proportion to the number of observations.In these plots, solid lines represent surface salinity fields, while dashed lines indicate interior fields at z = 150m.Interestingly, on average, OINet's performance is comparable to CLOINet's for surface reconstructions but falls short for interior reconstructions.This fact is mostly related with presence or not of submesoscale features as the next subsection analysis will show.The variation in CLOINet's surface inputs shows minimal impact on surface results, with only slight improvements observed in the SST+SSH case.However, the introduction of the SSH field significantly enhances the interior field reconstructions.Once again, this confirms that SSH provides more comprehensive information about the entire water column compared to SST.

Resolved scales
In Figure 5 , we present the results of the OSSE conducted with regular grid sampling, varying the spacing between observations to understand how different methods resolve various spatial scales.Specifically, we examined the impact of sampling resolution on the explained variance and the Power Spectral Density (PSD)-based score.The explained variance for the different reconstruction methods at a sampling resolution of 20 km is shown in panels a, c, e, and g (and panels b, d, f, and h for the interior field).When provided with the same inputs as OINet (SST and SSH), CLOINet slightly surpasses it on the surface and by about 20% in the interior.Again, we observe superior performance from CLOINet-SSH in the interior, while the inferior performance of the network relying solely on SST suggests that, on average, this field does not significantly account for salinity variability.
The PSD-based score, shown in panel l, indicates the effective resolution of the reconstruction (the point at which the score falls below 0.5) demostrating how CLOINet generally resolves smaller scales than OINet across various sampling resolutions.For higher resolutions, such as 5 and 12 km, CLOINet resolves scales approximately 1.5 times larger than OINet.The training set's averaged spectra, depicted in panel i, reveal that OINet is typically limited to reconstructing larger scales.Indirectly, this suggests that the variability explained in the test region is predominantly due to larger scales, which even OINet can adequately account for.

Real Ocean Data Preliminary Tests
In line with many deep learning studies, our research focuses on applying neural networks, initially trained on synthetic data, to real-world observations.We evaluated CLOINet's effectiveness in improving Sea Surface Temperature (SST) estimates using glider surface temperature observations, enhanced with shape information from a Chlorophyll (CHL) snapshot (refer to Figure 6).Both OINet and CLOINet were able to reconstruct the general SST pattern observed in reality.CLOINet demonstrated a slightly superior performance, as evidenced by higher correlation values.This improvement aligns with our qualitative observations, suggesting that CLOINet more accurately preserves submesoscale features.Notably, this achievement was realized without the networks being specifically trained on CHL data.In these preliminary tests, the CHL data was provided as if it were the SST and SSH fields, demonstrating the networks' versatility in utilizing shape information from various types of variables.Achieving similar levels of accuracy with traditional Optimal Interpolation (OI) methods would be more complex, likely necessitating intricate, predefined multi-variate correlation functions and extensive parameter tuning.

Conclusion
In this, we presented CLOINet, a comprehensive end-to-end neural network designed to interpolate sparse in-situ observations into a full 3D field leveraging shape information from various ocean remote sensing imagery types.We conducted end-to-end training of CLOINet within a supervised framework, using Observing System Simulation Experiments (OSSEs) based on the NEMO-derived NATL60 simulation.Our study focuses on comparing the reconstruction capabilities of CLOINet with those of a data-driven version of classical Optimal Interpolation, which we have named OINet.This comparison also extends to applications involving real observational data.
Our research covered various scenarios, including both randomly and regularly spaced in-situ salinity observations, paired with different remote sensing inputs such as Sea Surface Temperature (SST), Sea Surface Height (SSH), or a combination of both.Upon creating a 3D salinity field, we thoroughly analyzed how our performance metrics responded to variations in the number and density of in-situ observations.
In dense regular sampling we showed how CLOINet was able to resolve scales 1.5 smaller scales compared to OINet while in random sampling contexts, CLOINet showed enhanced performance in terms of both RMSE and correlation, especially notable when limited observations were available.This improvement was significant in scenarios involving in-depth fields and areas rich in submesoscale features, where RMSE improvements reached as high as 40%.
Despite not incorporating simulated errors to mimic actual sampling instruments, the promising results with real data highlight the potential of our approach in operational contexts.Both OINet and CLOINet adeptly handled noisy CHL fields and successfully reconstructed the general pattern of an unseen SST field, without specific training for this task.Our training approach, which combined two self-supervised losses with a supervised reconstruction loss, enabled the network to generalize effectively.This was evident as it performed accurately in the Western Mediterranean test area, distinctly different from the North Atlantic training region.This suggests that our method is not limited by specific regional climatology and could potentially be scaled for global application.
Overall, the modular design of CLOINet not only enhances our understanding of its internal processes but also positions it for future enhancements.One promising direction for subsequent research is extending the model to incorporate space-time dynamics.Another intriguing possibility is employing this neural network approach for guiding an adaptive sampling multi-platform ocean campaign.Given the significant role of SSH data in assessing the reconstruction of the deeper water layers, the upcoming high-resolution SSH observations from SWOT present an exciting opportunity for further refining and applying CLOINet.

Figure 1 :
Figure 1: Training area (A) and testing area (B) presented with the SWOT passages in the fast-sampling phase.

Figure 2 :
Figure 2: Flow chart of CLOINet information processing: Red and green elements (boxes and arrows) represent the processing paths for the SST input surface field and the in-situ salinity observations, respectively.Purple elements indicate the combined use of both inputs.The bold text along the arrows specifies the network module in operation.The line style and width of each box vary to represent different processing stages, ranging from thin and dashed for inputs to solid and thick for outputs.Inside each box, capital letters denote the corresponding tensor dimensions: W for field width, H for field height, S for the number of input surface fields, P for the number of OINet priors, C for the number of clusters (consistent across all modules in our tests), D for the number of depths, and O for the number of in-situ observations.Colorbars are omitted for clarity.The lower part of the image illustrates the CNN architecture of the three modules, along with the number of parameters used.

Figure 4 :
Figure 4: Example of salinity field interpolation at Z = 5m (first row) and Z = 150m (second row) with ten random observations.The first column shows the ground truth, and the subsequent columns represent various interpolation methods.The two bottom plots display the RMSE (left) and correlation coefficient (right) as functions of the number of observations in a random sampling scenario, averaged across the entire test dataset.In these plots, the solid line corresponds to Z = 5m, while the dashed line represents Z = 150m.

Figure 5 :
Figure 5: This figure shows the surface explained variance for various interpolation methods averaged across the entire test dataset, in a scenario with regular sampling at 45 km intervals.Panels (a), (b), (c), and (d) display the results for OI, CLOINet-SST, CLOINet-SSH, and CLOINet-SST+SSH, respectively.Black dots mark the locations of in-situ observations, and the spatial average value for each method is indicated on the corresponding subplot.Panel (e) compares the Power Spectral Density (PSD) of the different reconstruction methods with the ground truth, with each color representing a different method.Panel (f) shows the corresponding score, where the colors denote different sampling resolutions.In both panels (e) and (f), solid lines represent surface fields, while dashed lines correspond to fields at a depth of z = 150m.The red line across panel (f) marks the 0.5 threshold value, indicating the effective resolution of the interpolation methods.

Figure 6 :
Figure 6: The top panel displays the Chlorophyll-a (CHL-a) concentration in the Western Mediterranean Sea on February 18, 2022.The dots represent temperature observations from two gliders over the preceding 48 hours, while the red box outlines the area where interpolation was performed.The bottom panels, labeled a, b, and c, show the actual Sea Surface Temperature (SST) this same day and the reconstructed SST using CLOINet and OINet, respectively.The correlation values for each reconstruction method are indicated in the corresponding plots.