A global gridded ocean salinity dataset with 0.5° horizontal resolution since 1960 for the upper 2000 m

A gridded salinity dataset with high resolution is essential for investigating global ocean salinity variability and understanding its role in climate and the ocean ecosystem. In this study, a new version of the Institute of Atmospheric Physics gridded salinity dataset with a higher resolution (0.5° by 0.5°) is provided by using a revised ensemble optimal interpolation scheme with a dynamic ensemble. The performance of this dataset is evaluated using “subsample test” and the high-resolution satellite-based data. Compared with the previous 1° by 1° resolution IAP product, the new dataset is more capable of representing regional salinity changes with the meso-scale and small-scale signals (i.e., in the coastal and boundary currents regions), meanwhile, maintains the large-scale structure and variability. Therefore, the new dataset complements the previous data product. Besides, the new dataset is compared with in situ observations and several international salinity products for the salinity multiscale variabilities and patterns. The comparison shows the smaller magnitude of mean difference and Root-mean-square deviation (RMSD) in basin scale for the new dataset, some differences in strength and fine structure of the “fresh gets fresher, salty gets saltier” surface and subsurface salinity pattern amplification trends from 1980 to 2017, a broad similarity for the salinity changes associated with El Niño-Southern Oscillation (ENSO) and a consistent salinity dipole mode in the tropical Indian Ocean (S-IOD). These results support the future use of gridded salinity data.


Introduction
Ocean salinity, as one of the fundamental properties of ocean water, has important implications for the global ocean climate system and ecosystem. For example, the long-term changes in the surface and subsurface salinity are of worldwide concern, as they are ideal indicators of an intensified global water cycle (Durack and Wijffels, 2010;Rhein et al., 2013;Cheng et al., 2020). Furthermore, salinity contributes to the intensity of upper ocean stratification, and thus modulates the vertical oceanic exchanges and large-scale ocean circulation processes, such as the thermohaline circulation in the North Atlantic (Collins et al., 2013;Rahmstorf et al., 2015). The importance of salinity is also highlighted by its role in changes in marine productivity, through the modulation of the spatiotemporal structure of water stratification (Greene et al., 2012), or influencing the adaptation of marine creatures (Ross and Behringer, 2019). Therefore, it is important to investigate salinity variability to understand its role in climate change.
Estimating ocean salinity change is challenging due to the sparse distribution of salinity data and instrumental bias, mainly before the Argo era (Durack, 2015). Since 2005, the Argo observation network has provided more than 300,000 salinity profiles per year, and has achieved near-global coverage since 2007 . Some mainstream data centers publish their salinity products and update them continuously, aiming to provide more accurate estimates of the ocean state, including objective analyses with long time coverage (i.e., the EN dataset from the UK Met Office Hadley Centre) (Good et al., 2013) and Argo-based products, such as the Scripps Institution of Oceanography (SCRIPPS) dataset , the Japanese Agency for Marine-Earth Science and Technology (JAMSTEC) dataset (Hosoda et al., 2008), and the Global Ocean Argo gridded dataset (BOA-Argo) (Lu et al., 2018). Furthermore, advanced data assimilation technologies allow the combination of multiple sources of observations and models, providing a complete three-dimensional salinity field in a physically consistent manner, for instance, the Ocean Reanalysis System (ORA) (Balmaseda et al., 2013) from the European Center for Medium-Range Weather Forecasts (ECMWF) and the Simple Ocean Data Assimilation (SODA) (Carton et al., 2018) from the University of Maryland. These products have been widely used to investigate the salinity variability and model simulations in previous studies (Xue et al., 2017;Carton et al., 2019). For example, the imprints of interdecadal and multidecadal variabilities of sea surface salinity (SSS) as an indicator of change in the hydrological cycle in recent decades has been explored using several salinity datasets, such as Durack and Wijffels (2010) (named DW10), Skliris et al. (2014); Li et al. (2016) using EN4, Vinogradova and Ponte (2017) using the ECCO estimate, and Cheng et al. (2020) using IAP. El Ninão-Southern Oscillation (ENSO) is the dominant mode of variability at interannual scale in the global ocean and the salinity change in the tropical Pacific has been reported to closely correlate the characteristics of the ENSO cycle (Roemmich and Gilson, 2011;Zhu et al., 2014;Zhi et al., 2019). The salinity changes can also affect the upper ocean temperature and the evolution of ENSO by modulating the upper stratification in the Pacific Ocean (Zheng and Zhang, 2015). Previous studies indicate that the ENSO-related salinity variability and its role during the Argo era are well illustrated by Argo-based products, such as the SCRIPPS and JAMSTEC datasets (Roemmich and Gilson, 2011;Zheng and Zhang, 2015). The JAMSETC analysis has also been used to investigate the formation and propagation of subsurface salinity anomalies and water masses in the North Pacific Ocean (Qiu and Chen, 2012;Yan et al., 2013;Katsura et al., 2019). The advantages of these datasets have been demonstrated in specific applications, but few studies have compared them.
Of considerable interest is the comparison of multiscale salinity changes among different salinity products, as they lead to uncertainties in the salinity variability. For instance, the longterm amplification of global salinity contrast, either at the sea surface or the subsurface layer, has been estimated by several studies, but with large discrepancies in the rates of trends (Hosoda et al., 2009;Durack et al., 2012;Skliris et al., 2014;Jan et al., 2018). In the study of Cheng et al. (2020), the estimates of the basin means of the upper-2000 m salinity changes in different salinity datasets demonstrated the substantial differences, particularly in the Southern Ocean and before the Argo era. These disparities can be attributed to various factors, such as quality control procedures, bias correction schemes, choice of climatology, the mapping methods used to fill the data gaps, and so on (Abraham et al., 2013;Boyer et al., 2016;Cheng et al., 2020). Since the beginning of the Argo period, the quality of salinity data has substantially improved due to advanced observations and technologies. Multiple salinity products have consistently revealed a notable vertical contrast, with an increase in salinity in the upper 200 m layer and freshening within the 200-600 m layer in the past decade . However, substantial discrepancies between various Argo-based salinity products are still found in the subsurface North Atlantic and Southern oceans (Liu et al., 2020). Therefore, comparison of regional and global salinity variability between all the available salinity products is still an important issue in salinity research.
Recently, a new version of the salinity gridded dataset from the Institute of Atmospheric Physics (IAP) with a horizontal resolution of 1°×1°(IAP-1) has been released and evaluated (Cheng et al., 2020). A large number of salinity observations over the 1960-2020 period are also found in Northwest Pacific, Northwest Atlantic, and some coastal regions ( Figure S1). On this basis, the goal of this study is to explore the possibility to increase the spatial resolution (given the fact that there are many places with dense observations), and improve the performance along the coastline and boundary regions. The new version of the salinity gridded dataset provided in this study has an improved horizontal resolution of 0.5°×0.5°(named IAP-05) ( Table 1). The mapping scheme for the IAP-05 dataset and the salinity data used for comparison will be described in section 2.
In section 3, we first evaluate the quality of IAP-05 dataset and its uncertainty using a subsample test. The new dataset is also validated by in situ observations and independent observations, including hydrographic observations and satellite products. A comparison of the global and regional salinity variability is then conducted to examine the consistency of the new dataset with other available products. Finally, a discussion is given in section 4.

Mapping method
The mapping method is based on the ensemble optimal interpolation with a dynamic ensemble (ENOI-DE) scheme proposed by Cheng and Zhu (2016). This approach has been applied in the reconstruction of the historical subsurface salinity and temperature fields with 1°×1°horizontal resolution (Cheng et al., 2017;Cheng et al., 2020).
The source of ocean salinity observations is the World Ocean Database (WOD) of historical hydrographic profiles (Garcia et al., 2018). The erroneous profiles have been removed using the quality control procedure provided by WOD18, which has been widely used in the ocean subsurface dataset. The WOD quality flags include the flags for entire cast as a function of variable and flags on individual observations (Garcia et al., 2018). All the salinity profiles have their monthly climatology averaged from 1990 to 2010 subtracted to obtain the salinity anomaly field. Considering the design of the proposed mapping algorithm, the anomaly profiles were then averaged into their horizontal 0.5°×0.5°gridded means for each month and 41 vertical standard depth levels (same as Cheng et al. (2020)). Given the slowness of ocean temporal variation, the salinity observations in different levels were combined to infill the data gaps within a certain time window, which uses the temporal bins (before and after) increasing from 3 months at the surface to 18 months at 2000 m.
In the temperature/salinity reconstruction of the IAP gridded dataset with 1°horizontal resolution (named IAP-1), the Coupled Model Inter-comparison Project Phase 5 (CMIP5) simulations (Taylor et al., 2012) were used to provide a dynamic ensemble (N > 30) of the model background and its spatial covariance (Cheng and Zhu, 2016). One of its advantages is that it provides an improved estimate on the error covariance, that is the major source of information in propagating the information from observed regions to regions with data gaps. However, the previous approach used in IAP-1 relies on the 1°×1°horizontal resolution models in CMIP5 thus not applicable in higher resolution reconstruction.
Here we used a new ensemble of the high-resolution model outputs (N = 26) from high-resolution model simulations and ocean reanalyses, including the High Resolution Model Intercomparison Project (HighResMIP) simulations for CMIP6 (N = 20) (Haarsma et al., 2016), LASG/IAP Climate system Ocean Model version 2.0 (LICOM 2.0) from China (N = 1) (Yu et al., 2012) and Geophysical Fluid Dynamics Laboratory Coupled Climate Model version 4 (GFDL-CM4) (Held et al., 2019) from the United States (N = 1), the fifth generation of climate model of the Institute for Numerical Mathematics (INM-CM5-0) from Russian (N = 1) (Volodin et al., 2017) and three members of Euro-Mediterranean Center for Climate Change (CMCC) Historical Ocean Reanalysis system (CHOR) (Yang et al., 2017) from Italy (N = 3), which use different atmospheric forcings and/or different assimilation methods.
The HighResMIP is one of the CMIP6 endorsed MIPs and it incorporates the model resolution that ranges from typical CMIP6 resolutions of 100 km to higher resolutions of 8~25 km in the ocean (Roberts et al., 2020). LICOM is the eddy-revolving ocean model with nearly global coverage in longitude-latitude grids (i.e., without the Arctic Ocean). GFDL-CM4 is the latest generation of the high-resolution GFDL model for CMIP6 and outperforms the previous GFDL model components (i.e., GFDL-CM3 for CMIP5; CM2.6: eddy-revolving model). Both models can capture the large-scale features of the ocean state, as well as the small-scale and mesoscale activity. Furthermore, they show a better representation along the coastline and boundary regions. INM-CM5-0 is an evolutionary upgraded generation for CMIP6 with a modified dynamical core and two times higher resolution than  Tables 2, 3. This new covariance statistic with a dynamic ensemble of multiple sources provides a better representation of the error covariance and can minimize the potential impact of model bias in each individual model through the optimal interpolation process. A localization strategy was also used in this mapping method, which assumes that only data within a specified length scale (defined by the influence radius) can propagate the signals during the analysis of a grid point. The fraction coverage of the mapping method was used to indicate the fraction of total ocean area that can be constrained by observations. The size of this area was defined by the influencing radius or spatial correlation length scales. This application helps to reduce the impact of the imperfect error covariance (i.e., the spurious remote correlations). Here we adopted a larger influence radius (R = 26°) from the sea surface down to 2000 m to obtain a near-global fractional coverage (more than 90%) for analysis (not shown here). Using this larger influence radius helps to utilize the remote spatial covariances and it helps to achieve a near-global fractional coverage for analysis and minimize a reconstruction bias in data-sparse regions due to the drift towards the prior guess (Durack et al., 2014;Cheng et al., 2017;Stammer et al., 2019;Cheng et al., 2020). However, the use of a large influence radius will lead to a smoothed spatial pattern by filtering out the smaller-scale signals. Following previous mapping methods (Cheng et al., 2020), the use of multi-step iterative scans shows a significant improvement in re-inserting the ocean variability across those spatial scales. Therefore, three iterative scans were performed using radii of 26°, 8°, and 4°. Other key parameters in this method can be found in the definitions in the study of Cheng et al. (2020).
To evaluate the performance of our reconstructed salinity results, a "subsample test" based on Argo-period observations (since 2007) was used in this study. Such a method has been validated in previous temperature/salinity reconstructions, including the impact of a localization strategy and the uncertainty due to the mapping method (Cheng and Zhu, 2016;Cheng et al., 2017;Cheng et al., 2020). We remark here that this procedure tested the impact of the changing sampling and climate state, it does not test the impact of the changes in the covariance function over time which requires further studies. In a subsample test, the observed salinity fields every six months from January 2007 to July 2017 were defined as the "truth" field (named "truth-obs", total 22). For each "truth-obs", the data within -2 to 2 months centered on each selected month were averaged to construct each "truth-obs" field to increase the data coverage. We subsampled each "truth-obs" as defined above, using different historical observation masks, selected every 60 months from January 1960 to January 2015 (so 12 observation masks are tested). Then all the subsampled fields (total 12×22, i.e., sampling 22 truth with 12 observation masks) were reconstructed by the mapping method and compare with the corresponding "truth-obs". Here we defined their differences as the "sampling errors" to quantify the performance of the reconstruction. Besides the Argo-period observations, synthetic observations were used to evaluate the mapping performance. The synthetic data were re-sampled from a high-resolution (0.25°×0.25°horizontally) CNRM-CM6-1-HR model in CMIP6 archive by the National Centre for Meteorological Research, Meteó-France, and CNRS laboratory, and spans from 2007 to 2018 (historical simulation before 2015 and SSP2-4.5 after 2015) (Voldoire et al., 2019). The model simulations were regarded as "truth" field (named "truthmodel" hereafter), which has global coverage and are dynamically consistent. We adopted 22 "truth-model" fields selected January and August from 2005 to 2015 and then we re-sampled them according to the observation locations on every 60 months from January 1960 to January 2015 (12 subsampled fields for each "truthmodel" field, which are synthetic observations). There are a total of N=22×12 synthetic observation fields. Both IAP-1 and IAP-0.5 mapping methods were tested using this synthetic data approach and they have been averaged into 1°and 0.5°version, respectively.

Other salinity products
Five observational gridded products were used in this study for the comparison of multiscale salinity variability (Table S1): (1) EN4 from the UK Met Office (Gouretski and Reseghetti, 2010;Good et al., 2013); (2) Ishii from Japan (Ishii et al., 2017); (3) NCEI from the United States (Levitus et al., 2012); (4) IAP-1 from China (Cheng et al., 2020); (5) one Argo-based gridded monthly analysis, SCRIPPS from the United States ). Two additional ocean reanalysis data were included: (1) ORAS4 from the ECMWF (Balmaseda et al., 2013); (2) SODA from NOAA, USA (Carton et al., 2018). Among these products, SODA has the same horizontal resolution as the IAP-05 data, and the others have 1°×1°horizontal resolution. All the available products are converted into the same horizontal resolution and vertical depth levels as the new dataset for comparison. The climatology was constructed by the 12-month salinity data for the 1990-2010 period for the observations and model, which are used for mapping method to fill the data gaps. The salinity anomaly fields were then obtained by removing their own monthly climatology average before the analyses. Anomaly fields are compared in this study because of the following reasons: (1) It is a common practice to use anomalies to investigate variability and trend, so anomalies are much more relevant for climate variability and change researches; (2) In all previous studies, the mapping procedure is applied to the anomaly field rather than the absolute field because the anomalies are used to minimize the aliasing of the large spatial gradients of the climatological and seasonal fields into the mapping as a result of the sparse spatial resolution; (3) The procedure for constructing the absolute field (i.e. climatology) is always different from the mapping for anomalies, and their error sources and structure are different. Therefore, the climatology of each dataset should be removed to reduce the error related to climatology construction. The time span for each product is more than 55 years since 1960, with the exception of SODA, for which we used the 1980-2017 period.
To assess the quality of the new dataset, independent observations were also used. (1) The historical hydrographic observations in the Labrador Sea provided by the Bedford Institute of Oceanography (BIO) (Yashayaev, 2007). This hydrographic dataset includes the independently validated and bias-corrected multi-section surveys (i.e., AR7W) and profiles in the Labrador Sea, which span from 1896 to present. The physical parameters and transient anthropogenic traces from this dataset have been used to investigate the structures, properties, and impacts of Labrador Sea water, and some studies of the climate dynamics and change in the subpolar North Atlantic (Yashayaev and Loder, 2017). (2) The monthly SSS gridded version 3 product from the European Space Agency (ESA) Sea Surface Salinity Climate Change Initiative (CCI) consortium (Boutin et al., 2021). This dataset provides the improved calibrated global SSS fields with a 25km horizontal resolution of near-global coverage spanning from 2010  (2011)(2012)(2013)(2014)(2015) and SMAP (2015 to present) L-band radiometers. Consequently, it has the capacity to monitor the global SSS features, including its mesoscale variability and covers a longer period.
(3) the gridded sea surface salinity dataset for the Atlantic Ocean, named LEGOS, which spans from 1980 to 2016 (Reverdin et al., 2007). The LEGOS data are based on the available data collected from 1970 to 2016 mostly from Voluntary Observing Ships, PIRATA moorings and Argo profiles and subsequently validated. This monthly SSS products is mapped by the objective analyzed method and covers the region between 95°W -20°E and 50°N-30°S in the Atlantic Ocean. Here we employed these observational datasets to test the ability of the IAP-05 dataset to capture small-scale and regional signals, and estimate of long-term linear trends.

Examples of the subsample test
The spatial pattern of the 50 m depth salinity anomalies on January 2017, the subsampled fields (for observation location of Januarys of 1960, 1990 and 2010 separately) and the reconstruction fields are presented in Figure 1. This subsample test uses the Argoperiod observations. The ocean salinity field in January 2017 manifested as a dissipated La Ninãa-like pattern, with freshening in the tropical North Pacific and the Intertropical Convergence Zone (ITCZ), together with a salinity increase in the central and eastern Pacific. The northwest and southwest Pacific also experienced a significant salinity increase. The patterns of the reconstructed fields ( Figure 1B) agree well with the "truth-obs" field for 2017, with a high spatial correlation coefficient of 0.61. The global-mean RMSD of the sampling errors for 2017 is 0.08, which is smaller than the reconstructed signals in most part of the global ocean. A good resemblance can be also found in the results of other subsample tests with a sparse historical distribution (Figures 1B,F,H). This consistency can also be found in their zonal patterns and the results for other periods (i.e., 1970, 1980 and 2000), with the positive salinity anomalies within 20°N-40°N,10°S-0°and 55°N-60°N, and negative values within 5°N-15°N and 20°S-10°S ( Figure 1J). They provide indication that the new mapping method can reconstruct the "truth-obs" field given the sparse data distribution even back to 1960.
In addition, it is apparent that the reconstruction field is very different from the first guess field, which is provided by the mean of ensemble members (used as a prior), even in the data sparse region such as the southern oceans ( Figures 1B vs. 1I). To further quantify the potential impact of the first guess field, we manually replaced the first-guess field in IAP-05 mapping scheme with the climatology (zero anomaly), and then compared the reconstructed field with the previous reconstructed results, taking the 50m-depth salinity anomaly in January 1960 as an example ( Figure S2). Both reconstructed fields show a consistent pattern (Figures S2B,S2D) and their difference also has a much smaller magnitude than the reconstructed results ( Figure S2E). This test confirms that the first guess does play a minor role in our reconstruction, mainly because of the strong constraint of observations with the current mapping technique.
However, there are still some differences between the reconstructed field and the "truth-obs", particularly before the Argo Period (i.e., 1960 and1990), such as in the Southeast Pacific, South of Australia, and the southern subtropical Atlantic (Figure 1), indicating the uncertainty in reconstruction, which will be quantified by "subsample test" in the next section.

Subsample test using Argo-period data
The basin means of the salinity anomalies averaged over 0-2000 m (named S2000 hereafter) from 1960 to 2020 are shown in Figure 2, along with the corresponding mean sampling errors and sampling uncertainties. The global mean S2000 change ( Figure 2A) shows a good resemblance to the IAP-1 result, decreasing with a rate of 0.005 ± 0.006 psu per century over the period 1960-2020 (The error range represents the 95% confidence interval of linear trends based on the ordinary least squares (OLS) linear fit). The Pacific Ocean experiences a robust freshening (−0.016 ± 0.004 psu per century) during 1960-2020, and the Atlantic Ocean yields a dramatic increase of 0.024 ± 0.013 psu per century since 1960. This shows that the new dataset can capture the increasing salinity contrast between the Atlantic and Pacific oceans (Curry et al., 2003;Durack et al., 2012). In the Indian Ocean, the S2000 change shows a strong decadal fluctuation, but with a slightly larger salinity value than the IAP-1 results before the 1980s. The S2000 for the IAP-05 and IAP-1 data show consistent decadal variation and a long-term decreasing trend (−0.008 ± 0.007 psu per century) of S2000 in the Southern Ocean. The reconstructed S2000 change in the Atlantic Ocean reveals a slightly positive systematic bias, while all other basins have negative systematic bias. The bias indicates the errors in the mapping method, likely rising from the bias in the error covariance given by models. Future improvement will be made if more and better high-resolution models are included to better estimate covariance. However, a systematic bias means an error in the mean state, and it has a small impact on the decadal and multi-decadal variabilities because the negative/positive bias is consistent over time within the uncertainty range. Moreover, the uncertainty in sampling error (2 standard deviations) reduced significantly after 2005 compared with before 2005, particularly in the Indian and Southern oceans. This reveals that the sampling uncertainty was decreasing with the increasing number of salinity observations.
To quantify the impact of sampling on the reconstructed interannual variability (smaller than 7 years) and decadal/ multidecadal variability (greater than 7 years), we define the "signal over noise" ratio (SNR), calculated as the standard deviation (1s) of the salinity time series divided by 1s of the sampling error given in Figure 2. A 7-year high-pass filter was applied to the entire 1960-2020 time series to extract the interannual variation, and the 7-year low-pass filter was applied to obtain the interdecadal/multidecadal variation during the 1960-2020 period, consistent with Cheng et al. (2017); Cheng et al. (2020). Figure 2F shows the SNR for the decadal/multidecadal and interannual signals in different basins, separated into two different periods: the Pre-Argo period and the Argo period.
These interannual variations are smaller than or close to the sampling uncertainty (SNR ≤ 1) for almost all basins, especially for the Pre-Argo period, suggesting that the reconstructed interannual signals are more affected by the sampling uncertainty and then less reliable. For the decadal/multidecadal timescale, the SNR in all the basins is larger than 1, particularly in the Pacific, Atlantic and Indian oceans (SNR > 2). Since 2005, the reconstructed decadal variations in all the basins are significantly larger than the sampling uncertainty (SNR > 2). This suggests a more reliable reconstruction of decadal/multidecadal salinity variability in the recent decades, thanks to the establishment of Argo network. Meanwhile, for both time scales, the SNR during the Argo period is larger than that before 2005, suggesting a better reconstruction.

Subsample test using synthetic observations
Here we used the synthetic data to test IAP-1 and IAP-0.5 mapping performance. Figure 3 shows their vertical structures of global-mean salinity changes since 1960. Both versions showed consistent long-term changes of salinity, with significant freshening trends at sea subsurface (200~1200 m) and salinification trends at upper 200m ( Figures 3A, B). To quantify the reconstruction performance, Figures 1960, 1970, 1980, 1990, 2000, 2010, 2017 are 2660, 5942, 6722, 8167, 5590, 15563 and 20397 for the 720×360 grid boxes. "truth-model" and shows the sampling error and its standard deviation for IAP-05 and IAP-1 respectively. The mean sampling errors for both IAP datasets are much smaller than the salinity anomalies ( Figures 3C vs. A, D vs. B), indicating a reliable reconstruction by both approaches. IAP-1 mapping results in an overall positive bias (0~1200 m) and a negative bias below 1200m ( Figure 3D), For the IAP-05 mapping, there is a small negative sampling error above 400m and a positive bias within 400-1600m before 2005. Since 2005, the negative bias extends down to 1000m and the positive error confines to depths 1000-1800m ( Figure 3C). Besides, IAP-05 also shows smaller standard deviation than the IAP-1 data for all depths ( Figures 3E vs. 3F). Figure S3 further shows the 0-2000m mean sampling errors for the global ocean and four major basins. In all the basins, the global and basin means of salinity errors for both mapping methods are smaller than the actual salinity change (Figure 2). By contrast, the mean sampling errors for the IAP-05 mapping method are smaller than that for IAP-1 method. The geographical distributions for the linear trends of the 0-2000 mean sampling errors and the globally zonal mean sampling errors of both IAP mapping methods based on the synthetic data are presented in Figure S4. Both IAP schemes display a similar pattern of change in sampling errors over time, and the magnitude of trend errors for the IAP-05 mapping method is generally smaller than IAP-1. For IAP-05, there are the negative biases appeared in the tropical and subtropical South Pacific Ocean, and the positive biases in West Atlantic Ocean.
We note that the large sampling errors of both IAP salinity datasets occur in the Southern Ocean and their vertical distributions of negative anomalies correspond with the subduction pathway from the high latitudes ( Figures S4C, S4D). Nevertheless, the sampling errors for both reconstructions are smaller than the actual salinity change signals (comparing Figure S4 of this study with Figure 6 in Cheng et al., 2020), indicating the robustness of the identified salinity trend patterns. Figure S5 further shows the distributions of standard deviation of linear trends of the sampling errors for both mappings, indicating a large uncertainty of trends reconstruction in the Boundary Current regions and subtropical gyres ( Figures S5A, S5B) in the upper-200m ocean ( Figures S5C, S5D). Compared with the IAP-1 results, the IAP-05 scheme shows a smaller and shallower uncertainty, which is mainly located in the boundary current regions in Southern hemisphere.
These results based on synthetic data with known "truth" suggest a better performance of the IAP-05 mapping method than the IAP-1 in illustrating the basin-scale salinity changes over the past decades.

Comparison with CCI satellite-based product
Here we provided additional assessments of the IAP-05 data by comparing the new dataset with the independent and highresolution satellite gridded salinity product. As one goal of the IAP-05 is to provide a higher horizontal resolution dataset than IAP-1, CCI v3 gridded data at 25km horizontal resolution help us to testify its capacity to reveal the small-scale signals of the ocean. The mean SSS fields from January 2017 to December 2017 for CCI, IAP-05, and IAP-1 are shown in the Figure 4. For a fair comparison, both IAP salinity datasets were interpolated into the same horizontal resolution with CCI salinity product. All these products can reproduce the large-scale salinity characteristics, with a salinity maximum in the mid-latitudes, which is dominated by evaporation, and low values in the precipitation-dominated regions (i.e., the tropical and North Pacific) (Figures 4A, C, E). Both IAP datasets have the high spatial correlation coefficients of more than 0.97 with CCI data. Moreover, IAP-05 shows more details of mesoscale and small-scale spatial salinity distributions than the IAP-1 products. The magnitude of the horizontal gradients of SSS was also calculated in Figure 4 by the square root of the sum of squared zonal gradient and meridional gradient of SSS averaged from January 2017 to December 2017, following the method adopted in Vinogradova et al. (2019). This metric can illustrate the fine structure of the spatial variability of salinity. All the three products show similar salinity gradient patterns, including the stronger variability in the coastal and boundary currents regions, upwelling zones, and large-scale circulation convergence zones. The new IAP-0.5 dataset shows better consistency with CCI than IAP-1, such as the strong salinity fronts in the North Pacific Ocean, North Indian Ocean and Kuroshio extension regions. Figure 5 shows the spatial maps of the mean difference and RMSD of SSS between both IAP datasets and CCI v3 satellite data from 2010 to 2019. Compared with CCI, IAP-05 and IAP-1 datasets have the global mean difference of -0.012 psu and -0.013 psu, respectively, and they show a consistent large-scale pattern with the positive differences in the tropical and subpolar regions in Pacific and Atlantic oceans, Bay of Bengal, Subtropical South Indian Ocean, and with the negative differences in the subtropical Pacific and Atlantic oceans, Arabian Sea and Southern oceans (South of 30°S ), corresponding to the evaporation and precipitation dominated regions. We note that the magnitude of SSS signals for IAP-05 data are slightly smaller than the IAP-1 results. The largest values (>0.3 psu) are located in some coastal regions, such as the north of 45°N in the North Atlantic, and Antarctic Circumpolar Current (ACC) regions, where the large errors were present in CCI satellite data. Albeit for this difference, the spatial patterns of RMSD for both IAP datasets ( Figures 4B, D) are consistent with the higher salinity values (>0.3 psu) in ITCZ regions, ACC regions in Indian Ocean, Kuroshio Current and its extension. They correspond to the largest mean difference for IAP products in these regions. The large RMSD for IAP-05 and IAP-1 (>0.4 psu) are also located in the coastal regions, where the CCI data have a larger uncertainty.
In summary, by comparing with an independent highresolution satellite data, we show that the new IAP-05 data can better produce the smaller-scale signals than the 1°results.

Comparisons with other available observations
The subsample test and assessments using the independent satellite observations show that the 0.5°salinity field reconstructed using the proposed mapping method is a better representation of the salinity than 1°salinity field. In this section, we further compare the new dataset with other available gridded salinity datasets, in order to evaluate how well the current datasets could present the historical salinity variations on various spatial and time scales.

Salinity snapshots in the boundary and coastal regions
The increase in resolution should provide a better representation of some regional features that are not recognized in the 1°gridded resolution data, particularly in the western boundary and coastline regions. The 10 m depth salinity anomalies of the Kuroshio Current and its extension in January 2015 are shown in Figure 6 (this month is selected arbitrarily, using other time yields similar results). The IAP-05, IAP-1, EN4, Ishii, SCRIPPS, and both reanalyses (i.e., ORAS4 and SODA) are included for comparison. For a fair comparison, the 1°gridded datasets (i.e., IAP-1, EN4, Ishii, SCRIPPS and ORAS4) were interpolated into the horizontal resolution of 0.5°. As indicated in Figure 6, all the datasets can capture the large-scale characteristics of the salinity distribution in regions such as the zonal band of positive anomalies in the south of 20°N for the northwest Pacific. The patterns of EN4, Ishii and IAP-1 are smoother, while SCRIPPS and ORAS4 are able to show a few small-scale and mesoscale salinity signals. Compared with the 0.5°interpolated dataset, the high-resolution (including IAP-05 and SODA) results show a better representation of the salinity anomaly distribution in this selected region than other products in Figure 6. IAP-0.5 shows meandering features in the Kuroshio extension, which are more physically reasonable than other data such as IAP-1, EN4, and Ishii. Besides, the small-scale features in IAP-0.5 data are more consistent with in situ observations ( Figures 6A vs. 6B). Figure 6 shows that another improvement of the IAP-05 is the physical description of the coastline. This is because the 0.5°grid box is closer to the land and more confined to the coast than the 1°g rid box (Locarnini et al., 2018). In particular, as SCRIPPS is based only on Argo observations, it shows no signal in some nearshore regions due to the absence of Argo data in shallow waters, such as the Bohai Sea, the Yellow Sea, East China Sea. This comparison suggests that higher resolution mapping could more effectively use the large amount of data near the coastal regions and in the boundary current regions.

Comparison with in situ observations
Here we compare different datasets with in situ salinity observations, based on the anomaly fields computed by removing the climatology over the 1990-2010 period. The in situ salinity data are the 0.5°gridded averaged fields without application of any spatial interpolation, which is used to reduce the sub-grid variability and to compensate the irregular spatial sampling. Figure 7 shows the 0-2000m averaged salinity anomalies differences between the gridded in situ data for the 1980-2017 period. The differences in coastal regions and regions where the depth is less than 2000m are estimated by using the salinity anomalies averaged to the local bottom. The small values in mean difference and RMSD for the new data with in situ gridded data could be expected because the observational sources of IAP-05 data are based on the in situ gridded data. IAP-05, IAP-1 and Ishii show a smaller magnitude of differences compared with the EN4 and two reanalysis products. The global average mean difference for IAP-05 is 0.003 psu; this   psu) are evident in the tropical Indian and southern Atlantic oceans, and the negative biases in the southeast Pacific and southern Indian oceans (<-0.02 psu). Figure S6 illustrates the spatial distribution of the 0-2000m and 1980-2017 averaged RMSD of salinity products relative to the gridded in situ salinity data. All the datasets show the smaller RMSD (mostly <0.03 psu) in the tropical oceans and subtropical North Pacific than in other places. Large RMSD values (>0.05 psu) can be found in the Boundary current regions (i.e., Kuroshio Current and its extension) and some coastal areas, and the subtropical North Atlantic and ACC regions in Indian Ocean. IAP-05, IAP-1 and Ishii data have lower values of RMSD than EN4 and two reanalysis products in most places, with the globalmean RMSD of 0.057 psu for EN4, 0.059 psu for ORAS4 and 0.065 psu for SODA. The IAP-05 data show the smallest values of 0.045 for the global mean of RMSD, and the global-mean RMSD for IAP-1 is 0.052 psu. For the SODA, the largest RMSD (>0.05 psu) also appears in the Southeast Pacific, which is not found in other datasets. Figure 8 further shows the global mean difference and RMSD of salinity anomalies compared with observations from surface to 2000m. All the datasets indicate the larger values of difference and RMSD in the near surface layer and they decrease with depth. The IAP-05 shows the smallest difference and RMSD in these gridded salinity products and its large values are found in the upper 50m with <0.01 psu for mean difference and <0.02 psu for RMSD. For other datasets, larger salty biases are found at the upper 200m. Among them, SODA shows the larger salty bias above 1000m and the largest RMSD in the 0-200m layer and 600-1000m reaching to more than 0.03. Figure S7 indicates the time variations of global mean difference of globally 0-2000m mean salinity anomalies during 1980-2020 compared with in situ observations. Although salt is conserved in the global ocean, it is expected that the mean salinity of the upper 2000 m is not constant and has the global mean freshening associated with the melting of land ice, and large decadal variations. Most products (except SODA) show the significant decrease of errors after 2005. ORAS4 shows a peak of salty bias over the 1995-2005 period. IAP-05 data shows the smallest values of mean difference over the whole period and IAP-1 also has the similar magnitude except within 1995-2000. SODA salinity data show an exceptional large positive bias before 1990 and a negative bias after 2005.
In summary, by comparing with in situ observations, we show that the new IAP-0.5 dataset has smaller mean difference and RMSD than the other products. However, we note that the in situ observations are processed by the authors' group, thus the results in this section can't be regarded as a fully independent and fair comparison between IAP-05 with other products. This is a caveat, but such comparison is still a useful addition to the tests provided in previous sections because all products are based on essentially the same in situ data from WOD.

Long-term trends of salinity
The long-term changes of SSS and subsurface salinity have been identified as indicators of the global-warming induced water cycle Frontiers in Marine Science frontiersin.org intensification, thus it is useful to compare different products for their representation of the long-term trends. Figure 9 shows the spatial pattern of the salinity trends from 1980 to 2017 for all the gridded products. All ocean analyses (IAP-05, IAP-1, EN4, Ishii, NCEI) show a robust signature of the salinity pattern amplification, that is "the fresh gets fresher, and the salty gets saltier", including the increased salinity contrast with the freshening (salinification) in Pacific (Atlantic) Ocean ( Figures 9A-E). But there are major differences in magnitude and locations. Ishii data shows the weakest salinity increase in the subtropical Pacific and a much broader freshening in the tropical Pacific Ocean compared with other observational analyses ( Figure 9D). The results of EN4 and NCEI ( Figures 9C, E) show some patchy or spotty distributions (contours are isotropic), which is likely associated with the use of a parameterized covariance assuming Gaussian distribution (Boyer et al., 2005;Good et al., 2013).
The reanalyses (i.e., ORAS4 in Figure 9F and SODA in Figure 9G) show marked differences in trend's magnitudes compared with the observational analyses ( Figures 9A-E): their linear trends are much larger than the observational analyses in most places. Their spatial pattern correlations against the observational analyses are always smaller than 0.33 (Table 4). By contrast, the correlations between observational analyses are always >0.30 and even get to~0.73 (between IAP-05 and IAP-1) (Table 4). Besides, for ORAS4, the signals of salinity increase in the subtropical Pacific are distributed more southward than the other products. And the inter-basin contrast of SSS trend between the Pacific (fresher) and Atlantic (salter) is smaller for ORAS4 and SODA than the other products ( Figures 9F, G). These comparisons indicate a distinction between reanalyses (ORAS4 and SODA) and the observational datasets, which are probably impacted by the surface forcing, model biases and its assimilation scheme (Balmaseda et al., 2013;Carton et al., 2019).
The new estimate of SSS trend is further compared with the LEGOS and other data in the tropical and subtropical Atlantic (30°S -50°N) during the period of 1980-2016 ( Figure S8). All the datasets show are consistent for the large-scale patterns, showing the salinity increase in the subtropical North and South Atlantic and the freshening in the tropical Atlantic Ocean. The magnitude of trend in IAP-05 data is slightly weaker than the results for the IAP-1, EN4 and two reanalyses, but more consistent with Ishii and LEGOS. The Ishii also shows the weakest magnitude and SODA displays the largest SSS trends in the tropical and subtropical Atlantic. The spatial patterns are similar for all datasets, to quantify the similarity of the spatial pattern between LEGOS and different salinity products, here we calculated the spatial correlations. The highest correlation coefficient is 0.33 for LEGOS versus IAP-05, and Ishii data shows the smallest coefficient (~0.14), indicating LEGOS and IAP-05 are more consistent than LEGOS versus the other datasets. We note that the spatial correlation between IAP-05 and IAP-1 data is 0.55.
The long-term trends in the globally zonal mean subsurface salinity of these datasets also show an overall enhancement of the existing mean pattern (Figure 10), with the high correlation coefficients between two spatial fields for both datasets and even reaching to 0.83 (between IAP-1 and EN4) ( Table 4). This is closely associated with a downward propagation of salinity anomalies into the ocean interior in recent decades, corresponding to a salinity increase of the salty subtropical waters and freshening of the highlatitude waters. Again, the spatial correlations between reanalyses (ORAS4 and SODA) and other observational datasets are low or even negative (i.e., -0.11 and 0.12 for ORAS4 and SODA vs Ishii suggesting major differences are in the subsurface ocean. The biggest differences between ORAS4 and other datasets are located around 40°N extending to depths of~2000 m and~30°S above 200 m ( Figure 10F).

FIGURE 8
The vertical distribution of mean difference (A) and RMSD (B) of global-mean salinity anomalies over 2000m between the gridded in situ data and IAP-05, IAP-1, EN4, Ishii, ORAS4 andSODA during 1980-2017. All the salinity climatology fields are relative to 1990-2010. The 1°gridded IAP-1, EN4, Ishii and ORAS4 have been interpolated into 0.5°versions. Li et al. 10.3389/fmars.2023.1108919 Frontiers in Marine Science frontiersin.org Given the distinction between reanalysis and analysis in our comparison, a thorough comparison among reanalysis products is urgently needed to better understand the uncertainty. A similar analysis has been done for ocean heat content (Palmer et al., 2017), showing much larger spread among ocean reanalyses than observational products. Similar results are expected for salinity as well, because the reanalysis products are subject to additional errors from the inaccurate ocean model (especially at subsurface), surface forcing, and assimilation scheme.

ENSO-related SSS variability in the Pacific Ocean
ENSO is a dominant mode of variability in the tropical Pacific (Qu and Yu, 2014;Zhi et al., 2019). The phase of ENSO is commonly indicated by the Ninão 3.4 index, which is defined as the three-month running mean of SST averaged in the region of 5°S -5°N, 170°W-120°W (Trenberth, 1997). To compare the capability of different products in demonstrating the ENSOrelated SSS changes in the tropical Pacific, we regressed the SSS   [2005][2006][2007][2008][2009][2010][2011][2012][2013][2014][2015][2016][2017] and the Pre-Argo period , respectively ( Figure 11). All the regressed SSS patterns in our study are similar to those in previous studies (Roemmich and Gilson, 2011). During the El Ninão phase, the strongest negative SSS anomalies occur in the western-central equatorial Pacific, centered near 175°E and extending to the northeastern Pacific along the west coast of North America, which roughly coincides with the eastward displacement of the warm pool along the Equator. Three positive cores are also found near the Philippine coast, over the SPCZ, and in the southeastern tropical Pacific. However, a better consistency is found during the Argo period (right panels in Figure 11) than the Pre-Argo period (left panels in Figure 11), associated with the increasing data amount and data quality since 2005 (Riser et al., 2016). For the objective analysis products, the magnitude of the regressed anomalous SSS fields before the Argo period is weaker than that during the Argo period, particularly in the southeast Pacific ( Figures 11A-H). The EN4 and Ishii data display some patchy and spotty distributions in the western and southeast Pacific, particularly before the Argo Period. While the ENSO patterns for IAP-05 and IAP-1 are more physically consistent than that for EN4 and Ishii during the Pre-Argo Period, because they are well constrained by observations, given the near-global fraction coverage (>90%) of their mapping methods. The positive anomalies in the SPCZ for the ORAS4 and Ishii data (Figures 11G,I) are much larger than the results from other datasets. The SODA reanalysis shows another negative anomaly pole in the tropical northern Pacific (>0.15 psu°C -1 ), and its magnitude is larger than other datasets ( Figure 11K). Compared with other salinity products, the new dataset and IAP-1 seems to be more capable of depicting the ENSO-related SSS modes during these two periods.

S-IOD events
Strong salinity variations are also found in the tropical Indian Ocean (IO). Previous studies have shown that the meridional salinity dipole mode in the tropical Indian Ocean (S-IOD) is a predominant mode of SSS variability in the tropical IO (Zhang et al., 2016;Zhang et al., 2017). In its positive phase, there is a contrast with low (high) salinity anomalies in the southern equatorial (tropical southeastern) IO. The S-IOD shows multiple time-scale variabilities and is triggered by many processes, including the ocean advection driven by equatorial IO winds and the surface freshwater flux variability in the Southern tropical IO, the decadal modulation of the Indo-Pacific Walker circulation and Indonesian Throughflow (Zhang et al., 2017). Furthermore, the S-IOD index is used to represent the evolution of the S-IOD events, which is defined as the SSS anomaly (SSSa) difference between the southeastern IO (22°-12°S, 98°-118°E) (1990-2010, units: psu). Linear trends are statically significant at the 95% confidence level in the dotted areas. The 1°gridded IAP-1, EN4, Ishii and ORAS4 have been interpolated into 0.5°versions. (Zhang et al., 2016). Here, S-IOD index is calculated in each individual dataset, and then it is regressed onto the spatial SSS anomalies in the tropical IO for each dataset. The analysis is separated into two periods (2005-2017 and 1980-2004) to test the robustness of the results (Figures S9, S10). The typical S-IOD mode of the significant meridional SSSa dipole during the Argo period is well indicated by all the salinity results ( Figure S9), with contrasting low (high) SSS anomalies at its northern (southern) poles. With higher resolution, IAP-05 and SODA are able to provide a more detailed description of the salinity distribution. Before the Argo period, all the salinity products can capture the main characteristics of the S-IOD mode, but with some differences in the locations and magnitude ( Figure S10). For example, compared with the regression results during the Argo period, the center of the positive SSS anomaly for EN4 is located farther north and east; the scope of the positive salinity anomalies for the Ishii data is stronger but located farther east; and IAP-05 shows a weaker southern pole but strong positive anomalies in West Indian Ocean ( Figure S10A).

Interdecadal salinity change in the Subpolar North Atlantic
The Atlantic Meridional Overturning Circulation (AMOC) is the key component of world ocean's thermohaline circulation, and its variation plays an important role in modulating the global climate variability (Rahmstorf et al., 2015;Lozier et al., 2019). As the primary ventilation region in the North Atlantic, the interdecadal variations in deep convection of the Labrador Sea is one of major issues for AMOC research.
The hydrographic datasets for the Labrador Sea have accumulated a large number of historical salinity profiles (they are not included in WOD) and thus can be considered as an independent data for the validation of our new dataset and other products. On the basis of these hydrographic observations, Figure 12 shows the time series of the mean salinity anomaly averaged over 200-2000 m in the Central Labrador Sea since 1960. The salinity change in this layer acts as a regulator of the density in the Labrador Sea with the significant interdecadal variability, corresponding to the concurrent deep convection process and variation in AMOC strength (Yashayaev and Loder, 2016). The salinity decreased from the early 1970s, picked up in the early 1990s, and then began to decrease again during 2012-2013. In 2011, the salinity anomaly returned to a relatively strong positive value, but smaller than its peak value in the early 1970s. This may be attributed to the impact of global warming on the salinity in this region and the AMOC (Collins et al., 2013). For the salinity products, SODA can capture the change in phase since the 1990s, but it had some spikes within 1982-1987 that are stronger than the magnitude of the positive peak in the early 1970s. ORAS4 also had some spikes in the time series around 1999, and did not return to the positive phase after 2005. All the datasets could capture the positive peak during the period of 1960s and 1970s, and Ishii and ORAS4 show the relatively smaller magnitude. Since then, the IAP-05, NCEI, EN4, and Ishii data show relatively weaker negative salinity anomalies during the 1980-2000 period. After 2005, all the salinity products tended to be consistent, except for the ORAS4, and IAP-05 shows a slightly weaker positive anomalies than other datasets. The salinity variation of IAP-1 data agrees better with the independent observations than the IAP-05 results, and it has the highest correlation of~0.81 with that of observations, despite the high correlation of~0.75 for IAP-05.

Discussions
Few salinity observations are available spatially or temporally, leaving numerous data gaps. Several salinity analysis and reanalysis datasets have been implemented based on different reconstruction schemes and sources of observations in recent decades. This study introduced a new IAP salinity gridded dataset with a horizontal resolution of 0.5°from surface to 2000m since 1960. We first describe the mapping framework of IAP-05 and evaluate its performance. This new dataset is built from a new ENOI-DE scheme of dynamic ensembles, which are derived from the ensemble members of high-resolution models and ocean reanalyses. The evaluation of IAP-05 performance includes the subsample test based on the Argo observations and Synthetic data; comparisons with in situ observations, high-resolution satellite-based product, gridded SSS dataset for the Atlantic Ocean and independent observations in Labrador Sea. Collectively, our evaluation results indicate that the proposed method can better reconstruct the historical salinity changes than IAP-1 product and can better represent the ocean salinity structure.
A comparison of the salinity multiscale variabilities and patterns between the IAP-05 and several salinity gridded datasets is also presented, including ocean analyses (i.e., IAP-1, EN4, Ishii, SCRIPPS and NCEI) and reanalyses (i.e., ORAS4 and SODA). We focused on the following main aspects: (1) the spatial patterns in the western boundary regions and near the coastline; (2) the long-term mean state and linear trends of SSS and the global zonal mean subsurface salinity; (3) the ENSO-related salinity variability in the tropical Pacific Ocean; (4) the S-IOD variability in the tropical Indian Ocean; (5) the decadal salinity variation in the subpolar North Atlantic. The results are summarized below.
The comparison of salinity anomalies with in situ data in the western boundary regions demonstrates the advantage of highresolution version. It provides a better representation of salinity anomaly features, particularly in the mesoscale and small scale, and near the boundary current systems and coastal regions. Besides, the dataset allows more effective use of observations near the coastal regions and in the boundary current regions. For the long-term trends of surface and subsurface salinity, the ocean analyses can represent an amplification of the existing salinity mean state, while ORAS4 and SODA reanalyses show much larger spread. The ENSO-related salinity patterns and S-IOD associated patterns show a good consistency among multiple products, albeit with some differences in locations and magnitude, and time periods. Both IAP datasets indicate better consistency with the hydrographic datasets in illustrating the interdecadal salinity variation in subpolar North Atlantic. However, this study only provides an intercomparison among products, regional differences and their relationship with controlling factors (i.e., rainfall change and ocean advection) need to be further investigated, for example, by analyzing specific ENSO and S-IOD events.
The comparison of the gridded salinity datasets is useful for understanding climate changes at different temporal and spatial scales, and the improvement of data reconstruction. The differences among products provide reference points for the application of salinity gridded datasets. In the future, an important step is to understand the source of error in salinity datasets, including data quality control, vertical interpolation method, mapping method, choice of climatology, data bias etc. The reanalysis also has additional errors from the inaccuracy of ocean model, surface forcing and assimilation scheme. Therefore, a comprehensive comparison and further analysis on the available analysis and reanalysis datasets are recommended, as a start to uncover the uncertainty in these datasets.
We acknowledge that our mapping method has some caveats. For example, the proposed method in this study can be impacted by the performance of the high-resolution model simulations because they provide covariance and first-guess estimate. Improvements and updates of models will feed back to the reconstruction of the historical subsurface salinity field in the future. The care also needs to be taken when using IAP salinity data for model evaluation and for detection and attribution studies. Furthermore, merging the SSS fields derived from recent and future satellites (i.e., Aquarius, SMOS and SMAP) into the in situ observation products will also be useful to better estimate the four-dimensional salinity state.