Estimates of Water-Column Nutrient Concentrations and Carbonate System Parameters in the Global Ocean: A Novel Approach Based on Neural Networks

(CANYON: and validation of biogeochemical models that are difﬁcult to obtain. In particular, combining the increased coverage of the global Biogeochemical-Argo program, where O 2 is one of the core variables now very accurately measured, with the CANYON approach offers the fascinating perspective of obtaining large-scale estimates of key biogeochemical variables with unprecedented spatial and temporal resolutions. The Matlab and R codes of the proposed algorithms are provided as Supplementary Material.

A neural network-based method (CANYON: CArbonate system and Nutrients concentration from hYdrological properties and Oxygen using a Neural-network) was developed to estimate water-column (i.e., from surface to 8,000 m depth) biogeochemically relevant variables in the Global Ocean. These are the concentrations of three nutrients [nitrate (NO 3 − ), phosphate (PO 4 3− ), and silicate (Si(OH) 4 )] and four carbonate system parameters [total alkalinity (A T ), dissolved inorganic carbon (C T ), pH (pH T ), and partial pressure of CO 2 (pCO 2 )], which are estimated from concurrent in situ measurements of temperature, salinity, hydrostatic pressure, and oxygen (O 2 ) together with sampling latitude, longitude, and date. Seven neural-networks were developed using the GLODAPv2 database, which is largely representative of the diversity of open-ocean conditions, hence making CANYON potentially applicable to most oceanic environments. For each variable, CANYON was trained using 80 % randomly chosen data from the whole database (after eight 10 • × 10 • zones removed providing an "independent dataset" for additional validation), the remaining 20 % data were used for the neural-network test of validation. Overall, CANYON retrieved the variables with high accuracies (RMSE): 1.04 µmol kg −1 (NO 3 − ), 0.074 µmol kg −1 (PO 4 3− ), 3.2 µmol kg −1 (Si(OH) 4 ), 0.020 (pH T ), 9 µmol kg −1 (A T ), 11 µmol kg −1 (C T ) and 7.6 % (pCO 2 ) (30 µatm at 400 µatm). This was confirmed for the eight independent zones not included in the training process. CANYON was also applied to the Hawaiian Time Series site to produce a 22 years long simulated time series for the above seven variables. Comparison of modeled and measured data was also very satisfactory (RMSE in the order of magnitude of RMSE from validation test). CANYON is thus a promising method to derive distributions of key biogeochemical variables. It could be used for a variety of global and regional applications ranging from data quality control to the production of datasets of variables required for initialization and

INTRODUCTION
The ocean is under increasing stress (Gruber, 2011;Gattuso et al., 2015). Given this context of a rapidly changing ocean, it is crucial to reinforce the observation capability of biogeochemical variables and develop ways of measuring or estimating new ones (Claustre et al., 2010;Gruber et al., 2010b). This is required not only for monitoring ongoing changes, but also to gain a better understanding of key biogeochemical processes and for reducing uncertainties in budgets of major elements (e.g., carbon, oxygen, nitrogen, phosphorus, and silicium).
Reaching the goal of an improved global observation system for biogeochemical variables primarily relies on enhancing the spatio-temporal resolution of measurements. Historically, marine biogeochemical observations have been conducted from ships either taking discrete water samples followed by laboratory analyses (e.g., Global Ocean Ship-based Hydrographic Investigations Panel, GO-SHIP program;Talley et al., 2016), or conducting continuous measurements of surface-water properties. These approaches have been and still remain essential as their estimates generally have the highest accuracies. Such measurements have been assembled into global databases (e.g., GLODAPv2; Key et al., 2015;Olsen et al., 2016), which are a key resource for making budgets of chemical elements, directly from available measurements (Takahashi et al., 2009) or indirectly through specific innovative methods (Landschützer et al., 2013(Landschützer et al., , 2014(Landschützer et al., , 2016, conducting climate change research (Le Quéré et al., 2015) and biogeochemical modeling (e.g., use of data for model initialization and/or validation; Doney et al., 2009;Ilyina et al., 2013). The ship-based sampling mode has one major limit, i.e., coarse spatio-temporal resolution and resulting undersampling of marine biogeochemical properties. This severely limits the understanding of fundamental processes and the accurate documentation of ongoing changes, especially at some critical scales (e.g., seasonal, regional).
Over the last two decades, observation technologies such as autonomous platforms have matured (e.g., profiling floats and gliders equipped with biogeochemical sensors; Johnson et al., 2009Johnson et al., , 2013. Robotic observation now provides a reliable complement to ship-based sampling that can be used to cost-effectively densify the acquisition of marine biogeochemical properties (Johnson et al., 2009). Among such observation systems, the recently launched Biogeochemical-Argo (BGC-Argo) network offers a promising approach for the global coverage and spatio-temporal resolution of biogeochemical properties (Johnson and Claustre, 2016). The biogeochemically-relevant variables amenable to systematic and reliable acquisition with robotic observation systems presently include concentrations of oxygen (O 2 ) and their number increases rapidly (Johnson et al., 2015). More generally, O 2 concentration is the most mature measurement, and could be easily implemented on all types of profiling floats (Gruber et al., 2010a) including those of the BGC-Argo network.
Oxygen optode sensors have been progressively implemented on profiling floats since the early 2000s, and have thus opened a new area of research (e.g., Körtzinger et al., 2004;Martz et al., 2008;Riser and Johnson, 2008). Strong efforts have been devoted toward the improvement of the long-term reliability and accuracy of autonomous O 2 measurements on profiling floats. A crucial step is the possibility of frequently calibrating optodes by recording O 2 in air when the float surfaces (Bittig and Körtzinger, 2015;Johnson et al., 2015;Bushinsky et al., 2016). Such a calibration can be done for each profile and throughout the float's lifetime, improves the precision and accuracy of O 2 measurements to within 0.2 and 1 %, respectively (Bittig and Körtzinger, 2015), which accuracy is comparable to that of the reference Winkler titration technique. Water column O 2 concentration can therefore be globally monitored at the biogeochemically relevant spatial and temporal resolutions. This will move O 2 , which required specialized measurements until now, among the key standard oceanographic variables.
In the present study, we develop a new approach with which the expected increased densification of O 2 measurements in the near future could be used to support new studies related to seven key biogeochemical variables, i.e., concentrations of three dissolved inorganic macronutrients (nitrate, phosphate, and silicate) and four parameters of the carbonate system (total alkalinity, dissolved inorganic carbon, pH on the total scale, and partial pressure of CO 2 ). Because O 2 concentration is a variable that reflects both phytoplankton production and community respiration processes, the first-order relationships which link O 2 , nutrients and inorganic carbon are rather well-constrained through Redfield stoichiometry (Redfield, 1934(Redfield, , 1958. These intrinsic relationships have been used to develop, from regional to global scales, multiple linear regression, or neural network approaches that link O 2 and simultaneously acquired variables (e.g., pressure, temperature, salinity) to biogeochemical variables, in particular parameters of the carbonate system (Juranek et al., 2011;Velo et al., 2013;Carter et al., 2016;Williams et al., 2016). These relationships could be used as transfer functions to convert dense fields of O 2 (and associated variables) into corresponding fields of biogeochemical variables of interest. This represents a way to cost-effectively populate, spatially, and temporally, the previously loosely resolved fields of these variables.
Such transfer functions represent a potential approach to profit from the upcoming numerous accurate measurements of O 2 (from profiling floats), which are expected to be routine soon, to derive properties or variables that are difficult or costly to acquire. To be useful, these functions must provide "predicted" variables with relatively high accuracy, and they should be as generic as possible (i.e., ideally of global applicability). Among the different possible methods for developing such transfer functions, artificial neural networks are an attractive tool as these powerful methods can be used for approximating any differentiable and continuous functions and thus allow to model complex and non-linear relationships (Hornik et al., 1989;Lek and Guégan, 1999). As a consequence, neural networks have already been largely used for biogeochemical and geophysical applications (e.g., Ward and Redfern, 1999;Friedrich and Oschlies, 2009;Jamet et al., 2012;Ben Mustapha et al., 2013). More recently, neural networks have been used successfully to retrieve the vertical distribution of biogeochemical variables at the global scale using as input the geolocation variables, providing a single global transfer function handling boundary issues compared to regional-based functions (Sauzède et al., 2015(Sauzède et al., , 2016. The present study takes advantage of the simultaneous release of the GLODAPv2 database (Olsen et al., 2016) and the planning of the BGC-Argo program (Johnson and Claustre, 2016). The two observational systems and resulting databases are highly complementary, and new approaches can be developed to synergistically use their respective strengths, i.e., measurement accuracy for GLODAPv2, and spatio-temporal coverage for BGC-Argo, with a specific emphasis on O 2 measurements. We thus focus in this study on the development of global neural network-based transfer functions using O 2 as a primary input to retrieve nutrient concentrations and carbonate system parameters in the water column down to 8,000 m. Hereinafter, we refer to our method as CANYON, for CArbonate system and Nutrients concentration from hYdrological properties and Oxygen using a Neural network.

The GLODAPv2 Database
The Global Ocean Data Analysis Project version 2 (GLODAPv2) was an effort from the international community to consolidate all data from ocean bottle samples collected as part of many oceanic cruises (Olsen et al., 2016). The GLODAPv2 database (available at http://cdiac.ornl.gov/oceans/GLODAPv2/) provides a single high-quality internally consistent global data product that contains CO 2 -relevant ocean interior measurements from ship-based surveys. The GLODAPv2 database includes samples of core variables such as salinity, oxygen, macronutrients, and seawater CO 2 chemistry from 724 oceanic cruises. In this study, we focused on seven variables representative of the macronutrients and of the seawater carbonate system: nitrate (NO 3 − ), phosphate (PO 4 3− ), silicate (Si(OH) 4 ), pH on the total scale (pH T ), total alkalinity (A T ), total dissolved inorganic carbon (C T ), and partial pressure of CO 2 (pCO 2 ). Note that we estimated this last variable from the A T and C T measurements available in GLODAPv2 (see details below).
Initially, GLODAPv2 was instigated to prepare a unified, bias-corrected interior ocean data product. Thus, a high quality control, QC, based on two steps (i.e., primary and secondary QC) was applied to each data (Olsen et al., 2016). The primary QC was carried out following routines outlined in Sabine et al. (2005) and Tanhua et al. (2010), essentially based on inspection of propertyproperty plots. The secondary QC for salinity, oxygen, nutrients, C T , and A T was more complex, and carried out through crossover (i.e., comparing data where two different cruises crossed or came close to each over) and inversion analyses (i.e., calculation of corrections required to minimize all cruise-by-cruise offsets). This two step-based method was introduced by Gouretski and Jancke (2000) and Johnson et al. (2001). For the secondary QC applied to the GLODAPv2 database, the crossover offsets were calculated using the running-cluster crossover routine (Tanhua et al., 2010), with data from beneath 2,000 m to minimize effects of real variations. For pH T , crossover analysis was not possible because data only exist for a small fraction of the cruises. To pass the secondary QC, pH T measurements had to be concomitant with C T and/or A T for calculating offsets (see details in Olsen et al., 2016). For Mediterranean Sea data, the secondary QC always failed because none of the cruises inside the Mediterranean had an overlap with other cruises (e.g., outside the Mediterranean) thus preventing the crossover analysis. Hence, only "high-quality" GLODAPv2 data that passed the secondary QC were used to train and validate the CANYON methods except for the Mediterranean Sea where we used data that only passed the primary QC.
The subset of the GLODAPv2 database used for our study (i.e., the data that passed the secondary QC, except for the Mediterranean Sea data as explained above) contained 37,863 concurrent profiles of water-column (from the surface to a maximum sample depth of 8,000 m) hydrological properties together with nutrients concentration and/or parameters of the carbonate system (see Figure 1). These data were collected between 1972 and 2013 and were representative of the diversity of oceanic regions, i.e., 25 % were collected in the North Atlantic, 10 % in the South Atlantic, 22 % in the North Pacific, 12 % in the South Pacific, 10 % in the Indian Ocean, 13 % in the Southern Ocean, 7 % in the Arctic Ocean, and ∼0.2 % in the Mediterranean Sea (geographic boundaries are provided in Figure S1). On the temporal scale, most of the data were acquired since the 1990's and more data were available for the spring and summer months ( Figure S2). There was a sampling bias according to latitude as data from autumn and winter months (i.e., December to March for the Northern hemisphere, and May to August for the Southern hemisphere) were less represented at high than low latitudes (i.e., >45 • North and South, respectively) in the GLODAPv2 database ( Figure S2).
All the data used to train and validate CANYON were measurements recorded in the GLODAPv2 database, except the pCO 2 estimates that we calculated from A T and C T measurements using the R package "seacarb" (Gattuso et al., 2015(Gattuso et al., , 2016). The carbonate system parameters were computed using the carbonic acid dissociation constants of Lueker et al. (2000), the hydrogen fluoride dissociation constant of Perez and Fraga (1987), the dissociation constant for bisulfate of Dickson (1990), and a ratio of total boron to salinity derived from Uppström (1974). In situ measurements of salinity, temperature, hydrostatic pressure as well as the concentrations of PO 4 3− and Si(OH) 4 were used to calculate pCO 2 . When not available in the GLODAPv2 database, the concentrations of PO 4 3− and Si(OH) 4 were estimated using our CANYON algorithm (see associated accuracies in Section Overall CANYON Performance).
For the neural network development, vertical profiles of nutrients and carbonate system parameters from eight independent zones of the GLODAPv2 database (squares of 10 • latitude × 10 • longitude) were first removed from the general database to provide a more "independent data set" used for an independent validation of the algorithm developed in this study. These zones were chosen in several major oceanic basins and were representative of the Sub-Equatorial Pacific, the Sub-Equatorial Indian, the North Atlantic Subtropical Gyre, the North Atlantic Subpolar Gyre, the North Pacific, the South Atlantic, the South Indian, and the South Pacific (Figure 1). The remaining profiles were then split into two subsets with 80 % and 20 % of the data, the so-called training and validation datasets, respectively (see the number of data for each variable in Table 1).

General Principle of Multi-Layered Perceptron (MLP)
A multi-layer perceptron (MLP; Bishop, 1995;Rumelhart et al., 1988) is an artificial neural network based on several layers (i.e., the so-called input, hidden, and output layers) composed of neurons which are basically elementary transfer functions. These neurons are interconnected with the neurons of the preceding and following layers by weights (Figure 2), which are iteratively readjusted during the training phase of the MLP. The criterion for readjusting the weights is the minimization of a cost function defined as the quadratic difference between TABLE 1 | Number of data available for each variable in the different datasets used in this study: the general GLODAPv2 database (data that passed the secondary quality control, except for the Mediterranean data), the dataset from the eight independent zones that were first removed from the general database, the dataset used to train the neural network (80 % of the general database minus the eight independent zones), and the dataset used to validate the neural network (20 %). Details are given in Section The GLODAPv2 database.  the reference measurements and the MLP-based outputs. This minimization is done through the back-propagation conjugategradient technique (Hornik et al., 1989;Bishop, 1995), an iterative optimization method adapted to the development of MLPs. To prevent overlearning (Bishop, 1995), the training data set is randomly split into two subsets called "learning" and "test" data sets (50 % of the training dataset each). Finally the validation data set is used to evaluate the final method performance. Moreover, the "independent data set" (see above in Section The GLODAPv2 Database) is used to check the general applicability of the method.

CANYON: Developing a MLP to Retrieve Nutrient and Carbonate System Concentrations
The optimal architectures of CANYON MLPs for the seven variables to retrieve (i.e., concentrations of three dissolved macronutrients, NO 3 − , PO 4 3− , Si(OH) 4 , and four parameters of the carbonate system, pH T , A T , C T , and pCO 2 ) were chosen after multiple tests. As summarized in Figure 2, the chosen input variables include hydrological and biogeochemical components (i.e., temperature, salinity, and O 2 measurements), spatial components (i.e., hydrostatic pressure, latitude, and longitude) and a temporal component (i.e., day of the year, doy, for the seven variables, and the year for only pH T , C T , and pCO 2 retrievals). We chose to use the year as input of the MLPs developed for retrieving pH T , C T , and pCO 2 in order to take into account the long-term changes in seawater CO 2 -carbonate chemistry due to the uptake of anthropogenic CO 2 (e.g., Gattuso and Hansson, 2011).
Prior to the full-depth CANYON version in this study, an initial depth-restricted CANYON algorithm (i.e., 30 -1,500 dbar depth range) was first developed, and showed a very good performance in subsurface, mode, and intermediate waters. However, estimated concentrations at 1,500 dbar occasionally showed small-amplitude seasonal cycles (data not shown). This especially occurred in regions with scarce reference data, where spatially adjacent data had been acquired in different seasons. We believe that, when the day of the year (doy) had been provided as extra degree of freedom at depth to the MLP, per-se spatial variability was parameterized as seasonal variability. To avoid this misattribution by the neural network, we decided to develop the full-depth CANYON where the doy information is not provided below a certain depth. This depth is the larger of 750 dbar or the climatological maximum mixed layer depth (Holte et al., 2016), below which no seasonal cycle is expected.
For the full-depth CANYON algorithm development, the pressure input was specifically transformed by a combination of a linear and a logistic curve according to: for two main reasons.
(1) Based on our previous experience of including doy in the model, we wanted to limit the degrees of freedom of the neural network in deep and abyssal waters and focus instead its parameterization on temperature and salinity, i.e., the water mass properties. Preliminary analysis confirmed that temperature and salinity were the main determinants of nutrient concentrations and carbonate system parameters in the deep water masses.
(2) Preliminary analysis showed that subsurface and mode-water CANYON estimates for a full-depth version without pressure transformation were less satisfactory than our initial 30 -1,500 dbar version. We attributed this to the extension of the pressure range to full ocean depth, where sub-surface, and mode waters comprise a much smaller dynamic range than previously. To counteract this effect, we chose the above input transformation of pressure with the aim to mimic the dynamic range of our initial CANYON in the interval between 0 and 1,000 dbar. The uncertainty of the pCO 2 calculation from C T and A T is proportional to pCO 2 , i.e., high pCO 2 levels have a higher uncertainty than low pCO 2 levels. Similarly, the uncertainty of the CANYON-predicted pCO 2 scales with pCO 2 as well. The cost function of the MLP training (the quadratic difference between reference and MLP output), however, works on the absolute pCO 2 -value. To account for the different behavior of pCO 2 and to avoid potential biases to the MLP induced by large absolute pCO 2 -values (with large uncertainties) during training, we transformed the pCO 2 to a hypothetical, pCO 2 -equivalent C T at constant conditions (i.e., A T 2,300 µmol kg −1 , 25 • C, 35 salinity, 0 dbar, zero silicate and phosphate) before training. A constant change in this hypothetical C T corresponds to a change in pCO 2 that is proportional to pCO 2 . This transformation thus approximates the observed pCO 2 behavior while we retain the benefits of our MLP architecture and the backpropagation technique for training.
Similarly to the methods developed by Sauzède et al. (2015Sauzède et al. ( , 2016, a specific normalization procedure was applied to the doy and longitude inputs to take into account the periodicity of these variables (e.g., doy 1 of a given year is very similar from a seasonal perspective to doy 365 of the previous year). These two input variables were transformed into radians: where X was either the doy or the longitude, and a was a constant equal to 182.625 or 180 for the doy or the longitude, respectively (accounting for half the number of days in the year and half the maximum value of longitude, respectively). Moreover, as the elementary transfer function that provided outputs when inputs were applied to the MLP was a sigmoid non-linear function and subsequently varied within the [−1;1] domain, the inputs and outputs of the MLP were centered and reduced to match the range [−1;1] (see details in Sauzède et al., 2016). Finally, the MLP developed in this study for each of the seven variables to retrieve (i.e., output variables) are composed of the input layer, two hidden layers, and one output layer (schematic overview in Figure 2). To choose the best architecture of each MLP, tests were performed using one or two hidden layers with a number of neurons varying between 1 and 50 and 1 and 20, respectively. The MLP architecture for each output variable with minimum error of validation and minimum number of neurons was then selected as the best ( Table 2). In order to evaluate the method robustness for each MLP, several subsets of the training data set were tested with no difference observed in the prediction.

Statistical Evaluation of Method Performance
Four statistics were chosen to evaluate the CANYON algorithms performance on the validation datasets. The coefficient of determination (r 2 ) and the slope of the linear regression between the CANYON-retrieved values and the corresponding GLODAPv2 measurements were computed. The statistics also included the MAE (Mean Absolute Error) and the RMSE (Root Mean Squared Error) to evaluate the errors and accuracies of each model: Note that absolute uncertainties are expressed as values for NO 3 − , PO 4 3− , Si(OH) 4, pH T , A T , and C T parameters . For pCO 2 parameter, the relative uncertainties are expressed as percentages (e.g., a relative uncertainty of 5 % is an absolute uncertainty of 20 µatm at 400 µatm).

Overall CANYON Performance
Using the validation database (i.e., 20 % of the general database minus the eight independent zones), we evaluated the performance of the method by comparing the CANYONretrieved nutrient concentrations and carbonate system parameters with the measurements in the GLODAPv2 database using the statistics from Section Statistical Evaluation of Method Performance. Scatterplots of CANYON-retrieved variables vs. GLODAPv2 measurements (Figure 3) show that the CANYON method predicts nutrient concentration and carbonate system parameters with good accuracy (i.e., of 0.93, 0.066, and 3.0 µmol kg −1 for the concentrations of NO 3 − , PO 4 3− , and Si(OH) 4 , respectively, and of 0.019, 7 µmol kg −1 , 10 µmol kg −1 and 5.1 % or 20 µatm at 400 µatm for pH T , A T , C T , and pCO 2 , respectively). The determination coefficients Frontiers in Marine Science | www.frontiersin.org of the seven linear models between the CANYON-retrieved and GLODAPv2-variables are comprised between 0.982 and 0.996 with slopes ranging from 0.986 to 0.999. In Figure 3 only very few data points diverge from the 1:1 line. A higher scatter is observed for low CANYON-retrieved NO 3 − (and PO 4 3− ), which mostly corresponds to low surface nutrient concentrations inside and at the edges of the subtropical gyres. Moreover, the higher scatter observed near the surface than deeper for most variables is probably due to the higher inherent variability in the surface.
The training and validation datasets of the neural networks used to retrieve carbonate system parameters were smaller than the datasets for the retrieval of nutrient concentrations (Figure 3 and Table 1). It is thus possible that the carbonate system networks are less robust than the nutrient ones. In any case, all the MLPs could be updated in the future as more data become available; this seems especially important for the pH T database, which is presently the least populated. In order to assess the importance of this potential weakness, we developed a special neural network using all pH T data available, i.e., all the data that passed the primary quality control (see details in Section The GLODAPv2 Database). The results of this special CANYON algorithm, based on more but a priori less accurate data used for training, are not improved when compared to our initial results (i.e., RMSE of 0.030). Given this and in order to maintain consistency among CANYON algorithms and their retrieval performance, all neural networks were trained using data that had passed the secondary quality control (except for the Mediterranean Sea, see details in Section The GLODAPv2 Database).
To identify possible trends, errors were plotted against each input variable for each retrieved nutrient concentration (Figure 4) and carbonate system parameter ( Figure 5). Some errors are larger because of the small numbers of data (the intensity of the shading in each box refers to the number of data). Nevertheless, some clear trends are observed. The range of errors for Si(OH) 4 retrieval seems to be higher at high latitudes in the Southern hemisphere (i.e., latitudes <−60 • ) and low temperatures, i.e., the ranges of box-plot whiskers increased with both decreasing latitude and temperature in Figure 4. This corresponds to regions of significant Si(OH) 4 , mostly in the Southern Ocean, which suggests that the CANYON method is less accurate for Si(OH) 4 retrievals in the Southern Ocean than in other areas. The C T estimates show a high error at high temperatures and unusually low (<34 psu) salinities. pCO 2 estimates exhibit an increased uncertainty at the extremes of the O 2 input (Figure 5), i.e., the range of errors increases at low O 2 concentrations corresponding to high pCO 2 (Figure S3), and at high O 2 concentrations corresponding to cold, low salinity polar surface waters. For CANYON-estimated NO 3 − , PO 4 3− , and C T , the upper layer (i.e., ≤700 m) has the broadest error range. The Si(OH) 4 displays an opposite trend with error larger for deeper than surface waters (i.e., ≥700 m). The CANYON-estimated A T , pH T , and pCO 2 errors are not affected by depth inputs. Finally, Figure S3 shows that the CANYON retrieved variables are not biased against the range of in situ values to retrieve, except for a few extreme values where few data (i.e., lightly shaded boxes) were available in the training and validation databases.
The above results indicate that hydrographic data and information about season (doy) and geolocation can certainly predict some aspects of the dynamics of biogeochemical variables at the surface (e.g., nutrient and C T drawdown during the spring bloom, seasonal reset to preformed nutrient concentrations with winter ventilation) with O 2 being the most important predictor in CANYON for production and remineralization, particularly in the ocean interior. The year is the only variable that accounts in CANYON for the increase in anthropogenic CO 2 . As a consequence, when the year was not included, CANYONestimated pH T , C T , and pCO 2 showed a clear remaining trend due to the missing information about the long-term changes in seawater CO 2 -carbonate chemistry (data not shown).

Independent Validation for Eight Geographic Zones
The smoothed mean differences between CANYON-retrieved and in situ measurements were plotted as vertical profiles for each of the seven variables and the eight independent zones (Figure 6). In general, the accuracy (i.e., RMSE) for each variable is comparable to the accuracy determined on the validation data set ( Table 3). Beyond this general agreement, there are a few discrepancies. The errors appear to be higher in the 0 -200 m layer than below. This is maybe due to a larger variability in this upper layer, caused by not only biogeochemical processes but also air-sea exchange of O 2 that act to decouple O 2 from the CANYON outputs (see also Figure 3). The Sub-Equatorial Pacific, North Pacific and South Atlantic display higher RMSE than calculated from the 20 % validation data (Table 3). These results suggest that the GLODAPv2 data set for these specific zones could have been underrepresented in the training database with respect to the regional variability in nutrient concentrations and carbonate system parameters.
The above comparisons and the identification of spatiotemporal domains where CANYON performed less accurately could be used to identify more objectively the periods or regions that require more intense acquisition of discrete high-quality measurements. Indeed, the periods and regions where CANYON fails to reproduce well in situ observations are those where the variability (mostly seasonal) was not  well-represented in the training dataset. From Table 3 and Figure 6, these regions are mainly the Sub-Equatorial Pacific, the North Pacific and the South Atlantic, and the periods are most likely late autumn and winter, when harsh sea conditions generally prevent ship-based collection of highquality measurements. In fact, the GLODAPv2 database seems to be biased in this respect (Section The GLODAPv2 Database and Figure S2).

Further Results for Specific Applications
Here we present further results important for specific applications. Indeed, one of a potential application of the CANYON method is the calibration of NO 3 − and pH T sensors mounted on BGC-Argo profiling floats (because the corresponding sensors may drift over long-term deployment).
To overcome this problem, CANYON could be used to compute deep (e.g., ≥1,000 m) reference measurements each time the float In the last row, comparable information is provided for the validation dataset (Section Overall CANYON Performance), as reference. In brackets figures the number of observations used to compute each RMSE.
profiles. With this aim in mind, specific MLPs were developed to retrieve NO 3 − concentration and pH T only at depths between 950 and 2,050 m, i.e., different from the full water-column MLPs used above. Results show that specific, deep MLPs do not significantly improve the quality of the results compared to the MLPs developed for the entire water column applied to this layer (i.e., RMSE for NO 3 − of 0.50 and 0.54 µmol kg −1 , respectively, and for pH T of 0.013 units for the two approaches). The CANYON method can therefore be used to retrieve NO 3 − concentration and pH T for specific applications focused on deep layers with excellent accuracy.
For pCO 2 applications, most studies focus on the surface layer of the open ocean and on CO 2 air-sea exchange (e.g., Takahashi et al., 2009). The CANYON method could be used to address the regional and seasonal variability of air-sea CO 2 fluxes in view of comparing it with the results of previous studies based on neural networks (Landschützer et al., 2014(Landschützer et al., , 2016. For this application, it is important to ascertain that the general MLP performs as adequately as a specific MLP developed to retrieve pCO 2 for the surface layer only (i.e., ≤100 m). Here again, the CANYON algorithm developed for the entire water column is as robust as a specific, surface-focused MLP (i.e., RMSE of 34 and 33 µatm, respectively).

Example of Application: Illustration with HOT Database
The CANYON method was further applied outside the GLODAPv2 domain, on which it was trained, using an independent dataset that contained similar variables. The Hawaii Ocean Time Series (HOT) database contains monthly vertical profiles of hydrological properties, O 2 , nutrient concentrations and carbonate system parameters from the deep-water station ALOHA since 1994 (Karl and Lukas, 1996;Dore et al., 2003). The HOT temperature, salinity, and O 2 measurements from HOT were used as input variables to estimate the concentrations of nutrients and the carbonate system parameters. The overall agreement between the CANYON-simulated variables and their measured in situ counterparts is satisfactory, as shown by the absolute differences between the two datasets and the corresponding comparison statistics (Figure 7 and Table 4). There are few systematic biases, such as the model underestimation of NO 3 − from 450 to 550 m (i.e., the depth horizon of the nitracline) and the overestimation of PO 4 3− and C T in the upper 150 m (Figures 7b,d, Figure S4). These profiles were subsequently removed from the analysis to avoid contaminating the CANYON-retrieved data shown in Figure 7 and the corresponding statistics in Table 4.
CANYON also provided a way to fill a gap in the HOT dataset. Indeed, pH T had not been acquired during the 1999-2003 period, but the input variables needed to run CANYON had been measured. pH T could thus be estimated during that period (Figure 7g) with a mean accuracy of 0.033 units (i.e., RMSE in Table 4). However, it is obviously not possible to compare the values predicted by CANYON with (non-existent) corresponding in situ measurements from 1999 to 2003 (Figure 7h).
Using the HOT temperature, salinity and O 2 measurements from the last year, i.e., 2015, we estimated pH T for the 15 years to come (by changing only the year input in CANYON), and a decline is found in pH T of 0.024 units over this 15-year period (i.e., decrease of 0.0016 ± 0.0004 units year −1 ). This value is consistent with the decrease in pH T of 0.0019 ± 0.0002 year −1 reported in the central North Pacific (Dore et al., 2009) and more generally of 0.0013 to 0.0026 units year −1 units during the 20-30 last years (Bates et al., 2014). This suggests that the CANYON approach could perhaps also be used outside the temporal range of training for the carbonate system parameters for estimating near future changes in pH T , C T , and pCO 2 , thanks to the use of the year among the input variables. However, this would assume that the relationships between the input and output variables, through the hidden layers of the CANYON model, will remain unchanged in the future, and the sensitivity of retrieved FIGURE 7 | Values predicted by CANYON (a,c,e,g,i,k,m) and absolute differences between HOT measurements and CANYON estimates (b,d,f,h,j,l,n). Time series (22 years  carbonate system parameters to departures from this assumption remains to be explored.

CONCLUSIONS AND PERSPECTIVES
The global Biogeochemical-Argo (BGC-Argo) program is being progressively implemented (Johnson and Claustre, 2016), and its core variables include O 2 . BGC-Argo floats can now acquire high-quality vertical profiles of O 2 on the long term (Bittig and Körtzinger, 2015;Johnson et al., 2015;Bushinsky et al., 2016). Given the possibility of acquiring long-term, global vertical profiles of temperature, salinity and O 2 , CANYON could be used to develop a variety of new applications. Firstly, CANYON may contribute to developing quality control and post-processing procedures for NO 3 − concentration and pH T in oceanic waters (see Section Further Results for Specific Applications). These two variables, together with O 2 , are core BGC-Argo variables and their measurements make use of an optical sensor for NO 3 − (Johnson and Coletti, 2002) and an electrochemical sensor for pH T  with known accuracies (1 µM and 0.010 for NO 3 − and pH T , respectively; Johnson et al., 2013. However, these sensors, like the O 2 probes, drift over long-term deployments. Following an approach similar to the one developed for oxygen sensors, which can be referenced to atmospheric values each time the floats surface (Bittig and Körtzinger, 2015;Johnson et al., 2015;Bushinsky et al., 2016), CANYON could be used to compute deep (e.g., ≥1,000 m) reference measurements at depth for pH T and NO 3 − each time the float makes a profile. At these depths, it is indeed expected that reliable and stable reference measurements could be acquired, which could be used to develop appropriate correction procedures for NO 3 − concentration and pH T and thus guarantee the long-term accuracy of the sensors.
Secondly, CANYON can also provide estimates, with known accuracies, of variables that are not presently measured by BGC-Argo floats. This is the case for PO 4 3− and Si(OH) 4 and the three other variables of the carbonate system than pH T . CANYON could thus be used as a cost-effective method for "filling the spatio-temporal gaps" of these variables by populating spatially and temporally their loosely resolved fields in oceanic waters. For these under-sampled variables, CANYON offers novel opportunities at global and local scales. For example, global fields of these variables provided by CANYON could support the initialization and validation of biogeochemical models which presently crucially lack reference data (e.g., Doney et al., 2009;Ilyina et al., 2013).
Thirdly, CANYON could also be used in combination with present measurements of the respective field nutrient concentrations and/or carbonate system parameters. Beside quality control of these data, CANYON values could serve to identify unusual biogeochemical events that had not been covered by the global but sparse GLODAPv2 training data set, in cases where CANYON and field data diverge.
Fourthly, and for more local approaches (e.g., analysis of individual float time series), the possible derivation of pCO 2 from BGC-Argo float O 2 and ancillary measurements (with or without pH T ) potentially represents a new way to address regional and seasonal variability in CO 2 air-sea exchanges, and to reduce present uncertainties in the estimates of these fluxes. Moreover, estimating the three macronutrients (NO 3 − , PO 4 3− , and Si(OH) 4 ) from BGC-Argo float data could be of great value for better understanding the dynamics of biogeochemical events such as the development and subsequent collapse of phytoplankton blooms.
Fifthly, CANYON could contribute to design future observational programs by identifying areas and periods where data acquisition could most cost-effectively address variability that is presently unresolved. Indeed, the strict quality control of the input data in the present study (which used only GLODAPv2 data that passed the second quality check, see Section The GLODAPv2 Database) eliminated some specific regions from our training and validation datasets. This argues for developing field-based observation programs to conduct high-quality measurements in these areas. More generally, the spatio-temporal domains where CANYON provided the least satisfactory results likely corresponded to weaknesses in the GLODAPv2 database with respect to catching the inherent and natural variability of different variables, such as the Southern Ocean in winter.
Overall, the CANYON-type estimation of biogeochemical variables based on data provided by the global BGC-Argo program offers new avenues for marine biogeochemistry that are comparable to those that have been created for physical oceanography by the Argo network since the early 2000s.
This novel approach could increase tremendously the value of biogeochemical measurements made on board ships and by BGC-Argo floats by combining the high quality of the first type of data with the broad spatio-temporal coverage of the second.

AUTHOR CONTRIBUTIONS
HC, RS, and OP have initiated the study and designed the neuralnetwork configurations with JG and HB. RS ran simulations and created the plots. All authors contributed to analysis and discussion of results. RS wrote most part of the manuscript. All authors commented on and contributed to the improvement of several versions of the manuscript.

ACKNOWLEDGMENTS
This study was supported by the remOcean project (funded by the European Research Council, Grant Agreement No. 246777) and the AtlantOS project (funded by the European Union's Horizon 2020 research and innovation program, Grant Agreement No. 2014-633211). This is a contribution to the Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) project which is supported by the US National Science Foundation (PLR-1425989). We want to thank Tobias Steinhoff (GEOMAR, Kiel), for helpful discussions on carbonate system calculations. We deeply acknowledge the work from analysts, investigators, and crew who collected the data at sea. We are also grateful to all who have contributed their data to GLODAPv2 project and acknowledge the huge effort made to gather all data to create the GLODAPv2 database.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars. 2017.00128/full#supplementary-material