A Regional Neural Network Approach to Estimate Water-Column Nutrient Concentrations and Carbonate System Variables in the Mediterranean Sea: CANYON-MED

A regional neural network-based method, “CANYON-MED” is developed to estimate nutrients and carbonate system variables specifically in the Mediterranean Sea over the water column from pressure, temperature, salinity, and oxygen together with geolocation and date of sampling. Six neural network ensembles were developed, one for each variable (i.e., three macronutrients: nitrates (NO3-), phosphates (PO43-) and silicates (SiOH4), and three carbonate system variables: pH on the total scale (pHT), total alkalinity (AT), and dissolved inorganic carbon or total carbon (CT), trained using a specific quality-controlled dataset of reference “bottle” data in the Mediterranean Sea. This dataset is representative of the peculiar conditions of this semi-enclosed sea, as opposed to the global ocean. For each variable, the neural networks were trained on 80% of the data chosen randomly and validated using the remaining 20%. CANYON-MED retrieved the variables with good accuracies (Root Mean Squared Error): 0.73 μmol.kg–1 for NO3-, 0.045 μmol.kg–1 for PO43- and 0.70 μmol.kg–1 for Si(OH)4, 0.016 units for pHT, 11 μmol.kg–1 for AT and 10 μmol.kg–1 for CT. A second validation on the ANTARES independent time series confirmed the method’s applicability in the Mediterranean Sea. After comparison to other existing methods to estimate nutrients and carbonate system variables, CANYON-MED stood out as the most robust, using the aforementioned inputs. The application of CANYON-MED on the Mediterranean Sea data from autonomous observing systems (integrated network of Biogeochemical-Argo floats, Eulerian moorings and ocean gliders measuring hydrological properties together with oxygen concentration) could have a wide range of applications. These include data quality control or filling gaps in time series, as well as biogeochemical data assimilation and/or the initialization and validation of regional biogeochemical models still lacking crucial reference data. Matlab and R code are available at https:// github.com/MarineFou/CANYON-MED/.


INTRODUCTION
The global ocean currently absorbs around 25% of anthropogenic carbon dioxide (CO 2 ) from the atmosphere, therefore playing a crucial role in buffering the effects of climate change (Le Quéré et al., 2018). This role is likely to be modified by ocean warming and acidification, having complex impacts on marine ecosystems and organisms (Gattuso and Hansson, 2011). To better understand the underlying processes and anticipate changes, a large number of variables have to be observed in order to gain a more accurate overall picture. In this context, the "Framework for Ocean Observing" (FOO, Lindstrom et al., 2012) was designed to coordinate the ocean observing community's efforts and maintain a sustained Global Ocean Observing System (GOOS) . This framework is organized around Essential Ocean Variables (EOVs; GOOS, 2018), chosen to balance the feasibility of their measurement with their societal and scientific relevances. The sustained measurement of key EOVs helps fill the gaps in our understanding of the ocean.
Historically, measurements have come from oceanographic cruises and from continuous measurements at fixed stations (buoys and moorings) (Chai et al., 2020). However, the low spatial and temporal resolutions of these sampling platforms have resulted in chronic under-sampling of biogeochemical variables, creating "observational gaps" Weller et al., 2019;Chai et al., 2020). Today, technological advances (miniaturization of sensors, automation of measurements) have made it possible to develop a network of autonomous platforms such as profiling floats (Riser et al., 2016;Roemmich et al., 2019;Claustre et al., 2020) and ocean gliders (Testor et al., 2019). These platforms provide measurements of physical and biogeochemical variables at much higher spatial and temporal resolutions in regions or seasons otherwise difficult to access. These active autonomous networks have thus contributed to the progressive densification of databases of key variables at global scale (Abram et al., 2019). Some variables are presently almost systematically measured regardless of the acquisition platform, for instance physical data (temperature, salinity, pressure) and dissolved oxygen concentration (O 2 ). However, measurements from these autonomous platforms remain limited to a small type of biogeochemical variables, owing to the high cost of some sensors and technological limitations (Bittig et al., 2019;Chai et al., 2020).
Machine learning methods represent a promising way to fill these "observational gaps". They have the potential to predict, from variables systematically measured by autonomous platforms, variables still difficult to measure accurately and cost-effectively with these platforms (e.g., Gregor et al., 2019). Transfer functions such as multiple linear regressions (e.g., Velo et al., 2013;Carter et al., 2018) have therefore been developed to estimate biogeochemical variables. Neural network methods, known to be universal substitutes for any differentiable and continuous function (Hornik et al., 1989), have been also applied to complex data sets in oceanography (e.g., estimation of the variability of the global ocean carbon sink (Landschützer et al., 2014), detection of phytoplankton groups in open ocean waters (Ben Mustapha et al., 2014). O 2 is a key EOV in oceanography (GOOS, 2018). It is widely and increasingly measured on autonomous platforms, with measurement accuracies close to those obtained by Winkler titration of discrete samples (Bittig and Körtzinger, 2015). The O 2 concentration results from the balance between exchanges at the ocean-atmosphere interface, mixing and ventilation (Körtzinger et al., 2004;Piron et al., 2016Piron et al., , 2017Coppola et al., 2018), solubility (dependent on temperature and salinity), and biological processes such as primary production and respiration (Robinson, 2019). Furthermore, O 2 is linked to nutrients and inorganic carbon via Redfield's stoichiometric ratios (Redfield, 1934). Therefore, based on the role of O 2 in remineralization and carbon fixation, Sauzède et al. (2017) have developed neural network-based methods to retrieve carbon and nutrient variables in the global ocean. They use O 2 , temperature, salinity, pressure, longitude and latitude, day of the year and year as input variables to predict the concentrations of three macronutrients [nitrates (NO − 3 ), phosphates (PO 3− 4 ), silicates (SiOH 4 )] as well as carbonate system variables [total alkalinity (A T ), total dissolved inorganic carbon (C T ), pH on the total scale (pH T ), and CO 2 partial pressure (pCO 2 )]. This method referred to as CANYON for "CArbonate system and Nutrients concentration from hYdrological properties and Oxygen using a Neural network" predicts these key oceanographic variables, which cannot be measured independently or with sufficient accuracy, with much improved temporal and spatial resolution (Chai et al., 2020). This method as well as its subsequent improvement, CANYON-B (Bittig et al., 2018), were trained on high-quality data collected over the past thirty years (the GLODAPv2 for Global Ocean Data Analysis Project version 2 database; Olsen et al., 2016). One of its potential applications is to generate virtual carbon and nutrients estimates from the large amount of data acquired by the Biogeochemical-Argo float array (BGC-Argo; Johnson and Claustre, 2016;Claustre et al., 2020).
The Mediterranean Sea is a semi-enclosed marginal sea characterized by high salinities and a rapid overturning circulation, rendering it capable of absorbing more CO 2 than adjacent oceanic regions (Schneider et al., 2010;Lee et al., 2011;Touratier and Goyet, 2011;Álvarez et al., 2014). Machine learning methods produce better results when trained on datasets representative of the considered case study. Therefore, both CANYON and CANYON-B approaches may underperform in this specific oceanic region making a regionalization of this approach highly desirable. Furthermore, the Mediterranean Sea is also a low nutrient concentration basin (McGill, 1966;Krom et al., 1991) with an eastward-increasing oligotrophy gradient. Nutrients are particularly important as tracers of biological cycles, biomass production, and natural and anthropogenic inputs (Béthoux, 1989;Béthoux et al., 1992). Through its rapid response to external conditions relative to other oceans (Crispi et al., 2001), the Mediterranean Sea is considered as a good indicator of global shorter-scale ocean processes and is therefore defined as a "hot spot" for climate change (Giorgi, 2006;Diffenbaugh et al., 2007). This "miniature ocean" (Millot and Taupier-Letage, 2005) is particularly relevant for the study of biogeochemical cycles because of its molar ratios of nutrients very distinct from those of other oceanic regions (Ribera d'Alcalà et al., 2003;Krom et al., 2005;Pujo-Pay et al., 2011;Pasqueron de Fommervault et al., 2015). Additionally, the high amount of data from BGC-Argo floats and observing programs (e.g., MOOSE, NAOS) emphasize the need for a regionalized approach, specific to the Mediterranean Sea, to fully take advantage of these observing systems and platforms (Tintoré et al., 2019).
In this paper, we present the regional downscaling of the CANYON method for the Mediterranean Sea, CANYON-MED. A database of in situ measurements has been specifically assembled to constitute a new quality-controlled dataset used to train the regional neural networks. Neural network ensembles were trained and their results are compared to single neural network architectures. The resulting optimized CANYON-MED networks are validated with an independent dataset and compared to existing methods available for nutrients and carbonate system variables prediction in the Mediterranean Sea.

Training and Validation Datasets
The CARIMED (CARbon in the MEDiterranean Sea) data synthesis initiative (Sanleón-Bartolomé et al., 2017) aims to produce a consistent quality-controlled database for carbon relevant variables from hydrographic cruises covering the whole water column and the different basins of the Mediterranean Sea. As the validation of CARIMED data is still in progress, we performed secondary quality control through visual inspection of the profiles and by comparison with literature values, to remove outliers. After applying similar quality control, the data from ten other cruises were added to the database (i.e., CARBOGIB, CASCADE, DEWEX, GIFT, MOOSE_GE 2011, MOOSE_GE 2013, MOOSE_GE 2015, PACIFIC-CELEBES, SOMBA, MSM72; Table 1). The spatial coverage of the complete dataset is shown in Figure 1.
The DYFAMED site (Coppola et al., 2019a) is located in the Ligurian Sea (43 • 25 N, 7 • 52 E, water depth of 2350 m; red star in Figure 1). It is surrounded by the permanent geostrophic Ligurian frontal jet flow caused by the Northern Current's cyclonic circulation, separating the sampling area from coastal inputs by a density gradient (Millot, 1999;Niewiadomska et al., 2008). Monthly cruises are performed over the whole water column since 1991 and included in the MOOSE network since 2010 (Marty et al., 2002;Coppola et al., 2019b). This is the longest open sea time series in the Mediterranean Sea in terms of O 2 , nutrients and carbonate system measurements that led to a homogeneous and calibrated data set with well-described seasonality of these variables (Copin-Montégut and Bégovic, 2002;Marty et al., 2002;Pasqueron de Fommervault et al., 2015;Coppola et al., 2018).
The dataset gathered for this study ( system variables (total alkalinity: A T , total carbon or dissolved inorganic carbon: C T , and pH on the total scale: pH T ). When A T and C T were available, pH T was calculated using CO2SYS-MATLAB (Lewis et al., 1998;van Heuven et al., 2011). This dataset is openly accessible (Fourrier, 2020). Thermodynamic calculations within the carbonate system used the carbonic acid dissociation constants of Mehrbach et al. (1973) as refit by Dickson and Millero (1987), the dissociation constant for bisulfate of Dickson (1990) and Uppström (1974) for the ratio of total boron to salinity. These constants were used to ensure consistency with pH T units in some of the data sets compiled.

Independent Validation Dataset
The ANTARES site is located in the Ligurian Sea (42 • 48 N, 06 • 05 E, water depth of 2500 m; yellow square in Figure 1). It is visited monthly since 2010 and integrated into the MOOSE network (Lefevre, 2010). This time series extends from 2009 to 2018 for ancillary data, O 2 , nutrients, and from 2009 to 2017 for A T and C T . pH T was computed as described in section "Training and Validation Datasets." However, O 2 measurements are lacking prior to 2011, restricting the use of the data in this study to the period 2011-2018. For the whole dataset and after quality control, the distribution of observations per year and month (Figure 2) demonstrates the systematic under-representation of winter months (which are quite exclusively sampled thanks to the DYFAMED time series), as well as the difference in coverage according to year and variable of interest, such as the lower number of C T and pH T data.

Neural Network Development
Artificial Neural Networks (ANNs) are approximate functions adapted to any dataset (Marzban, 2009). One of the main advantages of these methods is their ability to recognize and exploit relationships in data that are not predefined (unlike regression techniques) and do not need to be made explicit by equations (Marzban, 2009). This makes them particularly suitable for mapping non-linear relationships, provided that data are sufficiently available to "train" the neural network (Lefevre et al., 2005). Similarly to Sauzède et al. (2017) and Bittig et al. (2018) for CANYON and CANYON-B, respectively, an iterative statistical learning-based method, and more specifically an ANN was developed to predict carbon and nutrient variables.
Separate the learning data into a training dataset for training the machine learning method and a validation dataset used to assess the performance of the trained method is a common practice. It ensures that the model can produce reliable estimates outside the range of learning data (generalization capabilities) (Bishop, 1995). In the present paper, the dataset was randomly split according to the proportions of 80% and 20% for training and validation sets, respectively. Additionally, an external dataset was also used to further validate the ANNs.

Multi-Layered Perceptron
Among the different types of ANNs available, Multi-Layered Perceptron (MLP; Rumelhart et al., 1986;Bishop, 1995) using a backpropagation algorithm (Bishop, 1995;Hagan et al., 2014) has been chosen for its properties as universal approximator  of any continuous and derivable function (Hornik et al., 1989). A MLP is an ANN organized in several layers (i.e., input, hidden and output layers) containing neurons that are connected to each other and able to exchange information through their connections (Figure 3). These connections are directional, and each connection is associated with a real number, called the "weight". The information is transmitted from one neuron to another through the weights that are readjusted iteratively during the training phase to minimize the difference between MLP outputs and observations. MLPs use an activation function between the neurons (here the sigmoid function f (Figure 3), with A and α equal to 4/3 and 1.7159, respectively) to ensure a quasi-linear behavior between −1 and 1 (Jamet, 2004;Sauzède et al., 2017): The backpropagation algorithm used for the MLP (Bishop, 1995) can be divided into two steps. First, the forward propagation of a stimulus (from the inputs) through the MLP generates an output. Second, the errors are propagated backward from the output through the MLP to change the weights in the opposite direction to the error gradient. The input data were normalized to have an average of zero and a standard deviation of one using the mean and standard deviation of the training data to improve convergence (Goodfellow et al., 2016) and to prevent neural networks saturation (caused by the difference in the range of the different input variables) according to: Where x,x, and σ are, respectively, the input data, their mean and their standard deviation. The factor 2/3 brings at least 80% of the data in the range [−1;1] (Jamet et al., 2012). In order to improve the generalization capabilities of the ANNs, Bayesian regularization (Bishop, 1995) was used. This method minimizes over-fitting by considering the goodness of fit as well as the network architecture. The training algorithm used was a supervised iterative training method updating weights according to Levenberg-Marquardt optimization (Linares-Rodriguez et al., 2013). This algorithm was chosen because it is better suited for function fitting and does not require excessive computing time and power, while still appropriately generalizing (Beale et al., 2018). To avoid falling to a local minimum, a linear combination of weights and quadratic errors was minimized through gradient descent, and multiple initializations with random weights were performed. Then, to determine the appropriate combination of weights and errors, Bayesian regularization was used. This allowed the generalization of the neural network through the optimization of the linear Frontiers in Marine Science | www.frontiersin.org 6 August 2020 | Volume 7 | Article 620 . The inputs are connected to a neuron and each connection is associated with a weight "w." The output is produced by combining the inputs and weights and adding the neuron's bias "b j " before going through a sigmoid activation function "f." A Multi-Layered Perceptron is composed of multiple neurons in hidden layers combined in the same manner to produce a single output.
combination of weights and errors (MacKay, 1992a,b;Foresee and Hagan, 1997;Hagan et al., 2014). The Matlab Neural Network Toolbox, and more specifically the algorithm "trainbr, " has been chosen for the ANN implementation. The ANN optimal architecture or topology (number of inputs, outputs, number of hidden layers and neurons in each hidden layer) hinges on the complexity of the relations between inputs and outputs. After testing several different configurations through a trial-and-error process, the topologies for neural networks producing the best results were determined with 2 hidden layers and a number varying between 15 and 50 neurons for the first layer and between 8 and 30 for the second hidden layer.

ANN Ensemble Model
The robustness and reliability of an ANN can be significantly improved by combining several ANNs into an ANN ensemble model (Sharkey, 1999;Linares-Rodriguez et al., 2013). The construction of an ANN ensemble is done in two main steps. First, the individual members of the model are created (as described above in section "Multi-Layered Perceptron"). Second, the combination of the outputs of these members is averaged to obtain the unique ensemble output. Thus, for each neural network [NO − 3 , PO 3− 4 , Si(OH) 4 , A T , C T , pH T ], the ten best topologies were chosen according to their statistics (as defined in section "Validation Statistics Metrics"). The final output of each neural network ensemble model (ANN-E) corresponds to the mean of the outputs of these ten best ANNs. Additionally, the best topology (ANN-1) among the ten best was also selected to compare between a one ANN structure and a neural network ensemble.

CANYON-MED
Based on the CANYON networks (Sauzède et al., 2017) principle, the CANYON-MED neural networks corresponding to neural-network ensemble models (ANN-E) (as described in "ANN Ensemble Model") were developed. Similarly to the work by Sauzède et al. (2017), the chosen input variables for the networks are in situ measurements of pressure, temperature, salinity (water mass characteristics), and oxygen together with geolocation (latitude and longitude), date of sampling (day of year or day, representing the seasonality of processes) and year of sampling (representing global trends linked to anthropogenic changes). Compared to the CANYON networks, where the year is an input only for pH T and C T , it has been chosen as an input for the six CANYON-MED networks [i.e., NO − 3 , PO 3− 4 , Si(OH) 4 , A T , C T , and pH T ]. This change, namely the addition of the year for each network, was based on the long-term increases in deep nutrients observed in the western basin by Béthoux et al. (1998Béthoux et al. ( , 2002, as well as the increases in temperature and salinity in the deep Mediterranean Sea over the past 40 years (Borghini et al., 2014) which demonstrate the need of a temporal component in CANYON-MED networks. According to the work by Sauzède et al. (2017), the pressure input was transformed according to the combination of a linear and a logistic curve to limit the degrees of freedom of the ANN in deep waters and to account for the large range of pressure values (from the surface to 4000 m depth) and a non-homogeneous distribution of data within this range: Furthermore, in agreement with the CANYON networks, the day has been modified to account for the periodicity of this measurement [day 365 (end of December) of a year is similar to day 1 (beginning of January) of the next year from a chronological perspective]. It was therefore transformed into radians according to: Furthermore, similarly to the aforementioned method, and due to the nature of the transfer function (a sigmoid varying in the range [−1;1]), the inputs and outputs of the neural networks have been centered and reduced to also fall into the range [−1;1] (Sauzède et al., 2016(Sauzède et al., , 2017. Moreover, also building on the CANYON networks, Bittig et al. (2018) developed CANYON-B, a Bayesian neural network improvement of CANYON. CANYON-B is constructed as a committee of neural networks and provides estimates of nutrients and carbonate system variables with a local uncertainty (whereas CANYON provides global uncertainties). Neural networks committees are composed of several neural networks and use the spread of predictions between individual members of the committee to improve the estimation of uncertainty of the committee output (Bishop, 1995). This other neural network method was also used for comparative purposes.

Neural Network Ensemble Improvement and Overall CANYON-MED Performance
For each studied variable [i.e., NO − 3 , PO 3− 4 , Si(OH) 4 , A T , C T , and pH T ], a CANYON-MED ensemble neural network was created as described in section "ANN Ensemble Model." Comparing the statistics for the case using a single neural network (ANN-1) and the ensemble neural network model (ANN-E, corresponding to CANYON-MED), the ANN-E model provides the most accurate nutrient and carbonate system estimates ( Table 2). For most variables, an increase in the determination coefficient is shown (i.e., from 0.94 to 0.95, 0.90 to 0.92, 0.93 to 0.96, 0.91 to 0.94, 0.84 to 0.86 for NO − 3 , PO 3− 4 , A T , C T , and pH T , respectively) as well as a decrease in MAE and RMSE (up to 20% and 30% of the errors, respectively). The low MAE values suggest that the ANN-E model is not biased, although the slopes are slightly less than 1, resulting in an underestimation of neural network outputs.
Using the validation dataset, the performance of the CANYON-MED method was evaluated by comparing CANYON-MED's results with the corresponding in situ values. Figure 4 shows these results as a function of pressure while the corresponding statistics are in Table 2. The accuracies obtained are very satisfactory with, for example, an accuracy of NO 3− extracted from the neural network method (0.73 µmol.kg −1 ) comparable to that obtained with optical sensors such as those mounted on BGC-Argo floats (1 µmol.kg −1 ; Johnson et al., 2017). Owing to the lower number of data presently available in our training dataset for C T and pH T , the ability of the corresponding neural networks to generalize correctly is lowered, explaining the less robust statistics for C T and pH T with slopes around 0.9, and determination coefficients of 0.91 and 0.84, respectively ( Table 2).
In Figure 4, two different deep value ranges (i.e., two "patches" of dark blue points) can be distinguished for NO 3− and PO 3− 4 corresponding to the deep values of the Western and Eastern Basins. The Eastern Mediterranean Sea is known to be more oligotrophic than its western counterpart (Ribera d' Alcalà et al., 2003;Pujo-Pay et al., 2011). Indeed, deep NO − 3 and PO 3− 4 values ( Figures 4A,B) are lower in the ultraoligotrophic eastern Mediterranean than in the oligotrophic western Mediterranean, as evidenced by the differences in concentrations of the deep values (5 and 10 µmol.kg −1 for eastern and western NO − 3 , respectively and 0.2 and 0.4 µmol.kg −1 for PO 3− 4 , respectively). The deep Si(OH) 4 values are quite similar between the two Mediterranean basins as it is not a limiting nutrient (Krom et al., 1991). A larger dispersion is observed for low concentrations of PO 3− 4 and NO − 3 (Figures 4A,B) recovered by CANYON-MED, coinciding with the low concentrations of surface nutrients in the Eastern Basin Kress et al., 2014).
Furthermore, deep A T ranges between 2590 and 2610 µmol.kg −1 (Figure 4D) corresponding to the deep waters of the Western and Eastern Mediterranean Sea respectively. The difference between the two basins stems from an eastward increasing trend for A T mirroring the increase in salinity and an eastward increase in pH T . However, TABLE 2 | Number of points in the training and validation datasets and statistics between in situ measurements from the validation database and the values predicted by CANYON-MED's "best topology" (ANN-1) and CANYON-MED (ANN-E) for NO − 3 , PO 3− 4 , Si(OH) 4 , A T , C T and pH T . this difference is not very visible owing to the large range of A T . For C T , the difference between the two Mediterranean basins is lower ( Figure 4E). C T variability is controlled by salinity, biological processes (such as photosynthesis, oxidation of organic matter, dissolution and precipitation of CaCO 3 ), as well as air-sea CO 2 exchange (Lovato and Vichi, 2015). Additionally, pH T ( Figure 4F) exhibits a large range below 1500 m (7.98 to 8.1). This also stems from the difference between the two Mediterranean basins, a pattern similar to that of A T , with higher surface and deep values in the Eastern Mediterranean Sea (Rivaro et al., 2010).

Validation on Independent Time Series: ANTARES
The ANTARES time series was chosen as an additional independent validation dataset because, among the few time series in the deep offshore Mediterranean, it is one of the few where measurements of nutrients and carbonate system variables are performed semi-regularly and over the entire water column (Lefevre, 2010). The vertical profiles of the differences between in situ measurements of the ANTARES dataset and the values predicted by CANYON-MED are represented ( Figure 5) along with their mean value and associated standard deviation. In general, the accuracies (Table 3) of each variable are comparable, albeit slightly worse, to those determined on the CANYON-MED validation dataset ( Table 2). These accuracies are even lower for A T and C T probably due to the smaller ranges covered by the ANTARES site. The errors seem quite homogeneous over the whole water column (Figure 5), with higher errors when lacking data at specific depths [e.g., NO − 3 and Si(OH) 4 at around 1250 dbar]. In addition, we can also note the clear overestimation of PO 3− 4 at the ANTARES time series, evident from the slope of 1.06 (Table 3) as well as from the observation of a clear shift in Figure 5B.
However, it is important to note that the ANTARES dataset, although subject to quality control, still exhibits a high nutrient dispersion along the water column. This variability may result from natural phenomena or potential issues in the measurement accuracy (e.g., measurement uncertainty, sampling procedure, change of operator, evolution of techniques). In the latter case, it can explain the dispersion of errors found for nutrients in Figure 5.

CANYON Methods
The performance of the CANYON, CANYON-B, and CANYON-MED methods were compared using the validation dataset (the remaining 20% of the database not used for training). The performances were computed by comparing the neural network outputs for nutrients and carbonate system parameters with the in situ measurements according to the metrics defined in section "Validation Statistics Metrics." Scatterplots of neural network-retrieved variables against their corresponding in situ measurements (Figure 6) reveal that the CANYON-MED method gives much better results than its global counterparts (i.e., CANYON and CANYON-B). The accuracies (RMSE) have been reduced, for each variable, by more than half between CANYON-B and CANYON-MED and by a third for nutrients and A T between CANYON and CANYON-MED, as shown by statistics in Table 4. The differences are primarily due to the under-representation of Mediterranean Sea cruises in CANYON and CANYON-B's training dataset (Mediterranean data poorly represented in the GLODAPv2), as opposed to the training dataset specifically designed for the Mediterranean Sea in CANYON-MED.
Specifically, for the CANYON and CANYON-B networks, a higher scatter is observed for NO − 3 and PO 3− 4 near-zero values (Figures 6A,B). CANYON-MED's MAE and RMSE are halved compared to CANYON and CANYON-B's ( Table 4). In addition, a significant number of values are predicted to be negative (around −0.1 µmol.kg −1 ). This feature is mainly caused by the difference in the nutrient concentrations for the oligotrophic to ultra-oligotrophic Mediterranean Sea which are close to the detection limits of the nutrients analysis method (Krom et al., 1991) compared to the higher concentrations found in the global ocean (high concentrations present in GLODAPv2, CANYON, and CANYON-B training data).
A very high scatter is also present for Si(OH) 4 CANYON and CANYON-B-retrieved values, especially for the lower values (i.e., <4 µmol.kg −1 ) ( Figure 6C). In the global ocean, Si(OH) 4   (Ragueneau et al., 2000;Pujo-Pay et al., 2011), thus explaining the difference in scatter between the Si(OH) 4 values obtained by CANYON and CANYON-MED.
A T and C T retrieval performance appears to be relatively similar using the three methods (Figures 6D,E) due to the wide range of values of these two variables, which reduces differences between approaches. However, for A T a large spread is present in CANYON estimates. Indeed, the MAE and RMSE range to a third less than CANYON and CANYON-B (Table 4). Moreover, a larger dispersion remains perceptible in the values predicted by CANYON and CANYON-B with predicted values lower by up to 100 µmol.kg −1 than their in situ measurements.
Likewise, pH T values are comparable between the three neural network-based methods (Figure 6F), with CANYON-MED projecting pH T with the lowest spread. Compared to the global average surface ocean, the Mediterranean Sea will be subject to amplified acidification Goyet, 2009, 2011;Palmiéri et al., 2015). The Mediterranean Sea is known to absorb more anthropogenic CO 2 per unit area (Palmiéri et al., 2015). Essentially, the Mediterranean's high A T increases its capacity to absorb anthropogenic CO 2 and the short timescales at which its deep waters are ventilated (Schneider et al., 2014) allow for deeper penetration of this CO 2 , thus resulting in a lower pH T .

Other Methods
A key advantage of CANYON-MED lies in the few inputs required to use it, but other methodologies exist to predict nutrients and carbonate system variables. CANYON-MED and the methods described in this section were applied to our validation dataset and their results evaluated by the MAE and RMSE (as defined in section "Validation Statistics Metrics") are presented in Table 5.
First of all, Carter et al. (2018) developed methods for locally interpolated estimations of NO − 3 , PO 3− 4 , Si(OH) 4 , A T , and pH T (LINR, LIPR, LISIR, LIARv2, and LIPHR, respectively). These methods base their computations on equations requiring salinity, Apparent Oxygen Utilization (derived from O 2 ), depth, temperature, as well as nutrients concentrations. For comparability to our method and its possible application on BGC-Argo floats, these regressions were applied on our validation dataset using only the equations with similar inputs as our neural networks (i.e., temperature, depth or pressure for CANYON-MED, salinity, and O 2 ). The computations with Locally Interpolated Regressions (LIRs) were performed using their functionality that selects, using the given inputs, the lowest-uncertainty estimate among possible estimates (Carter et al., 2018).
Furthermore, also building on the CANYON networks, and in addition to CANYON-B presented in section "CANYON-MED, " Bittig et al. (2018) developed another neural networkbased method: CONTENT. This method predicts carbonate system variables. In CONTENT, CANYON-B estimates of the four carbonate system variables (i.e., A T , C T , pH T , pCO 2 ) are combined through calculations of every pair (as the four variables can be derived from any pair of them). CONTENT therefore provides better estimates through the use of all four parameters of the carbonate system, whereas CANYON-B provides a unique direct estimate. These methods have the same inputs as CANYON-MED, except for the year which is only an input for the C T , pH T and pCO 2 neural networks in CANYON-B.
In addition, specifically for the Mediterranean Sea, Hassoun et al. (2015) derived equations to calculate A T and C T from salinity hereafter referred to as A T -S and C T -S. Equations are available for the entire Mediterranean Sea as well as specific equations for each sub-basin and several depth layers. For our comparison, the global equations were used since some equations for specific depth layers and locations had low performance. Furthermore, it is possible to use a single equation over the whole Mediterranean Sea, to derive A T from salinity, if marginal seas and regions of important freshwater influence are not considered (Cossarini et al., 2015), which is relevant because CANYON-MED is not suited for coastal areas.
As shown in Table 5, CANYON-MED has lower errors for all predicted variables than the methods presented above. CONTENT has very similar errors in the prediction of carbonate system parameters compared to CANYON-B. The equations from Hassoun et al. (2015), predict A T and C T with errors up to four times the errors of CANYON-MED. Moreover, the LIRs clearly stand out, with errors up to ten times higher than the other methods. However, it should be stressed that the results from the LIRs would have been more robust (errors 10% lower but still higher than the other methods, data not shown) if they had been applied using all inputs, that is including nutrients as predictors, but as mentioned before, for aims of comparability, they were not.
Generally, CANYON-MED neural networks are more accurate than other methods with the same inputs. This is not   (Sauzède et al., 2017), CANYON-B (Bittig et al., 2018), CANYON-MED (this paper), CONTENT (Bittig et al., 2018), A T -S and C T -S , LIARv2, LINR, and LIPHR (Carter et al., 2018)  surprising as it was built specifically for the Mediterranean Sea whereas CANYON, CANYON-B, CONTENT, and the LIRs were developed for the global ocean. However, high MAE and RMSE were obtained using A T -S and C T -S from Hassoun et al. (2015). This can be explained by the fact that, while having been developed specifically for the Mediterranean Sea, these equations were derived using data collected only in May whereas our validation dataset covers as much as possible the whole year. Furthermore, we only used the global equations developed for the entire Mediterranean Sea and for all depths.
It is acknowledged that the equations derived for specific areas and depth layers would have produced more accurate results in targeted areas, but the aim was to compare the methods on a global basin scale. Therefore, it is suggested that, while allowing for easy computations of carbonate system variables in cases where they are lacking, the simple approximation from salinity might not always produce the best results on a varied dataset.
Overall, CANYON-MED stands out as the most robust method for the prediction of nutrients and carbonate system parameters in the Mediterranean Sea. The variables required are limited to systematically measured variables such as temperature, pressure, and salinity as well as high-quality O 2 measurements as inputs which are also widely measured (Bittig and Körtzinger, 2015). Thus, with the increased densification of high-quality O 2 measurements from BGC-Argo floats, as well as ocean gliders and moorings in the Mediterranean Sea (Testor et al., 2019;Tintoré et al., 2019;D'Ortenzio et al., 2020), CANYON-MED has a strong potential to support the development of new applications for marine biogeochemistry.
As is the case for all neural networks, the combination of the weights connecting the different hidden layers is not transparent, contrary to the weights obtained with linear regressions (Cortez and Embrechts, 2013). Nevertheless, a sensitivity analysis of the relative contribution showed no significant difference between the inputs of CANYON-MED, also indicating that no input parameter stands out over the others in an unrealistic manner. Thus, confirming the relevance of the chosen inputs and their balance as none is superfluous nor solely driving the neural network's outputs.
It should be underlined that other approaches allow for the prediction of nutrients and carbonate system variables with higher accuracies than ours. But these methods are often developed for the global ocean and might therefore not be as satisfactory in the semi-enclosed Mediterranean Sea. Furthermore, they often require a larger amount of inputs to predict a single variable. For example, Broullón et al. (2019) developed a neural network method called NNGv2 to derive A T from geolocation, depth, temperature, salinity, and O 2 as well as nitrate, phosphate and silicate concentrations. NNGv2 predicts A T, for the global ocean, with a RMSE of 5-6 µmol.kg −1 , about half the RMSE obtained when retrieving A T using CANYON-MED. However, it is highly likely that those results would worsen in the Mediterranean Sea as NNGv2 was trained using the GLODAPv2 database (Olsen et al., 2016), similarly to CANYON, CANYON-B, and CONTENT, in which the Mediterranean Sea is poorly described. Furthermore, NNGv2 requires more predictors than CANYON-MED. The additional variables [Si(OH) 4 , PO 3− 4 , NO − 3 ] are not systematically measured, which hinders the use of this method, especially with the long-term objective to be applied on BGC-Argo, which are only equipped with O 2 sensors  and references therein).

Example of Application: Mediterranean Deep Values
Using our 20% validation dataset, the averages of CANYON-MED outputs were calculated for each variable [i.e., NO − 3 , PO 3− 4 , Si(OH) 4 , A T , C T , and pH T ] from 1000 m depth to the bottom of the water column (4000 m) and averaged. The corresponding in situ measurements were averaged in the same way. The differences between in situ values and those predicted by CANYON-MED are presented in Figure 7. The choice of deep values relies on their seasonal stability.
Overall, the variables provided by CANYON-MED and the corresponding in situ measurements are in satisfactory agreement, i.e., the difference is close to zero. No spatial trend is observed in the differences between the in situ data and neural network's outputs indicating that CANYON-MED adequately predicts without bias values along the known oligotrophy and acidity gradients between the Eastern and Western basins (Krom et al., 1991;Flecha et al., 2015). However, a few outlier points stand out for each parameter. These larger differences may be due to seasonal imprints (the data presented in Figure 7 refer to the complete validation dataset, regardless of their date and time).
In the Algero-Provencal basin, a deep A T value stands out with a high difference ( Figure 7D). The same occurs for a few C T and pH T values in the Gulf of Lion (Figures 7E,F).
These extreme values could be explained by local phenomena also impacting temperature, salinity and/or O 2 , the neural network's inputs, consequently impacting the retrieved values. Indeed, these areas are known to be dynamic with an eddy-driven mesoscale circulation, where the anomalous biogeochemical profiles could originate from (Pessini et al., 2018). We therefore hypothesize that discrete data may not sufficiently reflect the vertical distribution of these variables in a dynamic region such as the Gulf of Lion, owing to the lack of adequate observational resolution. High differences can also stem from erroneous stations not detected by the quality controls but being highlighted in CANYON-MED's outputs, causing the differences to deviate from zero values.
Additionally, some points standing out as high differences (exceptionally low values) for C T (and A T ) in the Alboran Sea, Gulf of Lion and Algero-Provencal basin appear, after further investigation, to correspond to the first sampling campaign of the carbonate system in our database (i.e., in 1981). We, therefore, hypothesize that a difference in the quality of the measurements could explain some of the extreme deviations. Furthermore, for all variables, a larger variability can be found in the Alboran Sea than in the rest of the Mediterranean Sea. This variability can be a result of the influence of the Atlantic Ocean through the Gibraltar Strait and the strong associated mesoscale activity (Viúdez et al., 1998;Baldacci et al., 2001). Given the disparate spatial coverage of the training data available, we expected CANYON-MED to predict nutrients and carbonate system variables with higher errors in the Eastern Basin compared to the Western Basin. As shown by the statistics for each basin, gathered in Table 6, carbonate system parameters are indeed predicted with less accuracy in the Eastern Basin. As for nutrients, the difference between Eastern and Western Basin is less marked: a slightly lower MAE and RMSE for NO − 3 and PO 3− 4 in the East are linked to a lower r 2 . The difference between basins is mainly caused by the disparity in spatial and temporal coverage in our training database, stemming from a lack of cruises in the Eastern part of the Mediterranean Sea. CANYON-MED remains unsatisfactory in areas where training data are too scarce (e.g., South Ionian Sea, off the Libyan coasts) and results accuracy might be lowered during anomalous events (extreme meteorological conditions impacting physical variables, such as deep convection). It should also be recalled that CANYON-MED is not suited for coastal areas.

CONCLUSION AND PERSPECTIVES
We have demonstrated the limited performance of the CANYON and CANYON-B methods to retrieve nutrients and carbonate system variables in the Mediterranean Sea. A new approach, CANYON-MED, was subsequently created and trained using an artificial neural network ensemble model. The approach takes advantage of the accuracy of EOVs systematically measured today (Wang et al., 2019), whether during scientific cruises or by autonomous platforms. The model was built as an ensemble of 10 optimized multi-layered perceptron feed-forward neural networks. CANYON-MED inputs were in situ measurements of pressure, temperature, salinity, and O 2 as well as geolocation (latitude and longitude) and sampling date (day of year and year). The resulting ensemble model produces accurate estimates of nutrients and carbonate system variables [0.73, 0.045, and 0.70 µmol.kg −1 for NO − 3 , PO 3− 4 and Si(OH) 4 , respectively, and 0.016 units, 11 µmol.kg −1 and 10 µmol.kg −1 for pH T , A T , and C T , respectively]. With such accuracy, CANYON-MED can produce estimates of variables that are not currently measured by autonomous platforms, as is the case for PO 3− 4 , Si(OH) 4 , A T , and C T . CANYON-MED can also help identify periods and areas in the Mediterranean Sea where the data density remains too low in space and/or time, which limits the understanding of some processes and the assessment of long-term variability. Indeed, the spatial and temporal domains where the method provides the least satisfactory results are related to weaknesses in the training database that does not sufficiently capture variability in space and time. Furthermore, more dynamic regions such as deep convection zones and mesoscale eddies may be less well reproduced by CANYON-MED.
Ship-based sampling remains imperfect either because of limited ship-time or human resources or because of weather conditions that prevent sampling in specific areas or at certain times of the year such as winter. CANYON-MED has the capability to fill gaps in observations in a cost-effective way, for example by filling the gaps in time series (subject to the absence of exceptional events). It can also be applied to a large network of BGC-Argo profiling floats and underwater gliders equipped with CTD and oxygen sensors, thus increasing the flow of biogeochemical data from systematically measured basic variables. CANYON-MED can also contribute to the quality control of NO − 3 and pH T obtained from these autonomous platforms by providing data to correct for sensor drift during deployments and adjust deep values (e.g., Johnson et al., , 2015Sauzède et al., 2020). In line with this, multiple Eulerian moorings acquire and provide high-quality and high-frequency measurements of temperature, salinity, and O 2 over the water column at fixed locations in the Mediterranean Sea (e.g., HYDROCHANGES, EMSO, and OceanSITES networks). CANYON-MED can be applied to data collected at these sites, complementing the measured core variables, and generating high-frequency biogeochemical datasets, hence supplementing temporally limited oceanographic cruises.
Finally, since the accuracy of the virtual data obtained for NO − 3 by this approach is comparable to that obtained with autonomous platforms, their use in oceanography would be beneficial, in particular by increasing the datasets used for the assimilation of some regional models that still lack crucial reference data (Doney et al., 2009;Cossarini et al., 2019).

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/supplementary material.

AUTHOR CONTRIBUTIONS
HC, LC, FD'O, and MF initiated the study and designed the neural network configurations with the help of RS. MF collected and QCed the dataset and trained CANYON-MED. The manuscript was drafted by MF and LC. MF ran simulations and created the plots. All authors contributed to analysis and discussion of results, commented on, and contributed to the improvement of several versions of the manuscript.