A Neural Network-Based Analysis of the Seasonal Variability of Surface Total Alkalinity on the East China Sea Shelf

Total alkalinity (AT) is an important variable in the regulation of the seawater carbonate chemistry system, determining the capacity to buffer changes in pH. In the coastal oceans, carbonate system dynamics are controlled by numerous processes such as land-derived inputs, biological activity, and coastal water dynamics, and seasonal alkalinity variations can play an important role in the regional carbon cycle. However, our understanding of these variations on the East China Sea (ECS) shelf remains poor due to limited observations. In order to estimate and investigate the seasonal variability of AT on the ECS shelf, an artificial neural network (ANN) model was developed using five cruise datasets from 2008 to 2018. The model used temperature, salinity, and dissolved oxygen to estimate AT with a root-mean-square error (RMSE) of ∼7 µmol kg −1 , and was applied to calculate AT for eight cruises during 2013–2016. In addition, monthly water column AT for the period 2000–2016 was obtained using temperature, salinity, and dissolved oxygen from the Changjiang Biology Finite-Volume Coastal Ocean Model (FVCOM) Data. Spatial distributions, seasonal cycles and correlations of surface AT indicated that the seasonal fluctuation of the Changjiang River discharge is the major factor affecting seasonal variation of surface AT on the ECS shelf. The largest seasonal fluctuations of surface AT were found on the inner shelf near the Changjiang Estuary, which is under the influence of the Changjiang River discharge.


INTRODUCTION
Despite occupying a small proportion of the global surface area, coastal seas play an important role in the global carbon cycle because they receive a large amount of terrestrial materials and nutrients from rivers, rapidly transform different forms of carbon, and exchange large fluxes with the open ocean and atmosphere (Gattuso et al., 1998). It has been suggested that coastal seas may contribute greatly to the absorption of atmospheric carbon dioxide (e.g., Borges et al., 2005; Cai et al., 2006), and are more sensitive to global climate changes and anthropogenic influences such as global warming, eutrophication and ocean acidification (e.g., Doney et al., 2007; Cai et al., 2011; Omar et al., 2019). However, the carbonate system in the coastal oceans can change in an unpredictable way under multiple environmental stressors, and observational datasets often lack carbonate chemistry measurements or include only one carbonate chemistry parameter, while at least two are needed to fully characterize the seawater carbonate system (Millero, 2007).
Several studies have attempted to develop multiple linear regression (MLR) relationships to predict total alkalinity (A T ) from more commonly observed variables such as temperature and salinity (e.g., Millero et al., 1998; Lee et al., 2006; Carter et al., 2016, 2018; Fine et al., 2017). However, it has proved difficult to find such relationships that maintain accuracy over large scales. A new method of self-organizing multiple linear output (SOMLO) was developed by Sasse et al. (2013), and showed a 19% improvement in predictive accuracy for dissolved inorganic carbon compared to a traditional MLR approach. Superior predictors have also been obtained using self-organizing maps (Velo et al., 2013) or neural networks (e.g., Sauzède et al., 2017; Broullón et al., 2019). To date, however, relatively few studies have attempted to develop A T predictors specifically for coastal regions, perhaps because of the complexity and heterogeneity of the continental shelves. Alin et al. (2012) developed an MLR model for A T in the southern California Current System, while Gemayel et al. (2015) derived polynomial fits to estimate A T in the Mediterranean Sea. As discussed by Friis et al. (2003), simple linear regressions between salinity and A T may not be suitable for broader coastal ocean regions. Numerous processes in the coastal seas add to the complexity of carbonate system dynamics, which means that each region may show different A T variation characteristics in different seasons, and separate regional algorithms may be required (e.g., Juranek et al., 2009; Kim et al., 2010).
The East China Sea (ECS) is the largest marginal sea in the western North Pacific Ocean and receives massive terrestrial inputs from the Changjiang River (Gong et al., 1996). Hur et al. (1999) investigated the monthly water mass variations in the ECS using more than 40 years of historical data and a cluster analysis approach. In order to reveal the seasonal variations of major water masses in the ECS, Li et al. (2006) proposed a simple spiciness index and found that monthly variations of the water masses can be classified into three phases per year. Spatial and temporal distributions of carbonate system parameters have also been investigated in the ECS (e.g., Chou et al., 2009, 2013; Qu et al., 2015, 2017), and were found to largely reflect the distributions of various water masses in the ECS. The pattern of carbon sources and sinks exhibits substantial seasonal variation (Guo et al., 2015), and the ECS is generally considered a sink of atmospheric CO 2 throughout the year except in fall (e.g., Shim et al., 2007; Zhai and Dai, 2009). However, the seasonal variability of A T in the ECS has been studied very little, mainly due to the limited observational coverage. Developing methods to extend the seasonal coverage of A T data may thus help to improve our understanding of the ocean carbon cycle in the ECS.
Artificial neural networks (ANNs) have been proposed as powerful tools for modeling uncertain and complex systems such as ecosystems and for environmental assessment (e.g., Olden and Jackson, 2002; Olden et al., 2004; Uusitalo, 2007; Raitsos et al., 2008). Their main advantage compared with MLR models is that they do not require an a priori model but rather "learn" the model from training data (e.g., Hornik et al., 1989; Raitsos et al., 2008). ANNs have been used to retrieve the partial pressure of carbon dioxide (pCO 2 ) (e.g., Friedrich and Oschlies, 2009; Laruelle et al., 2017), A T (e.g., Bostock et al., 2013; Sasse et al., 2013; Velo et al., 2013), and dissolved inorganic carbon (e.g., Bostock et al., 2013; Sasse et al., 2013). To our knowledge, no empirical relationship for A T has yet been developed for the ECS shelf, likely due to the limited observations and the complex interaction of different water masses.
We developed an ANN to predict A T on the ECS shelf and used it to investigate seasonal variability. This paper is structured as follows: section "Materials and Methods" introduces the research region and cruise data used to build the ANN; section "Results and Discussion" shows the ANN model performance, variable importance in the ANN model, and two applications: to calculate surface A T for 8 cruises on the ECS shelf during 2013-2016 using in situ measured temperature, salinity and dissolved oxygen; to retrieve monthly A T for the period 2000-2016 on the ECS shelf using the monthly temperature, salinity, and dissolved oxygen from the Changjiang Biology Finite-Volume Coastal Ocean Model (FVCOM) Data. Conclusions and perspectives are summarized in the last section.

Study Area and Observations
The ECS is framed by the Ryukyu Island chain in the east (Japan), mainland China in the west, Taiwan in the south, and Cheju Island (Korea) in the north. The winter monsoon from the north lasts from September to April, while the summer monsoon from the south lasts from July to August (Lee and Chao, 2003). The Changjiang Diluted Water (CDW) spreads eastward in summer during the prevailing southwest monsoon, while it is confined to the western side of the shelf under the influence of the northeast monsoon (Chou et al., 2009, 2013). The Taiwan Warm Current (TWC) flows into the ECS through the Taiwan Strait, the Kuroshio Current (KC) flows northeast along the shelf break (e.g., Lee and Chao, 2003; Chou et al., 2009), and the Yellow Sea Coastal Current (YSCC) enters the northern part of the ECS under the influence of the northeast monsoon (Gong et al., 1996).
Four cruises were conducted in the ECS from 2017 to 2018. Three cruises were carried out during the "National Natural Science Foundation Shared Voyage Plan," on 10-19 March, 12-20 July, and 12-21 October 2018; the remaining cruise was carried out during "Vulnerabilities and Opportunities of the Coastal Ocean" on the ECS shelf on 12-24 May 2017. Water samples were collected at three or four different depths during all cruises. One additional cruise dataset from 2 to 9 January 2008 in the ECS has been reported previously by Chou et al. (2011) and was downloaded from the Carbon Dioxide Information Analysis Center. Temperature (T) and salinity (S) profiles were obtained directly using conductivity-temperature-depth/pressure (CTD) recorders (SBE 25plus or 911plus). Measurement of dissolved oxygen (DO) followed the Winkler procedure, as described previously by Zhai et al. (2014b). A T samples were potentiometrically titrated with standardized 0.1 M HCl (0.7 M in NaCl) to the carbonic acid end point using a VINDTA 3C system, as described by Mintrop et al. (2000). Certified Reference Materials (CRMs) were used to determine a precision of ±2 µmol kg −1 (Dickson et al., 2007). In total, 699 data points were used by the ANN model, and the distribution of the sampling sites from the five cruises is shown in Figure 1.

Artificial Neural Network Development
Similar to the input variables selected by Bostock et al. (2013), we selected T, S, and DO as predictors for the ANN model. The input variables also included the sampling position (longitude and latitude) and sampling time (month). The sampling position and time were included to help the network learn spatiotemporal patterns that cannot be explained by the other input variables (Sasse et al., 2013). The ANN we used is a feed-forward multilayer perceptron (Tamura and Tateishi, 1997) with two hidden layers. The neurons of each layer are connected with the neurons of the previous layer and the next layer by weights (Figure 2). The coefficients of the weight matrix are iteratively tuned in the training step. Here we used the back-propagation conjugate-gradient technique (Hornik et al., 1989). In order to avoid overfitting, a ten-fold cross-validation was used to assess model prediction accuracy. In this technique, all cruise data were randomly divided into ten equal subsamples. One subsample was used as the independent validation data (10% of all data), which was always excluded from training, and the nine remaining subsamples were together used as training data (90% of all data). The training data were further divided randomly into a training set (70% of training data), a validation set (15% of training data), and a testing set (15% of training data). We compared performance in predicting the independent validation data across the ten folds and selected the optimal model based on the lowest root mean square error. All calculations were done in the MathWorks Matlab environment.
FIGURE 2 | Schematic representation of the neural network algorithm to retrieve total alkalinity. Input variables are observed temperature, salinity, and dissolved oxygen together with the geolocation (longitude and latitude) and time (month) of sampling.
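The data partitioning described above can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: the function name `nested_splits`, the random seed, and the rounding of the 70/15/15 proportions are assumptions made here for illustration only.

```python
import random

def nested_splits(n_samples, n_folds=10, seed=0):
    """Return one dict of index lists per fold of the cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_folds subsamples of near-equal size.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        held_out = folds[k]  # independent validation data (~10% of all data)
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        rng.shuffle(rest)
        n_train = round(0.70 * len(rest))  # assumed rounding rule
        n_val = round(0.15 * len(rest))
        splits.append({
            "train": rest[:n_train],
            "validation": rest[n_train:n_train + n_val],
            "test": rest[n_train + n_val:],
            "held_out": held_out,
        })
    return splits

splits = nested_splits(699)  # 699 data points, as stated in the text
```

Each held-out subsample never appears in the corresponding training, validation, or testing set, which is what makes its error an independent estimate of prediction accuracy.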
There is no fixed criterion for setting the optimal number of neurons in the two hidden layers, so we tested values between 1 and 30 for each layer (Table 1). The optimal architecture was composed of two hidden layers with twenty neurons in the first and twelve neurons in the second. In order to avoid bias toward high-value inputs/outputs and to eliminate the dimensional influence of the data, all data used by the ANN model were normalized using Equation (1) (e.g., Sauzède et al., 2015, 2016), with σ the standard deviation of the considered input variable or the output variable A T . Similar to the approach of Sauzède et al. (2015, 2016), the longitude and month input variables were transformed to account for periodicity. The latitude variable was transformed into the range of the sigmoid function (Sauzède et al., 2015) by dividing by 90, and was then processed using Equation (1).
TABLE 1 | The three statistics are the coefficient of determination (R 2 ), the root mean squared error (RMSE), and the mean absolute error (MAE).
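The normalization and periodicity equations are not reproduced in this text, so the sketch below only illustrates one plausible form of the preprocessing. Everything specific in it is an assumption: the mean-centred scaling by σ with a 2/3 factor (a form commonly associated with the cited Sauzède et al. approach), the use of the population standard deviation, and the sine/cosine encoding of longitude and month. Only the division of latitude by 90 is taken directly from the text.

```python
import math
import statistics

SCALE = 2.0 / 3.0  # assumed factor; the source equation is not available here

def normalize(values):
    """Centre on the mean and scale by the standard deviation (assumed form)."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population sigma is an assumption
    return [SCALE * (v - mean) / sigma for v in values]

def encode_longitude(lon_deg):
    """Map longitude in degrees to a (sin, cos) pair on the unit circle."""
    angle = math.radians(lon_deg)
    return math.sin(angle), math.cos(angle)

def encode_month(month):
    """Map month 1-12 onto a circle so December and January are neighbours."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)

def scale_latitude(lat_deg):
    """Divide latitude by 90, as stated in the text, giving a value in [-1, 1]."""
    return lat_deg / 90.0
```

With a circular encoding of this kind, the distance between December and January equals the distance between any other pair of adjacent months, which a raw month number from 1 to 12 would not provide.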

The ANN Model Performance
To evaluate the performance of the ANN model, we compared the model-retrieved A T (A T M ) with the corresponding observations (A T O ) using several statistical indices: the mean absolute error (MAE), the coefficient of determination (R 2 ), and the root mean squared error (RMSE). The model simulated A T with RMSE = 7.4 µmol kg −1 and R 2 = 0.96 for the training data (90% of all data, Figure 3A), and predicted A T with RMSE = 6.7 µmol kg −1 and R 2 = 0.95 for the independent validation data (10% of all data, Figure 3B). The differences (A T M − A T O ) are approximately normally distributed, with only a few points exceeding ±2RMSE (Supplementary Figure S1), and 52% of our model determinations fall within ±4 µmol kg −1 , the accuracy generally accepted internationally for A T measurements. Supplementary Figure S4 shows the performance of model extrapolation for longitude and month.
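The three statistics, together with the share of retrievals within ±4 µmol kg −1, can be computed as in the following Python sketch. It assumes that R 2 is defined as one minus the ratio of residual to total sum of squares (the text does not state which definition was used), and the sample values are invented purely for illustration.

```python
import math

def evaluate(observed, predicted, tolerance=4.0):
    """Return MAE, RMSE, R2 and the fraction of residuals within tolerance."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R2
    within = sum(abs(r) <= tolerance for r in residuals) / n
    return {"mae": mae, "rmse": rmse, "r2": r2, "within": within}

# Invented values in µmol kg-1, for illustration only.
stats = evaluate([2200.0, 2210.0, 2220.0, 2230.0],
                 [2202.0, 2208.0, 2226.0, 2230.0])
```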
In order to further explore where the ANN model results in differences beyond ±2RMSE, we plotted the distribution of the differences larger than ±2RMSE against longitude and latitude (Figure 4). These points are concentrated in an area strongly influenced by Changjiang River runoff, the Yellow Sea Coastal Current (YSCC) and shelf seawater, and in the wet season (May and July), during which the Changjiang River is in flood (Supplementary Figure S2). The reduced performance of the ANN model can be primarily attributed to the sudden increase in the Changjiang River discharge and the appearance of vertical stratification during the wet season. During this period, large nutrient inputs from the Changjiang River can stimulate primary production, vertical stratification can hinder material exchange in the water column, and massive freshwater input can suddenly reduce salinity, all of which pose a challenge for empirical modeling.
FIGURE 5 | [...] Lee et al. (2006) with corresponding surface observations. The 1:1 line is shown in each plot as a visual reference. N represents the number of surface data points. The three statistics are the mean absolute error (MAE), the root mean squared error (RMSE), and the coefficient of determination (R 2 ).
Although the RMSE of 7.4 µmol kg −1 for A T obtained here was higher than the 6.4 µmol kg −1 obtained by Alin et al. (2012), it was lower than those obtained in other previous studies. For example, Evans et al. (2013) derived an MLR to estimate A T with an RMSE of 9 µmol kg −1 in the northern Gulf of Alaska, and Gemayel et al. (2015) presented polynomial fits to predict A T with an RMSE of 10.6 µmol kg −1 in the Mediterranean Sea. In addition, an empirical relationship between A T and S was established for all seasons with a residual of 17 µmol kg −1 in the Washington State Coastal Zone (Fassbender et al., 2017). Furthermore, when the relationships of Lee et al. (2006) between A T and T and S were applied to compute surface A T , the RMSE was 17.6 µmol kg −1 (Figure 5). This suggests that these relationships fail to capture A T in this shallow sea with high river runoff, and that the ANN model is a better approach than the Lee et al. (2006) relationships on the ECS shelf.

Variable Importance in the ANN Model
To quantify how strongly each input variable affects A T in the ANN model, we used the following method: each input variable was separately increased by 5%, and the resulting percentage change in the predicted A T was calculated. A T is positively correlated with salinity and longitude, and negatively correlated with temperature (Figure 6). The two variables with the greatest weight are salinity and longitude; the weights of the other variables are small and almost negligible by comparison. A significant positive correlation between A T and salinity was also found by Zhai et al. (2014a). The positive correlation between A T and longitude reflects the spatial distribution of A T , which is similar to that of salinity and generally increases eastward from the China coastline to the shelf break (e.g., Chou et al., 2013; Qu et al., 2017).
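The perturbation test described above can be sketched in Python as follows. The function `toy_model` is a hypothetical stand-in for the trained network, with invented coefficients, and is used only so that the example runs; it is not the ANN of this study.

```python
def sensitivity(model, inputs, increase=0.05):
    """Percent change in model output when each input is raised by 5% in turn."""
    baseline = model(inputs)
    changes = {}
    for name, value in inputs.items():
        perturbed = dict(inputs)          # copy so other inputs stay fixed
        perturbed[name] = value * (1.0 + increase)
        changes[name] = 100.0 * (model(perturbed) - baseline) / baseline
    return changes

def toy_model(x):
    """Hypothetical stand-in for the trained ANN; coefficients are invented."""
    return 1000.0 + 35.0 * x["salinity"] - 1.0 * x["temperature"]

result = sensitivity(toy_model, {"salinity": 30.0, "temperature": 20.0})
```

A positive entry in `result` means the predicted value rises when that input rises, and the magnitude gives a rough ranking of variable importance for the chosen baseline inputs.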

Model Applications
In order to retrieve A T on the ECS shelf, the monthly T, S, and DO from the Changjiang Biology Finite-Volume Coastal Ocean Model (FVCOM) Data were applied to the ANN model as the input variables. The performance of monthly T, S, and DO from the Changjiang Biology FVCOM model is shown in Supplementary Figure S3. Monthly A T for the period 2000-2016 was obtained at the spatial resolution of the FVCOM output: 1-10 km in the horizontal, 10 depth levels in the vertical, and 12 months. In addition, since A T was not measured during eight cruises from 2013 to 2016 on the ECS shelf, surface A T for these cruises was retrieved through the ANN model using in situ measured T, S, and DO.

Surface Total Alkalinity Retrieved From Cruise Observations
The distributions of retrieved A T in winter and summer from 2013 to 2016 are shown in Figure 7. The distribution characteristics of the A T we calculated for 2013-2016 are consistent with those of previously published A T in other years during summer and winter (e.g., Chou et al., 2009, 2011; Qu et al., 2015, 2017). In winter, high A T is found in the north of the study area, associated with the YSCC, while low A T is confined to a narrow coastal region (water depth <50 m), controlled by the prevailing northeast monsoon. In summer, high A T is found in the eastern and southeastern parts of the study area, associated with the intrusion of the TWC and the KC, while low A T is confined mainly to the western and northwestern parts of the study area, influenced by the CDW and the southwest monsoon.
Frontiers in Marine Science | www.frontiersin.org
FIGURE 9 | Comparison of monthly average surface total alkalinity on the ECS shelf (122-124°E; 28.5-32.5°N). The blue solid line represents A T retrieved using Changjiang Biology FVCOM Data; the black dotted lines represent A T retrieved using Changjiang Biology FVCOM Data ±2RMSE; red points represent measured A T from published papers (e.g., Chou et al., 2009, 2011; Zhai et al., 2014a; Qu et al., 2015, 2017); green points represent A T retrieved using in situ data from 2013 to 2016.

Surface Total Alkalinity Retrieved From FVCOM Output
The temporal and spatial variations of monthly surface A T from 2000 to 2016 based on FVCOM output are shown in Figure 8. During the dry season (November to April of the next year), A T values vary within a relatively narrow range, from ∼2130 to ∼2290 µmol kg −1 . Water of lower A T is confined to the coast of mainland China (water depth <50 m), whereas waters of higher A T are found in the north and southeast of the study area. Generally, the surface distributions of A T correspond well to the winter circulation pattern, which is modulated by the northeast winds lasting from September to April. Water with higher A T in the north of the study area is strongly influenced by the YSCC, which is characterized by relatively low temperature (Gong et al., 1996). Higher A T water in the southeast of the study area (50-100 m water depth) reflects the intrusion of the TWC and Kuroshio Branch Current (Luo et al., 2015), which is characterized by high salinity. The narrow band of water with the lower A T values is indicative of CDW, which is confined to the western side of the shelf by the prevailing northeast monsoon and identified by low surface salinity in autumn and winter (Chou et al., 2013).
During the wet season (May to October), A T values show a wide range, from ∼2000 to ∼2270 µmol kg −1 . Water of lower A T is confined mainly to the northwestern part of the study area, near the Changjiang Estuary, whereas water of higher A T is found in the southeastern part of the study area. Concentrations generally increase eastward from the coast to the shelf break, and strongly reflect the summer circulation pattern. Water with low A T in the northwestern part of the study area is indicative of CDW, spreading eastward under the influence of the southwest monsoon and characterized by low salinity during the wet season. Water with higher A T in the southeastern part of the study area is strongly influenced by the TWC, which flows onto the ECS shelf from the Taiwan Strait.
To assess the approach of combining the ANN model with the Changjiang Biology FVCOM Data to estimate A T on the ECS shelf, we compared retrieved A T using the Changjiang Biology FVCOM Data with retrieved A T using in situ measured T, S, and DO, and also published A T values (Table 2 and Figure 9). Overall the agreement is good here and supports the reliability of the ANN model on the ECS shelf.

Seasonality of Total Alkalinity Retrieved From FVCOM Output
Seasonal cycles of surface, middle and bottom layer T and S from FVCOM Data and ANN-derived A T were calculated for the ECS shelf from 2000 to 2016 (Figure 10). The cycles of A T and S in the middle and bottom layers are consistent (Figures 10B,C): both gradually decrease from March to a minimum in September and then slowly increase from September to December. This reflects the strong positive correlations between A T and S in the middle and bottom layers. In the surface layer, seasonal salinity variations strongly reflect freshening due to Changjiang River discharge (Supplementary Figure S2). However, the seasonal cycle of retrieved A T lags the salinity cycle by 2 months, reaching its minimum in September rather than July (Figure 10C). The retrieved cycle seems to be strongly weighted by the period between July and October, when no data were available to train the neural network. To obtain more accurate results, training cruises that adequately cover the seasonal cycle are needed. Surface, middle and bottom A T all display their maximum in January and minimum in September, and A T values vary seasonally by up to ∼112 µmol kg −1 in the surface layer, up to ∼78 µmol kg −1 in the middle layer, and up to ∼66 µmol kg −1 in the bottom layer. This is an order of magnitude higher than the open ocean A T variation estimated by Lee et al. (2006).

Correlations and Seasonal Amplitudes of Surface Total Alkalinity and Salinity Cycles
The retrieved surface A T distribution appears to reflect mixing between different water masses during the dry and wet seasons. During the dry season (November to April of the next year), S and A T values vary within a relatively narrow range from 21 to 34 and from 2130 to 2290 µmol kg −1 , respectively, while during the wet season (May to October), S and A T values vary within a relatively wide range from 15 to 34 and from 2000 to 2270 µmol kg −1 , respectively. To further understand the correlations between surface A T and S, monthly A T -S diagrams (Figure 11) were created.
The study region is mainly influenced by three water masses: the Yellow Sea Coastal Water (YSCW), the CDW, and the Taiwan Strait Warm Water (TSWW). The YSCW flows into the northern part of the study area under the influence of the coastal current (Figure 1B) and is indicated by high A T (Figure 11), while the CDW spreads eastward during the prevailing southwest monsoon (Figure 1B) and is characterized by the lowest S and A T (Figure 11). The TSWW flows into the ECS through the Taiwan Strait (Figure 1B) and is characterized by relatively high S and A T . Monthly linear slopes and intercepts between A T and S were fitted with Matlab cftool (R2013b) using the robust least-squares fitting method (Table 3). The slopes and intercepts are seasonally distinct, with smaller slopes from 3.46 to 9.18 and higher intercepts from 1964 to 2126 µmol kg −1 during the dry season, and greater slopes from 10.22 to 14.47 and lower intercepts from 1749 to 1913 µmol kg −1 during the wet season. This difference may be mainly attributed to strong YSCW and weak CDW during the dry season and strong CDW during the wet season.
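To illustrate what a robust least-squares fit of A T against S does, the following Python sketch implements a simplified iteratively reweighted least squares with bisquare weights. It is not Matlab's cftool algorithm: the tuning constant 4.685, the scale estimate from the median absolute residual, the absence of leverage adjustment, and the invented sample data are all assumptions made for this example.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares slope and intercept of y against x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def robust_line(x, y, tuning=4.685, iterations=20):
    """Simplified iteratively reweighted least squares with bisquare weights."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        slope, intercept = weighted_line(x, y, w)
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        # Robust scale estimate from the median absolute residual.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        cut = tuning * scale
        # Points far from the line get weight zero; near points weight ~1.
        w = [(1.0 - (r / cut) ** 2) ** 2 if abs(r) < cut else 0.0
             for r in resid]
    return slope, intercept

# Invented data: A_T = 1800 + 12 * S, with one outlier at S = 28.
salinity = [20.0, 24.0, 28.0, 30.0, 32.0, 34.0]
alkalinity = [2040.0, 2088.0, 2216.0, 2160.0, 2184.0, 2208.0]
slope, intercept = robust_line(salinity, alkalinity)
```

In this example an ordinary least-squares fit would have its intercept pulled upward by the single outlier, whereas the reweighting progressively discounts it and recovers the underlying line.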
The magnitude of the seasonal variability of surface A T , computed as the difference between the maximum and minimum monthly mean A T values at each grid point, has a spatial pattern that is similar though not identical to that of the magnitude of seasonal salinity variability (Figure 12). The largest seasonal fluctuations of surface A T and salinity are found on the inner shelf near the Changjiang Estuary, which is under the influence of the Changjiang River discharge. In contrast, A T in the southeastern part of the study area exhibits very weak seasonality.
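The seasonal amplitude at a single grid point, as defined above, can be sketched in Python as follows. The function name and the sample values are hypothetical and serve only to illustrate the calculation of the monthly means and their range.

```python
def seasonal_amplitude(records):
    """Amplitude of the monthly-mean cycle for one grid point.

    records: iterable of (month, value) pairs pooled over all years.
    Returns (amplitude, month_of_maximum, month_of_minimum).
    """
    sums, counts = {}, {}
    for month, value in records:
        sums[month] = sums.get(month, 0.0) + value
        counts[month] = counts.get(month, 0) + 1
    means = {m: sums[m] / counts[m] for m in sums}
    month_max = max(means, key=means.get)
    month_min = min(means, key=means.get)
    return means[month_max] - means[month_min], month_max, month_min

# Invented values in µmol kg-1 for two years at one grid point.
records = [(1, 2250.0), (1, 2240.0), (5, 2200.0), (9, 2130.0), (9, 2140.0)]
amplitude, month_max, month_min = seasonal_amplitude(records)
```

Applying the same function to every grid point would give a map of seasonal amplitude comparable in construction to the one described in the text.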

CONCLUSION AND PERSPECTIVES
We have developed an ANN model, and used it to calculate surface A T for eight cruises during 2013-2016, and to retrieve monthly A T for the period 2000-2016 on the East China Sea shelf. The two most important predictor variables were salinity and longitude, and seasonal variations in retrieved A T could be mainly attributed to the seasonal cycle of the Changjiang River discharge on the East China Sea shelf.
The model has several potential applications. For example, it can provide estimates of seawater A T with known accuracies for the East China Sea shelf. Within this region the model could be used as a cost-effective way to overcome the restrictions of limited ship-based marine observations, such as coarse resolution and under-sampling of carbonate system variables, and may be a valuable tool for understanding the seasonal variation of A T in poorly observed regions. This approach can also be applied to other regions to estimate A T by suitably adapting the input variables and network structure. To obtain more accurate seasonal trends, training cruises that adequately cover the seasonal cycle are needed.

DATA AVAILABILITY STATEMENT
Matlab code of the ANN model for A T estimation and the data from the five cruises between 2008 and 2018 are available at http://doi.org/10.5281/zenodo.3491486, including the 2008 cruise data downloaded from https://www.nodc.noaa.gov/ocads/oceans/RepeatSections/clivar_ORI_885.html.
Requests to access the raw data should be directed to RB: Richard.Bellerby@niva.no.
The input variables from the Changjiang Biology Finite-Volume Coastal Ocean Model (FVCOM) Data were first downloaded from http://47.101.49.44/wms/demo, then monthly input variables (T, S, DO) and retrieved total alkalinity from 2000 to 2016 and retrieved surface total alkalinity of eight cruises from 2013 to 2016 on the East China Sea shelf are available: http://doi.org/10.5281/zenodo.3406551.
Distributions and seasonal amplitudes of surface total alkalinity retrieved from the Changjiang Biology FVCOM Data and Correlations between surface total alkalinity and salinity are available: http://doi.org/10.5281/zenodo.3491996.
Seasonal cycles of surface, middle, and bottom total alkalinity retrieved from the Changjiang Biology FVCOM Data on the ECS shelf from 2000 to 2016 are available: http://doi.org/10.5281/zenodo.3491998.
Comparison between retrieved A T using the Changjiang Biology FVCOM Data and retrieved A T using in situ measured T, S, and DO, and also published A T values are available: http://doi.org/10.5281/zenodo.3492004.

ACKNOWLEDGMENTS
We deeply thank the people who worked on the cruises and in the laboratory, and Chou et al. (2011) for providing the cruise data from 2 to 9 January 2008 in the ECS.