A First Global Oceanic Compilation of Observational Dissolved Aluminum Data With Regional Statistical Data Treatment

Large national and international observational efforts over recent decades have provided extensive and invaluable datasets of a range of ocean variables. Compiled large datasets, structured, or unstructured, are a powerful tool that allow scientists to access and synthesize data collected over large spatial and temporal scales. The data treatment approaches for any element in the ocean could lead to new global perspectives of their distribution patterns and to a better understanding of large-scale oceanic processes and their impact on other biogeochemical cycles, which may not be evident otherwise. Ocean chemistry Big Data analysis may not just be limited to distribution patterns, but may be used to assess how sampling efforts and analytical methodologies can be improved. Furthermore, a systematic global scale assessment of data is important to evaluate the gaps in knowledge and to provide avenues for future research. In this context, here we provide an extensive compilation of oceanic aluminum (Al) concentration data from global ocean basins, including data available in the GEOTRACES Intermediate Data product (Schlitzer et al., 2018), but also thus far unpublished data.


INTRODUCTION
Aluminum is the third most abundant element in the continental crust (Wedepohl, 1995), but is only found in trace amounts in seawater. Aluminum is supplied to the oceans through atmospheric deposition, hydrothermal venting, sediment resuspension, diffusion from sediment pore waters, and continental drainage (e.g., fluvial). The element is a widely used tracer of atmospheric deposition to the surface ocean (Measures and Brown, 1996; Measures and Vink, 2000; Grand et al., 2015a; Menzel Barraqueta et al., 2019), and is also used as a tracer of water masses and input sources (Measures and Edmond, 1990; Brown et al., 2010; Resing et al., 2015; Menzel Barraqueta et al., 2018). Dissolved Al (dAl) is particle reactive, and displays a scavenged-type vertical profile in many ocean regions (Bruland et al., 2013). However, recent dAl data reveal that this element also displays a recycled or reversible scavenging type distribution in certain ocean regions (Van Hulten et al., 2013; Rolison et al., 2015; Menzel Barraqueta et al., 2018). Since the first publication on Al concentrations in seawater (Hydes, 1979; sampling year 1973), sampling and analytical techniques (reviewed in Tria et al., 2007) have improved, contamination risks have been reduced, and analytical precision, accuracy, and detection limits have improved. For example, the introduction of a solid-phase extraction method (Resing and Measures, 1994), with further optimizations (Brown and Bruland, 2008), has yielded detection limits at the sub-nanomolar level (e.g., 0.1 nM), which are lower than previous values (e.g., 1.85 nM; Hydes and Liss, 1976). These improvements may partly account for the lower Al concentrations measured in recent years (see section Data Variability Over Time). Dissolved Al is defined as the Al concentration in the filtered fraction (filter cut-off 0.2-0.45 µm) of a solution.
Studies have shown that only a small percentage of dAl is present in the colloidal phase in open ocean waters (Dammshäuser and Croot, 2012) and that Al concentrations in unfiltered and unacidified samples measured onboard shortly after collection are consistent with those from filtered samples, as the dissolution of particulate Al is negligible at the pH of natural seawater (Measures et al., 1986). As a result of the development of intercalibration exercises (e.g., SAFE; Johnson et al., 2007) and best sampling practices and protocols (e.g., GEOTRACES Cookbook, www.geotraces.org), a recommendation on filtration protocols for all trace metals has been adopted in order to ensure data comparability among different groups. In this manuscript we focus on dAl and make the simple assumption that the strong vertical and basin-scale gradients in dAl concentrations within the ocean are larger than the systematic differences between unfiltered (unacidified and measured shortly after collection) and filtered samples.
The aim of this manuscript is to provide the scientific community with a global oceanic dAl database containing historical and recent data. We provide general global statistics with a focus on spatial, temporal and seasonal dAl variability. There is no intention to create a fully calibrated database, since this has only become possible in recent years due to the efforts made by the GEOTRACES programme (Frank et al., 2003) that aims to provide the scientific community with intercalibrated datasets in the form of intermediate data products (Mawji et al., 2015;Schlitzer et al., 2018). Nevertheless, having a readily accessible database, even if not fully intercalibrated, is fundamental for research and comparison purposes. We note that the amount of intercalibrated dAl data will steadily increase in the coming years as Al is a key parameter of the GEOTRACES programme.

DESCRIPTION OF THE DATABASE
The global oceanic and coastal dAl database file (Oceanic_Al_Menzel.xlsx) compiled here is accessible via the following link (https://doi.org/10.25413/sun.12167877.v1). The database forms an extension of the data files gathered by Gehlen et al. (2003) and Van Hulten et al. (2013), who collated Al data while applying internally chosen quality standards for specific, purpose-driven modeling efforts. We note another modeling manuscript (Han et al., 2008) that also compiled oceanic Al data; however, none of these manuscripts provide publicly available Al data. The Al data compiled here include a considerable amount of unpublished data (ca. 20%, Table 1), which will be useful for the community and may foster further scientific discussion. Unpublished data may not be reproduced without prior permission from the respective data originator, whose details are found in the database file (fourth tab). We compiled data reported in published papers and in publicly accessible databases, and additionally obtained data directly from researchers if quantitative Al data were not reported in a publication (a list of Al manuscripts whose data are not included can be found at the end of Table 1). Despite our best efforts to retrieve all available Al data by searching the literature and approaching researchers, we cannot assert that all historical data are included in the data file (apart from the previously mentioned manuscripts). Thus, we encourage researchers to submit generated Al data (to JLMB) to keep the Al database up to date.
The dataset presented here includes (1) open ocean Al data for all major oceans (Atlantic, Arctic, Indian, Pacific, and Southern Oceans), (2) semi-enclosed and enclosed basins such as the Mediterranean Sea, Black Sea, and Arabian Sea, and (3) regions where Al data were collected in proximity to major sources such as hydrothermal vents, shelf regions, and river outflows. In some instances, a dataset includes repeated sections, such as the A16N cruise, which was sampled in 2003 and 2013 (Measures et al., 2008a; Barrett et al., 2015), thus providing temporally different Al concentrations for the same coordinates. Data from process studies, which, for instance, sample the same region over a determined period of time, have also been included. We have also included one study which was undertaken at the LOREX site in the Arctic Ocean while the ship was drifting (Moore, 1981). In the latter case we chose a point halfway between the start and the end of sampling as station coordinates. From the dataset, we have excluded Al in brackish to freshwater environments (salinity < 25). We acknowledge that the compilation of Al concentration data would be greatly improved by accompanying ancillary parameters such as salinity, temperature, and nutrients. These could help to better understand key processes occurring within the ocean and how they control the distribution of Al therein.
The time span covered by the database ranges from 1973 to 2017. Table 1 lists all the publications included in the database. Following publication, the file includes, if given, the following additional information for each dataset: station number, sampling year, vessel, location, analytical method, reference material, filtration, and publication reference. The file can be easily reorganized alphabetically or numerically under the following terms and options: Location (Atlantic, Arctic, Indian, Pacific, and Southern oceans, and Arabian, Black and Mediterranean Sea), reference material (Yes & No), filtration (Yes & No), sampling year etc. The dAl concentration is given in nanomolar (nM) units. In addition, a third tab records the changes made to the database following publication (e.g., in case new data are submitted).

DATABASE
The data file contains 24,194 dAl data points, which are presented in alphabetical order by data originator. Figure 1 shows graphical representations of the dAl concentration data included in the database. Below we provide three different sets of information: (i) general sampling efforts; (ii) global dAl concentration statistics; and (iii) "representative" Al depth profiles.

Table 1 notes: Data not included in the database (reference): Kaupp et al. (2011) (surface data are in the database); Li et al. (2013); Liu et al. (2017); Measures et al. (1986); Measures et al. (2005); Measures and Edmond (1992); Ren et al. (2006); Ren et al. (2011). Month refers to the month of sampling, with 1 and 12 being January and December, respectively. Information not available in the manuscript is noted with an interrupted dash line. Yes and No are denoted as Y and N, respectively. Region sampled refers to the location where the majority of sampling took place. The non-differentiation between northern and southern regions in the Pacific and Indian Oceans arises from a lack of dAl data. The division of the Atlantic Ocean tries to capture the large differences in dAl concentrations between the northern, southern, and tropical regions.

Glimpse Into Sampling Efforts
The Al dataset shows uneven sampling coverage between different regions, different seasons, and depths in the water column. Geographically, nearly half of the data points correspond to the Atlantic Ocean (48%), followed by the Pacific and Indian Oceans (18 and 11%, respectively). The Mediterranean Sea and the Southern Ocean account for 6 and 7% of the total dAl data, while the Arctic Ocean represents just 3% of the total data. From a historical point of view, a sharp rise in total seawater dAl concentration data is observed on a decadal timescale, with 69% of the total data collected after the year 2000 (Figure 1). The increase in basin scale knowledge of dAl biogeochemistry arises from the GEOTRACES programme and other recent smaller scale research programmes. For example, 83 and 50% of dAl concentration data for the Mediterranean Sea and Atlantic Ocean, respectively, were collected in the last 7 years of the database (2010 to 2017). In contrast, the Arctic and Southern Oceans have not seen any new data during the same period (2010 to 2017). During the preceding decade (2000 to 2009), thanks to the International Polar Year cruises, the number of samples collected in these two ocean basins increased by 91 and 68%, respectively. Nearly 71% of the expeditions have taken place in the northern hemisphere which was thus sampled more often than the southern hemisphere (63 vs. 37%). From a seasonal point of view, most of the sampling was conducted during spring and summer months, which account for 61 and 64% of the total expeditions for the northern and southern hemisphere, respectively (Table 1). However, nearly no data exist in austral winter for the Southern Ocean. Within the water column, 21% of the sampling points correspond to the first 20 m, 43% to the upper 100 m, 26% to a depth between 100 and 500 m, and 33% to a depth below 500 m. A special case is seen in the Indian Ocean, where observations below 1,000 m are limited to only 372 samples.
Table 2 shows global mean, median, minimum, and maximum dAl concentration data for the different oceanic regions. Globally, the mean and median oceanic concentrations of dAl are 15.4 ± 25.9 and 6.8 nM (n = 24,194), respectively. Maximum and minimum Al concentrations were 674 nM (Arabian Sea) and 0.05 nM (Southern Ocean), respectively. Notably, a large number of data points (n = 590) have concentrations over 100 nM; these correspond to regions that are strongly influenced by Al sources (e.g., atmospheric, hydrothermal, riverine, and sedimentary). Because the mean dAl concentration includes these data points (2.4%) directly affected by sources of Al, the median value may represent a better estimate. In the following, samples with concentrations above 300 nM are not considered. Highest mean and median concentrations are found in the Mediterranean Sea (91.1 ± 42.6 and 85.3 nM) followed by the Arabian Sea (16.7 ± 37 and 7.8 nM) and the Atlantic Ocean (16.3 ± 12.5 and 13.9 nM). Lowest mean and median Al concentration data are found in the Southern Ocean (1.7 ± 20.6 and 0.5 nM) followed by the Pacific (3.1 ± 3.1 and 2.3 nM), Indian (4.1 ± 3.7 and 3.3 nM), and Arctic (8.4 ± 7.9 and 4.9 nM) Oceans. In surface waters (0-100 m) highest mean Al concentrations are observed in the Mediterranean Sea (54.9 ± 22.9 nM, n = 375) followed by the Arabian Sea (20 ± 45.3 nM, n = 970), and the Atlantic (19.8 ± 15.8 nM, n = 4,747), Indian (5.5 ± 5.5 nM, n = 882), Pacific (3.3 ± 3.5 nM, n = 2,118), Arctic (2.7 ± 3.4 nM, n = 234), and Southern (0.8 ± 0.9 nM, n = 714) Oceans. In intermediate waters (500-2,000 m) highest mean Al concentrations are observed in the Mediterranean Sea (127 ± 29 nM, n = 321) followed by the Arabian Sea (25.7 ± 12.6 nM, n = 58), and the Atlantic (13.1 ± 8 nM, n = 2,673), Arctic (6 ± 2.3 nM, n = 76), Indian (3.1 ± 1 nM, n = 600), Pacific (2 ± 1.7 nM, n = 678), and Southern (0.7 ± 0.7 nM, n = 273) Oceans.
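The per-region layer statistics reported in this section can be reproduced from the database with a short script. The following is a minimal sketch, not the procedure used for Table 2: the records are hypothetical illustration values, the names `LAYERS`, `MAX_DAL_NM`, and `layer_stats` are invented for the example, the depth bins are treated as half-open intervals, and the sample standard deviation is assumed (the manuscript does not state which estimator was used).

```python
from statistics import mean, median, stdev

# Hypothetical records for illustration only: (region, depth in m, dAl in nM).
records = [
    ("Atlantic", 10, 22.0), ("Atlantic", 50, 18.0), ("Atlantic", 80, 350.0),
    ("Atlantic", 1000, 12.0), ("Atlantic", 1500, 14.0),
    ("Atlantic", 3000, 17.0), ("Atlantic", 4000, 19.0),
    ("Pacific", 20, 3.0), ("Pacific", 90, 4.0), ("Pacific", 1200, 2.0),
]

# Depth layers named in the text; [top, bottom) boundaries are an assumption.
LAYERS = {
    "surface": (0, 100),
    "intermediate": (500, 2000),
    "deep": (2000, float("inf")),
}
MAX_DAL_NM = 300.0  # samples above this concentration are not considered


def layer_stats(rows, max_dal=MAX_DAL_NM):
    """Return {(region, layer): {"n", "mean", "sd", "median"}}."""
    groups = {}
    for region, depth, dal in rows:
        if dal > max_dal:
            continue  # skip samples directly affected by strong Al sources
        for layer, (top, bottom) in LAYERS.items():
            if top <= depth < bottom:
                groups.setdefault((region, layer), []).append(dal)
    stats = {}
    for key, values in groups.items():
        stats[key] = {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two values; report 0.0 for a single sample
            "sd": stdev(values) if len(values) > 1 else 0.0,
            "median": median(values),
        }
    return stats


stats = layer_stats(records)
for (region, layer), s in sorted(stats.items()):
    print(f"{region:9s} {layer:13s} {s['mean']:.1f} ± {s['sd']:.1f} nM "
          f"(median {s['median']:.1f} nM, n = {s['n']})")
```

Samples between 100 and 500 m fall outside the three layers named in the text and are therefore not assigned to any group in this sketch.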
In deep waters (2,000 m to bottom) the same trend is noticeable with highest mean Al concentrations observed in the Mediterranean Sea (118 ± 53.3 nM, n = 236) followed by the Arabian Sea (28.8 ± 7.7 nM, n = 7), and the Atlantic (17.6 ± 8.7 nM, n = 1,790), Arctic (12.3 ± 4.4 nM, n = 106), Pacific (3.1 ± 2.6 nM, n = 674), Indian (2.5 ± 0.9 nM, n = 148), and Southern (0.9 ± 0.6 nM, n = 196) Oceans. To provide a more useful surface water Al estimate we extracted the depth of the surface mixed layer (SML) for the regions of interest from a monthly mixed layer climatology calculated using the density threshold method (density criterion of 0.03 kg m−3; Holte et al., 2017). In the SML highest mean Al concentrations are observed in the Mediterranean Sea (55.3 ± 43 nM) followed by the Arabian Sea (23.3 ± 37.3 nM), and the Atlantic (21.6 ± 16.1 nM), Indian (6.2 ± 6.3 nM), Pacific (3.2 ± 3.4 nM), Arctic (3 ± 7.9 nM), and Southern (0.8 ± 0.9 nM) Oceans.
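For readers unfamiliar with the density threshold method, the sketch below illustrates the general idea on a single profile: the mixed layer depth is taken as the shallowest depth at which potential density exceeds a near-surface reference value by the chosen criterion (0.03 kg m−3 here). This is an illustration under stated assumptions rather than the climatology's actual implementation: the 10 m reference depth, the linear interpolation between samples, the function name `mixed_layer_depth`, and the example profile are all assumptions made for the example.

```python
def mixed_layer_depth(depths_m, sigma_theta, ref_depth_m=10.0, threshold=0.03):
    """Estimate mixed layer depth (m) with a density threshold criterion.

    depths_m    -- sample depths in metres, sorted from shallow to deep
    sigma_theta -- potential density anomaly (kg m^-3) at each depth
    ref_depth_m -- reference depth; 10 m is an assumption of this sketch
    threshold   -- density increase that marks the base of the mixed layer
    Returns None if the threshold is never exceeded within the profile.
    """
    # Use the observation closest to the reference depth as the reference.
    ref_idx = min(range(len(depths_m)),
                  key=lambda i: abs(depths_m[i] - ref_depth_m))
    target = sigma_theta[ref_idx] + threshold

    for i in range(ref_idx + 1, len(depths_m)):
        if sigma_theta[i] >= target:
            # Linearly interpolate between the bracketing samples.
            z0, z1 = depths_m[i - 1], depths_m[i]
            d0, d1 = sigma_theta[i - 1], sigma_theta[i]
            return z0 + (target - d0) * (z1 - z0) / (d1 - d0)
    return None


# Hypothetical profile for illustration only.
depths = [5, 10, 20, 30, 40, 50]
density = [25.00, 25.00, 25.01, 25.02, 25.05, 25.30]
mld = mixed_layer_depth(depths, density)
print(f"Estimated mixed layer depth: {mld:.1f} m")
```

Samples shallower than the returned depth would then be treated as belonging to the SML when averaging dAl concentrations.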

"Representative" Al Depth Profiles
With the aim to provide representative dAl depth profiles for major oceanic regions we have further constrained the available Al data. To avoid profiles affected by continental inputs, we have selected only off-shelf samples, which we define as being located at least 50 nautical miles offshore. Large intra and inter-basin differences in the shape of mean Al depth profiles are evident (Figure 1) with varying dAl distributions displaying either a nutrient, scavenged, or mixed-type profile. This reveals that Al is a highly dynamic metal with different responses to different oceanographic settings and therefore is a useful indicator for processes (e.g., input and removal of metals) occurring in the marine environment.

Data Variability Over Time
Whether dAl data variability over time can be attributed to improvements in sampling and analytical techniques is difficult to assess and relies on large assumptions that may not be correct (e.g., comparable seasonal and geographical coverage in sampling). The choice of a cut-off year between historical and recent observations is complicated. In this attempt to assess data variability over time we have chosen the year 2002, which coincided with the advent of large-scale clean seawater sampling capabilities (De Baar et al., 2008; Measures et al., 2008b). We have investigated the Atlantic (including the three Atlantic regions defined in the above section), Indian (excluding the Arabian Sea), Pacific, and Southern Oceans as well as the Mediterranean Sea, and include analyses for all data as well as for values below 500 m depth (to reduce the influence of seasonal variability in an element with short residence times). A graphical representation of the location of data for both analyses is shown in Supplementary Figures 1, 2.

Table 2 notes: The depth interval statistics show average and standard deviation. The mid and deep depth intervals for the Arctic are 500-1,000 and 1,000-2,000 m, respectively. Hypothesis 0 (H0) refers to no statistically significant difference between samples collected prior to and after the year 2000. MLD refers to Mixed Layer Depth. The MLD has been extracted for the regions of interest from the monthly mixed layer climatology calculated using the density threshold method (Holte et al., 2017).

Frontiers in Marine Science | www.frontiersin.org
We acknowledge that no differentiation is made between analytical methods and that some datasets may be directly influenced by Al sources. However, we assume improvements in analytical and sampling techniques in recent decades and we consider that both datasets (before the year 2002 and after) will contain regions where samples directly influenced by strong Al sources were collected. To assess if Al data have varied over time for the same oceanic regions, we applied the two-tailed non-parametric Mann-Whitney test (including outliers), which assesses whether values from one group (e.g., A) tend to be larger or smaller than values from the other group (B). The test rejected the H0 hypothesis (A = B) at a significance level α < 0.001 for all analyses except for the Mediterranean Sea in both analyses (all data and below 500 m, Table 2). A further exception is the South Atlantic Ocean, where H0 is not rejected when considering data below 500 m. It is notable, however, that Al data below 500 m for the South Atlantic Ocean before the year 2002 are restricted to only two stations and 22 data points (Supplementary Figure 2 and Table 2). This indicates that the distributions of independent observations from both groups (prior to 2002 and onwards) are significantly different for all major oceans except for the Mediterranean Sea and, when considering only data below 500 m, the South Atlantic Ocean. The lack of a significant difference for the Mediterranean Sea could be due to the high concentrations found in this region (Table 2), which make any influence of improvements in analytical and sampling techniques for trace constituents negligible. Although an effect of improved analytical and sampling techniques seems likely based on the observed difference between historical and recent observations, this observation is by no means conclusive evidence. Therefore, the influence of spatial coverage needs to be considered.
Ideally, in order to provide better evidence relating lower Al concentrations with improvement in sampling and analytical techniques we would require repeated sampling in the same region over long periods of time.
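The comparison between historical and recent observations described in this section can be sketched as follows. This is a simplified illustration, not the analysis code behind Table 2: it implements a two-tailed Mann-Whitney U test using the normal approximation with tie and continuity corrections (reasonable for large samples; statistical packages may use exact methods for small ones), and the function name `mann_whitney_u` and the two example groups are hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_u(group_a, group_b):
    """Two-tailed Mann-Whitney U test (normal approximation).

    Returns (U statistic for group_a, two-sided p-value).
    """
    n1, n2 = len(group_a), len(group_b)
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])

    # Assign mid-ranks to tied values and accumulate the tie correction term.
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # all values identical: no evidence of a difference

    # Continuity-corrected z score, floored at zero.
    z = max(abs(u_a - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(z))
    return u_a, min(p_value, 1.0)


# Hypothetical dAl values (nM) for illustration only.
before_2002 = [6.0, 7.0, 8.0, 9.0, 10.0]
from_2002 = [1.0, 2.0, 3.0, 4.0, 5.0]
u_stat, p = mann_whitney_u(before_2002, from_2002)
print(f"U = {u_stat}, p = {p:.4f}")
```

In this setting H0 (A = B) would be rejected when the p-value falls below the chosen significance level; with only five values per group, as in the example, the normal approximation is rough and serves only to show the mechanics.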

SUMMARY AND PROSPECTS
We have compiled 24,194 dAl concentration data points for the global ocean spanning a sampling period from 1973 to 2017. We further provided global dAl concentration statistics for each of the major oceanic, enclosed, and semi-enclosed basins and provided global average depth profiles for representative areas within the major regions. The compilation shows a large variability in dAl concentration data, which highlights the varying magnitude of input and removal rates. It also highlights our limited knowledge of how dAl varies between seasons and the relative lack of observations in regions such as the Pacific and Indian Oceans. This raises research questions that can guide the planning of future expeditions. For example, an interesting region for research is the western Indian Ocean. Here the role of the southwards flowing western boundary currents as a carrier of Al (and other trace metals) to the Southwest Indian subtropical gyre via the Mozambique Channel has been suggested (Grand et al., 2015a), but additional studies are required to establish the extent of this supply and the influence that Mozambique eddies embedded in the Agulhas Current could have further south. Work in this area would also produce new knowledge on the continent-island influence on trace metal distributions in a highly dynamic region. Another interesting area is the eastern Pacific along the American continent, where the influence of upwelling on the distribution of Al remains less well-constrained than, for example, for the upwelling regions along the western African continent. We believe that the Al compilation will be a useful tool for modeling purposes and will help to better understand the global biogeochemical cycle of Al. Combined with global data products on mixed layer depths, the Al data compilation can be used to revise residence times of Al in surface waters as well as to update the global oceanic Al inventory.
Furthermore, an additional combination with global chlorophyll data products could investigate the role that biogenic particles (e.g., assuming the more chlorophyll the more particles present in surface waters) may have on the residence times of Al in surface waters and elucidate if chlorophyll global products could be used to better constrain Al residence times in surface waters.

DATA AVAILABILITY STATEMENT
The data compiled in this study have been deposited in an open access repository and are available under the following link: https://doi.org/10.25413/sun.12167877.v1.

AUTHOR CONTRIBUTIONS
J-LM conceived the study, wrote the manuscript with early contributions from AR, and compiled the database. TD, RC, JL, and SS helped with data compilation. EA, JK, RM, AB, PC, TR, and MG-C submitted unpublished data. AR provided funds for the study. All co-authors commented on previous manuscript drafts and on the final manuscript draft. All authors contributed to the article and approved the submitted version.

FUNDING
This study was funded by NRF, South Africa (UID# 105826 and 110715).

ACKNOWLEDGMENTS
samples included in the database. Special thanks to Marco Van Hulten and Marion Gehlen for providing their early aluminum compilations. Also, special thanks to all researchers who provided, upon request, their Al data.