Heavy Metals in the Adriatic-Ionian Seas: A Case Study to Illustrate the Challenges in Data Management When Dealing With Regional Datasets

Harmonization of monitoring protocols and analytical methods is a crucial issue for transnational marine environmental status assessment, yet not the only one. Coherent data management and quality control become very relevant when environmental status is assessed at regional or subregional scale (e.g., for the Mediterranean or the Adriatic Sea), thus requiring data from different sources. Heavy metals are among the main targets of monitoring activities. Significant efforts have been dedicated to sharing best practices for monitoring and assessment of ecosystem status and to strengthening the network of national, regional and European large data infrastructures in order to facilitate access to data among countries. Data comparability and interoperability depend not only on sampling and analytical protocols but also on how data and metadata are managed, quality controlled and made accessible. Interoperability is guaranteed by using common metadata and data formats, and standard vocabularies to assure homogeneous syntax and semantics. Data management of contaminants is complex and challenging due to the large amount of information required on sampling and analytical procedures, the high heterogeneity in matrix characteristics, and the large and increasing number of pollutants. Procedures for quality control on heterogeneous datasets provided by multiple sources are not yet uniform and consolidated. Additional knowledge and reliable long time-series of data are needed to evaluate typical ranges of contaminant concentration. The analysis of a coherent and harmonized regional dataset can provide the basis for a multi-step quality control procedure, which can be further improved as knowledge increases during the data validation process.


INTRODUCTION
The increased human use of the marine and coastal areas may compromise marine ecosystems through several kinds of physical, chemical and biological disturbances and contamination by hazardous substances. In particular, in the Adriatic and Ionian Seas, the overall increase in maritime transport, the increasing coastal urbanization and the foreseen growth in offshore oil and gas extraction pose serious risks of pollution from hazardous substances for several coastal countries (Mosetti et al., 2014). Several relevant and growing economic maritime activities such as coastal and maritime tourism, fishery and aquaculture rely on the preservation of ecosystem services and reduction of pollution. Besides, this region is a hotspot of biodiversity (Coll et al., 2012) and hosts natural protected areas, sites of conservation interest of global importance (National Marine Protected Areas, NATURA 2000 sites) and other areas with different protection regimes according to the IUCN categorization.
Due to the environmental regulations in place in the Mediterranean (MSFD, 2008/56/EC, European Commission, 2008; WFD, 2000/60/EC, European Commission, 2001; IMAP, UNEP/MAP, United Nations, 2016b), there is already comprehensive coastal and marine monitoring undertaken in the Adriatic and Ionian Seas. The ecosystem-based approach and the management at sea basin scale increase the need for data availability, sharing and comparability.
European Marine Board (2008) highlighted a series of challenges related to marine data that are still valid: data availability, quality assurance and tools for policymaking.
Concerning marine contaminants, in particular, cost-effective measurements are still a challenging issue as in situ monitoring and complex analytical procedures are expensive and time-consuming. In this sense, data sharing becomes extremely valuable. However, sampling and analytical procedures may vary from country to country, and efforts on standardization and harmonization of sampling and analysis, at least at regional or subregional level, are required to considerably improve the comparability of data. In addition, data management based on commonly agreed and standardized approaches will facilitate data sharing and access.
The establishment of the FAIR (Findable, Accessible, Interoperable and Reusable) principles (Wilkinson, 2016) in ocean data management has contributed considerably to improving the work of the European marine data infrastructures (Tanhua et al., 2019).
Thanks to consolidated European data infrastructures, the accessibility of data has improved. However, quality of available data as well as data comparability are still critical issues. Environmental assessment needs information about protocols applied in the monitoring network of the different countries, to evaluate the comparability of data that are being collected. Without this information, the datasets may be unusable for assessment purposes. There is high heterogeneity in the procedures related to contaminants sampling and laboratory analysis (Berto et al., 2020) and, besides, these aspects require improvement in relation to contaminant data management and storage. Lastly, procedures for quality control of data of marine contaminants need to be agreed and consolidated.
Analysis of a harmonized regional dataset can be a good basis to understand the overall approach, from monitoring/sampling to data quality control aspects on a regional scale. The analysis of a regional case study will allow us to understand the benefits provided by a large data infrastructure that integrates heterogeneous data from multiple sources, but also to identify the specific aspects of data management related to contaminants that need improvement. The experience and knowledge gained with this exercise can lead to the definition of best practices that might be further implemented at a wider scale.
The aim of this study is to evaluate heterogeneity in data of heavy metals collected in seawater, sediment and biota in the Adriatic-Ionian subregions, to verify the comparability of data made available from different sources and to propose guiding principles to improve a harmonized approach of data management.

BACKGROUND AND RATIONALE
The Adriatic and Ionian Seas are bounded by both EU and non-EU countries, which determines a number of implications in terms of the implementation of the environmental legal framework. The coastal States that share a marine region or subregions are meant to cooperate to ensure coherent management with an ecosystem-based approach.
The existing legal framework has been defined by the EU (Marine Strategy Framework Directive (MSFD), Water Framework Directive (WFD), Maritime Spatial Planning Directive (MSP), etc.) and by UNEP/MAP as Regional Sea Convention (Offshore Protocol, Land-Based Sources and Activities Protocol, Protocol on Integrated Coastal Zone Management (ICZM), etc.), and covers various aspects of environmental protection. However, these instruments overlap to some extent and are not binding on all coastal states, which leads to several issues in their implementation.
The assessment of EU Member States reporting for the first MSFD cycle underlined that the level of coherence in the ADRION marine subregions concerning the implementation of EU environmental policies and MEDPOL program (Programme for the Assessment and Control of Marine Pollution in the Mediterranean) is considered low (Palialexis et al., 2014), particularly in the case of pollution from hazardous substances.
Descriptors 8 and 9 of the MSFD and the Ecological Objective (EO) 9 of the Ecosystem Approach of the UNEP/MAP deal with pollution from contaminants. In particular, heavy metals are a wide group of contaminants that continues to accumulate due to new productive activities. As defined by IUPAC, the term heavy metals "is often used as a group name for metals and semimetals (metalloids) that have been associated with contamination and potential toxicity or ecotoxicity" (Duffus, 2002). Heavy metals are a natural part of the marine environment whose concentrations have been constantly increasing due to anthropogenic activities (Ansari et al., 2004). Several metals and metalloids are directly linked to sea-based sources of pollution, such as shipping, the offshore oil and gas industry and mariculture (Hanke, 2016, 2018), and are, therefore, included in the guidelines for environmental status assessment by EU and UNEP/MAP directives. Sediment and biota are highly conservative environmental matrices, representative of the state of contamination of the marine environment. Sediments are considered the main sink for heavy metals in aquatic environments, while heavy metals are known to accumulate in marine organisms and even be biomagnified through the trophic web (DeForest et al., 2007; Rainbow and Luoma, 2011). Due to the toxicity of heavy metals, their persistence and their tendency to accumulate in sediment and biota, these two matrices should be preferred over water for monitoring and assessment purposes. Furthermore, besides chemical investigations, biological tools such as biomarkers and bioassays on selected target species may add information on the bioavailability and possible toxic effects of these contaminants at the molecular, cellular or physiological level, and can be usefully integrated with chemical approaches.
However, heterogeneity in monitoring and analytical protocols may limit data comparability, although environmental assessment at large geographic scale requires consistency. To improve the comparability of the data, storage of the proper documentation related to monitoring and analytical protocols is fundamental. With this in mind, assessments at regional or subregional sea level require proper archiving of complete metadata together with data, and suitable mechanisms for data discovery and access.
In the last few years, research and monitoring efforts have been strongly influenced by environmental policies implementation. However, besides environmental status assessment, also scientific research on heavy metals needs standard data. Biota interspecific differences, tissue, stage of development (Cenov et al., 2018), geographic location (Perić et al., 2012), grain size classes (Živković, 2010), among others, are variables that influence the accumulation of metals both in biota and sediment (Oros and Gomoiu, 2012). Consequently, diverse scientific approaches would benefit from using a wider range of data accurately documented and acquired.
A shared and agreed approach to data management may help to discover, obtain and analyze large-scale datasets. This might lead to an improvement of the knowledge base in the field of marine pollution.
The use of data infrastructures that work on the basis of standardized procedures and vocabularies and provide tools for data discovery can help to handle the heterogeneity of existing data and to promote access to data collected by different institutions. Data structured in a uniform way and defined by common and harmonized parameters are the basis for the creation of data collections that are needed to bolster the quality control methodologies for contaminants and to assess pollution coherently in different areas.

Data Infrastructures
Large data infrastructures that manage and supply otherwise fragmented marine data offer an enormous advantage when dealing with large-scale studies (e.g., basin scale, European scale, global scale) (Benson et al., 2018). They are the link between observations, data management and users and are fundamental to:
• Give managers and policymakers access to updated data and information for decision making.
• Provide scientists with a framework to integrate individual observations in order to build a strong network of knowledge.
In Europe, the consolidated EU initiative EMODnet (European Marine Observation and Data Network) is an important reference (Martín Míguez et al., 2019) for in situ measurements. EMODnet Chemistry, in particular, constitutes the spatial data infrastructure in charge of providing access to marine chemical (eutrophication, ocean acidification, contaminants and marine litter) data (Giorgetti et al., 2018). EMODnet Chemistry relies on SeaDataNet standards, established through a Pan-European infrastructure for ocean and marine data management (Schaap and Maudire, 2009), and adopts FAIR principles to guide the whole data management approach. Interoperability is guaranteed by the use of controlled vocabularies, the utilization of standard metadata and data formats (Vinci et al., 2017), and the use of common and transparent quality control procedures and quality flagging schema, all developed in a framework of international cooperation. In particular, the use of controlled vocabularies represents an important prerequisite for consistency and interoperability. Taking into consideration the high heterogeneity in marine chemical data with regard to sampled matrix characteristics, sampling and analytical protocols, a very specific vocabulary (Parameter Usage Vocabulary, P01) was implemented by the British Oceanographic Data Centre (BODC). The P01 vocabulary was initially introduced during the EU/FP5 SeaSearch project and further developed in the framework of SeaDataNet for labeling measured substances and keeping relevant information linked to the data. It allows parameters to be labeled with a standard description and is updated according to data originators' needs, as new parameters are made available.
Lastly, the adoption of a standard data policy, consistent with the data providers' policies and regulating data access, allows data originators to be appropriately acknowledged and encourages data sharing. All these standards allow data to be reusable, taking advantage of the efforts already invested in monitoring.
Data comparability and interoperability depend not only on sampling and analytical protocols but also on how data and metadata are archived. For data to be usable, the information about what, how, when, where and why must be on hand (Ma et al., 2014;Benson et al., 2018).

ADRION Regional Data Collection of Heavy Metals
In order to evaluate the advantages of using large data management infrastructures and the improvements needed in data management related to contaminants, a data collection of heavy metals available for the Adriatic and Ionian (ADRION) Seas has been analyzed in this study.
The data collection used consisted of all data (restricted and non-restricted) made available in the framework of the HarmoNIA project (INTERREG VB-ADRION, 2018-2020) by the institutions listed in Supplementary Table 1, and unrestricted data available through the EMODnet Chemistry infrastructure, covering the Adriatic and Ionian Seas. The ADRION Regional Data Collection thus includes over 5500 datasets related to marine contaminants, provided by six neighboring coastal countries of the subregion (Italy, Slovenia, Croatia, Montenegro, Albania and Greece). Specifically, for heavy metals there are around 5000 datasets with a temporal coverage spanning from 1981 to 2018.

Data Management
Datasets are compiled by data originators using the standard "Ocean Data View (ODV) format," which contains three types of columns: metadata, primary variable and data. Metadata columns provide information on cruise, station, data originator, bottom depth, project and access policy. The primary variable can be time, in cases of monitoring stations repeated in time, or depth, when data are available as vertical profiles. Metadata and data comply with common vocabularies set up within the SeaDataNet infrastructure, which is a fundamental part of the standardization process. Data are accompanied by quality control flags (QF) (according to the SeaDataNet quality flag scale) defined by data originators (more details in Supplementary Material). The ADRION Regional Data Collection has been processed and validated with ODV software, which is continuously being adapted to fit the needs of management of data from different disciplines. From the original ADRION Regional Data Collection, ODV was used to obtain three data collections, one for each matrix (seawater, sediment, biota). A built-in "harmonization tool," specifically implemented to handle the heterogeneity in measurement units, has been used to convert concentrations to standard units, defined according to recent EU Directives (European Commission, 2013). The resulting "harmonized and transposed matrix," provided as an optional data output format specific for data of chemical contaminants, was used to explore metadata completeness and data heterogeneity and to perform data Quality Control (QC).
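The unit harmonization step can be illustrated with a minimal sketch. It does not reproduce the ODV tool; the unit labels, the target unit (ug/kg) and the sample values are assumptions chosen only to show the principle of mapping heterogeneous mass-per-mass units onto one standard unit.

```python
# Conversion factors to micrograms per kilogram (ug/kg).
# Unit labels and target unit are assumptions for this sketch,
# not the settings of the ODV harmonization tool.
TO_UG_PER_KG = {
    "ug/kg": 1.0,
    "ng/g": 1.0,       # 1 ng/g equals 1 ug/kg
    "mg/kg": 1_000.0,
    "ug/g": 1_000.0,   # 1 ug/g equals 1 mg/kg
}


def harmonize(value, unit):
    """Return the concentration in ug/kg; raise ValueError for unknown units."""
    try:
        return value * TO_UG_PER_KG[unit]
    except KeyError:
        raise ValueError(f"no conversion defined for unit {unit!r}") from None


# Illustrative records (invented values) reported in different units.
records = [
    {"substance": "Cd", "value": 0.25, "unit": "mg/kg"},
    {"substance": "Cd", "value": 310.0, "unit": "ng/g"},
    {"substance": "Pb", "value": 12.5, "unit": "ug/g"},
]

harmonized = [
    {**r, "value": harmonize(r["value"], r["unit"]), "unit": "ug/kg"}
    for r in records
]
```

Raising an error for an unknown unit, rather than passing the value through, keeps unconverted data from silently entering the harmonized collection.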
A stepwise and iterative QC approach was adopted to obtain a harmonized validated regional dataset. The applied QC procedure included:
• Harmonization of measurement units and parameter naming.
• Metadata completeness and dataset format control.
• Checks for inconsistent measurement units.
• Verification of quality flagging for data and metadata.
• Screening of data ranges to search for clearly impossible values (e.g., different orders of magnitude).
The results of the first validation cycle are reported to data originators who are in charge of revising, correcting encountered issues and, possibly, providing missing metadata needed to make data comparable and fit for further use.
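Two of the QC steps listed above, metadata completeness and range screening, can be sketched as follows. This is an illustration under stated assumptions, not the procedure applied to the ADRION collection: the required metadata field names are hypothetical, the two-orders-of-magnitude threshold is arbitrary, and the median of the group is used as a stand-in for regional reference ranges.

```python
import math
from statistics import median

# Hypothetical metadata fields; the real required set depends on the matrix.
REQUIRED_METADATA = ("station", "sampling_depth", "unit", "method")


def missing_metadata(record):
    """List the required metadata fields that are absent or empty."""
    return [k for k in REQUIRED_METADATA if record.get(k) in (None, "")]


def screen_ranges(values, max_orders=2.0):
    """Return indices of suspect values.

    A value is suspect if it is non-positive or differs from the median of
    the positive values by more than max_orders orders of magnitude.
    Assumes at least one positive value is present.
    """
    ref = median(v for v in values if v > 0)
    return [
        i
        for i, v in enumerate(values)
        if v <= 0 or abs(math.log10(v / ref)) > max_orders
    ]


# Invented example: one value is about three orders of magnitude too high,
# as might happen when a unit is mislabeled.
concentrations = [0.12, 0.15, 0.11, 150.0, 0.14]
suspects = screen_ranges(concentrations)

record = {"station": "ST01", "sampling_depth": None, "unit": "ug/kg", "method": ""}
gaps = missing_metadata(record)
```

In the workflow described here, records identified this way would be reported back to data originators rather than corrected automatically.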

RESULTS
Data distribution was heterogeneous across countries, with differences in the number of substances and matrices monitored (Figure 2). The geographical data coverage was mainly associated with coastal waters (Figure 1).
Only 11 "heavy metals" (As, Cd, Cu, Cr, Fe, Hg, Mn, Ni, Pb, Ag, and Zn) were measured in all three matrices and only three were measured by all countries (Cu, Pb, Zn) (Figures 2, 3), out of a total of 34 metal elements.

Sediment Collection
Most data were available for the sediment matrix, covering the whole Adriatic-Ionian Seas (Figure 2). This collection contained 133 different parameters (P01 codes) related to 52 different metal and metalloid compounds, but only Cu, Pb, and Zn were measured by all ADRION countries in this matrix. Contaminant concentrations in the sediment were reported on a dry weight basis (with the exception of data related to sediment pore waters, which were removed from the sediment-matrix collection) and referred to different sediment grain sizes (total sediment, <2000 µm, <500 µm, and <63 µm). Within the regional data collection, information related to sampling (sampled sediment depth, instrument and thickness), matrix characteristics (i.e., specific sediment grain size), sample pre-treatment and analytical techniques is not always complete, which limits data comparability and, in the worst case, the usability of the data. In particular, the lack of specific metadata such as information on sampled thickness, which is directly linked to deposition history, may hinder robust data comparability.
With regard to contaminant concentration values, after quality control of the regional data collection, around 83% of data are flagged as "good," 3% are below the limit of detection or, when provided, below the limit of quantification, 11% are flagged as "probably bad" due to values outside the ranges reported for the region and 3% still have to be verified by data providers.
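A breakdown of this kind can be derived directly from the quality flags attached to each value. The sketch below assumes the usual reading of the SeaDataNet flag scale (for example "1" for good and "3" for probably bad); the labels should be checked against the published vocabulary, and the flag list is invented for illustration.

```python
from collections import Counter

# Subset of flag meanings, as assumed for this sketch; verify against the
# SeaDataNet quality flag vocabulary before use.
FLAG_LABELS = {
    "0": "no quality control",
    "1": "good",
    "3": "probably bad",
    "6": "below detection",
    "Q": "below limit of quantification",
}


def flag_shares(flags):
    """Return the percentage of values carrying each flag."""
    counts = Counter(flags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {flag: 100.0 * n / total for flag, n in counts.items()}


# Invented flag column for 20 values.
flags = ["1"] * 16 + ["3"] * 2 + ["6"] + ["0"]
shares = flag_shares(flags)

for flag, share in shares.items():
    print(f"{FLAG_LABELS.get(flag, flag)}: {share:.0f}%")
```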

Biota Collection
The data collection related to biota contains 68 parameters, related to 12 heavy metals (Ag, As, Cd, Cu, Cr, Fe, Hg, Mn, Ni, Pb, Se, and Zn). Only four countries provided data on heavy metals in biota and only two substances (Hg and Cd) were common to all four countries (Figure 3). Data were related to 8 different species among fishes, mollusks and annelids and were mainly reported on a dry weight basis. Information about the analyzed tissue was mostly included; conversely, the size class of the organisms was rarely provided. As in the case of sediment, sample pre-treatment and method of analysis details were not always described. After data QC, 4% of data are below the limit of detection or quantification, 90% of the data are flagged as good, while 5% still need to be validated by data originators.

Water Collection
For the water matrix, the data collection contained 57 different parameters associated with 16 metals and metalloids (Ag, Al, As, Ba, Cd, Cr, Co, Cu, Fe, Hg, Mn, Mo, Ni, Pb, V, and Zn). Data on heavy metal concentration in the water matrix were available for four countries and only Cd, Cu, and Hg were measured by all four countries. Heavy metal concentration refers to the water volume and data are mainly related to the dissolved phase, mostly filtered at 0.4/0.45 µm, although a minority of data were related to total, i.e., dissolved plus reactive particulate, or to the particulate phase (>0.4/0.45 µm). Availability of correct information on the sampled phase is fundamental to allow comparability of data from several sources. However, indications on sampling depth, sample preparation and analytical methodology are not always complete or even provided. After the QC procedure, 23% of data were below the detection or quantification limit, 9% were labeled as "good" data, while for 68% of the data the quality control has not yet been finalized, as additional verification by the data originators is needed.

FIGURE 2 | Number of data available for heavy metals in sediment (orange), biota (green), and water (violet). Note the logarithmic scale.

DISCUSSION
Despite the ongoing improvements in observing capability and the consequent growing number of data, the availability of frameworks to similarly increase the conversion of data to information, which requires data of known quality, origin, use and attribution conditions, is still a challenge (Buck et al., 2019). Anthropogenic contaminants, in particular, show large gaps in terms of geographical and temporal data availability (Astiaso Garcia et al., 2019), and data comparability and quality assurance are still limiting environmental status assessment at subregional and regional scale (United Nations, 2017). The ADRION Regional Data Collection represents the largest harmonized, validated and accessible dataset on contaminants, in particular on heavy metals, for the Adriatic-Ionian Seas. Access to marine data of known quality is a key issue for sustainable economic development and for marine environmental management (European Commission, 2010; EEA, 2015). By improving monitoring data validation, coherence and accessibility, this product contributes to the need for improved data availability, integration and flows underlined by both the scientific community and monitoring authorities (United Nations, 2017; Astiaso Garcia et al., 2019; Painting et al., 2019) and can represent a valuable resource to better address environmental threats and pressures in the ADRION area. The standardization and harmonization of the datasets, produced by different laboratories, have made it possible to obtain a homogeneous product that shows the high heterogeneity in terms of matrix properties (water phase, sediment grain size, biota taxa, target organs, size of the sampled individuals, etc.) as well as in the measured parameters and analysis protocols. At the same time, the analysis of the harmonized regional data collection made it possible to identify several aspects of data management that need to be improved.
In particular, the need for more complete and accurate metadata related to sampling and analysis has been identified as an important area for improvement; the lack of relevant information such as sampling depth, the thickness of sediment samples, sample preparation, analysis methodology or normalization parameters may limit the comparability and usefulness of the data. To address these issues, the establishment of the data Quality Control feedback process (Vinci et al., 2017 and Supplementary Material), which is carried out in contact with the data originators, makes it possible to obtain additional relevant metadata and promotes continuous improvement of the data.
The vocabulary (P01), adopted by SeaDataNet, EMODnet Chemistry and HarmoNIA, makes it possible to keep several kinds of information (e.g., matrix characteristics, sampling, analytical protocols, etc.) associated with the measured substance compacted in just one code, thus keeping relevant metadata connected to the data (IODE/UNESCO, 2019). This approach is fundamental to enable the comparison of the same type of monitoring data, where "type" is identified by the whole set of information related to substance, matrix, sampling, etc. (Supplementary Material). On the other hand, such an approach shows some limitations. In fact, the multiple combinations of the information mentioned above result in a huge list of parameters included in the datasets, which makes the data structure quite complex (Table 1). This is particularly striking for data related to biota, including bioassays and biomarkers. For these specific data, several additional parameters (e.g., age, stage of development or size of individuals, protocol details regarding organism exposure or handling, endpoint, etc.) are required to correctly evaluate the contamination status (Bajt et al., 2019). However, their inclusion in P01 expands the vocabulary considerably. The use of standard vocabularies represents an important prerequisite toward consistency and interoperability and assures data comparability (Astiaso Garcia et al., 2019). At the same time, the complexity of specific data types, such as contaminants, may require ad hoc adaptations to facilitate data processing by users. To better meet users' needs, specific tools were implemented to manage the complexity of contaminant parameters and to decompose the P01 terms into their subcomponents (i.e., matrix type and characteristics, chemical substance, sample treatment and analytical method, etc.). The "decomposition tool" thus makes it possible to obtain a dataset format suitable for filtering data according to user needs.
This "decomposed" dataset format can be particularly useful if contaminant concentrations need to be compared, for example, between different areas but in the same sediment fractions, as required by MSFD and Barcelona Convention Protocols, and can support refining the definition of threshold values (United Nations, 2016a) required for environmental status assessment (United Nations, 2017). This specific dataset format helps to overcome the high complexity and facilitates the processing of highly heterogeneous datasets such as those related to marine contaminants measured by multiple laboratories.
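The benefit of decomposing a composite parameter code into its subcomponents can be shown with a small sketch. The codes, the lookup table and the values below are hypothetical and are not real P01 entries; the sketch only illustrates how decomposition lets a user select values that are comparable, here cadmium in the same sediment fraction across different areas.

```python
from typing import NamedTuple


class Components(NamedTuple):
    substance: str
    matrix: str
    fraction: str


# Hypothetical parameter codes and their decomposition (not real P01 codes).
DECOMPOSED = {
    "XXCD0001": Components("Cd", "sediment", "<63 um"),
    "XXCD0002": Components("Cd", "sediment", "<2000 um"),
    "XXPB0001": Components("Pb", "sediment", "<63 um"),
}


def select(rows, **criteria):
    """Keep rows whose decomposed parameter matches every given criterion."""
    selected = []
    for row in rows:
        comp = DECOMPOSED.get(row["code"])
        if comp is not None and all(
            getattr(comp, key) == wanted for key, wanted in criteria.items()
        ):
            selected.append(row)
    return selected


# Invented measurements identified only by their composite code.
rows = [
    {"code": "XXCD0001", "area": "North Adriatic", "value": 210.0},
    {"code": "XXCD0002", "area": "North Adriatic", "value": 180.0},
    {"code": "XXCD0001", "area": "Ionian", "value": 95.0},
    {"code": "XXPB0001", "area": "Ionian", "value": 14000.0},
]

# Cadmium in the <63 um sediment fraction, comparable across areas.
comparable = select(rows, substance="Cd", matrix="sediment", fraction="<63 um")
```

Without the decomposed view, the same selection would require the user to know in advance every composite code that corresponds to the wanted combination.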
Achieving a harmonized regional dataset derived from multiple and heterogeneous data sources thus requires a stepwise approach consisting of data collection, standardization (same metadata, controlled vocabularies, same dataset formats, standard Quality Flag scale), harmonization (measurement unit conversion), a Quality Control loop engaging data providers, parameter decomposition (P01 subcomponents) and dataset format transposition.
The final product meets the requirements of standardized, validated and interoperable data indicated by scientific and environmental status assessment community.

CONCLUSION
The use of data infrastructures for data archiving and management provides a standard and harmonized framework to improve access to information supplied by multiple and heterogeneous sources. The availability of harmonized and validated data and information (particularly in the field of marine contaminants) is fundamental to support both environmental status assessment and the scientific research needed to evaluate the effects of contaminants on the ecosystem. The analysis focused on the ADRION Regional Data Collection made it possible to identify the specific needs of data management related to contaminants and the specific metadata required to enable data comparability and fitness for use.
Data related to contaminants are complex and their management and validation protocols still have to be improved. However, the use of available data is fundamental, considering that in situ measurements related to pollution are expensive and difficult to obtain. Further improvements in data and metadata completeness and harmonization are crucial issues to be addressed and developed.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: http://harmonia.maris2.nl/search.

AUTHOR CONTRIBUTIONS
MM contributed to data and metadata quality control and led the writing. ML developed the idea of the manuscript, participated in the writing, and coordinated the project. MK contributed to data quality control and manuscript revision. EP contributed to metadata preparation and manuscript revision. GG, AR, DI, AI, BČ, MF, AC, and RB contributed data and contributed to manuscript revision. CG and MV contributed to manuscript revision. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the project "HarmoNIA (https://harmonia.adrioninterreg.eu/) - ADRION 340 (2018-2020)", funded by the European Regional Development Fund and IPA II fund, in the framework of the INTERREG VB Adriatic-Ionian (ADRION) Cooperation Programme. The content of the publication is the sole responsibility of the beneficiaries of the project and can under no circumstances be regarded as reflecting the position of the European Union and/or ADRION programme authorities.