The Reef Check Med Dataset on Key Mediterranean Marine Species 2001–2020

Dipartimento di Beni Culturali, Università di Bologna, Ravenna, Italy; Centro Interdipartimentale di Ricerca Industriale Fonti Rinnovabili, Ambiente, Mare ed Energia, Università di Bologna, Ravenna, Italy; Dipartimento di Scienze Biologiche, Geologiche e Ambientali, Università di Bologna, Ravenna, Italy; Reef Check Italia Onlus, Ancona, Italy; Consorzio Nazionale Interuniversitario per le Scienze del Mare (CoNISMa), Rome, Italy; Dipartimento di Scienze della Vita e dell’Ambiente, Università Politecnica delle Marche, Ancona, Italy; Stazione Zoologica Anton Dohrn, Naples, Italy; Fano Marine Center, Fano, Italy


BACKGROUND
Mediterranean marine coastal habitats have been, and continue to be, threatened by human-related pressures such as resource over-exploitation, pollution, habitat loss and fragmentation, and the invasion of non-native species (Airoldi and Beck, 2007; Micheli et al., 2013). These pressures are exacerbated by disturbances associated with global climate change, which have led to major shifts in marine ecosystems, impacting their resilience and ability to provide goods and services (Ponti et al., 2014; Garrabou et al., 2019). Ecological shifts in marine benthic communities are difficult to recognize because of the scarcity of findable, accessible, interoperable, and reusable quantitative data (FAIR data principles; Wilkinson et al., 2016) that could serve as a baseline. The situation is further aggravated by a lack of long-term monitoring capability at a regional scale. To be successful, marine coastal habitat conservation requires ecosystem-based management approaches that give ample consideration to the spatial and temporal distribution of key species over broad scales (Foley et al., 2010). Easily accessible, reliable, and accurate data are essential to monitor marine ecosystem health successfully, providing the knowledge needed to address the threats to coastal marine habitats, develop policies and regulations to protect vulnerable areas, understand trends, and forecast future changes (Martín Míguez et al., 2019). However, data obtained from scientific investigations and institutional monitoring programs, albeit very accurate, are generally too scarce and fragmented to be used effectively for spatial planning (Hochachka et al., 2012). This is particularly true for subtidal marine environments, where making sufficient repeated observations and measurements requires a large effort.
As a solution, volunteers (citizen scientists) trained in the use of specifically developed monitoring protocols can help fill the gap in high-quality data acquisition by performing monitoring over broad spatial and temporal scales.
Since 2001, trained and certified volunteer snorkelers, freedivers, and scuba divers (hereafter EcoDivers) have collected data for selected key marine species, recording their occurrence, distribution, abundance, and bathymetric range along the Mediterranean Sea coasts, using the Reef Check Mediterranean Underwater Coastal Environment Monitoring (RCMed U-CEM) protocol (Turicchia et al., 2021b). Here, we describe the resulting dataset, the "Reef Check Med dataset on key Mediterranean marine species 2001-2020" (RCMed_2001-2020; Ponti et al., 2021), which is hosted by the European Marine Observation and Data Network (EMODnet; Martín Míguez et al., 2019) open repository. The organization and consistency of the data, the standards adopted, and how the data can be accessed and used are also reported. The dataset is maintained by the non-profit organization Reef Check Italia Onlus (RCI).

METHODS
Abundance data for target species were collected according to the RCMed U-CEM protocol developed by RCI for a Citizen Science (CS) initiative that aims to monitor the ecological status of Mediterranean marine coastal habitats. For this protocol, 43 taxa were selected based on two or more criteria, including ease of identification, inclusion in international lists of protected species, sensitivity to human impacts, and being key indicators of the shifts that Mediterranean coastal habitats can undergo under local pressures and climate change. Morphologically and ecologically similar species have been included at the genus or higher taxonomic level (Cerrano et al., 2017). Before going diving or snorkeling, each trained EcoDiver chooses one or more taxa among the 43 included in the protocol (Table 1) to actively search for, according to the habitat type, survey depth, and personal interests. EcoDivers make independent observations along random swims (as defined in Hill and Wilkinson, 2004) and upload their records to the online database using the dedicated smartphone app or the Internet form. Taxa that were actively searched for but not encountered are reported as absent. No data are provided for taxa that were not searched for. New data are made publicly available following quality assurance and control (QA/QC) procedures. Data that do not meet the standards of the QA/QC procedures are discarded. The detailed monitoring protocol and the methodology used to collect and record the data, including species selection, participant training, and QA/QC procedures, are described in Turicchia et al. (2021b). EcoDiver personal data are managed in accordance with the European General Data Protection Regulation (GDPR), which allows the collected data to be shared on the EcoDivers' behalf but leaves each EcoDiver responsible for the quality of the data they provide.
No ethical approval was obtained regarding plants and animals because the protocol involves only visual observations in the wild, with no collection or manipulation of organisms.
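The recording rule described above (each searched taxon yields either a presence or an absence, and unsearched taxa yield no record) can be sketched as follows. This is a minimal illustration, not the RCMed data-entry code: the function name `build_records` is hypothetical, and the two taxa are example values chosen for illustration rather than records taken from the dataset.

```python
def build_records(searched, found):
    """Return one occurrence record per actively searched taxon.

    searched: taxa the EcoDiver chose to look for before the survey
    found:    taxa actually encountered during the survey
    Taxa that were not searched for produce no record at all.
    """
    return [
        {
            "scientificName": taxon,
            "occurrenceStatus": "present" if taxon in found else "absent",
        }
        for taxon in sorted(searched)
    ]


# Hypothetical survey: two taxa searched for, one of them encountered.
records = build_records(
    searched={"Corallium rubrum", "Pinna nobilis"},
    found={"Pinna nobilis"},
)
```

Because absences are recorded only for taxa that were actively searched for, a missing record should be read as "no information", not as an absence.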

DATASET STRUCTURE
The RCMed_2001-2020 dataset is fully compliant with the EMODnet Biology standards (Martín Míguez et al., 2019). The taxonomic guideline used is based on the World Register of Marine Species (WoRMS; Vandepitte et al., 2015), the authoritative and comprehensive global list of marine organisms' names. Biotic and abiotic measurements are reported using the controlled thesaurus from the Natural Environment Research Council (NERC; http://vocab.nerc.ac.uk) Vocabulary Server maintained by the British Oceanographic Data Centre (BODC), and the Darwin Core Archive (DwC-A), an internationally recognized, standardized biodiversity informatics data format intended to facilitate information sharing on biological diversity. This ensures interoperability and maximizes reusability by providing a core standard (Wieczorek et al., 2012).
¹ www.reefcheckmed.org
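Following the EMODnet Biology standards, data are organized in three tables: the DwC Event Core table stores information on the survey events, the DwC Occurrence extension table stores details on species occurrences, and the DwC Measurement or Facts extension (eMoF) table stores additional quantitative information on occurrences and events.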

DwC Event Core Table
Individual survey events correspond to single dives or swims, carried out independently by single EcoDivers inspecting the seabed at a specific time and place to collect data on single or multiple species. Several EcoDivers can investigate the same place simultaneously, but each provides an independent survey event. Each survey has a unique ID (eventID, including a progressive number automatically assigned at the time of data entry) and is characterized by: the survey date (eventDate, in the format YYYY-MM-DD, conforming to ISO 8601-1:2019); geographical coordinates in decimal degrees (decimalLatitude, decimalLongitude), consistently based on the same geodetic datum (geodeticDatum = WGS84; i.e., EPSG:4326); and a coordinate accuracy (coordinateUncertaintyInMeters) of 200 m, as specified by the adopted protocol. Minimum (minimumDepthInMeters) and maximum (maximumDepthInMeters) depths represent the bathymetric range of the survey and are expressed in meters.
The verbatimLocality field contains textual information on the survey site (i.e., the site's local name and municipality).
The prevailing habitat surveyed is identified according to the following categories (when available, the corresponding European Nature Information System marine habitat classification, EUNIS v2019², is shown in parentheses):
All records also report the codified name of the institution providing the data (institutionCode = RCI), the name of the dataset (datasetName = Reef Check Med - key Mediterranean marine species 2001-2020), and the protocol name (samplingProtocol = RCMed U-CEM protocol).
² Permalink to this version: https://www.eea.europa.eu/ds_resolveuid/6d0484fd0078483ca73bec230574b34e
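The field constraints listed for the Event Core table lend themselves to simple consistency checks when reusing the data. The sketch below is an illustration under stated assumptions, not part of the RCMed QA/QC procedures: the function name `validate_event`, the example eventID, and all example values are hypothetical, and only the field names come from the description above.

```python
import re
from datetime import date


def validate_event(event):
    """Return a list of problems found in one Event Core record (empty if none)."""
    problems = []
    event_date = event["eventDate"]
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", event_date):
        problems.append("eventDate is not in YYYY-MM-DD format")
    else:
        try:
            date.fromisoformat(event_date)  # rejects dates such as 2019-02-30
        except ValueError:
            problems.append("eventDate is not a real calendar date")
    if not -90 <= event["decimalLatitude"] <= 90:
        problems.append("decimalLatitude out of range")
    if not -180 <= event["decimalLongitude"] <= 180:
        problems.append("decimalLongitude out of range")
    if event["geodeticDatum"] != "WGS84":
        problems.append("geodeticDatum is not WGS84")
    if event["minimumDepthInMeters"] > event["maximumDepthInMeters"]:
        problems.append("minimum depth exceeds maximum depth")
    return problems


# Hypothetical record; the identifier and values are invented for illustration.
good_event = {
    "eventID": "example-event-1",
    "eventDate": "2019-07-14",
    "decimalLatitude": 44.3,
    "decimalLongitude": 9.2,
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": 200,
    "minimumDepthInMeters": 5,
    "maximumDepthInMeters": 25,
}
problems = validate_event(good_event)
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a record at once.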

DwC Occurrence Extension Table
The DwC Occurrence extension table stores details on species occurrence linked to the individual survey events (eventID). Each record has a unique numeric identifier (occurrenceID), assigned in post-processing after the QA/QC procedures, and relates to a single taxon that was searched for during the survey. Taxa are identified by their scientific name at the lowest possible taxonomic level (scientificName), with the indication of multiple species (spp.) belonging to the same genus when appropriate (identificationQualifier), and by the corresponding Life Science Identifier (LSID), a consistent, globally unique identifier based on the AphiaID (Vandepitte et al., 2015) from the World Register of Marine Species (stored in the field scientificNameID).
Each record reports whether the species searched for during the survey was found or not (occurrenceStatus = present or absent). As explicitly indicated, all records are based on an onsite visual census (basisOfRecord = HumanObservation) carried out by an EcoDiver identified by name and unique certification number (identifiedBy).
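For readers who need the numeric AphiaID, it can be recovered from the LSID stored in scientificNameID. The sketch below assumes the usual WoRMS form `urn:lsid:marinespecies.org:taxname:<AphiaID>`. The helper names are hypothetical and the number used in the example is a placeholder, not a real taxon lookup.

```python
import re

# WoRMS LSIDs end with the numeric AphiaID of the taxon.
LSID_PATTERN = re.compile(r"urn:lsid:marinespecies\.org:taxname:(\d+)")


def aphia_id_from_lsid(lsid):
    """Extract the numeric AphiaID from a WoRMS LSID string."""
    match = LSID_PATTERN.fullmatch(lsid)
    if match is None:
        raise ValueError(f"not a WoRMS taxon LSID: {lsid!r}")
    return int(match.group(1))


def lsid_from_aphia_id(aphia_id):
    """Build the WoRMS LSID string for a numeric AphiaID."""
    return f"urn:lsid:marinespecies.org:taxname:{aphia_id}"


# Placeholder identifier used only to show the round trip.
example_lsid = lsid_from_aphia_id(123456)
example_id = aphia_id_from_lsid(example_lsid)
```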

DwC Measurement or Facts Extension (eMoF) Table
The DwC eMoF table contains additional quantitative information on species occurrences and events. Records are linked to every single occurrence (occurrenceID) and to the individual survey events (eventID) to which they belong. Four types of measurement (measurementType) are stored:
• "Abundance category of a biological entity specified elsewhere" for each occurrence;
• "Depth minimum of biological entity specified elsewhere on the bed by epibenthic sampling" (in meters) for each occurrence;
• "Depth maximum of biological entity specified elsewhere on the bed by epibenthic sampling" (in meters) for each occurrence;
• "Sample duration" (in minutes) for each survey event.
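The links between the three tables can be illustrated with a small in-memory join. This is a sketch under stated assumptions, not a description of the archive files themselves: all rows and identifiers are invented, the `measurementValue` column name follows the usual Darwin Core convention, and event-level measurements are assumed to carry an empty occurrenceID.

```python
from collections import defaultdict

# Hypothetical rows standing in for the three tables.
events = [{"eventID": "E1", "eventDate": "2019-07-14"}]
occurrences = [
    {"occurrenceID": "1", "eventID": "E1",
     "scientificName": "Pinna nobilis", "occurrenceStatus": "present"},
    {"occurrenceID": "2", "eventID": "E1",
     "scientificName": "Corallium rubrum", "occurrenceStatus": "absent"},
]
emof = [
    {"eventID": "E1", "occurrenceID": "",
     "measurementType": "Sample duration", "measurementValue": "45"},
    {"eventID": "E1", "occurrenceID": "1",
     "measurementType": "Abundance category of a biological entity specified elsewhere",
     "measurementValue": "2"},
]


def attach_measurements(events, occurrences, emof):
    """Join each occurrence with its event and the measurements that refer to them."""
    event_facts = defaultdict(dict)
    occurrence_facts = defaultdict(dict)
    for row in emof:
        # Rows with an occurrenceID describe an occurrence; the rest describe the event.
        if row["occurrenceID"]:
            target = occurrence_facts[row["occurrenceID"]]
        else:
            target = event_facts[row["eventID"]]
        target[row["measurementType"]] = row["measurementValue"]

    events_by_id = {event["eventID"]: event for event in events}
    joined = []
    for occurrence in occurrences:
        record = {**events_by_id[occurrence["eventID"]], **occurrence}
        record["eventMeasurements"] = event_facts.get(occurrence["eventID"], {})
        record["occurrenceMeasurements"] = occurrence_facts.get(occurrence["occurrenceID"], {})
        joined.append(record)
    return joined


joined = attach_measurements(events, occurrences, emof)
```

Keeping event-level and occurrence-level measurements in separate dictionaries avoids repeating the survey duration as if it were a property of each taxon.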

DATA SEARCH, UPDATES, AND USE
The RCMed_2001-2020 dataset is distributed under the Creative Commons Attribution 4.0 International license (CC BY 4.0), which guarantees transparency on the origin of the data and allows free sharing and adaptation, provided that appropriate credit is given to the Reef Check Mediterranean network. It can be accessed directly from the EMODnet Biology Portal³, which offers different services, including the data catalog, a data download toolbox with a step-wise filtering approach, a map viewer, the atlas of marine life data, and a web feature service (WFS) compliant with the Open Geospatial Consortium (OGC) standards for direct integration in geographic information systems (Martín Míguez et al., 2019). Thanks to the interoperability of the network (Tanhua et al., 2019), the dataset is redistributed through the Ocean Biodiversity Information System (OBIS) networks (including EurOBIS and MedOBIS; Costello and Vanden Berghe, 2006 and references therein), the European infrastructure on biodiversity and ecosystem research (LifeWatch; Basset and Los, 2012), and the Global Biodiversity Information Facility (GBIF; Flemons et al., 2007). Periodic submissions of newly acquired data to EMODnet are expected.

DATASET CONTENTS AND APPLICATIONS
The RCMed_2001-2020 dataset consists of 50,255 observations unevenly distributed among 43 key taxa in the Mediterranean Sea, recorded in 4,898 individual survey events carried out by 692 EcoDivers from 2001 to 2020. The data come from Croatia, France, Greece, Italy, Spain, and Tunisia, covering parts of the following ecoregions (sensu Spalding et al., 2007): Western Mediterranean (52.3% of the surveys), Adriatic Sea (42.2%), Ionian Sea (4.9%), Alboran Sea (0.2%), Aegean Sea (0.2%), and Tunisian Plateau/Gulf of Sidra (0.1%; Figure 1A). After an initial period of protocol development in the Adriatic Sea (2001-2003, originally called "Adriatic Underwater Watching Project") with 200-300 surveys carried out per year, the number of surveys decreased in the following two years. Since then, the number of surveys per year has varied from 150 to 600, with the minimum value in 2020, likely related to the COVID-19 pandemic lockdown (Figure 1B). While ~97% of observations took place in the recreational diving depth range (0-40 m), the maximum depth reached during surveys was 95 m (Figure 1C). The spatial and temporal distribution of the data is affected by the volunteers' willingness, habits, and preferences in applying the RCMed U-CEM protocol. Such spatial and temporal biases are recognized as major issues in CS projects and biodiversity databases and remain intrinsically unavoidable for this and most other CS initiatives (Beck et al., 2014). The United Nations Decade of Ocean Science for Sustainable Development (2021-2030) calls for an urgent improvement in marine conservation actions worldwide. Similarly, the EU Biodiversity Strategy for 2030 includes among its main tasks an enhanced focus on Natura 2000 species and habitats and a Nature Restoration Plan for degraded ecosystems across the EU, addressing the key drivers of biodiversity loss.
³ https://www.emodnet-biology.eu/data-catalog?module=dataset&dasid=6454
Without a detailed census and mapping of the distribution and abundance of target species, it is impossible to address these objectives effectively. Marine Citizen Science is a promising and powerful tool to enhance engagement in marine conservation worldwide and to increase ocean observation capability, ensuring long-term monitoring whenever appropriate protocols are applied. In this regard, applications of the RCMed_2001-2020 dataset range from monitoring the ecological status of Mediterranean coastal environments and assessing the effects of human impacts and management interventions (Turicchia et al., 2021a) to raising public awareness and involving people in marine conservation (Lucrezi et al., 2018 and references therein). Moreover, the dataset has been used in scientific papers on species distribution and abundance, distribution modeling, and the comparison of historical data series (Cerrano et al., 2017; Ponti et al., 2018; Turicchia et al., 2018). A list of applications and publications obtained by applying the protocol and using these data is kept up to date on the Reef Check Med website, and authors are encouraged to report their outcomes.
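As one simple illustration of such use, the frequency of occurrence of a taxon can be estimated as the share of its records marked as present, which is meaningful because absences are recorded only when the taxon was actively searched for. The sketch below is an assumption-laden example and not the analysis used in the cited papers: the function name and the input rows are hypothetical, and it ignores the spatial and temporal biases discussed above.

```python
from collections import Counter


def frequency_of_occurrence(occurrences):
    """Return, for each taxon, the fraction of its records with status 'present'."""
    searched = Counter()
    present = Counter()
    for record in occurrences:
        taxon = record["scientificName"]
        searched[taxon] += 1
        if record["occurrenceStatus"] == "present":
            present[taxon] += 1
    return {taxon: present[taxon] / searched[taxon] for taxon in searched}


# Hypothetical occurrence rows, invented for illustration.
sample = [
    {"scientificName": "Pinna nobilis", "occurrenceStatus": "present"},
    {"scientificName": "Pinna nobilis", "occurrenceStatus": "absent"},
    {"scientificName": "Corallium rubrum", "occurrenceStatus": "absent"},
]
frequencies = frequency_of_occurrence(sample)
```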

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: EMODnet Biology data portal, https://doi.org/10.14284/468.

ACKNOWLEDGMENTS
We thank all the EcoDivers and their trainers, who provided and continue to provide new data. The following MPAs supported the training of EcoDivers and promoted data collection: Cabo de Palos, Capo Gallo - Isola delle Femmine, Cinque Terre, Isola di Ustica, Isole Egadi, Isole Tremiti, Miramare, Porto Cesareo, Portofino, and Tavolara - Capo Coda Cavallo. We thank Leen Vandepitte, Joana Beja, Gizem Poffyn, and Ruben Perez from the Flanders Marine Institute (VLIZ) for their assistance in making the dataset compliant with the EMODnet Biology standards. We thank the editor and reviewers for their valuable suggestions for improving the report. This study is part of ET's Ph.D. thesis.