Reliability of Data Collected by Volunteers: A Nine-Year Citizen Science Study in the Red Sea

The quality of data collected by non-professional volunteers in citizen science programs is crucial to their validity for implementing environmental resource management and protection plans. This study assessed the reliability of data collected by non-professional volunteers during the citizen science project Scuba Tourism for the Environment (STE), carried out in mass tourism facilities of the Red Sea between 2007 and 2015. STE involved 16,164 volunteer recreational divers in data collection on marine biodiversity using a recreational citizen science approach. Through a specifically designed questionnaire, volunteers indicated which of the seventy-two marine taxa surveyed they had observed during their recreational dive and gave an estimate of each taxon's abundance. To evaluate the validity of the collected data, a reference researcher randomly dived with the volunteers and filled in the project questionnaire separately. Correlation analyses between the records collected by the reference researcher and those collected by volunteers were performed based on 513 validation trials, testing 3,138 volunteers. Data reliability was analyzed through seven parameters. Consistency showed the lowest mean score (51.6%, 95% confidence interval [CI] 44.1–59.2%), indicating that volunteers could direct their attention to different taxa depending on personal interests; Percent Identified showed the highest mean score (66.7%, 95% CI 55.5–78.0%), indicating that volunteers can correctly identify most surveyed taxa. Overall, results confirmed that the recreational citizen science approach can effectively support reliable data for biodiversity monitoring when carefully tailored to the volunteer skills required by the specific project. The use of a recreational approach enhances massive volunteer participation in citizen science projects, thus increasing the amount of sufficiently reliable data collected in a reduced time.


INTRODUCTION
Institutions and natural resource managers often face funding restrictions, which are at odds with the need to collect the fundamental data required to implement conservation strategies (Lewis, 1999; Foster-Smith and Evans, 2003; Jetz et al., 2012; Forrester et al., 2015; McKinley et al., 2017). Effective conservation strategies must also integrate public input and engagement in designing solutions (McKinley et al., 2017). Involving volunteers in data collection for monitoring activities can be a cost-effective strategy to complement or replace the information collected by professionals (Starr et al., 2014). Citizen science projects can improve the environmental education of volunteers, increase scientific knowledge and allow the collection of large datasets (Foster-Smith and Evans, 2003; Bonney et al., 2009; Sullivan et al., 2009; Jordan et al., 2011; Branchini et al., 2015b; Callaghan et al., 2019). Participating in a citizen science project can have an educational role in both the short and the long term, with the acquired environmental awareness retained years later (Branchini et al., 2015a; Meschini et al., 2021).
Observations of the natural world, including weather information, plant and animal distributions, astronomical phenomena and many other data, have been recorded by citizens for decades (Miller-Rushing et al., 2012; Bonney et al., 2014). One emblematic example comes from ornithology: the Audubon Society's annual Christmas bird counts started in 1900 and still engage 60,000-80,000 volunteers annually (Forrester et al., 2015). Nowadays, millions of volunteers participate in scientific research projects by collecting, categorizing, transcribing and analyzing data (Dickinson et al., 2012; Callaghan et al., 2019). Ultimately, citizen science has enormous potential to influence policy and guide resource management by producing datasets that would otherwise be unobtainable.
Citizen science is blooming across a range of disciplines in the natural and social sciences, as well as the humanities (Lukyanenko et al., 2019). A large body of environmental research is based on citizen science (e.g., biology, conservation and ecology); however, the development of information and communication technologies (ICT) has expanded the scale and scope of data collection from geographic information research (e.g., projects for geographic data collection) to social sciences and epidemiology studies (e.g., projects that study the relationship between environmental issues and human health) (Kullenberg and Kasperowski, 2016; Hecker et al., 2018). Citizen science is becoming of central importance to reinforce literacy and societal trust in science and to foster participatory and transparent decision-making¹. It is also gaining increasing interest among policy makers, government officials and non-governmental organizations (Turbé et al., 2019). Data collected through citizen science are a non-traditional data source that contributes to measuring the United Nations (UN) Sustainable Development Goals (Fritz et al., 2019). The role of citizens is also becoming central in European Union (EU) policies, such as the Horizon 2020 funding program². The next European Research and Innovation Program, Horizon Europe, includes a specific mission supporting this process by connecting citizens with science and public policy³. In the Mission Starfish 2030 program, citizens are protagonists of one of the five overarching objectives for 2030, and one goal of this program for the 2025 checkpoint is that 20% of data collection comes from citizen science initiatives⁴. These are some examples of the increasing importance that citizen science is gaining in European funding programs, where it will be a topic that cuts across all missions.
¹ https://cordis.europa.eu/programme/id/H2020_IBA-SWAFS-Citizen-2019
Citizen science projects vary extensively in subject matter, objectives, activities, and scale, but the common goal is collecting reliable data to be used for scientific and policy making purposes for implementing environmental management and protection plans (Forrester et al., 2015; Van der Velde et al., 2017). Volunteers involved in citizen science projects can produce data with sufficient to high accuracy (Foster-Smith and Evans, 2003; Goffredo et al., 2010; Kosmala et al., 2016), although some cases of insufficient volunteer data quality have been reported (Foster-Smith and Evans, 2003; Galloway et al., 2006; Delaney et al., 2008; Silvertown, 2009; Hunter et al., 2013).
Data collection in citizen science projects usually addresses easy-to-recognize organisms, with an interest in qualitative and semi-quantitative data that can be useful for management plans (Bramanti et al., 2011). Data collection in the marine environment is particularly challenging because it requires swimming or scuba diving skills in addition to the usual sampling difficulties (Goffredo et al., 2004, 2010; Gillett et al., 2012; Forrester et al., 2015). Citizen science in the marine environment can be used to monitor shallow water organisms (up to 40 meters depth, the Professional Association of Diving Instructors (PADI) limit for recreational scuba skills) over a large geographical and temporal extent (Goffredo et al., 2010; Bramanti et al., 2011; Gommerman and Monroe, 2012). Several studies analyzed the correlation between data collected by professionals and volunteers on a single taxonomic group, such as fishes (Darwall and Dulvy, 1996; Holt et al., 2013), including sharks (Ward-Paige and Lotze, 2011), or corals (Bramanti et al., 2011; Marshall et al., 2012; Forrester et al., 2015), showing that volunteers were able to collect good quality data that could be used to complement professional data and describe population trends at spatial and temporal scales.
The aim of this study was to replicate the standardized methodology used in Goffredo et al. (2010) and Branchini et al. (2015b) to assess the quality of data collected by non-specialist volunteers on seventy-two Red Sea taxa during the recreational citizen science project Scuba Tourism for the Environment (STE). Those previous studies were based on 38 and 61 validation trials, respectively; in this study we analyzed 513 validation trials, mainly performed in Egypt between 2007 and 2015. Our study used a recreational survey protocol based on casual diver observations. This protocol allowed divers to carry out their normal recreational activities and ensured the reliability of collected data through standardized data collection (Branchini et al., 2015b). To evaluate the possible influence of independent variables (date, team size, diving certification level, depth and dive time) on volunteers' data quality, we used Spearman rank correlation analyses and distance-based redundancy linear modeling (DISTLM) to test the contributions of the independent variables to data variability.

MATERIALS AND METHODS
From 2007 to 2015, 16,164 recreational scuba divers in mass tourism facilities and diving centers in the Red Sea were involved in the citizen science project Scuba Tourism for the Environment (STE). The project goal was to monitor coral reef biodiversity in the Red Sea, using specifically developed illustrated questionnaires. The first section of the questionnaire was dedicated to volunteer environmental education, to limit human impact on the reef and increase volunteer awareness of the vulnerability of coral reefs (Supplementary Figure 1). The second section of the questionnaire consisted of seventy-two photographs of target taxa, chosen because they are: (i) representative of the main ecosystem trophic levels, (ii) expected to be common and abundant in the Red Sea, and (iii) easily recognizable by non-specialist volunteers (Supplementary Figure 2). These characteristics were selected to increase the accuracy of data collected by volunteers (Goffredo et al., 2004, 2010). The third section of the questionnaire was dedicated to the collection of personal information (i.e., name, address, email, level of diving certification and diving agency), technical information about the dive (i.e., place, date, depth, dive time, duration of the dive), type of habitat explored (i.e., rocky bottom, sandy bottom or other habitat) and the data collection table about sighted taxa with an estimation of their abundance (Supplementary Figure 3). The abundance estimation of each taxon was based on literature (Wielgus et al., 2004) and databases⁵, and expressed in the three categories "rare," "frequent" or "abundant." Completing questionnaires shortly after the dive facilitated the quality control of collected data.
FIGURE 1 | Red Sea map with black dots indicating sites in which data for the reliability analysis were collected.
Frontiers in Ecology and Evolution | www.frontiersin.org
The STE project used a recreational citizen science approach (Goffredo et al., 2004, 2010; Branchini et al., 2015b) in which normal recreational diving features and volunteer behavior are not modified by project participation. Researchers of the STE project performed an annual training session for scuba instructors of the diving centers involved in the project, based on the methodology used for the study and obtained results. This allowed scuba instructors to directly involve their clients in data collection. The STE project received the approval of the Bioethics Committee of the University of Bologna (prot. 2.6). Data were treated confidentially, exclusively for institutional purposes (art. 4 of Italian legislation D.R. 271/2009 - single text on privacy and the use of IT systems). Data treatment and reporting took place in aggregate form.

Data Validity Assessment
To assess the validity of data collected by volunteers, the records of 3,138 volunteers were compared with those collected by a marine biologist of the Marine Science Group of the University of Bologna ("control diver") during 513 validation trials mainly performed in Egypt (Figure 1). The characteristics of the validation trials were: (1) the control diver dived with at least three volunteers; (2) the validation trial did not affect the diving center's normal choice of dive site; (3) the dive was conducted between 9.00 am and 4.00 pm; (4) after the dive, the control diver filled in the questionnaire apart from volunteers, so as to avoid interference with volunteers' data recording (Goffredo et al., 2010). For each trial, the inventory of taxa (with abundance ratings) sighted by the control diver was correlated with that collected by each volunteer to verify their similarity (Darwall and Dulvy, 1996; Foster-Smith and Evans, 2003; Aceves-Bueno et al., 2017). To measure the quality of volunteer data, seven reliability parameters were used: Accuracy, Consistency, Percent Identified, Correct Identification, Correctness of Abundance Ratings, Similarity, and Reliability (Table 1). Non-parametric statistical tests were used for the analysis: (1) Spearman rank correlation coefficient, to evaluate the accuracy of data collected by volunteers in comparison to those obtained by the control diver; (2) Cronbach's alpha (α) correlation, to evaluate the reliability of collected data between each volunteer and the control diver; and (3) Czekanowski proportional similarity index (SI), to obtain a measure of similarity between each volunteer's ratings and those of the control diver (Goffredo et al., 2010). Test results were reported as means with 95% Confidence Interval (CI) (Sale and Douglas, 1981; Darwall and Dulvy, 1996).
⁵ http://www.gbif.org; http://www.marinespecies.org
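As an illustration of the three statistics named above, the following is a minimal Python sketch, not the SPSS procedure used in the study. It assumes that each questionnaire is encoded as a list of integers, one per taxon (0 = not recorded, 1 = rare, 2 = frequent, 3 = abundant); this encoding, the function names, and the choice of the proportional form of the Czekanowski index (the sum of the smaller of the two relative abundances for each taxon) are assumptions made for the example.

```python
from statistics import mean, variance


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(control, volunteer):
    """Spearman rank correlation: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(control), average_ranks(volunteer))


def czekanowski_si(control, volunteer):
    """Proportional similarity: sum over taxa of min(p_i, q_i), where p and q
    are each diver's ratings rescaled to sum to 1 (assumed form of the index)."""
    tc, tv = sum(control), sum(volunteer)
    if tc == 0 or tv == 0:
        return 0.0  # one diver recorded nothing; treated here as no overlap
    return sum(min(c / tc, v / tv) for c, v in zip(control, volunteer))


def cronbach_alpha(control, volunteer):
    """Cronbach's alpha with the two divers as 'items' and taxa as cases.
    Undefined if the summed ratings have zero variance."""
    k = 2
    totals = [c + v for c, v in zip(control, volunteer)]
    item_var = variance(control) + variance(volunteer)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Multiplying each coefficient by 100 gives the percentage form in which the parameters are reported in the text.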
For the Similarity and Reliability parameters, the lower bound of the 95% Confidence Interval (CI) of the mean values was used (Goffredo et al., 2010). We also examined the effect of date, team size (the number of participants present in each validation trial), diving certification level of each participant, depth and dive time on volunteer accuracy using Spearman's rank correlation coefficient. All these statistical analyses were computed using the SPSS 22.0 statistical software. Using PRIMER v6, distance-based redundancy linear modeling (DISTLM) with a test of marginality was also performed, based on Euclidean distance, to test the contributions of variables to data variability.
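The lower-bound convention described above can be sketched as follows. The text does not state how the confidence interval was computed, so this example assumes a normal approximation (mean ± z × standard error); the function name `mean_ci` and the sample scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_ci(scores, level=0.95):
    """Return (mean, lower bound, upper bound) of a confidence interval
    for the mean, using a normal approximation (an assumption here)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    m = mean(scores)
    half_width = z * stdev(scores) / len(scores) ** 0.5
    return m, m - half_width, m + half_width


# Hypothetical similarity scores (in percent) of the volunteers in one trial.
trial_scores = [48.0, 55.0, 61.0, 52.0]
trial_mean, lower, upper = mean_ci(trial_scores)
# 'lower' is the value that would be compared against the thresholds.
```

For small teams a Student t quantile would give a wider interval than the normal approximation used in this sketch.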

RESULTS
The mean accuracy of each validation trial ranged from 38.2 to 81.5%, with 94.2% of trials with mean accuracy between 40 and 70% (Supplementary Table 1; Figure 2).

TABLE 1 | Reliability parameters used to analyze data collected by volunteers (modified from Goffredo et al., 2010).
Parameter | Definition and derivation of parameter
Accuracy | Similarity of volunteer-generated data to reference values from a control diver, measured as Spearman rank correlation coefficient (rho) and expressed as a percentage in the text. This measure of accuracy is assumed to encompass all component sources of error.
Consistency | Similarity of data collected by separate volunteers during the same dive. This was measured as rank correlation coefficient and expressed as a percentage in the text. This measure of consistency is assumed to encompass all component sources of error.
Percent identified | The percentage of the total number of taxa present that were recorded by the volunteer diver. The total number of taxa present was derived from the control diver data (i.e., we assumed the taxa recorded by the control diver to be all the taxa present).
Correct identification | The percentage of volunteers that correctly identified individual taxa when the taxon was present.
Correctness of abundance ratings (CAR) | This analysis quantified the correctness of the abundance ratings made by the volunteer. It has been expressed as the percentage of the 72 surveyed taxa whose abundance has been correctly rated by the volunteer (i.e., the value of the rating indicated by the volunteer was equal to the reference value recorded by the control diver).
Similarity index | Measure of similarity between each volunteer and the control diver ratings, using Czekanowski proportional similarity index.
Reliability | Measure of reliability between each volunteer and the control diver ratings, using Cronbach alpha (α) correlation.
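The three percentage-based parameters defined in Table 1 can be sketched in Python as follows. This is an illustrative reading of the definitions, not the authors' code: questionnaires are assumed to be lists of integers, one per taxon (0 = not recorded, 1 = rare, 2 = frequent, 3 = abundant), and an unrecorded taxon is treated as a rating of 0 when comparing abundance ratings. Function names are hypothetical.

```python
def percent_identified(control, volunteer):
    """Percentage of the taxa recorded by the control diver (assumed to be
    all taxa present) that the volunteer also recorded."""
    present = [i for i, rating in enumerate(control) if rating > 0]
    if not present:
        return None  # nothing was present, so the percentage is undefined
    hits = sum(1 for i in present if volunteer[i] > 0)
    return 100 * hits / len(present)


def correctness_of_abundance_ratings(control, volunteer):
    """Percentage of surveyed taxa whose rating equals the control rating
    (agreement on 'not recorded' counts as correct under this encoding)."""
    matches = sum(1 for c, v in zip(control, volunteer) if c == v)
    return 100 * matches / len(control)


def correct_identification(control, volunteers, taxon):
    """Percentage of volunteers in a trial who recorded a given taxon,
    evaluated only when the control diver recorded it as present."""
    if control[taxon] == 0:
        return None
    recorded = sum(1 for v in volunteers if v[taxon] > 0)
    return 100 * recorded / len(volunteers)


# Toy example with four taxa instead of the 72 surveyed ones.
control = [3, 2, 0, 1]
team = [[3, 1, 0, 0], [0, 0, 0, 1]]
pi_first = percent_identified(control, team[0])
car_first = correctness_of_abundance_ratings(control, team[0])
ci_taxon_1 = correct_identification(control, team, 1)
```

In the study the same calculations would run over all 72 taxa and be averaged per validation trial or per taxon.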
The mean correct identification of each taxon varied from 3.8 to 94.7%, with a positive correlation between the number of validation trials in which the taxon was present and the level of correct identification performed by volunteers (ρs = 0.610, N = 77, p < 0.001), and a score increase of 21.5% between the least and the most frequently present taxa (Table 4; Figure 5).
The mean lower bound of the Czekanowski proportional similarity index (SI) of each validation trial ranged from 27.3 to 78.8%, with 91.2% of trials with mean SI between […] (Supplementary Table 1; Figure 2). A total of 194 trials (37.8%) performed with levels of precision below the sufficiency threshold (SI, 95% CI lower bound ≤ 50%); 317 trials (61.8%) scored a sufficient level of precision (SI, 95% CI lower bound > 50% and ≤ 75%); and 2 trials (0.4%) scored high levels of precision (SI, 95% CI lower bound > 75% and ≤ 100%). SI was positively correlated with team size (ρs = 0.107, N = 513, p < 0.05; Table 2; Figure 3), with volunteer scores increasing by 8.7% between small and big groups (Table 3); with volunteer diving certification level (ρs = 0.253, N = 513, p < 0.001; Table 2; Figure 4), with scores increasing by 21.2% between beginners and professional divers (Table 3); and with dive time (ρs = 0.186, N = 513, p < 0.001; Table 2; Figure 4), with scores increasing by 21.4% between short and long dives (Table 3). SI was not correlated with date (ρs = 0.032, N = 513, p = 0.465; Table 2) or depth (ρs = −0.004, N = 513, p = 0.924; Table 2).
The mean lower bound reliability (α) of each validation trial ranged from 38.9 to 88.4%, with 93.4% of trials with […] (Supplementary Table 1; Figure 2). Only 23 trials (4.5%) performed with an insufficient level of reliability (α, 95% CI lower bound ≤ 50%); 160 trials (31.2%) scored an acceptable relationship with the control diver census (α, 95% CI lower bound > 50% and ≤ 60%); 238 trials (46.4%) scored an effective reliability level (α, 95% CI lower bound > 60% and ≤ 70%); and 92 trials (17.9%) scored definitive to very high levels of reliability (α, 95% CI lower bound > 70% and ≤ 100%). Reliability was positively correlated with team size (ρs = 0.212, N = 513, p < 0.001; Table 2; Figure 3), with volunteer scores increasing by 12.4% between small and big groups (Table 3); with volunteer diving certification level (ρs = 0.200, N = 513, p < 0.001; Figure 4), with scores increasing by 11.1% between beginners and professional divers (Table 3); and with dive time (ρs = 0.145, N = 513, p < 0.001; Table 2; Figure 4), with scores increasing by 11.0% between short and long dives (Table 3). Reliability was not correlated with date (ρs = 0.029, N = 513, p = 0.515) or depth (ρs = −0.024, N = 513, p = 0.591) (Table 2).
FIGURE 3 | Significant correlations between reliability parameters (Accuracy, CAR, Reliability, and Similarity Index) and independent variables (Date and Team Size). Results based on the 513 validation trials. The trendline of each correlation is shown in red. ρs = Spearman correlation coefficient.
FIGURE 4 | Significant correlations between the studied reliability parameters (Accuracy, Consistency, Percent Identified, Similarity Index, and Reliability) and the independent variables Diving certification level and Dive time. Results based on the 513 validation trials. The trendline of each correlation is shown in red. ρs = Spearman coefficient value.
N is the number of trials in which the taxon was present (based on control diver sightings).
FIGURE 5 | Significant correlation between the percentage of correct identification performed by volunteers (expressed as mean percentage for each taxon) and the number of trials in which each taxon was present (based on control diver sightings). Based on the 72 studied taxa, litter presence and sightings of damaged corals (see Table 3). The trendline of the correlation is shown in red. N = number of analyzed organisms; ρs = Spearman coefficient value.
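The threshold bands reported for SI and α can be expressed as a small classifier, and the reported percentages can be rechecked from the trial counts. The sketch below is illustrative only; the function names are hypothetical, while the band limits, the counts and the total of 513 trials are taken from the text.

```python
def classify_similarity(lower_bound):
    """Band for an SI 95% CI lower bound, given in percent."""
    if lower_bound <= 50:
        return "insufficient"
    if lower_bound <= 75:
        return "sufficient"
    return "high"


def classify_reliability(lower_bound):
    """Band for a Cronbach alpha 95% CI lower bound, given in percent."""
    if lower_bound <= 50:
        return "insufficient"
    if lower_bound <= 60:
        return "acceptable"
    if lower_bound <= 70:
        return "effective"
    return "definitive to very high"


def share(count, total=513):
    """Percentage of validation trials, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Trial counts per band as reported in the text.
si_counts = {"insufficient": 194, "sufficient": 317, "high": 2}
alpha_counts = {
    "insufficient": 23,
    "acceptable": 160,
    "effective": 238,
    "definitive to very high": 92,
}
si_shares = {band: share(n) for band, n in si_counts.items()}
alpha_shares = {band: share(n) for band, n in alpha_counts.items()}
```

Both sets of counts sum to 513, and the recomputed shares match the percentages given in the text.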
Distance-based redundancy linear modeling analysis showed that the two variables "diving certification level" and "dive time" together explained about 82.7% of data variability, while the variable "team size" explained 13% of variability (Table 5; Figure 6).
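To give a sense of what a marginal test quantifies, the sketch below computes the proportion of total variation in a set of response variables that a single predictor explains on its own. It relies on the assumption that, with Euclidean distance, a distance-based linear model reduces to ordinary least-squares regression of the centered responses on the predictor. It is not the PRIMER v6 routine used in the study, it omits the permutation test that would provide p-values, and the function name and toy data are hypothetical.

```python
def marginal_r2(predictor, responses):
    """Proportion of the total sum of squares of the response variables
    explained by one predictor fitted alone (one row per validation trial).
    Undefined if the predictor or all responses are constant."""
    n = len(predictor)
    mx = sum(predictor) / n
    sxx = sum((x - mx) ** 2 for x in predictor)
    explained = 0.0
    total = 0.0
    for column in zip(*responses):  # iterate over response variables
        my = sum(column) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, column))
        explained += sxy ** 2 / sxx  # regression sum of squares for this column
        total += sum((y - my) ** 2 for y in column)
    return explained / total


# Toy data: four trials, one predictor (e.g., dive time in minutes) and
# two response variables (e.g., two reliability parameters, in percent).
dive_time = [30, 40, 50, 60]
scores = [[50, 45], [56, 52], [59, 54], [66, 61]]
r2_dive_time = marginal_r2(dive_time, scores)
```

A sequential test would instead fit predictors one after another and report the additional variation each one explains once the earlier ones are in the model.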

DISCUSSION
Notwithstanding the large number of studied species, the accuracy of validation trials was promising, with most trials achieving a mean score between 50 and 70%. As pointed out by correlation and DISTLM analyses, most reliability parameters were positively correlated with the diving certification level, indicating that more experienced divers collected more accurate data. A possible explanation could be that expert divers are more confident with the diving equipment and their underwater skills than beginner divers, allowing them to focus more on the surrounding environment (Goffredo et al., 2010; Branchini et al., 2015b). The dive time was also positively correlated with most reliability parameters, suggesting that longer dives lead to higher data accuracy, possibly because divers have more time to look around them and identify organisms.
Two reliability parameters (Accuracy and CAR) showed a positive correlation with the date. Although they are only two of seven parameters, this could suggest that citizen science projects should aim at a long-term duration, because their implementation can be improved through feedback from volunteers, thus improving data quality.
Three reliability parameters (CAR, Similarity Index and Reliability) were positively correlated with team size, unlike in previous studies, where these relationships were not significant (Goffredo et al., 2010; Branchini et al., 2015b). This result could be related to the presence of big groups belonging to the same diving school, which may be more closely guided by the instructor while filling in the questionnaire after the dive than single independent divers. Moreover, big groups of divers that stay close to each other to prevent the group from dispersing could survey the marine environment in a way more similar to the control diver than small groups in which divers are free to dive. The anonymous data analysis did not allow us to test this aspect.
The lowest score among the analyzed reliability parameters was obtained by the Consistency parameter, with 86.9% of trials with mean consistency between 40 and 70%. This result is in line with previous studies that used the recreational approach and is likely related to the different personal interests of volunteers, which made them focus on different species (Branchini et al., 2015b). For example, divers interested in macro photography may have focused their attention on small benthic organisms, while others interested in large pelagic fish (e.g., sharks) may have focused their attention away from the reef. Higher consistency results have been found using intensive training programs in marine life identification and survey techniques (Mumby et al., 1995; Forrester et al., 2015). While intense training could increase the consistency of the data collected, it would drastically reduce the number of volunteers involved. This could limit the educational role of citizen science projects, because fewer volunteers would be involved. The Czekanowski proportional similarity index (SI) showed that volunteers' abundance ratings were below the sufficiency threshold in 37.8% of validation trials, indicating that volunteers could encounter difficulties in abundance estimation, as already found in other studies (Gillett et al., 2012; Done et al., 2017).
The wide variability of mean scores of the Correct Identification parameter could be due to the difficulty for volunteers to see and report the presence of less common or less evident taxa (e.g., the hermit crab, which is frequently found between the rocks and blends in very well), while they performed better in recording the most common, well-known and straightforward species, as previously observed (Goffredo et al., 2010; Cox et al., 2012; Bernard et al., 2013; Branchini et al., 2015b; Forrester et al., 2015; Kosmala et al., 2016).
Previous studies that used the same methodology were performed on 38 (Goffredo et al., 2010) and 61 validation trials (Branchini et al., 2015b), respectively. This study analyzed 513 validation trials and confirmed the previous trends, which allows our results to be generalized. A new result of this study is the identification of team size as a possible predictor of volunteers' data quality, indicating that future data reliability studies should also consider this variable.
As highlighted by different authors (Lewandowski and Specht, 2015; Kosmala et al., 2016; Specht and Lewandowski, 2018), a limitation of the approach used in this and other studies (Bell, 2007; Oscarson and Calhoun, 2007; Delaney et al., 2008; Aceves-Bueno et al., 2017) is that using professional or expert data (in our study, those of the "control diver") as the reference for evaluating volunteer data would also require an evaluation of the correctness of the data collected by the professionals or experts themselves (Specht and Lewandowski, 2018). In this study, control divers were marine biologists of the Marine Science Group, trained in the project specifics, who spent some weeks monitoring the biodiversity of the surveyed sites, which should assure a good quality of collected data.
In citizen science projects it is fundamental to develop suitable tasks for volunteers to assure the collection of good-quality data (Schmeller et al., 2009; Magurran et al., 2010; Tulloch et al., 2013; Kosmala et al., 2016; Brown and Williams, 2019). In the present study data quality was assured: (1) by asking volunteers to fill in the questionnaire soon after the dive, to avoid possible species oversight; (2) by training scuba instructors on the methodology of STE data collection on an annual basis (during public events) or on site when the control diver was present in the diving centers.
Moreover, the overall data accuracy of this study was comparable to that achieved in other projects by volunteer divers on precise transects (Mumby et al., 1995; Darwall and Dulvy, 1996; Goffredo et al., 2010; Done et al., 2017). This suggests that data from citizen science programs can complement professional datasets with sufficiently accurate data, increasing the ability of researchers to estimate species richness and providing valuable information on species distributions that is relevant for detecting the biological consequences of global change (Soroye et al., 2018).
The quality of volunteers' data varies with the task: they perform better at identifying iconic or well-known species, while they can be confused by cryptic, rare or unknown species (Swanson et al., 2016). Some of the methods used to improve the quality of data collected by volunteers are training programs, prequalification via a skill test, and ongoing feedback on identifications for long-term engaged volunteers (Danielsen et al., 2014; Kosmala et al., 2016; van der Wal et al., 2016). Volunteers improve their data accuracy by gaining experience with a project, so long-term engagement could lead to a higher quality of collected data (Weir et al., 2005; Crall et al., 2010; Kelling et al., 2015).
The Scuba Tourism for the Environment project was developed in collaboration with several mass tourism facilities and diving centers. During the project, annual meetings with the Ministry of Tourism of the Arab Republic of Egypt were held to give management and conservation suggestions based on project results.

CONCLUSION
This project provided additional evidence that "recreational" (Goffredo et al., 2004, 2010) and "easy and fun" (Dickinson et al., 2012) citizen science is an efficient and effective method to recruit many volunteers and provide reliable data if well designed (Branchini et al., 2015b). The recreational citizen science approach used in the present study can be exported to different countries and used as a valuable tool by local governments and marine managers to achieve the large-scale and long-term data collection required in a world where climate change and anthropogenic pressure on natural resources are leading to fast environmental changes.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Bioethics Committee of the University of Bologna. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SG, SB, MMe, CM, AM, and EC collected data during the STE project. MMe, MMa, LL, MD, MTr, EN, MTi, RB, SB, PN, and SG analyzed the data. MMe, MMa, CM, EC, FP, AM, SF, and SG wrote the manuscript. SG supervised the research. All authors discussed the results and participated in the scientific discussion.

FUNDING
Sources of funding have been the Italian Government (Ministry of Education, University and Research; www.istruzione.it), the Egyptian Government (Ministry of Tourism of the Arab Republic of Egypt and the Egyptian Tourist Authority; www.egypt.travel), ASTOI (Association of Italian Tour Operators; www.astoi.com), the tour operator Settemari (www.settemari.it), the diving agencies SNSI, Scuba Nitrox Safety International (www.Scubasnsi.com) and SSI, Scuba School International (www.divessi.com), the traveling magazine TuttoTurismo, the airline Neos (www.neosair.it), the association Underwater Life Project (www.underwaterlifeproject.it), the Project Aware Foundation (www.projectaware.org) and the diving centers Viaggio nel Blu (www.viaggionelblu.org) and Holiday Service (www.holidaydiving.org). The project has had the patronage of the Ministry of the Environment and Land and Sea Protection (www.minambiente.it).