Adopt a Pixel 3 km: A Multiscale Data Set Linking Remotely Sensed Land Cover Imagery With Field Based Citizen Science Observation

Citation: Low RD, Nelson PV, Soeffing C, Clark A and SEES 2020 Mosquito Mappers Research Team (2021) Adopt a Pixel 3 km: A Multiscale Data Set Linking Remotely Sensed Land Cover Imagery With Field Based Citizen Science Observation. Front. Clim. 3:658063. doi: 10.3389/fclim.2021.658063


INTRODUCTION
Public participation is critical to the mission of Earth system science. Citizen science provides a personally meaningful way for the public to engage with the dynamic changes taking place on our planet, and to participate in scientific data collection and analysis at scales that are not otherwise feasible. An unexpected contribution of citizen science emerged during the 2020 COVID-19 pandemic, when we deployed an existing mobile app to engage spatially distributed students in co-creating and testing a citizen science project in lieu of a residential research internship program.
Global Learning and Observations to Benefit the Environment (GLOBE) is an international science and education program established in 1995, connecting students, teachers, and scientists in monitoring changes in the Earth system (Rock et al., 1997; Finarelli, 1998; Means, 1998; Berglund, 1999; Butler and MacGregor, 2003; Muller et al., 2015; Nugent, 2018). GLOBE participants from 126 countries, using more than 50 scientific protocols, have collected more than 200 million environmental observations for use by scientists and students in research.
GLOBE recently expanded its mission to support citizen scientists at large and launched the GLOBE Observer (GO) mobile application to increase the spatial and temporal coverage of GLOBE data. Using four tools on the GO platform, citizen scientists report in-situ, ground-based observations of clouds, land cover, mosquito habitats, and/or tree height. GLOBE citizen science data complement remotely sensed data obtained from sensors on NASA's suite of airborne and spaceborne observing platforms. Citizen scientists are encouraged to make coincident observations using more than one GO tool, for instance, using the Land Cover and the Mosquito Habitat Mapper tools at the same site. Associating data from multiple tools increases their usefulness for a wider range of projects.
Building on the GLOBE mission to promote student and citizen science research, the Adopt a Pixel 3 km (Adopt a Pixel) framework was created to take advantage of the personal connection and familiarity citizen scientists have with their local landscapes. Adopt a Pixel applies a nested sampling framework to GO-obtained data, thus enabling quantitative and statistical analysis.
We piloted the Adopt a Pixel project with 74 high school research interns in summer 2020 as part of the STEM Enhancement in Earth Sciences (SEES) summer research internship experience, hosted by Texas Space Grant and the University of Texas, Austin. The 2020 pandemic required all internships to be conducted virtually. Interns were scattered across the continental U.S. and in two locations overseas. Because "safer at home" orders were in place in many areas, the project could not require students to participate in field data collection, as in previous summers, but all participants could examine and analyze very high-resolution satellite imagery. These logistical conditions shaped our project's structure, research design, and resulting dataset. Participants selected a local study site of interest and applied their own knowledge to develop and analyze a robust 9 km² land cover data set. We provided 8 weeks of virtual research training support to participants, including "Meet Up and Do Science" coworking webinars, peer discussions, lectures, and mentor relationships with NASA scientists.
Ground-based observations of environmental data are critical to the interpretation and downscaling of satellite products, but more in-situ observations are needed, especially in regions where variable conditions are pronounced or where rapid change is occurring (Moorthy et al., 2020; World Health Organization, 2020). In-situ validation of land-use and land-cover (LULC) products plays an important role in improving the accuracy of models employing remotely sensed data and map products for practical management purposes (Eriksen et al., 2018).
This project contributes to the ecosystem of citizen science initiatives documenting land cover, land use, and landscape change through photo data collection and/or analysis of remotely sensed aerial or space imagery. The ecosystem includes such projects as Geo-Wiki (Fritz et al., 2009), VIEW-IT (Clark and Aide, 2011), FieldScope (Switzer et al., 2012), LACO-Wiki, Field Photo Library (Xiao et al., 2011), and the Degree Confluence Project (Qian et al., 2020). These projects have enabled participating volunteers to produce data at an unprecedented rate (Muller et al., 2015) and play a critical role in obtaining the velocity, volume, and variety of data needed to continuously monitor our changing planet. Opportunistically collected data can be especially informative at large spatial scales. Projects employing Geo-Wiki data provide examples of robust outcomes arising from citizen science data (Fritz et al., 2017; See et al., 2015). However, neither the observers nor the environmental features or phenomena of interest are distributed evenly across space and time, and not all observations are equally valuable scientifically (Callaghan et al., 2019). Our approach enables citizen scientists who are using GO to access systematically collected data and conduct robust analyses of smaller, site-based data sets.
All GLOBE data are openly accessible and readily downloadable as CSV files using either the Advanced Data Analysis Tool or an API through the GLOBE website (https://www.globe.gov/globe-data). Descriptions of GLOBE metadata and data quality assurance procedures are detailed in Amos and Andersen (2019) and Global Learning to Benefit the Environment (GLOBE) (2019). Data quality assurance parameters for the GLOBE Observer tools used in this project are presented in .

Geographic Area
For this dataset, each citizen scientist identified the center of a 9 km² Area of Interest (AOI) that they could access (Figure 1). This resulted in 49 AOIs unevenly distributed across the United States, Puerto Rico, and Germany.

Very High-Resolution Satellite Image Derived Land Cover Data
The center latitude and longitude of each AOI were uploaded to a project on Collect Earth Online (CEO), an open-source, cloud-based satellite image viewing and interpretation system (Saah et al., 2019). This platform ensures that there is "consistency in locating, interpreting and labeling reference data plots for use in classifying and monitoring land cover/land use change" (Saah et al., 2019). Using the sampling tools provided by the platform, Primary Sample Units (n = 36), each with dimensions of 100 × 100 m, were systematically located in each AOI on a grid with 500 m spacing to minimize spatial autocorrelation at moderate resolution (Buchhorn et al., 2020). A systematic dot grid with 10 m spacing (n = 121) was then overlaid on each of these Primary Sample Units.
These secondary 2 m circular sample units were subsequently labeled using a slightly modified land cover classification protocol (Table 1) (Becker et al., 1998). Previous studies have shown that land cover reference data collected by citizen scientists using this protocol "are at least as accurate as that collected by professionals" (Becker et al., 1998).
Summarizing these secondary sample units allows calculation of the fractional and overall land cover for the Primary Sample Unit and the AOI. The very high-resolution imagery selected for interpretation was the MapBox Global Satellite Basemap, which is provided as cloud-free, color-corrected, and sharpened 3-band imagery in the visible red, green, and blue (RGB) wavelengths, derived from various sources with a reported ground resolution of 50 cm.
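The nested layout described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes the Primary Sample Unit grid is centered on the AOI center and that the dot grid includes both edges of each unit, which is what yields the stated counts of 36 and 121. The function names are hypothetical, and the exact placement used by Collect Earth Online may differ.

```python
AOI_SIZE_M = 3000     # 3 km x 3 km AOI (9 km^2), from the text
PSU_SPACING_M = 500   # spacing between Primary Sample Unit (PSU) centers
PSU_SIZE_M = 100      # each PSU is 100 m x 100 m
DOT_SPACING_M = 10    # spacing of the dot grid inside a PSU


def psu_centers(aoi_size=AOI_SIZE_M, spacing=PSU_SPACING_M):
    """Return PSU center offsets (east, north) in metres from the AOI center.

    Assumption: the grid is centered on the AOI, so the first PSU center
    sits half a spacing in from the AOI edge.
    """
    n = aoi_size // spacing                  # centers per axis
    start = -aoi_size / 2 + spacing / 2      # offset of the first center
    axis = [start + i * spacing for i in range(n)]
    return [(x, y) for y in axis for x in axis]


def dot_offsets(psu_size=PSU_SIZE_M, spacing=DOT_SPACING_M):
    """Return dot offsets (east, north) in metres from a PSU center.

    Assumption: dots fall on both edges of the PSU, so there is one more
    dot per axis than there are spacing intervals.
    """
    n = psu_size // spacing + 1              # dots per axis
    axis = [-psu_size / 2 + i * spacing for i in range(n)]
    return [(x, y) for y in axis for x in axis]


centers = psu_centers()
dots = dot_offsets()
print(len(centers), "PSUs per AOI,", len(dots), "dots per PSU")
```

Under these assumptions the sketch gives 36 Primary Sample Units per AOI and 121 dots per unit, matching the counts reported in the text.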
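The summarizing step mentioned at the start of this paragraph, going from labeled dots to fractional cover, can be sketched as follows. This is a minimal illustration rather than the project's own processing code: the function names and the example labels are hypothetical, and it assumes each dot carries exactly one class label.

```python
from collections import Counter


def fractional_cover(labels):
    """Return {class: percent of labeled dots} for one list of dot labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def aoi_cover(psu_label_lists):
    """Pool the dots from every PSU in an AOI and summarize them together.

    When every PSU carries the same number of dots, pooling gives the same
    result as averaging the per-PSU fractions.
    """
    pooled = [label for labels in psu_label_lists for label in labels]
    return fractional_cover(pooled)


# Hypothetical PSU: 121 dots with made-up labels.
psu = ["tree canopy"] * 60 + ["grass"] * 40 + ["building"] * 21
cover = fractional_cover(psu)
print({cls: round(pct, 1) for cls, pct in cover.items()})
```

The same function applies at both levels of the nested design: called on one unit's dots it gives the unit's fractional cover, and called on the pooled dots it gives the overall cover for the AOI.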
Citizen scientists were aided in understanding their Primary Sample Units through the customizable GeoDash feature on the CEO platform (Markert et al., 2017). Both a Normalized Difference Vegetation Index (NDVI) and a Normalized Difference Water Index (NDWI) time series for each Primary Sample Unit were calculated from MODIS data and presented in the GeoDash (Geo, 1996; Didan, 2015).
Across the 49 AOIs, the team labeled 1,764 Primary Sample Units from the very high-resolution satellite imagery. On average, these units comprised 33% tree canopy cover, 19.5% impervious surface cover, 16.3% grass cover, 16.5% building cover, 6% cultivated vegetation cover, 2% shrub cover, 1% river/stream flowing water cover, 1% lake or ponded water cover, and less than 1% for the categories of treated water (pools, containers) or irrigation ditches.
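Both indices are normalized band differences. The sketch below shows the arithmetic only. NDVI uses near-infrared (NIR) and red reflectance. For NDWI, the NIR/shortwave-infrared (SWIR) form is assumed here because the text does not say which variant was computed, and a green/NIR form is also in common use. The reflectance values are made up, and the function names are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    if denom == 0:
        return None
    return (a - b) / denom


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(nir, swir):
    """NDWI in its NIR/SWIR form (an assumption; see the lead-in)."""
    return normalized_difference(nir, swir)


# Made-up surface reflectance values for a single date.
print(ndvi(nir=0.5, red=0.1))
print(ndwi(nir=0.5, swir=0.25))
```

Applying these functions to each date in a reflectance record gives the kind of time series that the GeoDash displays for a Primary Sample Unit.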

Ground Reference Land Cover Data
Within each of the 49 AOIs, oblique ground photos were collected using the GO Land Cover protocol (Kohl et al., 2021). These ground images provide a corroborating data source for land cover labeling in this dataset. The GO app records the date and time of observation and the geolocation obtained by the GPS receiver/location services built into the user's mobile device. The user answers a series of yes/no prompts to describe the surface conditions (potential reflectivity) at the site. Land cover is documented through six directional photos. For each cardinal direction (north, east, south, west), users are instructed to orient their camera to capture an image focused on the nearest 50 m. Upward and downward images are collected to document atmospheric conditions/canopy cover and ground cover, respectively. Along with text field notes, citizen scientists have the option to label land cover elements and estimate the percentage they observe in the field for each directional image. These data are then submitted over the internet to the GLOBE database for archiving and eventual retrieval. Across the AOIs, 8,312 ground images were collected using the GO Land Cover protocol at 1,047 locations. Only 39% of these locations were classified in the field using this tool. The viewsheds corresponding to field-classified images show a dominance of building cover (26%), followed by impervious surface cover (25%), tree canopy cover (22%), herbaceous vegetation (19%), barren land (4%), open water (1%), and shrub cover (1%).

Data Advantages, Limitations, and Challenges
This Adopt a Pixel dataset has the advantage of being part of the GLOBE data ecosystem. Since 1995, GLOBE citizen scientists have contributed more than 200 million environmental measurements to the GLOBE database. Adopt a Pixel data are readily associated with these environmental observations, collected using more than 50 scientist-developed research protocols. Together, these data contribute to the examination of diachronic landscape evolution resulting from large-scale processes such as urbanization, globalization, and climate change (Kennedy et al., 2015).
GO enables a citizen scientist to collect coincident data using more than one tool: atmospheric conditions (Clouds), canopy height (Trees), still or stagnant pools of water (Mosquito Habitat Mapper), and land use and vegetation cover (Land Cover). We developed the nested Adopt a Pixel 3 km project with an eye toward future applications of land cover data to projects that employ more than one of the GO tools. For instance, land cover variables are critically important to include in predictive models of mosquito vector-borne disease risk because mosquito species have specific habitat preferences and microhabitat requirements, including plant height and density, both of which are captured in in-situ ground photos. Vegetation-dependent associations have been identified in numerous studies (see systematic review by Sallam et al., 2017). However, the field research that documents spatial patterns in mosquito habitats (oviposition sites) and uses these data to inform the interpretation of mosquito breeding sites from satellite imagery is still nascent. In a recent publication, Lorenz et al. (2020) called for scientists to test mosquito habitat and distribution models using freely available satellite imagery, such as Landsat 8.
From an initial 79 submitted AOIs, a filter for completeness was applied to select those that included the 36 labeled Primary Sample Units and had associated GO Land Cover photos, resulting in the 49 AOIs presented here. During data collection activities and subsequent review of the satellite image classifications, we identified context-dependent errors in the categorical assignment of specific thematic attributes by the citizen scientist team. We adjusted the definitions used by the volunteers to distinguish more clearly between irrigated fields, cultivated lawns, and open rangeland, and manually fixed these errors. In a future iteration of this seasonal project, we intend to expand our work to systematically identify bias and residual errors in land cover classification by the citizen scientists. To improve the quality of this data set, we are working on adding inter-rater reliability statistics to the Adopt a Pixel 3 km data and assigning a reliability index to land cover classifications, which will facilitate an independent dataset update with appropriate documentation. These data are available for training computer-vision algorithms that will verify the accuracy of citizen scientist classifications (Xing et al., 2018; Ceccaroni et al., 2019; McClure et al., 2020).
Spatial accuracy is problematic for many citizen science programs that rely on the built-in GPS receiver of a user's personal mobile device. A recent study found that the horizontal positional error of geolocations obtained with an iPhone 6 averaged between 7 and 13 m (Merry and Bettinger, 2019). The GO Land Cover tool provides an estimate of accuracy to the user, who is asked to refresh the GPS reading until the lowest error reading is obtained (3-65 m). This step introduces potential human error: if the GPS receiver is not refreshed, the reported geolocation will have greater positional error. We plan to document such errors systematically in the next season and explore the potential of the Adopt a Pixel data to quantify positional error for citizen science data obtained through GO.
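The completeness filter and one candidate reliability statistic can be sketched as follows. This is an illustration under stated assumptions, not the project's quality assurance code: the record fields (`labeled_psus`, `photo_count`) are hypothetical, and Cohen's kappa is shown only as one widely used inter-rater agreement statistic, since the text does not name the statistic the authors intend to use.

```python
from collections import Counter

REQUIRED_PSUS = 36  # labeled Primary Sample Units per AOI, from the text


def is_complete(aoi):
    """Keep an AOI only if all PSUs are labeled and it has ground photos.

    `aoi` is a dict with hypothetical keys `labeled_psus` and `photo_count`.
    """
    return aoi["labeled_psus"] == REQUIRED_PSUS and aoi["photo_count"] > 0


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same sample units."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of units given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal class frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical class throughout
    return (observed - expected) / (1.0 - expected)


# Made-up records and labels for illustration.
aois = [
    {"id": "A", "labeled_psus": 36, "photo_count": 12},
    {"id": "B", "labeled_psus": 30, "photo_count": 5},
    {"id": "C", "labeled_psus": 36, "photo_count": 0},
]
kept = [aoi["id"] for aoi in aois if is_complete(aoi)]
kappa = cohens_kappa(
    ["tree", "tree", "grass", "grass"],
    ["tree", "grass", "grass", "grass"],
)
print(kept, kappa)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because a few classes, such as tree canopy, dominate many sample units.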
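One way to quantify positional error of this kind is to measure the ground distance between each reported geolocation and a reference coordinate for the same site. The sketch below uses the haversine great-circle formula. It is a hypothetical illustration, not the method the authors plan to use, and the coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_positional_error(pairs):
    """Average distance over (reported, reference) coordinate pairs."""
    errors = [haversine_m(*reported, *reference) for reported, reference in pairs]
    return sum(errors) / len(errors)


# Made-up reported and reference coordinates (lat, lon).
pairs = [
    ((30.0001, -97.0), (30.0, -97.0)),
    ((30.0, -97.0), (30.0, -97.0)),
]
print(round(mean_positional_error(pairs), 2), "m")
```

At the distances involved (metres to tens of metres), the spherical approximation in the haversine formula is far smaller than the GPS error being measured.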
One of the analytical challenges associated with opportunistic data sets relates to the unique and inherent spatial and temporal biases that pose statistical and data informatics challenges for the end user (Muller et al., 2015). A variety of techniques and statistical procedures can be applied to improve or characterize the reliability of opportunistic data (Isaac et al., 2014; Lukyanenko et al., 2016, 2020; Aceves-Bueno et al., 2017). One of the most common approaches is through models comparing opportunistic data with an embedded data set that employs a structured sampling design (Giraud et al., 2016). Such a model-based approach has been used with land cover data to meet conditions necessary to reduce errors and obtain useful outcomes from data collected opportunistically (Stehman et al., 2018; Henckel et al., 2020). By contrast, the Adopt a Pixel data set was collected using a systematic sampling design to overcome some of the biases and analytical limitations associated with opportunistic data sets.
There is a concern that rapidly expanding access to personal mobile devices is resulting in "a fragmented landscape where there are a large, and increasing, number of citizen science type projects collecting data which are often highly specific to those projects" (Higgens et al., 2016). While Adopt a Pixel supports research using data specific to the GLOBE Program, it can also contribute to broad-scale mapping science initiatives such as the Land Change Monitoring, Assessment, and Projection project (LCMAP; Pengra et al., 2020), the scale of which requires the use of all available data.
In this pilot project, citizen scientists examined satellite imagery coupled with their in-situ LULC observations and classified the images using Collect Earth Online (CEO), an open source, web-based tool designed for systematic LULC data analysis. Adopt a Pixel contributes to describing GO Land Cover data in a way that will improve the ability of interested scientists to assess the quality and fitness-for-use of GO data in their research. As we continue to find ways to evaluate and document the quality of GO data, we expect to see increasing scientific and societal applications of the data in research.

DATA ACCESS
The Adopt a Pixel 3 km data, along with its metadata description, is hosted in the openly accessible Earth System Data Exploration Portal, at https://geospatial.strategies.org/pages/publication-data (accessed October 06, 2021). This portal hosts curated data sets and associated metadata derived from citizen science data reported using GLOBE Observer. Additional functionalities provided in the portal include ready to use source data, dashboards, and data processing scripts.

DATA AVAILABILITY STATEMENT
These data were obtained from the GLOBE Program. Curated data sets on which this article is based, as well as the Python code employed in quality assurance and metadata descriptions, are available at https://geospatial.strategies.org/pages/publication-data.