Toward a global platform for linking soil biodiversity data

Soil biodiversity is immense, with an estimated 10-100 million organisms belonging to over 5000 taxa in a handful of soil. In spite of the importance of soil biodiversity for ecosystem functions and services, information on soil species, from taxonomy to biogeographical patterns, is incomplete and there is no infrastructure to connect pre-existing or future data. Here, we propose a global platform to allow for greater access to soil biodiversity information by linking databases and repositories through a single open portal. The proposed platform would, for the first time, link data on soil organisms from different global sites and biomes, and would include all data types, from molecular sequences to morphology measurements and other supporting information. Access to soil biodiversity species records and information will be instrumental to progressing scientific research and education. Further, as demonstrated by previous biodiversity synthesis efforts, data availability is key for adapting to, and creating mitigation plans in response to, global changes. With the rapid influx of soil biodiversity data, now is the time to take the first steps toward establishing a global soil biodiversity information platform.


Introduction
Soils are increasingly recognized as crucial components of ecosystems and biodiversity (Wardle et al., 2004; Bardgett and Wardle, 2010), and they represent unique compartments of terrestrial ecosystems by comprising components of the atmosphere, biosphere, hydrosphere, and lithosphere. Soil biodiversity supports many terrestrial ecosystem functions (Wall et al., 2012) and delivers important ecosystem services such as food and fiber production, carbon sequestration, and degradation of pollutants (Wardle, 2002; Wall et al., 2010). However, data and information regarding the diversity that lives in soil remain insufficiently cataloged and coordinated, and this limits our ability to fully assess the key role soil biodiversity plays in supporting terrestrial systems and ecosystem services. In contrast to soil systems, greater effort has been put toward cataloging global diversity in marine and other terrestrial systems (Appeltans et al., 2012; Jetz et al., 2012; Canhos et al., 2014; Hudson et al., 2014) and into making these data free and open access (Guralnick et al., 2007; Wieczorek et al., 2012). Global efforts to synthesize biodiversity data have proven highly successful in the transfer of information, have improved our understanding of species ecology and distribution patterns, and allow for better monitoring of, and response plans to, global change effects (Hampton et al., 2013; Dirzo et al., 2014). Given that we are facing unprecedented environmental alterations through climate change, land use change, soil erosion, invasive species, desertification, and pollution, a better understanding of the global distribution and drivers of soil biodiversity is urgently needed to forecast functional changes of terrestrial ecosystems and to develop appropriate management practices. Therefore, here we review the rationale behind, and the benefits of, bringing together soil biodiversity data and information through a single global data platform.
Although it is known that soils are extraordinarily diverse, the scale of soil biodiversity is not yet fully understood (Wall et al., 2010). Global patterns of soil biodiversity are at best weakly documented (Decaëns, 2010; Tedersoo et al., 2014), and the locations of many soil biodiversity hotspots have not been identified. Owing in part to the plethora of hyperdiverse taxonomic groups, global patterns of soil biodiversity are thought to differ significantly from what is reported aboveground (Maraun et al., 2007; Decaëns, 2010; Tedersoo et al., 2012; Ramirez et al., 2014). For example, soil microorganisms do not respond to large-scale environmental gradients in the same way as metazoans, and belowground biodiversity hotspots do not necessarily mirror aboveground biodiversity patterns (Fierer and Jackson, 2006; Wu et al., 2011). Further, many species residing in soil remain taxonomically, phylogenetically, and functionally undescribed. This is most notable for microorganisms (McDonald et al., 2012), but it is also true for soil fauna (Behan-Pelletier, 1999; Rougerie et al., 2009; Bik et al., 2012). Therefore, categorizing species into discrete taxonomic units represents a challenge for soil biodiversity documentation where many of the species' characteristics and phylogenies are not yet available (Bardgett and van der Putten, 2014).
Regardless of these challenges, soil biodiversity research has dramatically increased over the last three decades, and the scope of soil biodiversity data is immense. Soil biodiversity data types range from classical specimen-based collections (Burkhardt et al., 2014) to molecular and genomics samples (Gilbert et al., 2014). In between is a wide spectrum of community-aggregated data (e.g., trophic levels to relative abundances), organism attributes (e.g., abundance, biomass, and traits), and environmental measurements (e.g., georeferenced coordinates, biome type, soil characteristics, and climatic variables). Like other biodiversity information, soil biodiversity data can be digital and available online, though much data remains "dark," that is, not digitized or not available (Heidorn, 2008). Whether in a national repository, stored on a personal computer, or found in a museum drawer, the first step in any data synthesis project is to make dark data digitally accessible (Box 1) (Hill et al., 2012). The next step is to establish a mechanism to link digitally available data globally (such as an online portal).
Here we present an independent initiative to assess and store information on global soil biodiversity; to link species, environmental, and other data and make data accessible at a global level. Our goal was to propose a system that could be linked to other biodiversity and ecosystem relevant databases, accommodate new and future methods and technologies, be useful to a wide array of end users (from the public to scientists to policy makers), and be free and open access.

Applied Advances
It is now commonplace to concurrently survey soil biodiversity and explore the role these organisms play in ecosystem functions and global sustainability (Wall et al., 2012; Bardgett and van der Putten, 2014). However, we still lack baseline values for soil biodiversity as well as reference values (either abundance ranges or occurrence) that may prove critical in assessing the current status of soils and in implementing management and policy efforts to keep soils and soil biodiversity in a so-called "normal operating range" (Jackson et al., 2007; Koch et al., 2013). This will be particularly important as we continue to understand the impact of certain global changes on soil biodiversity and their interactions within functioning food webs (Blankinship et al., 2011; García-Palacios et al., 2015). For example, agricultural intensification reduces the abundance of soil fungi relative to bacteria, reduces earthworms and mycorrhizal fungi, and increases the numbers of plant parasitic nematodes (Tsiafouli et al., 2015). Less is known about the effects of incipient changes, or changes that encompass temporally complex and indirect feedback effects, such as consequences of global warming, biological invasions, or habitat fragmentation (Blankinship et al., 2011; Lindo et al., 2012; Dickie et al., 2014). Reference values can be an important tool for determining the success of ecosystem restoration and comparing data across time scales (Frouz et al., 2004; Kardol and Wardle, 2010) and for detecting subtle trends in temporal soil biodiversity assessments (Bardgett, 2005).
BOX 1 | Digital soil biodiversity information is currently stored in a wide array of databases, warehouses, catalogs, and other repositories, and contains various types of data (see Supplemental Table 1).
Specific indicators that can be accessed from a global platform, such as disease suppression (Mendes et al., 2011) and nutrient retention capacity of soil (De Vries et al., 2013), can also be used by land managers to calibrate and further improve the sustainability of production methods, or used to develop rapid and economic soil biodiversity assessment tools for use by policy makers and end users (Wall et al., 2012; Bone et al., 2014). As demonstrated by the Global Biodiversity Information Facility (GBIF) and other global data synthesis efforts (Otegui et al., 2013), access to and availability of data have helped to predict the impact of climate change (Warren et al., 2013), monitor invasive species (Gatto et al., 2013), and inform on issues like human health (Daszak et al., 2013) and food and farming (Vincent et al., 2013). Further, the efforts by GBIF and Map of Life (MOL) support the work of the CBD, IPBES, GEO-BON, and many others (see GBIF.org). The inclusion of soil biodiversity data in such global assessments is a highly important and necessary next step.

Theoretical and Research Advances
The prospect of accessing global soil biodiversity information through a single portal will create novel opportunities to develop, refine, and test underlying ecological theory. The synthesis of biodiversity data across larger spatial scales and greater taxonomic breadth may uncover emergent properties that cannot currently be foreseen (Brose et al., 2012) and will give better insight into species' ecological preferences and geographical ranges (Brose et al., 2004; Fierer et al., 2013; Tedersoo et al., 2014). Here we identify five topic areas that, while not exhaustive, will be enhanced by a global data platform effort:
(1) Macroecology and biogeographical patterns: Characterizing global patterns is of paramount importance for the conservation of soil biodiversity and for assessing how global change scenarios will affect the functioning of soil systems in a future world. A comprehensive view of biogeographic patterns will be critical to reveal important scientific questions, to discover where and why there are hot spots of biodiversity, and to identify the drivers of belowground diversity, and it will ultimately boost the use of macroecological approaches in soil ecology research (Fierer et al., 2013; Tedersoo et al., 2014).
(2) Biodiversity maintenance and loss: A synthesis of soil biodiversity data will help identify drivers and mechanisms underlying both the maintenance and loss of biodiversity in soil and dependent terrestrial systems. The support that belowground diversity gives to aboveground diversity is drastically underestimated, and by overlaying belowground and aboveground biodiversity patterns we can better assess the impact of biodiversity losses. Further, these efforts may prove especially important in terms of invasion ecology, identifying which groups are prone to invade (e.g., earthworms, Hendrix et al., 2008), the mechanisms facilitating invasion (e.g., Dickie et al., 2014), and prevention efforts.
(3) Ecosystem functions and services: Soil organisms codetermine a plethora of provisioning and regulating ecosystem services (Wardle et al., 2004; Lavelle et al., 2006), but the appreciation of their functional significance remains deficient due to their cryptic nature and overlapping functions (Setälä et al., 2005). While conventional anthropogenic land management practices often have aimed to optimize certain (single) ecosystem functions or services (Cardinale et al., 2012), soil biodiversity exemplifies the value of multifunctional ecosystems (Setälä et al., 2014; Wagg et al., 2014). Recent evidence shows that the structure and composition of the soil community, and the presence of specific functional groups, are key to delivering a range of ecosystem services, such as N retention and C storage (De Vries et al., 2013; Lange et al., 2015).
(4) Community ecology: Soil communities are notoriously complex, and conventional community ecological theory may be challenged by the spatially complex habitat soil organisms live in (Ettema and Wardle, 2002). Multitrophic soil biodiversity assessment may help to refine existing soil food web models (Digel et al., 2014). Further, global-scale information on the co-occurrence of different taxa in soil will shed light on the relative significance of trophic vs. non-trophic interactions in soil and of top-down vs. bottom-up forces and their interplay (Moore et al., 2004), and ecological network perspectives may provide useful tools to clarify interactions among the different soil functional groups and their links to certain ecosystem functions (Barberán et al., 2011; Morriën and van der Putten, 2013).
(5) Aboveground-belowground interactions: As our knowledge of belowground communities increases, so too does our awareness of the important, complex interactions between soil organisms and aboveground biodiversity (Hooper et al., 2000). By revealing belowground biodiversity patterns, we can gain better insight into the linkages between above- and below-ground systems. In addition, soil biodiversity data will be more valuable if they can be clearly linked with data pertaining to aboveground communities (such as through the MOL or GBIF).

A Proposed Framework
Our ability to address a range of applied and theoretical questions, or to assess biogeographical patterns, is to a large extent limited by access to and integration of the available data. Currently, there is no single repository or platform that allows access to soil biodiversity information across all species or at a global scale. Therefore, we propose a framework to initiate linking different databases and repositories via the internet (Figure 1). The end platform will be both a database and a free, open access portal to link various national and local data sources around the world. Linking data from existing databases is not trivial, nor is it a new challenge (Jetz et al., 2012). Previous efforts such as GBIF and MOL have demonstrated that, because there are no required guidelines or consistency between studies or pre-established databases, minimum standards and classifications must be identified. Soil biodiversity standards must then be harmonized with the global standards already in place (e.g., Yilmaz et al., 2011; Wieczorek et al., 2012). While applying even simple standards will lead to the omission of some studies and data, quality of the data will be valued over quantity, ultimately resulting in a higher quality synthesis. Integration and access to soil biodiversity data will be accomplished in three phases: discovery, standardization, and a final user interface.
FIGURE 1 | Integration and access to soil biodiversity data will be accomplished in three phases: (I) discovery, (II) standardization, and (III) a final user interface, and the timing of these phases will be directly related to the effort and resources put in to the framework.
Phase I-"Discover" where soil biodiversity data is housed: This phase will be two-fold: first, to establish a taxonomy list (a list of organisms living in the soil), and second, to inventory soil biodiversity information. The taxonomy list will be shared with the GBIF to tag pre-existing soil-related biological observations that can thereafter be searched and queried [much like the Global Mountain Biodiversity Assessment (GMBA) (gmba.unibas.ch)] and to allow for easier integration of new data. The "taxonomy list" and an inventory of soil biodiversity information will be made available through the Global Soil Biodiversity Initiative (GSBI). It is in this stage that data quality will also be assessed, a complicated issue that all biodiversity data studies must deal with. We propose to follow the guidelines established by GBIF.
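The tagging step in Phase I can be illustrated with a minimal sketch. It assumes occurrence records are available as plain dictionaries whose keys follow Darwin Core term names (`scientificName`, `order`, `family`, `genus`); the contents of `SOIL_TAXA`, the `soilRelated` flag, and the function name `tag_soil_records` are hypothetical and chosen only for illustration, not taken from GBIF or the GSBI.

```python
# Hypothetical soil taxonomy list: lower-cased taxon names (any rank)
# that are treated as soil-dwelling for the purpose of this example.
SOIL_TAXA = {"lumbricidae", "collembola", "oribatida"}

# Ranks inspected on each record; the keys follow Darwin Core term names.
RANKS = ("order", "family", "genus")


def tag_soil_records(records, soil_taxa=SOIL_TAXA):
    """Return copies of the records with a boolean 'soilRelated' flag added."""
    tagged = []
    for rec in records:
        # Collect the names reported at each rank, normalized to lower case.
        names = {str(rec.get(rank, "")).strip().lower() for rank in RANKS}
        # A record is soil related if any of its names is on the list.
        tagged.append({**rec, "soilRelated": bool(names & soil_taxa)})
    return tagged


# Two illustrative occurrence records (not real data).
records = [
    {"scientificName": "Lumbricus terrestris", "family": "Lumbricidae"},
    {"scientificName": "Parus major", "family": "Paridae"},
]
tagged = tag_soil_records(records)
soil_only = [r["scientificName"] for r in tagged if r["soilRelated"]]
```

Matching on higher ranks rather than on species names alone keeps the list short, at the cost of tagging any non-soil members of a listed group; a production list would need curated exceptions.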
Phase II-Establish a standardization framework by which to link past, present, and future data: Besides taxonomic synonyms, it will also be necessary to develop and implement thesauri for the various information fields (e.g., regarding habitat or climate parameters, methods, etc.). Standardized ontologies are necessary to link between different data sources and into GBIF (Supplemental Table 1) and other global data centers (such as MOL, ISRIC, EOL, GenBank, and others). Furthermore, to allow data from the individual data sources to be compared, standardization of numeric (e.g., abundances, pH values) and nominal (e.g., habitat types, soil types) data will be crucial. Concurrently, we must also establish the minimum set of parameters needed, and formalize data copyright, privacy, and licensing rules. Together these efforts will provide the critical foundation and quality criteria on which to build the platform.
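As a rough illustration of what Phase II standardization involves, the sketch below applies three mappings to a single record: a taxonomic synonym table, a habitat thesaurus, and a unit conversion for abundance. All table entries, field names (`count`, `sample_area_cm2`, `density_per_m2`), and the function `standardize` are hypothetical placeholders; real synonym lists and thesauri would come from the agreed standards.

```python
# Hypothetical synonym table: outdated name -> accepted name (placeholder names).
SYNONYMS = {"Examplea antiqua": "Examplea nova"}

# Hypothetical habitat thesaurus: free-text term -> controlled vocabulary term.
HABITAT_THESAURUS = {
    "pasture": "grassland",
    "meadow": "grassland",
    "woodland": "forest",
}


def standardize(record):
    """Map one raw record onto accepted names, controlled terms, and common units."""
    name = record["scientificName"].strip()
    habitat = record["habitat"].strip().lower()
    return {
        # Resolve taxonomic synonyms; unknown names pass through unchanged.
        "scientificName": SYNONYMS.get(name, name),
        # Resolve habitat terms; unknown terms pass through unchanged.
        "habitat": HABITAT_THESAURUS.get(habitat, habitat),
        # Convert a count per sampled area (cm^2) to individuals per m^2.
        "density_per_m2": record["count"] * 10_000 / record["sample_area_cm2"],
    }


raw = {
    "scientificName": "Examplea antiqua",
    "habitat": "Meadow",
    "count": 12,
    "sample_area_cm2": 100,
}
clean = standardize(raw)
```

Letting unknown names and terms pass through unchanged keeps the step lossless; a stricter variant could instead flag them for manual review.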
Frontiers in Ecology and Evolution | www.frontiersin.org
Short read sequence data: In the case of microbial marker gene sequence data (16S, 18S, ITS, or similar), it is difficult to extract taxonomic information for a number of reasons (OTU picking methods, chimeras, read length; Orgiazzi et al., 2014). In addition, due to the enormous amount of sequence data, reprocessing the full datasets would not be tractable. Therefore, we propose to link short read sequence data by location, rather than by taxon identification. This is based on the fact that there is currently no consensus on the correct protocol for handling these data, and integrating processed sequence data would introduce substantial methodological artifacts (Caporaso et al., 2010). Instead, our approach allows convenient access to these data linked to geography and allows users to process the data of interest using a consistent protocol based on individual research questions.
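Linking sequence data by location rather than by taxon could look roughly like the following sketch, in which each dataset carries only an identifier, a marker gene, and coordinates, and users retrieve datasets by bounding box before processing the raw reads themselves. The class `SequenceDataset`, the function `find_by_bounding_box`, the accession strings, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceDataset:
    accession: str  # placeholder identifier pointing to the raw reads
    marker: str     # marker gene, e.g. "16S", "18S", or "ITS"
    lat: float      # sampling latitude in decimal degrees
    lon: float      # sampling longitude in decimal degrees


def find_by_bounding_box(datasets, min_lat, max_lat, min_lon, max_lon):
    """Return the datasets sampled inside the given latitude/longitude box.

    This simple version does not handle boxes that cross the antimeridian.
    """
    return [
        d for d in datasets
        if min_lat <= d.lat <= max_lat and min_lon <= d.lon <= max_lon
    ]


# Illustrative catalog entries (not real datasets).
catalog = [
    SequenceDataset("DATASET-A", "16S", 40.6, -105.1),
    SequenceDataset("DATASET-B", "ITS", 51.3, 12.4),
    SequenceDataset("DATASET-C", "18S", -33.9, 18.4),
]
hits = find_by_bounding_box(catalog, 35.0, 70.0, -10.0, 40.0)
hit_ids = [d.accession for d in hits]
```

Because no taxonomic assignment is stored, the index stays valid whichever OTU picking or classification protocol a user later applies to the retrieved reads.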
Phase III-Establish a user-friendly interface that allows for the integration and comparison of soil biodiversity data-here called "Soil Portal": The portal will be designed specifically for manipulation and analysis of the data in order to address the theoretical questions outlined above and to provide stakeholders with the type of information needed for management and policy decisions. It is in this phase that we would finally be able to combine collection data across taxonomic groups, spatial scales, and research experiments. As demonstrated previously (Hill et al., 2012), users are reluctant to use any interface that costs time. Therefore, we propose a platform that would offer researchers a set of tools and rewards for contributing their data to the community, such as data analysis tools, DOIs for data publication, and links to other initiatives and data portals.

Outlook
To advance this project, buy-in from the community of soil biologists is required first; our goal is to galvanize and guide soil ecologists to make their data available. Researchers can continue to upload data from their home repositories, data will not have to be uploaded more than once, and there is no need to support a single, comprehensive database, which would be a monetarily expensive and time-consuming task. The framework is designed so that participation in the effort to liberate individual datasets will only require minor changes to how researchers work (i.e., time for data input and training for students and young scientists), but has the potential for great individual rewards such as more publications (e.g., "data papers"), increased exposure leading to invitations and collaborations, as well as reciprocal access to a wealth of data from colleagues. Admittedly, in addition to the technical challenges outlined in the introduction, the main limiting factor of this proposal will be resources. Specifically, time and funds must be invested upfront to move this effort forward in an efficient way.

Conclusion
In response to unprecedented global environmental changes and their drastic impacts on biodiversity (Sala et al., 2000), there is a sense of urgency to bring together global biodiversity information that will provide the basis to determine the species and communities that are particularly vulnerable to change and extinctions (Scholes et al., 2008; Cardinale et al., 2012; Jetz et al., 2012) and to focus conservation and management practices (Turner et al., 2015). The organisms that live in the soil are no exception. The focus of the outlined framework goes beyond species information, and therefore a major challenge and goal will be to integrate the different information types so that a range of ecological questions can be addressed. Soil biodiversity information is of broad interest to other disciplines, including plant ecologists, agriculturalists, invertebrate ecologists, and carbon and climate modelers, and would open new, unique opportunities for collaboration between these groups. As such, we have designed a framework that will interface with other disciplines through GBIF and the like. In addition to data access and standardization, a priority of this effort will be analytical and visualization tools for end users. Beyond progressing scientific research, these tools should help to communicate results and attract the interest of a larger, more general audience. Altogether, access to rapidly accumulating soil biodiversity information across the globe has the potential to improve research and to bring our understanding of soil ecology on par with that of aboveground systems.