Key Questions for Next-Generation Biomonitoring

Makiola, Andreas; Compson, Zacchaeus G.; Baird, Donald J.; Barnes, Matthew A.; Boerlijst, Sam P.; Bouchez, Agnès; Brennan, Georgina; Bush, Alex; Canard, Elsa; Cordier, Tristan; Creer, Simon; Curry, R. Allen; David, Patrice; Dumbrell, Alex J.; Gravel, Dominique; Hajibabaei, Mehrdad; Hayden, Brian; van der Hoorn, Berry; Jarne, Philippe; Jones, J. Iwan; Karimi, Battle; Keck, Francois; Kelly, Martyn; Knot, Ineke E.; Krol, Louie; Massol, Francois; Monk, Wendy A.; Murphy, John; Pawlowski, Jan; Poisot, Timothée; Porter, Teresita M.; Randall, Kate C.; Ransome, Emma; Ravigné, Virginie; Raybould, Alan; Robin, Stephane; Schrama, Maarten; Schatz, Bertrand; Tamaddoni-Nezhad, Alireza; Trimbos, Krijn B.; Vacher, Corinne; Vasselon, Valentin; Wood, Susie; Woodward, Guy; Bohan, David A.

doi:10.3389/fenvs.2019.00197

REVIEW article

Front. Environ. Sci., 09 January 2020

Sec. Environmental Informatics and Remote Sensing

Volume 7 - 2019 | https://doi.org/10.3389/fenvs.2019.00197

This article is part of the Research TopicA Next-Generation of Biomonitoring to Detect Global Ecosystem ChangeView all 11 articles

Key Questions for Next-Generation Biomonitoring

Andreas Makiola¹

Zacchaeus G. Compson^2,3

Alex Bush^2,9

Elsa Canard¹⁰

Tristan Cordier¹¹

Simon Creer⁸

R. Allen Curry¹²

Patrice David¹³

Alex J. Dumbrell¹⁴

Dominique Gravel¹⁵

Mehrdad Hajibabaei¹⁶

Brian Hayden¹⁷

Berry van der Hoorn⁵

Philippe Jarne¹³

J. Iwan Jones¹⁸

Battle Karimi¹

Francois Keck⁷

Martyn Kelly¹⁹

Ineke E. Knot²⁰

Louie Krol^5,6

Francois Massol^21,22

Wendy A. Monk^2,23

John Murphy¹⁸

Jan Pawlowski¹¹

Timothée Poisot²⁴

Teresita M. Porter^16,25

Kate C. Randall¹⁴

Emma Ransome²⁶

Virginie Ravigné²⁷

Alan Raybould^28,29,30

Stephane Robin³¹

Maarten Schrama^5,6

Bertrand Schatz¹³

Alireza Tamaddoni-Nezhad³²

Krijn B. Trimbos⁶

Corinne Vacher³³

Valentin Vasselon³⁴

Susie Wood³⁵

Guy Woodward²⁶

David A. Bohan¹^*

¹Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
²Environment and Climate Change Canada @ Canadian Rivers Institute, Department of Biology, University of New Brunswick, NB, Canada
³Centre for Environmental Genomics Applications, St. John's, NL, Canada
⁴Department of Natural Resources Management, Texas Tech University, Lubbock, TX, United States
⁵Naturalis Biodiversity Center, Leiden, Netherlands
⁶Institute of Environmental Sciences, Leiden University, Leiden, Netherlands
⁷CARRTEL, USMB, INRA, Thonon-les-Bains, France
⁸School of Natural Sciences, Bangor University, Bangor, United Kingdom
⁹Lancaster Environment Centre, Lancaster University, Lancaster, United Kingdom
¹⁰UMR 1349 IGEPP, INRA, Université de Rennes 1, Agrocampus Ouest Rennes, Domaine de la Motte, Le Rheu, France
¹¹Department of Genetics and Evolution, University of Geneva, Science III, Geneva, Switzerland
¹²Canadian Rivers Institute, Biology, Forestry and Environmental Management, University of New Brunswick, Fredericton, NB, Canada
¹³CEFE UMR 5175, CNRS—Université de Montpellier - Université Paul-Valéry Montpellier–IRD—EPHE, Montpellier, France
¹⁴School of Biological Sciences, University of Essex, Colchester, United Kingdom
¹⁵Département de Biologie, Université de Sherbrooke, Sherbrooke, Canada
¹⁶Centre for Biodiversity Genomics and Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
¹⁷Stable Isotopes in Nature Laboratory, Canadian Rivers Institute, University of New Brunswick, Fredericton, NB, Canada
¹⁸School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
¹⁹Bowburn Consultancy, Durham, United Kingdom
²⁰Institute of Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, Netherlands
²¹Univ. Lille, CNRS, UMR 8198—Evo-Eco-Paleo, SPICI Group, Lille, France
²²Univ. Lille, CNRS, Inserm, CHU Lille, Institut Pasteur de Lille, U1019–UMR 8204–CIIL–Center for Infection and Immunity of Lille, Lille, France
²³Faculty of Forestry and Environmental Management, University of New Brunswick, Fredericton, NB, Canada
²⁴Département de sciences biologiques, Université de Montréal, Montreal, QC, Canada
²⁵Great Lakes Forestry Centre, Natural Resources Canada, Sault Ste. Marie, ON, Canada
²⁶Department of Life Sciences, Silwood Park Campus, Imperial College London, London, United Kingdom
²⁷CIRAD, UMR PVBMT, Saint-Pierre, France
²⁸Syngenta Crop Protection AG, Basel, Switzerland
²⁹School of Social and Political Science, The University of Edinburgh, Edinburgh, United Kingdom
³⁰The Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Global Academy of Agriculture and Food Security, Edinburgh, United Kingdom
³¹Laboratoire MMIP—UMR INRA 518/AgroParisTech, Paris, France
³²Department of Computer Science, University of Surrey, Guildford, United Kingdom
³³BIOGECO, INRA, Univ. Bordeaux, Pessac, France
³⁴Agence Française pour la Biodiversité, Pôle R&D ECLA, Évian-les-Bains, France
³⁵Cawthron Institute, Nelson, New Zealand

Classical biomonitoring techniques have focused primarily on measures linked to various biodiversity metrics and indicator species. Next-generation biomonitoring (NGB) describes a suite of tools and approaches that allow the examination of a broader spectrum of organizational levels—from genes to entire ecosystems. Here, we frame 10 key questions that we envisage will drive the field of NGB over the next decade. While not exhaustive, this list covers most of the key challenges facing NGB, and provides the basis of the next steps for research and implementation in this field. These questions have been grouped into current- and outlook-related categories, corresponding to the organization of this paper.

Introduction

Classical biomonitoring techniques (Table 1) have focused primarily on measures linked to various biodiversity metrics (e.g., species richness, beta diversity; Li et al., 2010; Gutiérrez-Cánovas et al., 2019) and indicator species (but see Vandewalle et al., 2010; Culhane et al., 2014; Saito et al., 2015 for other approaches). Next-generation biomonitoring (NGB) describes a suite of tools and approaches that allow the examination of a broader spectrum of organizational levels—from genes to entire ecosystems. A more holistic vision of evaluating ecological structure and change has long been a goal of ecology, but only recently have the tools emerged to bring it toward fruition. In this issue of Frontiers in Ecology & Evolution, which explores the research topic, “A Next Generation of Biomonitoring to Detect Global Ecosystem Change,” we explore this complementary suite of new tools that could be forged into a global approach to biomonitoring. In this overview paper, we attempt to synthesize opinion on the key issues that are necessary to address en route to this next generation of biomonitoring tools. We focus on a key subset of these tools—those based on DNA metabarcoding as a new standard methodology for multiple taxonomic identifications—for which the number of papers published has increased exponentially since 2010 (Figure 1).

TABLE 1

Table 1. Glossary of terms as used in this paper.

FIGURE 1

Figure 1. Exponential rise in the number of published, peer-reviewed articles (A) and the number of citations of these articles (B) about next-generation biomonitoring. Figures depict data obtained through a systematic query of the Web of Science database using the Boolean search: “^*monitoring” AND “^*DNA” AND “metabarcoding”.

DNA metabarcoding generates massive amounts of data on taxonomic units (e.g., operational taxonomic units, OTUs, or exact sequence variants, ESVs; Callahan et al., 2017) rapidly, and these can be linked increasingly to functional attributes (Douglas et al., 2018; Makiola et al., 2019). DNA metabarcoding is highly complementary to whole metagenomic and metatranscriptomic sequencing (Knight et al., 2018), existing sources of ecological information (Cordier et al., 2018; Derocles et al., 2018) and classical biomonitoring approaches (Deiner et al., 2017); in all cases, adding genomic and/or ecological information to the rich taxonomic lists afforded by DNA metabarcoding would allow deeper exploration of ecological or biodiversity patterns. This would move biomonitoring closer to being able to extract both structural and functional attributes from the same multispecies sample (Keck et al., 2017; Cordier et al., 2019). By merging DNA metabarcoding with ecological information and machine learning approaches, NGB extends modern analytical potential beyond the classical morphological identification of bioindicator species. For instance, taxonomic lists from DNA metabarcodes can identify anthropogenic drivers behind community change and infer networks of possible ecological interactions and associated ecosystem properties (Bohan et al., 2017; Compson et al., 2018). While challenges to constructing these networks from NGB data remain (e.g., Barner et al., 2018; Freilich et al., 2018; Deagle et al., 2019), this overview paper discusses some promising ways of overcoming these limitations, including using trait filters developed from published literature and methods of inferring interactions (e.g., machine learning), and these ideas are developed in more depth in the associated manuscripts of this special issue. Indeed, the ultimate aim of NGB is to deliver this more integrated view of natural ecosystems at a fraction of the time and cost of classical approaches (Baird and Hajibabaei, 2012; Keck et al., 2017; Leese et al., 2018; Cordier et al., 2019). Building this large-scale monitoring poses many challenging questions, from the practical and logistical to the political and philosophical.

Here, we frame and describe the interplay of ten key questions that we envisage will drive the field over the next decade (Figure 2). Questions 1–7 address issues that are of current importance, and pertain to the scope of NGB. Questions 8–10 are questions of outlook and opportunity, exploring where the field might be going. This list emerged as an overview of the current Frontiers special issue on the research topic: “A Next Generation of Biomonitoring to Detect Global Ecosystem Change.” While not exhaustive it covers most of the key challenges facing NGB, and provides the basis of the next steps for research and implementation in this field.

FIGURE 2

Figure 2. Diagrammatic representation of the interplay between the Key questions for next-generation biomonitoring presented in this paper. Next-generation biomonitoring (NGB) is based on a holistic view of ecosystems through integrating new technologies and exploring synergies with existing data sources. For its realization, it will be necessary to both automate many bioassessment processes and separate the steps of biodiversity detection and explanation of ecosystem change.

Current Questions

How Can the Benefits of NGB Be Most Successfully Communicated to Citizens, Scientists, and Policymakers?

Managing issues of human health, food production and security, and the intertwined environmental issues of biodiversity and ecosystem services necessitates biomonitoring (Bush et al., 2019a; Schmidt-Traub et al., 2019). Information about the status of these issues, such as changes in the frequency of human (Jones et al., 2008) and crop diseases (Savary et al., 2019), insect declines (Hallmann et al., 2017), and losses of species of flowering plants (Carvell et al., 2006) are expected to lead to profound changes in human behavior and appreciation of the environment (Schröter et al., 2017). However, the vision of a broader scale evaluation of ecosystem change, and the benefits this will bring to citizens, scientists, and policymakers, needs to be clearly communicated if wide adoption of NGB approaches is to be realized.

There are three clear benefits of NGB. First, as is argued across the papers of this Issue, NGB has the potential to provide a more holistic method of assessment than classical biomonitoring, affording improved decision-making and management of issues that affect citizens' quality of life. Second, while NGB will provide methods for detailing the complexity of ecosystems, it will also use methods, such as ecological networks, which render this complexity comprehensible, communicating to citizens the richness of their local ecosystem and responses to change (Pocock et al., 2016). Third, NGB can foster citizen participation and buy-in to biomonitoring if it underpins evidence-based decision-making (Hodgetts et al., 2018), and projects with high public participation or strong community science components can produce tangible change in management (Schröter et al., 2017). Portable DNA sequencing instruments allow individuals with relatively little training to generate data; for example, Quick et al. (2016) used this approach to develop a tool to monitor the 2015 Ebola outbreak in Central Africa with a 24 h response time. Similar kits are being developed for use by members of the public to monitor local plant and human disease prevalence and the status of pests in agricultural fields and waterways.

For policy, NGB will not only achieve what classical biomonitoring currently does, such as by reporting on agreed classic indicator species or assemblages, but will also allow the inference and prediction of higher level ecosystem properties (Evans et al., 2016; Compson et al., 2019). In principle, NGB could facilitate remedial decision-making, allowing its accompanying management to be trialed before it is implemented. NGB has the potential to enable monitoring of changes in ecosystem structure and function in something close to real time, because large elements of biomonitoring can be automated, reducing the latencies and biases in human-dependent biomonitoring (Quick et al., 2016; Bohan et al., 2017), bringing science one step closer to the vision of biomonitoring any ecosystem in any biome of the globe. Large coverage would also help to avoid some of the “shocks” associated with the loss or the sharp decline of keystone species and major ecosystem processes long after a tipping point has occurred (Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services, IPBES¹). Finally, the generality of NGB enables fusing of multiple areas of biomonitoring that are currently distinct and managed separately. Monitoring of disease, invasions, climate, and land-use change could be undertaken simultaneously, greatly reducing the cost of NGB by pooling resources and sharing expenses. This would, in turn, increase the amount of biomonitoring that might be done, increasing its efficiency.

Despite these potential benefits, which have become apparent, adoption of the latest methodologies into management and decision-making processes has been slow, often hindered by miscommunication between research/scientists and management/policy partners (Darling and Mahon, 2011). Nevertheless, the number of initiatives for a global-scale biomonitoring of biodiversity that maximize cooperation and communication between scientists, policymakers and citizens is increasing. These include the development of new indicators, such as Essential Biodiversity Variables (EBVs, Kissling et al., 2018), ontologies for global biomonitoring (Global Infrastructures for Supporting Biodiversity, GLOBIS-B), storage and linking of data-sets (Global Biodiversity Information Facility, GBIF²), and routes into global scale policy (Group on Earth Observations Biodiversity Observation Network, GEO BON³; Global Earth Observation System of Services, GEOSS⁴). Scientists working on NGB should participate actively in these efforts. For example, the EU Co-Operation in Science and Technology (COST) action DNAqua-Net⁵ gathers scientists in order to improve biomonitoring of aquatic ecosystems (Leese et al., 2018; Pawlowski et al., 2018), and has a working group dedicated to the discussion of regulatory and policy frameworks where scientists and stakeholders work collaboratively (Hering et al., 2018). The Interreg European Regional Development Fund project Synergie transfrontalière pour la bio-surveillance et la préservation des écosystèmes Aquatiques (SYNAQUA⁶) shares a similar aim to gather panels of stakeholders to design scenarios for future NGB implementation for freshwater ecosystem biomonitoring in France and Switzerland (Lefrançois et al., 2018). The benefits of NGB should, in turn, leverage new policy, providing a better fit into the current regulatory and policy frameworks for the more “complex” metrics and indicators of ecosystem structure, function, and services. The role of science and scientists should be to critically appraise the development of the NGB approach and, in so doing, advocate for the benefits of NGB and establish a dialogue between relevant biomonitoring scientists, citizens, industry end-users, and policymakers.

What Is the Appropriate Spatio-Temporal Scale for NGB?

A recurrent message of the papers in this issue is that the scales of biomonitoring, both in terms of spatial extent and temporal frequency of sampling, need to be greatly enlarged if we are to appropriately monitor and assess the risks to ecosystems (see Ovaskainen et al., 2019), identify and evaluate the core drivers of ecosystem dynamics and stability, and make decisions for their management. This will require solutions to some of the practical framework problems that limit the scales of the current generation of biomonitoring approaches, including socio-economic, political, and local management issues.

Current biomonitoring is heavily skewed toward terrestrial Europe, North America, Australia, and New Zealand (Cavallo et al., 2019; McGee et al., 2019). This is due, in part, to a lack of expertise in biomonitoring and interpretation in many countries, a global shortage of finance, as well as a limited acceptance of conventional methods. One avenue that might contribute to a solution, besides better communication of NGB (see Question 1), is to simplify the biomonitoring process into component steps. NGB would consist of two essential steps: (1) sample collection and the detection of ecosystem change; and then, only where change is detected, (2) explanation and prediction. Such separation would greatly reduce the need for expertise in all parts of the globe. Automated and high-throughput sampling and detection of change would take place at large temporal and spatial scales, including parts of the globe with poor coverage at present (using field technicians, citizen scientists, or drones), with the expertise to explain any detected change being outsourced to regional centers of excellence, much as already exists for the World Health Organization Regional Offices and the networks of experts they support (WHO⁷). The two-stage process would also lower costs for a given scale of coverage, thus making better use of the available finance. A challenging framework question will be what the definition of “change” is, which may vary between different countries and regions. Having the necessary, near-real-time assessments of change is something that is currently only achievable using the NGB approach.

While it is clear that scalability and reusability of global biomonitoring data are necessary to answer large-scale ecological management questions, this can only be achieved where the steps of sampling and detection of change are automated and standardized, making data machine-readable so that information from different systems is comparable and shareable, and can be integrated with other, existing sources of environmental and ecological information (Poisot et al., 2016, 2019). Automating the process of taxa identification, network construction and inference, and comparison to reference states will require considerable technological development (Bohan et al., 2017; Lausch et al., 2018).

Environmental DNA (eDNA) describes genomic materials shed from organisms into their environment that represent the “template” for NGB analysis. eDNA data quality can be influenced by almost every step in the taxa identification workflow (Zinger et al., 2019), from sample collection (Dickie et al., 2018), DNA extraction (Lear et al., 2018), choice of gene or target region, selection of Taq polymerase, polymerase chain reaction (PCR) cycling protocol, primer, choice of sequencing platform, bioinformatic pipelines (Deiner et al., 2017; Makiola et al., 2018; Bush et al., 2019a; Pauvert et al., 2019), and taxonomic reference databases utilized (Porter and Hajibabaei, 2018). These potential challenges compound with the myriad context-specific influences on the ecology of eDNA, such as abiotic and biotic influences on eDNA production, degradation, and transport in the environment (Barnes and Turner, 2016). Standardization or calibration of sampling protocols and other methods in the workflow can improve reproducibility by allowing compilation and comparison of data from across studies (Dickie et al., 2018). Such standardization can be attractive for the majority of users, being both cheap and efficient, even where their research needs differ, as has been successfully demonstrated in The Earth Microbiome Project⁸ (Thompson et al., 2017) and the Global ARMS (Autonomous Reef Monitoring Structures) Program⁹ (Ransome et al., 2017).

To tap the full potential of biomonitoring data, it will be necessary to improve curation and access to the rich reference datasets that have already been generated. Due in part to specific institutional regulations, there is a lot of genetic reference material that is only available to researchers within certain institutions. Since molecular-based identifications are heavily dependent on the quality and completeness of the reference databases, this research field will collectively benefit from incentives to curate and upload reference sequences to publicly available databases. Ensuring that these datasets are available in a usable format to interested researchers across the globe represents a major challenge to the field, but one which must be met in order to address global changes in biodiversity and species distribution (Poisot et al., 2016, 2019; Desjardins-Proulx et al., 2019). The definition of the ontologies that will allow NGB data to be machine-read and automated, assuring quality control and the integration of metadata from biomonitoring and associated disciplines, has begun but requires large-scale adoption across fields to be useful.

Knowledge from existing sources (e.g., remote sensing, chemical screening, trait databases) could be integrated into NGB via machine-readable ontologies to generate data synergies and explore novel ecological questions (Bohan et al., 2017; Lausch et al., 2018). For example, this approach could be used to supplement DNA taxa lists with functional trait information for the development of more advanced, predictive heuristic network models (sensu, Compson et al., 2018), while simultaneously creating new—and supplementing existing—databases of taxonomic traits, such as organismal body size or trophic linkages (Kissling et al., 2018). Since the integration of multiple traits and bioindicators holds one of the biggest potential synergies, a possible answer to this question could be working with other initiatives, such as GLOBIS-B, GEO BON, GBIF (Canhos et al., 2015), and the Aquatic eDNA Atlas Project¹⁰, as noted in Question 1, toward a common, decentralized, global biodiversity data platform.

What Is the Most Productive Balance Between Case-Specific and Generic NGB Methodologies?

One promise of NGB is to provide general biomonitoring methodologies and comparisons across potentially any ecosystem, including those currently poorly studied or unknown. The search for rationalized, common approaches has begun in certain disciplines, including in aquatic environments (Goldberg et al., 2016), but as the field matures more general guides or approaches may be achievable. Ecosystems are ecologically distinct, but each has unique scales of operation that should be reflected in the spatial scales, frequencies, and replication of sampling. The scales of application of biomonitoring are currently constrained by the methodology used, with most survey methods designed to assess local taxonomic groups of interest. This leads to methodological heterogeneity across regions (Borja et al., 2009; Birk et al., 2012), encumbering efforts to scale up to regional or national levels (Voulvoulis et al., 2017).

Two approaches might be adopted to standardize NGB methodologies. The first would be to sample at the finest spatial resolution possible—at high frequency, in any or all ecosystems across the globe—to store copious amounts of data and to invest in the computational hardware and bioinformatics to detect, forecast, and monitor change. This approach would produce datasets that are both close to complete and an invaluable monitoring and ecological resource, with as yet unforeseen benefits, but the data would come at the cost of collection and curation that may not warrant the increase in efficacy, especially where the detection of system change or changing processes does not require such high-resolution data. However, with plummeting costs this approach will likely be increasingly feasible in the future.

The alternative approach would build upon generic expectations of the rate and temporal dynamics of change in order to identify the required frequency of sampling. The spatial scales of sample independence and representation might then be identified across examples of the ecosystem, indicating appropriate levels of replication to assure detection, with an appropriate power, of given levels of acceptable change. Ma et al. (2018a) described generic, multiscale approaches adopted from the theory of networks to examine temporal and spatial variation. These approaches treat network structure as essentially being independent of the taxa involved in the networks, and use network profiling, null models, and multilayer networks to make statements about the expected level of change that is and is not acceptable in pure network structural terms. This information can then be fed into ecological modeling and robust forecasting studies.

A standardized but general methodology for sampling would maximize scalability, interpretability, and impact of NGB. It is unlikely, however, that the specification of sampling would conveniently lead to a common set of results for all ecosystems to be examined. Rather, any generality that might exist would likely be limited to some combination of the biome being sampled (i.e., air, soil, water), and the organizational (i.e., regional and local networks, communities, species, populations, individuals, or genes) and taxonomic levels. Generality may only be delivered by an ecological understanding of ecosystem structure, probably facilitated by network approaches.

What Are the Appropriate Indicators of Change?

To move biomonitoring forward, science and policy need to explore how: (1) NGB information could lead to new indicators for metacommunities; (2) novel indicators build upon and contribute to existing indicators and frameworks (e.g., Tapolczai et al., 2019); and (3) spatio-temporal metacommunity scales influence the interpretation of these novel indicators. The indicator concept proposes that the ecological state of an ecosystem can be evaluated by observing a particular taxon or taxonomic group or function (De Cáceres and Legendre, 2009). Taxon-free indicator metrics, such as Indices of Biotic Integrity (IBI), are appealing to environmental practitioners and policymakers because they distill a lot of information down to a simple metric that, in principle, can be compared across systems. However, their simplicity is likely the reason why such metrics may be misused in practice (Seegert, 2000). Further, while indicator species or IBIs might be useful at local spatial scales, they are not applicable across the many habitats, ecosystems, or biomes (Angermeier et al., 2000) that can be monitored using next-generation methods. Pairing molecular-based approaches with machine learning for NGB can potentially recover orders of magnitude more information in biomonitoring data, thus eliminating many of the constraints that hindered the development of biomonitoring indicators we use today. For example, building ecological networks from this recovered data might be used to analyze whole-network properties with ecosystem functions and services (Evans et al., 2016), providing a mechanistic link between network structural change and ecological functions. There certainly is a lot of work to be done to explore and develop these higher-level, network indicators, as well as to determine which network properties will be useful for predicting ecosystem consequences to environmental change. Once developed, however, these tools should provide immediate added-value to the taxonomic lists generated by NGB, as well as to the classical, biomonitoring approaches, especially considering the cost effectiveness of routine, open-source pipelines for the rapid calculation of such (e.g., ecological network) indicators.

Scaling up from a local- to a large-scale approach should furthermore incorporate recent advances in metacommunity ecology into biomonitoring, in order to make sense of the connections that exist among communities across landscapes. Leibold and Chase (2017) expounded the compelling argument that we should combine previously competing concepts of community assembly, such as neutral theory, species sorting, patch dynamics, and mass effects into a single, overarching theory. Ecosystem biomonitoring is strongly rooted in local observation and a normative interpretation, yet it often fails to take into account spatio-temporal variability and connections among sampled localities, arguably leading to over-interpretation of local-scale deviations from a putative “normal state” (Baattrup-Pedersen et al., 2017). We may also underestimate the influence of metacommunity effects on the drivers of local dynamics and, consequently, biomonitoring observations. The scale-limited spatio-temporal scope of biomonitoring studies also carries a serious risk of missing large-scale phenomena that could have potentially devastating impacts, such as biological invasions (Kamenova et al., 2017) or global declines in insects (Hallmann et al., 2017) that went largely unnoticed in policy for nearly 30 years (IPBES¹). DNA-based approaches offer a potential avenue to address this challenge, and we should seize the opportunity to both develop NGB methods by further refinement and testing and promote these methods to policymakers, citing their many benefits.

How Will NGB Benefit From Machine-Learning Approaches?

Statistical methods for extracting information from data represent some of the basic tools that ecologists wield. Standard statistics are used to explore the covariation between dependent and independent variables and to test hypotheses of interaction. Machine-learning approaches work analogously, exploring the probabilistic or logical correlations across matrices of species data. Machine learning of networks has been successfully applied to classical, macro-ecological sample data (e.g., Bohan et al., 2011) and to evaluate ecosystem responses to changed management (Ma et al., 2018b). In contrast, the reconstruction of microbial networks or the inference of networks and trophic links from DNA data has proven to be more difficult (Barner et al., 2018; Freilich et al., 2018; Deagle et al., 2019), with results that appear to depend upon a combination of the machine learning technique and the data used. No one algorithm will work best for every problem, mirroring the “no free lunch” theorem of Wolpert and Macready (1997). The rhetorical question, “How will NGB benefit from machine-learning approaches?,” is one that we can answer only by continual work to further develop and integrate ever better learning approaches into ecology and biomonitoring.

Because NGB represents an emerging field, it is useful to look at examples where machine learning and metabarcoding have been successfully combined. Naïve Bayesian and random forest classifiers have been used to make taxonomic assignments from metabarcodes, produce statistical measures of confidence, and reduce rates of false positive identifications (Wang et al., 2007). Supervised machine learning has been used to classify environmental samples in a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project (Thompson et al., 2017). Recently, eDNA datasets have been analyzed using supervised machine learning to predict the status of aquatic ecosystems (Cordier et al., 2018). The combination of taxonomy-free molecular data and machine-learning techniques outperformed biomonitoring methods based on the screening of known indicator species by classic metabarcoding (Cordier et al., 2018).

Moving toward the reconstruction of networks of explicit interactions is a logical next step that would afford an ecological explanation of change. Such ecological network reconstruction would require the incorporation of background knowledge or information, for example, about species traits or existing interactions (Tamaddoni-Nezhad et al., 2013, 2015). Taxon interaction knowledge can be text-mined from direct observations recorded in the literature, or inferred from published trait information, and, when used to reconstruct interaction networks such as food webs, offer the potential to generate new biomonitoring metrics derived from network properties (Compson et al., 2018). Recent results suggest that, in the absence of background information, model-free inference of network structure is also feasible using information from the overall network structure and those interactions that are known (Stock et al., 2017). Hypotheses or explicit models for how species interact can also be incorporated into machine learning as background knowledge (Tamaddoni-Nezhad et al., 2013, 2015). As symbolic representations of interactions, these hypotheses and models have the benefit of rendering the machine-learning output human-comprehensible and explainable for decision-making and prediction (Muggleton et al., 2018). The challenge for this model-based approach is that we have relatively few symbolic descriptions of species interactions for organisms, especially in understudied biomes. While there are rules for trophic interactions between macro-organisms, for example, based upon body- or gape-size (Jonsson et al., 2018), there are few such rules for microorganisms. The generation of hypotheses for potentially new mechanisms of interaction in understudied systems could also be supported by artificial intelligence: first, using text mining to recover information about taxa and functions that is not readily accessible from reference databases like Global Biotic Interactions (GloBI) or the United States Geological Survey (USGS) traits database; and then by employing machine learning, such as Meta-Interpretive Learning (Tamaddoni-Nezhad et al., 2015), to hypothesize interaction rules that explain the text-mined information and metabarcoding data.

Considerable amounts of this kind of information exist in literature databases such as Google Scholar, Academic Search Premier, and Web of Science. Unfortunately, the publishing rights to these data are often difficult for scientists to disentangle, and the various text-mining exercises that have been conducted have been treated as hacking attacks, which are resisted. Until these publishing rights are relaxed, such as is proposed in Europe (Enserink, 2018), populating many ecosystems with biological and functional information will remain a limitation.

What Are the Key Technical Challenges to the Advancement of NGB?

NGB aims to detect and explain changes in the total biodiversity of ecosystems to understand and predict the ecological structure of ecosystems. This requires that NGB methods generate accurate data for the presence, absence, and abundance of taxa. Uncertainty in the detection of a taxon, as false negatives or positives, can lead to erroneous conclusions with consequences that could impair biomonitoring and decision making. As noted in Question 3, detection uncertainty can arise from multiple sources, such as sampling, laboratory, and bioinformatics, and these have been extensively reviewed elsewhere (e.g., Deiner et al., 2017; Knight et al., 2018; Larsson et al., 2018; Lear et al., 2018; Porter and Hajibabaei, 2018; Zinger et al., 2019). Work to reduce rates of false negatives and positives in DNA metabarcoding data is an active field of research, and progress has been made through using occupancy modeling (Ficetola et al., 2015, 2016) and probability distribution modeling for tag jumping and contamination issues (Larsson et al., 2018).

The next logical step is to ask whether DNA concentrations in the environment relate to organismal abundance or biomass. The question is intuitive, in the sense that a greater abundance or biomass of organisms should, in principle, produce a higher concentration of DNA, but as with detection uncertainty DNA concentration is determined by many other factors. Studies have demonstrated that the relative abundance of an organism between samples can relate to eDNA concentrations (Takahara et al., 2012; Thomas et al., 2016; Piñol et al., 2019). However, the leap from relative abundance to absolute abundance (or anything close) has been confounded by multiple effects, including an inability to distinguish between live and dead biomass, the observation that different age classes of the same organism release DNA at different rates into the environment (Maruyama et al., 2014), and an increased awareness of the complex environmental interactions of eDNA, relating to its origin, state, transport, and fate (Cristescu and Hebert, 2018). How to treat read count data is critical now that microbiome datasets are understood to be compositional in nature and sensitive to library size and several other biases (Gloor et al., 2017). For NGB it is clear that we need to establish how DNA technologies relate to absolute organismal abundance and how we can minimize methodological biases through best practices (e.g., Knight et al., 2018). However, the debate about the confidence to be invested in metabarcoding data will likely continue until we attain technical advances, such as PCR-free sequencing systems, curated and complete reference databases, and modeling that can explain and correct for errors.

How Can NGB Be Applied to Risk Management?

With further development of NGB, multiple lines of evidence and data will need to be combined in real time to provide managers with cost-effective tools needed to make robust decisions and mitigate impacts on the natural environment. To incorporate these multiple sources of information and move beyond purely descriptive models of ecosystem structure and change, such as eDNA-derived lists of taxa and co-occurrence networks, it will be necessary both to develop explanatory and predictive models of ecosystem function and services, and to test, explore, and understand these models, possibly using developments in text-mining (Compson et al., 2018) and Explainable Artificial Intelligence (Miller, 2019; Rudin, 2019).

As the “universe of observation” (Bush et al., 2019b) expands toward a more integrative ecosystem approach, driven by the growing capacity of molecular and analytical methods, it remains unclear what amount of information will be needed to make good management decisions. For example, how much do we benefit if we incorporate all possible data, or do we just add noise? The application of DNA-isolation from bulk environmental samples or mixed communities coupled with high throughput sequencing and automated taxonomic assignment removes many of the taxonomic constraints currently hindering biomonitoring, particularly for multiple trophic groups and otherwise cryptic groups of organisms (Hug et al., 2016). Increasing taxonomic resolution and greater sampling intensity expands the number of observed biological units. This greater volume of information will also require a parallel expansion of our abilities to interpret biodiversity changes.

Artificial intelligence, in the form of machine learning algorithms such as Meta-interpretive Learning, can help process these large amounts of information and aid in hypothesizing explanatory models of interaction that humans can comprehend and machines can read symbolically (Tamaddoni-Nezhad et al., 2015). The explanations used in biomonitoring will evolve from existing concepts of ecosystem indicators and indices that do not attempt to explain the reason for changes in ecosystems (Derocles et al., 2018) toward models that provide a holistic view of ecological change, such as EBVs (Jetz et al., 2019); models that provide an understanding of the underlying mechanisms behind ecosystem functions; and models that recognize the complex and dynamic nature of ecosystems, including all trophic levels and their interactions. This evolution of biomonitoring, moving from a descriptive toward a predictive risk management tool, based on new hypotheses and models, will have the greatest impact on decision and policy making, which will in turn feed-back to biomonitoring.

Outlook Questions

To this point, the questions posed have focused on contemporary issues about the framework of NGB, as well as technical and conceptual challenges to implementing NGB (Figure 2). We also foresee rapid advancement in this field beyond what is needed to establish NGB as a biomonitoring approach, facilitating exploration of new frontiers of science and providing solutions to some of the problems we have outlined in this article. These are related, in large part, to rapid developments in computing and genomics. Specifically, we believe that three areas of advancement in biodiversity assessment and analytical capacity will drastically improve NGB: (1) advances in genomics tools that will lead to greater sequencing capacity, providing unprecedented recovery of information from DNA (Question 8); (2) advances in computing, bioinformatics, and open-source pipelines (Question 9); and, (3) improved models that will allow for more targeted use by practitioners interested in adopting NGB approaches (Question 10).

What Are the Most Promising Future Advancements in Genomics Tools?

Many widely used, next-generation sequencing technologies have attained greater sequencing depth (i.e., the product of the number of reads and the read length standardized to the genome length) despite using shorter read lengths by exponentially increasing the amount of sequences generated (Sims et al., 2014). We anticipate a next-next-generation revolution that will achieve whole genome sequencing for entire communities, with enough sequencing depth to provide information about individual sequence variation necessary to begin exploration of evolutionary and functional questions in conjunction with NGB. Already, technologies are emerging that provide orders of magnitude more sequencing depth than current platforms. For example, a single flow cell of Illumina's Novaseq platform can generate ~700 times greater sequencing depth than is typically available, allowing for the detection of dramatically more diversity, even at coarse taxonomic levels; standardizing sequencing depth using patterned flow cells further improves sequencing performance by preventing the merging of neighboring sequences (Singer et al., 2019). Eventually, as such platforms advance, shotgun sequencing will become the norm, and the need for PCR will be circumvented, eliminating many of the issues currently associated with sequencing and subsequent data processing. Such advances in sequencing capacity and error reduction will translate to higher detection probabilities, greater coverage of species, and better assessments of abundance and rare or endangered species in all systems, including those that are remote and difficult to access or under-studied. Additionally, we foresee three new frontiers of science that the added information from new sequencing technologies will enable us to explore.

First, greater sequencing depth across a larger complement of the community will make it possible to construct robust phylogenetic trees for entire communities, which will help advance NGB method development by providing better phylogenetic information for improving ecological information and prediction. The practice of metaphylogenetics is currently limited by short sequence read lengths (i.e., normally ~150–300 base pairs). Furthermore, PCR and primer choice can greatly influence the resultant community (Hajibabaei et al., 2019), leading to coarse and unresolved phylogenetic trees. With greater sequencing depth, these limitations will become a thing of the past, allowing for more robust phylogenetic analysis. Community assembly can only be understood in the context of species' evolutionary histories, and such an advance in phylogenetic community ecology will not only be crucial for advancement of ecological theory, but also improve the current standards of biodiversity assessment, allowing for a more holistic exploration of rare or unknown taxa in hyperdiverse, poorly studied ecosystems (Papadopoulou et al., 2015).

Second, future platforms will improve sequencing depth per individual such that it will soon be possible to assess intraspecific genetic variation in an assemblage. To date, studies of population genetics have been limited by primer development for target organisms, focusing on no more than a few taxa at a time in order to answer very specific questions. For example, mitochondrial metagenomics approaches that combine shotgun sequencing and DNA metabarcoding allow for read mapping that may provide the quantitative information on intraspecific genetic variation needed to assess population genetic structure (reviewed in Crampton-Platt et al., 2016). In combination with DNA metabarcoding (sensu, Elbrecht et al., 2018), these approaches would then make it possible to assess the genetic structure of any taxa of interest in the community, and enable practitioners to ask questions about the entire metacommunity and test macroecological theory (e.g., species-genetic diversity correlations).

Third, enhanced sequencing depth will allow for a wider exploration of functional genes in environmental samples. This would make it possible to map functional genes to taxa for entire communities of organisms, linking communities and networks with broad-scale ecosystem assessment of function. Recent efforts have attempted to utilize machine learning to link genus-level predictions of function in microbial communities, for example by using Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt), for inferential assessments of function and hypothesis generation (Douglas et al., 2018). With more sequence data and better inferential methodologies, machine learning in biomonitoring will progress. Concurrent efforts to expand and annotate functional gene databases (e.g., Kyoto Encyclopedia of Genes and Genomes, KEGG¹¹) are facilitating the mapping of genes to function across a wide range of biodiversity, bringing incredible added value to projects using the greater sequencing depth afforded by newer sequencing platforms. As these efforts advance, not only will metacommunity and ecosystem theory be advanced by linking structure to function at multiple scales of observation, but potentially transformative changes in biomonitoring and biodiversity assessment will occur, as functional profiles could have greater discriminatory power for detecting change compared to taxonomic profiles, especially in cases where taxonomic profiles are highly variable.

What Are the Most Promising Future Advancements in Computing and Bioinformatics?

With unprecedented data generation, NGB practitioners will be confronted with the enormous task of dealing with an overwhelming amount of information (Keck et al., 2017). Advances in computing and bioinformatics are required to maximize the use of this biodiversity information. Much work still needs to be done to test for and correct errors that inherently emerge from bioinformatics approaches (reviewed in Olson et al., 2017). One solution is to quantitatively assess genome assembly by incorporating evolutionary expectations of gene content, using single copy orthologs (Seppey et al., 2019). These problems of genome assembly and amplification bias will eventually be eliminated as whole-genome sequencing approaches are adopted, but this will, in turn, require even more sophisticated bioinformatics tools (e.g., NanoPack, De Coster et al., 2018).

Another area that will benefit greatly from advances in computing and bioinformatics is database generation, maintenance, and expansion. Existing taxonomic, trait, and functional gene databases (e.g., GenBank, GloBI, KEGG) are incomplete, and the task of updating and expanding these databases is daunting. Artificial intelligence could also be used to advance data discovery (Gonzalez et al., 2016; Compson et al., 2018). Text-mining pipelines, for example, currently make use of open-source, artificial intelligence tools (e.g., OrganismTagger: Naderi et al., 2011). The consequent improvements that these tools will make to taxonomic and functional databases will lead to further advancements of biomonitoring tools, such as cloud-based, rapid ecological network and food web construction, driving a virtuous cycle where more robust datasets lead to improved models.

The promise of these advancements will only be met, however, via improvements in data accessibility, data discoverability, and development of data standards. These will likely emerge from consortiums developing ontologies for genomics and other data (reviewed by Levy and Myers, 2016), as noted in Questions 1 and 3. More work needs to be done, in particular, to develop, peer-review, and publish open-source tools for bioinformatics pipelines (Mangul et al., 2019). Without parallel improvements in tool archival and version control, the improvements that should follow will be inconsistent, reducing their utility and widespread adoption. This work would likely be facilitated by open-source archival services (e.g., GitHub or SourceForge) or package managers (e.g., Bioconda, Grüning et al., 2018).

What Are the Most Promising Future Advancements in Modeling for Addressing Targeted Questions?

While genomic and technological advancements will affect the field of biodiversity assessment, advances in modeling will specifically help end-users, including regulators and resource managers, using NGB approaches. For example, as the costs of sample and bioinformatic processing reduce, more sophisticated hierarchical occupancy models could be applied to repeated sampling data to quantify detection probabilities and inform practitioners about the sampling effort required to answer system-specific questions. These models, which can account for multiple categorical factors influencing a response variable, can accommodate samples of repeated presence-absence data to provide estimates of occurrence and detection probabilities of species and communities, enabling to account for false negatives due to imperfect detection (Campos-Cerqueira and Aide, 2016; Steenweg et al., 2016), a limitation that is seldom considered in bioassessment studies (McClenaghan et al., 2019). Occupancy modeling could also provide a way past the critical limitation of current DNA metabarcoding—that of obtaining absolute abundance information. Applied hierarchical occupancy modeling has been used to address questions related to the detection and abundance of species (Kery and Andrew Royle, 2015), and future genomic and technical advancements will broaden the application of these models via the generation of larger datasets covering wider ranges and along more gradients of environmental change. Hierarchical occupancy models will enable further leveraging of these more robust datasets by incorporating variation in the pathway from sample collection to sequencing and bioinformatics. Detection probabilities, for example, can be built into Bayesian hierarchical models to detect probabilities associated with different primers, sequencing approaches, and other steps along the sampling-to-sequencing pathway (Doi et al., 2019), providing NGB practitioners with better experiments that make more efficient use of resources (Lugg et al., 2018).

As the field of NGB evolves, we foresee synergistic advancements from merging occupancy-modeling and machine-learning approaches with additional layers of information coded in DNA, recovered by improved sequencing technologies and greater sequencing depth. Incorporating relative read abundance information into occupancy models could be used to assess the abundance of functional gene classes in environmental samples. Shotgun sequencing will also remove the constraints and biases of PCR amplification of DNA, leading to better estimates of sample abundance and biomass (Bista et al., 2018). Much of this information could be incorporated into ecological networks and heuristic food webs to estimate interaction strengths and calculate probabilities of interaction (Morales-Castilla et al., 2015). Finally, with increases in occupancy and food web model sophistication, and as more data are generated that capitalize on these approaches, there will be increasing volumes of high-quality information to feed into machine learning algorithms, leading to more predictive modeling of diverse ecosystems and an unprecedented opportunity for NGB practitioners to anticipate change and prevent ecosystem impairment in real time.

Author Contributions

AM conceived, contributed and led the writing of the paper. DAB conceived, contributed and wrote the paper. ZC conceived, contributed and wrote the paper. All other authors contributed and wrote the paper.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

ABo and JP would like to acknowledge funding from the FEDER and Swiss Confederation through the SYNAQUA project. ABo, FK, VV, MK, JP, and TC would like to acknowledge fruitful discussions in the framework of the COST action DNAqua-Net (CA15219).

AD, KR, AT-N, and GW would like to acknowledge funding support from the UK Natural Environment Research Council (NERC-NE/M02086X/1 and NE/M020843/1).

DAB, AM, EC, PD, FM, SR, VR, and CV would like to acknowledge the financial support of the French Agence Nationale de la Recherche project NGB (ANR-17-CE32-011).

DAB, AM, SR, and CV would like to acknowledge the support of the Consortium Biocontrole, which provides funding for the BCMicrobiome project.

TMP would like to acknowledge funding from the Canadian government through the Genomics Research and Development Initiative, Ecobiomics project.

Footnotes

1. ^www.ipbes.net/assessment-reports (accessed May 30, 2019).

2. ^www.gbif.org (accessed May 30, 2019).

3. ^www.geobon.org (accessed May 30, 2019).

4. ^http://www.earthobservations.org/ (accessed May 30, 2019).

5. ^www.dnaqua.net (accessed May 30, 2019)

6. ^www.interreg-francesuisse.eu/beneficiaire/synaqua-synergie-transfrontaliere-pour-la-bio-surveillance-et-la-preservation-des-ecosystemes-aquatiques (accessed May 30, 2019)

7. ^www.who.int/ (accessed May 30, 2019).

8. ^www.earthmicrobiome.org (accessed May 30, 2019)

9. ^www.oceanarms.org (accessed May 30, 2019)

10. ^www.fs.fed.us/rm/boise/AWAE/projects/the-aquatic-eDNAtlas-project.html (accessed May 30, 2019).

11. ^www.genome.jp/kegg (accessed May 30, 2019).

References

Angermeier, P. L., Smogor, R. A., and Stauffer, J. R. (2000). Regional frameworks and candidate metrics for assessing biotic integrity in mid-Atlantic highland streams. Trans. Am. Fish. Soc. 129, 962–981. doi: 10.1577/1548-8659(2000)129<0962:RFACMF>2.3.CO;2