Edited by: Sandie M. Degnan, University of Queensland, Australia
Reviewed by: Matthew A. Barnes, Texas Tech University, USA; Simon Creer, Bangor University, UK
*Correspondence: Ryan P. Kelly
This article was submitted to Marine Molecular Biology and Ecology, a section of the journal Frontiers in Marine Science
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Given the rapid rise of environmental DNA (eDNA) surveys in ecology and environmental science, it is important to be able to compare the results of these surveys to traditional methods of measuring biodiversity. Here we compare samples from a traditional method (a manual tow-net) to companion eDNA samples sequenced at three different genetic loci. We find only partial taxonomic overlap among the resulting datasets, with each reflecting a portion of the larger suite of taxa present in the sampled nearshore marine environment. In the larger context of eDNA sequencing surveys, our results suggest that primer amplification bias drives much of the taxonomic bias in eDNA detection, and that the baseline probability of detecting any given taxon with a broad-spectrum primer set is likely to be low. Whether catching fish with different nets or using different PCR primer sets, multiple data types can provide complementary views of a common ecosystem. However, it remains difficult to cross-validate eDNA sequencing techniques in the field, either for presence/absence or for abundance, particularly for primer sets that target very wide taxonomic ranges. Finally, our results highlight the breadth of diversity in a single habitat, and although eDNA does capture a richer sample of the community than traditional methods of sampling, a large number of eDNA primer sets focusing on different subsets of the biota would be necessary to survey any ecological community in a reasonably comprehensive way.
Environmental DNA (eDNA)—genetic traces of organisms taken from a sample of water, soil, or other medium—is increasingly useful for ecological surveys, particularly where traditional sampling is difficult or expensive (Foote et al.,
A seemingly straightforward way of assessing the performance of the DNA surveys is to compare sequencing results to a known community (Kelly et al.,
Results from eDNA need to be comparable to—and integrated with—information from other, more traditional sampling methods, and therefore we must be able to understand them in the context of results from methods that have been the basis of ecology and environmental science up to this point. However, developing such an intuitive and integrative understanding is challenging, especially because in many cases we should expect different methods of environmental sampling to yield different results. Like any survey method, eDNA detects only a portion of the biodiversity present in the sampled location. In the marine realm, sampling tools and techniques such as settlement plates, various nets, and aerial surveys reveal quite different sets of taxa, producing a variety of ecological patterns that are not easily compared. Similarly, eDNA surveys may detect distinct suites of species that are not easily comparable to results obtained through manual or visual surveys (Shelton et al.,
As with traditional survey tools, eDNA studies should be explicit about the ways in which methodological choices determine the communities observed. One means of doing so is to survey multiple genetic markers (“loci”) simultaneously. Just as the results from any single genetic marker may differ from those of manual or visual sampling techniques, so too may the results differ among genetic markers (Cowart et al.,
Here, we present eDNA data from 3 genetic markers—16S, 18S, and COI—for water samples collected alongside a traditional tow-net survey of epifauna in nearshore eelgrass (
We sampled 4 sites in nearshore eelgrass (
As detailed in Samhouri et al. (in revision) and Kelly et al. (
We simultaneously collected 1-L water samples for eDNA analysis at each transect, immediately below the water surface using a ca. 3.3 m pole to hold the sampling bottle, with the goal of reducing human contamination. We kept these samples on ice until they could be processed in the lab (within hours of collection). We filtered the total volume of the samples (1 L) onto cellulose acetate filters (47 mm diameter; 0.45 µm pore size) under vacuum pressure, and preserved the filter at room temperature in Longmire's buffer following (Renshaw et al.,
We used three primer sets for eDNA amplicon analysis, each targeting a different gene region, with the goal of detecting large numbers of taxa present in the sampled environment, which we could compare against the manual tow-net surveys of macroscopic animals. The primer sets amplified two mitochondrial loci [16S, ca. 116 bp, (Kelly et al.,
We generated three (COI, 18S) or four (16S) PCR replicates for each of 18 water samples (3 samples per site, 6 sites) in order to assess variance due to amplification. We simultaneously sequenced four positive [Tilapia;
We processed the resulting sequence reads with a custom Unix-based script (O'Donnell,
We assigned a taxonomic name to each OTU sequence using blastn (Camacho et al.,
For clarity, in the following we refer to the results for community composition from the three loci and the single manual counts as four “datasets.”
We rarefied read counts of Family-level detections from each PCR replicate to allow for comparison across water samples and across datasets. For the three eDNA loci we generated 1000 rarefaction draws of each sample (ca. 3 × 10^4 reads each, reflecting the smallest sample size of the field samples) using the vegan package for R (Oksanen et al.,
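Rarefaction of this kind amounts to repeatedly subsampling reads, without replacement, down to a common depth. The following is a minimal Python sketch of that idea, not the authors' R/vegan code; the function name `rarefy`, the toy Family counts, and the depth of 100 reads are hypothetical values chosen only for illustration.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a dict of taxon -> read count."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical Family-level read counts for one PCR replicate.
sample = {"Balanidae": 500, "Mytilidae": 300, "Littorinidae": 5}

# 1000 rarefaction draws of the same sample at a common (toy) depth.
draws = [rarefy(sample, 100, rng) for _ in range(1000)]

# Fraction of draws in which the rare Family is still detected.
detect_rate = sum(1 for d in draws if d["Littorinidae"] > 0) / len(draws)
print(f"Littorinidae detected in {detect_rate:.0%} of draws")
```

Even in this toy case, a rare Family drops out of a sizeable share of the rarefied draws, which is why repeated draws rather than a single subsample are useful.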
We assessed the degree of taxonomic overlap among datasets by counting taxonomic Families occurring within and across datasets (Table
Different sampling methods capture different subsets of biodiversity, and each has taxonomic selectivity. To illustrate the biases of each of our methods, we mapped Family-level detections in each of our datasets onto a taxonomic tree. The taxonomic classification was determined using the NCBI Entrez Taxonomy Database (
We assessed taxonomic consistency for each pair of genetic markers in two ways, both using nonparametric correlation. First, for Families detected in both of a pair of loci, we asked whether the number of OTUs in each Family (
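As an illustration of the nonparametric correlation used here, Spearman's rho can be computed as the Pearson correlation of ranks, with tied values given their average rank. The sketch below is a minimal stdlib Python version under that standard definition; the function names and the OTU counts per Family are hypothetical, and it does not reproduce the authors' analysis.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical OTU counts for five Families detected at both of two loci.
otus_locus_1 = [1, 2, 3, 4, 5]
otus_locus_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(otus_locus_1, otus_locus_2)
print(round(rho, 3))
```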
We calculated the variance in OTU communities among PCR replicates (for the three eDNA loci), among samples within geographic sites, and among sites for each of our datasets with a PERMANOVA test on Jaccard distances among presence/absence versions of our non-normalized data. These variances are useful for parameterizing models that relate sequenced communities to biomass in the field (Shelton et al.,
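The distance underlying this test is simple to state: for two presence/absence communities, the Jaccard distance is one minus the ratio of shared OTUs to all OTUs observed in either. A minimal Python sketch of the pairwise distances follows; the replicate names and OTU labels are hypothetical, and the PERMANOVA itself (permutation of group labels over these distances) is not implemented here.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two collections of detected OTUs (presence/absence)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1.0 - len(a & b) / len(union)

# Hypothetical presence/absence data: OTUs detected in each PCR replicate.
replicates = {
    "site1_rep1": {"OTU1", "OTU2", "OTU3"},
    "site1_rep2": {"OTU2", "OTU3", "OTU4"},
    "site2_rep1": {"OTU5", "OTU6"},
}

# Pairwise distances, the input a PERMANOVA would partition among
# sites, samples within sites, and PCR replicates.
distances = {
    (i, j): jaccard_distance(replicates[i], replicates[j])
    for i, j in combinations(sorted(replicates), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 2))
```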
Species (or “site”) occupancy modeling provides an accessible framework for interpreting taxon detections in eDNA studies (Lahoz-Monfort et al.,
Occupancy modeling is typically considered in a single-taxon context. For example, in qPCR assays the probability of detection is strictly a function of the concentration of target DNA in a sample. By contrast, for sequencing studies occupancy models need to be extended to a multi-taxon community, and therefore “detection” of a given taxon depends upon a series of analytical steps (from amplification to annotation) as well as on the relative abundance of a sequence. Nevertheless, occupancy modeling can be applied to communities and is an appropriate framework for interpreting eDNA results from sequencing studies. An important component of interpreting the community observed from eDNA studies is then understanding how using multiple loci can affect the assessment of a given community. We performed simulations to compare the use of different loci singly or in concert to characterize a community from eDNA. We note that this occupancy modeling is a special case of the more general framework elaborated in Shelton et al. (
Our simulation supposed 100 species were present in a sampled environment. We let the probability of detecting each of those 100 species vary among species and among each of 5 genetic loci (labeled loci A–E), drawing the values of true-positive detection rates (
We simulated a single biological replicate (for example, a single bottle of water to be analyzed for eDNA) as a single draw from a binomial distribution, with the probability of each species being detected by each locus given by the values of
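A simulation of this general shape can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the Beta(0.5, 3) distribution used to generate per-species, per-locus detection probabilities is a placeholder assumption (the parameters are hypothetical), as are the function names and the number of simulated replicates.

```python
import random

N_SPECIES = 100
LOCI = ["A", "B", "C", "D", "E"]

def simulate_detection_probs(rng, alpha=0.5, beta=3.0):
    """Draw a detection probability for every species at every locus.

    The Beta(alpha, beta) shape is an assumption made for this sketch only.
    """
    return {
        locus: [rng.betavariate(alpha, beta) for _ in range(N_SPECIES)]
        for locus in LOCI
    }

def simulate_replicate(probs, loci, rng):
    """Species detected by at least one of `loci` in one biological replicate.

    Each species-by-locus detection is a single Bernoulli (binomial, n = 1) draw.
    """
    detected = set()
    for locus in loci:
        for species, p in enumerate(probs[locus]):
            if rng.random() < p:
                detected.add(species)
    return detected

rng = random.Random(42)  # fixed seed so the sketch is repeatable
probs = simulate_detection_probs(rng)

one_locus = [len(simulate_replicate(probs, ["A"], rng)) for _ in range(500)]
two_loci = [len(simulate_replicate(probs, ["A", "B"], rng)) for _ in range(500)]
mean_one = sum(one_locus) / len(one_locus)
mean_two = sum(two_loci) / len(two_loci)
print(f"mean species detected: 1 locus = {mean_one:.1f}, 2 loci = {mean_two:.1f}")
```

Because the detection probabilities are drawn independently for each locus, a species that one locus rarely detects may be readily detected by another, so adding a locus raises the expected number of species recovered from the same replicate.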
Our surveys revealed hundreds of taxa present across the sampled geographic sites in Puget Sound, representing many of the most common and characteristic animals from the nearshore habitats of the region. These included barnacles (Balanidae), mussels (Mytilidae), and snails (Littorinidae) of the upper intertidal zone,
Our tow-net surveys recovered 45 distinct animal taxa, of which 32 could be identified to Family level. eDNA surveys with PCR primers targeting three gene regions (16S, COI, and 18S) resulted in sequences detecting a total of 1.8 × 10^4 unique OTUs, of which a total of 1.3 × 10^4 could be classified by matching known sequences in GenBank (Table
Metric | 16S | COI | 18S | Manual | Total |
Total OTUs | 865 | 2,059 | 15,218 | – | 18,142 |
Annotated OTUs | 391 | 425 | 13,008 | – | 13,824 |
OTUs annotated to family | 257 | 319 | 10,905 | – | 11,481 |
Unique families | 54 | 99 | 261 | 32 | 366 |
Of those that could be annotated, the eDNA samples represented a total of 366 unique taxonomic Families across the Eukarya, including representatives from 33 Phyla from 9 Kingdoms (Table
Each dataset showed a distinct phylogenetic selectivity, consistent with the idea that primer-site mismatches are likely to be driving the observed patterns (Figure
Because each data type reveals a nearly non-overlapping set of taxa, the accumulation curves reflect the idea that adding data types dramatically increases taxonomic coverage (Figure
Where a taxon was detected by more than one eDNA marker, we asked whether the different eDNA markers yielded consistent estimates of diversity within the taxon. The only pair of markers that yielded consistent results among the taxa shared was COI-18S. Families detected by both 18S and COI had correlated numbers of OTUs (Spearman's rho = 0.37,
Between-marker correlations in read-counts-per-taxon mirrored the OTUs-per-Family results, with the 18S and COI primer sets reflecting correlated numbers of reads for the Phyla, Classes, and Families detected at both loci (rho = 0.74, 0.56, and 0.39, respectively;
Apportioning the variance of a given sampling method's results is critical to interpreting those results, to understanding the processes that might generate the observed data, and to comparing results across data types. Given the exponential nature of PCR amplification and the possibility of stochastic amplification of any given taxon with a given primer set, it is particularly important to assess the variability among PCR replicates in eDNA surveys.
We used a permutation-based analysis of variance (PERMANOVA) to apportion the variance in Jaccard distances among our geographic sites (
Among sites | ||||
Among water samples within sites | ||||
Among PCR replicates (residual variance) | – |
Variance among PCR replicates appears largely due to stochastic amplification and sequencing among rare OTUs. For example, the most common 10% of 16S OTUs had a mean among-replicate Bray-Curtis dissimilarity of 0.025 (SD = 0.022), while the least common 10% had a mean Bray-Curtis dissimilarity of 0.741 (SD = 0.026). We observed the same trend at each locus, with dissimilarity increasing in a saturating curve as OTUs became rarer (Figure
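For reference, the Bray-Curtis dissimilarity between two vectors of read counts is the sum of absolute differences divided by the sum of all counts. The Python sketch below applies that standard definition to hypothetical read counts for one common and one rare OTU in two PCR replicates; it illustrates the index only and does not reproduce how the authors binned OTUs by abundance.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length vectors of counts."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / total

# Hypothetical read counts in two PCR replicates of the same water sample.
common_rep1, common_rep2 = [1000], [980]  # an abundant OTU
rare_rep1, rare_rep2 = [2], [0]           # a rare OTU that drops out of one replicate

print(round(bray_curtis(common_rep1, common_rep2), 3))
print(round(bray_curtis(rare_rep1, rare_rep2), 3))
```

A loss of two reads is negligible for the abundant OTU but makes the rare OTU maximally dissimilar between replicates, which is the pattern described above.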
Our empirical results provide reasonable starting values for a straightforward simulation of the number of taxa one might expect to recover with a given number of biological replicates and a given number of genetic loci. Two pertinent observations arise from our empirical data: first, the probability of detecting (
For a hypothetical community of 100 species, we simulated eDNA detections for each of five hypothetical genetic loci (see Section Methods). Each locus had a different probability of detecting each of the species present (i.e.,
The simulation results suggest several practical conclusions for real-world eDNA surveys. First, increasing the number of loci used is likely to increase taxonomic coverage more effectively than increasing the number of biological replicates sequenced with a given locus. For example, using two loci rather than one approximately doubles the expected taxonomic coverage in a single replicate (median 29 species detected vs. 15; Figure
The attraction of eDNA sequencing for ecological surveys is the ability to detect hundreds of eukaryotic organisms from a water sample, but the interpretation of eDNA data relative to the results of established ecological sampling methods is a nascent endeavor (Shelton et al.,
Ecology and related disciplines depend upon techniques to sample and describe communities, ecosystems, and their properties, and every such technique is selective. For example, larval settlement plates may capture barnacle larvae and bryozoans, but tell us nothing about the sea lions swimming nearby. Often this selectivity is quite intentional (perhaps we care about bryozoans, and not about sea lions), but where unintentional, such selectivity may provide a misleading picture of the community in ways that often remain unexplored.
The organisms we detected depended strongly upon the survey method (e.g., eDNA vs. manual counts) or the PCR primers used (e.g., 16S vs. 18S eDNA sampling), and each detection method had strong taxonomic selectivity. Most relevant to eDNA studies, our data suggest that different primer sets reveal different draws from a common pool of species represented in the sampled bottle of water. Given the taxonomic selectivity apparent among eDNA markers, our results indicate that differences in detections are most likely driven primarily by interactions between primer and template DNA rather than by factors that govern the abundance of eDNA in the environment, such as organismal DNA shedding rates. If differences in DNA shedding drove detection differences, we would expect the same taxa (those with high shedding rates) to be detected across different loci and to have correlated numbers of OTUs per locus; neither prediction is borne out in our data.
The different community-level views revealed by manual and eDNA sampling underscore the importance of complementary sampling methods for ecology, given that any one set of samples yields a necessarily selective view of the world; 10 different sampling methods can yield 10 different results even with small numbers of target taxa (Valentini et al.,
We used the comparison between different survey methods to derive estimates of eDNA detection probabilities for animal taxa. Our results suggest that detection rates for particular taxa are likely to be low (
Our empirical datasets reflect the taxonomic selectivity of the different sampling methods and of the different eDNA primer sets. For example, the manual tow-net samples contained only macroscopic animal taxa, while animals are only one of at least eight Kingdoms detected by the 18S primer set. Hence, for a given sequencing depth, the 18S primer set is likely to detect only a small percentage of the traditionally sampled set of focal taxa (and hence
These detection estimates give us a starting point for developing an intuition for eDNA sequencing survey results. Namely, that primer-by-species
Multiple detections of the same taxon with different primer sets—or different sampling techniques—represent independent detections of that taxon. As a result, multiple data types are useful for evaluating the probability that a particular taxon is or is not present. One implication of this straightforward observation is that multiple eDNA markers can be useful for greatly improving estimates of the probability that a species is present, or for rapidly increasing taxonomic coverage. This is particularly useful when water samples are scarce or expensive; multiple genes provide an inferential benefit even when drawn from the same sample of water. Moreover, because our simulation assigned detection probabilities to each locus independently and with generally low probabilities, the result is a conservative estimate of the value of multi-locus data. The value of additional loci would be greater where loci are more taxon-specific.
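If detections at different loci are treated as independent, the probability that a taxon is detected by at least one locus is one minus the product of the per-locus miss probabilities. The Python sketch below works through that arithmetic; the function name `p_detect_any` and the per-locus probabilities are hypothetical, and the independence of loci (and of replicates) is an assumption of the sketch.

```python
from math import prod  # available in Python 3.8+

def p_detect_any(probs, replicates=1):
    """Probability that a taxon is detected at least once.

    `probs` holds per-locus detection probabilities for a single replicate;
    loci and replicates are assumed independent.
    """
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    p_miss_once = prod(1.0 - p for p in probs)  # missed by every locus
    return 1.0 - p_miss_once ** replicates

# Hypothetical detection probabilities for one taxon at three loci.
single = p_detect_any([0.05])
combined = p_detect_any([0.05, 0.2, 0.1])
print(round(single, 3), round(combined, 3))
```

Under these assumptions, a taxon with low detection probability at every individual locus can still have a substantially higher chance of appearing in the combined dataset.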
We emphasize that the absolute values of detection probabilities are highly context-specific. For example, greater sequencing depth is likely to increase the probability of a true positive detection (
The occupancy simulations also underscore a final point about the use of multiple loci—or indeed, any set of multiple survey methods, molecular or not—to canvass biological diversity in an area. Our simulations suggest that some fraction of species will remain undetected even with intensive molecular surveys. This result of course depends upon the specified detection probabilities and species-by-primer interactions, but is a useful result to highlight the fact that no ecological sampling method is likely to reveal the whole of a community. We see this as a particularly relevant lesson as standardized techniques for biodiversity assessment (e.g., autonomous reef monitoring structures, ARMS) become more common around the world (Duffy et al.,
Two salient observations arise from our assessment of variance in the eDNA datasets, particularly when combined with those reported for a fourth genetic marker in Port et al. (
Such a result is expected if the detectable diversity in a water sample is far greater than that which is amplified in any given PCR reaction, due to stochasticity in early PCR cycles. Primers bind to only a small fraction of the total number of potential taxonomic targets during the PCR reaction, and the result upon sequencing is a different sampling of taxa, even among technical replicates derived from the same field sample. We would expect greater stochasticity among rare OTUs than among common ones as a result of this sampling effect, and our data and others' (Zhou et al.,
The corollary observation is that higher-variance markers are likely to have lower detection probabilities (
To interpret eDNA results confidently in the context of existing ecological study, it is necessary to compare the results of this emerging technique with those of more established methods of ecological sampling. We find that (1) consistent with previous results, eDNA captures a far broader selection of taxa than the accompanying manual survey, (2) the ecological communities detected vary dramatically among eDNA markers and between survey techniques, and (3) despite detecting a total of over 300 taxonomic Families across three eDNA markers, the genetic survey did not come close to detecting most of the eukaryotic diversity present. For example, only about one-third of manually collected Families were present in the eDNA survey, and the Family-accumulation curve suggests that many more markers would be necessary to carry out a near-exhaustive sampling.
These results highlight the value of using multiple methods in ecological surveys, given that any one sampling method (even eDNA, which can reveal hundreds of taxa present at a location) unavoidably reflects only a small fraction of the true biological diversity present in the environment. Consequently, a single method or genetic marker may reveal ecological trends important in some (detected) taxa, but these trends may not necessarily be general across groups. Accordingly, many microbial and eukaryotic studies grounded in genetics-based observations of the environment may have reached questionable conclusions to the extent that these conclusions rest on a small and non-random portion of much larger ecological communities. Our results put eDNA in the company of other ecological survey techniques, in that these emerging methods reveal large (but incomplete) swaths of biodiversity for which traditional surveys provide valuable context.
The article's supporting data, metadata, and analytical code can be accessed as supplemental information at
All authors made substantial contributions to conception and design, acquisition of data, and analysis and interpretation of data, and all authors approved of the final manuscript version submitted for publication.
This work was supported by grant 2014-39827 from the David and Lucile Packard Foundation to RK and by NASA grant NNX14AP62A “National Marine Sanctuaries as Sentinel Sites for a Demonstration Marine Biodiversity Observation Network (MBON)” funded under the National Ocean Partnership Program (NOPP RFP NOAA-NOS-IOOS-2014-2003803 in partnership between NOAA, BOEM, and NASA), and the U.S. Integrated Ocean Observing System (IOOS) Program Office. This paper is a contribution to the MBON program.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank N. Lowell, S. Hennessey, G. Williams, and B. Feist for all of their help throughout; J. Port, L. Sassoubre, and A. Boehm; A. Stier, and P. Levin; M. Dethier, E. Heery, J. Toft, and J. Cordell; R. Morris and V. Armbrust; J. Kralj; A. Wong, E. Garrison, J. Levy, M. Klein, and E. Buckner; coastal property owners for access to field sites; and the Helen R. Whiteley Center at Friday Harbor Laboratories, and two reviewers. We also thank the Marine Biodiversity Observation Network (MBON) eDNA team. JS thanks H. Bolt for inspiration. The views expressed herein are those of the authors and do not necessarily reflect the views of BOEM, NASA, NOAA or any of their sub-agencies. The US government is authorized to reproduce this paper for governmental purposes.
The Supplementary Material for this article can be found online at: