Diversity Metrics Are Robust to Differences in Sampling Location and Depth for Environmental DNA of Plants in Small Temperate Lakes

Environmental DNA (eDNA) analysis methods permit broad yet detailed biodiversity sampling to be performed with minimal field effort. However, considerable uncertainty remains regarding the spatial resolution necessary for effective sampling, especially in aquatic environments. Also, contemporary plant communities are under-investigated with eDNA methods relative to animals and microbes. We analyzed eDNA samples from six small temperate lakes to elucidate spatial patterns in the distributions of algae and aquatic and terrestrial plants, using metabarcoding of the Internal Transcribed Spacer-1 (ITS1) genomic region. Sampling locations were varied across horizontal and vertical space: sites in each lake included a mixture of nearshore and offshore positions, each of which was stratified into surface (shallow) and benthic (deep) samples. We detected the expected community variation (beta diversity) from lake to lake, but only small effects of offshore distance and sampling depth. Taxon richness (alpha diversity) was slightly higher in nearshore samples, but displayed no other significant spatial effects. These diversity metrics imply that plant eDNA is more evenly distributed than its generating organisms in these small lake environments. Read abundances were heavily weighted toward aquatic macrophytes, though taxon richness was greatest in the algae and other non-vascular plants. We also identified representatives of many phylogenetically and ecologically varied plant taxa, including terrestrial species from surrounding areas. We conclude that freshwater plant eDNA surveys successfully capture differences among lake communities, and that easily accessible, shore-based sampling may be a reliable technique for informing research and management in similar ecosystems.


INTRODUCTION
Questions and experiments in ecology rely on knowledge of what organisms are present in a system, and how their patterns change over space and time. Moreover, the task of monitoring biodiversity, during an unprecedented rate of global extinction, is vital for all the life sciences and the quality of human life (e.g., Díaz et al., 2006). Biodiversity sampling has traditionally involved observation, trapping, and other direct census methods (Trolliet et al., 2014;Olinger et al., 2017;Lõhmus et al., 2018), which are biased toward organisms that are easy to locate and identify (Bosch et al., 2017;Wheeldon et al., 2019).
To support and complement traditional approaches to ecosystem monitoring, new tools have been developed over the last decade, including field detection technologies and lab-based high-throughput molecular techniques (Egan et al., 2013(Egan et al., , 2015Larson et al., 2020). One such technique is the analysis of DNA extsracted from bulk environmental samples of air, water, or soil, which is known as environmental DNA or eDNA. This analysis is currently accomplished either by high-throughput sequencing or targeted PCR, and can target narrow or broad ranges of organisms in many different habitats (Lodge et al., 2012;Creer et al., 2016;Deiner et al., 2017a;Cristescu and Hebert, 2018). However, in the aquatic medium, water and organism movement can confound eDNA-based efforts to determine the spatial distribution of organisms. This spatial uncertainty is one of the most significant remaining challenges for eDNA methods when they aim to characterize aquatic communities in detail Deiner et al., 2017a).
Lake habitats and their photosynthetic inhabitants are crucial elements of many ecosystems, and are both involved in and threatened by a multitude of human activities (Tickner et al., 2020). These factors make them candidates for improved monitoring methods (Mantzouki et al., 2018). This study constitutes one of only a few so far to systematically investigate community-level aquatic plant assemblages using eDNA analysis of easily accessible water rather than sediment (e.g., Alsos et al., 2018), even though ecological monitoring is known to be most effective when it considers multiple taxa (Oertli et al., 2005) and aquatic plants are known to be important proxies for lake health (Hatzenbeler et al., 2004).
Despite the foundational importance of freshwater plants to many ecosystems, aquatic eDNA work on them has been largely confined to PCR-based detection of invasive or elusive species Kuehne et al., 2020) or attempts to measure the quantity and distribution of such species (Gantz et al., 2018;Chase et al., 2020;Kuehne et al., 2020). For instance, eDNA monitoring can detect low abundances of algal species responsible for harmful algal blooms (Keller et al., 2017). However, aquatic plants share many of the same detection challenges as taxa that have received more active attention from the eDNA analysis community: they can be rare, cryptic, and/or difficult to identify precisely by morphology alone (Fahner et al., 2016;Bolpagni et al., 2018).
The spatial aspects of aquatic eDNA sampling have been widely investigated, but thus far without conclusive and generalizable results. When sequencing entire (microscopic) organisms captured in an environmental sample, it is clear that they were present at the actual place and time of collection. However, much of the DNA recovered in aquatic environments from non-microscopic plants and animals is in the form of extraorganismal DNA: shed tissues, cells, or organelles Deiner et al., 2017b;Lacoursière-Roussel and Deiner, 2019). This material travels freely through the environment in a manner similar to that of other inert small particulates Kelly et al., 2018;Nguyen et al., 2020), limiting our ability to determine when, or even if, the source organism was actually present at the location of sampling. The relationship of eDNA sampling to physical space is highly system-specific, depending both on physical aspects of the system such as lotic or lentic hydrology Bedwell and Goldberg, 2020) and on various properties of the targeted organisms (Deiner et al., 2015).
The primary goal of this study is therefore to determine how choices of sampling location and depth affect results of eDNA metabarcoding for photosynthetic communities in small temperate lakes. Because the distributions of many rooted plant species within lakes are complex and yet well-understood based on traditional sampling approaches (Sand-Jensen et al., 2019), we seek to compare the behavior of their eDNA to that of eDNA from non-sedentary organisms such as unicellular algae and photosynthetic protists. We sample a broad array of plants, including algae, from six well-studied natural research lakes in the northern United States. We divide each lake into four "compartments" defined by two spatial variables: surface and benthic sampling depths, and nearshore and offshore sampling coordinates. We apply novel eDNA primers targeting a broad range of internal transcribed spacer (ITS1) rDNA sequences for plant communities (Supplementary Table 1); statistically evaluate the relative roles of sampling depth and distance from shore as both between-lake and within-lake sampling factors; and investigate the effects of these variables on eDNA characterization of these aquatic and lakeshore plant communities. We additionally investigate the contribution of terrestrial plants to the reservoir of eDNA in these lakes, in part because recent studies have revealed the potential for widespread terrestrial biodiversity sampling based on the collection of eDNA in river and lake water and sediment (Giguet-Covex et al., 2014;Cannon et al., 2015;Deiner et al., 2016;Harper et al., 2019).
In this study, we define "plants" as green and golden algae, phytoplankton, and vascular and non-vascular macroscopic plants whether of terrestrial or aquatic habit. We expect that plant communities differ among the four different lake compartments: nearshore surface, nearshore benthic, offshore surface, and offshore benthic. Replicating the sampling design in multiple isolated bodies of water gives us the ability to distinguish overall spatial effects from local inter-lake variation. Specifically, we test three hypotheses: that (1) alpha diversity of plants is highest at the surface and near the shore due to enhanced light and resource availability; (2) spatial differentiation (sample beta diversity among compartments) is higher in larger lakes; and (3) nearshore and surface environments include a greater proportion of DNA from terrestrial plant species. An improved understanding of how sampling location influences the eDNA-based detection of lake-associated plant communities will support large-scale biodiversity monitoring efforts and help minimize the costs associated with these efforts.

Field Sampling for eDNA
We collected eDNA samples between July 8th and 13th, 2015, from six lakes at the University of Notre Dame Environmental Research Center (UNDERC) in the northern peninsula of Michigan, United States ( Figure 1A). This study is part of a larger, 12-lake study that incorporated sampling for vertebrates and invertebrates as well as plants. For the plant data, we report here on the six-lake subset of samples that was amplified with our novel ITS1 primer set described below.
The six lakes sampled for plant communities range in size from 0.8 to 67.3 ha (approx. 2-167 acres), with maximum depths from 5.2 to 13.7 m ( Table 1). These study lakes are generally considered mesotrophic, or moderately productive. The lakes vary in their stream inputs and outputs, and our random site selection was, in part, intended to reduce site-specific inflow/outflow effects. Water clarity as measured by Secchi disk values near the time of sampling ranged from 0.8 to 4.3 m, as many of the lakes have brown or stained waters due to contributions of tannins and humic acids from adjacent wetlands. Summer depths for the metalimnion (the thermocline between the two distinct temperature layers of a stratified lake) have been measured for these six lakes in recent years and range from roughly 0.75 to 3.2 m ( Table 1). Further biotic and abiotic details on these study lakes at UNDERC are given by previous studies (Kelly et al., 2014;Craig et al., 2015).
At each lake, six sampling locations, divided into three nearshore and three offshore, were chosen randomly using ESRI ArcMap (Redlands, California, United States) ( Figure 1B). All offshore sites were located a minimum of 20 m from the shoreline ( Table 1), at an average distance of 46.2 m (range 20.4-108.4 m). Nearshore locations were constrained to be within 1 m of the shoreline to represent sampling by an individual from the shore, though for consistency, all samples for this study were taken from boats. At each sample site, one surface water sample and one benthic water sample were collected for eDNA filtration. Fresh nitrile gloves were used at each of the six sample collection sites in each lake.
Benthic water samples were taken within 1 m of the lake bottom. In all cases, benthic water samples at offshore locations were taken below the metalimnion. Many nearshore benthic samples were taken above the depth of the metalimnion (Figure 2), though four of six lakes had steep enough nearshore depth profiles for some or all of their nearshore benthic samples to be taken below the metalimnion. Benthic sample depths at offshore locations were reliably deeper than at nearshore locations for all lakes, with a mean of 5.3 m (range 3.4-7.7 m). Within each lake, all nearshore benthic sample depths were shallower than any of the offshore benthic sample depths.
Surface eDNA samples were taken directly into 250 mL bottles that had been previously sterilized via autoclave followed by an external overnight soak in 10% bleach solution and an external rinse in deionized water. Benthic water samples were taken using 2 L Van Dorn samplers lowered to within 1 m of the lake bottom based on a depth estimate from a handheld digital sonar (HawkEye Handheld Sonar, Bass Pro Shops, Springfield, Missouri, United States). Subsamples of the water retrieved by Van Dorn samplers were immediately transferred into sterile 250 mL bottles. Three total Van Dorn samplers were used to collect the samples in this study. These three samplers were sterilized between study lakes, and between nearshore and  Information includes lake area, maximum lake depth, distance to visible Secchi disk (a measure of water clarity), and depth of metalimnion or thermocline (Kelly et al., 2014;Craig et al., 2015). The remaining columns indicate which of the six benthic samples were taken below the metalimnion, in contrast to samples at sites where the local lake bottom was itself above or within the lake's metalimnion. e, epilimnion; m, metalimnion; h, hypolimnion.
FIGURE 2 | Summary of sampling design for the six lakes in this study. Twelve samples were collected per lake based on depth and distance from shore to define four qualitative sample categories, herein termed compartments. (A) Generalized lake schematic showing nearshore and offshore zones (green and blue, respectively), and surface vs. benthic samples with respect to the metalimnion (see Table 1). (B) offshore sample locations within study lakes, by 15 min soaks in 50% bleach solution. Sample bottles were kept on ice in a cooler until transported to a laboratory at UNDERC, where all samples for a given lake were immediately processed by filtration through 1.2 µm cellulose nitrate filters with the aid of an electric vacuum pump attached to side-arm flasks and filter funnels. In order to detect potential external contamination during handling and transport in the field, two filtration blanks per lake containing store-purchased bottled water were transported to and from the study sites along with the bottles for sample collection. Filtration blanks were filtered in the laboratory prior to sample filtration. Filters were immediately placed in 2 mL microcentrifuge tubes (United States Scientific, Ocala, Florida, United States) containing 700 µL of Longmire's buffer (Longmire et al., 1997). These tubes were kept at 4 • C until being transported to the University of Notre Dame (Indiana, United States) for eDNA extraction. Fresh nitrile gloves were used during filtration of each individual sample.

ITS1 Primer Design
We chose to develop a primer specific to the study area, as we were not aware at the time of a published ITS1 primer with good specificity for the Great Lakes region. Previous experience with taxon-specific plant assays (i.e., Elodea and Hydrilla; Gantz et al., 2018) highlighted the ITS1 region as an ideal candidate for unique IDs among closely related plant species over relatively short fragments of DNA. Sequences for the ITS1 genomic region were downloaded from GenBank for a list of 119 locally relevant plant species (Supplementary Table 2). Sequence alignments were built and assays were designed on areas of the ITS1 region that were conserved across the list of plant taxa, based on recommendations from Primer3 (Untergasser et al., 2012). The final selected primer set, designated ITS1-F/-R3 (Supplementary Table 1B), produced a range of amplicons approximately 240-500 bp in size. Primers targeting matK, rbcL, and ITS2 were also developed and evaluated, and the ITS1 primers were found to be the best match for the local lake environment. The primers used are thus not intended to be widely employed outside the area of this study.

DNA Extraction and Target Amplification
DNA extractions followed a modified chloroform-isoamyl alcohol (24:1, Amresco) extraction and isopropanol precipitation protocol as outlined in Renshaw et al. (2015). Following the extraction process, rehydrated DNA pellets were treated with the OneStep TM PCR Inhibitor Removal Kit (Zymo Research, Irvine, California, United States).
A two-step PCR-based method was used for preparation of sequencing libraries (Olds et al., 2016). In the first step, locus-specific PCR amplicons were generated from each sample using the novel primer set designed to amplify the intergenic spacer between the 18S and 5.8S ribosomal RNAs in plants: ITS1-F (5 -GTCGTAACAAGGTTTCCGTAGGT-3 ) and ITS1-R3 (5 -GATATCCGTTGCCGRGAGTC-3 ). The ITS1 primer set included an overhang on the 5 -end to allow for indexing and Illumina adapter addition during the two-step PCR-based Illumina library preparation (Supplementary Table 1).
First step PCR products were run through a 2% agarose gel, stained with ethidium bromide, and visualized on a UV light platform. Amplified products were manually cut from the gels with single-use razor blades, cleaned with the QIAquick Gel Extraction Kit (Qiagen, Venlo, Netherlands), and eluted from spin columns with 30 µL of Buffer EB. The DNA concentration of each elution was quantified with the Qubit dsDNA HS Assay (Life Technologies, Carlsbad, California, United States).
Temperature cycling conditions for the second-step PCR consisted of an initial denaturation step at 98 • C for 2 min; followed by 8 cycles of denaturation at 98 • C for 10 s, annealing at 55 • C for 20 s, and extension at 72 • C for 30 s; followed by a final extension step at 72 • C for 10 min. PCR clean-up was performed with Agencourt AMPure XP magnetic beads (Beckman-Coulter, Indianapolis, Indiana, United States) and DNA concentrations were quantified with the Qubit dsDNA HS Assay. Amplicon sizes were verified within each library on a Bioanalyzer DNA 7500 chip (Agilent Technologies, Santa Clara, California, United States). Sequencing was performed on an Illumina MiSeq sequencer at the University of Notre Dame's Genomics and Bioinformatics Core Facility 1 with a MiSeq Reagent Kit v3 (600-cycle; Illumina, San Diego, California, United States).
Over the course of the entire study, 98 total samples were processed, including 72 field samples (12 per lake), 12 field 1 genomics.nd.edu collection blanks (2 per lake), 8 no-template control PCR blanks (one per lake and one per sequencing run), 6 extraction blanks (1 per set of 18 eDNA extractions), and one blank each for the laboratory's DI water and tap water.
The extraction blanks included all the reagents used in the DNA extraction process and were processed alongside eDNA samples, monitoring for contamination that might occur during the DNA extraction step. Although none of the no-template controls visibly amplified, a band was cut out of the agarose gel at the expected size for each NTC and carried through the remaining library prep for subsequent Illumina sequencing.
Samples were spread across two runs on the Illumina MiSeq as part of a larger pooled eDNA study including vertebrate and general eukaryotic eDNA sequencing. Specific to this plant eDNA study, we included 36 field samples (3 lakes) per run, plus 12 associated control samples (6 field, 2 extraction, and 4 NTC). The tap and DI water blanks were amplified and sequenced on a separate run, associated with the larger project.

Bioinformatic Analysis
A summary of the informatic workflow, with full parameters and software versions, is available as Supplementary Figure 1. Raw MiSeq forward and reverse reads were assessed with FastQC (Andrews, 2010) and adapter-trimmed with Trimmomatic ILLUMINACLIP (Bolger et al., 2014) at a simple clip threshold of 6 bp. A sliding-window quality check was performed simultaneously with adapter removal: bases were trimmed when their windowed average fell below a quality score of 20. A custom Perl script (as used in Olds et al., 2016) performed demultiplexing to separate ITS-primer-specific reads from the other simultaneously sequenced amplicon sets. Forward and reverse reads were merged, and then qualityfiltered with USEARCH (Edgar and Bateman, 2010) functions fastq_mergepairs and fastq_filter to a maximum expected error rate of 0.5 and a maximum N count of 1. Filtered merged reads were collapsed to unique sequences, then chimera-filtered and clustered with a 97% similarity threshold using USEARCH cluster_otus and usearch_global.

Taxonomic Assignment
Identification of OTUs was exclusively by BLASTn (Camacho et al., 2009), against the NCBI nt online database as of 12/11/2017. An initial assignment of OTUs to NCBI taxa was performed without identity thresholds, and any completely unassigned OTUs were discarded. Automated resolution of ambiguous BLAST results was then conducted according to a defined set of rules (see Supplementary Figure 1). The assigned OTUs were further filtered and combined as described below.

OTU Filtering
Details of OTU resolution and filtering are given in Supplementary Figure 1. The BLAST criteria for a fully passing OTU taxon assignment were a global percent identity of 97 or higher (as in Olds et al., 2016) and a bitscore of 100 or greater (similar to the approach in Shaw et al., 2016), corresponding to an e-value of approximately 2E −17 . The overall set of these "fully passing" OTU assignments was taken as the working set of taxa presumed to be present in the study.

Ambiguity Resolution
Some OTUs were assigned by BLAST to more than one NCBI taxon with equal confidence (i.e., identical bitscore) using the NCBI BLAST taxonomy 2 . Within each set of ambiguous assignments for a given OTU, a subset of the most informative assignments was selected for final resolution of the OTU. The "most informative" assignments were considered to be, in order of priority: (1) species or subspecies records, (2) genus records with no species specified, and (3) records labeled as "uncultured" members of specific higher taxa. Records labeled as "environmental samples, " which are unidentified sequences reported by other environmental studies, were always excluded in favor of more informative assignments, and an OTU was eliminated from the study altogether if an "uncultured" or "environmental" sample was the most informative assignment that appeared in the list of top hits. See the Code Supplement for further details of this step.
In most cases, this process successfully produced a single species or subspecies assignment. Where it did not, the most specific common genus, family, or order was used as the identification, as defined by the NCBI taxonomy 3 . Strong subthreshold matches to known taxonomic groups indicated that some sequences were legitimate examples of as-yet unsequenced organisms rather than sequencing errors or chimeras.

Statistical Analyses
We calculated Shannon's alpha diversity index (H') and the inverse Simpson's diversity index (1/D) on raw passing taxon read totals within each lake, and across spatial variables, using the packages phyloseq version 1.30.0 (McMurdie and Holmes, 2013) and vegan version 2.5.7 (Oksanen et al., 2019) in the statistical program R (version 3.4.2). Differences in means across sets of variables were assessed with Mann-Whitney tests.
We tested for an effect of eDNA sample location and depth on plant communities using permutational multivariate analysis of variance (PERMANOVA) using the adonis function in the R package vegan. Due to the sensitivity of multivariate statistics to outliers, we omitted the rarest taxa (found in < 5 field samples total) from these beta-diversity analyses. A total of 47 vascular plants and 65 algal taxa were excluded from beta diversity analysis as a result of this filtering. We then log + 1 transformed the remaining reads (Anderson et al., 2006) prior to calculating pairwise Bray-Curtis distances between our samples. In each PERMANOVA run on individual taxonomic groups, we considered sample location (nearshore or offshore) and depth (surface or benthic) as factors, with lake identity as a stratum or group. Inclusion of lakes as strata allowed us to partition the variation in communities among these lakes owing to inherent differences in biotic or abiotic factors between sites, as opposed to differences within lakes caused by sample site locations and depths. We also considered an interaction between location and

Sequencing
Two multiplexed 2×300 paired-end Illumina MiSeq runs yielded 32M reads, including 3.2M ITS1 amplicon sequences for this study. Among the 72 field samples, total per-sample plantspecific read counts ranged from 1.1 to 376K with a mean of 44.0K reads. After quality filtering, forward/reverse read merging, and chimera removal, per-sample plant-specific read counts for field samples totaled 2.4M reads, with per-sample mean of 33K reads. The laboratory controls, and 18 of the 24 field controls, yielded fewer than 10 plant reads apiece. Most reads in the remaining field controls were from plant taxa that appeared only in controls, and are therefore presumed to result from transport contaminants rather than field or laboratory crosscontamination.

OTU Clustering and Filtering
Counts of distinct sequences averaged 1,115 per sample, and distinct non-singleton sequences averaged 341 per sample. These non-singletons were clustered into 580 OTUs and identified with BLAST. Of these, 376 OTU identifications met thresholds of 97% identity and bitscore of 100 and were considered "passing" identifications. Of those, 50% matched with 100% identity. The majority had unambiguous BLAST assignments to a single NCBI species or subspecies. The remaining OTUs had identical sequence identities and bitscores for multiple possible taxa. These OTUs were resolved by a custom algorithm to species or subspecies level (7 OTUs); resolved to genus or a higher taxonomic level (21 OTUs); or discarded as unresolvable (3 OTUs). 5 OTUs were eliminated because they belonged to fungi or other non-target taxa. The resulting OTUs represented 248 distinct taxa, 63 of which were vascular plants and 185 of which were algae or other non-vascular plant taxa. See Methods and the Python code in the supplement for further details of this step.

Correspondence With Known Area Species
A list provided by UNDERC of known area vascular plants includes 609 species, mostly terrestrial, and approximately 25 aquatic. Of those species, only 159 have ITS1 records available at NCBI that would enable identification of corresponding OTUs in the eDNA results. The exact species overlaps between eDNA results and the area list numbered 21 (14 wetland or upland and 7 aquatic). This relatively low correspondence at the species level reflects both the absence of many species from NCBI ITS1 records and the fact that the plant lists cover a much wider area than is represented by these specific study lakes.
FIGURE 3 | Numbers of major taxa common to the four spatial compartments of eDNA sampling across six lakes. Major taxa are defined as the set of 136 identified plant and algal species (or, rarely, higher taxa) that were used for community analysis after the rarest taxa were excluded as described in Statistical Analyses, above.
At the genus level, however, the UNDERC area list included 117 total vascular plant genera with at least some available ITS1 species records. The eDNA results contained 47 vascular genera, with 41 genera in common between the two lists. Of the six "false positive" genera found in the eDNA results that do not appear on the area species list, three are crop or widespread weedy species (e.g., guar, soybean, ragweed) that may well contribute material to the area without appearing on local species inventories (Supplementary Table 2). The remaining OTUs belong to algae and other non-vascular plants, including microscopic taxa.
Our attempts to compare our lake-specific macrophyte results to previous, unpublished plant surveys 4 yielded mixed results, as might be expected given the age of the surveys. No more recent lake-specific plant surveys were available to us for comparison.

Spatial Taxon Distribution
Spatial distributions of the taxa identified by eDNA sequencing show that 150 of the 209 major taxa are found in all four distancedepth compartments of the dataset (Figure 3). Major taxa are those appearing in five or more samples, as defined as above for purposes of beta diversity analysis. None of these major taxa are unique to a single compartment. Figure 4 contrasts numbers of taxa in various NCBI "taxon categories" with their read prevalence. This set of NCBI taxon categories are occasionally non-intuitive due to the presence of generic-seeming basal groups such as "flowering plants"; nevertheless, these high-level categorizations are a useful rough overview of the distribution of taxon types in the dataset. A majority of total taxa identified are algae and phytoplankton ( Figure 4A), but those categories are less dominant in number of reads ( Figure 4B).
FIGURE 4 | Study taxa by NCBI taxonomic categories identified via ITS1 barcode sequencing in six lakes. These treemaps (Shneiderman, 1992) are proportional representations of vascular plants (darker green) and non-vascular plants/algae (lighter green) by (A) numbers of taxa identified vs. (B) read abundance. Note the variation in dominance in subcategories like "eukaryotes" (generally photosynthetic protists) and "flowering plants" (largely aquatics in the Nymphales and Ceratophyllales, predating the monocot/dicot split). The "green plants" in the non-vascular category are in the Zygnematophyceae, or conjugating algae, a close outgroup to the land plants.

Vascular Plant Taxa by Wetland Status
Of the 63 individual vascular plant taxa detected via eDNA sequencing, only 63% (40 taxa) are aquatic species or terrestrial species classified as hydrophytes according to United States federal wetland delineation guidelines (U.S. Army Corps of Engineers, 2018; Table 2 and Figure 5). The other, substantial, portion of taxa (23) occupies non-aquatic classifications, indicating that a wide range of species contribute DNA to the lake environments. However, aquatic species heavily dominate the read abundances, and hydrophyte species also have read abundances disproportionately higher than their taxon representation would indicate ( Table 2). The absolute number of sequencing reads from vascular plants was insufficient for statistical testing of vascular-specific diversity differences within or among lakes: 793K of 1.9M total reads, spread across 72 samples, with two aquatic plant taxa accounting for over half the vascular plant reads.

Alpha Diversity
Alpha diversity among identified taxa varied substantially among the six lakes, with mean Shannon index values ranging from 1.01 to 1.83 ( Figure 6A). The inverse Simpson's diversity index (1/D; not shown) was consistent with the Shannon results (Spearman correlation ρ = 0.943). Mean alpha diversities of surface vs. benthic samples showed little difference (Figure 6B). Mean nearshore diversity was slightly though significantly higher than offshore diversity by both indices, with a p-value of 0.0026 for the difference in Shannon-index means ( Figure 6C).

Beta Diversity and Community Composition
Community characterization by PERMANOVA was performed on the data from six lakes, across each spatial dimension, using lake identity as a blocking factor. Simultaneous analysis of all factors showed a dominant effect of lake identity as the primary explanatory variable for community composition (R 2 = 0.5109, p < 0.001). This substantial effect was followed by small effects of distance and depth (R 2 = 0.0277 and R 2 = 0.0373 respectively, with p < 0.001). The distance-depth interaction showed an additional small yet statistically significant effect (R 2 = 0.0137, p = 0.014) ( Table 3).
Four of the six individual lakes also showed moderately-sized internal patterns with regard to depth and shore distance, with statistically significant R 2 -values of up to 0.206 for depth and 0.414 for shore distance ( Table 4). Effect sizes in each lake showed no significant correlations with per-lake alpha diversity, nor with physical lake characteristics such as longitude, latitude, water clarity, or maximum lake depth, as determined by Spearman correlation (Supplementary Figure 2).
FIGURE 5 | Prevalence of wetland designations among vascular plant taxa categories identified by eDNA sampling in four spatial compartments. "Aquatic," "Wetland," and "Upland" correspond to the three aggregated statuses in Table 2B above. Category labels are the four spatial sampling compartments (depth/distance combinations).   NMDS plots visually confirmed the beta diversity patterns, showing lakes scattered across the plot but distances and depths largely overlapping with only small differences in the overall shapes of the clusters (Figure 7).

DISCUSSION
Environmental DNA (eDNA) metabarcoding has grown rapidly as a novel but repeatedly proven tool for species surveillance (Rees et al., 2014;Thomsen and Willerslev, 2015). Our study is among the first to assess freshwater plant community diversity using eDNA metabarcoding and to address clear hypotheses about eDNA sampling location in lake community surveillance . Even though plant community ecology is a fundamental field in biology, most plant-targeted eDNA studies have focused on a single species, largely in the context of invasives detection (Scriver et al., 2015;Fujiwara et al., 2016;Matsuhashi et al., 2016;Gantz et al., 2018). We report here on the success of a community inventory and its implications for future study design.

Spatial Distribution of Taxa Within and Among Lakes
The largest community assemblage differences in this study appear in the whole-community comparisons among lakes, which is consistent with traditional sampling approaches for lakes and ponds that find environmental heterogeneity to be the strongest predictor of beta diversity (Alahuhta et al., 2017). However, small beta diversity signals are also detectible in the spatial variables combined across the six lakes (Table 3). Moreover, the community compositions internal to 4 of the 6 lakes showed significant small or moderate differentiation in one or both spatial dimensions within the individual lake communities themselves ( Table 4). The depth component of the signal is as strong as it is expected to be at any point in the year, according to an eDNA study on fish in a large English lake which confirmed the presumption that vertical habitat differentiation is strongest in the summer for temperate stratified lakes .
Due to the small but significant absolute magnitude of the spatial patterns (< 5%), we surmised that they might relate to subtle ecological effects, either from inherent characteristics of each lake or the effect of those characteristics on eDNA transport and degradation patterns. Accordingly, we investigated potential relationships between various characteristics of each lake and the sizes of distance and depth effects in that lake, testing for correlation between lake characteristics (position, surface area, maximum depth, and water clarity) and sequencingbased diversity metrics (alpha diversity, and magnitude and significance of depth and distance effects). No correlation between a lake characteristic and an alpha or beta diversity metric reached statistical significance either before or after correction for multiple testing. The lack of relationships between lake size and strength of spatial patterns causes us to reject our hypothesis that larger lakes will have clearer spatial structure in eDNA profiles.
We believe that our single-point sampling was sufficient to discover the existing spatial patterns in these lakes, given that several effect sizes were small but nevertheless highly FIGURE 7 | Visualizations of diversity among lake communities based on 12 spatially varied eDNA samples from each of six lakes. Each point represents the read-abundance-based community of one sample from the indicated lake. All plots are based on NMDS ordination of Bray-Curtis distances calculated from log + 1-transformed read abundances. (A) Diversity among the six lakes, showing substantial clustering of points within lakes, corresponding to the R 2 -value of 0.5109 ( Table 3). The bottom two plots are the same points grouped (B) by depth category and (C) by shore distance category, showing the substantial overlap between the 36 sampling locations in one category vs. the other. significant (p< 0.001). Regardless, in the future, we recommend taking multiple technical replicates and/or performing replicate PCRs for each sample to reduce the PCR-related false negative rate. Field technical replicates and PCR replicates would also increase the overall likelihood of detecting rare taxa that would be especially informative about site differences (Leray and Knowlton, 2017).
Our investigation of distance and depth patterns among our unreplicated lake eDNA point samples demonstrates two subtle and opposing effects. First is that low levels of unique taxa are present in each compartment, and therefore that point sampling may not generate an exhaustive species inventory where one is needed. Secondly, however, is an overall indication that much of the eDNA originating in each compartment is well-distributed throughout a small lake, which indicates that point sampling may be adequate for many other purposes. Several individual taxa per lake would have been missed had sampling been limited to one of the four spatial compartments, and therefore specific applications such as time-series comparisons would clearly benefit from consistency in sampling location. Alternatively, for basic lakewide biological inventories, pooling samples from multiple location types may help to increase overall numbers of species detected without increasing the amount of laboratory work needed. Finally, the spatial information itself may be useful for certain types of detailed investigation such as those of ecological gradients or localized populations. However, the effects of sampling position were small in magnitude, and many taxa were found in common among spatial compartments, which indicates that community composition results were reasonably robust to the choice of sampling position. Our resulting recommendations are summarized in Table 5. Ideally, eDNA sampling projects of this type should perform pilot studies to determine the relative contributions and importance of spatial variation to the local eDNA environment.

Terrestrial and Aquatic Plant Diversity and Distribution
Broad airborne distribution of terrestrial plant DNA has been demonstrated to occur, largely mediated by particles other than pollen (Parducci et al., 2017;Sjögren et al., 2017;Johnson et al., 2019). Of the 77 individual vascular plant taxa detected by eDNA analysis in this study, fewer than half are classified as obligate or facultative upland species according to United States federal wetland delineation guidelines (U.S. Army Corps of Engineers, 2018). At this qualitative level, we therefore conclude that the vascular plant eDNA in these lakes is not a general homogenous sampling of the entire surrounding terrestrial area, but rather that the lakes' eDNA content is enriched in DNA from their local aquatic and shoreline species. However, our hypothesis that nearshore and surface environments will contain more terrestrial plant eDNA is not entirely supported because there is no marked increase in the relative numbers of upland species near the shore ( Figure 5C). In fact, upland species are proportionately best represented in the offshore-surface compartment, perhaps owing in part to the lower overall read abundance in those samples ( Figure 5B). Another possible explanation for the offshoresurface concentration is that windborne plant detritus settles relatively evenly across the surface of a lake, but once it sinks it is bound to or buried in sediment and less available for sampling in the water column.
The distribution of alpha diversity (Figure 6) across the two spatial dimensions is also somewhat unexpected, showing a small increase in diversity for the nearshore samples as a whole, but no distinction between surface and deep samples. This result partially rejects our hypothesis that diversity is greater both near shore and in shallower water due to light and habitat availability. However, it is conceivable that surface-area eDNA is harder to detect and identify, since ultraviolet light exposure may speed the breakdown of eDNA (Barnes and Turner, 2016).

Overall Taxon Distributions by Lake and Sample Location
Because taxa differ in their DNA shedding rates due to differences in their habitats and physical characteristics (Barnes and Turner, 2016), numbers of sequencing reads recovered from eDNA may not be representative of the absolute numbers or biomass of their associated taxa (Buxton et al., 2017;Fonseca, 2018). The most conservative approach for interpreting highly varied sets of taxa sampled by eDNA methods is therefore to convert read abundances to presences and absences (Ransome et al., 2017). However, we retained the read-number information in our analyses, because relative abundances of a given taxon can be compared among comparably-handled samples (Grey et al., 2018) as long as care is taken not to draw inferences across taxa. Regardless of organism-specific detection and quantification biases, comparable sets of eDNA results still generate "fingerprints" of communities that can be usefully compared and contrasted at a statistical or even ecological level (Leray and Knowlton, 2015).
Among the six studied lakes, three have their largest numbers of reads corresponding to Nuphar variegata (yellow or variegated pond-lily), an emergent aquatic plant. The other Our study used single field samples with no PCR replication and a variety of sample depths and distances. Based on the results we achieved with this design, we suggest these optimal designs for future studies in similar systems. three are dominated numerically by algae, either single-celled (e.g., Peridinium) or colonial (e.g., Chrysophyceae, Synura). As mentioned above, read-number dominance does not indicate that pond-lilies or algae are absolutely dominant in area or biomass in these respective lakes; but the relative differences illustrate the inter-lake variability that gives rise to our ability to characterize one sample's community as different from another. Future work should look for correspondences between lake characteristics and read dominance by vascular plants vs. algae, to determine whether these read number relationships are accurately describing lakes with more or less emergent vegetation or are correlated with any known lake characteristics. Such correspondences could give rise to methods for interrogating other important lake characteristics such as primary production and eutrophication (Hilt et al., 2017;Poikane et al., 2018). When a study's primary goal is to develop a reasonably complete catalog of an area's biodiversity, it is currently recommended to combine eDNA surveys with traditional sampling (Cowart et al., 2015;Olds et al., 2016;Shaw et al., 2016). The two methods have some complementary biases, and each will likely miss a significant number of taxa that are captured by the other (Nguyen et al., 2020). This two-pronged approach may be difficult in environments with barriers to observational sampling, such as remote or marine environments or environments that contain a large proportion of undescribed taxa. However, where the goal is, as in this study, to investigate broad patterns rather than individual species, sampling with consistent eDNA methods can provide useful information even without detailed comparison to conventionally established "ground truth." When necessary, fingerprints based simply on the presence and relative abundance of genotypes can be usefully compared among sample sites at an ecological or functional level, even when species lists are not complete and sequences are not fully identified (Leray and Knowlton, 2016).

Comparisons With Known Species Inventories
UNDERC maintains lists of terrestrial and aquatic plants found in the general area covered by the research center, and we compared these catalogs with the vascular-plant portion of our eDNA results. Fewer than a third of the vascular plants on the UNDERC area list have species-level ITS1 reference sequence records at NCBI and many taxa were therefore unavailable for identification using the ITS1 marker in our study. Of the many OTUs that were discarded due to identification failure, many presumably belong to these taxa, or to undescribed or unsequenced algae. However, at the genus level, the plants that were identified by eDNA are largely a subset of the listed area genera, indicating that detection and identification of local plants was generally successful. Only six taxa are left as false positives relative to the area species list, and most have reasonable explanations such as being common food or industrial species (guar, soybeans) or species found in the surrounding area (mulberry) (Supplementary Table 3). No lake-specific plant inventories are available from the year in which the eDNA was sampled. Comparisons with earlier macrophyte inventories from 1996 to 1997 were mixed, but this could reflect differences in timing and sampling locations as well as the passage of time. See Supplementary Tables 4, 5 for presence/absence information per lake and taxon.
One clear result is that the eDNA results contain no trace of four of Michigan's most common non-indigenous invasive lake plants, despite the ITS1 sequences of those plants being available and susceptible to capture by the ITS1-F/-R3 primer set in silico. These species are common reed (Phragmites australis), purple loosestrife (Lythrum salicaria), curly-leaf pondweed (Potamogeton crispus), and Eurasian watermilfoil (Myriophyllum spicatum) (O'Neal and Soulliere, 2006). These species have, in fact, not been historically observed at UNDERC, probably due to strict enforcement of boat transfer and other measures for preventing invasive species transport. This finding establishes what may be a useful historical baseline for future sampling of these frequently researched lakes, and is a demonstration that careful management of lake use can prevent establishment of exotic species (Trebitz et al., 2017).

Performance of eDNA Sampling Methods in This Study
Overall, the study supports the premise that eDNA metabarcoding is accurate and specific, having captured a substantial number of the local area's known plant genera. Many area plants were not detected, representing false negatives that are largely due to reference database gaps, but could also potentially have been caused by primer biases, sampling methods, and/or seasonality. Specifically, it is possible that our broad assay preferentially amplified the generally shorter ITS1 regions of algae at the expense of vascular plants. On the other side, however, false positives were low. Because our main focus is on diversity metrics rather than species inventory, we retained some doubtful or out-of-range individual species as placeholders for close relatives. Only six genera remained outstanding as false positives relative to the list of known species (Supplementary Table 3). Species accumulation curves (Supplementary Figure 3) show a moderate approach to the asymptote, indicating that many though not all of the available species were detected.
An advantage of the eDNA sampling approach is that difficulties such as reference database incompleteness are not permanent ones, because methods in the capture and analysis of eDNA are rapidly improving (Lacoursière-Roussel and Deiner, 2019). Sequences generated years previously can be reanalyzed and reidentified as often as necessary to provide comparison datasets and historical information. Statistical methods are continually advancing, such that analysis may be repeated with better error correction, more sophisticated algorithms, and more complete reference databases, or simply to ensure analytical consistency with more recent studies.

CONCLUSION AND FUTURE WORK
We rejected part or all of our initial hypotheses about the spatial distribution of plant eDNA in these lakes, and therefore conclude that sampling of plants in small temperate lakes of this type is robust to choice of sampling location. Given the subtle differences within lakes, but broad differences among them, the methods employed in this study should perform well for many purposes such as community fingerprinting and timeseries monitoring even when limited to easily accessible shore sampling sites. A comprehensive inventory of a given smalllake community, with the intent of capturing the largest possible number of taxa, may nevertheless benefit from spatial diversity in sampling, especially when combined with adequate technical replicates; pooling samples across depth and shore distance may be the best strategy for maximizing taxon recovery. Further work is suggested to quantify the role of small lakes as significant repositories for terrestrial as well as aquatic plant DNA, and to characterize the functional significance of the patterns found in this and similar studies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories: NCBI PRJNA693787.