Whole Genome Shotgun Sequencing Detects Greater Lichen Fungal Diversity Than Amplicon-Based Methods in Environmental Samples

In this study we demonstrate the utility of whole genome shotgun (WGS) metagenomics in study organisms with small genomes to improve upon amplicon-based estimates of biodiversity and microbial diversity in environmental samples for the purpose of understanding ecological and evolutionary processes. We generated a database of full-length and near-full-length ribosomal DNA sequence complexes from 273 lichenized fungal species and used this database to facilitate fungal species identiﬁcation in the southern Appalachian Mountains using low coverage WGS at higher resolution and without the biases of amplicon-based approaches. Using this new database and methods herein developed, we detected between 2.8 and 11 times as many species from lichen fungal propagules by aligning reads from WGS-sequenced environmental samples compared to a traditional amplicon-based approach. We then conducted complete taxonomic diversity inventories of the lichens in each one-hectare plot to assess overlap between standing taxonomic diversity and diversity detected based on propagules present in environmental samples (i.e., the “potential” of diversity). From the environmental samples, we detected 94 species not observed in organism-level sampling in these ecosystems with high conﬁdence using both WGS and amplicon-based methods. This study highlights the utility of WGS sequence-based approaches in detecting hidden species diversity and demonstrates that amplicon-based methods likely miss important components of fungal diversity. We suggest that the adoption of this method will not only improve understanding of biotic constraints on the distributions of biodiversity but will also help to inform important environmental policy.


INTRODUCTION
Microbial diversity present in the environment is recognized increasingly for its important and varied roles in the health of ecosystems (Chen et al., 2018;Nottingham et al., 2018;Pike et al., 2018), particularly in the face of a changing climate (Cavicchioli et al., 2019).Unsurprisingly, a great deal of focus has been on quantifying biodiversity-the number, identity, and functions of species (Gotelli and Colwell, 2001;Faith, 2002;Barlow et al., 2007).It has also helped researchers more fully document the ranges of rare or endangered species through environmental DNA detection strategies (Olson et al., 2012;Thomsen et al., 2012;Spear et al., 2015).Further expanding the impacts of this relatively new field, microbial metagenomics has proven exceptionally useful toward informing remediation strategies of disturbed habitats such as ecologically sensitive biological soil crusts (BSC; Bowker, 2007;Steven et al., 2012).Failure to fully understand the microbial (biotic) community can thus dramatically limit understanding of what structures ecological interactions, species distributions, and environmental sustainability, all of which can, in turn, negatively impact informed conservation decision making (Guisan et al., 2013).
Increased accessibility and affordability of high throughput sequencing, has facilitated broad scale exploration of microbial community structure (Logares et al., 2012(Logares et al., , 2014;;Chen et al., 2017;Zhang et al., 2018).At present, the majority of broadscale biotic diversity assessments employ primarily cultureindependent, amplicon-based sequencing (Petrosino et al., 2009;Mande et al., 2012;Uyaguari-Diaz et al., 2016).This method relies on sufficiently variable, universally present regions of the genome, or "barcoding loci" (Hebert and Gregory, 2005;Kress and Erickson, 2008).Such loci must, first and foremost, be unique enough to yield distinctions between species present in a sample (Kolbert et al., 2004).A proliferation of bioinformatic pipelines developed for barcode sequencing has resulted in widespread capacity to analyze microbial diversity and community structure present in a variety of different environments, ranging from soil to the human body (e.g., QIIME; Caporaso et al., 2010;Kuczynski et al., 2012;Navas-Molina et al., 2013).These pipelines rely primarily on the 16S ribosomal DNA (rDNA) gene as a target for PCR amplification and subsequent sequencing to distinguish species (Winker and Woese, 1991;Kolbert et al., 2004;Petrosino et al., 2009).However, shortcomings of such amplicon-based approaches include moderate to extreme amplification bias (Acinas et al., 2005;Wang and Qian, 2009), thus effectively investigating only a fraction of total standing diversity.
Lichens are a species-rich and evolutionarily heterogeneous assemblage of fungi that form obligate symbioses with a minimum of one primary photosynthetic partner, often in addition to other endolichenic fungi, algae, and bacteria (Ahmadjian and Jacobs, 1981;Seaward, 1997;Brodo et al., 2001;Papazi et al., 2015).Lichens are highly successful and ecologically important, as is evidenced by their abundance and diversity in terrestrial ecosystems around the world (Hawksworth, 1991).Along with bryophytes, cyanobacteria, and non-lichenized fungi, lichens are a crucial component of biological soil crust communities and function prominently in ecological restoration processes (Belnap, 2001;Belnap and Lange, 2001;Thompson et al., 2006;Bowker, 2007).In addition to their pivotal ecological contributions to such communities, lichenized fungi have relatively small metagenome sizes (Armaleo and May, 2009;Tripp et al., 2017), making them ideal targets for costeffective genomics projects (Allen et al., 2018;Brigham et al., 2018;Funk et al., 2018;Pogoda et al., 2018Pogoda et al., , 2019)).Given the above, lichens serve as an excellent system in which to explore the factors that constrain the establishment and development of obligate symbioses in nature, including those prevalent among soil crust communities.Such factors span the dynamics of propagule dispersal, the distribution and establishment of individual symbionts in the environment, and biotic interactions between extant symbionts in a given environment.To date, however, few studies have explored such avenues of research, but these have relied entirely on amplicon-based sequencing methods (Banchi et al., 2018;Eaton et al., 2018;but see, e.g., Tripp et al., 2017;Pizarro et al., 2019).
This shortage of fungal genomic resources broadly, and lichen genomic resources more specifically, makes it challenging to investigate key questions about lichen ecology, evolution, genetics, and physiology.Moreover, existing studies that have investigated fungal metagenomic communities, like bacteria, primarily relied on amplicon-based approaches.Given known complications arising from amplification bias (Acinas et al., 2005;Wang and Qian, 2009), one potential solution is to forego amplification of barcoding loci and instead utilize data from whole-genome shotgun sequencing (WGS).This method avoids classical PCR amplification biases but has been little employed to date, likely as a function of one to several challenges.In addition to increased costs of WGS relative to amplicon-based methods, the lack of developed, publicly available reference databases (e.g., complete or nearly complete rDNA complexes) as well as a paucity of bioinformatics pipelines have limited the utility of WGS as a primary tool with which to approach fungal and other microbial metagenomic research.
In this study, we construct and then employ a new rDNA database spanning 273 species of lichenized fungi, built from a metagenomic survey of lichens in the southern Appalachian Mountain biodiversity hotspot, which is characterized by stark abiotic gradients and is home to over a thousand species of lichens (Dey, 1978;Brodo et al., 2001;Hodkinson, 2010;Lendemer et al., 2013;Tripp and Lendemer, 2019a,b;Tripp et al., in press).Coupled with development of a new bioinformatic pipeline, we identify lichen fungal symbionts present in WGS metagenomic environmental samples and then compare the efficacy of our approach against traditional ITS1-based amplicon sequencing of the same samples.We place our results in the broader framework of intensive biodiversity inventories of lichens at the same plots from which environmental samples were taken.Drawing on resulting data, we demonstrate that detection of symbionts using amplicon-free methods, here WGS, detects more species than amplicon-based methods.We introduce a new biodiversity metric, "Potential of Diversity" (hereafter PoD), which refers to the potential for species to occur at a given site in a given study area, regardless of whether the species is actually present at this site as determined by traditional taxonomic inventory.In the present study, PoD specifically refers to the ratio of lichen fungal symbionts detected on bare surfaces in the environment that would serve as biotic partners in subsequent lichen symbiosis (i.e., the potential lichens that could occur in a study plot based on presence of the required fungal symbiont).In this context, PoD is a useful metric in which to fully consider biotic constraints that limit establishment of obligate symbiotic organisms.We compare this metric to the number of lichens detected as established symbioses (i.e., the lichens that visibly occur at a plot, henceforth "TD" for taxonomic diversity).

Study Area, Field Plots and Field Sampling
This study was carried out in two one-hectare plots located in Great Smoky Mountains National Park, in the southern Appalachian Mountain Biodiversity Hotspot of eastern North America (Figure 1).The plots were selected to span the two extremes of a stark elevational (one high, one low) and ecological (bottomland hardwood vs. cloud-laden forest dominated by conifers) gradient in the region, so as to maximize the difference in extant lichen communities (i.e., minimize overlap) as well as potential environmental symbiont pools between the plots.The high-elevation (2,014 m) plot was located on the summit of Clingman's Dome in sprucefir forest along the border of Swain County, North Carolina and Sevier County, Tennessee.The low-elevation (670 m) plot was located at White Oak Branch in mixed-hardwood forest above the north shore of Fontana Lake, in Swain County, North Carolina.
Prior to environmental sampling, data pertaining to several ecological variables (e.g., tree DBH, woody plant inventory, habitat quality assessment, following Tripp et al., 2019) were recorded in order to delimit the plant communities and to ensure maximum difference in lichen communities (see above).Both plots were delimited to be uniform in vegetation type within a plot (i.e., not spanning more than one ecotone).In each plot, a full inventory of lichen species was conducted [carried out by JL, vouchers deposited in the herbaria of the New York Botanical Garden [NY] and University of Colorado, Boulder [COLO]]; methods following (Tripp et al., 2019).In each plot, 16 environmental samples were obtained by swabbing the surfaces of eight rocks and eight trees (yielding a total of 32 environmental samples) for 30 seconds with a sterile toothbrush (size, aspect, and identification were recorded for each rock and tree type; see below for additional details).In order to avoid sampling surfaces with artificially inflated propagule counts, such as spore deposits along river beds and bogs, we chose only vertically-oriented, bare surfaces of rocks and trees (i.e., free of any visible growth besides the bark of the tree, when applicable).

Thallus Collection
To facilitate direct comparison of amplicon-based sequencing to WGS-based sequencing of the 32 environmental samples, we first obtained lichen voucher specimens for species present throughout the Southern Appalachian Mountains (Table S1) as part of a system-wide investigation of drivers of lichen biodiversity and distributions in the region, including potential biotic constraints (e.g., presence of symbionts) such as those herein investigated.These samples were collected in order to build a new genomic reference database for lichens of the region (see below).Samples were collected and identified by JCL and EAT between December 2016 and January 2018.All lichen voucher specimens are deposited at NY and COLO (Table S1).Efforts were made to sample only single thalli for both macro-and microlichens.For macrolichens, ca. 1 × 1 cm of thallus was removed, targeting the margins and lobes.For microlichens, thallus was scraped from rock or tree substrates using a sterile razor blade.Samples were air dried in a laminar flow hood for 24 h then frozen at −20 • C until transport to the University of Colorado for DNA extraction and subsequent sequencing.

Metagenomics Sampling Scheme
To quantify the number and identification of lichenized fungi present in the 32 environmental samples as well as assess differences between amplicon-vs.WGS-based sequencing approaches, we collected samples from eight rock plus eight tree surfaces within each hectare (n = 16 per plot).Substrates were randomly swabbed among those available, and only bare surfaces lacking visible bryophyte or lichen thalli were chosen.Standardized stencils of 10 × 10 cm were placed against the substrate and an individually packaged, sterile toothbrush was used to swab the surface for 30 s.The toothbrush containing the sample was then sealed in a sterile plastic bag.To process the samples, the bag was opened, the bristles were cut from the brush using a sterile blade, and a hole was cut into the corner of the bag using sterile scissors.The opened corner of the bag was placed into a sterile 1.5 mL microcentrifuge tube and the bristles were directly transferred into the tube.These samples were stored at −20 • C until transport to the University of Colorado for subsequent extraction and sequencing.

DNA Extraction and Whole Genome Shotgun Sequencing
For both the 32 environmental samples as well as lichen species vouchered in order to build a new reference genomic database, dried samples were pulverized using tungsten carbide bearings in a Qiagen 96-well plate shaker.Genomic DNA (gDNA) was extracted from lichen thallus samples and toothbrush bristles using a Qiagen DNeasy 96 plant kit.Individual samples were transferred from 1.5 mL microcentrifuge tubes into 96 well plates used in the Qiagen kit.The manufacturer's protocol was modified to include a 10 min 65 • C incubation step for ground material in lysis buffer as well as a 100% ethanol wash before final drying of the membrane prior to elution.Preliminary study found that these modifications improved lichen gDNA concentration and purity (Pogoda et al., 2018).Extracted samples were stored at −20 • C prior to subsequent library preparation.
Whole genome shotgun sequencing was conducted on a total of 494 lichen thallus libraries [these collected throughout the southern Appalachian study area (Figure S1)] and 32 environmental samples on the Illumina NextSeq R .Each of the gDNA samples was prepared using the Nextera R XT DNA library prep kit, which is optimized for 1 ng of total input DNA.Each sample was uniquely tagged using the dual index adapters, Nextera R i5 and i7.Libraries prepared for sequencing on the NextSeq R utilized Illumina PhiX v.3 as a control and samples that passed QC were processed for paired-end 151 base pair reads on an Illumina NextSeq R sequencer at the University of Colorado's BioFrontiers Institute (Boulder, Colorado).

ITS1 Sequencing
To compare results of WGS-based sequencing to ampliconbased sequencing, amplification by PCR was performed on the 32 environmental samples using the ITS1-F (5'-CTTGGTCATTTAGAGGAAGTAA) (Gardes and Bruns, 1993) and ITS2 (5'-GCTGCGTTCTTCATCGATGC) (White et al., 1990) primers.Libraries were prepared for sequencing on the MiSeq R utilized Illumina PhiX v.3 as a control.Samples that passed QC were then processed for single end 251 base pair reads on the Illumina MiSeq R sequencer at the University of Texas' Genomics Sequencing and Analysis Facility (Austin, Texas).

Genome Assembly and Reference Genomic Database
Libraries were filtered with Trimmomatic-0.36 to trim adapters from reads, and with parameters "LEADING:3 TRAILING:3 MINLEN:100" (Bolger et al., 2014).Filtered reads were then assembled using SPAdes 3.9.0 with parameters "-careful -k 21,33,65,81" (Bankevich et al., 2012).This study utilized the whole nuclear ribosomal DNA (rDNA) complexes that were obtained from the de novo genome assembly of lichens that were collected as part of a broader study of lichen diversity in the southern Appalachian study region (Keepers et al., unpub. data).The rDNA complex is easily assembled due to its high copy number in the nuclear genome, and is long (i.e., >5,000 bp, in comparison to amplicon-based sequencing of ITS1, which is <500 bp), providing a larger target onto which sequenced reads may map.Moreover, the high-copy nature of the locus provides many opportunities per propagule to be counted in the downstream analyses.The rDNA complexes for each sample were identified by conducting a BLAST search of the rDNA complex of the trebouxioid algal photobiont from Cladonia uncialis against each of the assemblies.The algal rDNA was used in the search rather than a mycobiont sequence to avoid bias in BLAST hit length due to phylogenetic similarity.To identify complete or mostly complete rDNA complexes from the BLAST tables, contigs were required to have two or greater distinct hits and the span of these hits were required to be >1,000 bp in length.Sequences were parsed from the assemblies based on the nucleotide positions of the BLAST hits and oriented with the 18S in the 5' direction.

Phylogenetic Methods
Each putative rDNA contig parsed from the assemblies was BLAST searched against the NCBI non-redundant database.Any sequences whose best BLAST results mapped to non-lichenized fungi or to non-fungal species were excluded.To further vet the identities of the contigs in the database, sequences were aligned using MUSCLE aligner v3.8.31 (Edgar, 2004) for downstream phylogenetic analysis.The resulting preliminary alignment was trimmed to contain only highly homologous regions, then further trimmed to only sites for which >90% of taxa in the dataset contained sequence data.A best estimate phylogenetic tree was inferred under a model of GTR+I+Γ using MrBayes (Huelsenbeck and Ronquist, 2001;Ronquist and Huelsenbeck, 2003) with 10,000,000 MCMC generations and gaps treated as missing data.The first 25% trees were treated as burn-in and excluded from further analyses; eight runs were implemented and 32 chains were utilized.All other parameters were left at default values and the average standard deviation of split frequencies had converged to less than the recommended 0.01 by the end of the analysis.The trees from the posterior distribution were used to generate a majority consensus tree, upon which we mapped posterior probabilities.
This preliminary consensus tree was visualized using FigTree v.1.4.3, upon which we determined that 34 samples were likely misplaced based on disagreement with established largescale phylogenic reconstructions of lichenized fungi (e.g., Miadlikowska et al., 2014).As such, these 34 samples were excluded from the alignment, which was then further pruned of duplicated taxon sequences such that the final matrix contained only one representative per species (and in each case, the longest rDNA contig was retained).The resulting final alignment included a total of 273 sequences and, after visual inspection following the same phylogenetic methods described above, was used as the reference database for subsequent queries of sequences obtained from sequencing of the environmental samples.

Read Mapping
The 251 bp amplicon reads were truncated to 151 bp to ensure comparability of amplicon-based sequences to WGS sequences.To exclude PCR primer sequence regions, amplicon reads were truncated to include only positions 50-200, representing the highly variable ITS1 sequence.Sequences from both the WGS and amplicon-based approaches were then aligned to the newly constructed reference rDNA database to generate short-read alignment maps.To ensure that reads uniquely mapped to a single locus in the database, several filtering steps were employed.First, only reads that mapped with a CIGAR score between 145 and 151 matches, corresponding to a mapping identity of between 96 and 100%, were retained.Second, only read-pairs for which both the left and right read mapped to the same contig were kept, and then only read-pairs for which both reads mapped with a SAM mapping quality of 31 or greater (out of a maximum of 60) were retained.Due to the highly conserved nature of portions of the coding regions of the ribosomal DNA complex, many reads belonging to species for which multiple congeners were represented in the database mapped almost equally well to multiple species.In these instances, the mapping score was bolstered by the estimated read separation.Thus, a higher SAM CIGAR score of 150 or 151 was required to retain reads that also mapped well to a member of the same genus.

Species Accumulation Curves
To facilitate comparison of species diversity and accumulation as assessed from environmental samples, species richness rarefaction curves for each of the 32 samples falling under four sampling regimes (i.e., eight each from high elevation rock, high elevation tree, low elevation rock, low elevation tree) were generated using the Diversity Stats calculator in EstimateS 9.1.0(Colwell, 2013).Rarefaction curves were bootstrapped by randomizing the sample order 100 times.

Comparison of Environmental Sampling Methods and Expert-Based Inventory
To assess the congruence between the two metagenomic methods (i.e., WGS vs. amplicon) of detecting symbionts in environmental samples (potential of diversity, or "PoD") as well as congruence of both methods to expert-based inventory (Coddington et al., 1991;Sørensen et al., 2012) of species found growing in each plot (taxonomic diversity, or "TD"), we pooled the taxa detected in all samples collected in each of the four sampling regimes (high rock, high tree, low rock, low tree) to produce a single list of species detected through a given method) and calculated Jaccard indices in each inventory.These were derived from a presence/absence matrix that consisted of the species present in the rDNA reference database that were found at each of the two elevation extremes via (1) the expert-based lichen biodiversity inventory, (2) the lichenized fungi detected in environmental samples using amplicon-based sequencing, and (3) the lichenized fungi detected in environmental samples using WGS sequencing.Similarity between the lichens detected with the different methods is reported below as J Treatment,Treatment (e.g., J WGS,Vouchered is similarity of lichenized fungi detected by WGS of environmental samples compared to that detected by expert-based inventory).

Ribosomal DNA Database
The final reference genomic database of lichen rDNA contained complete or nearly complete ribosomal DNA complexes (NTS, ETS, 18S, ITS1, 5.8S, ITS2, 26/28S) for 273 unique species of lichenized fungi within Pezizomycotina, representing 25 orders and 57 families.The database spanned 1,770,139 bp of sequence, with an average contig length of 6,484 bp.

Detection of Lichenized Fungi Using Expert, WGS, and Amplicon-Based Inventory
Taxonomic diversity (TD: the number of vouchered species) of lichens found growing in the study plots totaled 57 species in the high-elevation plot and 83 species in the low-elevation plot, yielding a total of 136 species in both (n = 4 species occurred in both).Of these 136 species, 78 (∼60%) were represented in the rDNA database generated in this study and thus could potentially be matched to sequences from the environmental samples.Note that the additional species in the database were vouchered from other plots in the study area and were used to ascertain whether this method detected species that were not vouchered in the expert-based field inventory.
Mapping of paired-end WGS reads to the rDNA reference database resulted in the detection of a total of 94 lichenized fungi present in both plots: 43 species from the high-elevation plot and 71 from the low-elevation plot (Table 1).Conversely, using the traditional amplicon-based approach of mapping ITS1 reads to the rDNA reference database resulted in the detection of a total of 34 lichenized fungi present in both plots: 21 species from the high-elevation plot and 18 species from the low-elevation plot (Table 1).Species detection accumulated consistently faster by sample using WGS in all four sampling regimes (Figure 2).Of the 57 species vouchered during the inventory of the highelevation plot, 13 were detected using both sequencing methods although only seven of these species represented the same species between the two plots (Figure 3; Table 1).Conversely, of the 83 species that were vouchered during the inventory of the low-elevation plot, 17 were detected using either sequencing method, although only 4 were detected by both (Figure 3; Table 1).
The similarity between the community of lichens detected growing at the plots via expert-based inventory vs. the pools of lichen fungal symbionts species detected in environmental samples using WGS or amplicon sequencing are reported as Jaccard indices in Table 2.The similarity values for the assemblages of lichenized fungi detected in the environment at the high-elevation plot compared to those vouchered in the expert-based inventory (J WGS,Vouchered = 0.148, J AMP,Vouchered = 0.200) were roughly twice as high as their respective indices at the low-elevation plot (J WGS,Vouchered = 0.108, J AMP,Vouchered = 0.063).This result conveys a greater congruence between the PoD inventory based on WGS sequencing to the TD inventory, than the PoD inventory based on amplicon sequencing to the TD inventory.

DISCUSSION
Our results demonstrate that, in comparison to amplicon sequencing, the WGS approach detects a greater number of species that exist within an environment as propagules.When analyzed in tandem with data from expert-based taxonomic      inventories of the same locations, there is an added capability to compare the number and community composition of extant, fully formed symbioses that occur in nature, to those that could potentially occur based on the pool of symbionts in the environment.In turn, this PoD-to-TD comparison helps unravel biotic constraints on the distributions of biodiversity.The estimation of species counts and taxonomic diversity using WGS metagenome sequencing has been conceptually validated in planktonic microbial communities (Poretsky et al., 2014).However, to our knowledge, the application of this method to macro-eukaryotes is only just emerging (Donovan et al., 2018).This study provides a substantial increase in the number of rDNA sequences from lichenized fungi now made available for future research.At the time of writing (April 2019), there were 779 complete (>3,000 bp) ribosomal DNA sequences within the Pezizomycotina (if "18S, " "complete, " and either "26S" or "28S" are required in the search) publicly available on GenBank.Accounting for the 49 complete lichenized fungal ribosomal DNA sequences that had already been submitted from the database curated here, our additional 224 new sequences represent a 29% increase in genomic resources for this locus for lichens.
Our workflow presented herein adds to the growing toolset of molecular contributions to biodiversity science.We have demonstrated the utility of using WGS metagenomic libraries to estimate the pool of available symbionts that are present in an environment.Given the expected ongoing decreases in sequencing cost (Schuster, 2007), we anticipate that WGS will be readily adopted in many study systems and organisms.

Understanding the Distributions of Lichens and Their Propagules
At both sampling plots (high-elevation and low-elevation), species were detected based on environmental (PoD) sampling that were not present based on TD sampling.In other words, lichen mycobionts were detected in the environmental samples but were not detected by the inventory of lichens growing at the site.There are two plausible explanations for these results.First, and likely applicable to most such instances, species detected in the environmental samples but not in the inventories were present only as propagules derived from other locations.It has been shown that the gametes of sexually reproducing lichens can disperse several kilometers (Ronnås et al., 2017), and even clonal propagules are capable of dispersing over a kilometer (Gjerde et al., 2015;Eaton et al., 2018), although these studies suggest that such long-distance dispersal events are rare relative to the total reproductive propagule output of any given individual lichen thallus.It is thus not unexpected that the bank of lichen propagules on a given surface could include representatives of a more diverse pool of lichen species, these derived from a broader geographic neighborhood, regardless of the suitability of the substrate for colonization by those species.Indeed, similar patterns of have been recovered from amplicon sequencing of lichen propagules from indoor dust samples in the United States (Tripp et al., 2016).
The presence of a lichenized fungal propagule alone represents only one element of the total biotic community needed for a lichen species to grow into a mature individual, with the presence or absence of other algal, bacterial, and or other fungal species potentially representing constraints to development of a given lichen thallus (see Tripp et al., 2019).Other constraints include abiotic factors such as elevational or precipitation regimes selected for by different lichen species.One example of this phenomenon in our dataset is Bulbothrix scortella, which was detected only in WGS sequencing of the environmental samples from the high-elevation plot.Bulbothrix scortella is a subtropical species that occurs only at low-elevations in the southern Appalachian Mountains (Hale, 1976;Lendemer et al., 2013).As such, the detection of the species with only WGS sequencing likely reflects the dispersal of a lichenized fungal propagule to the high-elevation plot (from nearby lowelevation habitats), where it is unlikely to become established and undergo further development owing to abiotic and/or biotic constraints.
A second and less likely explanation for our results is failure to detect the presence of a lichen using expert-based TD inventories.Lichens are widely recognized as microhabitat specialists with many species present as small or spatially restricted populations in a given geographic area (Peck et al., 2004;Belinchón et al., 2015;Boch et al., 2016;Dymytrova et al., 2016).Examples from our own work have demonstrated that in some instances, inventories of the same area by more than one collector will yield < 50% overlap in the cohorts of documented species (Lendemer et al., 2016).In the present study, Loxospora elatina is a species known to have a distribution restricted to temperate boreal forests (i.e., high elevation habitats in the southern Appalachian Mountains) in North America and Europe (Tønsberg, 1992;Lendemer, 2013).This species was detected by both sequencing methods in the high-elevation plot, but not found in the TD inventory.Thus, it is possible that in a limited number of instances, the species detected in the environmental samples but not in TD inventories were present in low abundance and not detected.However, our expert-based inventories for the broader research project under which the current study falls (i.e., investigating drivers of diversity and distributions of lichens in the southern Appalachian Mountains) target exhaustive sampling until full vouchering of the entire pool of species diversity has been accomplished (Tripp et al., 2019).Although replicate sampling of additional high-and lowelevation plots was not carried out in this study, we nonetheless detected clear differences in the cohorts of species between the two elevational extremes with all methods herein employed.
High-and low-elevations in the southern Appalachians host different lichen communities that share in common only a small number of species (Lendemer and Tripp, 2008;Allen and Lendemer, 2016;Muscavitch and Lendemer, 2016 , 2017;Tripp and Lendemer, 2019a,b;Tripp et al., 2019).Thus, differences between the inventories of the two plots are to be expected.Some of the differences in the cohorts of species detected through environmental sampling may reflect the overall dispersal limitation of lichen propagules that leads the propagule pool to be dominated by locally occurring species.
We hypothesize that at higher elevations one would expect the detection of fewer "long distance dispersal" events such as is described for B. scortella above, when compared to lower elevations.The ability of a propagule to disperse is influenced in part by gravity (Ronnås et al., 2017), and thus propagules will face an uphill battle in dispersing upward through passive wind dispersal.

Benefits and Limitations of WGS vs. Amplicon Sequencing
Our analyses detected approximately four times as many species using the WGS approach compared to the amplicon-based method in the low-elevation plot, and twice as many in the high-elevation plot.The higher number of species detected by WGS affords a stricter threshold for what counts as a detected species in the SAM file (e.g., by only counting species as detected if they occur at least a certain number of times throughout the SAM file, or by requiring a higher SAM mapping quality score before counting the species as detected).
The community present in a metagenomics sample contains the DNA of species that are represented by many cells in some cases, and potentially by a single cell in other cases.Such stark differences in input quantity from different species in the sample leads to potential issues for both WGS and amplicon environmental sequencing methods.While both methods rely on low abundance DNA being extracted in sufficient quality and quantity to be detected in downstream analysis, amplicon-based metagenomics suffers from difficulties in primer design, including issues with the use of universal primers (Acinas et al., 2005;Wang and Qian, 2009).Moreover, unequal (biased) amplification is of wide concern in skewing the distribution of amplicons in the PCR product (Acinas et al., 2005;Sipos et al., 2010).Conversely, one potential issue with WGS metagenomics is that for species with low abundance in the sample, the probability of sampling that species in the prepared library will scale inversely with the proportion of the prepared library that is sequenced.For these reasons, the cohorts of species detected in the community may differ minimally to substantially between the two methods.
One complication of our approach is the presence of group I introns (DePriest and Been, 1992;DePriest, 1993;Gargas et al., 1995) in the coding sequences (18S, 5.8S, and 28S) of many lichenized fungal ribosomal complexes.These introns may be horizontally transferred between species (Hibbett, 1996;Fitzpatrick, 2012;Roy and Irimia, 2012).Thus, it is possible that a read originating from a group I intron in the rDNA complex of a species that is not present in our dataset may appear to map to the database to a species that recently received a horizontal transfer of that intron.However, this will occur very rarely, and even less-so as the database improves, for several reasons.First, our heuristics for read filtering are fairly strict, requiring read-pairs to uniquely map with high percent-identity to members of the database.Unless the intron transferred very recently, the rapid evolution of such introns (Dujon, 1989;Gargas et al., 1995;Roy and Irimia, 2012) would prevent the false detection of a species within the database.Moreover, these introns may facilitate a higher rate of species detection, due to their rapid evolution relative to the coding regions that are too conserved to uniquely identify species in many cases.
Moving forward, we aim to implement the approach presented here to measure the congruence between the observed TD of lichens in the southern Appalachians and the PoD measured in the propagule bank, as one step forward in understanding potential biotic constraints on the distributions of lichen biodiversity.The environmental covariates that predict this congruence may enable the development of cost-effective conservation measures in ecologically sensitive keystone systems such as biological soil crusts (Eldridge, 2000;Belnap, 2003;Belnap and Lange, 2013).We propose that the WGS method presented here should be added to the toolset used by molecular taxonomists in pursuit of conservation and restoration efforts, in addition to facilitating more general understanding on what structures the distributions of species.The datasets and bioinformatic resources presented here represent an important set of new tools and approaches that can be used to address a broad range of questions.

DATA AVAILABILITY STATEMENT
The database consisting of the complete (or nearly complete) ribosomal DNA complexes of 273 lichen mycobionts is available at Zenodo record #3256645.The WGS and amplicon metagenomics libraries are published on the SRA database, SUB5873533.

FIGURE 2 |
FIGURE 2 | Bootstrapped rarefaction curves of the accumulation of species of lichenized fungi, comparing WGS-based rDNA approach (light gray bands) vs. amplicon-based ITS1 approach (dark gray bands), in the four sampling regimes in two plots: (A) high-elevation rock samples; (B) low-elevation rock samples;(C) high-elevation tree samples; (D) low-elevation tree sample (x-axis: number of environmental samples; y-axis: number of species of lichenized fungi detected).In all four sampling regimes, the WGS-based approached developed in this study detected greater species diversity than the amplicon-based approach.

FIGURE 3 |
FIGURE 3 | Venn diagrams comparing the numbers of species of lichenized fungi detected at (A) the low-elevation White Oak Branch plot and (B) the high-elevation summit of Clingman's Dome.Diagrams show an overall low overlap of detected species using the WGS approach (PoD), the amplicon approach (PoD), and the expert-based inventory approach (TD).

TABLE 1 |
Inventory of species of lichenized fungi collected at the high-elevation Clingman's Dome and low-elevation White Oak Branch plots.

TABLE 2 |
Comparison of the congruence between the species detected by either method and the vouchered species at each plot.