Combining Multiple Markers in Environmental DNA Metabarcoding to Assess Deep-Sea Benthic Biodiversity

Environmental DNA (eDNA) metabarcoding is an emerging tool to estimate diversity by combining DNA from the environmental samples and the high-throughput sequencing. Despite its wide use in estimating eukaryotic diversity, many factors may bias the results. Maker choice and reference databases are among the key issues in metabarcoding analyses. In the present study, we compared the performance of a novel 28S rRNA gene marker designed in this study and two commonly used 18S rRNA gene markers (V1-2 and V9) in estimating the eukaryotic diversity in the deep-sea sediments. The metabarcoding analyses based on the sediment surveys of the Okinawa Trough found that more eukaryotic taxa were discovered by 18S V9 than 28S and 18S V1-2, and that 18S V9 also performed better in metazoan recovery than the other two markers. Although a broad range of taxa were detected by the three metabarcoding markers, only a small proportion of taxa were shared between them even at the phylum level. The non-metric multidimensional scaling (NMDS) analysis also supported that communities detected by the three markers were distinct from each other. In addition, different communities were resolved by different reference databases (NCBI nt vs. SILVA) for the two 18S markers. Combining the three markers, annelids were found to be the most abundant (44.9%) and diverse [179 operational taxonomic units (OTUs)] metazoan group in the sediments of the Okinawa Trough. Therefore, multiple independent markers are recommended to be used in metabarcoding analyses during marine diversity surveys, especially for the poorly understood deep-sea sediments.


INTRODUCTION
About 8.7 million eukaryotic species are predicted to inhabit Earth with ∼2.2 million in the ocean, however, most of them are not described (Mora et al., 2011). Although there is so many species unknown, diversity continues to decline due to pressures from anthropogenic activities and climate change (Butchart et al., 2010;Dawson et al., 2011). Therefore, fast, efficient, and reliable methods are required for estimating ecosystem biodiversity. The environmental DNA (eDNA) coupled with high-throughput sequencing is an emerging tool to assess eukaryotic biodiversity (Bik et al., 2012a). The eDNA derived from the environmental samples (e.g., sediments, soil, water, etc.) is an efficient, non-invasive, and easy-to-standardize sampling approach (Thomsen and Willerslev, 2015). The eDNA metabarcoding method has been widely used in biodiversity surveys, however, there are still pitfalls and limitations for this emerging method (Ruppert et al., 2019).
One of the most important issues to be considered in metabarcoding studies is the maker selection, because different gene markers have contrasting coverage, resolution, and bias between taxa (Ruppert et al., 2019). The nuclear small subunit ribosomal RNA (18S rRNA) gene, has been popularly used in eDNA metabarcoding surveys largely due to the versatility of PCR primers for screening the entire eukaryotic domain (Leray and Knowlton, 2016), and several different hypervariable regions have been used in different studies (Fonseca et al., 2010;Bik et al., 2012b;Guardiola et al., 2015;Chain et al., 2016;Zhang et al., 2018;Djurhuus et al., 2020). In particularly, the V1-2 and V9 hypervariable regions of the 18S rRNA gene are mostly used to illuminate diversity of microbial eukaryotes (Fonseca et al., 2010;Bik et al., 2012b;Djurhuus et al., 2020). Despite being widely used in the eukaryotic diversity assessment, the 18S rRNA gene has its disadvantages. For example, some studies have found that the 18S rRNA gene appeared to underestimate the meiofaunal diversity when comparing to the mitochondrial COI (Tang et al., 2012). Makers such as the mitochondrial 16S rRNA gene or COI have often been used to recover specific metazoan groups (Djurhuus et al., 2020;West et al., 2020). The 28S rRNA gene has also been used to assess community diversity for metazoans (Machida and Knowlton, 2012), zooplankton (Hirai et al., 2020), or algae (Bittner et al., 2013).
The incomplete reference database is another important issue when conducting eDNA metabarcoding studies, because taxonomic assignment of the sequences relies heavily on the quality and completeness of the reference database. The public sequence repositories such as the GenBank database of the National Center for Biotechnology Information (NCBI) or a smaller database of reference sequences with trusted taxonomy such as SILVA (Pruesse et al., 2007) are often used to assign taxonomy (Bik et al., 2012a). However, inconsistencies in taxonomic classifications between different reference databases may lead to different taxonomic assignment. Recently, Balvociute and Huson (2017) compared four public databases, namely SILVA, RDP, Greengenes, and NCBI, and found that the NCBI taxonomy was larger and more diverse, and should be used as a common framework when comparing analyses performed on different taxonomic classifications. However, few studies have evaluated the effects of different reference databases.
Covering nearly two thirds of the Earth's surface, the deepsea environment is the largest ecosystem on the planet (Rex, 1981). Most of the deep sea is heterotrophic in the absence of the photosynthesis, which mainly depends on food from the euphotic zone (Ramirez-Llodra et al., 2010). For this reason, deep-sea benthic communities are usually considered to be food limited, due to the low input of detritus from the euphotic zone, which may influence the benthic abundance and diversity (Smith et al., 2008). However, the deep seafloor ecosystems have been poorly studied because of the remoteness and vastness of the deep sea. In recent years, eDNA metabarcoding has been employed to investigate the deep-sea benthic community diversity and patterns (e.g., Guardiola et al., 2015;Fonseca et al., 2017), improving our understanding of the deep-sea eukaryotic ecosystems.
In the present study, the deep-sea benthic eukaryotic communities in the Okinawa Trough, northwest Pacific were characterized by using eDNA metabarcoding with different metabarcoding markers and reference databases. The main aim of this study was to evaluate and compare the taxonomic coverage and species resolution power of different eDNA metabarcoding markers in deep-sea community surveys. A novel 28S metabarcoding primer set targeting metazoans was designed in this study. The performance of this 28S marker and two commonly used 18S rRNA gene markers (V1-2 and V9) were evaluated. The effect of the reference databases (NCBI and SILVA) on the results of eDNA metabarcoding analyses were also discussed.

The 28S Maker Design and in silico PCR Analysis
In present study, a novel primer set for 28S rRNA gene targeting metazoans was designed. A total of 4166 metazoan 28S rDNA sequences were downloaded from the SILVA database 1 . A pipeline of programs, Primer Prospector, was used to screen conserved sites against the alignment of reference sequences and generate de novo universal primers (Walters et al., 2011). The primers were then sorted according to sensitivity, specificity, or degeneracy, and scored by taxonomic coverage, and mismatch counts. Finally, a primer pair which generates amplicon lengths of ∼400 bp with the lowest mismatch rate and highest coverage (both hit >75% of the 4166 sequences) was selected (1096F and 1535R) ( Table 1). The length of the amplified fragment is suitable for the Illumina MiSeq platform.
To compare the performance of different primer sets, the novel 28S primers and two previously published primer sets targeting the V1-2 (Blaxter et al., 1998) andV9 (Amaral-Zettler et al., 2009) hypervariable regions of the 18S rRNA gene were evaluated by in silico PCRs. All the primers are provided in Table 1. The primer set SSU_F04 and SSU_R22 targeting the ∼400 bp V1-2 region of the 18S rRNA gene (Blaxter et al., 1998) has been widely used in meiofaunal community surveys (e.g., Fonseca et al., 2010). The universal eukaryotic primers 1380F and 1510R were used to amplify a short fragment (∼130 bp) of the 18S V9 region (Amaral-Zettler et al., 2009), which can address general questions of eukaryotic biodiversity over extensive taxonomic and ecological scales (De Vargas et al., 2015). The TestPrime 1.0 2 was applied to check the performance of the three primer pairs, by using the non-redundant SILVA SSU Reference database (release SSURef 138.1 NR) for the 18S regions and the LSU database (release LSURef 138.1 NR) for the 28S region, and allowing 1 mismatch and 5-bases of 0-mismatch zone at 3 end (Klindworth et al., 2013).

Sampling and DNA Extraction
Six sediment samples were collected from the Okinawa Trough with a box corer (50 × 50 cm) during a scientific cruise of R/V Xiangyanghong 20 in 2016 (Supplementary Figure 1). The top ∼5 cm of the sediment was subsampled by a push core of 7-cm diameter from the box corer, and stored at −20 • C. Then the six samples were stored at −80 • C in the laboratory until DNA extraction. The depths of the six sites ranged from ∼550 to 1400 m. Prior to DNA extraction, the samples were thawed at 4 • C overnight. The total DNA of each sample was extracted from ∼10 g sediment with the PowerMax Soil DNA Isolation Kit (MO BIO Laboratories, Inc., United States) following the manufacturer's instructions.

PCR and Sequencing
The three markers (28S, 18S V1-2, and 18S V9) were used to amplify eukaryotic communities by PCR reactions. Each PCR was conducted in a volume of 20 µL. The PCR reaction consisted of ∼10 ng template DNA, 2 U TransStart Fastpfu DNA Polymerase (TransGen Biotech, China), 1× FastPfu Buffer, 0.25 mM dNTPs, 0.2 mg/ml bovine serum albumin (BSA), and 0.4 µM of each primer. Thermal cycling conditions included an initial denaturation step at 95 • C for 3 min, followed by 37 cycles of 95 • C for 30 s, 55 • C for 30 s, and 72 • C for 45 s, and a final extension at 72 • C for 10 min. Three replicate PCRs were amplified from the same DNA template and pooled together to reduce the effect of stochastic PCR amplification. The high-throughput sequencing for the 18S V1-2 and 28S markers was performed on the Illumina MiSeq platform (300 bp paired-end) by Majorbio Co., Ltd. (Shanghai, China). Due to the technological problems, the sequencing for the short 18S V9 marker was conducted on the Illumina HiSeq 2500 platform (PE250) by Novogene Co., Ltd. (Beijing, China). Demultiplexed raw reads have been deposited in the NCBI Sequence Read Archive database (BioProject PRJNA658834 with BioSample accession numbers SAMN16072347-SAMN16072351 and SAMN17126436).
For comparison between different markers and reference databases, two data sets were analyzed. One included the 28S, 18S V1-2, and 18S V9 against the NCBI nt reference database (Data Set 1), and the other included 18S V1-2 and 18S V9 against the SILVA reference database (Data Set 2). Rarefaction curves were estimated with Mothur (Schloss et al., 2009) to evaluate the sequencing effort. To minimize bias due to sequencing effort, each data set was subsampled down to the size of the smallest sample.
Dissimilarities between eukaryotic diversity obtained from different metabarcoding markers were visualized by using the non-metric multidimensional scaling (NMDS) analysis. The analysis was performed at the OTU level, with the abundancebased Bray-Curtis index and presence/absence-based Jaccard index, respectively. All the statistical analyses were performed on the online platform of Majorbio Cloud Platform 3 .

In silico Analysis
The results of the in silico analysis performed against the nonredundant SILVA SSU/LSU databases are shown in Table 2. No Archaea and some Bacteria sequences were amplified by the three primer sets. For Eukaryota, the primer set SSU_F04/SSU_R22 for the 18S V1-2 region had the most number of hits (16,308), followed by the 28S primer set 1096F/1535R, whereas only 495 hits were detected for the 1380F/1510R targeting the short 18S V9 region. The 28S primer set had higher amplification efficiency in recovering Eukaryota (74.5%) than the 18S V1-2 primers (53.0%), whereas the 18S V9 primers poorly recovered Eukaryota (1.8%). Similarly, amplification efficiency found for the three primer pairs in recovering Metazoa was 89.6% for the 28S, 61.1% for the 18S V1-2, and 1.2% for the 18S V9.

Sequence Data
A total of 1,355,328 raw paired-end reads were generated, with 357,214 for 28S, 439,212 for 18S V1-2, and 558,902 for 18S V9 (Supplementary Table 1). Rarefaction curves of the three markers showed that the sequencing depths were sufficient to recover the eukaryotic diversity for 28S and 18S V1-2, but insufficient for 18S V9 despite a deeper sequencing effort (Figure 1).
For Data Set 1, 185,994 sequences were obtained for each of the three metabarcoding marker (30,999 for each sample) after filtering and subsampling (Supplementary Table 2). No Archaea and Bacteria sequences were detected by 28S and 18S

Taxonomic Assignment
For Data Set 1, 531 OTUs were identified for 28S, 1352 for 18S V1-2, and 3788 for 18S V9 (Table 3). These OTUs were assigned to 27, 30, and 39 phyla for the three metabarcoding markers, respectively ( Table 3). The number of families, genera, and species identified by 18S V1-2 was slightly higher than 28S, but were lower than 18S V9 (Table 3). In total, 44 phyla were identified by the 3 markers, with 21 sharing among different markers (Figure 2). At the family level, 53 were shared among them (total 758), and only 12 of 1428 species were shared among them (Figure 2). Furthermore, the NMDS analysis also supported that different metabarcoding markers generated distinct profiles of benthic eukaryotic diversity from the same sediment samples, with the abundance or presence/absence data (Figure 3). For each metabarcoding marker within Data Set 1, the abundance of metazoans was 29.06% for 28S, 26.89% for 18S V1-2, and 10.02% for the 18S V9 (Supplementary Table 3). Number of taxonomic ranks (phylum, family, species, and OTU) for 18S V9 was higher than 18S V1-2 and 28S, and the number of these taxa for 18S V1-2 was comparable or slightly higher than 28S.
For Data Set 2, 41 phylum were determined for 1361 OTUs by 18S V1-2, and 47 phylum for 3928 OTUs by 18S V9 (Table 3). Similarly, much more families, genus, and species were determined by 18S V9 than 18S V1-2 ( Table 3). In total, 50 phyla were identified by the 2 markers, with 38 shared between them (Supplementary Figure 2). A total of 187 families, and 238 species were shared between the 2 markers (Supplementary Figure 2). For 18S V1-2 and 18S V9, taxonomic ranks (i.e., phylum, family, and species) were compared between the two data sets based on different reference databases (Supplementary Figure 3). The number of phyla identified by the NCBI nt database was slightly fewer than the SILVA database for both markers. Comparable number of families were recognized by using the two databases for the 18S V1-2 gene marker, while the number of families recovered by NCBI for 18S V9 was about twice that by SILVA. More species were determined by SILVA for 18S V1-2, but more were identified by NCBI for 18S V9 (Supplementary Figure 3). In addition, a small proportion of families or species were shared between the two databases, and even no more than 27% phyla were shared between different databases for both 18S V1-2 and 18S V9 (Supplementary Figure 3).

Metazoan Communities in the Okinawa Trough
Combining the three markers based on NCBI nt database (Data Set 1), 22.0% of all the sequences were Metazoa, while 26.9% sequences were not assigned to any taxonomic ranks. Similarly, 21.2% of sequences in Data Set 2 (18S V1-2 and V9 against the SILVA database) were Metazoa, but only 3% of sequences were not assigned to any taxonomic ranks. Because metazoan families were not well recovered in Data Set 2, here we only presented the results from Data Set 1 (Figure 4). The most abundant (44.9%) phylum found in the sediments of the Okinawa Trough was Annelida, followed by Cnidaria (25.4%) and Nemertea (13.0%). Phyla such as Mollusca, Nematoda, Chordata, Arthropoda, Hemichordata, and Entoprocta also showed abundance ≥1%. Annelida, Cnidaria, and Arthropoda were the most diverse groups with 179, 120, and 76 OTUs, respectively. At the family level, 65 families were determined for Cnidaria, 52 families for Arthropoda, and 49 families for Annelida. In addition, 40 families were found for Mollusca, and 24 families for Nematoda. It was noted that only eight families were found for the relatively abundant phylum Nemertea.
Metazoan groups were also detected by each of the three markers (Data Set 1). The 28S and 18S V1-2 markers revealed similar metazoan communities and abundance (Supplementary  Figure 4). Annelida (∼50%), Cnidaria (20%), and Nemertea (∼13%) were the most abundant phyla among all the metazoan groups, and Annelida was also the most diverse group in terms of the family and OTU levels. However, the 18S V9 showed that Cnidaria was the most abundant (49%), and had the greatest number of families (n = 53). The second most abundant phylum was Annelida (14.4%), which was also one of the most diverse groups with 36 families and 81 OTUs.

Different Markers in Estimating the Deep-Sea Benthic Diversity
Given the importance of marker choice in eDNA matabarcoding studies, it is critical to evaluate the performance of different markers used in the diversity surveys based on the eDNA, because different markers could perform differently in assessing eukaryotic diversity (e.g., Tanabe et al., 2016;Zhang et al., 2020). Although many metabarcoding studies used one or more gene markers to estimate eukaryotic diversity, few have investigated the performance of different markers in assessing diversity of the poorly understood deep-sea ecosystem. In the present study, we provided a novel 28S rRNA gene marker targeting the metazoans, and presented the evaluation of this and two commonly used 18S rRNA gene markers (18S V1-2 and 18S V9) in assessing the benthic diversity of the deep-sea benthic communities by metabarcoding analyses.
A broad taxonomic coverage and high taxonomic resolving power are crucial when choosing markers in metabarcoding studies. In this study, the in silico analysis showed that primer sets 1096F/1535R for the 28S gene marker had the highest amplification efficiency in recovering Eukaryota and Metazoa, followed by the SSU_F04/SSU_R22 for the 18S V1-2 marker ( Table 2). The metabarcoding analyses showed that a wide range of eukaryotes (Table 3) and metazoans (Supplementary Table 3) were detected by the three markers, suggesting their utility in eukaryotic and metazoan diversity assessment. The 18S V9 displayed the best performance since it recovered the greatest number of taxa in both Data Sets 1 and 2. Furthermore, 18S V9 had the highest sequence variability while 28S had the lowest variability, although amplicons of 18S V9 (∼130 bp) were shorter than the other two markers (∼400 bp). This may partially explain the observation that although the sequencing depth of 18S V9 was deeper than 28S and 18S V1-2, 18S V9 appeared not to be sufficient to recover the benthic diversity ( Figure 1). Overall, 18S V9 outperformed the 18S V1-2 and 28S gene markers in metabarcoding analyses, suggesting that this marker is probably more suitable for metabarcoding the deepsea eukaryotic or metazoan communities. However, it should be noted that although markers with shorter amplicons are more preferred for degraded eDNA due to higher amplification success rates (Meusnier et al., 2008), targeting larger DNA fragments will favor the amplification of DNA from living or recent dead individuals in the environment (Sinniger et al., 2016).
The primer set 1380F/1510R for 18S V9 had the lowest amplification efficiency among the three primer sets in the in silico analysis (Table 2), however, more eukaryotic and metazoan taxa were detected by 18S V9 than 28S and 18S V1-2 in the metabarcoding analyses ( Table 3). The contrast performance of the in silico analysis and the metabarcoding analyses for the markers may be caused by many reasons such as different taxonomic compositions between the reference database and the actual biological community (Zhang et al., 2020). In this study, more matched sequences (4785 for Eukaryota; 2849 for Metazoa) were generated and sequence coverage increased to 9.3% for Eukaryota and 14.8% for Metazoa when five mismatches were allowed in the in silico analysis (data not shown). Thus, the low efficiency of 18S V9 in the in silico analysis is most likely due to the poor representation of eukaryotic/metazoan sequences for this gene region in the current SILVA SSU reference database. Our results agreed with findings from previous studies that the in silico analysis can provide an initial assessment of the primers, but the metabarcoding analyses in an actual metabarcoding study are more crucial to understand the primer performance (Zhang et al., 2020).
Although a broad range of taxa were detected, it was observed that only a small proportion of taxa were shared among the three markers (Figure 2). Even at the phylum level, no more than 50% phyla were detected by all the three markers. Furthermore, the NMDS analysis also supported that communities detected by the three markers were distinct from each other (Figure 3). The results are consistent with findings in previous metabarcoding studies that evolutionary independent markers could detect different taxonomic groups (Tanabe et al., 2016;Zhang et al., 2018Zhang et al., , 2020. Thus, a combination of multiple markers is recommended to increase taxonomic coverage in the metabarcoding surveys.

Choice of the Reference Database
In the present study, both the NCBI nt and SILVA databases were used for the taxonomic assignment for 18S V1-2 and 18S V9. Different communities were resolved by different databases (Supplementary Figure 3). Even at the phylum level, no more than 27% phyla were shared between the two databases for the two markers. Perhaps it was not clear which database is the best choice for assessing benthic diversity, because a greater number of species was recovered in the SILVA database with 18S V1-2, while in NCBI with 18S V9 (Supplementary  Figure 3). As reported by a recent study, the NCBI taxonomy appeared to be more comprehensive than SILVA (Balvociute and Huson, 2017). These results indicate that both databases may be far less complete for the deep-sea eukaryotic communities, and that the taxonomic classifications of both databases are probably not well compatible. Currently, most metabarcoding studies used one of the databases (Fonseca et al., 2010;Chain et al., 2016;Djurhuus et al., 2020), and it seems that few studies have compared the effects of different databases on metabarcoding analyses. In addition, it was noticed that taxonomic classifications were incomplete in both databases. For example, some OTUs assigned to a family, genus, and species were not assigned to a order, class, or phylum. Moreover, about 45% OTUs against the NCBI nt database, and about 24% OTUs against the SILVA database were not assigned to any taxonomic ranks, suggesting that a substantial amount of   diversity are unknown in terms of the 18S rRNA sequences within both databases. Considering the importance of the reference database, the update of taxonomic classifications of each database is necessary to improve the taxonomic resolution in metabarcoding analyses.

Application of eDNA Barcoding to Assess Metazoan Diversity in the Deep-Sea Sediments
As the largest ecosystem on the planet, the deep-sea environment harbors high diversity but is poorly understood (Ramirez-Llodra et al., 2010;Woolley et al., 2016). In this study, the metabarcoding analyses of the 28S and 18S V1-2 markers both showed that the annelids were the most abundant and diverse metazoan group while cnidarians were the most abundant and diverse as revealed by 18S V9 (Supplementary Figure 4). Based on the combination of the three metabarcoding markers, annelids were found to be the most abundant (44.9%) and diverse (179 OTUs) metazoan group in the sediment of the Okinawa Trough, followed by cnidarians (Figure 4). Interestingly, some less abundant phyla were also found to be highly diverse, e.g., Arthropoda (abundance 2.2%) with 52 families and 76 OTUs. The findings here were inconsistent with many previous studies based on traditional methods which found that nematodes were the dominant metazoans in the marine sediments (Soltwedel, 2000;Neira et al., 2001;Danovaro et al., 2002;Sajan et al., 2010;Leduc et al., 2016). Recent studies using the metabarcoding method revealed that other taxonomic groups such as Annelida or Arthropoda were dominant with high abundance and diversity in some deep-sea regions (Bik et al., 2012b;Guardiola et al., 2015). However, other metabarcoding studies have also found nematodes as the most diverse metazoans in the deep-sea sediment (Guardiola et al., 2016;Sinniger et al., 2016). The inconsistency among these studies suggest that traditional taxonomic methods are to some extent limited when assessing the deep-sea diversity, especially for the microbial eukaryotes, and that different eDNA metabarcoding markers may resolve different eukaryotic communities in different deep-sea regions. Nonetheless, the metabarcoding method is a powerful tool to estimate the deep-sea biodiversity, and moreover, multiple markers are recommended during the metabarcoding analyses to obtain a full picture of the deep-sea diversity.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI (accession: SAMN16072347-SAMN16072351 and SAMN17126436).

AUTHOR CONTRIBUTIONS
JL and HZ conceived the study. JL performed the lab work, analyzed the data, and wrote the manuscript. HZ designed the 28S rRNA gene marker and reviewed the manuscript. Both authors read and approved the final manuscript.

FUNDING
The study was supported by National Key R&D Program of China (2017YFC0306604 and 2016YFC0304905), the major scientific and technological projects of Hainan Province (ZDKJ2019011), and the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (XDA22040502).