Eukaryotic Biodiversity and Spatial Patterns in the Clarion-Clipperton Zone and Other Abyssal Regions: Insights From Sediment DNA and RNA Metabarcoding

The abyssal seafloor is a mosaic of highly diverse habitats that represent the least known marine ecosystems on Earth. Some regions enriched in natural resources, such as polymetallic nodules in the Clarion-Clipperton Zone (CCZ), attract much interest because of their huge commercial potential. Since nodule mining will be destructive, baseline data are necessary to measure its impact on benthic communities. Hence, we conducted an environmental DNA and RNA metabarcoding survey of CCZ biodiversity targeting microbial and meiofaunal eukaryotes that are the least known component of the deep-sea benthos. We analyzed two 18S rRNA gene regions targeting eukaryotes with a focus on Foraminifera (37F) and metazoans (V1V2), sequenced from 310 surface-sediment samples from the CCZ and other abyssal regions. Our results confirm huge unknown deep-sea biodiversity. Over 60% of benthic foraminiferal and almost a third of eukaryotic operational taxonomic units (OTUs) could not be assigned to a known taxon. Benthic Foraminifera are more common in CCZ samples than metazoans and dominated by clades that are only known from environmental surveys. The most striking results are the uniqueness of CCZ areas, both datasets being characterized by a high number of OTUs exclusive to the CCZ, as well as greater beta diversity compared to other abyssal regions. The alpha diversity in the CCZ is high and correlated with water depth and terrain complexity. Topography was important at a local scale, with communities at CCZ stations located in depressions more diverse and heterogeneous than those located on slopes. This could result from eDNA accumulation, justifying the interim use of eRNA for more accurate biomonitoring surveys. Our descriptions not only support previous findings and consolidate our general understanding of deep-sea ecosystems, but also provide a data resource inviting further taxon-specific and large-scale modeling studies. We foresee that metabarcoding will be useful for deep-sea biomonitoring efforts to consider the diversity of small taxa, but it must be validated based on ground truthing data or experimental studies.

The abyssal seafloor is a mosaic of highly diverse habitats that represent the least known marine ecosystems on Earth. Some regions enriched in natural resources, such as polymetallic nodules in the Clarion-Clipperton Zone (CCZ), attract much interest because of their huge commercial potential. Since nodule mining will be destructive, baseline data are necessary to measure its impact on benthic communities. Hence, we conducted an environmental DNA and RNA metabarcoding survey of CCZ biodiversity targeting microbial and meiofaunal eukaryotes that are the least known component of the deep-sea benthos. We analyzed two 18S rRNA gene regions targeting eukaryotes with a focus on Foraminifera (37F) and metazoans (V1V2), sequenced from 310 surfacesediment samples from the CCZ and other abyssal regions. Our results confirm huge unknown deep-sea biodiversity. Over 60% of benthic foraminiferal and almost a third of eukaryotic operational taxonomic units (OTUs) could not be assigned to a known taxon. Benthic Foraminifera are more common in CCZ samples than metazoans and dominated by clades that are only known from environmental surveys. The most striking results are the uniqueness of CCZ areas, both datasets being characterized by a high number of OTUs exclusive to the CCZ, as well as greater beta diversity compared to other abyssal regions. The alpha diversity in the CCZ is high and correlated with

INTRODUCTION
The deep seabed encompasses a vast mosaic of poorly sampled habitats, many of them characterized by fine-grained sediments that, for more than 50 years, have been known to host surprisingly high levels of biodiversity (Hessler and Sanders, 1967;Ramirez-Llodra et al., 2010;Rex and Etter, 2010). It conceals a variety of material resources, services and history that are of cultural value to human societies (Wenhai et al., 2019;Turner et al., 2020) as well as ecosystems of considerable and growing economic importance (Armstrong et al., 2012;Thurber et al., 2014). However, estimating and conserving deep-sea biodiversity remain major challenges , with many of the species found in deep-sea samples being undescribed (Glover et al., 2018). In some groups, this novelty is often at high taxonomic levels . Alongside biodiversity, understanding biogeographic patterns of benthic species on the ocean floor has been a long-standing and fundamental concern in deep-sea biology (Ekman, 1953;Vinogradova, 1997). In some cases, species horizontal ranges appear to be wider in the deep sea, particularly at abyssal depths, than in shelf and coastal waters (McClain and Hardy, 2010;, as landscapes of overlooked habitats such as bedrock offer corridors for dispersal (Riehl et al., 2020). However, the sheer scale of undescribed deepsea biodiversity, combined with the rarity of many species and the vast spatial extent and chronic under-sampling of the deep seabed, make it very difficult to establish geographical ranges and the prevalence of endemicity, particularly for species that are small in size.
Previous attempts to tackle deep-sea biodiversity and biogeography have been based mainly on morphological analyses, with a bias toward larger, more conspicuous animals (Higgs and Attrill, 2015). These issues have been explored using metabarcoding (Taberlet et al., 2012), which is mainly based on environmental DNA (eDNA) and thus rather biased toward small, more inconspicuous taxa. Metabarcoding studies targeting the DNA of bacteria (Zinger et al., 2011), Foraminifera  as well as protists and meiofaunal animals (Bik et al., 2012;Guardiola et al., 2015;Sinniger et al., 2016) were conducted at global scale in the deep sea, but with relatively few samples. Evidence for DNA preservation, notably in the deep-sea sediment (Lejzerowicz et al., 2013a;Corinaldesi et al., 2018), justifies the use of environmental RNA (eRNA). Indeed, eRNA has been proposed as a proxy for active species in various environments (Logares et al., 2012;Adamo et al., 2020;Giner et al., 2020). It yields compositions that differ from eDNA in deep-sea sediments (Guardiola et al., 2016) and may therefore be more useful for interpreting biodiversity and biogeographic patterns. Yet further understanding of deep-sea processes can be obtained using metabarcoding, since this environment offers ideal conditions for the preservation of biomolecules, including eRNA which may persist in complex forms (Cristescu, 2019;Wood et al., 2020).
Besides fundamental methodological issues, the use of metabarcoding has practical benefits in the context of current and likely future human impacts on ocean-floor communities (Glover and Smith, 2003;Thiel, 2003;Levin et al., 2020a;Le et al. in revision), combined with the growing and far-reaching effects of global climatic changes (Levin and Le Bris, 2015;Danovaro et al., 2017;Sweetman et al., 2017;Morato et al., 2020), which will alter areas targeted for deep seabed mining (Levin et al., 2020b). The largest proportion of our samples was obtained in the eastern equatorial Pacific Clarion-Clipperton Zone (CCZ). This area lies beyond national jurisdictions  and hosts vast deposits of polymetallic nodules (Hein et al., 2013;Petersen et al., 2016) that are the focus of an emerging seabed mining industry regulated by the International Seabed Authority (ISA). These activities will impact benthic ecosystems in a variety of ways (Levin et al., 2016) and likely lead to the irreversible loss of biodiversity (Niner et al., 2018), at least locally (Vonnahme et al., 2020). The ISA requires contractors licensed for mineral exploration and prospecting within the CCZ to conduct baseline surveys of benthic biodiversity, and subsequent exploitation contracts will require monitoring of mined areas to evaluate environmental impacts. Metabarcoding in the CCZ has been restricted to eDNA. It has been used to document prokaryotes in nodules (Blöthe et al., 2015) and in comparison with surrounding sediments and overlying water (Lindh et al., 2017(Lindh et al., , 2018, whereas eukaryotic studies pointing to the importance of habitat diversity and eDNA transport (Laroche et al., 2020a,b) were limited to the Western CCZ. No comparison with other abyssal regions was yet attempted.
Here, we perform eDNA and eRNA metabarcoding on 310 deep-sea sediment samples from the Eastern CCZ and other abyssal regions, including Northwest Pacific, Southern Ocean and several Atlantic regions in order to identify biogeographic patterns at larger scales and their relation to key environmental parameters at local scales, notably topographic heterogeneity. We rely on two 18S rRNA gene markers to reconstruct the taxonomic composition, diversity and distribution of benthic Foraminifera (37F marker) and Metazoa (V1V2 marker) that live or have lived in the Eastern CCZ region and explore some basic questions concerning: (1) the composition of sediment-dwelling eukaryotic assemblages at abyssal depths in the oceans; (2) levels of alpha diversity at particular abyssal sites and how they compare over different spatial scales; and (3) global (between oceans) to local (between stations) biogeographic patterns based on beta diversity turnover and dispersion. We rely on compositionally aware techniques that are necessary for omics data analysis, and on phylogenetic metrics to reflect on the evolutionary relatedness of communities sampled from eDNA or eRNA, that we systematically compare for more insightful interpretation. We discuss our conclusions in the context of recent metabarcoding and eco-evolutionary studies in the CCZ, in order to understand how environmental molecular data can be applied in future baseline surveys and the importance of small-sized organisms for deep-sea management and conservation in the face of rising pressures to mine minerals from the abyssal seafloor.

Description of the Deep-Sea Sediment Sampling and Environmental Variable
Deep-sea sediments were collected at abyssal depths during eight expeditions to the Atlantic, Southern and Pacific Oceans (Figure 1). The largest number of samples originated from the Clarion-Clipperton Zone (CCZ) in the eastern Equatorial Pacific. They were taken mainly at the eastern end of this region, during the MANGAN'16 cruise in the BGR exploration license area and during the second ABYSSLINE cruise (AB02) in Stratum B (one of two 30 × 30 km study areas in which most samples were collected) of the UK-1 license area (hereafter called UK-1B) and a similar stratum within the OMS area. These two AB02 strata have very different topographies: UK-1B has approximately ten volcanic seamounts and hills on the abyssal plain, whereas the OMS stratum is much flatter with low, gently undulating abyssal ridges and troughs. Some samples were taken in the French (IFREMER) license area, located further to the west although still in the eastern half of the CCZ, during the BIONOD cruise. Sterile or disposable spoons were used to subsample the top surface sediment centimeter (ca. 2 g) according to a nested sampling design: up to three replicates per core; up to two cores per coring device deployment, and up to three deployments per station that were grouped into 16 areas and six larger regions (Table 1). Approximately 2 ml of fresh surficial sediments were collected with a multiple corer and placed in sterile tubes at −80 • C with (VEMA, SYSTCO II, KuramBio I, MANGAN'16, and AB02) or without (MSM39, DIVA-3, and BIONOD) LifeGuard (Qiagen) Preservation Solution (6 ml) before shipment to the University of Geneva (Switzerland). For each station, the latitude and longitude were used to obtain environmental data layers presented in Howell et al. (2016), including seafloor slope (in degrees) and Bathymetric Position Indices (BPI, Broad, and Fine). A positive BPI value indicates a location higher than its surroundings (e.g., ridges) whereas a negative value indicates features such as valleys and troughs (Weiss, 2001).

Nucleic Acids Extraction, PCR Amplification and High-Throughput Sequencing
For each of 310 sediment samples, the total RNA and DNA contents of ca. 2 g of material were extracted as in Lejzerowicz et al. (2015) and cDNA was generated from DNase-treated RNA as in Pawlowski et al. (2014). It was ensured that no DNA molecules carried over into the RNA extracts based on the absence of PCR amplification with each primer set and after 60 cycles. From each sample, approximately 21 ng of DNA and cDNA extract were incorporated in 30-µl volume PCR amplifications performed in duplicate for each of the two 18S rRNA gene amplification primers pairs targeting Foraminifera and Eukaryota (Supplementary Table 1 and Supplementary Material): the 37F hypervariable region using primers s14F1 and s15 , and the V1V2 region using R22mod and F04 (Sinniger et al., 2016). Forward and reverse primers tagged at their 5 -ends with 8-nt tags were used in order to pool different amplicons to the same libraries and demultiplex the resulting reads to samples. The tagged primer combinations and their dispatching to libraries were designed to avoid and detect sequence-to-sample misidentifications due to mistagging  and to balance the base composition of the first sequenced positions (Illumina template cycles, see Mitra et al., 2015). The PCR duplicates of a given sample were amplified using the same tagged primer combinations, but pooled in different sequencing libraries in order to be able to analyze them separately. In total, 1,998 PCR products were mixed into 51 pools (on average: 44 ng per PCR; 39.2 PCRs per pool). The pools were transformed into sequencing libraries using the TruSeq Nano PCR-free Library Preparation Kit, which were Illumina-sequenced on two HiSeq 2500 runs (2 lanes; 36 libraries) and 5 MiSeq runs (5 lanes; 15 libraries). Strict procedures were respected to limit extraneous and cross contamination: (i) maximum 12 samples per session (one PCR session included the four markers and duplicates), (ii) separate laboratory spaces for no-DNA (PCR preparation), low-DNA (extraction and DNA/RNA/cDNA handling) and high-DNA (post-PCR) concentrations, (iii) lab coat and gloves worn at all time, (iv) cleaning with DNA AWAY TM Surface Decontaminant (Thermo Fisher Scientific), and (v) wiping of each pipet tip on absorbent paper to avoid releasing the aerosols of primer, DNA or PCR products before gentle tip disposal. Technical information for each sample along with environmental variables and metadata information required by the MIMARKs standards (Yilmaz et al., 2011), including sequence frequencies after each filtering step (see next section), are available in Supplementary Table 2. Data and

Bioinformatic Sequence Data Processing
HiSeq sequencing generated ambiguous bases at a few positions corresponding to the amplification primer sequences (i.e., after the tags; Supplementary Figure 1). A python script was used to correct these ambiguities if their number corresponded to the hamming distance with the closest primer sequence. The corrected paired-end fastq files of each library were processed to (i) demultiplex sequences to samples using DTD (Dufresne et al., 2019) 2 , which allows the counting of sequences associated with unexpected primer combinations (or mistags, see , (ii) quality-filter and merge paired reads using pandaseq 2.11 (options -t 0.6 -A pear -T 24 -o 30) (Masella et al., 2012), (iii) filter mistags out of each sample (see Esling et al., 2015, Supplementary Material), (iv) dereplicate sequences using mothur (Schloss et al., 2009) and remove the resulting unique sequences that are singletons or that occur in only one of the two PCR replicates, (v) cluster sequences into operational taxonomic units (OTUs) to reduce diversity inflation due to PCR and sequencing errors using swarm v3 (option -f) , (vi) filter potentially chimeric sequences using UCHIME (Edgar et al., 2011) in de novo mode as implemented in QIIME2 (Bolyen et al., 2019), and (vii) remove samples with less than 3,000 reads or less than 10 OTUs. 1 https://figshare.com/account/home#/projects/98972 2 github.com/yoann-dufresne/DoubleTagDemultiplexer

Reference Databases and Taxonomic Assignments
For the taxonomic assignment of foraminiferal OTUs, an alignment of 3,145 18S rRNA gene sequences belonging to classes Monothalamea, Globothalamea, and Tubothalamea as well as planktonic isolates was curated (PRF 2 , Morard et al., 2015). Each sequence entails the hypervariable 37F region that has proved powerful for foraminiferal species identification (Pawlowski and Lecroq, 2010;Lecroq et al., 2011;Lejzerowicz et al., 2014), is flanked by at least 50 and 150 nucleotides at its 5 and 3 ends, and contains no ambiguous nucleotide (N) or species-level duplicate. A phylogeny built using RAxML v8.2.12 (1,000 bootstraps, GTRGAMMAI model; Stamatakis, 2014) was manually inspected using EMPress (Cantrell et al., 2020) to identify and remove rotaliid sequences branching with monothalamids, in contradiction with the current foraminiferal phylogeny (Holzmann and Pawlowski, 2017; Supplementary  Figure 2). The resulting 3,002 18S fragments were harmonized to six taxonomic levels (class; order; family; genus; species; and isolate) in order to serve as a reference database.
For the taxonomic assignment of Eukaryota OTUs, both the Protist Ribosomal Reference database (PR 2 , Guillou et al., 2013) and SILVA 138.1 (SSU Ref NR 99, Quast et al., 2013) were used, without Insecta and harmonized to the eight taxonomic levels of the PR 2 taxonomic framework (kingdom; domain; phylum; class; order; family; genus; species). Although the taxonomic ranks from PR 2 were employed, SILVA was kept because PR 2 is biased toward protists whereas SILVA remains a generalist database useful for Metazoa assignments. It is important to note that the PR2 taxonomy does not include Xenacoelomorpha, which in this database is not distinguished from Platyhelminthes (Ruiz-Trillo et al., 1999). Each OTU representative sequence was assigned using four different methods, including (i) vsearch 2.7.0 (Rognes et al., 2016) and (ii) BLAST+ (Camacho et al., 2009) both as implemented in QIIME2, as well as (iii) IDTAXA (Murali et al., 2018) and (iv) SINTAX (Edgar, 2016). For every individual method, the deepest assignment associated with a probability of at least 0.9 was retained. Finally, OTUs were classified to the last common ancestor found by at least two out of the four assignment methods, and also across the two databases for Eukaryota OTUs.

Metazoa Novelty
A V1V2 subset composed of OTUs assigned to Metazoa was created for comparison with the OTUs reported in Sinniger et al. (2016). From the corresponding author, the Metazoa OTU read counts table (17 samples) was obtained, as well as their aligned representative sequences (8,517) and associated phylogeny, which was used as a guide tree for RAxML (GTRCAT model) to build a phylogeny suitable for phylogenetic sequence placement with SEPP (Mirarab et al., 2012). Then, our homologous V1V2 OTU sequences were placed onto this tree and their prevalence in CCZ samples was visualised in EMPress (for either or both types of molecule). In order to evaluate the quality of the placements, five SEPP repetitions were performed, using five different placement sizes (parameter "-P").
In parallel, our OTU sequences were co-clustered with the OTU sequences from Sinniger et al. (2016) in order to evaluate the amount of metazoan novelty in a deep-sea benthic sequencing survey. Two to 17 samples were randomly subsampled from Sinniger et al. (2016) and OTUs re-replicated into reads (following table counts) that were concatenated with a similar number of unique sequences per OTU randomly sampled from our study, using the same number of samples and the same reads proportions. Clustering was then performed using swarm v3 (as above) on the concatenated sequences of each sampling size and the proportion of resulting clusters containing sequences from both studies was measured (or "co-clusters"). Each random sub-sampling of sequences from 2 to 17 samples was repeated one hundred times. The whole procedure was also repeated for each of three stations of Sinniger et al. (2016) from which were generated 2-6 samples in this study.

Alpha Diversity
Phylogenetic placement of the 37F OTUs sequences onto the foraminiferal phylogeny (without planktonic sequences) and of the V1V2 OTUs onto the SILVA v128 phylogeny precomputed for use with SEPP was performed in QIIME2 (Janssen et al., 2018). QIIME2 was used to calculate the Faith's Phylogenetic Diversity (PD) index (Faith, 1992) based on these trees, in addition to the Shannon's entropy index for alpha diversity analyses. The Faith's PD metric computes the amount of root-to-tip branch length on the phylogeny represented by each community's OTUs, but it does not account for the number of reads associated with these OTUs. Hence, the Shannon's entropy index was also calculated as it incorporates relative read abundances. Both indices were computed after rarefaction to 3,000 reads per sample. For each marker, the difference between regions, areas, stations and coring device deployments was tested based on the Kruskal-Wallis H test for independent samples (Kruskal and Wallis, 1952) using the python library scipy (Virtanen et al., 2020). To satisfy sample independence, tests were performed for each set of replicates and for DNA and RNA separately, and p-values were adjusted using the python library statsmodels for false discovery rate control on multiple testing (Benjamini and Hochberg, 1995). Post-hoc pairwise comparisons between individual regions and areas were performed using the Faith's PD index (after Kruskal-Wallis H tests results showed significant differences at these scales; see section "Results") implemented in the R package conover.test (Conover and Iman, 1979). For these geographic scales, the difference in alpha diversity between DNA and RNA samples was also assessed using paired Kruskal-Wallis tests (Bonferroni adjusted per scale). Finally, the correlation between the Faith's PD index and each environmental variable was measured using Spearman's rho with scipy.
The OTU richness found exclusively at intersections of areas was measured across the 37F benthic Foraminifera and V1V2 Eukaryota datasets, as well as at intersections of stations within the CCZ UK-1B and OMS strata. The R package upSetR (Conway et al., 2017) was used to also report the number of sequence reads associated with each of the 30 most-important intersections in terms of OTU richness.

Beta Diversity
Based on placement trees, the Weighted and Unweighted UniFrac distances were computed for beta diversity analyses. These distances measure phylogenetic relatedness by quantifying the amount of unique branch length between communities (Lozupone and Knight, 2005;McDonald et al., 2018b). Metabarcoding and other environmental-sequencing data are compositional, as the true total of each sample is not known (Quinn et al., 2018). Hence, the compositionally robust Aitchison distance was employed to build sample dissimilarity matrices and identify clusters in Robust Principal Component Analysis (RPCA) (Aitchison, 1982(Aitchison, , 1986 using the EMPeror visualization tool in QIIME2 (Vázquez-Baeza et al., 2013). This Aitchison metric includes a centered log-ratio transformation step, as performed in DEICODE, which also includes a matrix completion step for sparse data (Martino et al., 2019). Dissimilarity matrices were also built using the more traditional Jaccard and Bray-Curtis dissimilarity metrics (e.g., Legendre and Legendre, 2012).
These dissimilarity matrices were subject to one-way group comparisons of beta diversity between regions, areas or stations using PERMANOVA, as well as for differences in dispersion within each of these groups using PERMDISP (no group with fewer than 10 samples included, 999 permutations, performed using QIIME2). Additionally, the added beta diversity was evaluated by randomly sampling the matrices 10 times for each of an increasing number of samples and measuring the minimum distance among these samples, thus building dissimilarity-decay curves. These curves are similar to rarefaction curves but performed on samples dissimilarity, providing a way to explore the novelty space of communities, as proposed in Huttenhower et al. (2012, see Figures 3D,E) and McDonald et al. (2018a). The dissimilarity matrices and ordinations obtained before and after the removal of non-metazoan OTUs were also compared using Mantel tests (Mantel, 1967) with QIIME2 and PROTESTs (Jackson, 1995) with the R package vegan (Oksanen, 2011), respectively.

Local-Scale Heterogeneity and Relation to Bathymetry in the CCZ
Local-scale heterogeneity was assessed in terms of community turn-over by measuring the beta-diversity dispersion of each station within the UK-1B and OMS strata of the CCZ. First, RPCA ordination was performed (as above) but only for the CCZ samples and separately for the DNA and RNA versions of these samples. Then, the position of the centroid formed in these RPCA spaces (in three dimensions) by the maximum of six samples (three replicates from each of two sediment cores of the same deployment) was calculated at each station and the distances separating these samples from their centroid were collected. For an illustration, see Supplementary Figure 20 where the samples are colored per station on the RPCAs used to calculate distance-to-centroids. This approach is similar to that proposed using the distance-to-plane in the human gut microbial ecosystem (Halfvarson et al., 2017). We describe the average and standard deviation of these distances-to-centroids at each station on the UK-1B and OMS maps and in relation to bathymetry. Maps and their isodepth contour lines were fetched using the function getNOAA.bathy() (R package marmap, Pante and Simon-Bouhet, 2013) and the Bathymetric Position Index environmental variable was smoothed over space using the function Smooth() (sigma = 0.05, R package spatstat, Baddeley et al., 2015). Finally, the DNA and RNA distributions of these distances to stations' centroids were compared using paired t-tests with the R package ggpubr. All code to perform and reproduce the analyses is available upon request.

Sequence Data and Taxonomic Assignment
Quality, replicates and frequency filtering discarded 13.4 and 26.9% of the 1,158 and 1,226 samples successfully PCR-amplified and sequenced for the 37F and V1V2 marker, respectively, with the more numerous CCZ samples being most affected (Supplementary Figure 3A). We performed analyses of an average of 474.7 ± 44 (standard deviation) samples and 24.4 ± 10.7 million reads per marker and molecule type ( Table 1). The rarefaction depth of 3,000 reads per sample was chosen as a tradeoff between a high number of reads and a high number of samples for analysis, with the final number of samples being 1,003 for 37F and 896 for V1V2 (Supplementary Figure 3B). The sequencing depths obtained across PCR replicates are consistent for each marker and for DNA and RNA with an average Pearson's r correlation of 0.71 (Supplementary Figure 3C, all p-values < 0.001). Sequence clustering across markers and replicates resulted in a total of 12,742 and 43,091 OTUs for the 37F and V1V2 markers, respectively.
For taxonomic assignment of V1V2 OTUs, using PR 2 or SILVA resulted in very similar assignment profiles across taxonomic levels and methods intersections (Supplementary Figure 4). When the four methods agree, it is mainly for non-assignments, which is the case for the majority of OTUs, representing 74.5 and 43.6% of the total number of OTUs for 37F and V1V2, respectively. Yet, up to 619 V1V2 OTUs and almost a hundred 37F OTUs are consistently assigned to the same genus by the four methods. A few OTUs (max. 214) are assigned differently by two pairs of methods ("conflicts"), but are resolved at family (37F) and domain (V1V2) levels (Supplementary Figure 5). Assignments found consistent at the genus level by two or three methods represent 20.5 ± 0.01% of all V1V2 OTUs but only 5.6 ± 0.02% of all 37F OTUs.

High-Level Taxonomic Composition
We summarized the taxonomic composition of benthic foraminifera and eukaryotes in terms of OTU proportions and sequence read relative abundances in each area (Figure 2). The foraminiferal dataset is composed mainly of benthic species (Figures 2A,B). The proportion of planktonic sequences (order Globigerinida) is negligible (0.37% of OTUs, Supplementary  Figures 7, 8A,B). The foraminiferal dataset is dominated by unassigned sequences, both in number of OTUs and abundance of reads (Supplementary Tables 3, 4). The assigned sequences belong mainly to the single-chambered class Monothalamea. Among them, the groups only known from previous environmental foraminiferal metabarcoding surveys ("ENFOR") can represent more than half of the reads assigned in a particular area. On average, the single group ENFOR5 accounts for 13.3 ± 3.1% of the reads in the six CCZ areas. The proportion of multi-chambered Globothalamea is minor in the CCZ, although it is well represented in the Kuril-Kamchatka (Northwest Pacific) and South Georgia (Southern Ocean) areas, where it contributes 24.2 and 13.3% of the reads, respectively. Interestingly, the high-level taxonomic composition of OTUs is very similar in all sampled areas, in contrast to the relative abundance of reads, which are much lower for globothalamiids in the CCZ compared to other regions (Wilcoxon-Mann-Whitney tests on log-ratios, for DNA: U = 2.82 × 10 6 , p-value = 3.27 × 10 −7 ; and for RNA: U = 3.55 × 10 6 , p-value = 2.75 × 10 −14 , Supplementary Figure 9).
The eukaryotic dataset (V1V2 all taxa) is composed mainly of OTUs assigned to Alveolata, Opisthokonta, Rhizaria, and Stramenopiles ( Figure 2C), while Opisthokonta and Rhizaria dominate in terms of read abundances ( Figure 2D). The average proportion of unassigned OTUs per area is 46.7 ± 0.05%, without the Southern Ocean areas, where it is 34.7 ± 0.02% (Supplementary Tables 5-8). The taxonomic profiles are very similar for the whole CCZ region. Overall, about a fourth of the reads (22.6 ± 0.09%) from CCZ belong to Opisthokonta, and these are distributed in 17.2% of the 43,091 OTUs, a proportion similar to that of the supergroup of Rhizaria (15.9% of OTUs). CCZ Opisthokonta includes Metazoa (6.2 ± 0.02% of OTUs), but also Fungi that represent 14.7 and 15.8% of reads in UK-1B and OMS, respectively (Supplementary Figure 6). In the other regions, the proportion of reads is also similar, except for the supergroup of Stramenopiles (dominated by diatoms), which represents a large fraction of the reads in the Kuril-Kamchatka (Northwest Pacific) and South Georgia (Southern Ocean) areas.

Phylogenetic Novelty
To establish biodiversity novelty in the CCZ region we placed the metazoan and foraminiferal datasets in molecular phylogenies (Figures 3, 4). In the case of metazoans, we performed OTU comparisons with the global-ocean study of Sinniger et al. (2016). An evaluation of the quality of this step reveals that for a given OTU, alternative placements associated with a low probability would place this OTU very close to the retained placement node on the tree (Supplementary Figure 10C). Hence, we are confident that our Metazoa placements can be interpreted in the framework of the Sinniger tree. To further measure metazoan novelty, we co-clustered at similar sampling effort the Metazoa OTUs of this study with the OTUs from Sinniger et al. (2016). With all Sinniger's samples included, only 1-2% of the clusters contained sequences from both studies (Supplementary Figure 11A) and the proportions remain low when comparing samples from exactly the same stations (Supplementary Figure 11B).
All metazoan clades recovered in Sinniger et al. (2016) are also found in the present study (Figure 3). The taxon that dominates in all deep-sea samples is the phylum Nematoda, the OTUs of which (in light blue) cover almost half of the tree (Figure 3). Some of them are assigned to species or genus level but many remain at order level. Another frequently occurring group is the Arthropoda (in dark blue), generally assigned to family level. Most of them belong to the class Maxillopoda, which includes the benthic harpacticoid and planktonic calanoid copepods. Next to the arthropods in the tree are annelids (in red) that also represent a highly diversified group, but with few OTUs assigned to genus or species level. The platyhelminthes/xenacoelomorphes (in dark green) branch separately and are assigned principally to the class level. The platyhelminthes/xenacoelomorphes branch closely to several unassigned sequences (in pink), which possibly also belong to this group.
An important proportion of metazoan OTUs are unique to the CCZ (Figure 3, yellow in inner ring). Many of them can be found among annelids and platyhelminthes/xenacoelomorphs. They are represented either in DNA or in RNA, but rarely in both datasets. Only a few OTUs have been found to be highly prevalent in CCZ samples (bars). Most of them belong to nematodes, but there are also a few hydrozoans, annelids, and platyhelminthes/xenacoelomorphes that are highly prevalent in the CCZ.
The phylogenetic placement of Foraminifera OTUs results in quite a different image (Figure 4 and Supplementary Figure 10A), mainly because the reference tree comprises several taxonomic groups represented by shallow-water species. This applies particularly to the class Globothalamea and monothalamid Clade E, as well as the freshwater and soil clades. Compared to Metazoa, many foraminiferal sequences are highly prevalent in CCZ samples (both with DNA and RNA) and notably the environmental clades ENFOR5 and the highly diverse ENFOR1. High prevalence in the CCZ is also observed in Clades V and F. All these clades are dominated by undescribed monothalamous species, often represented by environmental sequences only. On the other hand, the CCZ foraminiferal dataset contains very few OTUs assigned to Clade C, despite the fact that this clade comprises the megafaunal xenophyophores, which are common in the CCZ.

Uniqueness of CCZ Areas
Areas within the CCZ region are distinct from the other regions studied in terms of exclusively shared OTUs, with 1,124 37F OTUs and 2,230 V1V2 OTUs (Figure 5) representing 19.8 and 14% of the CCZ OTUs, respectively. The numerous OTUs shared between UK-1B and OMS, the most-heavily sampled areas, are also rare in terms of sequence read abundances and half of them were detected both using eDNA and eRNA (Figure 5 and  Supplementary Figure 12). Among other regions, only the three Southern Ocean areas share relatively high numbers of OTUs that are unique to this region. This is especially true for the two FIGURE 3 | Phylogenetic diversity and novelty of Metazoa OTUs. The phylogeny from Sinniger et al. (2016) is shown, expanded with V1V2 OTUs placements from this study. Tree branches are colored at the Class and Order levels. Concentric bar plots indicate, for each OTU its assignment depth (innermost ring), whether it is exclusive to the CCZ using both DNA and RNA (large inner ring), using DNA only (middle ring) or using RNA only (outer ring). These OTU-occurrence measures indicate whether a tip corresponds to an OTU found only in the CCZ ("CCZ only", in yellow), to an OTU also found in any other area ("Other areas", in turquoise), or whether a tip corresponds to a sequence from the reference phylogenies used for placement ("Reference", in purple). The prevalence of each OTU in the CCZ samples is indicated in terms of percent of samples where it is detected in this area (outermost bars). The finest taxonomy of OTUs highly prevalent in the CCZ are indicated.
stations exposed to high chlorophyll exports in the 37F dataset (HC_AEP and South Georgia), but less so in the V1V2 dataset where there are half the numbers of shared OTUs than in closely located areas (HC_AEP and HC_BEP). Interestingly, the CCZ areas also share exclusive OTUs at several intersections with the Kuril-Kamchatka area (Northwest Pacific region), although the number of shared OTUs is much lower than between CCZ areas.
There is a relationship between the number of reads in an OTU and the number of areas in which it occurs (Figure 5, box plots). Moreover, highly prevalent OTUs are detected by both DNA and RNA, which is also the case for the many OTUs present at all stations within each CCZ area (Supplementary Figure 12, box plots). Besides these cosmopolitan, high-frequency OTUs, the most important OTU intersections within CCZ areas are between pairs of stations, including OTUs only detected using DNA or RNA molecules (Supplementary Figure 12). Notably, any two stations within UK-1B and OMS strata tend to share between 10 and 15 exclusive 37F OTUs that can be represented by hundreds of reads (Supplementary Figures 12A,B). This number of OTUs exclusive to pairwise CCZ station intersections is higher for V1V2 (c.a. 40-100) than for 37F, and for OTUs associated with fewer reads (Supplementary Figures 12D,E).

Alpha Diversity
Different levels of alpha diversity characterize the benthic foraminiferal and eukaryotic communities at each spatial scale. Overall across datasets, this is seen more clearly when using Faith's PD (PD) compared to Shannon's entropy (H') and in most cases when using DNA (Supplementary Figure 13). For 37F, only the difference between regions using H' and RNA was not supported statistically, while other significant tests showed the largest difference effect using DNA at the region (  it should be noted that for V1V2 not all areas (and stations) are compared since many DNA co-extractions failed ( Table 1  and Supplementary Table 2). For the CCZ, DNA and RNA yield similar levels of alpha diversity for V1V2, whereas for 37F, the diversity of DNA is higher than that of RNA, but only using Shannon's entropy ( Figure 6A and Supplementary  Figures 14, 15A). Hence, we restricted the following analyses to Faith's PD index.
Alpha diversity in the CCZ differs from that of every other region as assessed by post-hoc pairwise comparisons following Kruskal-Wallis H tests (37F: Figure 6B; V1V2: Supplementary Figure 15B). For the foraminiferal DNA dataset, the CCZ appears as the second most diverse region after Kuril-Kamchatka. Alpha diversity of the CCZ as a whole does not differ from the North Atlantic region because the diversity within its individual areas and stations (all confined to the eastern CCZ) varies significantly ( Figure 6C and Supplementary Figure 16A). Indeed, the UK-1B and OMS strata have a similar, high Faith's PD whereas the BGR and IFREMER areas have very low values for this index (there were too few samples for testing against BGR North). In fact, the BGR stations exhibit consistently lower DNA diversity values than all other CCZ stations, with the notable exceptions of the two IFREMER stations where nodule abundance was particularly high (BIO12-102KG and BIO12-86), and station SO237-3 in the West Basin in the central Atlantic (Supplementary Figure 16A). This Atlantic station is also characterized by the presence of large polymetallic nodules (see cruise report; Devey, 2015). For the eukaryotic DNA dataset, the CCZ stands out as the most diverse region (no data for BGR and IFREMER) (Supplementary  Figures 15B,C). Interestingly, within-station DNA and RNA diversity vary more for Eukaryota than for Foraminifera, as is clear from the wider dynamic range of values for the former dataset (Supplementary Figures 16A vs. 16B, box plots). It also appears that between-station RNA diversity differences are more frequent for Foraminifera than for Eukaryota, as illustrated by the higher number of significant post hoc tests for the latter dataset (see dark yellow cells in heatmaps in Supplementary  Figures 16A vs. 16B).
Foraminifera 37F alpha diversity in the CCZ is associated with topography, but only for DNA ( Figure 6D). Faith's PD was positively correlated with Broad BPI (Spearman's rho = 0.36, p-value < 0.05) and negatively with seafloor slope (rho = −0.19, p < 0.05). Eukaryota V1V2 alpha diversity is also associated with topography for DNA, but only with Fine BPI (Spearman's rho = 0.23, p-value < 0.05) (Supplementary Figure 15D). In relation to water depth, the Foraminifera RNA diversity exhibits a positive but weak correlation with increasing depth (rho = 0.119, p-value < 0.05). Conversely, only DNA associates with depth for the Eukaryota, in a stronger negative correlation (rho = −0.222, p-value < 0.05) (Supplementary Figure 15D).

Beta Diversity
For both Foraminifera and Eukaryota, the CCZ samples form clusters on the Robust PCAs that are well separated from the samples of other deep-sea regions (Figures 7A,B), except for the Mid Atlantic samples that intermingle to some extent with the CCZ samples in the case of V1V2 ( Figure 7B). Strikingly, the DNA and RNA samples also formed two larger clusters, each of which shows separate groupings of CCZ samples. We therefore performed RPCA on DNA and RNA separately (as for alpha diversity analyses). For both 37F and V1V2, this also resulted in grouping samples according to region, and even areas, with the notable exception of CCZ areas, which formed a large and highly variable cluster (Supplementary Figure 17). Indeed, the beta diversity that we demonstrate within the CCZ is far from being representative of all that can be sampled there, as indicated by the dissimilarity-decay curves that do not saturate to an asymptotic minimum for both 37F and V1V2 and both for DNA and RNA (Supplementary Figure 18). Unsurprisingly, this pattern holds when sampling globally, with an increased baseline value of added beta diversity (Supplementary Figure 18, see "All samples").
For 37F, the separation between CCZ and Atlantic areas is driven by several ENFOR5 OTUs and several Unassigned OTUs pointing for DNA in these different, respective area directions whereas for RNA, it is two Unassigned and one poorly assigned Globothalamea OTUs that characterize CCZ samples, as opposed to two Rotaliida OTUs that are strongly associated with OTU-rich Atlantic samples. In the case of V1V2, it appears that for the DNA RPCA ordination, two OTUs assigned to the diatom Chaetoceros (pelagic) acting in one direction, and an OTU assigned the fungal taxon Sordariomycetes acting in the opposite direction, drive the separation along the first axis (PC1), i.e., the separation of Kuril-Kamchatka, Southern Ocean and CCZ samples (these taxa were already most influential with both DNA and RNA on the RPCA, Figure 7B). For the RNA ordination, the CCZ and Southern Ocean separate under the influence of an unassigned eukaryotic OTU and a Fungi OTU (along PC1) whereas a diversity of Cercozoa OTUs (including some assigned to Endomyxa-Ascetosporea parasites), and two OTUs assigned to Radiolaria (pelagic), strongly influence sample separation along the second axis (PC2).
No clear clustering of the areas results from the RPCAs based on the Metazoa subset, except for DNA where one Annelida OTU drives the separation between UK-1B and OMS samples and most of the Southern Ocean samples (Supplementary Figure 17). Beta diversity comparisons, performed before and after removal of non-Metazoa OTUs using Mantel tests on dissimilarity matrices or PROTEST on three-dimensional ordinations, revealed that the beta diversity changed considerably as a result of this filter. Mantel's r correlations were 0.65 and 0.59 for the Aitchison metric on DNA and RNA matrices, respectively, with an even lower procrustes PROTEST fit of ca. 0.4 for both DNA and RNA (Supplementary Table 9). Nevertheless, a relatively good conservation of the beta diversity pattern was achieved using the Jaccard binary metric (Mantel's r = 0.75 and PROTEST's 1-M 2 = 0.78, all p-values < 0.001, Supplementary Table 9).
Differences between regions, areas and stations were statistically supported for each dataset in PERMANOVA performed on the Aitchison metric (DNA Pseudo-F 37F/V1V2 = 20.44/20.43; RNA Pseudo-F 37F/V1V2 = 16.03/16.43; all p-values < 0.001) and for all other beta diversity metrics (Supplementary Figures 19A-C). However, the dispersion of the samples within each group was also significantly different between groups, except for the phylogenetic beta diversity metric Weighted UniFrac, which yielded the greatest difference between regions using DNA (Pseudo-F = 35.54, p-value < 0.01). We also tested for differences between areas and stations within the CCZ only and found statistical support unaffected by PERMDISP dispersion for comparisons based on DNA among 37F stations and V1V2 areas (PERMANOVA Pseudo-F = 4.027 and 3.194, respectively, both p-values < 0.001, Supplementary  Figures 19B-D).

Spatial Heterogeneity
Since PERMDISP testing revealed differences in group dispersions and notably for stations, we calculated as a measure of local heterogeneity the distances to centroids for samples from each station on 37F CCZ RPCAs (for DNA and RNA separately, see section "Materials and Methods"). Since stations associated with steeper slopes seem more dissimilar on these RPCAs (Supplementary Figure 20), we plotted these distances to station centroids on bathymetric maps of the CCZ strata UK-1B and OMS ( Figure 8A). Communities at stations located in depressions (i.e., associated with low Broad BPI values, blue background) were systematically more variable, as indicated by higher values for the distances to centroids in terms of average and standard deviation values (green circles). This relation was highly pronounced for the DNA samples and in the OMS stratum where distances to stations centroids increased with BPI, yielding more variable communities locally. Overall, the distances to station centroids computed based on the compositionally robust Aitchison metric are not different between DNA and RNA, while using the phylogenetic, Unweighted UniFrac metric, we found RNA heterogeneity to be significantly greater than DNA ( Figure 8B).

Metabarcoding Data and the Abyss
Omics techniques allowing rapid and standardized profiling of environmental molecules may represent a powerful approach to filling deep-sea knowledge gaps (Levin et al., 2019) and addressing key goals for this decade (Howell et al., 2020). Metabarcoding, in particular, is being increasingly developed for biomonitoring purposes in marine environments (Chariton et al., 2010;Aylagas et al., 2018Aylagas et al., , 2020Brandt et al., 2020;Le et al. in review). It has successfully documented the local impacts of marine industries, including aquaculture Lejzerowicz et al., 2015;Pochon et al., 2015;Cordier et al., 2018) and oil and gas extraction in coastal ecosystems (Lanzén et al., 2016;Laroche et al., 2016;Cordier et al., 2019;FIGURE 8 | Spatial heterogeneity expressed in terms of station beta diversity dispersion in relation to bathymetry for the 37F dataset. (A) At each station of the UK-1B and OMS strata (CCZ), we computed the distance separating the samples to the station centroid they form in the RPCA spaces obtained separately for DNA and RNA. Average and standard deviations for these distances-to-centroids are shown at each station (max. six samples per station) of each map panel (green rings), as well as for the number of OTUs (gray rings). The background color represents the Broad Scale BPI (Bathymetric Position Index) values smoothed over the plotted area. (B) Paired t-tests comparing the distributions of distance-to-centroid values between DNA and RNA are shown for the Aitchison dissimilarity metric (also used for the Robust PCA and maps) as well as for the phylogenetic, Weighted UniFrac dissimilarity metric. Mauffrey et al., 2020) where extensive ecological knowledge has allowed ground truthing. Metabarcoding has also been used for surveys of CCZ areas focusing on prokaryotic microbial communities , metazoans (Laroche et al., 2020a) and allochthonous pelagic eDNA deposited in sediments (Laroche et al., 2020b).
General methodological considerations regarding eukaryotic metabarcoding have been reviewed recently (Santoferrara et al., 2020). Besides the robust multiplexing design and PCR replicates that we used to filter cross-contaminations and spurious data , and extraction replicates to improve diversity estimates (Lanzén et al., 2017), another commonly overlooked issue is the compositional nature of sequencing data, which includes metabarcoding (Gloor et al., 2016(Gloor et al., , 2017Martino et al., 2019). Indeed, a simple transformation (e.g., with the Aitchison metric) allows circumventing the fact that, following sediment sampling, extraction, PCR and sequence subsampling, it is impossible to know the true total abundance of taxa in samples and thus to compare compositions on the basis of abundance (or relative abundance) (Morton et al., 2017;Knight et al., 2018). Our analyses of beta diversity turnover and dispersion were performed in PCA spaces that are robust to this effect, allowing comparison and testing. These outstanding technical aspects are particularly relevant for the deep seafloor, which is characterized by substantial small-scale faunal patchiness (Snelgrove and Smith, 2002) that is difficult to sample, particularly using metabarcoding .
Although metabarcoding is an important tool that can overcome some of the limitations inherent in morphology-based studies of deep-sea biota (Le et al., in review), there are some caveats to be born in mind when considering metabarcoding data obtained in deep-sea settings. First, because the amounts of sediment analyzed are small (2-10 g), the data will be weighted toward small-sized organisms (meiofauna and smaller). Macroto megafauna would necessarily occur only as sloughed cells, eggs, larvae, or tissue fragments, which would yield high amounts of fresh, readily amplified molecules and explain the occasional dominance of large animals when taxonomic composition is expressed in terms of the relative abundances of reads (e.g., see Craniata in Figure 2F). This caveat may also apply to some protists, notably megafaunal xenophyophores. Although these giant agglutinated Foraminifera are abundant and diverse across the CCZ (Gooday et al., , 2020, no OTU was assigned to them, confirming the lack of representativeness of the metabarcoding samples for macro-to mega-fauna sized organisms. Second, the V1V2 data also includes planktonic organisms, such as diatoms, radiolarians or calanoid copepods and hydrozoans among the Metazoa. It is well established that export fluxes deposit pelagic DNA traces, as well as entire cells, to the seafloor and that this DNA also conveys a strong biogeographic signal, as demonstrated for planktonic Foraminifera (Morard et al., 2017;Barrenechea et al., 2020). As a testament to this phenomenon, we were able recover the same clustering patterns for the CCZ and other regions using the benthic Foraminifera only, as well as the full V1V2 dataset that included a majority of planktonic sequences, as observed previously (Pawlowski et al., 2011). One compelling example is that of the Southern Ocean and Northwest Pacific regions (both highly productive), which clustered opposite to the CCZ (least productive), a separation associated with one OTU belonging to the large diatom genus Chaetoceros. Frustules from this diatom genus constituted the bulk of Southern Ocean sediment (Würzberg et al., 2014) and it was also found to be dominant in the Kuril-Kamchatka area (Golovan et al., 2019). One Chaetoceros OTU drove separation for the DNA samples, further highlighting the importance of pelagic DNA deposits in our data set.
A third point is that enormous quantities of extracellular DNA accumulate across the seafloor (Dell'Anno and Danovaro, 2005) and this material remains accessible for metabarcoding for extended periods of time ranging from centuries for settling, pelagic fish DNA (Kuwae et al., 2020) to millenia for benthic Foraminifera and even planktonic organisms (Coolen et al., 2013;Lejzerowicz et al., 2013a). The deep-sea environment favors the preservation of biomolecules and it has been found that up to half of 16S rDNA copies at abyssal depth originate from extracellular DNA pools (Corinaldesi et al., 2018). We found that the local heterogeneity for foraminiferal DNA is higher at stations characterized by low BPI values, compared to RNA and to stations located on slopes. This suggests that eDNA but not eRNA accumulates in depressions. In addition, we found that for eDNA -but again not for eRNA -there were significant correlations between alpha diversity and both BPI and slope. Faith's PD decreased with slope, suggesting winnowing (i.e., reduced permanent deposition) of eDNA in elevated areas, but increased with BPI. We stress that BPI interpretation is tightly linked to scale, and since this correlation is made across areas, it indicates that eDNA has a higher diversity in localities where there are more topographic depressions in which eDNA deposition may be focused. This suggests that eDNA winnowed off a slope would tend to become concentrated in these pools. Similar mechanisms of sediment winnowing and focusing related to abyssal seafloor topography have been postulated for labile particulate organic matter (Turnewitsch et al., 2015) and may determine its degradation (Volz et al., 2018). Nucleic acid is organic matter as well, and since RNA degrades more rapidly than DNA, it should be possible to model how these complex processes influence the distribution patterns inferred from environmental sequencing data. Interestingly, abyssal hills and more subtle topographic changes also appear to influence the distribution of live Foraminifera (Stefanoudis et al., 2016) and metazoan megafauna (Durden et al., 2020). Hence, if it can be assumed that DNA spreads with organic matter on the abyssal seafloor, we envision metabarcoding as a quantitative tool for trophic ecology and recommend RNA, which is likely to reflect contemporary faunas more accurately, for biodiversity assessments.
Corroborating this observation, we found that only betadiversity comparisons measured from DNA (between regions, areas or stations) were not subject to differences in dispersion. The slow degradation time scales for DNA mentioned above may cause environmental DNA to integrate biodiversity over much longer time scales, combining past and present communities , and thereby lowering variability amongst overlapping communities in RPCA ordinations. This can also explain the higher Shannon's entropy (compared to Faith's PD), because entropy increases with evenness, which will increase in turn with each set of DNA OTUs added to the pool. In contrast, RNA is more labile and therefore may exhibit more compositional variation that likely reflects spatial patchiness over shorter time scales. This potential advantage that RNA may confer by more accurately delineating deep-sea community patchiness in space and time is further indicated by the fact that we observe a higher distance to centroid for RNA compared to DNA across the entire CCZ (although reflecting local-scale, within-CCZ station heterogeneity). Higher RNA heterogeneity is a general trend for all beta-diversity metrics (higher median values, see Figure 8B) but was only statistically supported when accounting for evolutionary histories (i.e., using phylogenetic, UniFrac metrics). Interestingly, Macheriotou et al. (2020) relied on similar metrics to show a strong phylogenetic clustering for Nematoda in the CCZ, indicating highly heterogeneous, patchy communities, although their metabarcoding data derived from density-separated nematodes and not from bulk sediment. This makes their results more comparable to what eRNA conveys best, i.e., active, live species as shown for Foraminifera (Langlet et al., 2013;Lejzerowicz et al., 2013b), further strengthening our advocacy for use of eRNA for biomonitoring.

Wide Spectrum of Novel Benthic Taxa Revealed
This study is based on the largest collection ever assembled of deep-sea sediment samples covering a wide range of geographic areas. Using eDNA as well as eRNA sequence data, it represents a first attempt at understanding global to local biodiversity and distribution patterns for deep-sea benthic eukaryotes. Some metazoan clades could be assigned at the genus or species level, notably those belonging to the Nematoda, which are the best represented metazoan group in our data and consistently the dominant meiofaunal taxon in the deep sea (Thiel, 1983), including the CCZ (Pape et al., 2017). However, for many metazoan taxa, assignment was only possible to a Class, Order or Family (Figure 3). Among the Foraminifera, a large proportion of reads and OTUs could not be assigned to any known taxon (Figures 2A,B), a result that is also consistent with previous studies (Lecroq et al., 2009a;Lejzerowicz et al., 2014). Most of those that were assigned could only be placed in a Class, Order, or Family, or in a particular isolate (i.e., an unidentified specimen).
These results undoubtedly indicate the existence of many new taxa, but it should be noted that the incompleteness of reference database coverage remains a major impediment to taxon assignment in biodiversity surveys. In a recent study based on samples from a well-studied and heavily sampled area of the North Sea, Hestetun et al. (2020) estimated that for the 18S rRNA gene, species coverage was 36% for GenBank and 27% for SILVA, which we used in this study (along with the curated, protistfocused PR 2 ). The species coverage in these databases is certain to be substantially lower for the sparsely sampled, species-rich deep sea. Molecular reference databases are undergoing major developments for animal groups that are important constituents of CCZ assemblage. However, these mainly concern macroand mega-faunal taxa (e.g., Cnidaria, Mollusca and Polychaeta: Dahlgren et al., 2016;Glover et al., 2016;Wiklund et al., 2017Wiklund et al., , 2019 rather than meiofaunal organisms, which are potentially important ecologically (Smith et al., 2008;Ingels et al., 2020) and dominate metabarcoding data. We also recognize that querying very large, lightly curated databases may help improve highlevel assignment, but relying on consistency across assignment methods applied to less complex databases could offer an equally interesting avenue to high-level assignment and with high confidence at the scale of such high-throughput data. In fact, one highly valuable approach is to improve such assignments using taxon-specific databases, as demonstrated for Nematoda (Macheriotou et al., 2020). Curated phylum-specific databases would best resolve assignments associated with long, detailed taxonomic paths that are not aligned with that of other phyla, as in the case of Platyhelminthes and Xenacoelomorpha. Indeed, these two distinct phyla are lumped together in the generalist, protist-oriented PR 2 database, whereas their taxonomic paths include more than the eight taxonomic ranks used here for taxonomic assignment.
Despite these caveats, the large proportion of reads that can only be assigned at higher taxonomic levels is quite consistent with the low level of basic taxonomic knowledge of deep-sea faunas (Glover et al., 2018). Many deep-sea species remain scientifically undescribed, particularly at abyssal depths, despite belonging to higher taxa that are commonly seen in deep-sea samples. Among the metazoan meiofauna, almost all (99.3%) of the 632 morphospecies of harpacticoid copepods recognized at two stations in the SE Atlantic (>5,000 m water depth) were considered to be new to science (George et al., 2014). In the eastern CCZ, 94.1% of 170 macrofaunal isopod species were undescribed (Brix et al., 2020). In the same area, undescribed monothalamid morphospecies, many of them previously unseen, are an important constituent of both meiofaunal Gooday, 2017, 2019;Gooday and Goineau, 2019) and macrofaunal (Gooday et al., 2021) foraminiferal assemblages.
Foraminifera and metazoan meiofaunal animals such as nematodes and harpacticoid copepods are common and familiar constituents of the abyssal sediment biota, as are, for example, polychaetes, isopods, and tanaids among the macrofauna and holothurians and sponges among the megafauna (Gage and Tyler, 1991;Washburn et al., 2021). However, as earlier shown by Sinniger et al. (2016), our V1V2 data reveal the presence in the eastern CCZ of animals that are rarely seen in deep-sea samples; for example, xenacoelomorphs (Rouse et al., 2016) platyhelminthes (Christensen, 1981), andpriapulids (van der Land, 1972). As far as we know, the Placozoa and Mesozoa, for which we recovered some sequences from the Pacific and Southern Oceans, have never been reported from deep sea samples. Some of these taxa, notably some platyhelminthes (trematodes) and mesozoans, probably occur mainly as parasites (Bray, 2020), but others (turbellarian platyhelminthes, placozoans) may be free-living but too fragile to survive in a recognizable form in fixed samples. Thus, our metabarcoding data appears to reveal a fuller spectrum of animal diversity at our abyssal sites than could be recovered by traditional morphology-based methods. The same applies to the foraminifera, with OTUs that are known only from the environmental sequences constituting an important proportion of the assemblages, with an average of 8.4 ± 1.98% of OTUs of an area.
Interestingly, the metazoan diversity that Sinniger et al. (2016) revealed using eDNA metabarcoding is more readily detected at a similar sampling effort in our study using eRNA than eDNA. This suggests that this molecule is more reliable for analyzing diversity across studies and possibly, for future biomonitoring surveys. Yet, the amount of novel sequences that re-sampling yields is huge, as only c.a. 2% of the OTUs formed across studies contained sequences from both Sinniger et al. (2016) and the present study, even in the case of sediment sample extracts originating from the same station in the South Atlantic (DIVA-3 expedition, Martínez .

Biogeographic Patterns From Global to Local Scales
The breadth and heterogeneity of the seafloor habitats (Howell et al., 2016) makes it challenging to establish patterns of deep-sea benthic biodiversity and biogeography at the spatial scales relevant to conservation efforts related to seabed mining (McQuaid et al., 2020) and climate change (Levin et al., 2020b). There is a general trend for morphological and genetic divergence to decrease with increasing water depth, and for species ranges to increase, reaching a maximal extent on abyssal plains where there should be fewer barriers for dispersal, in contrast to the topographically, hydrographically and environmentally more complex continental margins (Grassle and Morse-Porteous, 1987;Etter et al., 2005;McClain and Hardy, 2010). For example, in the geologically orientated literature, a number of welldocumented, abyssal hard-shelled species of foraminifera, are considered to have 'cosmopolitan' distributions (Murray, 2006;Holbourn et al., 2013). This is confirmed by genetic data in the case of Epistominella exigua (Pawlowski et al., 2007;Lecroq et al., 2009b) and Cibicidoides wuellerstorfi (Burkett et al., 2020). Wide ranges are reported for certain species of harpacticoid copepods (Menzel et al., 2011) and polychaetes (Schüller and Hutchings, 2012;Guggolz et al., 2020), in the latter case with genetic support. At the same time, it is also common for abyssal assemblages of a particular taxon to combine a few dominant, widely distributed species with many more rare species that are found only in one or two samples (e.g., Glover et al., 2002;Wilson, 2017;Washburn et al., 2021). Notably, Foraminifera OTUs that only occur in a single CCZ stratum also tend to occur in only one station of this stratum, even when highly sequenced (Gooday et al., 2021). Because rarity is often correlated with small species ranges in better known ecosystems (e.g., Pimm et al., 2014), these rare abyssal species may have limited distributions as well. Our metabarcoding data, as well as earlier genetic results (Janssen et al., 2015;Macheriotou et al., 2019), reveal a similar assemblage structure, with relatively few, highly sequenced OTUs but many rare OTUs that were confined to single or pairs of areas or stations (Figure 5). In addition to rare OTUs, the likely contribution of extracellular DNA (Corinaldesi et al., 2018) is another factor to consider in trying to generalize about biogeographic patterns and species ranges in the abyss based on metabarcoding.
Since establishing cosmopolitan and endemic patterns depends on sampling and sequencing efforts, we used rarefaction and propose to focus on the well-sampled UK-1B and OMS strata of the eastern CCZ. Here, the presence of cosmopolitan foraminiferal OTUs, and notably their high prevalence, are not surprising. Indeed, since the Pacific is the oldest ocean and nodule-rich environments are particularly stable, it is possible that the 'stability-time' hypothesis (Sanders, 1968), suggested as a key driver of niche diversification for Nematoda in the CCZ (Macheriotou et al., 2020), might also explain the diversity of Foraminifera, although other explanations are equally plausible (Snelgrove and Smith, 2002). Interestingly, the early evolution of Foraminifera is characterized by a large radiation of monothalamous species (Pawlowski et al., 2003) which remain highly diverse today in the abyss. Given the genomic potential of Foraminifera to adapt to changing environmental conditions (e.g., anoxia, Orsi et al., 2020), and the dramatic ecosystem transformation that could result from mining, it is important to also explore the distribution of functional-adaptation determinants and consider these for spatial planning in order to preserve adequate sources of biodiversity for recolonization.
At a global scale, we observed distinct faunal patterns in the foraminiferal and eukaryotic datasets. In particular, the CCZ is clearly different from the other regions from which we have data. The Southern Ocean and Kuril-Kamchatka, regions that were also well sampled and sequenced, also appear to be faunally distinct. The large-scale patterns observed in our data are broadly consistent with the abyssal seafloor provinces proposed by Watling (2013) and the marine realms of , which integrate pelagic and benthic data. The CCZ sites fall within abyssal province A11 of Watling (2013), the Southern Ocean sites within AB5 and the SW Atlantic sites within AB3. Interestingly, we found the highest Faith's PD alpha diversity in the DNA data from the Kuril-Kamchatka region. This could reflect the location of this region at the intersection of three realms in  and near the boundary of two benthic provinces in Watling (2013), although again other factors, including habitat heterogeneity, may also drive higher diversity.
Strong differences at the larger scale between regions do not necessarily indicate that sample compositions are homogeneous within these regions (especially when looking at tight clusters on global RPCAs). In fact, there exists a high degree of heterogeneity at smaller scales within regions and within areas. The dissimilarity-decay curves indicate that an enormous amount of faunal variability remains to be explored in deepsea ecosystems, including within the CCZ. If seafloor areas such as the CCZ were sampled enough for random sampling to yield populations of truly comparable communities (e.g., for more specific statistical assessments), these curves would plateau out, as is the case in extensively sampled ecosystems (e.g., McDonald et al., 2018a). In addition, we hypothesize that exploring other habitats of the CCZ, such as nodules and other hard substrates hosting attached faunas Smith, 2020), would shift the baseline we observe for this curve. Therefore, we propose this dissimilarity-decay analysis as a useful approach to evaluate sufficiency of biodiversity sampling for a given spatial scale. This beta diversity analysis can be made robust to the compositional nature of the data and is thus preferable to comparisons based on alpha diversity extrapolations, which remain promising to evaluate the importance of nodule habitats in the CCZ (Laroche et al., 2020b).

Implications for CCZ Conservation in the Face of Deep Seabed Mining
Although most of our samples originate from the CCZ, they cover only a tiny fraction of the CCZ's habitat heterogeneity, effectively 3 out of 24 habitat classes defined for the CCZ (McQuaid et al., 2020). This very uneven coverage limits the relevance of our results to conservation across the entire CCZ, as all samples come from the eastern half and, with the exception of the IFREMER area, from the eastern end of the CCZ. We did not obtain samples from the western half or from any of the no-mining protected areas, termed Areas of Particular Environmental Interest (APEIs). There is a general bias in our current knowledge toward the eastern half across all biological studies in the CCZ (e.g., Washburn et al., 2021), which limits our understanding of trends in relation to depth and productivity gradients. However, new knowledge for the CCZ and its comparison with the five other soft-bottom areas we included in our survey contributes to understanding of finesediment habitats in the deep sea. More widely, we suggest that the rapidly evolving field of environmental genomics has the potential to deliver new tools that could provide consistency across the kind of large international research efforts that are likely to dominate deep-ocean research in coming decades (e.g., Howell et al., 2020).
An important advance that will follow naturally from the present, descriptive study is the testing of predictive models that account for scale, topography and habitat heterogeneity. A plethora of statistical learning techniques are mature enough to make such predictions and inform the design of sampling protocols for baseline and monitoring studies in the CCZ and other industrially impacted areas, provided that high accuracy can be achieved. Machine learning predictions based on metabarcoding data are useful for classification into biotic indices that are used for biomonitoring purposes in polluted, well-studied shallow-water benthic communities (e.g., Cordier et al., 2018Cordier et al., , 2019, as well as for regressions accounting for the rare species of taxa susceptible to climate change at global scales (Busseni et al., 2020). By applying a similar approach, it also may be possible to test whether the cosmopolitan patterns we observed are real. Although we employed rigorous laboratory procedures comparable to those used in research on ancient DNA in order to avoid cross-contamination, OTUs that appear in every sample could, if rare, still be artifactual. Thus, studies based on the same genetic marker should be reanalyzed jointly to confirm patterns of cosmopolitanism and prove that harmonized, open-source metabarcoding data is a powerful tool for testing reproducibility and promoting capacity building.
To conclude, our study confirms the potential of metabarcoding for describing the deep-sea microbial and meiofaunal diversity of abyssal seafloor ecosystems. However, its use for the assessment of anthropogenic impacts, such as polymetallic nodule mining, would require better ecological understanding of the response of seafloor ecosystem to mining relating stressors (e.g., removal of nodules and surface sediments, enhanced turbidity, burial/smothering in disturbed-sediment plumes, Smith et al., 2020). Such ecological knowledge is necessary for ground truthing the metabarcoding analyses. As the ecological interpretation of metrics derived from metabarcoding improves, we anticipate that future surveys, combined with other omics analyses, will prove to be a powerful method for assessing the impact of deep-seabed mining on fragile benthic ecosystems.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: European Nucleotide Archive accession PRJEB44134 and https://figshare.com/ projects/Eukaryotic_eDNA_eRNA_From_CCZ_Benthos/98972.

AUTHOR CONTRIBUTIONS
AB, PMA, LM, LA-P-G, RM, and CRS organized deepsea expedition or provided sediment material. FL, JP, and AJG conceived the study and wrote the manuscript. FL, IBA, and TC prepared the samples. FL performed the analyses. All authors provided comments and edits to the manuscript.

FUNDING
FL was supported by an Early Postdoc.Mobility Grant from the Swiss National Science Foundation (P2GEP3_171829). JP, LA-P-G, and IBA were supported by the Swiss National Science Foundation grants 31003A_179125 and 316030_150817. FL, TC, LAL, and JP were also supported by the Swiss Network for International Studies award (20170024). Samples from the UK-1 and OMS license areas in the Clarion-Clipperton Zone were collected as part of the ABYSSLINE project, funded by UK Seabed Resources Ltd. This funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication. The material was collected and sorted within the framework of several large international projects. The expeditions with RV Polarstern were supported by the German Research Foundation, both expeditions with RV Sonne by the German Ministry for Science and Education. Pew Charitable Trusts payed for publications charges and for the Deep CCZ Biodiversity Synthesis Workshop which helped develop this manuscript.

ACKNOWLEDGMENTS
We thank the crews of the research vessels FS Polarstern, RV Sonne, FS Maria S. Merian, FS Meteor, R/V Kilo Moana, R/V Thompson, and RV L'Atalante for their help on board and all student helpers and technicians for support and help with sorting of the extensive expedition material. EU JPIO-Oceans project MinigImpact-2 (German BMBF contract 03F0812E). Samples from the German Contractor Area were kindly made available by Carsten Rühlemann and Annemiek Vink (Bundesanstalt für Geowissenschaften und Rohstoffe). We also thank Stefan Mulitza and MARUM for the North Atlantic samples collected during MSM39, Maria Holzmann for the alignments of the reference foraminifera tree, Frédéric Sinniger for the data and reference Metazoa tree, Stefan Janssen for early advice on placement in the eukaryotic SILVA reference tree, Kerry Howell for the collection of environmental data layers, Aurélie Goineau, Alexandra Anh-Thu Weber, Gan Bin Qi, Laura Würzberg, Tomas Cedhagen, and Ivan Voltsky for assistance during sampling at sea. Pew Charitable Trusts for supporting the participation of CRS, AJG and FL in the Deep CCZ Biodiversity Synthesis Workshop and for paying the publications charges.