Metabarcoding Analyses and Seasonality of the Zooplankton Community at BATS

The diversity and specific composition of zooplankton communities have a direct effect on the ecosystem services (e.g., carbon export) and structure of ocean food webs. In areas where diversity is high, a detailed characterization of the whole community can represent a hurdle that prevents a complete understanding of those processes and interactions. One such region is the Sargasso Sea, where zooplankton diversity is among the highest in the world oceans. As a consequence, at the Bermuda Atlantic Time-series Study location, most research has focused on zooplankton biomass or on particular groups. To provide new insight into the total community composition at the BATS site, in this study the zooplankton community is investigated by means of metabarcoding of the 18S V9 hypervariable region. Day and night samples from a full year (2015) show the signature of diel vertical migration, driven by those groups with a higher representation in the metabarcoding reads. Seasonal ordination was detected after square-root transformation and, similarly to temperate regions, four seasons could be differentiated based on community composition (post-spring bloom, summer stratification, fall mixing event, and winter mixing), with no correspondence with the zooplankton biomass patterns. This community ordination showed correlation with the measured vertical flux, highlighting the need of understanding community regimes, even in theoretically stable regions such as the oligotrophic ocean, to advance in our understanding of zooplankton-mediated processes


INTRODUCTION
The zooplankton community is a key component of the ocean ecosystem. Considering both their direct grazing on primary producers and their grazing of other consumers, zooplankton channel 15 to > 50% of the ocean production toward upper trophic levels (Steinberg and Landry, 2017), key for sustaining ecosystems and fisheries. To properly understand the processes affecting or driven by the zooplankton community, a truly knowledge of their taxonomic composition (i.e., biodiversity) is needed (Chiba et al., 2018). For example, species-level identification has allowed the detection of changes in distribution (Beaugrand et al., 2002;Harris et al., 2015;Yoshiki et al., 2015) and phenology (Mackas et al., 2012) associated to climate change. Locally, seasonal succession in the zooplankton community has an effect on its functional profile, including, the zooplankton trophic role, both as predators and as prey (Pomerleau et al., 2015) or their influence in the carbon cycle (Mayzaud and Pakhomov, 2014).
In the Sargasso Sea, zooplankton diversity reaches one of its highest in the world oceans (Rombouts et al., 2010).
Studies of the open ocean zooplankton in the Sargasso Sea started with Moore (1949) and Grice and Hart (1962), however the most exhaustive research in the zooplankton community composition was done by Georgiana Deevey (Deevey, 1968(Deevey, , 1971Brooks, 1971, 1977) at Hydrostation "S, " in the vicinity of Bermuda. The complexity of the zooplankton inhabiting these waters was however a limitation as Deevey and Brooks (1977) indicated that "Because of taxonomic problems and the amount of time that would have been involved, no attempt has been made to identify many small species of such genera as, for example, Paracalanus, Clausocalanus, Spinocalanus, Microcalanus, Scolecithricella, Scaphocalanus, Oithona, and Oncaea. Also, some larger species were not identified". Still the authors identified more than 300 species of copepods, and it is likely that those uncounted genera (or groups) would have easily accounted for another 100 species at least. A few other studies were carried out in the Sargasso Sea or their vicinity later on, but in general focused on a single or a handful of groups (Angel, 1979;Wiebe et al., 1992). Most of the studies at community level have since used main taxa groups (Eden et al., 2009;Ivory et al., 2018), focused on a single group (Stone and Steinberg, 2014) or processes and biogeochemical cycling (Steinberg et al., 2000;Madin et al., 2001;Schnetzer and Steinberg, 2002;Stone and Steinberg, 2016;Maas et al., 2018) more than in the community composition.
In the last two decades, molecular markers have been developed to be used for species identification, an approach commonly known as barcoding (Hebert et al., 2003). Applied to all environments, including the marine zooplankton, single-individual based analyses identified (and barcoded) 175 holoplankton species in the Sargasso Sea in a single cruise (Bucklin et al., 2010), leaving, however, 100s of species without identification due to time constrains. Nowadays, with the aid of high-throughput sequencing, community-level barcoding can be done in a single analysis, an approach known as metabarcoding (Amaral-Zettler et al., 2009;Fonseca et al., 2010;Taberlet et al., 2012). Applied to the zooplankton, this technique has demonstrated its ability to uncover the existing diversity in very complex samples (Lindeque et al., 2013), determine spatial biographical regions based in community structure Bucklin et al., 2019), capture the vertical structure in the open ocean (Sommer et al., 2017), or show the potential for monitoring of estuarine conditions and seasonality (Abad et al., , 2017. This method is still subject to technical limitations, and there is much to be learn about it, especially the meaning of read counts compared to "classical" studies such as individual counts or biomass (Bucklin et al., 2016;Lamb et al., 2019).
Understanding the community dynamics associated to seasonal cycles is essential to precisely estimate basic questions such as biogeochemical cycling or trophic webs (Burd et al., 2010). Although the Sargasso Sea is often described a lowseasonality location, particle flux shows a clear seasonal signal (Conte et al., 2001) and, compared to truly oligotrophic locations such as ALOHA, the hydrography at the Bermuda Atlantic Timeseries Study (BATS) is subject to strong seasonality, with a winter mixing period below the photic layer, summer stratification and fall mixing, influencing nutrients inventory, primary production, particle flux, etc. similarly to temperate locations (Church et al., 2013). Subtropical gyres are characterized by a relatively stable community throughout the year, meanwhile temperate regions show strong seasonality in the community composition. This difference has a strong impact in the effects of the zooplankton in the biogeochemical cycles, from how efficient is to channel the primary production to upper trophic levels, to cycles of secondary production or export of carbon to the mesopelagic (Carroll et al., 1998;Wexels Riser et al., 2002;Mackas et al., 2012;Hannides et al., 2020). Understanding the dynamics of the zooplankton community at BATS, in the boundary of the North Atlantic Subtropical Gyre, is therefore essential to properly address the biogeochemical cycles at this site. For the zooplankton community, studies have previously focused on describing the monthly evolution of single species or taxonomic groups (Deevey, 1971;Deevey and Brooks, 1971;Ivory et al., 2018); a study linking hydrography to shifts in community composition is, however, still missing. From the existing data, the best suited would be those from Deevey studies but, unfortunately, the counts were not available in the original publications, and all efforts done to find the original data from her institution, or at the Bermuda Institute of Ocean Sciences, have been unsuccessful. This study analyses the zooplankton community at the BATS location (32.166667 • N, 64.5 • W), in the Sargasso Sea, during the year 2015, by means of metabarcoding, using the hypervariable V9 region of the 18S gene (Amaral-Zettler et al., 2009;de Vargas et al., 2015), a region commonly used for zooplankton metabarcoding Abad et al., 2017;Bucklin et al., 2019). The analyses aim to understand the seasonal response of the zooplankton community to hydrographic conditions, the ability of the metabarcoding approach to detect processes such as seasonal and daily (day/night) changes in the community composition of the zooplankton, and to determine if there is correlation between routine biological oceanographic measurements taken within the BATS program and the changes in the zooplankton community.

Metadata
Oceanographic data were accessed from the BATS ftp repository 1 . Sampling, quality control and post-processing methodologies are available on the BATS website 2 . Temperature, salinity and CTD fluorescence profiles from surface to 1000 m were graphed over time using Ocean Data View Ver. 4.8 (Schlitzer, 2017). Mixed layer depth (MLD) was defined as the depth at which density (sigma-theta; σ θ ) reached 0.125 kg m −3 difference compared to surface (Levitus, 1982) calculated from CTD profiles. Monthly integrated primary production (as per BATS methodology; Knap et al., 1997) and total Chl-a per square meter were calculated from Dec. 2014 to March 2016. Day and night size-fractionated zooplankton biomass (as per BATS protocol and data) was also analyzed. Both total biomass as well as fractionated biomass were graphed. Sedimentary traps flux data (measuring total biomass, C, N, and P from sedimentary traps during BATS cruises; a trap array, with traps at 150, 200, and 300 m is allowed to drift for 72 h; Steinberg et al., 2001) was also retrieved.

Zooplankton Sampling and Preservation
Zooplankton samples were taken during 2015 and early 2016 onboard the RV Atlantic Explorer during BATS cruises from February 2015 to March 2016 ( Table 1). Sampling methods and gear have been previously described in detail in Madin et al. (2001) and Steinberg et al. (2012). In short, zooplankton was sampled using a 1-m 2 rectangular, 202 µm mesh net. Oblique tows were done during the day (between 13:59 and 15:40 GMT) and night (between 01:24 and 6:19 GMT) on each cruise, with the exception of the 20314 cruise in which the day tow was canceled due to rough weather. Net tows were done at a ship speed around 1 kN, and paying out 250 m of wire at 15 m per minute both ways. Depth was recorded using a STAR ODDI DST centi-TD depth and temperature logger (effective depths ranging 113-247 m; Table 1). Once on deck, water was drained, and samples were immediately fixed in 95% undenatured alcohol. Samples were then stored at −20 • C until returning to BIOS (1 to 3 days). Once at BIOS, the ethanol was replaced and samples stored at 4 • C until analysis. If the ethanol was turbid after 24 h, ethanol was replaced again. Samples were stored in 500 mL of ethanol minimum. If the sample occupied more than 1/3 of the jar, it was then transferred to a 1 L jar, and filled to the top with ethanol.

Sample Processing and DNA Extraction
Prior to extraction, each zooplankton sample was split in two subsamples using a Motoda splitter (Motoda, 1959). Half of the sample was set for extraction, meanwhile the other half was kept as archival material at BIOS. Macroplankton (>2 cm; mostly the larger fish larvae) were removed from the sample and stored in a separate jar, not being used for this study.
The Extraction method was based on the SDS-chloroform extraction in Corell and Rodríguez-Ezpeleta (2014) but using the E.Z.N.A. R Mollusc DNA Kit (Omega Bio-Tek, Norcross, GA, United States) to recover the DNA instead of the ethanol precipitation. Three final pseudoreplicates (since they originated from the same zooplankton sample) were obtained for each sample.
Samples were filtered through a 53 µm mesh placed on a home-made sieve that allows for easy removal of the mesh without disturbing the sample, and thoroughly washed with milliQ water to remove as much ethanol as possible. The mesh was them removed and placed in a 50 mL falcon tube, with part of the mesh hanging outside the tube. The falcon tube was then tightly closed, securing the mesh to the edge. The tube was then centrifuged at 3,500 g for 10 min to remove as much water and remainder ethanol as possible. After centrifugation, the pellet was weighted, transferred to a fresh 50 mL falcon tube (carefully checking that no plankton was left on the mesh) and 15 mL of SDS buffer (Tris-HCl 10 mM, EDTA 100 mM pH 8.0, NaCl 200 mM, SDS 1%) was added if total mass was < 10 mg, and 25 mL if above. Samples were then vortexed to break the pellet away and ensure rapid mixing of the plankton with the buffer to prevent DNA degradation. The sample was then ground using a Fisher Scientific Homogenizer FSH125, equipped with a 10 mm × 160 mm steel generator. Samples were ground at 1/3 of the maximum speed (approximately 10,000 rpm; this model of homogenizer does not give precise measurement of the speed) for 5 min. This low speed was chosen since, during initial testing, it was noticed that using higher speeds (i.e., near maximum speeds), even for shorter times, often resulted in excessive DNA fragmentation and rapid heating of the sample, and fragments > 2 kb were difficult to obtain during amplification. Large gastropods, whose shells softened after contacting the buffer, did not get broken often, and were cut using dissecting scissors. After the grinding step, the generator was rinsed using 10 mL of SDS buffer to minimize the loss of sample in the generator. Between samples, the generator was completely disassembled, all parts washed with soap under running water, soaked in bleach 10% for 5 min and rinsed in MilliQ water before assembling and running the next sample. For samples < 10 mg of biomass, extraction was done using the 25 mL of buffer used (15 mL initially + 10 mL for cleaning the generator). For samples with > 10 mg, extra 15 mL of buffer were added, to make up to 40 mL of SDS buffer. Using less buffer caused DNA to degrade during extraction during preliminary tests. No heating of the sample was detected during the grinding process at low speeds.
After the grinding protocol, proteinase K (Sigma-Aldrich; St. Louis, MO, United States) was added to the buffer (0.2 mg mL −1 final concentration), as well as 5 mL of sterilized stainless steel Shot, 3/32 Ellipse (Rio Grande, CA, United States). The tubes were then placed in an oven at 60 • C for 4 h, vortexing at maximum speed for 30 s every 30 min. After the 4 h, the tubes were centrifuged at 3,500 g for 15 min. Three replicates of 400 µL of the supernatant were transferred to 1.5 mL Eppendorf tubes. Three additional cryovials (2 mL of supernatant each) were stored at −80 • C, and the rest of the supernatant was transferred to a clean 15 mL Falcon tube and stored at −20 • C. For the three Eppendorf tubes, if the supernatant still contained small particles that would not precipitate at the aforementioned centrifugation forces, the Eppendorf tubes were then centrifuged at 10,000 g for 5 min or until complete precipitation of the particles, and then supernatant transferred to a new tube. After this step, DNA was extracted following the E.Z.N.A. R Mollusc DNA Kit (Omega Bio-Tek, Norcross, GA, United States), from the addition of chloroform:Isoamyl alcohol 24:1 step on. A modification to the standard protocol included a 14 min centrifugation step to separate the aqueous and the organic phases at this point. From that, only 300 mL of the aqueous phase were carried to the next step, to ensure avoiding the milky interface and the organic phase. All the other steps followed the manufacturer conditions. Final elution was done using 2 × 200 µL of Ambion R nuclease-free water (not DEPC-Treated; Life Technologies, Foster City, CA, United States). DNA was then stored at −20 • C until amplification after quantification (NanoDrop TM One). DNA output ranged between 16 and 100 µg of DNA per pseudoreplicate.

18S V9 Amplification
One PCR reaction was done for each pseudoreplicate (total 3 PCR per sample). Amplification of the V9 region was done using the primers 1389F and 1510R (Amaral-Zettler et al., 2009) integrated in an oligonucleotide consisting of an Illumina adapter, an 8nt index sequence and a 10-nt pad sequence identical for all oligonucleotides (Supplementary Table S1 in Supplementary File S1). Amplifications were done using the AccuPower R ProFi Taq PCR PreMix (20 µL reaction; BIONEER; South Korea). PCR steps were performed such as to minimize amplification bias and formation of chimeras, including 300-500 ng of DNA template (to minimize the number of PCR cycles needed and the associated bias), fewer PCR cycles (12 cycles) and lower temperatures (Aird et al., 2011;Ahn et al., 2012;Gonzalez et al., 2012). The premix includes the DNA polymerase, reaction buffer, dNTPs, tracking dye, and a patented stabilizer. To the premix, 1 µL of each primer (10 µM) were added; the amplification protocol consisted in an initial denaturation at 94 • C for 3 min, followed by 12 cycles of 94 • C for 30 s, annealing at 56 • C for 30 s and extension at 72 • C for 30 s; a final extension step was done at 72 • C for 15 min, and then kept at 4 • C. After amplification, PCR success was tested using 3 µL of the PCR reaction in a 1.5% agarose gel in TBE buffer. For each sample, the remainder of the PCR product from the pseudoreplicates were pooled and sent for sequencing at the Microbial Analysis, Resources, and Services (MARS) unit of the University of Connecticut Biotechnology Bioservices Center. Sequencing was performed in an Illumina MiSeq using the MiSeq Reagent Kit v2 (500-cycles; 2 × 250) V2 chemistry. Sequence files are available at the NCBI site under the bioproject https: //www.ncbi.nlm.nih.gov/bioproject/PRJNA540654. To test for replicability within the procedure, two samples were submitted using different indexes in separate batches.

Bioinformatics Pipeline
Demultiplexed samples were analyzed in MOTHUR ver. 1.39.5. (Schloss et al., 2009). The full script, annotated, is available at https://github.com/blancobercial/BATS_zooplankton. Contigs were made allowing for trimming outside the overlapping region (therefore only reads giving full length in both directions will be retained in a later cleaning step), and bases that are compared to a gap in the other fragment were eliminated if the quality score was below 30. After pairing, all reads containing any ambiguity or shorter than 115 bp were removed. Unique sequences were aligned against the V9 region of the SILVA 128 release database (Quast et al., 2013). Sequences were trimmed to the length of the V9 region, and only those showing completeness (starting in the first base and ending in the last base of the alignment) were kept, avoiding artificial OTUs due to unfinished amplifications. After gap removal, chimeras were removed using UCHIME (Edgar et al., 2011) as implemented in MOTHUR, and unique sequences selected (since after trimming and processing a few sequences may become identical). Distances between reads were calculated, considering gaps as a single event per gap no matter the length. The effect of base by base threshold clustering was studied running an initial clustering step using the nearest neighbor algorithm, with a 0.10 cutoff and 1000 precision. After graphical identification of the clustering level corresponding to 2 bp (= 0.016), final clustering was done using the OptiClust algorithm (Westcott and Schloss, 2017) with distance set to 0.016. This distance would only remove PCR and sequencing errors, since it would only cluster the true reads with the derived reads with 1 bperror (2 bp distance between two reads with one error each). This approach is similar in concept to the TARA oceans 1 bp swarming procedure (de Vargas et al., 2015). OTUs were taxonomically assigned to a database developed from the complete SILVA 128 release database for SSU (SILVA_128_SSURef_tax_silva; not the NR99 available from www.MOTHUR.org) 3 . Bash scripts used to transform the SILVA files to MOTHUR format are available in the repository containing all scripts used in this manuscript 4 . Using the NR99 would have worsened the taxonomic assignment, since the 18S is a highly conserved gene, and it was observed that the same V9 region sequence was often shared between components belonging to the same Superfamily. Based on this classification, only metazoan data were selected for all downstream analyses. For identification of taxa (OTUs) flagged by ecological analyses (see below), each of the OTU were BLASTed against the GenBank and results individually analyzed, to take into account the limits of taxonomic assignment of the V9 region and the existing references.

Diversity Analyses
To analyze the effect of read numbers (i.e., read depth) per sample on OTU richness (i.e., diversity), unfiltered and nonstandardized read counts were analyzed in iNEXT online (Hsieh et al., 2016;Chao et al., 2018). Interpolated (rarefaction) curves and the extrapolation curves (to the maximum number of reads per sample) were calculated for each sample for species (OTU) richness (Chao et al., 2014). Confidence intervals were obtained after 100 bootstrap repetitions. Same analyses (rarefaction and extrapolation) were done after removal of singletons and doubletons for the whole dataset (not per sample).

Standardized Analyses
For community analyses, all samples were standardized to 36,471 reads (the minimum per sample), and all OTU abundances values < 1 after transformation assumed to be zero (to make it equal to the sample with less reads); and then any OTU with global abundances < 2 removed (global singletons). Two parallel analyses were run using PRIMER ver. 7 (Clarke and Gorley, 2015). The first considered only the relative abundances of the individuals at each sample as obtained after the "standardize by the total" procedure in PRIMER (that is, in result, a normalization to percentage of the total; ranging between 0 and 100); meanwhile the second analyses included a subsequent square root transformation (Hellinger standardization; Legendre and Gallagher, 2001), to reduce the weight of the highly abundant OTUs. Similarity matrices were built using Bray-Curtis similarity index, and graphically represented using nonmetric MultiDimensional Scaling (nMDS). Non-hierarchical, k-R Clustering analyses were run for grouping levels from 2 to 12, selecting R Cluster mode with 1000 restarts. The results were compared to ANOSIM pairwise tests at those levels of clustering, with 10,000 permutations to test for significance. To compare to a hierarchical clustering approach, a group average clustering (paired with a Simprof analysis for significance of the clusters/nodes with 10,000 permutations) was also carried out. Grouping was analyzed in terms of a significant increase of the R statistic from the k-R approach (equivalent to the ANOSIM R) as well as for ecological significance. SIMPER analyses were used to identify the OTUs contributing to both the similarity within, and dissimilarity between, the groups chosen for each analyses. Initial analyses included all samples and OTUs, meanwhile a more evenly sampled set, including only core BATS cruises (sampled then approximately once per month) from 2015 were analyzed, to avoid heterogeneity due to different time lags between consecutive sampling events ( Table 1). Environmental variables (MLD, PP and Chl-a per cruise) were compared against the nMDS ordination using the BIO-ENV analyses in PRIMER. MLD has shown to drive changes in the biogeochemistry and the community composition in other groups (Carlson et al., 1994;Steinberg et al., 2001;Carlson et al., 2009), and provides a better measure than a metric such as average temperature, that could not reflect the temperature the organisms are exposed to (since it averages above and below the thermocline). Chla is considered a proxy for phytoplankton biomass, and PP has showed to play a role in resource availability for the zooplankton, and both, although correlated, often show different maxima during the year. It was chosen to keep this short list of variables, instead of adding others such as nutrient levels (N, P, etc.) that would be already influenced by MLD and affecting PP. This would also reduce the risk of finding spurious correlations between the variables and the nMDS ordination.
The relationship between sedimentary traps flux (measuring total biomass, C, N, and P from sedimentary traps during BATS cruises; a trap array, with traps at 150, 200, and 300 m is allowed to drift for 72 h; Steinberg et al., 2001) to the community composition was interrogated using a mantel test (Spearman rank; significance tested after 10,000 permutations) between the similarity matrices derived from the community composition (as described above) and from the BATS flux components measured per month using the sedimentary traps.
For direct comparison with image analyses existing for BATS, clustering and nMDS analyses were carried out grouping the OTUs by main taxa groups to levels commonly used in automated image scanning in systems such as ZooSCAN/Ecotaxa (Gorsky et al., 2010;Picheral et al., 2017), and following the classification used from images carried out from BATS data by Ivory et al. (2018).

Hydrography
The observed seasonality of the water column has been previously described for the BATS station in several publications (e.g., Steinberg et al., 2001;and references therein;Lomas et al., 2013). Briefly, temperatures showed deep convective mixing early in the year (winter and early spring), moving to a strong summer stratification in surface for July, August and September, with deepening of the mixed layer later in the year (November and December) leading to the deep mixing during winter (Figure 1). During October 2015, Hurricane Joaquin passed near Bermuda (missing data in the graphs) and likely produce surface mixing. No data was registered from the BATS program due to the associated rough weather conditions, and profiles were obtained from Mid-Atlantic Glider Initiative & Collaboration (MAGIC 5 ) program (R. Curry, unpublished), revealing abrupt mixing of the column after the passage of Joaquin (Figure 1). Data gaps early in both years correspond to scheduled shipyard repairs for the RV Atlantic Explorer. CTD fluorescence showed higher concentrations during spring bloom in late February (i.e., winter), with some intermittent smaller peaks during the year between 70 and 100 m depth. When considering all casts made during each BATS cruise, the fluorescence peaks roughly follows the measured Chl-a (Figure 2) or Phaeopigments (Turner fluorometer; see BATS methods). MLD showed a patter similar to those already described for BATS (e.g., Church et al., 2013), with a deeper layer in winter, abruptly shallowing in spring and summer and gradually deepening during the fall despite the hurricane (Figure 2). Primary production showed maximum peaks in late winter/early spring and August, with a minimum during the fall (Figure 2) also in agreement with previously published observations for BATS (e.g., Steinberg et al., 2001).
Zooplankton biomass showed maximum values in March, May and August, with night values being higher for monthly pairs with the exception the May and September (Figure 2 and Supplementary Table S2 in Supplementary File S1). In general, each fraction followed the same common trend, with similar maximum and minimum relative values.

Laboratory and Bioinformatics
All samples succeeded in producing DNA, and amplification success was 100%. A total of 7,199,536 reads were produced, of which 5,144,471 passed quality control and chimera search, with 141,553 unique sequences. After clustering at 2 bp difference, 20,114 metazoan OTUs remained, accounting for 3,781,379 reads. The remaining reads belonged to other Eukaryotes OTUs (SAR, unclassified, etc.) and were not included in downstream analyses. Raw data is provided in an excel file including OTU abundances per sample, OTU taxonomic assignment and OTU representative sequence (Supplementary File S2).

Replicability
The test for replicability (raw counts; without any prefiltering) showed very strong agreement between both pairs of samples submitted from different amplification and sequencing batches using different illumina index (for both pairs Pearson ρ > 0.99; p < 0.001). The general trends in the most abundant groups were kept. Abundances for rare taxa (abundances below 0.5%), showed in some cases discrepancies, a behavior commonly observed in metabarcoding (Leray and Knowlton, 2017), despite the technical procedures carried out to minimize this effect (large amount of template, low number of cycles and pooling of three independent PCR reactions).

Effects of Sequencing Depth and Prefiltering on Diversity Estimates
Diversity analyses without prefiltering or standardization (3,781,379 reads in 20,114 OTUs) showed insufficient sampling (with the exception of a few samples such as 20313D), and the extrapolated curves indicated 1239 to 3890 OTUs per sample (Figure 3). The measured number of OTUs showed a higher dependence of the number of reads available to any other factor such as Day/Night or seasonality (Pearson correlation r = 0.96, p < 0.001). The extrapolated to 350,000 reads showed a worse correlation to the number of reads (r = 0.51, p < 0.01) and the interpolated (rarefied) to 36,471 reads showed no correlation to the number of reads per sample (r = 0.04, p > 0.85). Furthermore, the OTU numbers obtained from the rarified dataset were the less variable of the three comparisons (Figure 4 and Supplementary Figure S1 from Supplementary File S2).
After removal of singletons (3,768,816 reads, 7,551 OTUs) many species accumulation curves still did not show attenuation, although many more curves did show attenuation compared to the original data (Figure 3). Further removal of doubletons (3,763,740 reads, 5,013 OTUs) did not significantly improve much (i.e., increased saturation) the species accumulation curves nor mitigate the species detection dependency on the number of reads per sample. In all cases, the variability of OTUs detected per sample was much lower in the interpolation to the minimum number of reads compared to the observed or extrapolated. Based on the interpolated data, more taxonomic units were detected in the night samples compared to the corresponding day samples (paired t-test, p < 0.05), however there was no seasonal pattern in the number of units detected (Supplementary Figure S1 in Supplementary File S2).

Standardized Analyses
All samples were reduced to 36,471 reads per sample and only core BATS cruises were considered. Taxonomically, calanoid copepods overwhelmingly dominated the community in all samples, representing 45-65% of the community (Figure 5). No other of the taxa groups considered reached values over 13% in any case. At this major taxa group level, no clear FIGURE 1 | CTD temperature and fluorescence profiles at BATS station during 2015. Thin vertical lines indicate sampling events. Data is missing for October due to the Hurricane Joaquin. For temperature, two extra profiles were obtained from glider data (MAGIC program, BIOS) to give a better coverage between the last CTD sampling before the hurricane and the one after.
seasonal pattern was appreciated, with proportions remaining more or less constant.
After building the Bray-Curtis similarity matrix, a nMDS analyses on the non-transformed data showed little seasonality pattern, however day/night samples were separated to a certain degree, showing statistically significant grouping by this factor despite the partial mixing (ANOSIM R = 0.289, p < 0.005; Figure 6). The taxa that contributed most (up to 25%) to the differences between day and night samples, were typically OTUs with a higher number of reads, in particular OTUs 1, 2, 3, 5, 11, and 7 ( Table 2). They contributed more than 4% each to the dissimilarity, accounting for up to 35% of the total dissimilarity. GenBank BLAST analyses indicated that their identity would correspond to (1) the Family Calanidae (Copepoda, Calanoida; several genera and species of the family shared the same V9 region), (2) Pleuromamma spp. (Gaussia princeps shares this sequence, it is however unlikely due to its meso-bathy-abyssopelagic habitat; Metridia does not share this sequence), (3) the Euchaetidae or Aetideidae; (5) unknown members of the Superfamily Clausocalanoidea, (11) the Superfamily Eucalanoidea (sensu Blanco-Bercial et al., 2011; again, the V9 region is common to several genera within this superfamily, across families) and (7) the Family Euphausiidae (Order Euphausiacea; several genera of the shared this sequence) respectively ( Table 2). Contribution to the dissimilarity progressively decreased for the other OTUs, and only 20 contribute more than 1% (Supplementary Table S3 in Supplementary File S1).
After standardization by the total and square root transformation, K-R Clustering showed that the maximum improvement in R occurred between 3 and 4 clusters ( Figure 7A). The defined groups coincided with the four defined clusters by hierarchical group average method ( Figure 7B). The nMDS analyses reflected a seasonal cycle based in those four clusters (Figure 7C), corresponding to seasonal hydrographic regimes: winter deep mixing and bloom (February to April; spring bloom happened in February, Figure 1), spring (May to July; coincident with the largest zooplankton biomass), summer stratification (August and September) and fall mixing event (November and December; post-hurricane Joaquin). Environmental variables (MLD, PP and Chla) showed high correlation with the nMDS ordination and grouping (Pearson ρ = 0.458; p < 0.005). ANOSIM analyses showed high significance for the four groups (p < 0.001) meanwhile it resulted in non-significant grouping of Day/Night (p > 0.25). A mantel test comparing the community composition with measured vertical particle flux measured at BATS (RELATE procedure) showed a statistical correlation between both similarity matrices (ρ = 0.348; p < 0.001). A direct correlation between zooplankton biomass (measured during the day, during the night, or the average of both) did not result in FIGURE 2 | Upper panel shows the depth of the mixed layer depth (MLD), total Chl-a per square meter, and total primary production (PP) per square meter during BATS core cruises. Lower panel depicts the day and night dry biomass of zooplankton (mg per cubic meter) during the same cruises. Marker colors correspond to the four zooplankton community regimes as established based on the community composition (see section "Results"). significant correlations when compared with the exported total mass. The raw data for the vertical flux can be found in the Supplementary File S1 in Supplementary Table S4. SIMPER analyses between the four groups ( Table 3) indicate that the transitions between seasons were due to many OTUs, each contributing at the most 1 to 2% to the dissimilarity between seasons (detailed results in Supplementary Table S5 in Supplementary File S1). Taxonomic assignment by BLAST gave a wide range of IDs, from several phyla, covering both holoplankton to meroplankton. The deep mixing and spring bloom to late spring transition was marked by the increase of Eucalanoidea (2 OTUs) and Calanidae copepods, with a decrease of winter taxa such as appendicularians, euphausiidae and Pleuromamma sp. Toward the summer, Poecilostomatoid copepods, and Acartia longiremis increased their relative abundances, meanwhile during the fall there is an increase of an Eumalacostraca and an OTU belonging to the Euchaetidae/Aetideidae clade. The transition to the winter is again characterized by increases in an appendicularian, Pleuromamma sp. and Euphausiidae, and a decrease in the groups dominant in summer and fall. Similarity within groups were caused mostly by the most abundant OTUs (especially 1, 2, 3, 5, 6, 7, 9, and 11). Some other taxa were, however, typical of only one or two of the groups, such as Poecilostomatoid (OTU 18) for the summer and the unknown Eumalacostraca (OTU 15) in the fall. In case of Eucalanoidea, it was possible to appreciate a gradual decrease in their relative contribution to the groups, with a maximum in spring, decreasing toward the summer and fall.
Grouping the OTUs by main taxa groups following the classification used from images carried out from BATS data  resulted in a lack of ability to detect seasonality. Samples analyzed this way were clustered into three groups, one containing summer (June to September) samples and the other two the rest of them. Day/night pairs were, however, mixed within groups, and even same-month day/night samples fell in separate groups (Supplementary Figure S2 in Supplementary File S2).

DISCUSSION
This manuscript shows how seasonality influences the mesozooplankton assemblage at the BATS site, indicating the presence of four distinct community regimes. Although some FIGURE 3 | OTU accumulation curves for all samples, using all reads (black), and after removal of global singletons (red) and doubletons (blue). Curves represent values extrapolated to 350,000 reads per sample (see methods). After removal of doubletons, most samples appeared to have reached a quasi-asymptotic profile, that would indicate that all OTUs had been detected. Standard deviations (after 100 repetitions) were too small to be seen in the graph. SD (not represented) were always ≤ 5% of the value.
previous works have described certain (or all) components of the community in detail (Deevey, 1971;Brooks, 1971, 1977;Stone and Steinberg, 2014) or organized by main taxonomic groups , none of the previous described any shifts in the seasonal patterns at community level. These seasonal regimes were characterized by changes in the relative abundances of both key and secondary players of the community, with only a smaller influence based on the fluctuations of the dominant players of the community. The four-community pattern described here is similar to those described for temperate oceans (responding to hydrographic conditions), and it is more clear than when only biomass (e.g., Madin et al., 2001; this study) is considered. In particular, at least during the year 2015 (analyzed here) the maxima and minima biomass values are mixed within a statistical cluster; both in total biomass and for each size fraction, considering just day or night, or as a monthly measurement. While some of the community shifts are predictable based on a priori understanding of the ecosystem (linked, for example, to the end of the mixing of the water column to the stratified conditions), the more gradual transitions (spring/early summer to summer, and fall to winter) would not be obvious without this detailed biodiversity data. Furthermore, for the very rapid winter to spring transition (represented by the formation of shallow stratification), there was a month of delay in the zooplankton response, showing an apparent disconnect between hydrography and zooplankton community composition. This lag would be due to the known delay in the response of the zooplankton, attributed to the longer generation times compared to phytoplankton (Ceballos and Álvarez-Marqués, 2006). The presence of these four community regimens would also represent changes in the functional profile of the community. The clearest example was the shift from the deep-mixing and spring bloom communities to the late spring community. This was typified by a shift from an early spring bloom community, dominated by of omnivorous/carnivorous (Pleuromamma sp., euphausiids), and appendicularians, small filter feeders able to feed in even very small phytoplankton and respond rapidly to increased resources due to their short life cycle (Acuña et al., 1995;Fernández and Acuña, 2003;Blanco-Bercial et al., 2006;Lobón et al., 2013), to a Calanidae and Eucalanoidea dominated one, specialized in feeding in larger phytoplankton, during the late bloom period.
Omnivorous species (e.g., Pleuromamma or euphausiids), able to respond faster to the spring bloom, decreased their relative abundances in late spring, with an increased relevance of Calanidae (likely Nannocalanus minor s.l., Mesocalanus tenuicornis s.l. and Neocalanus spp.; a family tightly linked to phytoplankton blooms worldwide) and Eucalanoidea, likely with a slower response to the increased phytoplankton biomass. This switch came, however, well after the yearly maximum of Chla (February and March). This suggests a delay in the response of the phytoplankton specialists and an advantage in this region for taxa with a wider prey spectrum, which might be able to maintain larger populations under the low resource conditions of fall and winter. The fact that community relative composition alone (without any biomass or total abundance implicit) was also a better predictor for vertical flux than zooplankton biomass also suggests a significant role of the community composition on the oceanic biogeochemistry. All these results would place the zooplankton community dynamics at BATS closer to those typical from temperate oceans than to subtropical oligotrophic gyres, including the seasonality signal correlated with flux. The seasonality in the flux (and in the flux attenuation) was already known from this region (Helmke et al., 2010;Church et al., 2013); the described zooplankton cycle might contribute to explain these cycles and the difference between BATS and other locations with a weaker seasonality such as HOT or ESTOC. Further investigations, using both field and modeling approaches, should help connecting these observations.
Although a deep analysis of functional biodiversity profiles was not done (see below about limitations of this study), it is known that functional analyses might be able to perceive more subtleties when characterizing community dynamics (Pomerleau et al., 2015), which could subsequently affect relevant processes such as carbon flux (Steinberg et al., 2008). If the identified community regimes have a relevant impact on such processes, as the correlation between flux and seasonal groupings suggests, identification of the functional profiles and analysis of their role in biogeochemical cycling, is something that should be addressed in future research.
These observed seasonal changes do not, however, affect the general profile of the community based on main taxa grouping or for the most dominant taxa, which stay dominant throughout the year (with small changes in the relative abundances). Nor do they correlate with the diversity of the community (at least the indices calculate here). Most OTUs with a relatively significant presence were detected in all samples; cases in which an OTU would be a significant part of the community (more than 1%) for only part of the year were non-existent. These results would support the hypothesis that the community has been deeply sampled (deeper than could be attained with classical microscope counts), without missing OTU due to undersampling, and that changes in relative abundances would be based in balancing changes involving many taxa, with very little final influence on the average diversity of the community.
The inability to detect the same groups, or even well-defined groups, when biodiversity was assessed by main taxa groups (similar to detection levels achieved by other methods such as scanning and machine-aided identification), highlights the power of the metabarcoding approach despite the known existing limitations (see below). This should not be interpreted as any method being better or worse, but as a need to integrate several approaches to gain a full understanding of the processes in the zooplankton community. Total biomass, particle size profiles and real abundances, etc., will be enriched by the power of metabarcoding to detect changes in the community ecology.

Limitations and Capabilities of 18S V9 Metabarcoding
When performing any study, the limitations of the approach should be always carefully considered, and metabarcoding is not an exception (Creer et al., 2010;Pilgrim et al., 2011;Bucklin et al., 2016). The most commonly questioned or criticized facet of the metabarcoding is how quantitative metabarcoding is (if at all). Reviewed recently by Lamb et al. (2019), there is in general a correlation between number of reads and biomass, although with large uncertainties. In the present study, the top 10 OTUs would correspond to taxa or clades that are known to be among FIGURE 5 | Proportion of the most abundant large taxa groups (those achieving 5% of the reads per sample at least once during 2015). The community was dominated by calanoid copepods, with secondary taxa more variable throughout the year. Horizontal bars in the base of the graph correspond to the zooplankton community regimes as established based on the community composition (see results).
FIGURE 6 | Non-metric multidimensional scaling representing the similarity between all samples base in community composition (relative abundances). Day versus night grouping was statistically supported; however, no seasonal pattern was detected.
the most dominant at the BATS site (Deevey, 1971;Deevey and Brooks, 1977;Steinberg et al., 2000), however no morphological counts were done for the same samples. It is also important to remember that metabarcoding only provides an equivalent to relative abundances/biomass. In our case, if weighing the relative abundances by the biomass for each month and day/night, it was observed that the total maximum peaks for several of those taxa would correspond with the sample of maximum biomass (May; 10314 cruise), and not the maximum of relative abundance (figure not shown). Finding the best way to transform these metabarcoding counts into real abundances/biomass should be one of the priorities for the whole scientific community.
In the pre-processing steps, the most critical step appeared to be the rarefaction to the minimum number of reads. This process has the associated problem of loss of information (Colwell et al., 2012;Chao et al., 2014), however it increased the differences between samples, biasing diversity toward those with more original reads. If this is an effect of artificial inflation during sequencing is something to be studied in the future. The removal of singletons had a stronger effect within the extrapolated data, while the additional removal of doubletons was minimal, especially in the interpolated dataset. All these results suggest that number of taxa were not be inflated due to methodological errors (that would have been marked by a more dramatic decrease in OTUs when removing singletons and doubletons). The approach in this manuscript is very conservative, and our clustering (theoretically) just removed PCR errors, and was equivalent to a 100% similarity matrix after de-noising (some OTUs are still 1 bp different, since they were the centroids of their respective error-derived cluster of reads). It did not, however, result in an excessively high final number of OTUs for the region, considering existing literature. One of the advantage of the V9 region is its short length (∼130 bp), therefore a complete overlapping between F and R reads was obtained. Discarding those pairs or reads that did not completely agree with a Q > 30 might have helped removing PCR and sequencing errors, and likely dropped chimeras. Both are known to be factors that might be causing the apparent inflation in the number of taxonomic units in metabarcoding studies.
Another limitation in this particular study is the taxonomic resolution of the V9 region. Despite the fact that it has been widely used for community characterization in the zooplankton (Amaral-Zettler et al., 2009;de Vargas et al., 2015;Abad et al., 2017), its ability to resolve species is somewhat limited in zooplankton (Wu et al., 2015). For the V9 alone, even identification at the genus level would have only an 80% success for Copepoda (Wu et al., 2015). A similar problem was also detected for euphausiids in the present manuscript (where several species and genera shared the same sequence), and for pteropods (Stewart et al., 2018), suggesting it would In bold, when the highest relative abundances (percentage of reads) were reached for each OTU.
FIGURE 7 | Clustering and nMDS after Hellinger transformation, based on Bray-Curtis similarity. The k-R non-hierarchical clustering shows the maximum increment in the R statistic at K = 4, with R > 0.8 (A). The addition of subsequent clusters had a smaller effect on the R, and did not result in ecologically meaningful clusters. In (B), group average clustering showed four clusters (in colors) matching those detected by the non-hierarchical clustering. In (C), the nMDS organized the samples reflecting the seasonality based in the community composition. Colors match the clusters detected by the previous approaches (A,B). The overlay represents Pearson correlations between the three environmental parameters and the spatial ordination of the samples.
likely be for all taxonomic groups, if studied in detail. For Copepoda, it can also be observed that the resolution is not equal across the phylogenetic tree. Within the order Calanoida, the oldest planktonic families, Arietelloidea (= Augaptiloidea Blanco-Bercial et al., 2011) and Diaptomoidea showed good resolution to genus and even to the species level (e.g., the several Pleuromamma OTUs; or the Centropages sp. detected). If considering at the most modern superfamilies, whose planktonic radiation happened more recently (Bradford-Grieve, 2002), the same V9 is shared by whole families (Calanidae) or even across several families within the most modern superfamily, the Clausocalanoidea. This is a widespread issue Abad et al., 2017), and this asymmetry of the resolution would influence estimates of diversity, with a closer accuracy in areas where the Arietelloidea or Diaptomoidea are dominant groups (e.g., the mesopelagic or coastal environments, respectively) compared to niches where the other superfamilies are more common (e.g., the epipelagic open ocean). In the BATS region, Pleuromamma spp. and Lucicutia spp. (Metridinidae and Lucicutiidae; Arietelloidea) are significant contributors to the zooplankton biomass (Deevey and Brooks, 1977;Schnetzer and Steinberg, 2002), however the other superfamilies also contribute significantly to the community (pers. obs.; Deevey and Brooks, 1977). In terms of total number of species, 241 copepod OTUs passed the different thresholds in this study. Compared to morphological analyses, 326 species were found by Deevey and Brooks (1977) from the surface to 2000 m depth, 128 of those in the epipelagic, although they acknowledge many more (especially small ones) were not identified and were left at the genus level. A similar number of morphological species has been obtained (159 species) during summer 2016 in the upper 200 m at BATS (Blanco-Bercial and Maas, unpublished), where likely a higher species-level resolution was attained compared to Deevey and Brooks (1977). The higher numbers obtained in this metabarcoding analysis could be attributed to the fact that a whole year was sampled. Furthermore, in the metabarcoding approach, all of the sample is analyzed, while in the morphological counts a subsampling of some kind is needed in order to make the counting feasible. It would be also possible that, in fact, metabarcoding would be detecting more of the rare community, which in the Sargasso Sea it makes a significant fraction of the zooplankton. Considering the difficulty in attaining species-level ID between congeneric groups in this oceanic region, the expected number of species could easily double that number (around 500 species). That would still not be out of scale, considering the high diversity of this area and the understudied non-calanoid copepods. For the subtropical NW Atlantic, at least 713 named copepod species have been indicated (Razouls et al., 2005(Razouls et al., -2019. Although many would not be distributed in the Sargasso Sea, most of them could potentially be present in the samples analyzed here. More accordance was found in pteropods (but a very complete reference was available from Burridge et al., 2017) or in Ostracods, when metabarcoding identified 110 OTUs for ∼120 species described for the subtropical North Atlantic (Angel et al., 2008). Considering these three groups, this dataset does not seem to have inflated the existing diversity, and might be relatively well-representing the community composition. Understanding the limits (and the capabilities) of metabarcoding is essential to make use of its full potential, while avoiding over interpretation of the results. The final goal of the metabarcoding should not to be viewed as the complete and perfect description of the community. Lacking complete databases, and working with a marker that is neither able to discriminate among nor efficiently amplify all species, among others, are limitations that, at least at present, seem unavoidable. These limitations, and many others, have been discussed in several reviews (Creer et al., 2010;Pilgrim et al., 2011;Bucklin et al., 2016). In the present study, however, metabarcoding effectively revealed community regimes, with correlation to environment and biogeochemical processes, providing a step toward a better understanding of ecological succession and processes in the Sargasso Sea zooplankton, and allowing for the development of new working hypothesis regarding the zooplankton community at the BATS site. Many potential studies are furthermore possible based on this dataset; e.g., biological interactions (Lima-Mendez et al., 2015;Stewart et al., 2018), in depth studies in relation to biogeochemical cycles (Guidi et al., 2016), among others. The equivalent analyses of the zooplankton community, done by classical methods (morphology), would have needed a much larger effort (time-wise) and several experts while the faster imaging (zooSCAN or similar) analyses do not provide the same kind of information. It is time to incorporate metabarcoding as a standard procedure in timeseries, environmental monitoring, and regular research, not as a substitute of, but as a complement to, the present methods used in biological oceanographic research.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the NCBI BioProject 540654, accessible at https://www.ncbi.nlm.nih.gov/ bioproject/PRJNA540654. All scripts used in this study can be found in the github repository https://github.com/blancobercial/ BATS_zooplankton. Data from the Bermuda Atlantic Time-Series Study analyzed in this study can be found at BATS data repository http://bats.bios.edu/bats-data/.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.

FUNDING
This research was partially funded by the Simon's Foundation International as part of the BIOSSCOPE project (http://scope. bios.edu/), as well as startup funding from the Bermuda Institute of Ocean Sciences to LB-B.