Genetic Diversity in Marine Planktonic Ciliates (Alveolata, Ciliophora) Suggests Distinct Geographical Patterns – Data From Chinese and European Coastal Waters

Unraveling geographic distribution patterns of planktonic protists is a central goal in marine microbial ecology. Using a novel combination of recently developed phylogenetic and network analyses on a V4 18S rDNA metabarcoding dataset, we here analyzed the genetic diversity of marine planktonic ciliate communities in Chinese and European coastal waters. Thereby, our approach provided an unprecedented perspective on geographic patterns inferred from ciliate genetic diversity and accomplished a very fine resolution down to single nucleotides within operational taxonomic units (OTUs). While most OTUs (87%) exclusively contained sequences of either Chinese or European origin, those OTUs detected in both regions comprised the vast majority of reads (84%). Phylogenetic analyses of OTUs belonging to the same taxon revealed genetically distinct clades that were geographically restricted to either Chinese or European coastal waters. The detection of signature nucleotides emphasized this genetic distinction of Chinese and European clades. Second-level clustering of OTUs and reference sequences in two selected taxa (the oligotrichid Spirotontonia and the tintinnid Tintinnidium) revealed the presence of several potentially new species or ones lacking genetic reference data. Geographic patterns were also discovered by network analyses within 700 widespread and abundant OTUs; in 77 of these OTUs, European and Chinese sequences formed significantly assortative groups. These assortative groupings indicated a higher genetic similarity among sequences from the same region than between sequences from different regions. Our results demonstrate that detailed analyses of metabarcoding data down to single nucleotide differences expand our perception of geographical distribution patterns and provide insights into historic and ongoing effective dispersal in protists. The congruent discovery of geographic patterns at different levels of resolution (between and within OTUs) suggests that cosmopolitan distribution in marine planktonic ciliates is less common than previously postulated.

Unraveling geographic distribution patterns of planktonic protists is a central goal in marine microbial ecology. Using a novel combination of recently developed phylogenetic and network analyses on a V4 18S rDNA metabarcoding dataset, we here analyzed the genetic diversity of marine planktonic ciliate communities in Chinese and European coastal waters. Thereby, our approach provided an unprecedented perspective on geographic patterns inferred from ciliate genetic diversity and accomplished a very fine resolution down to single nucleotides within operational taxonomic units (OTUs). While most OTUs (87%) exclusively contained sequences of either Chinese or European origin, those OTUs detected in both regions comprised the vast majority of reads (84%). Phylogenetic analyses of OTUs belonging to the same taxon revealed genetically distinct clades that were geographically restricted to either Chinese or European coastal waters. The detection of signature nucleotides emphasized this genetic distinction of Chinese and European clades. Second-level clustering of OTUs and reference sequences in two selected taxa (the oligotrichid Spirotontonia and the tintinnid Tintinnidium) revealed the presence of several potentially new species or ones lacking genetic reference data. Geographic patterns were also discovered by network analyses within 700 widespread and abundant OTUs; in 77 of these OTUs, European and Chinese sequences formed significantly assortative groups. These assortative groupings indicated a higher genetic similarity among sequences from the same region than between sequences from different regions. Our results demonstrate that detailed analyses of metabarcoding data down to single nucleotide differences expand our perception of geographical distribution patterns and provide insights into historic and ongoing effective dispersal in protists. The congruent discovery of geographic patterns at different levels of resolution (between and within OTUs) suggests that cosmopolitan distribution in marine planktonic ciliates is less common than previously postulated.
Keywords: distribution, metabarcoding, network analyses, phylogenetic analyses, protists, second-level clustering, Swarm OTU INTRODUCTION Marine planktonic ciliate communities contribute fundamentally to marine food webs by linking different trophic levels (Sherr and Sherr, 1988). This role is especially important in coastal environments that generally harbor a rich biodiversity and additionally host the breeding grounds of many multicellular organisms whose juvenile stages feed on ciliates (Stoecker and Capuzzo, 1990). Despite the importance of ciliates, our understanding of their distribution and speciation patterns is incomplete. Both kinds of patterns are strongly affected by past and ongoing effective dispersal through transport in active or passive state, which includes a subsequent successful establishment of individual organisms (Weisse, 2008). Likewise, dispersal limitations, species sorting caused by the absence of adequate ecological niches, and adaptations to local environmental conditions define a species' geographic distribution.
Communities of marine planktonic ciliates are usually dominated by the oligotrichids and the choreotrichids comprising aloricate (naked) species and the house-forming tintinnids (Pierce and Turner, 1992;McManus and Santoferrara, 2013). Both are assigned to the Oligotrichea Bütschli, 1887 according to morphologic, molecular, and ontogenetic features (Adl et al., 2019). The prevalence of data on the diversity and distribution of the about 1,000 tintinnid morphospecies in the literature (e.g., Merkle, 1909;Laackmann, 1910;Hofker, 1931;Hada, 1938;Zeitzschel, 1966Zeitzschel, , 1969 is in sharp contrast to the comparatively few taxonomic and ecological papers on the about 140 oligotrichids and 60 aloricate choreotrichids. Moreover, geographic distribution patterns inferred from records of morphospecies are often influenced by investigation methods (e.g., sampling strategy, fixation blurring speciesspecific features) as well as taxonomic uncertainties caused by phenotypic plasticity and cryptic species (Agatha, 2011). In the latter cases, morphological divergences or convergences are incompletely linked to genetic relationships.
To circumvent the problems regarding species limitations in tintinnids, studies on their global distribution are largely restricted to genus level. Different geographic patterns emerged from the voluminous dataset: cosmopolitan, neritic, warmtemperate, boreal, austral, and tropical Pacific; yet, even in cosmopolitan genera (inhabiting the neritic and oceanic regions from the Arctic through the tropics to the Antarctic), none of its members covers the entire geographic range (Pierce and Turner, 1993). Furthermore, the composition of tintinnid communities also distinctly differs between neritic and oceanic regions (Pierce and Turner, 1993;Dolan and Pierce, 2013). In oligotrichids and aloricate choreotrichids, the records of morphospecies are widely restricted to coastal waters. Keeping the taxonomic impediments in mind, the data demonstrate a wide, possibly cosmopolitan distribution in most genera, while about one third of them seem to be geographically restricted and potentially contain few species with apparently limited distribution (Agatha, 2011).
The stepwise introduction of sequencing technologies into marine planktonic ciliate research complemented morphospecies analyses and allowed for investigating previously untouched aspects of genetic diversity. The first molecular study from coastal sites in the Northwest Atlantic on oligotrichids and choreotrichids used the V3 and V8 regions of the 18S rDNA and applied Sanger sequencing (Doherty et al., 2007). The molecular and morphological data were congruent regarding tintinnids, but revealed a much higher genetic diversity in oligotrichids, especially within the genus Strombidium Claparède & Lachmann, 1859 (Doherty et al., 2007(Doherty et al., , 2010. The comparison of molecular and morphological data on tintinnids by a subsequent study not only confirmed differences in the composition between coastal and adjacent open ocean communities, but also showed the advantages of sequencing studies for detecting less abundant species (Santoferrara et al., 2016b). Global records of 18S rDNA gene sequences also supported biogeographical patterns for tintinnid genera (Santoferrara et al., 2018).
Barcoding approaches, which combine morphological and molecular data, alleviated the limits of taxonomic resolution, and unveiled both phenotypic plasticity and (pseudo-) crypticity in oligotrichean ciliates Santoferrara et al., 2015Santoferrara et al., , 2017Kim et al., 2020). The marker genes employed (e.g., 18S, ITS, and 28S rDNA gene sequences) generally enable to distinguish ciliates down to species rank (Warren et al., 2017). Ideally, a single cell is identified and documented, its marker genes are analyzed, and the sequences are deposited together with metadata in a reference database. Thereby, marker gene sequences are unambiguously linked with corresponding morphospecies (Santoferrara et al., 2016a). Reference databases that cover most of the diversity represent a valuable resource for "meta"-barcoding studies that depend on reliable taxonomic annotations of gene sequences obtained from environmental samples. Yet, species identification using reference sequences is hampered by overlapping interspecific and intraspecific sequence distances, different species divergence times, and arbitrary, a piori fixed divergence thresholds (Moritz and Cicero, 2004;Will and Rubinoff, 2004;DeSalle et al., 2005).
Identical marker gene sequences detected in independent samples revealed the occurrence of several marine planktonic ciliate species in different oceanic regions. For example, identical internal transcribed spacer sequences and 18S rDNA gene sequences of some oligotrichids were found to be distributed across the North Atlantic and adjacent sea regions or in the West Atlantic (Agatha et al., 2004;Katz et al., 2005;Doherty et al., 2010). Specimens of the tintinnid Antetintinnidium mucicola collected from the Northwest Atlantic are identical in their 18S rDNA sequences to one reference sequence from the Yellow Sea (Ganser and Agatha, 2019). However, even specimens with identical sequences in the rather conserved 18S rDNA might display genetic divergences in other, more variable marker genes (e.g., ITS, 28S, and COI), indicating distinct species (Xu et al., 2012;Santoferrara et al., 2015Santoferrara et al., , 2017Jung et al., 2018).
While the barcoding of ciliate species continues, metabarcoding studies, which employ high-throughput sequencing (HTS) technologies for rapidly investigating whole ciliate communities, are independent of taxonomic expertise. As outlined above, they make, however, use of the already existing taxonomic results from barcoding studies in publicly available databases. Despite their exclusive focus on molecular data, metabarcoding studies have considerably contributed to our current perception of geographic distribution patterns in marine planktonic ciliates, as well. The patterns were mainly analyzed via large-scale metabarcoding datasets comprising hyper-variable sequences of either the V4 or the V9 18S rDNA gene regions. Regarding ciliates, the genetic distances of the hyper-variable V4 regions are generally comparable to distances between near full-length 18S rDNA sequences, this applies, however, only to a lesser extent to the hyper-variable V9 region (Dunthorn et al., 2012). The findings of large-scale metabarcoding studies regarding planktonic ciliates in marine habitats suggest that European coastal waters exhibit a high regional genetic diversity and a strong habitat specificity based on V4 sequence data (Forster et al., 2015). By contrast, the global and regionally restricted genetic diversity of planktonic ciliate communities in the open oceans based on V9 sequence data was found to be low (Gimmler et al., 2016).
Moving from whole ciliate communities down to the level of individuals, metabarcoding also contributed to assess the intra-and interspecific genetic diversity (Forster et al., 2019). Whatever the targeted level of investigation, diversity assessments as well as the detection of geographic patterns and their underlying processes strongly depend on the molecular resolution (Hanson et al., 2012). This resolution, in turn, depends on the selected marker gene (Dunthorn et al., 2012) as well as on the approach with which metabarcoding sequences are clustered into operational taxonomic units (OTUs) (Forster et al., 2019). Traditionally, heuristic sequence clustering approaches that rely on global alignment scores (e.g., USEARCH), such as a given percentage of sequence similarity, were used to generate OTUs (Edgar, 2010). While these methods reduce the computational demand for processing a dataset (sequences are only compared with a centroid sequence), several studies have shown that OTUs generated by heuristic approaches lack accuracy and reproducibility Bachy et al., 2013;Schmidt et al., 2015). Moreover, heuristically clustered OTUs are designed to assess community structure on the OTU level (e.g., αand β-diversity), but provide little information about their internal structure and are therefore comparatively coarse in terms of genetic resolution. More recently developed hierarchical sequence clustering approaches that employ allvs.-all pairwise sequence comparisons allow for a much more fine-grained resolution of diversity. In particular, the algorithm Swarm provides detailed information about the internal structure of its OTUs . Swarm OTUs also grow from a centroid 'seed' sequence, but the sequences are linked to each other if their sequence alignment differs by not more than a given number of nucleotides (usually one nucleotide difference, which is likely to arise through biological processes). The internal structure of a Swarm OTU thus represents a network based on the information which sequences are linked to each other, which may be evaluated by means of graph theory (Forster et al., 2020). Evaluating the properties of the internal network structure distinctively increases the resolution of diversity down to single nucleotide differences and, therefore allows unprecedented insights into genetic diversity and geographical distribution patterns. Investigations of intraspecific genetic diversity are especially important within ciliates, which exhibit high copy numbers and sequence variation even within individual specimens (Gong et al., 2013;Wang et al., 2017;Zhao et al., 2019). An additional second-level clustering of Swarm OTUs enables an efficient differentiation of genetically divergent sequences into species-specific network sequence clusters. This strategy thus allows for a closer approximation between morphospecies and OTUs, i.e., a better estimation of the real species diversity from metabarcoding datasets (Forster et al., 2019). Owing to differences in evolutionary rates, however, a refinement of the similarity cut-off values for particular lineages is required Zhao et al., 2019).
Current high-throughput sequencing technologies facilitate the rapid assessment of marine planktonic ciliate communities and thus the investigation of distribution patterns on various temporal and spatial scales. The two most common sampling strategies result in (i) a low spatial and high temporal resolution, i.e., the samples are collected several times at locally restricted sites, or (ii) a high spatial and low temporal resolution, i.e., the samples are collected usually only once at sites widely distributed over a coastal or oceanic region. The selection of a strategy depends on the aims of the study. Type 1 studies usually focused on the influence of environmental gradients on the community composition (Santoferrara et al., 2016b;Minicante et al., 2019), the distribution and composition of communities at varying depths (Grattepanche et al., 2015;Sun et al., 2019), or seasonal patterns (Piredda et al., 2017;Giner et al., 2019). However, type 2 studies investigated large-scale diversity and distribution patterns, for example, at a regional scale in European coastal waters (Logares et al., 2014) or at a global scale in the oceans Logares et al., 2020). Usually, large-scale studies of the marine plankton either analyzed the entire community of eukaryotes or focused on certain sizefractions of protists.
Coastal waters are highly interesting environments for investigating geographic distribution patterns and their evolutionary origin in marine planktonic ciliates, particularly as data covering large spatial scales are lacking. Within these environments, assessments of ciliate diversity and distribution patterns were so far confined to whole communities, genera, morphospecies, or OTUs (Foissner et al., 2008;Doherty et al., 2010;Dolan and Pierce, 2013;Massana et al., 2015;Piredda et al., 2017). In the present study, we apply a novel combination of phylogenetic approaches (Hütter et al., 2020) and network analyses (Forster et al., 2019(Forster et al., , 2020 to achieve an unparalleled increase of resolution down to within-OTU level for investigating geographic distribution patterns and their origin in marine planktonic ciliates. Hence, it overcomes the limitations of previous investigations by employing an increased genetic resolution and a unique sampling strategy that links a high spatiotemporal resolution (regional sampling at several sites, at different daytimes, and in different seasons) with a large geographic distance between the regions examined, namely, the Chinese and European coastal waters. The present analyses aim (i) to detect correlations between phylogenetic relationships and the geographic distribution based on metabarcoding data of the hyper-variable V4 region, (ii) to identify signature nucleotides for geographically restricted OTU clusters (Hütter et al., 2020), (iii) to determine the threshold values that reconcile secondary clusters of Swarm OTUs and morphospecies and allow estimating the diversity of taxa, and (iv) to discover geographic patterns of genetic diversity within Swarm OTUs by means of network analyses.

Sampling and Filtering Procedure
Following identical protocols, 80 samples were taken from Chinese and European coastal surface waters between October 2017 and November 2018, namely, one day and one night sample per season at each site (Supplementary Table 1); the high spatiotemporal sampling aimed to capture as much of the biodiversity of planktonic ciliates as possible. Our sampling sites belong to different Longhurst ecological provinces: the Northeast Atlantic Shelves, the Mediterranean Sea, and the China Seas Coastal provinces (Longhurst, 2007). The Chinese sampling (n = 32) was performed at four sites along the coast of southern China, i.e., in Aotou (coastal town), Donghai Island, Futian (mangrove wetland), and Hailing Island. The European sampling (n = 48) was conducted at the German North Sea coast, i.e., at two sites in Wilhelmshaven, and at two sites in the Baltic Sea, i.e., Warnemünde and Rostock, as well as at two sites in the Mediterranean Sea, i.e., Villefranche-sur-Mer (France) and Trieste (Italy). The spatial distances spanning the shortest sea routes along the coastlines amount to about 150-500 km between the Chinese sites, about 400-6,400 km between the European sites, and about 14,200-18,800 km between the Chinese and European sampling sites.
Surface water samples were collected with a bucket (5 L), and temperature and salinity were immediately measured. The collected water was pre-filtered through a 250 µm-mesh net to remove debris and metazoa and subsequently filtered through a 0.8 µm-pore size polyether sulfone membrane 47 mm across (Pall) in a filtration unit with a manual vacuum pump (Nalgene) until the membrane was clogged or a maximum of about 3.6 L was reached. The membrane filters were immediately placed in Cryo-Vials (Simport) and preserved with LifeGuard R buffer (Qiagen) at 0 • C. In the laboratory, they were stored at −80 • C until DNA extraction. Using a whole water sample, the entire ciliate community ranging from nano-to microplankton could be collected; plankton nets, which provide usually merely qualitative samples, were exclusively employed in sampling for microscopic inspection.

DNA Extraction, Amplification, and Sequencing
The total DNA from each membrane filter was extracted with the DNeasy R PowerSoil R Kit (Qiagen) with slight modifications of the manufacturer's protocol (Supplementary Data 1). The extracted DNA of each sample was amplified and sequenced by Biomarker (Beijing, China). Amplification comprised a nested-PCR approach (Supplementary Data 1), using first ciliatespecific primers followed by eukaryotic primers targeting the hyper-variable V4 18S rDNA gene region (Lara et al., 2007;Stoeck et al., 2010). In a single run, paired-end sequencing (2 × 250 bp) of the library with the V4 sequences of all samples was conducted on an Illumina HiSeq 2500 platform. Sequence read data will be made accessible at the NCBI BioProject ID PRJNA714531 (https://www.ncbi.nlm.nih.gov/bioproject/).

Bioinformatic Quality Control and Sequence Clustering
The primer removal from each demultiplexed forward and reverse library was conducted in Cutadapt version 2.8 (Martin, 2011). Primers were clipped from both ends of each read, and only reads without mismatches in the primer regions were retained. Forward and reverse reads were then paired-end assembled by VSEARCH version 2.14.2 (Rognes et al., 2016) with high success rates (≥96.5% merged paired-end reads per sample; mean paired-end fragment lengths ≥344 nucleotides). Pairedend reads with ambiguous bases were discarded and identical reads were dereplicated to amplicons across all samples according to the recommendations for preparing sequence data for OTU clustering 1 . Our terminology follows Mahé et al. (2015) in that we refer to a single sequence obtained from raw HTS data as a 'read, ' while an 'amplicon' comprises a set of identical reads obtained via dereplication. The term 'sequence' refers to the sequence of nucleotides of which a read or an amplicon consists.
Amplicons were used as input for OTU clustering by Swarm v. 3.0.0 . Swarm is an efficient clustering algorithm for linking only those amplicons into one OTU which differ by a single nucleotide from at least one other amplicon in that OTU. Thereby, the OTUs grow iteratively from a central 'seed' amplicon. In each iteration, the algorithm searches for all amplicons that can be linked to the most recently added amplicon(s). If no more such amplicons can be found, the clustering process stops, the current OTU is closed, and a new OTU is initiated from another central amplicon. In our particular clustering approach, we enabled the options -d 1, -f (fastidious option), -i (output of a structure file), -j (output of all pairwise sequence connections into a file), and -s (output of a clustering statistics file). Since our goal was to achieve resolutions down to single nucleotide differences, we set the clustering threshold in Swarm accordingly (d = 1). Although not tested here, our methodological workflow allows for adjusting this threshold to the taxonomic group and marker gene under study. For ciliate metabarcoding datasets of the ribosomal V9 and V4 gene regions, a clustering threshold of one nucleotide difference has successfully been applied in previous studies (e.g., Forster et al., 2019). For other taxonomic groups and faster evolving marker genes, such as COI, though, higher clustering thresholds might be necessary for assessing the underlying genetic diversity within and between OTUs (see e.g., Bakker et al., 2019;Siegenthaler et al., 2019;Antich et al., 2020). Swarm uses all sequences in the input dataset and yields a very fine-scaled resolution of genetic diversity in a sample. The OTU networks generated by Swarm are used for downstream evaluations by means of graph theory.
Chimera detection was conducted after OTU clustering, using VSEARCH's uchime_denovo option, an implementation of the UCHIME algorithm (Edgar et al., 2011). After removing chimeras and low-abundant OTUs (i.e., comprising less than three reads), all remaining OTUs were taxonomically annotated, using the PR 2 reference database (Guillou et al., 2013) and the last common ancestor approach in VSEARCH, applying the sintax option with default parameters, which is an implementation of the SINTAX algorithm (Edgar, 2016). Those OTUs not assigned to the Ciliophora were removed from the dataset; the final ciliatespecific dataset contained 47,093 Swarm OTUs.

Phylogenetic Analyses
To detect geographic distribution patterns in marine planktonic ciliates, phylogenetic analyses were conducted. Those OTUs with an identical "taxon" annotation were grouped. "Taxon" labels derived from the PR 2 reference database may represent environmental sequences, sequences of species identified to genus level, or reference sequences from known morphospecies (Guillou et al., 2013). All OTUs grouping to a "taxon" label might thus encompass several different but closely related ciliate taxa. Those groups assigned to the Oligotrichea with more than 15 OTUs in one of the two regions and a ratio of OTU numbers from China and Europe higher than 0.15 were chosen to assemble the final dataset. Next, each group was augmented by shared OTUs (i.e., OTUs consisting of amplicons/reads from both geographic regions) that matched the respective "taxon" labels. The final dataset (Supplementary Table 2, Supplementary Data 2) included 21 "taxon" groups containing in total 2,404 Chinese OTUs (51,369 reads), 2,679 European OTUs (71,085 reads), and 1,124 shared OTUs (1,076,655 reads).
The amplicon sequence of each Swarm OTU with the highest read number (representative Swarm OTU sequence) was chosen for the subsequent alignments, which were calculated for each "taxon" separately with MAFFT v. 7 (Katoh and Standley, 2013), using the globalpair command. The phylogenetic tree for each "taxon" was computed with FastTree v. 2.1.11 (Price et al., 2010) under the GTR + CAT model (Tavaré, 1986;Stamatakis, 2006). The trees were edited in FigTree v. 1.4.4 2 . They have a polar tree layout, midpoint rooting, and increasing order of nodes. For displaying the different distribution patterns, the trees represent cladograms (Figure 4), whereas the detailed taxon trees show the original branch lengths (Supplementary Figure 1). The branches were colored according to the geographic affiliation of the respective OTU, i.e., those from China are marked red, those from Europe are blue, and the shared ones are purple.
Nucleotides characterizing certain tree branches geographically restricted to a large extent were identified with DeSignate (Hütter et al., 2020), using the web-interface 3 . In DeSignate, two sets of aligned sequences chosen by the user (=query and reference groups) are compared and each alignment 2 https://github.com/rambaut/figtree/ 3 https://designate.dbresearch.uni-salzburg.at/home/ position is ranked and categorized according to its diagnostic value. Alignment positions with a diagnostic value of 1 are categorized as binary or asymmetric if they are uniform within the query group. The nucleotides at these positions are termed "signature nucleotides" in the context of our study. At binary positions, a nucleotide base (e.g., A) is present in all sequences of the query group and another nucleotide base (e.g., G) is present in all sequences of the reference group. At asymmetric positions, the nucleotide bases in sequences of the reference group do not need to be uniform, but different from the nucleotide base in the query group (e.g., G, C, or T, but not A). These signature nucleotides thus indicate that part of the amplicon diversity of a ciliate taxon can be affiliated to a certain geographic region, namely, China or Europe. Suitable query and reference groups to test for the presence of signature nucleotides in aligned sequences were identified by eye, evaluating branching patterns and branch support values in each of the phylogenetic trees.

Sequence Similarity Networks
For two taxa in which reference sequences can be reliably linked to morphospecies, namely, the oligotrichid Spirotontonia and the tintinnid Tintinnidium, sequence similarity networks were calculated. The representative sequences of the phylogenetically analyzed Swarm OTUs affiliated to these taxa along with their closest related reference sequences of the PR 2 database and sequences of known morphospecies (updated list of curated reference sequences from Santoferrara et al., 2017) were subjected to a second-level clustering by means of sequence similarity networks (SSNs). The pairwise sequence alignments by VSEARCH yielded similarity values for each "taxon" used for constructing the sequence similarity networks at different thresholds (Forster et al., 2019). The thresholds at which the morphologically distinct species separated into network sequence clusters (NSCs) were determined for elucidating the numbers of connected components that do not include reference sequences; they represent either new species or ones lacking genetic data.

Network Analyses on Geographic Patterns Within Swarm OTUs
The Swarm OTUs were subdivided into two categories: OTUs comprising sequence reads from only Europe or China and shared OTUs comprising sequence reads from both regions. A further subdivision of the European OTUs regarding their affiliation to the two different Longhurst ecological provinces and network analyses considering geographical distances were not performed. A case study on a test dataset of Scandinavian lakes detected no significant patterns on a regional scale (Forster et al., 2020). Therefore, we expanded the spatial scale toward two more distant geographical regions in the current study. For network analyses, only shared OTUs which comprised at least 0.0001% of all reads and included at least three amplicons from both geographic regions (China and Europe) were chosen. These thresholds were set to select those OTUs in which network analyses on the internal structure were feasible, as OTUs poor in reads and amplicons are prone to produce statistically insignificant network patterns. The resulting subset consisted of 700 OTUs.
Assortativity is a commonly used metric in graph theory (Newman, 2003). Within a particular OTU, it allows measuring preferential connections between amplicons originating from China or Europe, i.e., the assortativity is high (maximum 1) when amplicons from one region are genetically more similar among each other than to sequences from the other region. The -i and -j outputs of Swarm served as input for constructing a network which was then subjected to assortativity calculations.
For determining the statistical significance of the assortativity values each OTU network was randomized 1,000 times by shuffling the geographic attributes of the nodes (China or Europe), while the general network structure (i.e., same number of nodes and edges, same node degree) was retained. The assortativity values for the randomized networks were calculated. When the computed assortativity values of the original OTU network exceeded the 95% confidence interval of the 1,000 randomized values, amplicons from the same region were significantly more often connected to each other than expected by chance. The workflow and the bioinformatic commands used have previously been published (Forster et al., 2020) and were run in R Studio v. 3.5.1 (R Core Team, 2018), using the igraph package v. 1.2.2 (Csárdi and Nepusz, 2005).

Geographic Distribution of Swarm OTUs
The following results explicitly focus on the geographic patterns in OTUs from Chinese and European coastal waters representing two spatially distant regions. A detailed comparison of ciliate communities within regions and their variability through seasons and daytime will be the topic of a subsequent publication benefitting from the high spatiotemporal resolution of our sampling strategy. The ciliate-specific dataset shows that only 13% of OTUs consist of reads from both regions (shared OTUs), whereas 43% and 44% of OTUs exclusively consist of reads from China or Europe, respectively (Figure 1). On the contrary, most reads (84%) belong to shared OTUs, while the remaining reads are almost evenly divided between Chinese OTUs (7%) and European OTUs (9%). Out of the 6,227 shared OTUs, 663 OTUs (11%) consisted to at least 99% of Chinese reads and 744 OTUs (12%) consisted to at least 99% of European reads. Most OTUs (55%) were annotated to taxa assigned to the Oligotrichida, aloricate Choreotrichida, or Tintinnina, while the CONthreeP (e.g., peritrichs) were less often observed. Interestingly, 43% of all purely Chinese and European OTUs (8.4% of total reads) were annotated by the last common ancestor approach to the same taxon names, suggesting that there are considerable regional genetic deviations.
On closer inspection of the shared Swarm OTUs, three main patterns of geographic affiliation are distinguished for each enclosed amplicon (Figure 2): (i) the amplicon comprises reads from only one region, (ii) the vast majority of reads within the amplicon originate from one region, while only one or relatively few reads (<1%) originate from the respective other region and are interpreted as methodological artifacts (e.g., in the oligotrichid Spirotontonia sp.), and (iii) the amplicon contains a considerable number of reads from both regions, such that its occurrence in both regions is unlikely to be a methodological artifact (e.g., in the tintinnid Tintinnopsis baltica; Figure 3). The distinction of the different amplicon types was of utmost importance for the representative amplicons used in the phylogenetic analyses.

Phylogenetic Trees and Second-Level Clustering
The phylogenetic trees of the 21 oligotrichean taxa display a spectrum of topologies ranging from distinct geographic to heterogeneous distribution patterns generated by the representative amplicon sequences of the Swarm OTUs (Figure 4). Distinct patterns thereby suggest that the amount of geographically restricted genetic diversity is the result of past effective dispersal events in specific taxa as indicated by long and highly supported branches (Supplementary Figure 1). On the other hand, multiple effective dispersal events are repeatedly ongoing in several other taxa as indicated by heterogeneous distribution patterns and short branch lengths.
Most of the phylogenetic trees show various degrees of heterogeneous branching patterns with no clear split between Chinese and European OTUs. Several different branching patterns are exemplified in the tree of the tintinnid T. baltica ( Figure 3A). Please, note that the last common ancestor approach assigned the V4 sequences only to the reference sequence of T. baltica although the reference sequences of Tintinnopsis acuminata and Tintinnopsis nana are identical in their V4 regions. Hence, the tree shown might actually comprise several species. The various subclades in the tree mainly consist of closely related Chinese, European, and shared OTUs, which display an alternating and irregular branching ( Figure 3B). The representative sequences of the shared OTUs were assigned to one of the three types described above (Figure 2), depending on the percentages of reads from the particular regions in its representative amplicon ( Figure 3B).
Distinct geographic distribution patterns are apparent in trees of four taxa (Figure 4), in which Chinese and European OTUs are entirely (the oligotrichids Spirotontonia sp. and Strombidium K sp.) or largely separated (the oligotrichid Strombidiidae K X sp. and the tintinnid Tintinnidium sp.). The representative sequences of shared OTUs are evenly distributed in the phylogenies of the four taxa and either fall into clades of European or Chinese OTUs. They belong to Type 2 of shared OTUs, and their representative amplicons can be assigned to the geographic region of their respective clade (Figure 2).
The bifurcation forming the conspicuous split between Chinese and European OTUs in the trees of the four abovementioned taxa (Supplementary Figure 1) exhibit high support values (Shimodaira-Hasegawa test ≥ 0.9; Shimodaira, 2002) and are further corroborated by signature nucleotides (see section "Materials and Methods"). In general, the number of signature nucleotides greatly varies (Supplementary Figure 3). FIGURE 2 | Types of shared Swarm OTUs. The circle sizes in the within-OTU networks correspond to the read numbers of the respective amplicons. The representative amplicons (with highest read number) of the shared OTUs might comprise reads from China and Europe in different proportions. In Type 1 of shared OTUs, the representative amplicon consists of reads only detected in China or in Europe. In Type 2 of shared OTUs, only a minute proportion of reads was detected in the respective other region and is attributed to methodological artifacts (sample bleeding; Mitra et al., 2015). Accordingly, Type 2 of shared OTUs is interpreted as purely consisting of reads from the dominating region. In Type 3 of shared OTUs, similar read numbers from China and Europe constitute the representative amplicon.
Frontiers in Marine Science | www.frontiersin.org FIGURE 3 | Phylogenetic tree of the tintinnid Tintinnopsis baltica (A) and highlighted sub-clade (B) displaying various branching patterns with different types of shared Swarm OTUs. In Type 1, the representative amplicon of the shared OTUs was exclusively detected in one region but is associated to one or more amplicons from the respective other region. In Type 2, the representative amplicon consists of many reads from one region and only one or very few reads from the respective other region, which are interpreted as methodological artifacts (cp. Figure 2). In Type 3, the representative amplicon comprises a considerable number of reads from both regions.
Further geographic patterns emerged, e.g., in Strombidium sp.; yet, they are not supported by signature nucleotides. In the individual alignments, all signature nucleotides are restricted to an otherwise conserved section of the V4 (positions 59-152; Supplementary Figure 2), which comprises the highest read quality scores for each nucleotide position. In the same V4 section, signature nucleotides also occurred in other taxa (e.g., the oligotrichids Strombidium sp., Strombidiida XX sp., and the aloricate choreotrichid Pelagostrobilidium sp.; data not shown). However, they are not signature nucleotides characteristic for either geographic region as the separation of the OTUs is ambiguous; they represent phylogenetic signatures. These findings demonstrate the significance of this particular section of the V4 for ascribing amplicons to a certain geographic region.
In the two trees with nearly complete geographic separation (the oligotrichids Spirotontonia sp. and Strombidium K sp.), the majority of OTUs originate from China. The genealogy of Spirotontonia sp. including reference sequences (Supplementary  Figures 1A,B) shows that a single shared OTU in the European clade (OTU 1377; 472 reads) is identical to one unidentified environmental sequence collected from the East Pacific Rise (KF129779) and nearly identical to another unidentified environmental sequence collected in the South China Sea (KJ760553). On the other hand, the reference sequences of Spirotontonia turbinata (FJ422994) and Spirotontonia taiwanica (FJ715634) described from coastal waters of China and northeastern Taiwan, respectively, group closely together with a few basally branching Chinese OTUs, while the remaining Chinese OTUs are more distantly related to any reference sequence and display an intense radiation. Each reference sequence shares the signature nucleotides of the respective European or Chinese OTU clades they are assigned to. Interestingly, the reference sequence of Spirotontonia grandis from the South China Sea (KU525755) branching between both clades is not closely related to any particular OTU. Further, it shares 13 binary signature nucleotides with the European OTUs and two binary signature nucleotides with the Chinese OTUs.
The PR 2 reference sequences of the oligotrichid "taxa" Strombidium K sp. and Strombidiidae K X sp. are unidentified environmental sequences, but are related with sequences of two species sampled from Chinese coastal waters and deposited as Strombidium triquetrum (KJ609052; Gao et al., 2016) and Strombidium capitatum (KP260510; Song et al., 2015) in GenBank. The sequences of the four "taxa" are aggregated into the "Strombidiidae K cluster" in the PR 2 database. In the phylogenetic tree of the Strombidium K sp. OTUs and reference sequences (Supplementary Figures 1C,D), the S. capitatum sequence is related with the clade of European OTUs, while the sequence of S. triquetrum groups together with two PR 2 sequences in the cluster of Chinese OTUs. Both clades feature a huge genetic diversity as indicated by the numerous branches and partially considerable branch lengths. Signature nucleotides for the Chinese or European clades are shared by the respective reference sequences, except for the sequence of S. capitatum, which shares one binary signature with each of the two clades and two asymmetric signatures with the European clade.
In the phylogenetic tree of Strombidiidae K X sp. (Supplementary Figures 1E,F), two reference sequences FIGURE 4 | Phylogenetic trees based on the representative sequences of the Swarm OTUs. Only trees that display a distinct geographic pattern supported by signature nucleotides are marked by "taxon" names highlighted in black and dotted circles.
from the eastern North Pacific (AF372790 and AF372789) and one sequence from the Mediterranean Sea (HQ394045) fall within the clade of Chinese OTUs, while one reference sequence from the eastern North Pacific (KJ762996) as well as one sequence from the North Atlantic (KC488382) are highly similar to a shared OTU (OTU 409) in the clade of mainly European OTUs. The latter clade comprises most sequences (including a single Chinese OTU) and is clearly separated from the clade of mostly Chinese OTUs by a long bifurcation. Additionally, both clades are characterized by 14 binary signatures.
The phylogenetic tree of Tintinnidium sp. (Supplementary  Figures 1G,H) was complemented by reference sequences of known morphospecies belonging to the family Tintinnidiidae plus relevant sequences from the PR 2 database and an outgroup sequence belonging to the choreotrichid family Strobilidiidae (Rimostrombidium lacustre). The majority of OTUs form a predominantly Chinese and an exclusively European clade. Shared OTUs that group within these two clades contain 99% of reads from the respective geographic clade affiliation (Type 2 cp. Figure 2). Several OTUs branch between the outgroup sequence and the first reference sequences. The representative sequences of these OTUs are highly different from sequences of the Chinese and European clades (e.g., partly containing gap regions), indicating that they do not belong to the Tintinnidiidae at all and were thus excluded from the subsequent analysis of signature nucleotides. The clade of Chinese OTUs also contains eight European OTU sequences. Both the Chinese and European clades are supported by seven binary signatures. Interestingly, all reference sequences of known morphospecies (Antetintinnidium mucicola, Tintinnidium fluviatile, Tintinnidium pusillum, and Tintinnidium balechi) exclusively fall into the Chinese clade. The latter therefore comprises most of the currently known diversity of the Tintinnidiidae. Four unidentified environmental sequences, namely, from the Baltic Sea (FN690031), from two freshwater lakes in France and Germany (EU162620 and HM135052), and from the Columbia River estuary at the north-eastern Pacific coast (KJ925310), fall into a rather basally branching group of the European clade, while the remaining European and related shared OTUs display a broad diversification into numerous subclades.
Those phylogenetically analyzed representative amplicons of Swarm OTUs that displayed clear geographical distribution patterns were subjected to a second-level clustering via sequence similarity networks (SSNs) along with reference sequences of the PR 2 database and closely related sequences of known morphospecies (Figures 5, 6). The cut-off divergence at which the morphospecies separated into different network sequence clusters (NSCs) was determined for elucidating the number of NSCs without reference sequences; these may represent new species or ones lacking genetic data and await further validation, for example, by targeted recovery approaches (Gimmler and Stoeck, 2015). In both the oligotrichid Spirotontonia and the tintinnid Tintinnidium, a threshold of 98.5% sequence similarity was determined for separating the reference sequences of known morphospecies in the SSNs. The Spirotontonia network (Figure 5) contained five NSCs, which comprised 369 Swarm OTUs and 39,113 reads, indicating potential species. The largest of these NSCs was associated with S. turbinata; the smallest was associated with S. grandis. Both of these NSCs as well as two NSCs without any reference sequence predominantly included sequence data from Chinese samples. Only one NSC of a potential species included sequences predominantly from European samples as well as two unidentified environmental sequences.
The Tintinnidium network (Figure 6) contained at least six NSCs that indicated potential species. Four of these NSCs included unidentified congeneric environmental sequences, while two NSCs were not affiliated with any reference sequence. While the largest NSC and one without reference sequences predominantly included sequences from Europe, three smaller NSCs with reference sequences and the other NSC without affiliated reference sequences contained mostly sequences from China. One of the latter NSCs persistently contained reference sequences of the marine T. balechi and the freshwater T. fluviatile above thresholds of 98% because these two reference sequences share 99.5% V4 18S rDNA sequence similarity to another. The largest NSC of the Tintinnidium network consisted of two nearly separated groups of nodes (separation at 98.7% sequence similarity): one group with affiliations to reference sequences (Tintinnidium sp.), the other without affiliations to reference sequences. This pattern might indicate further potential species in the Tintinnidium network in addition to the six potential ones mentioned above. FIGURE 5 | Second-level clustering of the representative amplicons of Swarm OTUs detected in the present study and reference sequences from the PR 2 database assigned to Spirotontonia oligotrichids. At a threshold of 98.5%, the OTUs clustered into five network sequence clusters (NSC), two of which did not contain a reference sequence. Laboea strobila, a supposedly secondarily tail-less tontoniid with a somatic ciliary pattern identical to that in Spirotontonia species, remains separate. The circle sizes correspond to the read numbers of the representative amplicon sequences of the Swarm OTUs. NSCs which indicate potential species are framed by a dashed line. FIGURE 6 | Second-level clustering of the representative amplicons of Swarm OTUs detected in the present study and reference sequences from the PR 2 database assigned to the Tintinnidiidae tintinnids. At a threshold of 98.5%, the OTUs grouped into six larger network sequence clusters (NSCs), two of which did not contain any reference sequences and another two NSCs which contained unidentified congeneric reference sequences (Tintinnidium sp.). The circle sizes correspond to the read numbers of the representative amplicon sequences of the Swarm OTUs. NSCs which indicate potential species are framed by a dashed line.

Within Swarm OTU Assortativity Analyses
For 700 widespread (occurring in Chinese and European coastal waters) and abundant (comprising ≥ 0.0001% of the reads in the dataset) Swarm OTUs, assortativity analyses on the internal network structures were conducted. In 149 of these OTUs, assortativity values that were significantly higher than expected by chance for amplicons detected in Chinese samples were observed. Furthermore, 166 OTUs with assortativity values that were significantly higher than expected by chance for amplicons detected in European samples were found.
In another 211 of the 700 OTUs analyzed, the assortativity values were significantly higher than expected by chance for each geographic region simultaneously. Only patterns within these 211 OTUs are considered in the following and their results are displayed in Figure 7. Besides these significantly higher assortativity values, a disassortative grouping (assortativity ≤ 0) of amplicons from either region was discovered in 12 out of the 211 OTUs, most of which were taxonomically affiliated to the CONthreeP clade. In these OTUs, the slight tendency of amplicons to be connected to amplicons from the respective other region indicates that the effective dispersal between China and Europe does not seem to be strongly limited (Figure 7; lower right quadrant).
There were 49 out of the 211 OTUs in which amplicons detected in China displayed a significant assortative grouping (assortativity > 0), but for which amplicons detected in Europe displayed a disassortative grouping (assortativity ≤ 0). For these OTUs (for an example see Supplementary Figure 4A), the assortativity of the amplicons in the internal OTU structure indicates a main area of distribution regarding genetic diversity in Chinese waters (Figure 7; upper right quadrant). Of those 49 OTUs, 23 were taxonomically affiliated to the Oligotrichida, eight to each the aloricate Choreotrichida and members of the CONthreeP, and four to the Tintinnina; taxonomic groups with less than three affiliated OTUs are not mentioned.
By contrast, there were 73 out of the 211 OTUs in which amplicons detected in Europe displayed a significant assortative grouping, but for which amplicons detected in China displayed a disassortative grouping. For these OTUs (for an example see Supplementary Figure 4B), the assortativity results indicate a main area of distribution regarding genetic diversity in European waters (Figure 7; lower left quadrant). The Oligotrichida were again the most abundant group among them, contributing 26 OTUs. Other abundant groups were the CONthreeP (23 OTUs), aloricate Choreotrichida (six OTUs), and Tintinnina (four OTUs).
Significant assortative groupings of amplicons from both Chinese and European waters occurred in 77 out of the 211 OTUs. Since the amplicons were mostly connected to amplicons detected in the same region, we suppose a limited effective dispersal between the regions for all these OTUs (Figure 7; upper left quadrant). Once more, the Oligotrichida were by far the most abundant group (42 OTUs), followed by the CONthreeP (14 OTUs), by OTUs that could not be assigned further than to the class Spirotrichea (nine OTUs), and by the aloricate Choreotrichida and the Tintinnina (four OTUs each).

Motivations for Coastal Sites and Sampling Strategy
Coasts are the ideal region for combining knowledge on ciliate plankton obtained by various methodological approaches. Apart from their easy accessibility, coastal habitats are characterized by high temporal dynamics and spatial heterogeneity, both of which are assumed to foster speciation (Norris, 2000). Climate variation and sea level changes severely affect the highly productive shallow coastal areas with their brackish water habitats. Sea level changes cause dramatic shifts in the distribution, the dimensions, and the connectivity of these regions (Palumbi, 1994). A drop in sea level would thus restrict the neritic region to a narrow strip along the upper continental slope and thus reduce the habitat for specialized communities of coastal planktonic ciliates. Therefore, such regions are predestined for a bridge building approach like ours that aims to reconcile morphospecies, barcoded reference sequences, and OTUs in investigating geographic distribution patterns and their evolution, focusing on the community of oligotrichean ciliates. A comprehensive body of literature exists on neritic oligotrichean morphospecies (Cariou et al., 1999;Agatha, 2011;Liu et al., 2017;Hu et al., 2019). Furthermore, the majority of barcodes are from neritic taxa (Doherty et al., 2007;Bachy et al., 2012;Zhang et al., 2017;Santoferrara et al., 2018), and metabarcoding data are available (Santoferrara et al., 2014;Massana et al., 2015;Tucker et al., 2017).
Previous metabarcoding studies analyzed the genetic material obtained from entire microzooplankton or ciliate plankton communities and investigated community changes on higher taxonomic ranks (e.g., Massana et al., 2015). In our interdisciplinary multi-species approach, we combined the expertises of molecular ecologists and ciliate taxonomists to focus on the geographic patterns of particular ciliophoran Swarm OTUs and the interpretation of these patterns under evolutionary aspects. Thereby, our sampling strategy stands out from commonly applied approaches. In contrast to previous studies, our dataset is based on a material collection that complements a high temporal resolution (seasonal and daytime shifts) and large geographic distances between samples. Since the sea and the neritic regions are dynamic (Abboud-Abi Saab and Owaygen, 1998;Marteinsson et al., 2016;Huang et al., 2021) and highly structured horizontally and vertically (McManus and Woodson, 2012), environmental patchiness is a question of scale. Additionally, these regions are very heterogeneous for protists effecting together with community assembly processes a patchy distribution highly likely prone to undersampling. Previous data from our sampling regions indicate a pronounced seasonality in the ciliate community compositions (Mozetič et al., 1998;Witek, 1998;Gómez and Gorsky, 2003;Jiang et al., 2013;Yang et al., 2014). Despite the comparatively intensive sampling, however, only a small part of the pronounced seasonality with short abundance peaks and a rapid turnover of the dominating taxa could be caught. Likewise, the limitation to Chinese and European sites prevents an investigation of the entire taxon ranges.

Considerations for Innovative Pattern Inference on Between-and Within-OTU Level
Metabarcoding is a powerful complement to taxonomic surveys, especially when comprehensive and taxonomically validated reference databases enable unequivocal species identification. In this study, we contribute a novel approach combining phylogenetics, signature nucleotide analyses, and network analyses to enable an unprecedented investigation of genetic diversity patterns on multiple scales. Swarm is an ideal de novo clustering method to generate OTUs from metabarcoding data based on single nucleotide differences while retaining the underlying diversity in the amplicon networks. Different timescales can be investigated on between-and within-OTU levels: phylogenetic trees of the representative amplicon sequences (between-OTU level) visualize comparatively older processes (e.g., past effective dispersal and speciation events), whereas the internal network structure of Swarm OTUs (within-OTU level) may display more recent divergences on subspecies level. Albeit we are pretty sure to have ascertained genetic differentiations that evolved on different time scales, the distinction between intra-and interspecific sequence deviations remains blurry (Santoferrara et al., 2020).
In Swarm OTUs, those sequences resulting from natural genetic variation as well as some artificially created ones (see discussion below) group around the amplicon with the highest read number (the 'representative, ' 'seed, ' or 'dominant' amplicon). We interpret the numerous amplicons with much fewer reads (the 'minor' amplicons) than the 'dominant' one as intraspecific V4 variants generated by imperfect concerted evolution of the ribosomal DNA tandem repeats (Ganley and Kobayashi, 2007). In marine planktonic diatoms, Gaonkar et al. (2020) found a general correlation between the number of haplotypes comprising a taxon and the number of reads in its 'dominant' haplotype. Interestingly, the 'minor' amplicons of our dataset frequently displayed some peculiarities, namely, some were exclusively detected either in China or in Europe and thus exhibited a clear affiliation to one geographic region. It is unclear whether they originated from an ongoing splitting process or a recent merging of conspecific populations; yet, we favor a diverging scenario in cases where the dominant amplicon is truly shared. Estimating the extent of this intraspecific variation is crucial for metabarcoding studies, in which the 'minor' amplicons might indicate a considerable rare diversity. In the present study, the number of supposed 'minor' amplicon variants surpasses by far the findings obtained from sequencing up to three conspecific specimens and subsequent cloning steps in most major ciliate groups (Gong et al., 2013;Wang et al., 2017Wang et al., , 2019. We agree with these and previous studies that most genetic diversity originates from the high number of low-divergent sequence variants present in individuals of single ciliate species (Forster et al., 2019;Zhao et al., 2019). OTU clustering by Swarm in combination with second-level clustering turned out to represent an elegant solution to this problem, since the approach allows for inferring species-specific thresholds (Forster et al., 2019;Wang et al., 2019;Zhao et al., 2019).
Nevertheless, not all low-divergent ('minor') amplicons necessarily originate from real biological variation, but may instead reflect artificial sequences caused by methodological errors. A perfect distinction between these two types of sequences is currently impossible, but, of course, important for accurate diversity estimations (Brown et al., 2015). Potential error sources that generate non-biological sequences occur in the PCR amplification process, for instance, by a decrease in polymerase fidelity, thermal damage (Pienaar et al., 2006), or by the formation of chimeric sequences from two or multiple parental sequences (Lahr and Katz, 2009). Artificial sequences may also be generated during high-throughput sequencing (HTS) by erroneous base calls (Bokulich et al., 2013). Another often ignored error source in HTS is sample bleeding, which is caused by incorrect assignment of small numbers of reads to multiplexed samples processed on the same sequencing lane (Mitra et al., 2015). Because of the finescaled resolution in our approach, we could observe events of sample bleeding and attribute Swarm OTUs with more than 99% of reads from one region and less than 1% of reads from the other region to this phenomenon.
To exclude artificially generated sequences from potential HTS error sources, we took several precautions and followed advices for best practices in sequence quality filtering 4 . Thus, except for sample bleeding (which may assign real biological sequences to the wrong sample), we conclude that only a negligible part of the genetic variation in our dataset is of nonbiological origin. As a consequence, we further presume that the effect of artificial sequences on assortativity analyses will be of minor relevance. Since artificially generated sequences are highly unlikely to be generated from scratch, but rather from a dominant parental amplicon, both the artificial and dominant sequence will carry the same geographic information. This scenario is therefore not altering results of Swarm OTUs that consist completely or nearly completely of amplicons from the same region. In Swarm OTUs that include amplicons from different regions, artificial sequences might increase the number of edges between amplicons from the same region and therefore lead to more assortative patterns. Regarding the latter, though, we refer again to our quality filtering strategy, which removes low-quality sequences before clustering in Swarm is conducted. In addition, the patterns observed by within-OTU analyses were further corroborated by between-OTU analyses, which imply that they are not the result of artificially created sequences.
Denoising algorithms are an alternative to our sequence quality filtering strategy and have gained in popularity for analyzing metabarcoding datasets (e.g., Callahan et al., 2017). While denoising metabarcoding data prior to OTU clustering in Swarm is possible, too many of the low-abundant 'minor' sequence variants are aggregated into central consensus sequences or discarded by the denoising algorithms (Antich et al., 2021). The denoised dataset thus contains only few amplicons that can be further clustered by Swarm. Instead, the majority of these denoised amplicons are placed in standalone Swarm OTUs (i.e., singleton OTUs of high read abundance) without any other amplicon. Denoising a dataset before clustering in Swarm is therefore counter-effective for evaluating underlying network topologies of Swarm OTUs and exploring distribution patterns of genetic diversity with our newly introduced strategy.

Geographic Patterns and Diversity at Swarm OTU Level
The vast majority of Swarm OTUs in the present study were detected either in China or in Europe, whereas only a small proportion represents shared OTUs, admittedly, though, with high read numbers. Several phylogenetic relationships between the OTUs confirmed the occurrence of geographic patterns and indicate continuing limitations of effective dispersal. Some of the highly diversified clusters found only in one region branch basally and suggest historic cladogeneses with a high degree of genetic divergence possibly due to large population sizes. The geographic patterns are further corroborated by regionspecific signature nucleotides detected by means of DeSignate (Hütter et al., 2020). The analyses focused on the most abundant ciliates in coastal waters, namely, the Oligotrichea. Four of them displayed geographic patterns, namely, the oligotrichid Spirotontonia sp., two further oligotrichids of the genus Strombidium (Strombidium K sp. and Strombidiidae K X sp.), and the tintinnid Tintinnidium sp.
The genus Spirotontonia Agatha, 2004 currently comprises three species (S. grandis, S. taiwanica, and S. turbinata). Like all Tontoniidae Agatha, 2004, it is characterized by a highly contractile tail. The single somatic kinety forms a sinistral spiral around the cell. The genus Laboea Lohmann, 1908 shares this somatic ciliary pattern and is considered a secondarily tail-less tontoniid based on ribosomal DNA phylogenies (Gao et al., 2009). Based on literature data about morphospecies, the genus is distributed in the Arctic Sea, the Mediterranean Sea, the North Atlantic, the North Pacific, and the Indian Ocean (SA, unpubl. data). Although no Chinese OTU grouped within the clade of European OTUs, the close relation between reference sequences from China and few European OTUs with identical signature nucleotides indicates that similar sequences occur in China, but were not detected in the present study. Yet, Chinese OTUs could not be detected in Europe, considering that the single or few reads from the respective other region in the shared OTUs are methodological artifacts caused by sample bleeding. The Chinese OTUs probably represent more than the currently known species (S. turbinata and S. taiwanica) given the numerous short branches, which suggest a radiation with a high diversification rate. Spirotontonia grandis might be of hybrid origin as its reference sequence shares signature nucleotides with amplicon sequences from both the Chinese and European clades. In the related euplotid ciliates, hybridization for instance caused by an overlap in the pheromone structures and inter-specific mating further increases the complexity of the interpretations (Luporini et al., 2016). The phylogenetic tree features a high diversity beyond reference sequences of known morphospecies, which is corroborated by the second-level clustering proposing at least two new Spirotontonia species in Chinese coastal waters.
The Strombidium K cluster contains numerous reference sequences labeled as Strombidium K sp. and Strombidiidae K X sp., respectively, and two reference sequences of morphologically identified species. Considering the original descriptions, however, the identification of these morphospecies is doubtful, and their sequences probably belong to different species. Strombidium capitatum (Leegaard, 1915) Kahl, 1932 was discovered in the Northeast Atlantic (reported as Laboea capitata). Seravinella pectinata Alekperov and Mamaeva, 1992 is based on a completely deformed and destroyed S. capitatum cell as revealed by the reinvestigation of the type slide. Further reports of the morphospecies come from the Arctic Sea, the North Atlantic, the Mediterranean Sea, the South Atlantic, the North Pacific, the Indian Ocean, and the Antarctic Sea (SA, unpubl. data). The GenBank sequence KP260510 was linked to a specimen identified as Strombidium capitatum sampled from Chinese coastal waters (Song et al., 2015). The gene sequence is accompanied by a short description, line drawings, and micrographs. The line drawings and one micrograph come from a previous study on S. capitatum; the remaining micrographs do not provide many details and primarily depict a conspicuous apical protrusion. Conspecificity of the sequenced specimens with S. capitatum is questioned by a much larger cell size (90-135 µm × 70-100 µm vs. 45-70 µm × 40-60 µm) and much longer extrusomes (22-24 µm vs. about 15 µm; Song et al., 2015). Hence, a cryptic species different from S. capitatum might have been sequenced. Strombidium triquetrum Agatha and Riedel-Lorjé, 1997 was described from the North Sea; further reports of this morphospecies do not exist. Gao et al. (2016) sequenced specimens identified as Strombidium cf. capitatum; the sequences were deposited in GenBank and included in the PR 2 database under the name S. triquetrum. Since the associated stamp-sized micrograph merely displays the anterior cell portion of a protargol-stained specimen, conspecificity is uncertain. The high amount of genetic diversity present in the Strombidium K cluster, which was partly restricted to one region, suggests several potentially new species. Nevertheless, attention should be paid to the reliable identification of species, using original descriptions or authoritative redescriptions to avoid an incorrect linkage of reference sequences and morphospecies.
The family Tintinnidiidae currently comprises the genera Antetintinnidium Ganser and Agatha, 2019, Membranicola Foissner et al., 1999, and Tintinnidium Kent, 1881 with in total 16 species (SA, own data). Considering the genetic divergence between known morphospecies (Antetintinnidium mucicola, T. fluviatile, T. pusillum, and T. balechi), the enormous genetic diversity recognizable in the European clade of the Tintinnidium sp. phylogeny suggests the presence of several as yet unknown species, which very likely differ distinctly in their cell morphologies. Likewise, the second-level clustering of Tintinnidium sp. yielded two network sequence clusters (NSC) with known morphospecies and additionally three clusters of probably novel Tintinnidiidae taxa. In addition, the NSC which contained the most amplicons and reads could only be linked to Tintinnidium sp. reference sequences, suggesting another unknown morphospecies. The relationships in the phylogenetic trees propose several independent introductions of Tintinnidiidae from brackish waters into freshwater (Ganser and Agatha, 2019).
In contrast to the abovementioned taxa, the majority did not display distinct geographic patterns. Exemplary for these taxa, OTU sequences assigned to the tintinnid Tintinnopsis baltica Brandt, 1896 were analyzed. The species had been established based on material from the Baltic Sea. Further reports of the morphospecies are from the Antarctic/Sub-Antarctic Sea, the North Atlantic, the Mediterranean Sea, the Black Sea, the Sea of Azov, the Southwest Atlantic, the North Pacific, the Southeast Pacific, and the Indian Ocean (SA, unpubl. data). Its tree topology might be interpreted as alternating multiple invasions, suggesting that this species belongs to a group of organisms with effective long-distance dispersal. Thus, allopatric speciation is probably a slow process in this tintinnid which might be indicated by the identical V4 regions in T. baltica, T. acuminata, and T. nana.
All phylogenetic analyses corroborate each other and shed light on previously undetected aspects of diversity. On the one hand, novel diversity, i.e., potentially new species or ones for which genetic data are not available, emerged in both the phylogenetic trees and the second-level clustering. On the other hand, geographical distribution patterns of OTUs in Chinese and European coastal waters were revealed by phylogenetic trees and second-level clustering and, in several cases, were further underpinned by signature nucleotides. In the future, 'dominant' amplicon sequences that are not closely related to sequences of a known morphospecies may enable the identification and proper description of new species, following a 'reverse taxonomy' or 'targeted recovery' approach (Markmann and Tautz, 2005;Gimmler and Stoeck, 2015). The sampling site and season at which the amplicon sequence was recorded together with the signature nucleotides detected by DeSignate (see above) further facilitate this aim.

Geographic Patterns and Diversification Within Widespread Swarm OTUs
Magnifying the resolution of sequence analyses from between Swarm OTU-level to within Swarm OTU-level further emphasized the existence of distribution patterns. Compared to results from phylogenetic analyses, which focus on sequences between which a sufficiently high sequence divergence has led to the establishment of genetically distinct units (OTUs), the assortativity analyses explore relatively recent or even ongoing processes (e.g., effective dispersal and gene flow), in which the genetic divergence is momentarily insufficient for establishing distinct OTUs.
Since our network analyses aimed at revealing dispersal limitations, the interpretations focused on those 211 widespread and abundant Swarm OTUs within which amplicons from each region had a significantly higher assortativity value than expected by chance. Using our novel approach, we distinguish three different scenarios by interpreting the assortative and disassortative patterns of amplicons within these shared OTUs. (i) No effective dispersal takes place. The assortative grouping of both the Chinese and the European amplicons indicates genetically different amplicon clusters that presumably experience limited gene flow. With increasing genetic divergence, it may eventuate in an allopatric speciation. One or very few edges (connections) between distinct clusters of amplicons from different regions suggest founder events followed by bottlenecks and genetic drift. This scenario was observed in 77 OTUs. (ii) The main area of distribution is apparently restricted to one particular region. This distribution pattern might be attributed to species sorting, i.e., the presence or absence of an appropriate niche, a rather recent dispersal limitation, or an extinction event without considerable resettlement. This scenario dominated in the present dataset on planktonic ciliates from European and Chinese coastal waters. It was observed in 122 OTUs: 49 OTUs with main area of distribution in China and 73 OTUs with main area of distribution in Europe. (iii) Effective dispersal takes place. The disassortative grouping of amplicons in the OTUs does not follow a clear geographic pattern owing to the lack of a strong genetic separation between Chinese and European amplicons. Possibly because the ciliates are capable of effective dispersal or because their evolution rates are so low that they did not genetically diverge after spatial separation. This scenario was rarely found owing to our selection of OTUs (see section "Materials and Methods"). Although the patterns perceived are highly interesting, they necessitate further interpretations by intertwining them with future data, for instance, on the species' autecology (see section "Outlook"). By demonstrating geographic patterns on within-OTU level, we contradict Caron and Hu (2019) in that the knowledge far beyond species-level distinctions has no practical value.

Potential Causes for Geographic Patterns and Diversification
The "everything is everywhere, but the environment selects" tenet states that protist taxa are dispersed globally, but form populations only where the environmental conditions (abiotic and biotic) are suitable for them; otherwise, they might be present merely as resting cysts (see below). Since the autecology (including dispersal rates) and thus the niche parameters are unknown for the majority of Oligotrichea (Maselli et al., 2020), it is impossible to tell apart ecological and historical effects that cause distribution patterns (Bass and Boenigk, 2011).
Studies on the effective dispersal capabilities of marine planktonic ciliates are currently missing. The sparse evidence are records of identical marker gene sequences at the east and west coasts of the North Atlantic plus adjacent sea regions and in the Northwest Atlantic and Northeast Pacific (see section "Introduction"). For pelagic cyanobacteria, copepods, and foraminifera (Darling et al., 1999(Darling et al., , 2000de Vargas et al., 1999), genetic data indicate a strong effective dispersal across ocean basins as well as between them (Norris, 2000). Interestingly, almost identical gene sequences (three base pairs difference in the 18S) of planktonic foraminifera were detected in the Northeast Pacific and Mediterranean Sea, and a transport from east to west is supposed owing to the prevailing surface currents (Darling et al., 1999). These data imply that many foraminiferan genotypes have either evolved only slightly after their spatial separation or they are still capable of an effective dispersal. Molecular data on bipolar planktonic foraminifera and benthic ciliates as well as morphologic data on a planktonic tintinnid (Acanthostomella norvegica) suggest a trans-tropical effective dispersal (Darling et al., 2000;Di Giuseppe et al., 2013;Dolan and Pierce, 2013).
The metacommunity structure of small organisms with relatively high dispersal rates is probably dominated by species sorting and mass effects rather than by dispersal limitation and patch dynamics (De Meester, 2011). Likewise, Norris (2000) concluded that ranges of many pelagic groups are much more limited by their ability to maintain viable populations than by any inability to disperse across tectonic and hydrographic barriers. Oceanographic and climatic changes may create new habitats with new niches instead of preventing dispersal. Such shifting selective regimes have been found to generate large genetic differences very quickly, even between large populations that are not completely isolated (Palumbi, 1994). Three scenarios of species-niche interactions were discussed (Weiner et al., 2014). (i) Dispersal is limited. Bottlenecks, genetic drift, and abiotic boundaries may cause in concert the differentiation of allopatric sister lineages. (ii) Dispersal is rather unlimited and species interaction are negligible; the distribution of species thus probably reflects the spatial realization of appropriate niches. (iii) Species interactions are considerable; competitive exclusion hence influences the pattern of niche occupation. The present heterogeneous data on marine planktonic ciliates from Chinese and European coastal waters propose that all three scenarios are realized.
Dispersal limitation can, however, not be concluded from the absence of an active ciliate species at a particular site, i.e., a more intensive sampling might prove at least some of the Oligotrichea to be more widely dispersed and distributed, resulting in an increased number of shared amplicons in the Swarm OTUs. In non-random processes, planktonic Oligotrichea might be dispersed as active states by water currents over large areas or possibly even globally; but, the timespans are long compared to their generation times, questioning whether it is possible for a species to be cosmopolitan. In response to external triggers, many protists are, however, capable to form more or less resistant resting cysts (Foissner, 2011). Actually, some oligotricheans are known to generate at least occasionally resting cysts, and several similar shaped, but unidentified forms had been described (Meunier, 1910;Kim et al., 2002;Rubino et al., 2002;Agatha et al., 2005). Yet, the viability of the cysts and hence the ability to excyst decreases with time (Kim and Taniguchi, 1997). The factors controlling encystment, excystment, and viability of cysts may differ among closely related protist species and probably even between clones of the same species (Weisse, 2008). Since cysts also seem to sediment rather rapidly (Kim et al., 2008), they are very rarely found in plankton samples collected in the upper 2 m of the water column (SA, own observ.). Therefore, long-distance dispersal of suspended cysts and a subsequent population growth are probably rare events for oceanic taxa (Agatha, 2011), while cysts might easily be resuspended in shallow coastal waters during storm events. In the present study, the probability is low that considerable quantities of vital resting cysts were among the sequenced material, which was confirmed by microscopic inspection of some samples.
In the phylogenies based on our data, we detected comparatively deep and more recent splits; in addition, the assortativity values of some shared OTUs indicated ongoing divergence, possibly speciation. The origin of the Choreotrichida and Oligotrichida took place ca. 462 million years ago (mya) after an extreme boost on speciation rates in the stem branch linking these taxa to the hypotrichs (Fernandes and Schrago, 2019). Based on the ribosomal data set comprising the 18S, 5.8S, and 28S rDNA, the divergence time of the oligotrichids was calculated to 199-443 mya and that of the choreotrichids to 256-447 mya. A significant diversification rate shift commenced at the stem branch leading to the Perilemmaphora (= hypotrichs + oligotrichids + choreotrichids) at the Ediacaran period (about 550-650 mya) with the development of numerous new lineages. While the mean rate of nucleotide substitutions per 100 million years is rather low (µ = 0.0185) for the ciliate ribosomal dataset, the substitution and speciation rates are markedly higher in the Perilemmaphora with on average 0.779 (0.635-0.956) species per million years (my). Their extinction rates were distinctly (5×) lower compared to the speciation rates and have remained nearly constant throughout time (0.011-0.396 species/my). At that time, the climate and the water currents were totally different from those prevailing today. Generally, the combination of high speciation rates with cladogenesis and low extinction rates generates an accumulation of lineages with long branches. However, the ribosomal time-calibrated tree of Fernandes and Schrago (2019) demonstrated an exceptional pattern in the cluster of hypotrichs, oligotrichids, and choreotrichids with several radiation events concentrated at the tips of the tree. The cladogenetic success of the Perilemmaphora is thus probably most likely a consilience of accelerated speciation rates and lineage accumulation. The combination of sexual reproduction and cell divisions might strongly promote the generation of local adaptations according to De Meester et al. (2002). The first reliable tintinnid fossils are from the Jurassic. Since they seem to be fully developed, the tintinnids apparently have an even longer history (Lipps et al., 2013), matching the abovementioned calculations by Fernandes and Schrago (2019).
In contrast to the conception that long-distance dispersal mitigates speciation by swamping the gene pool of the resident population, Norris (2000) suggests that long-distance dispersal may actually promote evolution by regularly carrying variants of a species across major oceanic fronts and thus exposing them to very different selection pressures than those occurring in their home range. Accordingly, large populations may accumulate mutations beneficial in particular environments in which natural selection and further adaptation take place, resulting in a differentiation from the parental population and finally a speciation (Shapiro et al., 2016). The experimental evolution of dispersal and movement in the freshwater ciliate Tetrahymena illustrated an increase in individual movement velocity and overall dispersal rates of populations at the margins of their distribution range over time (Fronhofer and Altermatt, 2015). Soft hydrographic barriers might cause populations to experience selection sufficiently strong on either side that speciation may take place, although they do not inhibit dispersal completely (Martín et al., 2020). Actually, individual populations might be more isolated than currently accepted as patterns of dispersal and connectivity could be highly structured (McManus and Woodson, 2012). Even sympatric speciation is possibly more widespread than previously assumed and is facilitated by environmental gradients of intermediate slope (Dieckmann and Doebeli, 1999;Doebeli and Dieckmann, 2003). An isolation by ecology, namely, regarding the depth or season of reproduction, had been described in planktonic foraminifera. In euplotid and hypotrich ciliates, any mutations in the coding sequence of a pheromone results in a new pheromone and a new pheromone receptor and thus in a new mating type (Luporini et al., 2016). Likewise, shifts in the seasonal occurrence or isolation in confined coastal areas may have been involved in the observed rapid speciation of the Oligotrichea.
Studies on the marine biogeography date back to more than 150 years and described a close linkage between faunistic and hydrogeographic features of the major oceanic water masses, resulting in a circumglobal warm water region with subtropical and tropical provinces and Northern and Southern cold water regions with polar and subpolar provinces (Zeitzschel, 1990). Van der Spoel and Heyman (1983) defined planktonic faunal centers with taxa sharing the same ecological preferences or ancestral forms, which simultaneously represent the main spots of speciation and dispersal. Many studies based on morphospecies discovered rather similar patterns in the distribution of planktonic organisms (Zeitzschel, 1990). In contrast, the Longhurst (2007) ecological provinces are primarily based on the primary production and frontal systems. Likewise, the preliminary metabarcoding studies revealed geographic patterns on community level. Yet, the rationales differ: some publications ascribe the observed patterns to abundance-dependent dispersal, while others regard species sorting, community interactions, or dispersal limitations as crucial. The present findings covering data from three Longhurst provinces suggest that marine planktonic ciliates, specifically, the Oligotrichea, might display similar geographic zonations as other planktonic organisms.
With the present study, a new perspective to this discussion is added by analyzing distribution patterns on within-OTU level down to single nucleotide differences. The impressive number of Swarm OTUs with records restricted to China or Europe and the geographical distribution patterns of sequences within OTUs revealed by assortativity analyses suggest an inverted scenario of the moderate endemicity model concerning the distribution of planktonic ciliates in coastal waters. This implies that the majority of marine planktonic ciliates has a restricted distribution, while only few have a wide, potentially cosmopolitan distribution.

Outlook
While we discovered a considerable variation in geographic distribution patterns of planktonic ciliates in Chinese and European coastal waters, the causes remain unknown. It is crucial to bring life-history traits into play, which are highly likely linked to (effective) dispersal. Much more experiments have to be conducted for comprehending the dispersal of active and resting stages, the formation of resting cysts and their viability, as well as the autecology regarding abiotic factors, food items and preferences, functional, numerical and thermal traits. For addressing these topics, we have to go beyond the simple assumption that geographical patterns are size-dependent (Jenkins et al., 2011). Nevertheless, our novel approach can uncover variational patterns at a resolution up to single nucleotide differences and is applicable to investigate any group of interest.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
MG, DF, and SA conceived the study and wrote the manuscript. MG, SA, and XL selected the sampling sites. WL performed the DNA extraction. MG and DF analyzed the data. WL, XL, and TS helped to improve the manuscript. All authors approved the final version of the manuscript.

FUNDING
This study was financially supported by the FWF project I3268 given to SA and the NSFC project 31761133001 given to XL. DF was supported by a postdoctoral research grant of the Carl Zeiss Foundation.  (Hütter et al., 2020) in the Swarm OTU sequence alignments of the four taxa displaying distinct geographic patterns.

ACKNOWLEDGMENTS
Supplementary Figure 3 | Signature nucleotides identified by DeSignate (Hütter et al., 2020). The upper line displays the V4 sequence with the position of the signature nucleotides. The middle and lower left portions demonstrate the entropy plot (blue line) of the alignment shown below. The upper portion of the alignment comprises the query group sequences, the lower portion the reference group sequences. In the lower right portion, the alignment positions are ranked according to their relevance. Green, binary signatures; orange, asymmetric signatures; gray, noisy characters.
Supplementary Figure 4 | Examples of Swarm OTU networks. The networks display the internal structure of Swarm OTUs, in which nodes represent amplicons and edges represent one nucleotide differences between the amplicons. The node size was scaled to the read abundance comprised within an amplicon. (A) Swarm OTU with main area of distribution in Chinese coastal waters. (B) Swarm OTU with main area of distribution in European coastal waters.  (1); signature nucleotides detected in the four taxa with distinct geographic patterns (2); list of reference sequences obtained from GenBank and the PR 2 database (3).