Cryptic Diversity and Database Errors Challenge Non-indigenous Species Surveys: An Illustration With Botrylloides spp. in the English Channel and Mediterranean Sea

complicate taxonomic assignment. We illustrate these two issues in the ascidian genus Botrylloides , in which at least three global marine invaders have been recognized, including B. violaceus and B. diegensis . We obtained COI sequences from > 750 colonies of Botrylloides spp. sampled in W Europe or provided by expert colleagues from other regions. Phylogenetic trees clearly distinguished our targeted taxa [i.e., B. violaceus , B. diegensis and B. leachii (native)]. They also revealed another discrete lineage apparently related to a recently described eastern Mediterranean species. By examining public databases, we found sequences of B. diegensis erroneously assigned to B. leachii . This observation has major implications as the introduced B. diegensis can be misidentiﬁed as a putatively native species. We also checked published sequences of the genus Botrylloides in the Mediterranean Sea, complemented with new samples. Based on our custom reference database, all published sequences of B. leachii corresponded to B. diegensis , although this NIS has hardly been reported at all in the Mediterranean region. Such database errors are unfortunate, as the barcoding approach is a powerful tool to identify the recognized Botrylloides species currently present in European seas. This is of particular importance because a trait often used during ﬁeld assessment, i.e., single-color vs. two-color colonies, is misleading to distinguish B. violaceus and B. diegensis respectively: a substantial proportion of the single-color morph are actually B. diegensis in both the Mediterranean Sea and the English Channel. Altogether, this study exempliﬁes the advantages and disadvantages of molecular barcoding in NIS surveys and studies. The limitations that were identiﬁed are all easy to resolve once proper vouchers and collections are set up.


INTRODUCTION
In marine environments, biological introductions exert important pressure on natural ecosystems. Introduction of non-indigenous species (NIS) has occurred at an increasing rate since the 20th century, in pace with the increasing range and intensity of vectors (Nunes et al., 2014). Because successful eradications of invasive species in marine systems are very rare (Sambrook et al., 2014;Ojaveer et al., 2015), both pre-border prevention measures (e.g., limiting the introduction of NIS by ballast waters) and early detection focused on introduction hotspots (e.g., marinas and harbors) have been recommended (Ojaveer et al., 2014). Accurate identification is a pre-requisite for the implementation of policies and regulations such as the Marine Framework Strategy Directive (MSFD), in which Descriptor 2 is dedicated to NIS, with one criterion involving the detection and identification of new NIS, and another concerning the changing spatial extent of already-reported NIS.
However, identifying new or recently introduced NIS can be challenging when based only on traditional methods (Darling et al., 2017), such as rapid assessment surveys (RAS), which rely heavily on field recognition of species and have been commonly used to survey NIS in marinas and harbors (e.g., Cohen et al., 2005;Arenas et al., 2006;Campbell et al., 2007;Bishop et al., 2015a;Lehtiniemi et al., 2015). Many marine animal NIS in introduction hotspots belong to taxonomic groups, such as bryozoans, hydrozoans and tunicates, that require substantial taxonomic expertise. In addition, the development of molecular studies in the last three decades has revealed a large number of cryptic species (i.e., species that are not distinguishable based on morphological traits) in these taxonomic groups (Knowlton, 2000;Appeltans et al., 2012). Consequently, an increasing number of studies have recommended the use of molecular tools to complement the traditional methods, and thus achieve reliable taxonomic identification of marine NIS (Comtet et al., 2015;Darling et al., 2017;Dias et al., 2017).
The usefulness of the molecular barcoding approach is well established. The benefits of the approach have been demonstrated by numerous studies (for a review, see Comtet et al., 2015). Such an approach can be used to ascertain the presence of a new NIS (e.g., Asterocarpa humilis: Bishop et al., 2013), to reveal false morphology-based NIS identification (e.g., Crepidula spp.: McGlashan et al., 2008) or to determine the taxonomic status of previously unrecognized NIS (Ordóñez et al., 2016). However, the robustness of the molecular barcoding approach relies on some important pre-requisites, notably the availability of reliable reference sequence data for the targeted species. Thus, the prompt updating of databases is essential following taxonomic revision and discovery of unrecognized cryptic species. The sequences delivered in public databases such as GenBank (Benson et al., 2013) or BOLD [Barcode of Life Data System; (Ratnasingham and Hebert, 2007)] should thus be error-free and validated by taxonomic experts. BOLD is expected to provide such a framework. However, the public portal also retrieves non-curated sequences from Genbank that can propagate errors. There is so far a limited number of initiatives specifically aimed at delivering reference data for marine NIS. Although rare, such initiatives should be strongly encouraged, notably to support international regulations and policies such as the MSFD in Europe (Darling et al., 2017). Such an initiative has recently been undertaken in Western Australia, where a reference collection (vouchers, DNA and sequences for one marker (COI)) was assembled for 75 species (out of 79) of the "Western Australian Prevention List for Introduced Marine Pests" (Dias et al., 2017). Similarly, facing the issue of cryptic diversity, integrative taxonomy coupling molecular phylogenetics with morphology and other species attributes (e.g., ecology, life-history traits) is needed (Pante et al., 2015).
We here illustrate the issues described above using the colonial tunicate genus Botrylloides as a particularly relevant case study. At least two Botrylloides species, B. violaceus Oka, 1927 andB. diegensis Ritter andForsyth, 1927, with the recent addition of B. giganteus (Pérés, 1949) (Rocha et al., 2019) are globally invasive. B. violaceus and B. diegensis are both native to the N Pacific. While the former is native to the NW Pacific, there is more uncertainty regarding the native range of the latter: although B. diegensis was originally described from the NE Pacific (southern California), it is likely that the species had been introduced from the western or southern Pacific (Carlton, 2009). The two species are important members of the fouling community colonizing artificial substrates on the Pacific coast of the United States, for instance in harbors and marinas (e.g., Nydam and Stachowicz, 2007). The same two species have also been introduced in Europe, notably in the English Channel (EC), where they are well established in marinas of the United Kingdom and N France (Bishop et al., 2015a,b). In Europe, one putatively native species, B. leachii (Savigny, 1816), is also recognized, often showing coloration somewhat similar to the two-tone color pattern seen in B. diegensis (Supplementary Figure S3, panel 6). The original description (Savigny, 1816) of B. leachii was based on material "Communiqué par M[onsieur]. Leach, directeur du Musée britannique" (i.e., William Elford Leach, then working in the Department of Natural History of the British Museum) and the specimen was thought by Savigny to have originated probably from the English coast ("Habite les côtes de l'Angleterre?"). Conversely to the introduced species, the native taxon seems uncommon in artificial habitats in some parts of the EC (J. Bishop, L. Lévêque, and F. Viard, pers. obs.).
The first reports of non-indigenous Botrylloides spp. in the English Channel were made after RAS in 2004 (Arenas et al., 2006). A single species, B. violaceus, was reported, recognized by its single-colored colonies and large and morphologically distinctive brooded larvae. Colonies of B. violaceus could occur in a range of single colors, including violet, cream, yellow (Supplementary Figure S3, panel 4), brick-red and (commonly) orange. A second species was encountered during the 2004 surveys, but was not reported by Arenas et al. (2006) since its identity was uncertain (Bishop et al., 2015a,b). This second species showed a strong and distinctive two-colored pattern featuring a broad ring of solid pigmentation (commonly orange or cream) surrounding each buccal orifice, against a contrasting darker background (Supplementary Figure S3, panel 2), and was subsequently identified as B. diegensis. This distinction, based on color morphs, was used by Lambert and Lambert (2003) to distinguish B. violaceus and B. diegensis in southern California, United States. In 2011, the present authors and colleagues collected extensive samples on both sides of the English Channel for an intended population-genetic study of B. violaceus; singlecolor colonies were thus collected, with or without confirmation of the distinctive but seasonally brooded larvae of that species, under the assumption that these would all be the target species. Preliminary molecular analyses revealed that these single-color colonies included specimens attributable to B. diegensis, so that B. diegensis had been misidentified in the field as B. violaceus on the criterion of possessing single-colored, rather than twocolored, colonies. B. diegensis in the English Channel could thus occur in a single-color morphotype (e.g., Supplementary Figure S3, panel 1) easily confused with B. violaceus, even by those supposedly familiar with the taxa. In addition, during our surveys some color morphs believed to belong to B. leachii were encountered that had a two-color pattern similar to the two-color pattern of B. diegensis, although differing in detail (Supplementary Figure S3, panel 6).
Following from these preliminary observations and molecular results, in this study we examined a large number of Botrylloides colonies displaying a single-color pattern, using sequencing data to make an in-depth assessment of the association between color-morph and species. We chose to analyze a fragment of the mitochondrial gene region cytochrome oxidase I (COI) which has been previously investigated to study B. violaceus in its N American introduced range Lejeusne et al., 2011), besides being a marker commonly used in barcoding studies of marine invertebrates (Comtet et al., 2015) and a standard for BOLD (Hebert et al., 2016). This dataset was completed with data from specimens kindly provided by taxonomic experts (see acknowledgment section); this allowed us to get reliable reference sequences for the three Botrylloides To this end, we used all existing Mediterranean sequences of the genus in GenBank, complemented with new samples, for comparison with our reference sequences. The objectives of our work were four-fold: (i) ascertaining the distribution of the two NIS in the study EC area, (ii) providing further molecular reference data, specific to European seas, (iii) assessing if field identification, based on the typical single-color pattern of B. violaceus, is truly reliable, and (iv) updating Mediterranean records of Botrylloides spp. backed by sequence data.

Specimen Collection
Colonies (N = 627) displaying a single color, as typically described for B. violaceus, were randomly sampled along floating pontoons, from April to September 2011, in 20 marinas along the southern coastline of England from Plymouth to Brighton (9 marinas) and along the coastlines of Brittany from Quiberon to Saint-Malo (11 marinas) (Table 1 and Figure 1). RAS done in these same locations have shown the presence of non-indigenous Botrylloides spp. (Bishop et al., 2015a;FV and L. Lévêque, unpublished data). All the sampling was undertaken by SCUBA diving or from the surface by removing colonies from the pilings and pontoons in marinas. Samples were preserved in 96% ethanol prior to genetic analysis.
A reference collection of 128 specimens was gathered for analysis (Supplementary Table S1). It included a total of 84 specimens of B. diegensis, of which 39 were kindly provided by expert colleagues from the W coast of United States, Italy and New Zealand. The remainder were sampled in four marinas in Brittany. In every case, these colonies displayed the typical two-color pattern of B. diegensis. We also included 17 reference specimens for B. leachii from United Kingdom, Ireland and France. These latter specimens mostly displayed the color pattern of Botrylloides radiata Alder and Hancock, 1848, as illustrated by Alder and Hancock (1912), pl. 64, Figures 10 and 11), a form synonymized with B. leachii by Berrill (1950) and Millar (1970) (Supplementary Figure S3, panels 5 and 6). These specimens will be hereafter referred to as the 'radiata' morph of B. leachii. Nine typical violet B. violaceus colonies, also displaying the distinctive large larvae of this species, were obtained in two French localities. Finally, to be used as an outgroup, 18 specimens of Botryllus schlosseri were collected in United Kingdom and France. All these samples were preserved in 96% ethanol after their collection.
In order to update Mediterranean records, a third dataset was constructed, using samples obtained from the NE of the Iberian Peninsula (

Mitochondrial DNA Amplifications and Sequencing
Genomic DNA was extracted from a single zooid (dissected under the microscope), using the Nucleospin 96 Tissue Kit (Macherey-Nagel, Düren, Germany) following the manufacturer's protocol and using a final elution volume of 80 µl.
A fragment of the cytochrome c oxidase subunit I (COI) gene was amplified using the LCO1490 and HCO2198 primers (Folmer et al., 1994). Amplifications were carried out in a 30 µl reaction volume with 4 µl of genomic DNA, 1X PCR buffer (Thermoprime AbGene), 2 mM of MgCl 2 (Thermoprime AbGene), 0.05 mM of each dNTP, 0.4 µM of each primer and 0.33 U of Taq DNA polymerase (Thermoprime AbGene). PCR was performed following the protocol of Lejeusne et al. (2011): initial denaturing step at 94 • C for 3 min, followed by 5 amplification cycles (94 • C for 50 s, 45 • C for 50 s, 72 • C for 60 s), 30 further cycles (94 • C for 50 s, 50 • C for 50 s, 72 • C for 60 s), and a final  (27), Bd-H2 (2) Population numbers refer to those indicated in Figure 1. Accession numbers for the COI haplotypes are provided in Supplementary Table S3 as Supplementary Material. 1 The Bloscon marina was under construction at the collection date (i.e., no floating pontoons; colonies were sampled on pilings). elongation step at 72 • C for 5 min. Sequencing reactions were performed at the LGC Genomics platform (Berlin, Germany) on purified (ExoSAP R -it) PCR products using the reverse primer (HCO2198). Mediterranean amplicons were obtained with the same primers and protocols, and sequenced at Macrogen Company (Korea).

Mitochondrial DNA Analyses
The sequences were edited using CodonCode Aligner v. 3.7.1 (CodonCode Corporation, Dedham, MA). They were aligned using BioEdit v.7.1.981 33 (Hall, 1999). After alignment, a 455 base-pair fragment was retained, further reduced to 435 base pairs for the EC dataset and to 395 base pairs for the Mediterranean dataset, for comparison with sequences obtained for our reference samples or available in public databases. DnaSP version 5 (Rozas et al., 2003) was used to compute the number and frequency of haplotypes (i.e., unique sequences) per taxon or locality and estimate polymorphism. The number of base substitutions per site across sequences was computed with MEGA7 (Tamura et al., 2011), using the Maximum Composite Likelihood model (Tamura et al., 2004) with positions with less than 95% site coverage eliminated. Phylogenetic trees were built with MEGA7 (Tamura et al., 2011) using 32 unique sequences for the analyses of the EC dataset and 54 sequences with the Mediterranean dataset (details in Results). Goodness of fit with various evolution models was first tested: based on both BIC criteria and AICc value, the best model explaining the data was the Hasegawa-Kishino-Yano model with 54-61% of the sites evolutionarily invariable, according to the trees. Rooted phylogenetic trees were constructed using a maximum likelihood method with heuristic search (Subtree-Pruning-Regrafting method). To assess the reliability of the inferred trees, bootstrap tests were carried out (1000 bootstraps). Note that similar topologies were obtained using other tree construction methods, in particular a simple distance-based method (i.e., Neighbor-joining).
To start investigating the status of a new reported lineage (see results), GMYC (Pons et al., 2006) and mPTP (Kapli et al., 2017) species delimitation analyses were carried out on the EC and Mediterranean unique sequences of Botrylloides spp. These two approaches differ in their properties (e.g., GYMC requires ultrametric trees), and one may be more confident in the inferred species delimitation when similar results are obtained with both. GMYC analysis was carried out with the R library splits, using an ultrametric tree built using the Beast2 software (Bouckaert et al., 2019), with a Yule tree prior and site model as obtained with MEGA analysis (see above). The mPTP analysis was carried out with the web-service available at http://mptp.h-its.org, with the tree produced with MEGA (note that the same results were obtained with the tree produced by the Beast2 Bayesian phylogenetic inference).
FIGURE 1 | Study area and sampling localities of the 627 single-color colonies sampled in the English Channel and morphologically assigned to B. violaceus, with pie charts indicating the percentage of the three Botrylloides taxa, after being identified with molecular barcoding (see Table 1). Numbers refer to population name as indicated in Table 1. The source for the European map in the inset is ©Terre Ouverte.
was very high with 122 sites showing substitutions, for a total number of 140 mutations. In addition, although these colonies were all assigned to B. violaceus based on their color, evolutionary divergence over all sequence pairs was substantial, with a number of base substitutions per site of 0.175 over the 12 sequence pairs. Marked variations were observed among pairwise sequence divergence estimates, ranging from 0.002 to 0.301. On the other hand, the analysis of the 127 reference samples including the three Botrylloides spp. putatively present and Botryllus schlosseri (outgroup) yielded 20 mitochondrial haplotypes (Supplementary Table S1).
A phylogenetic tree was built to map the 12 haplotypes obtained over the 627 colonies with the 20 sequences obtained for the 127 reference samples (Figure 2). This tree displayed three monophyletic groups associated with 1) B. diegensis, 2) B. leachii ('radiata' morph) and 3) B. violaceus reference sequences. It showed that 11 haplotypes obtained from the colonies sampled actually belonged to two groups, namely those clustering with B. violaceus and B. diegensis references. As these colonies were all originally thought to be B. violaceus, based on the single-color pattern, this result thus clearly indicates mis-identification during collection in the field. No sequence clustered with our B. leachii references. Among the 11 haplotypes, seven were 100% identical to one of the 20 reference sequences: four (out of six assigned to the B. diegensis clade) were identical to B. diegensis references One haplotype ('BvX-H6.EC sampling' in Figure 2) clustered with the five B. violaceus haplotypes but the topology was poorly supported by bootstrap values. In addition, the evolutionary divergence of this haplotype (hereafter named BvX-H6) from the other haplotypes of B. violaceus was particularly high (19.6%) ( Table 2). This strong divergence is clearly reflected in the mismatch distribution shown in Supplementary Figure S1 provided as Supplementary Material. When removing this haplotype from the analyses, the three targeted species (i.e., B. diegensis, B. violaceus and B. leachii) were characterized by a FIGURE 2 | Molecular phylogenetic tree based on 30 Botrylloides spp. and Botryllus schlosseri haplotypes. The tree was built by the Maximum Likelihood Method based on the Hasegawa-Kishino-Yano model with 61% of sites evolutionarily invariable. It is made of 12 haplotypes obtained from 627 single-color Botrylloides spp. colonies sampled in the English Channel (solid circles and diamond, name ending with "EC sampling") and 20 haplotypes from reference samples (open triangles; name ending with 'Ref') from Botrylloides spp. and Botryllus schlosseri colonies (see text and Table 1 and Supplementary Table S1 for details). A total of 435 positions were analyzed. The tree is drawn to scale, with branch lengths measured as number of substitutions per site indicated in italics below the branch. Numbers in bold on nodes indicate percent bootstrap support values (1000 bootstraps). Values on the diagonal are comparing haplotype within group, with distance (computed over 434 base pairs) and absolute number separated by a slash. For comparison between groups, distance and absolute number are indicated below and above the diagonal, respectively.
low level of within-species divergence (0.5 to 1.29%) and much higher level (15.5-19.6%) of divergence among them (

Mis-Identification Rates and Haplotype Distribution Across Populations
Based on the results of the phylogenetic analysis, each of the 627 colonies could be assigned to either B. diegensis, B. violaceus or the divergent BvX-H6 lineage. Altogether 373 (59%) of the 627 single-color colonies sampled in the United Kingdom and France as B. violaceus in 2011 were unambiguously assigned to B. diegensis on molecular criteria, as shown above. We observed large variations across regions and populations (Figure 1), with mis-identification ranging from 0 to 100%, and a mean misidentification rate of 53% per population. The mis-identification was not associated with a particular haplotype, with six haplotypes found among these B. diegensis single-color colonies. The number of reference specimens is not large enough to encompass the whole diversity of B. diegensis but it is noteworthy that haplotypes Bd-H1, Bd-H2, Bd-H3 and Bd-H6 characterizing our B. diegensis two-color reference samples (Supplementary  Table S1), were also found in single-color colonies of B. diegensis mis-identified as B. violaceus (Table 1).

Comparison With Sequences in Public Databases
We completed our analysis with a species identification search on the BOLD portal, using as a query each of the distinct haplotypes obtained in this study. The five B. violaceus haplotypes were found to match with reference sequences in the BOLD system (Supplementary Table S3): they were assigned with high similarity (97.59-97.62%) to 'Botrylloides violaceum' (note that this neuter form of the specific name is no longer accepted according to WoRMS, being replaced by the masculine form used here; see Ryland, 2015). These haplotypes (Bv-H1 to Bv-H5) matched with four haplotypes (named as Bv2, Bv8, Bv9, and Bv11) obtained as part of two population genetics studies of B. violaceus in N. America Lejeusne et al., 2011). The haplotype BvX-H6 did not match any sequences in BOLD, and the best match obtained using a BLAST search on the NCBI portal was with Botrylloides israeliense (Reem et al., 2018) but with only 93.27% similarity (for a 91% query cover over 461 bp), which indicates that the assignment is not robust.
None of the B. diegensis haplotypes matched with any sequences registered under this name in BOLD (Supplementary Table S3): they were all assigned at 99.76-100% similarity to references registered under the name B. leachii (or the incorrect variant spelling B. leachi) with different published (Griggio et al., 2014;López-Legentil et al., 2015) or unpublished sources. It is noteworthy that one of these sources (Griggio et al., 2014) is a full mitochondrial genome sequence.
The five haplotypes of the 'radiata' morph of B. leachii did not match with any references registered in BOLD (Supplementary Table S4). When searching in the NCBI portal, the best match was with a sequence (acc. no. KY235402) provided under the name B. leachii, but with an extremely low level of similarity (85.15-85.68%), thus well below the threshold used for taxonomic assignment. Interestingly this sequence matches, with high similarities (99.76-100%), our B. diegensis haplotypes in BOLD (not shown). Thus, we can conclude that there are no reference sequences corresponding to our specimens of the 'radiata' morph of B. leachii either in BOLD or NCBI.

Comparison With Mediterranean Sequences
A total of 22 sequences were retrieved from GenBank of Mediterranean Botrylloides (six of them from whole mitochondrial mtDNA data), and 17 new sequences were obtained from the samples collected in Spain, France and Italy (Supplementary Table S2). These samples were mapped onto a phylogenetic tree with the 25 haplotypes obtained with the EC dataset (EC sampling and reference sequences; Supplementary Tables S3, S4).
The results (Figure 3) showed that all sequences in GenBank previously assigned to B. leachii clustered unambiguously with B. diegensis. The only previous B. violaceus sequence (i.e., complete mitochondrial genome HF548552; Griggio et al., 2014) grouped with our reference sequences of B. violaceus. B. pizoni branched within the [B. diegensis -B. leachii] clade, with a higher similarity to our B. leachii references, whereas B. niger branched at the root of this clade. In both cases, very low bootstrap values indicated that the topology was poorly supported. The tree, however, indicates that they are distinct from the B. violaceus clade.
As for the newly generated sequences, two colonies from Venice (with a clear two-color pattern), all colonies from Sète (with a clear two-color pattern), and one from the Ebro Delta (with a uniform reddish color) grouped with B. diegensis. Another three colonies from Venice (of uniform orange or red coloration) were in the B. violaceus clade. The Botrylloides colony from Miseno Lagoon (NAP1 in Figure 3) clustered with B. israeliense and somewhat less closely with the EC haplotype BvX-H6.
Using GMYC, nine entities were identified among the Botrylloides sequences (Supplementary Table S5), with BvX-H6 assigned to a different entity ('species') than B. israeliense plus the sample collected in NAP -both assigned to the same species with a maximum likelihood support of 1. Interestingly the haplotype Bv-H5 was also distinguished from the other B. violaceus haplotypes suggesting a putative additional cryptic species. However, the mPTP analysis distinguished only seven entities (Supplementary Table S5), with BvX-H6, NAP and B. israeliense clustered into a single entity, and Bv-H5 clustered with the other haplotypes of B. violaceus. The other delimited entities were the same in the two analyses (Supplementary Table S5).

DISCUSSION
We used COI sequencing to examine >750 colonies of the genus Botrylloides and Botryllus with a special interest in three species, two introduced species, namely B. violaceus and B. diegensis, and one species putatively native to Europe, namely B. leachii. Combining phylogenetic analyses and barcoding, we showed that COI is a robust barcode for distinguishing the three species. We also confirmed the presence of the two NIS in the EC. In addition, this study went well beyond our expectations: it pointed out (1) the risk of mis-identification between the two NIS when rapidly identified in the field based on simple external criteria (colony color patterns), (2) the presence of errors in databases, with reference data available for B. leachii (native) at the time of the analysis (last checked in May 2019) actually being sequences corresponding to B. diegensis (introduced in the study regions), (3) the presence of B. diegensis, probably previously mistaken as B. leachii, in the Mediterranean Sea, based on our custom reference database, thus pointing to the need to revise all previous reports of B. leachii, as it has likely been confused with B. diegensis and (4) the existence of a cryptic divergent lineage, displaying a single-color pattern similar to B. violaceus.
Thus the molecular findings indicate that single-color colonies lacking larvae cannot be identified to species level with confidence from external appearance alone (see brief discussion in Bishop et al., 2015a), at least until additional distinguishing characters are recognized. Preliminary morphological observations based on external features of the species investigated are presented in Supplementary Figure S3. Identification by internal morphology, i.e., microscopical examination of the zooids and brooded larvae, requires laboratory facilities and training, and would be very time-consuming if large numbers of specimens were to be processed, for instance to estimate the relative abundance of the different species in multiple samples. In addition, some key morphological features, such as gonads and larvae, may be present only seasonally. Recourse to molecular means of identification appears a fruitful option, particularly for colorless preserved specimens that may not be reproductive. Furthermore, clear distinctions based on internal anatomy are not yet very consistently reported in the literature, given the current level of taxonomic confusion. Morphological characterization of specimens that have been pre-sorted into putative taxonomic categories based on DNA sequence information may help to achieve the full clarification of any differences in internal anatomy. However, discovery of reliable differences in zooidal morphology can prove challenging in this genus. Saito et al. (1981) distinguished B. simodensis Saito and Watanabe, 1981 from B. violaceus on the basis of reproductive characteristics and the color of live colonies, while preserved, non-reproductive colonies were very hard to separate on the basis of zooidal morphology. Atsumi and Saito (2011) were subsequently able to distinguish two additional species from B. simodensis based on differences in reproductive seasonality and the color of living colonies, backed up by mtDNA sequence information, despite them sharing very similar zooidal features.
As shown by the phylogenetic tree (Figure 2), COI is reliable to distinguish the three accepted species with a clear barcoding gap among them (Supplementary Figure S1). This is in line with previous studies showing that COI can be a relevant barcoding marker for metazoans (Bucklin et al., 2011), notably considering its properties as a rapidly evolving marker (as compared to the nuclear genome). This study also highlights a clear benefit of the barcoding approach, i.e., to enable species identification in groups that lack easy morphological traits to use in taxonomic determination. We carried out extensive sampling of Botrylloides spp. colonies that externally looked like B. violaceus, based on one simple field criterion, namely the single-color patterns of the colonies. The results of COI sequencing and comparison with a newly generated reference dataset clearly indicated that we made a large number of errors in assigning species name in the genus Botrylloides. Error rates varied 0 to 100% (Figure 1) according to the population. We could compare the results of the molecular identifications of the supposedly 'B. violaceus' population samples with results of RAS carried by one of us (JB) while sampling for the intended molecular study in the nine populations from the United Kingdom. During the FIGURE 3 | COI phylogenetic tree based on Botrylloides sequences obtained from samples in the English Channel, reference sequences (see Figure 2), and sequences obtained from samples from the Mediterranean Sea (published data or from this study; see details in Supplementary Table S2). The tree was built with MEGA 7 by the Maximum Likelihood Method based on the Hasegawa-Kishino-Yano model with 54% of sites evolutionarily invariable. It is made of 52 sequences over 395 base pairs. They include sequences obtained from Mediterranean Botrylloides either newly collected or obtained from GenBank (see Supplementary  Table S2 for codes and details). In addition, we included 6, 5, and 5 haplotypes from our reference sequences for B. diegensis, B. violaceus and B. leachii 'radiata' morph (open triangle symbols in green, red and blue, respectively), together with the haplotype named as BvX-H6 (violet diamond) (see details in Supplementary  Tables S3, S4). The tree is drawn to scale, with branch lengths measured as number of substitutions per site indicated in italics below the branch. Numbers in bold on nodes indicate percent bootstrap support values (1000 bootstraps).
FIGURE 4 | Relationship between the percentage of B. violaceus mis-identification (i.e., individuals that were assigned to B. diegensis with barcoding) and the abundance of typical two-color morph of B. diegensis in the nine study populations on the English coast. The four-point abundance scale was obtained for B. diegensis during rapid assessment surveys carried out at the same time as sampling for molecular analyses.
RAS, a semi-quantitative abundance of the distinctive twocolor morph B. diegensis was recorded on a four-point scale of abundance. Plotting the percentage of mis-identification as a function of the abundance of two-color B. diegensis shows that the highest mis-identification rates were observed where twocolor B. diegensis colonies were relatively common (Figure 4). From this plot, it is clear that the typical two-color morph is easily identified as compared to B. violaceus. It also confirms that the distinctive 'B. diegensis' two-color morph is associated with a conspecific uniformly colored morph. Although the number of sites where the two morphs were both identified is low, the correlation further suggests that these two morphs within B. diegensis occur in relatively constant proportions from place to place. Interestingly, the single-color morph of B. diegensis shares haplotypes with the two-color morph (Supplementary Table S1 compared to Supplementary Table S3). In addition, colonies of the same color can be found in both single-color B. diegensis and in B. violaceus.
Even before becoming the barcoding marker of choice for many groups, COI had been extensively used in phylogeographic studies, thus to investigate intra-specific diversity and evolutionary history (Avise, 2000;Beheregaray, 2008;Hickerson et al., 2010). Such population-based investigations often revealed divergent lineages, which, through an integrative taxonomy approach, were sometimes assigned to new species (Pante et al., 2015). Botryllus and Botrylloides are complex taxa in which species status has already been debated (Brunetti, 2009;Reem et al., 2018). For instance, Bock et al. (2012) showed that at least three divergent clades that might be cryptic species characterize B. schlosseri. Griggio et al. (2014) further indicated that the clade A (as defined in Bock et al., 2012) may in turn be undergoing speciation. Our eight reference sequences for this accepted species were obtained from specimens collected in Brittany and south-western United Kingdom. They are also indicative of the presence of two divergent clades in the W English Channel (Figure 2), which are corresponding to clade A and E described by Bock et al. (2012). These authors also showed they co-occur in three populations (namely Brest, Falmouth and Brixham) from EC. Similarly, we found one haplotype, which we called BvX-H6, from colonies externally somewhat similar to B. violaceus, that showed a very large sequence divergence from the other studied Botrylloides species, with a divergence similar (ca. 20%) to the one measured between reference sequences of B. violaceus and B. diegensis ( Table 2 and Supplementary Figure S1).
Two populations from United Kingdom, namely Southsea and Gosport, where the first dataset revealed the presence of BvX-H6 were sampled a second time in June 2012. Out of the 13 and 35 specimens examined, 77 and 24% specimens showed this particular haplotype in Southsea and Gosport, respectively. In Gosport, the other specimens were all B. diegensis and in Southsea they were all B. violaceus (Supplementary Figure  S2). As part of another population-genetics study, we used the microsatellite markers specifically developed to target B. violaceus (Molecular Ecology Resources Primer Development Constorium et al., 2010) on the colonies assigned here to B. violaceus (with haplotypes Bv-H1 to Bv-H5) and to this new clade. Interestingly, whereas all the colonies with COI haplotypes Bv-H1 to Bv-H5 were easily genotyped with these microsatellites, no PCR product could be obtained for any of the BvX-H6 colonies (data not shown, FV, CR, JB unpublished data). This amplification failure strongly suggests that the colonies with haplotype BvX-H6 represent a distinct cryptic species in the EC. The clustering of this form with the Botrylloides colony from Miseno Lagoon (Italy) and with Botrylloides israeliense (Figure 3) indicates that the same divergent clade is represented in the Mediterranean Sea, but the reported divergence between the two geographical regions (ca. 6%), based on comparison of just 395 base pairs, leaves it unclear whether the EC colonies with haplotype BvX-H6 should be regarded as a somewhat divergent lineage of B. israeliense or as a sister species which is possibly undescribed. Discrepancies between the outcome of the GYMC and mPTP species delimitation methods left us with uncertainties regarding the status of this new lineage. The external appearance of B. israeliense shown by Reem et al. (2018: Figure 1b) does not suggest close kinship with the entity represented by BvX-H6 (Supplementary Figure S3, panels 7 and 8), but further molecular studies, including new species delimitation analyses with other markers, ideally coupled with detailed morphological comparison, will be necessary to resolve the status of this lineage and the exact relationship between the EC and Mediterranean populations.
Another facet of this work has been the documentation of some errors in public databases. Such errors have been pointed out previously (e.g., Harris, 2003;Vilgalys, 2003, and references herein). In our case, this issue was assessed by using simple identification searches in the BOLD system and in the NCBI portal. The outcome is illustrated in the Mediterranean Sea with a phylogenetic tree in Figure 3, where it is clear that all our B. diegensis references are mixed with sequences named B. leachii in Genbank. Further, new sequences obtained confirmed that B. diegensis in the Mediterranean can also have a two-color pattern (Venice or Sète specimens) or a single reddish coloration (Ebro Delta). We failed to find any sequence, either published or new, of Mediterranean B. leachii. Considering that this is the Botrylloides species most commonly mentioned in this sea, our results suggest that the presence of B. diegensis might have been strongly underestimated in the Mediterranean region. We also confirm the presence of B. violaceus in the Venice Lagoon, with external aspect identical to single-color colonies of B. diegensis.
The case of B. diegensis and B. violaceus is a particularly acute issue as these are two NIS, widespread in the northern hemisphere. The lack of reliability of public databases for notorious NIS is an important shortcoming for their early detection and thus effective management (Darling et al., 2017). In this context and for the specific case examined here, it is noteworthy that this mis-identification of B. diegensis occurred during a survey of Mediterranean harbors with the specific aim of examining the distribution of NIS in these well-known introduction hotspots (López-Legentil et al., 2015), and during collections by JB and FV in the EC, specifically targeting B. violaceus in marinas for population-genetic studies. We now have evidence for the presence of B. diegensis in Venice (Italy), Delta of Ebro (Spain), along the Catalan coast (Spain) and in Sète (France). Nevertheless, B. diegensis has not been formally reported in the Mediterranean, although Brunetti and Mastrototaro (2017) mention personal communication by A. and W. Bay-Nouailhat indicating its presence in the W Mediterranean. B. diegensis might be not only present but common and widespread along the Mediterranean coast, although commonly mis-identified as B. leachii. Confusion between B. diegensis and B. leachii might have contributed to statements that these species share very similar morphology (e.g., Van Name, 1945), and Brunetti (2009) synonymized them, although Brunetti and Mastrototaro (2017) treat them as separate. There is little doubt that the distinctive two-color morph now attributed to B. diegensis is a relatively recent arrival in NW Europe, rather than a long-established, potentially native, species such as B. leachii. The very striking two-color morph is not illustrated or mentioned in accounts of Botrylloides in less recent identification guides to the region (e.g., Eales, 1939;Berrill, 1950;Barrett and Yonge, 1958;Millar, 1970;Picton, 1985;Hayward and Ryland, 1990;Gibson et al., 2001). Furthermore, the continuing spread of the distinctive two-color morph to new regions of the English coast from an apparently recent origin on the south coast has been documented in repeated field surveys since 2004 (e.g., Wood et al., 2016). RAS rely substantially on clearly defined field characteristics, which may be elusive in some groups; detailed molecular and morphological studies are called for before field identification can be deemed reliable for the important ascidian genus Botrylloides.

DATA AVAILABILITY STATEMENT
The haplotypes generated by Sanger sequencing have been deposited in GenBank (accession numbers MK978800-MK978824 for the EC sampling and reference sequences and MN076465-MN076483 for the Mediterranean sampling).

AUTHOR CONTRIBUTIONS
FV and JB contributed to the conception and design of the study. FV, JB, SB, and XT provided the samples and/or carried out the sampling. CR, SB, and XT performed the molecular work or provided the sequence data. FV and CR made the analyses. FV wrote the first draft of the manuscript with substantial contributions of JB for some sections and XT for the Mediterranean part of the manuscript. All authors contributed to the manuscript revision, and read and approved the submitted version.

FUNDING
This work was part of the Interreg IVA project 'Marinexus' financed under the European Regional Development Fund. And also received further support from the AquaNIS2.0 project funded by the Fondation TOTAL, and from project CTM2017-88080 (MCIU/AEI/FEDER/UE) of the Spanish Government.