Variation and Diagnostic Power of the Internal Transcribed Spacer 2 in Mediterranean and Atlantic Eolid Nudibranchs (Mollusca, Gastropoda)

Mediterranean marine biodiversity is still underestimated especially for groups such as nudibranchs. The identification of nudibranchs taxa is challenging because few morphological characters are available and among them chromatic patterns often do not align with species delimitation. Molecular assessments helped unveiling cryptic diversity within nudibranchs and have been mostly based on mitochondrial markers. Fast evolving nuclear markers are much needed to complement phylogenetic and systematic assessments at the species and genus levels. Here, we assess the utility of the nuclear Internal Transcribed Spacer 2 (ITS2) to delimit species in the eolid nudibranchs using both primary and secondary structures. Comparisons between the variation observed at the ITS2 and at the two commonly used mitochondrial markers (COI and 16S) on 14 eolid taxa from 10 genera demonstrate the ability of ITS2 to detect congeneric, closely related, species. While ITS2 has been fruitfully used in several other mollusc taxa, this study represents the first application of this nuclear marker in nudibranchs.


INTRODUCTION
Mediterranean marine biodiversity is still underestimated and new cryptic species continue to be identified, described, and added to our inventory of marine fauna (Calvo et al., 2009;Uriz et al., 2017;González-Castellano et al., 2020;Furfaro et al., 2021). Our knowledge on diversity of groups such as nudibranchs is recently increasing as demonstrated by the number of studies published in the last years (Cella et al., 2016;Furfaro and Trainito, 2017;Korshunova et al., 2017Korshunova et al., , 2019Furfaro et al., 2018;Chimienti et al., 2020;Furfaro and Mariottini, 2020). In nudibranchs, few morphological characters are available, and the same chromatic pattern is often shared between closely related species, thereby making difficult the species identification based on morphology alone (Johnson and Gosliner, 2012;Furfaro et al., 2021). These uncertainties are overcome with the application of an integrative taxonomic approach.
The mitochondrial gene cytochrome oxidase c subunit 1 (COI) is the reference marker for DNA barcoding and species delimitation in animals, due to its fast evolutionary rate that makes this marker very useful to investigate diversity at lower taxonomic levels (Bucklin et al., 2011). Almost all the published studies concerning nudibranchs include the COI marker, and most of them also combine the mitochondrial 16S rRNA, which has proved to be valuable for species delimitation (Lydeard et al., 2000;Alqudah et al., 2015). However, limitations of using only mitochondrial data in taxonomic and evolutionary studies are well known (Moritz and Cicero, 2004) and additional nuclear markers are needed to improve results or resolve species delimitation in closely related taxa. To date, the most used nuclear marker in nudibranch phylogenetics and systematics is the histone H3. However, this marker is weakly informative, given its low evolutionary rate, especially at the species or genus levels (Galia-Camps et al., 2020;Furfaro et al., 2021).
The nuclear Internal Transcribed Spacer 2 (ITS2) rRNA has proved informative at both lower and higher taxonomic levels (Koetschan et al., 2010) in many invertebrates including marine molluscs (Oliverio et al., 2002;Salvi et al., 2010Salvi et al., , 2014Mariottini, 2012, 2017). The ITS2 is located inside the rRNA gene cluster, between the 5.8S and 28S rRNA genes (Tague and Gerbi, 1984). A peculiar feature of this marker is its bivalent patterns of variation: while its primary sequence has a high mutation rate, the secondary RNA structure is very conserved, probably due to its crucial role in the processing of the "pre-rRNA" (Joseph et al., 1999;Côté and Peculis, 2001). For this reason, it has been successfully applied both to barcoding analyses (Müller et al., 2007;Yao et al., 2010) and molecular phylogenies (Wade and Mordan, 2000;Oliverio et al., 2002). In molluscs, the most common secondary structure of the ITS2 rRNA has four/five helices (domains D1-5). The first two domains (D1 and D2) on the 5 region are more conserved and shorter than the two or three domains of the 3 region (D3 and D4/D5), which are often ramified (Joseph et al., 1999;Schultz et al., 2005). In the domains D3 and D4, it is located the so-called apical STEM, which is an extremely conserved sequence that has been described in a few molluscan taxa (Oliverio et al., 2002;Coleman, 2007;Salvi and Mariottini, 2012).
The barcoding power of the ITS2 is anticipated by the finding that there is a correlation between differences at the ITS2 sequence-structure and the biological species concept (Müller et al., 2007). In this context, a very useful character for species delimitation inferences is represented by Compensatory Base Changes (CBCs), i.e., two mutations that occur in a paired region of a primary RNA transcript so that pairing itself is maintained (e.g., G-C mutates to A-U) (Müller et al., 2007).
Despite there are now more than 390,000 sequences available (April 7, 2021) in the "ITS2 Database" 1 (Koetschan et al., 2010), ITS2 studies are still limited to a few groups of molluscs. To date the ITS2 has been used only in two studies on nudibranchs and none of them have combined the analysis of the primary sequence with the secondary structure (Eriksson et al., 2006;Trickey, 2013).
The aim of this study is to assess the utility of the nuclear ITS2 to delimit species in the eolid nudibranchs (suborder Cladobranchia) using both primary and secondary structures and to identify diagnostic CBSs and semi-CBCs that can aid

Analysis of Primary Sequences
ITS2, COI, and 16S DNA sequences were aligned together with GenBank sequences. We used the Muscle algorithm implemented in MEGA X (Kumar et al., 2018) for COI dataset, while for ITS2 and 16S, sequence-structure alignments were built using combined sequence-structure models in 4Sale (Seibel et al., 2008) and further optimised considering the consensus sequence of each rRNA domain. The software "WebLogo" 2 was used to generate a graphical representation of conserved domains and stems based on multiple sequence alignment (Crooks et al., 2004). Genetic distance (p-distance and Kimura 2-paramer, K2p, distance) and segregating sites were calculated using the programme Mega X for each marker (Kumar et al., 2018). Species delimitation analyses based on both genetic distances [Automatic Barcode Gap Discovery (ABGD) (Puillandre et al., 2012) and Species Identifier (Meier et al., 2006)] and on phylogenetic trees [Bayesian and maximum-likelihood analyses and Bayesian 2 https://weblogo.berkeley.edu/logo.cgi Poisson Tree Process (bPTP) (Zhang et al., 2013)] were carried out with the ITS2 dataset. Acanthodoris pilosa (Abildgaard in Müller, 1789) was used as the outgroup in BI and ML analyses. The GTR + I + G model was selected as the best substitution model by JModelTest 0.1 (Posada, 2008) according to the Bayesian information criterion (BIC). BI analyses were carried out with MrBayes 3.2.6 (Ronquist et al., 2012) running four Markov chains of five million generations each, sampled every 1000 generations. Consensus trees were calculated on trees sampled after a burnin of 25%. Maximum-likelihood analyses were performed in raxmlGUI 1.5b2 (Silvestro and Michalak, 2012), a graphical front-end for RAxML 8.2.1 (Stamatakis, 2014), with 100 independent ML searches and 1000 bootstrap replicates. Additionally, the recently developed Assemble Species by Automatic Partitioning (ASAP) analysis (Puillandre et al., 2021) was performed using default parameters.

Secondary Structure Modelling and Compensatory Base Changes (CBCs) Analysis
Complete ITS2 rRNA and partial 16S rRNA (region from L7 to L13 stem-loops) secondary structures were obtained using the programme mfold (Zuker and Jacobson, 1998;Zuker, 2003). Best-supported folding models were predicted combining a thermodynamic approach (Mathews et al., 1999) with a close inspection of paired conserved regions. CBCs of rRNA sequence-structure were calculated and visualised in 4Sale (Seibel et al., 2008).

ITS2, COI, and 16S Primary Sequence Comparisons in Eolid Nudibranchs
Genetic distance values for ITS2 rRNA and the mitochondrial COI and 16S rRNA markers are reported in Genetic distance values of the two ITS2 conserved domains D1 and D2 are reported in Supplementary Table 2. Results from all the species delimitation analyses were congruent to each other (Figure 1) revealing the ability of the ITS2 to detect species even if closely related. Results from BI and ML phylogenetic inferences produced the same topology (statistical supports at each node are reported in Figure 1) and recovered two distinct monophyletic clades within Fjordia lineata (Lovén, 1846) that will need further taxonomic assessment.

ITS2 and 16S Secondary Structure Comparisons in Eolid Nudibranchs
The ITS2 rRNA alignment consisted of 44 sequences, from 14 species in 10 genera and six families (Supplementary Table 1 and Supplementary Figure 1 Frontiers in Marine Science | www.frontiersin.org (Oliverio et al., 2002;Coleman, 2007;Salvi et al., 2010Salvi et al., , 2014Mariottini, 2012, 2017). The ITS2 D1 and D2 helix-loop regions are always identifiable in terms of sequence/structure and position (Figure 3). In fact, the basal double-strand RNA portion of D1 consisting of the very conserved triplet 5 -CGC/GCG-3 is preceded by the triplet 5 -CGU-3 and followed by the conserved single-strand sequence that separates D1 and D2 domains 5 -CUUC-3 . In D1 stem, another conserved base pairing is 5 -G/C-3 positioned next to the triplet 5 -CGC/GCG-3 , present in all the families except for the Flabellinidae (substituted by the CBC 5 -A/U-3 ) and the Facelinidae (substituted by the CBC C/G). The basal doublestrand helix of D2 displays the consensus base pairing 5 -GG/CC-3 (Calmidae, Coryphellidae, Flabellinidae, and Trinchesiidae), but in some families, this paired sequence is substituted by 5 -GA/UC-3 (Fionidae) and 5 -GC/GC-3 or 5 -GU/GC-3 (Facelinidae) where CBCs and semi-CBCs maintain the initial stem base-pairing folding of D2. The ITS2 3 region containing D3-D5 shows a moderate to high divergence structural folding between congeneric species (Figure 2 and Supplementary  Figure 1) and the correct multiple sequence alignments were obtained only considering the derived 2D structures.
In all species examined, there is a very high conserved sequence motif located in the apical double helix-loop region of D5 and identified as the Apical STEM, described in other marine molluscs (Salvi et al., 2010;Salvi and Mariottini, 2012). The consensus Apical STEM of the eolid nudibranchs includes up to nine base pairs: there are five invariant nucleotide positions in the 5 -strand (nnnGCUCGn) out of nine and two invariant ones (nnGnGnnnnn) out of 10 in the 3 -strand (Supplementary  Figure 2). The base-pairing between the two strands does not perfectly match the entire sequences of the double helix. Within the eolid families analysed, the Apical STEM consists of nine base pairs with a single U conserved in all families (Supplementary Figure 2). Caloria elegans (Facelinidae) shows a quite different sequence due to insertions of nucleotide stretches between the stem regions (Supplementary Figure 2). The Apical STEMs of the families Coryphellidae and Flabellinidae are very similar with semi-CBCs that preserve nine base pairs (Supplementary Figure 2).
The number of ITS2 CBCs observed between species of six eolid families is reported in Supplementary  Figure 1) correspond to the domain D5 of the 3 half portion of the gene (Horovitz and Meyer, 1995;Lydeard et al., 2000).

DISCUSSION
Our inventory of Mediterranean nudibranchs is rapidly growing as new species are continuously added, including cryptic species within historically accepted species (Furfaro and Trainito, 2017;Korshunova et al., 2019;Chimienti et al., 2020;Furfaro and Mariottini, 2020). Nowadays, the integrative approach combining morphological and molecular characters is commonly used in nudibranch taxonomy. Ideally, this approach is more robust as more genes and characters are considered. The most used markers in nudibranch systematics are the mitochondrial COI and 16S and the nuclear H3. However, the latter is known to be poorly informative at lower taxonomic levels and for this reason, an additional nuclear "fast-evolving" marker would be extremely valuable. The nuclear ITS2 assessed in this study has revealed a promising marker for nudibranch species identification. The analyses of the ITS2 primary sequence indicate that ITS2 holds a high variability that is comparable with, or higher than, the one of the COI. Species delimitation analyses based on both genetic distances (ABGD, ASAP, and Species Identifier) and on phylogenetic trees (bPTP) (Figure 1) demonstrate the ability of the ITS2 to distinguish nudibranch species. Species identified based on ITS2 are consistent with the a priori species identification (based on morphology and COI data), even for cryptic species such as Trinchesia spp. or closely related species such as Calma spp. which show the lowest divergence values (Table 1). Moreover, phylogenetic trees based on ITS2 sequence data confirm two distinct clades within F. lineata (Lovén, 1846) previously reported in Furfaro et al. (2018) based on COI data and will need further taxonomic assessment. While the combined analysis of the ITS2 sequence-structure is crucial to establish positional homology in multiple alignments, it also provides conserved sequencestructure elements, such as CBCs, that can have diagnostic value at different taxonomic levels up to the species level (Figure 3 and Supplementary Table 3; see Salvi and Mariottini, 2012 for examples on bivalves). This result indicates that also in nudibranchs the presence of CBCs in the RNA folding correlates with distinct species, thus they are very useful for species delimitation, as previously observed in other eukaryotic groups (Müller et al., 2007;Wolf et al., 2013). On a less positive note, the ITS2 amplification revealed difficult for some nudibranch species (e.g., Furfaro et al., 2021). In fact, in the case of the two Mediterranean sibling species Flabellina cavolini (Vérany, 1846) and Flabellina gaditana (Cervera et al., 1987), it exhibited poor sequencing trace quality and for this reason could not be used in that study. The variability in the efficiency of sequencing the ITS2 must be considered and tested while planning the study to perform. Except for these specific cases, however, there is no doubt on the great utility of the ITS2 in the study of animal diversity (Müller et al., 2007;Dong et al., 2011). This study represents the first assessment of the ITS2 in nudibranchs and constitutes a starting point for future works focused on other nudibranch groups.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI GenBank (accession: MW924024-MW924060).