Unraveling the phylogeny of Chaetopteridae (Annelida) through mitochondrial genome analysis

Mitochondrial genomes serve as valuable markers for phylogenetic and evolutionary studies across diverse invertebrate taxa, but their application within Annelida remains limited. In this study, we report the mitochondrial genomes of seven species from four genera of Chaetopteridae (Annelida), obtained by high-throughput sequencing. Phylogenetic analysis was performed using cox1 , 18S , 28S and all mitochondrial genes. Our results reveal Chaetopterus and Mesochaetopterus as well-supported monophyletic sister clades, while Phyllochaetopterus and Spiochaetopterus appear paraphyletic, with species from both genera in a mixed clade sister to Chaetopterus + Mesochaetopterus . While mitochondrial gene orders remain conserved within Chaetopteridae, they appear substantially different from those of the ancestral patterns in Annelida. All 13 protein-coding genes found in Chaetopteridae evolved under strong puri ﬁ cation selection, although Phyllochaetopterus exhibited the highest base-substitution rate for most of them, suggesting a more relaxed puri ﬁ ed selection. Overall, our study provides molecular resources for phylogenetic studies of Chaetopteridae, highlighting the necessity for a comprehensive revision of the family, particularly dealing with the paraphyletic Phyllochaetopterus and Spiochaetopterus .


Introduction
Chaetopteridae is a small family of the phylum Annelida, whose species live in selfsecreted membranous tubes and are commonly found in different habitats from the intertidal to the deep sea (Moore et al., 2017;Britayev and Martin, 2019;Rouse et al., 2022).While most of them live buried in soft sediment, some are attached to rocks, either living alone or in groups, and one species is holoplanktonic (Blake, 1996;Osborn et al., 2007;Nishi and Hsieh, 2009;Nishi et al., 2009).Chaetopterids measure from less than 1 cm to more than 40 cm and usually have less than 60 segments (Moore et al., 2017).Members of the family are easy to be identified by having bodies usually divided into three highly distinct regions.To date, Chaetopteridae contains 79 species belonging to four genera: Chaetopterus Cuvier, 1830, Mesochaetopterus Potts, 1914, Phyllochaetopterus Grube, 1863, and Spiochaetopterus Sars, 1856(Read and Fauchald, 2023).
The phylogenetic position of Chaetopteridae in the tree of life of Annelida has not been stable over the last three decades.Based on morphology, they were considered a family within Spionida, t o g e t he r w i t h t he S p i o n i d a e , Po e c il ic h a e t i d a e , a n d Trochochaetidae, among others, with Trochochaetidae as its sister family (Rouse and Fauchald, 1997).In molecular phylogenetic analyses based on 18S rRNA, Chaetopterus variopedatus (Chaetopteridae) was recovered as sister to a clade containing two species of Brachiopoda and two of Phoronida, which altogether, were sister to Dodecaceria concharum (Spionidae) (McHugh, 2000).Later studies with more genes (Colgan et al., 2006;Rosset et al., 2007), including transcriptomes (Struck, 2011;Weigert et al., 2014) or mitochondrial genomes (mtgenome) (Weigert et al., 2016;Struck et al., 2023), usually recovered Chaetopteridae as one of the basal groups of Annelida.However, the relationships among the basal annelid groups (i.e., Amphinomidae, Magelonidae, Myzostomida, Oweniidae, Sipuncula) and between annelids and other lophotrochozoans (i.e., Nemertea, Brachiopoda, Mollusca) remained unstable.
Within Chaetopteridae, there have also been inconsistencies in the monophyly of the four traditionally recognized genera, as well as in the phylogenetic relationships among them.Chaetopterus and Mesochaetopterus were paraphyletic based on cox1 only (Morineaux et al., 2010;Zhang et al., 2015), but monophyletic based on the combined dataset of cox1 and 18S (Martin et al., 2022) and cox1, 18S and 28S genes (Zhang et al., 2015;Moore et al., 2017), while the same authors found Phyllochaetopterus and Spiochaetopterus as paraphyletic, forming a sister clade to Mesochaetopterus + Chaetopterus (Osborn et al., 2007;Zhang et al., 2015).Therefore, more taxon sampling within each chaetopterid genus and more genetic data for each species are required to provide a well-resolved phylogenetic tree of Chaetopteridae and to properly place this family in the annelid tree of life.
In this study, we conducted high-throughput low-coverage sequencing using the Illumina platform to obtain the mtgenomes of seven species of Chaetopteridae, attempting to contribute to a better understanding of their evolution within the family, as well as of phylogenetic relationships among the four genera.
2 Materials and methods  1, Figure 1) and preserved in 95% ethanol for DNA extraction or 4% formalin for morphological observations.Genomic DNA for each species was extracted using the CTAB method.Paired-end sequencing on the Illumina platform was performed at Novegene or Sangon to obtain 3.67 to 8.86 Gb 150 bp data (Supplementary Table 1).

Mtgenome annotation
The chaetopterid mitochondrial genes or mtgenome or 18S/28S rRNA were detected using BLAST V2.13.0 (Camacho and Madden, 2013).The mitogenomes were annotated on the MITOS web server using genetic code 05 for invertebrates (Bernt et al., 2013c).The boundaries of protein-coding genes (PCGs) and rRNA genes were manually examined and adjusted based on alignment with published mtgenomes of this family.The GC-skew and AT-skew were defined according to Perna and Kocher (1995): AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C).

Phylogenetic analyses
A three-gene dataset (mitochondrial cox1, nuclear 18S and 28S rRNA genes) contained 50 species, including 41 published in Moore et al. (2017) and Britayev et al. (2017), seven sequenced in this study, and two as outgroups, Eurythoe complanata (Pallas, 1766) and Owenia fusiformis Delle Chiaje, 1844, were used in our phylogenetic analyses (Supplementary Table 2).The sequences of each gene were aligned using the MAFFT version 7 web server (Katoh et al., 2017), and concatenated using PhyloSuite (Zhang et al., 2020).Poorly arranged locations and very dispersive regions were removed using less stringent selection settings of Gblocks Server which include smaller final blocks, gap positions within the final blocks, and less strict flanking positions (Talavera and Castresana, 2007).The best nucleotide evolutionary model for each partition was selected based on the Akaike information criterion (AIC) (Darriba et al., 2012) of the PartitionFinder2 module in PhyloSuite.Phylogenetic analysis was performed using Maximum Likelihood (ML) and Bayesian inference (BI) were respectively executed using the IQ-TREE module in PhyloSuite with 10,000 Ultrafast Bootstrap (UFBoot) replicates with the SH-aLRT test (Minh et al., 2013) and the MrBayes module in PhyloSuite with 1000 Sampling Freq replicates.
Phylogenetic analyses were also conducted using all mitochondrial genes (13 PCGs, 2 rRNAs, and 22 tRNAs) for all mtgenomes, except in Spiochaetopterus sp.A which contained only 3 PCGs and 11 tRNAs.Eurythoe complanata (Pallas, 1766) (accession number KT726962.1)and Owenia fusiformis (Delle Chiaje, 1844) (accession number NC_028712.1)were used as the outgroups.The saturation of genetic sequences was assessed using Species sequenced in this study.The species with incomplete names will be described as new species in another paper.
DAMBE (Xia, 2018).The methods for sequence alignment, concatenations, removal of poorly aligned locations, model selection and phylogenetic tree construction were the same as those mentioned above.
To assess the phylogenetic placement of Chaetopteridae in Annelida, we used all protein-coding amino acid datasets of 240 species in Struck et al. (2023) and the seven here sequenced, with Terebratulina retusa (Linnaeus, 1758) as outgroup (Supplementary Table 2).The methods for model selection and ML phylogenetic tree construction were the same as the above, while BI was executed with two chains for 5,000,000 generations using Mpi-MrBayes v3.2 (Ronquist et al., 2012).In ML, a bootstrap >90 was considered strong clade support, 70-90 as moderate, and < 70 as weak support (Krenz et al., 2005).For BI, posterior probabilities > 0.95 were considered strong support (Jacobsen et al., 2010).

Mitochondrial gene order rearrangement
CREx2 was used to assess the mitogenomic rearrangement (Bernt et al., 2007), and the TreeRex (Bernt and Middendorf, 2011) allowing deducing the ancestral gene order of the inner nodes for Chaetopteridae and predicting the evolution process of present species based on the given phylogenetic tree.The parameters were set following the software recommendations: -s (strong consistency method), -w (weak consistency method), and -W (parsimonious weak consistency method).

Genetic distance and base substitution rates of mitochondrial genes
The pairwise genetic distance analyses of amino acid sequences of each mitochondrial PCG were conducted using MEGA X (Kumar et al., 2018).Base-substitution rates for the 13 PCGs of all chaetopterid mtgenomes were calculated followed by Sun et al. (2021).In short, the sequences for each gene were aligned using default parameters of the Muscle module within MEGA X.The non-synonymous to synonymous rate ratio (Ka/Ks) was then calculated using the YN method implemented in KaKs_Calculator 2.0, with Ka/Ks indicating the strength of the selective pressure as > 1 positive selection, = 1 neutral evolution and < 1, purified selection (Yang and Nielsen, 2000).Spiochaetopterus sp.A was excluded from this analysis due to its incomplete mtgenome.

Phylogeny tree based on all mtgenome genes
The phylogenetic analyses were based on a data matrix containing 16,939 characters, including 9,900 bp from 13 PCGs, 2,687 bp from two rRNA genes and 1,630 bp from 22 tRNA genes (Figure 4; Supplementary Table 7).Five out of 13 PCGs showed significant saturation but were included because they contained phylogenetic information (Supplementary Table 8).The best models selected for the phylogenetic analyses were shown in Supplementary Table 9 and they produced similar tree topologies to those based on the three-gene dataset (Figure 4).All species of Chaetopteridae form a well-supported clade (ML/BI: 100/1) with two sub-clades, one with three species of Chaetopterus and two of Mesochaetopterus, both well supported (ML/BI: 100/1) consistently with the three-gene dataset results, in which Chaetopterus and Mesochaetopterus are also monophyletic.Another well-supported sub-clade consisted of species from Phyllochaetopterus and Spiochaetopterus, but this showed the three species of Phyllochaetopterus as paraphyletic with species of Spiochaetopterus and Phyllochaetopterus from the same location clustered together.

Mitochondrial gene order rearrangements
There were seven gene order patterns in the chaetopterid mtgenomes, including three in Chaetopterus and Mesochaetopterus, three in Phyllochaetopterus and one in Spiochaetopterus (Figure 5).The gene order distance ranges from 1 to 7 (Supplementary Table 10).Those between Chaetopterus and Mesochaetopterus are similar (1 to 2) but they are substantially larger than with Phyllochaetopterus (4 to 7).
The gene order evolutionary patterns were predicted by TreeREx based on the phylogenetic tree (Figure 5; Supplementary Table 11).Reversions were detected in the most recent common ancestor of the M. tingkokensis-M.japonicus clade (trnE and trnF), in C. qiani (trnY and trnM), when compared with its most recent common ancestor with Chaetopterus.Two transpositions were detected in the most recent common ancestor of two Phyllochaetopterus sp.(OR637234 and KT726961.1)(trnN and trnK), and P. hainanensis and Spiochaetopterus sp.B (trnD and trnN), compared with its most recent hypothetical common ancestral gene order.One reversion (trnA-trnP and trnR-trnD) was detected in Phyllochaetopterus sp (KT726961.1)and two transpositions (trnC and trnM-rrnS) and one reversion (trnP and trnA) in Spiochaetopterus sp.B, compared with that of its most recent common ancestor.

Purifying selection of 13 mitochondrial PCGs
The Ka/Ks of the 13 chaetopterid mitochondrial PCGs are all lower than 0.5, suggesting that they have undergone strong purification selection.Nevertheless, the Ka/Ks of atp8 (0.378) and nad4 (0.423) are higher than those of the other PCGs (Figure 7A), with the smallest in the complex IV (i.e.cox1 = 0.018, cox2 = 0.051, cox3 = 0.035) and nad5 (0.070) (Figure 7A).Compared with Chaetopterus and Mesochaetopterus, Ka/Ks was higher in Phyllochaetopterus for all mitochondrial PCGs except nad4 (Figure 7B), indicating a general relaxation of purifying selection in the mtgenomes of this genus.Hypothetical ancestral gene order of mtgenomes for Chaetopteridae and gene order rearrangement scenarios.R= Reversion and T= transposition, with the related genes being marked by lines and triangles, respectively.The ML/BI phylogenetic tree was constructed using the concatenated datasets of the amino acid sequences of 13 PCGs of mitogenome, with 247 annelid terminals (Supplementary Table 2B, Supplementary Figure 1).Our results show the Oweniidae and Magelonidae as sister groups, forming a sister clade to Chaetopteridae + all other annelid families (ML/BI: 95/#).

Structural features of the chaetopterid mtgenomes
The mitochondrial intergenic gaps of metazoans are usually composed of very short non-coding regions or very few overlapping bases (Zhong et al., 2008).Therefore, the length of metazoan mtgenome is generally stable, especially in vertebrates.Conversely, annelid mtgenomes vary substantially from 14kb (Aguado et al., 2016) to 25kb (Seixas et al., 2017;Sun et al., 2021), although chaetopterid mtgenomes show a smaller range from 15kb (Chaetopterus) to 21kb (Phyllochaetopterus). Longer annelid mtgenomes obey either an enormous control region (Dloop) (as in Spirobranchus and Siboglinum) or non-coding region within the cox1 gene (the group II intron) and many huge intergenic gaps (as in Glycera, Hydroides and Polynoidae).Longer mitogenomes generally contain more long non-coding regions and a non-conserved gene order, as in Hydroides spp.(Sun et al., 2021), Siboglinidae (Siboglinum fiordicum Webb, 1963, 19,502 bp;Li et al., 2015).However, there are some exceptions, such as among Glyceridae (Glycera unicornis Lamarck, 1818, 20,366 bp;Richter et al., 2015).In Phyllochaetopterus and Mesochaetopterus, large mitogenomes were either due to the group II introns among the cox1, to the non-coding region within the nad2, or numerous huge intergenic non-coding regions.Long non-coding regions were produced probably during changes in gene structure in Chaetopteridae, since intergenic non-coding spaces have been suggested to facilitate the inception of replication (Pons et al., 2014), or as remnants of gene order change (such as TDRL and transpositions) (Seixas et al., 2017), strands of the template mispairing, or imprecise termination during replication (Bernt et al., 2013b;Aguado et al., 2016).
The mtgenome gene orders of Chaetopteridae greatly differ from those of the ancestral annelid and most other annelids, although it is relatively conserved and shows three conserved gene blocks in most species (Figure 2).Among the ten mtgenomes, those of Chaetopterus and Mesochaetopterus show very similar gene orders, with only one reversion in the position of trnY and trnM or trnE and trnF.The Phyllochaetopterus and Spiochaetopterus mtgenomes are also very similar, with several unstable positions of tRNA (i.e.trnR, trnD, trnP, trnN, trnK, trnC) and a fragment of trnM-rrnS.These results are consistent with the close phylogenetic relationships between Chaetopterus and Mesochaetopterus, and between Spiochaetopterus and Phyllochaetopterus, respectively.
The monophyly of Chaetopterus and Mesochaetopterus was not supported by morphology and BI based on 28S (Osborn et al., 2007;Morineaux et al., 2010).In this study, Chaetopterus appears as a well-supported sister taxon to Mesochaetopterus, in agreement with Moore et al. (2017); Martin et al. (2022) and Zhang et al. (2015).The paraphyletic relationship of Phyllochaetopterus + Spiochaetopterus is consistent with the results of the three scholars above.Major revisions are needed to define more distinguishing characteristics of these two genera instead of the present/absent of notopodial cirrus of A1.
Among the PCGs, atp8 has the highest pairwise genetic distance (Figure 6), followed by nad2 and nad6, suggesting faster substitution rates compared to other PCGs, as previously shown in deep-sea Polynoidae (Zhang et al., 2018), in which atp8 is the fastest-evolving mitochondrial gene (Sun et al., 2021).Additionally, the PCG genetic distance in Phyllochaetopterus was generally greater than that of the other genera, suggesting different selective pressures and faster mutation rates in its mitogenomes.In addition, the branch length of the Phyllochaetopterus + Spiochaetopterus clade was longer than that of the Chaetopterus + Mesochaetopterus clade (Figures 3, 4), indicating a faster substitution rate in the former.

Mtgenome rearrangement
Chaetopteridae exhibits distinct mtgenome gene order arrangement patterns compared to other annelids.Gene orders were initially considered as conserved among annelid mtgenomes (Jennings and Halanych, 2005), which is still true for many families (Weigert et al., 2016).However, substantial gene order arrangements have been found among Syllidae (Aguado et al., 2015(Aguado et al., , 2016)), deep-sea Polynoidae (Zhang et al., 2018) and Serpulidae (Sun et al., 2021).Chaetopteridae shows different mtgenomes than the ancestral annelids (excluding Oweniidae and Magelonidae), mainly in the atp6, nad1, nad6, cox3, cox1, and several tRNAs positions (Supplementary Figure 2).However, its gene order is relatively conserved, with three conserved regions in all species except Spiochaetopterus (Figure 2).The uncertain evolutional relationships between Chaetopteridae and all other polychaetes, the gene order of its most recent ancestor cannot be determined.Therefore, it is not clear how the mitochondrial gene order of the common ancestor of Chaetopterus and Mesochaetopterus or Phyllochaetopterus and Spiochaetopterus evolved from that of the chaetopterid common ancestor.

Base-substitution rates of mtgenomes
The diversity of gene orders is positively related to the substitution rate (Bernt et al., 2013a, b;Luo et al., 2015).The conserved functions in the respiration of mitochondrial genes explain why they are undergoing purifying selection (Stewart et al., 2008).Our study shows the Ka/Ks for all chaetopterid protein-coding genes as being lower than 1, indicating purifying selection (Figure 7), while being quite fast evolving compared to other polychaete (Zhang et al., 2018) or invertebrates (Ren et al., 2010).Phyllochaetopterus shows higher base-substitute rates for most mitochondrial PCGs than Chaetopterus or Mesochaetopterus, indicating faster mutation rate that is consistent with their longer branch length in phylogenetic trees.Different mutation rates have also been shown among Polynoidae, with the deep-sea genera Branchipolynoe and Branchinotogluma, having high mutation rates that suggest adaptation values (Zhang et al., 2018).
Overall, our results show different mitochondrial gene orders in Chaetopteridae compared to the conserved Annelida pattern, despite most species in this family sharing three singular conserved regions.Chaetopterus and Mesochaetopterus are sister groups forming well-supported monophyletic clades, which, together, are sister to a paraphyletic Phyllochaetopterus clade with Spiochaetopterus as subclade.The mitochondrial PCG genes in all examined chaetopterids have undergone purifying selection, but in most cases, those of Phyllochaetopterus show a higher basesubstitution rate than those of the other genera.

FIGURE 2
FIGURE 2Comparison of gene orders of chaetopterid mtgenomes.Conserved gene clusters of annelids are marked by different color bars.White circles represent non-coding regions with >100 bp sequences between genes.White blocks represent the location of control region in mtgenome.Triangles stand for a long non-coding region within a gene.Missing genes are indicated by black blocks.

FIGURE 3
FIGURE 3Phylogenetic tree of Chaetopteridae reconstructed based on cox1, 18S and 28S rRNA genes by ML/BI methods.The ML tree was displayed.The species newly sequenced for this study are in bold.Bootstrap support values of ML (left) and posterior probability of BI (right) are indicated above the nodes.Asterisks stand for maximum supports and pound keys stand for < 50 or unsupported values.

FIGURE 4
FIGURE 4Phylogenetic tree of Chaetopteridae constructed based on all mitochondrial genes by ML/BI methods.The ML tree was displayed.The species newly sequenced for this study are in bold.Bootstrap support values of ML (left) and posterior probability of BI (right) are indicated above nodes.Asterisks stand for maximum supports and pound keys stand for unsupported values.
FIGURE 7 Base substitution rate of the 13 mitochondrial PCGs found in Chaetopteridae.(A) Ka/Ks for each mitochondrial gene.(B) Ka/Ks of each mitochondrial gene compared among different genera.Ka/Ks <1 indicates purification selection.Ch, Chaetopterus; Ph, Phyllochaetopterus; Me, Mesochaetopterus.

TABLE 1
Collection information of specimens used in this study.
*The specimens will be deposited at the Institute of Oceanology, Chinese Academy of Science (IOCAS), Qingdao, China.#

TABLE 2
Information on the mtgenomes of ten species of Chaetopteridae.