Complete Mitochondrial Genomes of Paedocypris micromegethes and Paedocypris carbunculus Reveal Conserved Gene Order and Phylogenetic Relationships of Miniaturized Cyprinids

Citation: Sam K-K, Lau N-S, Shu-Chien AC, Muchlisin ZA and Nugroho RA (2021) Complete Mitochondrial Genomes of Paedocypris micromegethes and Paedocypris carbunculus Reveal Conserved Gene Order and Phylogenetic Relationships of Miniaturized Cyprinids. Front. Ecol. Evol. 9:662501. doi: 10.3389/fevo.2021.662501


INTRODUCTION
Miniaturization, or the evolution of small size, is widely seen in vertebrate species and is best documented in amphibians and fishes (Weitzman and Vari, 1988;Hanken, 1993;Hanken and Wake, 1993). The highly acidic blackwater peat swamps in Southeast Asia are hotspots for an exceptionally large number of miniature fishes (Kottelat et al., 2006). Among these, members of the genus Paedocypris (Paedocypris progenetica, Paedocypris micromegethes, and Paedocypris carbunculus) display the most intriguing form of miniaturization, with maximum adult sizes of 10-12 mm in standard length (Kottelat et al., 2006;Britz and Kottelat, 2008). Miniaturization in Paedocypris stems from an event known as progenetic paedomorphosis or developmental truncation, which results in adult forms resembling the larval stages of their close relatives (Britz et al., 2014). The miniaturized phenotypes represent unique combinations of ancestral and derived traits such as reduction and structural simplification, morphological novelty, and increased morphological variability (Hanken, 1993;Hanken and Wake, 1993). With more than 3,000 species, Cyprinidae (Teleostei: Ostariophysi: Cypriniformes) is the most species-rich family of vertebrates, distributed in freshwaters throughout North America, Africa, and Eurasia (Nelson, 2006). Paedocypris is placed within the subfamily Danioninae, which also includes two other paedomorphic groups, Sundadanio and Danionella (Fang et al., 2009;Tang et al., 2010).
There has been considerable interest in resolving the phylogenetic position of Paedocypris using morphological features (Britz and Conway, 2009;Britz et al., 2014), a combination of nuclear and/or mitochondrial markers (Fang et al., 2009;Mayden and Chen, 2010;Tang et al., 2010, 2013;Hirt et al., 2017), or individual markers alone (Ruber et al., 2007). Consequently, Paedocypris has been placed within the Danioninae subfamily (Ruber et al., 2007;Fang et al., 2009;Tang et al., 2010, 2013), within Cyprinidae but outside Danioninae (Yang et al., 2015), or as a lineage sister to Cypriniformes (Mayden and Chen, 2010;Stout et al., 2016;Hirt et al., 2017). These differences could be due to the simplified anatomical structure and some highly derived autapomorphic characters of Paedocypris, which complicate morphological classification (Britz and Conway, 2009). Base compositional heterogeneity is also known to affect phylogenetic reconstruction, and high rates of molecular evolution may undermine inference of the phylogenetic position of miniaturized taxa because of long-branch attraction (Hirt et al., 2017).
Mitochondrial DNA (mtDNA) has been widely used in phylogenetic, molecular evolution, and phylogeographic studies because of its maternal inheritance, compact gene organization, low recombination frequency, and rapid evolutionary rate (Miya et al., 2003;Gissi et al., 2008). The circular mitogenomes of fish, similar in composition and structure to those of most vertebrates, are 15-20 Kb in size and contain 37 genes, namely 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), and two ribosomal RNA genes (rRNAs), together with a control region and an origin of L-strand replication (Boore, 1999;Satoh et al., 2016). The mitochondrial gene set, including 13 PCGs and two rRNAs, has been consistently used as markers to improve classification or to resolve higher-level relationships of fish species (Mullens et al., 2020;Sun et al., 2021). The mitogenome, with more informative sites than shorter sequences, is now a widely used tool for resolving deep-branching lineages at many taxonomic levels (Boore et al., 2005).
In view of the unresolved phylogenetic position of the genus Paedocypris and the paucity of knowledge on the mitogenome content of miniaturized animals, the complete mitochondrial genomes of P. micromegethes and P. carbunculus were sequenced, assembled, and annotated. This enables the use of 13 mitochondrial PCGs and two rRNAs from the complete mitochondrial genomes to study the phylogenetic relationships of Paedocypris. In addition, a comparative mitogenome analysis with other miniature fish, Danioninae, and Cyprinidae species was carried out.

MATERIALS AND METHODS
All experimental procedures complied with the current animal ethics guidelines and were approved by the USM Institutional Animal Care and Use Committee (USM IACUC). Total genomic DNA was extracted from the whole fish using the Genomic-tip 100/G kit (Qiagen, Germany). The DNA quality was assessed by gel electrophoresis and the DNA concentration was measured using the Qubit 2.0 Fluorometer (Life Technologies, USA). Paired-end libraries were prepared for P. micromegethes and P. carbunculus using the TruSeq DNA PCR-free Library Prep kit (Illumina, USA). The constructed libraries were sequenced on a HiSeq platform (Illumina, USA), generating 2 × 150 bp paired-end reads.

Mitochondrial Genome Assembly, Annotation, and Analyses
8.5 million reads totaling 1.3 Gb were used for the mitochondrial genome assemblies of P. micromegethes and P. carbunculus, respectively. The complete mitogenomes were assembled in MITObim v1.9 (Hahn et al., 2013) with the sequence of a closely related species, P. progenetica, as reference (NC_020436). The assembled mitogenomes of P. micromegethes and P. carbunculus were annotated using the MitoAnnotator (Iwasaki et al., 2013) and MITOS (Bernt et al., 2013) web servers. The secondary structures of tRNAs were inferred using the mitochondrial tRNA finder as implemented in MITOS and visualized on the Forna web server (Kerpedjiev et al., 2015). The secondary structures of the origin of L-strand replication and the control region were analyzed using mfold v3.6 (Zuker, 2003). Strand asymmetry was calculated according to the formulas AT-skew = (A-T)/(A+T) and GC-skew = (G-C)/(G+C) (Perna and Kocher, 1995). The codon usage and relative synonymous codon usage of the PCGs were analyzed with MEGA X (Kumar et al., 2018) using the vertebrate mitochondrial genetic code. The ratios of nonsynonymous (Ka) to synonymous (Ks) substitutions of the PCGs were estimated with KaKs_Calculator v2.0 (Zhang et al., 2006) using genetic code 2 and the γ-MYN model. The Ka and Ks values were based on pairwise comparisons between the four Paedocypris species and Sundadanio rubellus (AP011401, listed in GenBank under the trade name "Sundadanio axelrodi 'red'") as outgroup. Sundadanio is the genus most closely related to Paedocypris.
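The strand asymmetry formulas above can be illustrated with a minimal Python sketch. The function name and the toy sequence are assumptions made for illustration only; they are not taken from the study's own scripts or data.

```python
from collections import Counter


def skews(seq: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) for a nucleotide sequence.

    AT-skew = (A - T) / (A + T); GC-skew = (G - C) / (G + C).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


# Toy sequence (hypothetical, not from the Paedocypris mitogenomes).
at, gc = skews("AAATTGCCC")
print(f"AT-skew = {at:.2f}, GC-skew = {gc:.2f}")
```

A positive AT-skew means more A than T on the analyzed strand, and a negative GC-skew means more C than G.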

Phylogenetic Analyses
The phylogenetic relationships were reconstructed based on the nucleotide sequences of 13 PCGs and two rRNA genes from 75 Cyprinidae species. Along with P. micromegethes and P. carbunculus, sequences of selected Danioninae, Acheilognathinae, Cultrinae, Xenocyprinae, Squaliobarbinae, Opsariichthys group, and Leuciscinae species were used for the phylogenetic analyses, and Gobioninae were used as the outgroup (Supplementary Table 1). The alignment was performed using MUSCLE, followed by alignment trimming using trimAl v1.2 and concatenation of the trimmed sequences into a supermatrix file with FASconCAT v1.11. The best-fit model for the alignment was estimated with jModelTest v2.1.10 based on the Akaike Information Criterion. The concatenated nucleotide alignment was used to perform maximum likelihood (ML) phylogenetic analysis in RAxML (Stamatakis, 2014) with the GTR + G + I model and 1,000 bootstrap replicates. Bayesian inference (BI) was conducted with MrBayes v3.2.3 (Huelsenbeck and Ronquist, 2001) using the same supermatrix file in nexus format and the same model (GTR + G + I). Four Markov chain Monte Carlo (MCMC) chains were run for 1 million generations with sampling every 100 generations, and the first 25% of the trees were discarded as burn-in. Tracer v1.7.1 was used to assess chain convergence. FigTree v1.4.2 was used to visualize and edit the phylogenetic trees.
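The concatenation step can be sketched as follows. This is a simplified Python illustration of building a supermatrix from per-gene alignments, not a reimplementation of FASconCAT; the function name, the gap-padding of taxa missing from a gene, and the toy data are all assumptions.

```python
def concatenate(alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments into one supermatrix.

    `alignments` maps gene name -> {taxon: aligned sequence}.
    Taxa absent from a gene are padded with '-' (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, "-" * width)
    return supermatrix


# Toy alignments (hypothetical taxa and sequences).
genes = {
    "cox1": {"sp1": "ATG", "sp2": "ATA"},
    "rrnS": {"sp1": "GG"},
}
print(concatenate(genes))
```

Genes are appended in the order they appear in the input mapping, so every taxon ends up with columns in the same gene order.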

Genome Organization, Structure, and Composition
The mitochondrial genomes of P. micromegethes and P. carbunculus are circular DNA molecules of 17,208 bp and 17,280 bp in length, respectively (Figure 1A), well within the typical vertebrate mitogenome size range of 15-20 Kb. The Paedocypris mitogenomes also possess a complete gene set identical to that of other vertebrate mitogenomes (Boore, 1999), comprising 13 PCGs, 22 tRNA genes, two rRNA genes [rrnS (12S rRNA) and rrnL (16S rRNA)], a control region, and an origin of L-strand replication (Supplementary Table 2). Our analysis therefore revealed no significant changes in mitochondrial genome size or gene content in the miniature fish compared with other vertebrates. The complete mtDNA sequences of Paedocypris are, however, the longest among the Danioninae mitogenomes examined (Supplementary Table 3). Since the lengths of the PCGs, tRNAs, and rRNAs are conserved, the differences in mitogenome length are mainly attributed to variation in the control region. The mitochondrial control region has a higher evolutionary rate and tends to be under weaker purifying selection than protein-coding genes, allowing the region to accumulate length variation more readily (Tang et al., 2006;Resch et al., 2007). The nucleotide compositions of the complete mtDNA sequences of Paedocypris are biased toward A and T, as has been noted in other Danioninae mitogenomes (Broughton et al., 2001;Chang et al., 2013;Kusuma and Kumazawa, 2016). To evaluate the base bias of the nucleotide composition, we measured skewness in the different gene regions of the P. micromegethes and P. carbunculus mitogenomes and found that the AT-skew values were mostly positive, whereas the GC-skew values were mostly negative (Supplementary Table 4).

Protein-Coding Genes and Codon Usage
Most Paedocypris mitochondrial PCGs are transcribed from the positive strand, except for nad6, which is encoded on the negative strand. All PCGs use the start codon ATG except cox1, which uses GTG. Seven PCGs (nad1, cox1, atp8, atp6, nad4l, nad4, and nad5) end with a complete, canonical stop codon (TAA or TAG), whereas cox2, cox3, nad2, and nad3 have a truncated stop codon (T). Truncated stop codons are common among metazoan mitochondrial genes (Sheffield et al., 2010) and are hypothesized to be completed by posttranscriptional polyadenylation (Ojala et al., 1981).
The relative synonymous codon usage values for the third position of the 13 PCGs are summarized in Figure 1B. The most frequently used codons in the PCGs of Paedocypris were CGA (Arg), CCA (Pro), and UCA (Ser), whereas ACG (Thr), CAG (Gln), and GCG (Ala) were rarely used. There was a bias in favor of A/T over G/C in the third position, as almost all frequently used codons ended with A/T. This A+T bias in the third codon position of Paedocypris PCGs is consistent with the A+T bias of the whole mitogenomes. To investigate the selective pressure across the Paedocypris species, the Ka/Ks ratios for the PCGs of each mitogenome were estimated (Figure 1C). The Ka/Ks values for all 13 PCGs were <1, indicating that these genes are under purifying selection in these species.
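For readers unfamiliar with relative synonymous codon usage (RSCU), the standard definition can be sketched in Python as below: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. This is an illustrative sketch, not MEGA X's implementation; the function name and toy sequence are assumptions. The codon table is the vertebrate mitochondrial code (NCBI translation table 2).

```python
from collections import Counter, defaultdict
from itertools import product

# Vertebrate mitochondrial code (NCBI translation table 2), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU per sense codon for amino acids that occur in `cds`.

    RSCU = observed codon count * number of synonymous codons
           / total count of all codons for that amino acid.
    A trailing incomplete codon (e.g. a truncated stop 'T') is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for synonymous in families.values():
        total = sum(counts[codon] for codon in synonymous)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in synonymous:
            values[codon] = counts[codon] * len(synonymous) / total
    return values


# Toy coding sequence (hypothetical): three CGA codons and one CGG codon.
print(rscu("CGACGACGACGG"))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage of its synonyms, and a value below 1 indicates a codon used less often.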

Transfer RNAs, Ribosomal RNAs, and Control Region
The complete set of 22 tRNA genes typical of metazoan mitogenomes was identified in P. micromegethes and P. carbunculus. Most of the tRNAs could be folded into classic cloverleaf structures, except for trnC and trnS1 (Supplementary Figure 1). In trnC, the loop of the pseudouridine arm (T-arm) was missing, while trnS1 lacked the stem of the T-arm. The function of these aberrant tRNA genes might be complemented by coevolved interacting factors or posttranscriptional RNA editing (Ohtsuki et al., 2002;Chimnaronk et al., 2005). The large (rrnL) and small (rrnS) ribosomal RNA genes in the mitogenomes of P. micromegethes and P. carbunculus were 1,672-1,673 and 947-949 nucleotides long, respectively, and were located between trnF and trnL1, separated from each other by trnV.
A single long intergenic spacer of 1,591-1,663 bp located between trnP and trnF in the mitogenomes of P. micromegethes and P. carbunculus is recognized as the control region (Figure 1D). Multiple copies of a 34-bp tandem repeat, "TGGTATAGTGCATATTATGCTTAATACTACATAG," were detected at the 5′ end of this region (Supplementary Figure 2), contributing to the greater control region length in Paedocypris compared with other Danioninae. Secondary structure prediction revealed a possible folding configuration featuring a stem-loop structure near the 3′ end of this noncoding sequence. The Paedocypris control region also contains conserved blocks, including extended termination associated sequences (ETAS), central conserved domains (CSB-F and CSB-D), and conserved sequence block domains (CSB-1, CSB-2, and CSB-3). CSB-E, a conserved block reported in other teleosts (Lee et al., 1995), was not found in Paedocypris, suggesting that the regulatory mechanism of transcription may differ among species. The origin of L-strand replication, which displays a stem-loop structure, was identified within a cluster of five tRNA genes (WANCY region), as in other cyprinid fishes (Broughton et al., 2001;Wei et al., 2016;Yu et al., 2019).
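Counting copies of the 34-bp repeat in a control region sequence can be sketched in Python as follows. This is only an illustration that looks for exact, back-to-back copies of the motif quoted above; the function name and the synthetic test sequence are assumptions, and imperfect repeat copies would require a dedicated tandem repeat tool.

```python
import re

# The 34-bp repeat unit reported for the Paedocypris control region.
MOTIF = "TGGTATAGTGCATATTATGCTTAATACTACATAG"


def longest_tandem_run(seq: str, motif: str = MOTIF) -> tuple[int, int]:
    """Return (copy_number, start) of the longest run of exact adjacent motif copies.

    `start` is the 0-based position of the run; (0, -1) means no copy was found.
    """
    best = (0, -1)
    pattern = f"(?:{re.escape(motif)})+"
    for match in re.finditer(pattern, seq.upper()):
        copies = (match.end() - match.start()) // len(motif)
        if copies > best[0]:
            best = (copies, match.start())
    return best


# Synthetic sequence (hypothetical): three adjacent copies, then one isolated copy.
toy = "ACGT" + MOTIF * 3 + "TTT" + MOTIF
print(longest_tandem_run(toy))
```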

Gene Arrangements
The pattern of mitochondrial genome arrangement in miniature fish remains elusive. Here, we analyzed the mitochondrial gene arrangement in representative miniature cyprinids (Paedocypris, Sundadanio, and Danionella), Danioninae (Danio, Horadandia, Boraras, Trigonostigma, and Rasbora), and Cyprinidae (Acheilognathus, Ischikauia, Xenocypris, Squaliobarbus, Opsariichthys, Leuciscus, and Gobio) species (Supplementary Figure 3). Comparative analysis revealed that the mitochondrial gene organization in these miniature taxa is highly stable, and that independent miniaturization in these lineages has not altered the mitochondrial gene order. The same mitochondrial gene order is found across the Danioninae subfamily and the Cyprinidae family. The shared mitochondrial gene order among miniature fish, Danioninae, and Cyprinidae species is indicative of common ancestry.
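A gene order comparison of this kind can be sketched in Python as below. Because the mitogenome is circular, two annotations may list the same order starting from different genes, so one list is rotated before comparison. The function name is hypothetical, and the sketch assumes each gene label is unique (for example trnL1 and trnL2 are kept distinct) and that both lists are read along the same strand direction.

```python
def same_circular_order(order_a: list[str], order_b: list[str]) -> bool:
    """Return True if two gene lists describe the same circular gene order."""
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        return False
    if not order_a:
        return True
    # Rotate order_b so that it starts at the first gene of order_a.
    start = order_b.index(order_a[0])
    rotated = order_b[start:] + order_b[:start]
    return rotated == order_a


# Short segment of the typical vertebrate order, used here as a toy example.
reference = ["trnF", "rrnS", "trnV", "rrnL", "trnL1"]
rotated_copy = ["trnV", "rrnL", "trnL1", "trnF", "rrnS"]
rearranged = ["trnF", "rrnL", "trnV", "rrnS", "trnL1"]
print(same_circular_order(reference, rotated_copy))
print(same_circular_order(reference, rearranged))
```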

Phylogenetic Analyses
The phylogenetic position of Paedocypris was analyzed based on the combined mitochondrial gene set of 13 PCGs and two rRNAs (Figure 2 and Supplementary Figure 4). The topologies of the ML and BI trees were mostly congruent, except for the position of Acheilognathinae. Based on the ML analysis, the 75 Cyprinidae species could be divided into five major clades corresponding to the subfamilies Danioninae, Acheilognathinae, [((Cultrinae + Xenocyprinae) + Squaliobarbinae) + Opsariichthys group], Leuciscinae, and Gobioninae. In the BI tree, the clades Danioninae, [((Cultrinae + Xenocyprinae) + Squaliobarbinae) + Opsariichthys group], (Leuciscinae + Acheilognathinae), and Gobioninae were recovered, with Acheilognathinae clustering with Leuciscinae. The recognition of the Danioninae subfamily in its current usage follows Fang et al.'s consideration of Danioninae as the senior synonym of the cyprinid lineage that includes Danio and its closest relatives (Fang et al., 2009). Our mitogenome analyses confirm the previous exclusion of ex-Danioninae taxa (Candidia, Nipponocypris, Opsariichthys, Parazacco, and Zacco) that are not part of Danioninae sensu stricto and have been reassigned to the Opsariichthys group (Liao et al., 2011;Huang et al., 2017). The 28 Danioninae species were paraphyletic, forming two clades in the phylogenetic trees. The first clade contains 21 species from the genera Rasbora, Trigonostigma, Boraras, Horadandia, Danio, and Danionella. The genus Paedocypris, comprising five species (P. progenetica, P. carbunculus, Paedocypris sp. ZF-12274, P. micromegethes, and Paedocypris sp. Banka), was placed in the second clade, sister to representatives of the genus Sundadanio. This finding shows that the genetically closest relationship exists between Paedocypris and Sundadanio. We found that the sequence of P. progenetica is almost identical to that of P. carbunculus, likely due to misidentification of the GenBank specimen.
FIGURE 2 | Phylogenetic tree inferred from nucleotide sequences of 13 protein-coding genes and two rRNAs of the mitogenomes using maximum likelihood analysis. The numbers along branches include bootstrap values. The sequences generated in this study are indicated in bold font.
The mitogenomic phylogeny placed Paedocypris as a sister group of Sundadanio with high support (bootstrap 100% and posterior probability 1.00). This Paedocypris-Sundadanio clade was also resolved with high support (bootstrap 100% and posterior probability 1.00) as a sister group to the remaining Danioninae. This topology is consistent with the phylogenies constructed using cytochrome b, cytochrome c oxidase I, opsin, and recombination activating gene 1 (Ruber et al., 2007;Fang et al., 2009;Tang et al., 2010, 2013). It differs from the findings of Yang et al. (2015), based on nuclear and mitochondrial genes, which place Paedocypris within Cyprinidae but outside Danioninae. Elsewhere, based on nuclear genes and multi-locus analyses, Paedocypris was recovered as a lineage sister to Cypriniformes (Mayden and Chen, 2010;Stout et al., 2016;Hirt et al., 2017). Notably, the miniature taxa in our mitogenomic phylogenies are widely separated from each other. Danionella dracula is sister to the Danio clade; Danio erythromicron is a member of the Danio clade; and Paedocypris + Sundadanio form the sister group to the remainder of the Danioninae. A single clade of miniature taxa is therefore refuted by our observations, and the phylogeny suggests morphological homoplasy. Miniaturization appears to have evolved convergently and repeatedly within the Danioninae, leading to multiple distinct clades of miniature fish.
In this study, the complete mitochondrial genomes of two species of the miniature fish genus Paedocypris (P. micromegethes and P. carbunculus) were analyzed and compared with those of other members of the Danioninae and Cyprinidae. Our results indicate that mitochondrial genome features, including genome size, gene content, and gene arrangement, in Paedocypris spp. were not influenced by miniaturization. The placement of Paedocypris, Sundadanio, and Danionella in different positions in the phylogenetic tree provides a clue to the convergent evolutionary trajectory of the miniature taxa. The complete mitogenome information presented here, including gene content, structure, gene arrangement, and phylogenetic relationships, provides a basis for population genetic and evolutionary studies of the miniature fish and related groups.

DATA AVAILABILITY STATEMENT
The complete mitogenome sequences of P. micromegethes and P. carbunculus have been deposited in GenBank under the accession numbers MT909824 and MT909825, respectively.

ETHICS STATEMENT
All experimental procedures for this study complied with the current animal ethics guidelines and were approved by the USM Institutional Animal Care and Use Committee (USM IACUC).

AUTHOR CONTRIBUTIONS
AS-C and N-SL conceived and designed the experiments. ZM and RN collected and identified the samples. K-KS and N-SL performed the experiments, analyzed the data, and wrote the manuscript. K-KS prepared the figures. All authors reviewed the manuscript.