Mitochondrial Genome Structures and Phylogenetic Analyses of Two Tropical Characidae Fishes

The Characidae family contains the largest number of tropical fish species. Morphological similarities make species identification difficult within this family. Here, the complete mitogenomes of two Characidae fishes were determined and compared with those of nine other Characidae species. The two newly sequenced mitogenomes are circular DNA molecules of 16,701 bp (Hyphessobrycon amandae; MT484069) and 16,710 bp (Hemigrammus erythrozonus; MT484070); both have a highly conserved structure typical of Characidae, with the start codon ATN (ATG/ATT) and the stop codon TAR (TAA/TAG) or an incomplete T−−/TA−. Most protein-coding genes of the 11 Characidae mitogenomes showed significant codon usage bias, and the protein-coding gene cox1 was found to be a comparatively slow-evolving gene. Phylogenetic analyses via the maximum likelihood and Bayesian inference methods confirmed that H. amandae and H. erythrozonus belong to the family Characidae. Among the Characidae species studied, one genus was well supported, whereas the other two genera showed marked differentiation. These findings provide a phylogenetic basis for improved classification of the family Characidae. Determining the mitogenomes of H. erythrozonus and H. amandae improves our understanding of the phylogeny and evolution of fish species.


INTRODUCTION
The mitochondrion is an organelle that can directly convert organic matter into energy to support the biological activities of a cell (Avise et al., 1987; Wataru et al., 2013; Strohm et al., 2015; Parhi et al., 2019). It is the main site of ATP production and oxidative phosphorylation in eukaryotic cells (Wilson et al., 1985). Mitochondria possess mitochondrial DNA (mtDNA), which has a closed circular double-stranded structure and replicates semi-conservatively (Prosdocimi et al., 2012; Paz et al., 2014). Mitochondrial DNA is considered the second genetic information system of eukaryotic cells (Kim et al., 2008; Cooke et al., 2012; Zhao et al., 2015; Ruan et al., 2020). Compared with nuclear DNA, it is a relatively independent replication unit characterized by a small size, simple structure, maternal inheritance, rapid evolution, limited recombination, and variability in evolutionary rate at different loci (Harrison, 1989; Javonillo et al., 2010). These characteristics make the mitochondrial genome (mitogenome) a valuable resource for studying DNA structure and gene expression, as well as for understanding the evolution and phylogenetic distribution of species.
Recent advances in molecular biology, such as second-generation sequencing technologies, have facilitated the sequencing of fish mtDNA, thereby providing a clearer understanding of the structures of fish mitogenomes, which range from 16 to 18 kb in size. Structurally, the protein-coding (PCG), transfer RNA (tRNA), and ribosomal RNA (rRNA) genes as well as the noncoding regions of fish mitogenomes are highly conserved, but gene intervals and lengths vary between species (Gray, 1989; Kim et al., 2009; Alam et al., 2014). As the mtDNA sequences of many fish species have been determined, mitogenomes have become popular molecular markers in phylogenetic and evolutionary studies of fishes (Brown et al., 1979; Wang et al., 2016; Wu et al., 2020).
The Characidae family contains the largest number of species among tropical fishes and belongs to class Actinopterygii and order Characiformes (Wilkens, 1988; Mirande, 2010). This family is mainly found in the freshwater rivers and lakes of Africa and America and inhabits slow-flowing waters. The fish species of this family are characterized by a small adipose fin on the caudal peduncle. Most Characidae are small and harmless; only a few species are predatory. Because of their small body size and colorful markings, Characidae is the most widely raised family of tropical fishes. With the rapid development of the global ornamental fish trade, the increase in fishing activities, and the deterioration of ecological habitats, the natural resources that Characidae fishes depend on have been seriously damaged. Accurate species identification and an understanding of the systematic relationships among species are useful for protecting existing species and discovering new species. However, the classification of Characidae species remains difficult because of the morphological similarities among many species (Wilkens, 1971; Langerhans et al., 2003; Oliveira et al., 2011; Barreto et al., 2017).
As many types of ornamental fish are available in the market and hybrid species are widespread, two common fishes were selected for this study. The glowlight tetra (Hemigrammus erythrozonus) and the ember tetra (Hyphessobrycon amandae) were chosen mainly because they are representative and common members of the family Characidae and their morphological identification is relatively reliable. In this study, the complete mitochondrial genomes of these two tropical fishes were sequenced, assembled, and annotated. The genome organization, gene contents, repeat sequences, and tRNA structures of the two newly sequenced mitogenomes were compared and analyzed. The mitogenomes of these two fishes were then compared with those of nine other Characidae species to identify the similarities and differences in their gene orders, genetic structures, base compositions, evolutionary features, and codon usage. Additionally, phylogenetic analysis of various Characiformes species was carried out using a combined mitochondrial gene set. The mitogenomes of the two Characidae species improve our phylogenetic and evolutionary understanding of Characidae fishes.

Samples and DNA Extraction
The two specimens were collected from the Nanjing Qiqiaoweng flower and bird market, Jiangsu province, China (32°0′27.1″N, 118°50′11.5″E). Morphological identification was conducted during sampling according to the latest taxonomic classification of fish. As these two species were collected from an ornamental fish market, the geographic origins of the specimens are unknown. Total genomic DNA from the samples was extracted using a FastPure Cell/Tissue DNA Isolation Mini Kit V7.1 (Vazyme Biotech Co., Ltd., Nanjing, China) (Chen et al., 2018). DNA integrity was evaluated via 1.5% agarose gel electrophoresis. DNA concentration and purity were assessed using a NanoDrop 2000 (NanoDrop Technologies, Wilmington, NC, United States).

PCR Amplification and DNA Sequencing
To amplify the mitogenomes of Hemigrammus erythrozonus and Hyphessobrycon amandae, nine pairs of specific primers (Table 1) were designed based on the published conserved nucleotide sequences of nine Characidae mitogenomes (Astyanax giton, Astyanax paranae, Gephyrocharax atracaudatus, Grundulus bogotensis, Hasemania nana, Hemigrammus bleheri, Oligosarcus argenteus, Paracheirodon axelrodi, and Paracheirodon innesi). For accurate sequencing and assembly of the complete mitogenomes, adjacent fragments were designed to overlap by more than 200-300 base pairs (bp). Because the mitogenomes of the two species differ, specific primers were designed for each species. PCR amplification was performed as described previously (Sun et al., 2019a). The PCR products were electrophoretically separated on a 1.5% agarose gel and subsequently purified and Sanger-sequenced by Tsingke Biotech (Tsingke Biotechnology Co., Ltd., Nanjing, China).

Genome Assembly and Annotation
DNA sequencing results were verified using NCBI BLAST (Johnson et al., 2008). Raw sequence data from the DNA fragments were screened and assembled using Lasergene 7.1 (DNAStar, Inc., Madison, WI, United States) to obtain the complete mitogenome sequences. tRNAscan-SE v2.0 (Bernt et al., 2013) was used to verify the tRNA genes. Open Reading Frame Finder (Master et al., 2016; Sun et al., 2019b) on the NCBI website was used to identify the protein-coding regions with the default settings for the vertebrate mitochondrial code, and GenBank was used to translate the putative proteins. The sequences of the identified PCGs and rRNAs were analyzed and compared with those of other Characidae species.

Phylogenetic Analyses
To investigate the phylogenetic relationship between the two Characidae species, a phylogenetic tree of 24 Actinopterygii species (Table 2) was constructed based on the combined mitochondrial gene set (13 PCGs + two rRNAs). MAFFT v7.313 (Katoh and Standley, 2013) was used to perform multiple-sequence alignment. The maximum likelihood (ML) and Bayesian inference (BI) methods were used for phylogenetic analysis. ModelFinder (Kalyaanamoorthy et al., 2017) was used to select the best-fit substitution model and the best partitioning scheme; a greedy algorithm was adopted with the Akaike information criterion (Yamaoka et al., 1978). The ML tree was constructed using IQ-TREE v.1.6.8 (Nguyen et al., 2015) based on the GTR + R + F model. The BI tree was constructed using MrBayes v3.2.6 (Ronquist et al., 2012) based on the GTR + I + G + F model. Two independent runs with four chains each were conducted simultaneously for ten million generations, with one tree sampled every 100 generations. The first 25% of the samples were discarded as burn-in, and the remaining trees were used to calculate the Bayesian posterior probabilities. FigTree v1.4.0 (Rambaut, 2015) was used to visualize and edit the resulting phylogenetic trees.
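
The sampling scheme described above implies a fixed number of retained trees per run. The following minimal Python sketch works through that arithmetic; the function name is hypothetical, and the tree recorded at generation 0 is ignored for simplicity, so the counts reported by MrBayes itself may differ by one.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, kept) tree counts for one MCMC run.

    Assumption: one tree is recorded every `sample_every` generations and
    the starting state (generation 0) is not counted.
    """
    sampled = generations // sample_every
    discarded = int(sampled * burnin_fraction)  # burn-in trees
    kept = sampled - discarded                  # trees used for posteriors
    return sampled, discarded, kept


# Settings stated in the text: ten million generations, one tree every
# 100 generations, first 25% discarded as burn-in.
sampled, discarded, kept = mcmc_sample_counts(10_000_000, 100, 0.25)
print(sampled, discarded, kept)  # 100000 25000 75000
```

With two independent runs, roughly 150,000 post-burn-in trees would therefore contribute to the posterior probabilities under these assumptions.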

Protein-Coding Genes
The total length of the PCGs in each of the 11 Characidae species ranged from 11,184 bp (P. axelrodi) to 11,435 bp (H. erythrozonus) (Table 4). In all 11 mitogenomes, one PCG (nad6) was encoded on the L-strand, whereas the remaining PCGs were located on the H-strand. The average A + T content of the PCGs in each of the 11 Characidae species varied from 56.9% (H. amandae) to 60.2% (G. bogotensis). Most PCGs used the conventional start codon ATN (ATG/ATT), except for H. erythrozonus cox1, which started with GTG. Within our two newly sequenced mitogenomes, only the cox1 and nad4L genes of H. amandae started with GTG (Table 3).
Most PCGs terminated with the codon TAR (TAA/TAG) or an incomplete codon (TA−/T−−), except for the cox1 gene, which terminated with AGG in both mitogenomes. As in Characidae mitogenomes, incomplete stop codons are commonly observed across fish mitogenomes (Cooper et al., 2001; Zhao et al., 2015), which may be related to post-transcriptional modification during mRNA maturation. The AT-skews (−0.095 to −0.045) of the PCGs were similar among the 11 Characidae species (Table 4).
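
To illustrate how complete and incomplete stop codons can be told apart, the following Python sketch classifies the 3′ end of a coding sequence. It is only a sketch under stated assumptions: the function names and toy sequences are hypothetical, the input is assumed to be the coding strand starting in frame at the start codon, and the set of complete stop codons is that of the vertebrate mitochondrial genetic code (TAA, TAG, AGA, AGG). Completion of T−−/TA− to TAA by polyadenylation is the commonly proposed mechanism and is modeled here in the simplest possible way.

```python
# Complete stop codons of the vertebrate mitochondrial genetic code.
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds):
    """Label the 3' end of an in-frame coding sequence (DNA, coding strand)."""
    cds = cds.upper()
    overhang = len(cds) % 3
    if overhang == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "no stop"
    tail = cds[-overhang:]
    # A trailing T or TA is an incomplete stop codon (T-- or TA-).
    if tail in ("T", "TA"):
        return tail + "-" * (3 - overhang)
    return "no stop"


def mature_stop(cds):
    """Return the stop codon expected after poly(A) addition to the mRNA."""
    label = classify_stop(cds)
    if label.endswith("-"):
        # Appended adenines complete T-- or TA- to TAA.
        return (label.rstrip("-") + "AA")[:3]
    return label


# Toy sequences (not taken from the mitogenomes in this study).
for toy in ("ATGGCATAA", "ATGGCAT", "GTGGCATA", "ATGGCAAGG"):
    print(toy, classify_stop(toy), mature_stop(toy))
```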
Excluding the stop codons, the mitogenome PCGs consisted of 3,718-3,801 codons (CDs) and showed very similar codon usage among the 11 Characidae species (Figure 2). Ile (283.64 ± 9.67 CDs), Thr (287.36 ± 12.52 CDs), Ala (331.36 ± 8.67 CDs), and Leu1 (CUN) (459.91 ± 29.62 CDs) were the four most predominant codon families. Among these, Leu1 (CUN) exhibited the highest usage bias (402-508 CDs), which may be associated with the coding function of the mitochondrion. In contrast, Cys (27.27 ± 1.81 CDs) was the least frequent codon family. To gain insight into the codon bias of the 11 Characidae mitogenomes, the relative synonymous codon usage (RSCU) was evaluated. As shown in Figure 3, the usage of synonymous codons was biased for most amino acids. Moreover, the synonymous codon preferences were conserved among the 11 Characidae species, which may be attributed to their close relationship within the same fish family; these preferences have also been observed in some other fishes (Parhi et al., 2019). The two most commonly used codons in these 11 species were consistently AUU and CUU.
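
Relative synonymous codon usage is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The Python sketch below applies this definition to a single codon family; the function name and the Leu1 (CUN) counts are made up for illustration and are not values from this study.

```python
def rscu(family_counts):
    """Compute RSCU for one synonymous codon family.

    `family_counts` maps every codon of the family (including codons with a
    count of zero) to its observed count. RSCU = count * n / total, where n
    is the number of synonymous codons in the family.
    """
    n = len(family_counts)
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("codon family has no observations")
    return {codon: count * n / total for codon, count in family_counts.items()}


# Hypothetical counts for the Leu1 (CUN) family.
leu1 = {"CUU": 120, "CUC": 100, "CUA": 150, "CUG": 30}
for codon, value in rscu(leu1).items():
    # Values > 1 indicate codons used more often than expected.
    print(codon, round(value, 2))
```

By construction, the RSCU values of a family sum to the number of codons in that family, which is a convenient sanity check.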
To analyze the evolutionary pattern of the PCGs, the Ka/Ks ratio, nucleotide diversity, and K2P genetic distance across all Characidae mitogenomes were calculated for each aligned PCG. Among the PCGs, nad2 showed the largest K2P genetic distance among the 11 Characidae species (Figure 4A), followed by atp8 and atp6. As seen in Figure 4B, nad2 and atp8 had the highest nucleotide diversity, whereas cox1 and cox3 had the lowest. Similarly, the Ka/Ks value was highest for nad2, followed by nad4, cox3, and nad3; the lowest values were observed for cox1 and cob (Figure 4C). Notably, the Ka/Ks values were <1 for all the PCGs, suggesting that all the PCGs have evolved under purifying selection. Based on these analyses, nad2 is the most rapidly evolving gene among the Characidae mitochondrial PCGs, as it is under the weakest selection pressure, whereas cox1 is the most slowly evolving gene, as it is under the strongest selection pressure.
The sizes of the 16S rRNA genes were 1,678 bp (H. erythrozonus) and 1,671 bp (H. amandae), and the 12S rRNA genes of both mitogenomes were 949 bp. The rRNA genes of Characidae mitogenomes were found to be highly conserved compared with those of other published fish mitogenomes (Javonillo et al., 2010; Zhao et al., 2015; Ruan et al., 2020), with the two rRNA genes located between trnL2 and trnF and separated by trnV. The A + T contents of the rRNA genes ranged from 55.1 to 57.2% among the 11 Characidae species (Table 4).
For the two newly sequenced Characidae mitogenomes, the typical 22 tRNAs were detected. Among them, 14 tRNAs were encoded on the H-strand and the remaining eight on the L-strand. The sizes of the tRNA genes ranged from 66 bp (trnC) to 75 bp (trnL2) in both H. erythrozonus and H. amandae. The total lengths of the 22 tRNA genes ranged from 1,529 bp (G. atracaudatus) to 1,550 bp (P. innesi) among the 11 Characidae species. As shown in Figure 5, all the tRNAs exhibited a typical cloverleaf secondary structure, except for trnS1 (GCT), which lacked the dihydrouridine arm, a feature common in Characidae fishes and other vertebrate mitogenomes (Krajewski et al., 2010; Sun et al., 2020a,b).
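
For readers unfamiliar with the K2P (Kimura two-parameter) genetic distance used in the PCG comparison, the following Python sketch computes it for a pair of aligned sequences using the standard formula d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. The function name and toy sequences are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption of this sketch, not necessarily the setting of the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignment with one transition (A->G) in ten sites.
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

The formula is undefined when the sequences are so divergent that the argument of the logarithm is not positive; a production implementation would need to handle that case explicitly.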

Control Region
Compared with the PCGs and rRNA genes, the control region (CR) shows the highest variation and the fastest mutation and evolutionary rates in the mitogenome; it is therefore the region of choice for evaluating intraspecific variation and has become a hotspot for phylogenetic research. As in other fish mitogenomes, the CRs were located between trnF and trnP in all 11 Characidae species. The A + T content of the CRs (63.9-72.6%) was higher than that of the whole genomes (57.1-60.1%), PCGs (56.9-60.2%), rRNAs (55.1-57.2%), or tRNAs (55.9-59.4%). Composition analysis revealed that the AT skew of the CR was positive in seven and negative in four of the 11 Characidae species.
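
The composition statistics reported for the CRs can be reproduced from any sequence with a few lines of code. The sketch below uses the usual definitions, A + T content = (A + T)/(A + T + G + C) and AT skew = (A − T)/(A + T); the function names and the toy sequence are hypothetical, and ignoring ambiguous bases is an assumption of this example.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases of a DNA sequence."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (a + t) / total


def at_skew(seq):
    """AT skew = (A - T) / (A + T); positive values mean more A than T."""
    seq = seq.upper()
    a, t = seq.count("A"), seq.count("T")
    if a + t == 0:
        raise ValueError("sequence contains no A or T")
    return (a - t) / (a + t)


# Toy sequence (not a real control region): 2 A, 3 T, 1 G, 1 C.
toy = "AATTTGC"
print(f"A+T content: {at_content(toy):.1%}")
print(f"AT skew: {at_skew(toy):+.3f}")
```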

Phylogenetic Analyses
To determine the phylogenetic positions of H. erythrozonus and H. amandae within the family Characidae, we used the concatenated nucleotide sequences of the combined mitochondrial gene set (13 PCGs + two rRNAs) from 23 Characiformes species. Additionally, we used Lateolabrax japonicus (Lavoue et al., 2014) as an outgroup because it belongs to the order Perciformes and family Moronidae. As shown in Figure 6 and Supplementary Figures 1, 2, the BI and ML analyses of the combined mitochondrial gene set yielded identical, well-supported tree topologies. All the major clades were supported in the preferred trees.
Although the samples were obtained from an animal market and could not be compared with wild specimens, mitochondria are maternally inherited, and the two species have a sound basis for morphological classification. Therefore, we believe that the results are not compromised by sampling bias, provided that the morphological identification is performed carefully.
Consistent with previous studies (Mirande, 2019; Montero-Mendieta and Dheer, 2019), our results indicate that P. brachypomus and S. brasiliensis do not belong to the family Characidae: Piaractus is a member of Serrasalmidae, and Salminus is a member of Bryconidae. A. paranae and O. argenteus form a well-supported clade. Likewise, P. axelrodi and P. innesi form a separate well-supported clade (PP = 1; BP = 100). Among the Characidae species studied, one genus was well supported (P. axelrodi and P. innesi), whereas the other two genera diverged (A. giton and A. paranae; H. erythrozonus and H. bleheri). These two genera have been discussed in recent taxonomic studies, and the taxonomic status of three species has been reassessed: Hemigrammus bleheri has been reassigned as Petitella bleheri (Bittencourt et al., 2020), Astyanax giton has been renamed Deuterodon giton, and Astyanax paranae has been renamed Psalidodon paranae (Terán et al., 2020). These results indicate that the taxonomy of the family Characidae is currently unresolved and that morphological classification combined with mitogenomes and other molecular markers is needed for comprehensive classification. These findings provide a phylogenetic basis for improved classification of the family Characidae. The newly sequenced mitogenomes of the two species (H. erythrozonus and H. amandae) improve our understanding of the phylogeny and evolution of fish species.

DATA AVAILABILITY STATEMENT
The data presented in this study can be found in GenBank with accession numbers MT484070 and MT484069.

ETHICS STATEMENT
The animal study was reviewed and approved by the Ethics Committee of the Nanjing Forestry University.

AUTHOR CONTRIBUTIONS
H-YL, B-PH, and C-HS contributed to the experimental design. NX and X-LZ were involved in the sample collection and preprocessing. C-HS contributed to the data analysis and image editing. H-YL and C-HS drafted the manuscript. B-PH, QZ, and C-HS reviewed and edited the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We kindly acknowledge two reviewers for their fruitful and critical comments and would like to thank Editage (www.editage.com) for language-editing support.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.