Revealing Hidden Diversity of the Underestimated Neotropical Ichthyofauna: DNA Barcoding in the Recently Described Genus Megaleporinus (Characiformes: Anostomidae)

Molecular studies have improved our knowledge of the neotropical ichthyofauna. DNA barcoding has successfully been used in fish species identification and in detecting cryptic diversity. Megaleporinus (Anostomidae) is a recently described freshwater fish genus within which taxonomic uncertainties remain. Here we assessed all nominal species of this genus using a DNA barcode approach (Cytochrome Oxidase subunit I) with a broad sampling to generate a reference library, characterize new molecular lineages, and test the hypothesis that some of the nominal species represent species complexes. The analyses identified 16 (ABGD and BIN) to 18 (ABGD, GMYC, and PTP) different molecular operational taxonomic units (MOTUs) within the 10 studied nominal species, indicating cryptic biodiversity and potential candidate species. Only Megaleporinus brinco, Megaleporinus garmani, and Megaleporinus elongatus showed correspondence between nominal species and MOTUs. Within six nominal species, a subdivision into two MOTUs was found, while Megaleporinus obtusidens was divided into three MOTUs, suggesting that DNA barcoding is a very useful approach to identify the molecular lineages of Megaleporinus, even in the case of recent divergence (< 0.5 Ma). Our results thus provide molecular findings that can be used along with morphological traits to better define each species, including candidate new species. This is the most complete DNA barcode analysis of this recently described genus, and considering its economic value, precise species identification is highly desirable and fundamental for conserving the whole biodiversity of this fish group.


INTRODUCTION
Neotropical freshwater fishes have a remarkable diversity, exceeding 8000 species (Reis et al., 2016). However, much taxonomic uncertainty remains, leading to underestimated diversity (Pereira et al., 2013; Reis et al., 2016). Molecular studies have been crucial to improving our knowledge of the ichthyofauna, and DNA barcoding has successfully been used in fish species identification and in detecting species of taxonomic concern or cryptic diversity (Pereira et al., 2013; Gomes et al., 2015; Ramirez and Galetti, 2015; Machado et al., 2016). Within the neotropical freshwater fishes, the order Characiformes represents more than 30% of the known species, and Anostomidae is one of the most species-rich families, occurring in all major hydrographic basins, with trans- and cis-Andean distribution in South America (Reis et al., 2003).
Comprising approximately 150 described species, distributed in 15 genera (Garavello and Britski, 2003; Sidlauskas and Vari, 2008; Ramirez et al., 2017), the known diversity of the Anostomidae has increased in recent years. For instance, 14 species and 1 genus were described in the last 5 years alone (Burns et al., 2014). DNA barcoding has revealed taxonomic uncertainties within the genus Laemolyta (Ramirez and Galetti, 2015), and molecular phylogeny has helped to provide an understanding of the evolutionary history of the Anostomidae (Ramirez and Galetti, 2015; Ramirez et al., 2016, 2017). Recently, the genus Megaleporinus (Ramirez et al., 2017) was described to include 16 lineages, corresponding to 10 nominal species, previously recognized in Leporinus or Hypomasticus (Ramirez et al., 2017). Megaleporinus is supported by cytogenetic, molecular, and morphological data. It is characterized by a unique ZZ/ZW sex chromosome system (Galetti et al., 1995), while most cytogenetically known Leporinus species have no sex chromosomes (Galetti et al., 1981, 1991). Its monophyly is also well supported by mitochondrial and nuclear markers, which identified it as the sister group to Abramites (Ramirez et al., 2017). Morphologically, Megaleporinus is characterized by a relatively large body size (adults usually reaching more than 35 cm standard length, including the largest species of the family), three teeth on each premaxillary and dentary bone, and a color pattern of one to three dark mid-lateral blotches (Ramirez et al., 2017). Because of its large size, Megaleporinus is economically important in subsistence fisheries and aquaculture (Garavello and Britski, 2003).
Recent studies indicate that there is a hidden biodiversity within Megaleporinus that needs to be better understood (Avelino et al., 2015;Ramirez et al., 2017). A study based on mitochondrial and nuclear markers, but using few individuals for each species, showed that several nominal species allocated to this genus comprise two or more molecular lineages allopatrically distributed in different basins (Ramirez et al., 2017).
In this study, we used a DNA barcoding approach to generate a reference library for Megaleporinus, assessing all nominal species and lineages previously described. We included a broad sampling for most of the species. Our hypothesis is that DNA barcoding supports the observation that some of the nominal species represent species complexes with most molecular operational taxonomic units (MOTUs) allopatrically distributed in different basins, as proposed by Ramirez et al. (2017). By identifying such hidden biodiversity within this genus, this paper contributes to a more complete understanding of its diversity and to the conservation of this important fish group.

MATERIALS AND METHODS
Sampling
Animals were collected on public land, handled and killed under permission (ICMBIO/MMA N° 32215) provided by the Environment Ministry (MMA). This study did not involve endangered or protected species. Fish were collected with fishing rods and gillnets. No ethics committee approval is required for these organisms in Brazil. Fish were killed in the field using cold water and immediately transferred onto ice. Tissue samples were collected after fish death was confirmed through lack of operculum movement.
Specimens from several populations of all Megaleporinus species were used in this study, totaling 79 samples of the 10 nominal species and comprising the 16 molecular lineages described by Ramirez et al. (2017) (Figures 1, 2 and Table 1). Voucher numbers are provided for the specimens (Table 1). Additionally, previous DNA barcode sequences of specimens from the São Francisco (Carvalho et al., 2011), Paraná (Pereira et al., 2013), Paranapanema (Frantine-Silva et al., 2015), and lower Paraná basins (Díaz et al., 2016) were included in our data set, giving a total of 116 sequences (Figures 1, 2 and Table 1).

DNA Extraction, Amplification, and Sequencing
Total DNA was extracted from tissues (fins, muscle, or liver) by the standard phenol-chloroform method (Sambrook et al., 1989). A fragment of Cytochrome Oxidase subunit I (COI; 698 bp) was amplified via polymerase chain reaction (PCR) using primers AnosCOIF and AnosCOIR (Ramirez and Galetti, 2015). PCR products were sequenced for both strands using an ABI 3730xl (Applied Biosystems, Waltham, MA, United States) automatic sequencer. Contigs were assembled and edited using BioEdit (Hall, 1999). All sequences were evaluated manually, and regions of low quality were deleted. All sequences were verified to represent the COI gene and were checked for indels and stop codons. GenBank (Benson et al., 2017) accession numbers are given in Table 1. All information about specimens, sequences, and electropherograms was deposited in a data set of the Barcode of Life Database platform (BOLD) with code DS-MGLEP.
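The stop-codon check mentioned above can be illustrated with a minimal Python sketch. It is not the procedure the authors used (they edited sequences in BioEdit); it only assumes that COI is translated with the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA, and AGG are stop codons. The function names and the toy sequence are hypothetical.

```python
# Stop codons of the vertebrate mitochondrial genetic code (NCBI table 2).
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codons_in_frame(seq, frame):
    """Return 0-based start positions of stop codons in one forward frame.

    Assumes an ungapped nucleotide sequence; frame is 0, 1, or 2.
    """
    seq = seq.upper()
    positions = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in MITO_STOPS:
            positions.append(i)
    return positions


def open_frames(seq):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_codons_in_frame(seq, f)]


# Hypothetical toy fragment, not a real Megaleporinus sequence.
toy = "ATGGCCTTAAGCCTACTAATTCGA"
print(stop_codons_in_frame(toy, 1))
print(open_frames(toy))
```

A genuine COI barcode is expected to have at least one stop-free reading frame; a sequence with stop codons in every frame would be a candidate pseudogene or sequencing error and would deserve a closer look.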

DNA Barcode Analysis
The general mixed Yule coalescent (GMYC) model (Pons et al., 2006) with a single threshold, implemented in the splits package in the R 3.3.3 statistical software (R Core Team, 2017), was used to infer MOTUs. For the GMYC input, an ultrametric tree was generated using Beast 2.4.3 (Bouckaert et al., 2014), with a lognormal relaxed clock, a birth and death model, and a GTR+G substitution model chosen using jModeltest 2 (Darriba et al., 2012), running 50 million MCMC generations with a burn-in of 10%. The Poisson tree processes (PTP) model (Zhang et al., 2013) was used for MOTU delimitation through the bPTP server (http://species.h-its.org/ptp/), using default values. The bPTP server includes a Bayesian implementation of the PTP model and the original maximum likelihood PTP. For the PTP input, a tree was generated using Beast 2.4.6 (Bouckaert et al., 2014), with a strict clock, a birth and death model, and the GTR+G substitution model, running 50 million MCMC generations with a burn-in of 10%.
Additionally, two clustering algorithms were used: the Barcode Index Number System (BIN) (Ratnasingham and Hebert, 2013) and Automatic Barcode Gap Discovery (ABGD) (Puillandre et al., 2012). The BIN was automatically determined in the BOLD Workbench, while the ABGD was performed using Kimura-2-parameter (K2P) distance and default values through the web interface.
COI intraspecific and interspecific genetic distances were estimated using the K2P model implemented in Mega 6.0 (Tamura et al., 2013). These values were used to calculate the mean, minimum, and maximum values for intra- and inter-MOTU distances, and intra- and interspecific distances (nominal species). A genetic distance neighbor-joining (NJ) tree analysis was performed based on the K2P substitution model in Mega 6.0 (Tamura et al., 2013).
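For readers unfamiliar with the K2P model, the pairwise distance can be sketched in a few lines of Python. This is an illustrative implementation of the standard Kimura two-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences; it is not the Mega 6.0 code. The function name is hypothetical, and skipping sites with gaps or ambiguous bases is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical toy sequences: one transition in ten compared sites.
print(round(100 * k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 2), "%")
```

Multiplying the result by 100 gives distances on the percentage scale used for the values reported in this study.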

RESULTS
The alignment of COI sequences resulted in 600 characters with 158 parsimony informative sites (included in the Supplementary Material). The GMYC analysis resulted in 18 MOTUs (confidence interval: 16-18; Table 2). The GMYC model was preferred over the null model (likelihood ratio = 73.49, P < 0.0001), indicating that GMYC results are reliable. The PTP analyses (maximum likelihood and Bayesian implementation) resulted in the same 18 MOTUs obtained in GMYC. The ABGD analysis found six partitions with 27 (P = 0.001) to 16 groups (P = 0.01), including a partition with the same 18 MOTUs (P = 0.005) obtained in the GMYC and PTP analyses. The BOLD system determined 16 BINs (Table 2). Only Megaleporinus brinco, Megaleporinus garmani (Borodin, 1929), and Megaleporinus elongatus (Valenciennes, 1850) showed correspondence between nominal species and MOTUs. Within six nominal species, a subdivision into two MOTUs was found, while Megaleporinus obtusidens (Valenciennes, 1837) was divided into three MOTUs (Table 2).
The mean of intra-MOTU and maximum intra-MOTU distances, the nearest neighbor (NN), and the minimum distance to the NN are shown in Table 2, for both GMYC MOTUs and nominal species.
The overall mean of intra-MOTU distances was 0.03%, the maximum intra-MOTU distance was 0.5% (M. obtusidens), and the mean of inter-MOTU distances was 9.19%. The lowest and highest values of inter-MOTU distances were 0.67 and 15.31%, respectively. Given these values, there is a barcode gap that allowed all MOTUs to be identified successfully using COI distance. In contrast, when only the nominal species were considered, the maximum intraspecific distance increased to 15.31% [M. muyscorum (Steindachner, 1900)] and no barcode gap was found.
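The barcode gap criterion used above (maximum within-group distance smaller than minimum between-group distance) can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function name, the specimen identifiers, and the toy distances are hypothetical and were chosen only to mimic the qualitative pattern reported here, where a gap appears under a MOTU grouping but disappears when lineages are lumped into nominal species.

```python
from itertools import combinations


def barcode_gap(distances, groups):
    """Return (max_intra, min_inter, gap_exists) for a grouping.

    distances: dict mapping frozenset({id1, id2}) -> pairwise distance (%)
    groups:    dict mapping specimen id -> group label
    """
    intra, inter = [], []
    for a, b in combinations(sorted(groups), 2):
        d = distances[frozenset((a, b))]
        if groups[a] == groups[b]:
            intra.append(d)
        else:
            inter.append(d)
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter)
    return max_intra, min_inter, max_intra < min_inter


# Hypothetical pairwise K2P distances (%) among five toy specimens.
pairs = {
    ("s1", "s2"): 0.5, ("s3", "s4"): 0.0,
    ("s1", "s3"): 9.0, ("s1", "s4"): 9.0, ("s2", "s3"): 9.0, ("s2", "s4"): 9.0,
    ("s1", "s5"): 5.0, ("s2", "s5"): 5.0, ("s3", "s5"): 5.0, ("s4", "s5"): 5.0,
}
toy_distances = {frozenset(k): v for k, v in pairs.items()}

# Grouping by MOTU versus lumping two MOTUs into one nominal species.
motus = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C"}
nominal = {"s1": "sp1", "s2": "sp1", "s3": "sp1", "s4": "sp1", "s5": "sp2"}

print(barcode_gap(toy_distances, motus))
print(barcode_gap(toy_distances, nominal))
```

In the toy data, lumping groups A and B turns their 9% divergence into an intraspecific distance that exceeds the smallest interspecific distance, which is the same mechanism by which the gap disappears for the nominal species of Megaleporinus.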
Frontiers in Genetics | www.frontiersin.org

Table 2. The mean and the maximum of intra-group distances, the nearest neighbor (NN), and the minimum distance to the NN for MOTUs (ABGD, GMYC, and PTP) and nominal species.

DISCUSSION
Ramirez et al. (2017). This high number of MOTUs contrasts with the 10 nominal species recognized in the genus thus far, pointing to several potential cryptic species yet to be described and reinforcing the general idea that much of the diversity of the neotropical ichthyofauna remains undocumented (Reis et al., 2016). The difference in the number of MOTUs detected among methods is due to the low genetic distance (0.67%) within two pairs of MOTUs: M. reinhardti and M. cf. reinhardti, which separate the genetic lineages from São Francisco and Itapicuru, respectively, and M. piavussu and M. cf. piavussu lower Paraná. These low genetic distances are likely due to a recent divergence between these MOTUs [<0.5 Ma for M. reinhardti and M. cf. reinhardti according to Ramirez et al. (2017)]. Of note, besides presenting an allopatric distribution, these MOTUs were also recovered by the monophyly criterion (Figure 3). MOTUs of recent origin have had less time to accumulate genetic differences than species of ancient origin, which hinders their identification. Despite this low genetic distance, the species delimitation methods were able to delimit these MOTUs, especially those based on phylogenetic trees (GMYC and PTP).
A key aspect implicit in DNA barcoding analysis is the genetic distance threshold used to define MOTUs. COI distances of 1% (Hubert et al., 2008) to 2% (Pereira et al., 2013) have been proposed as thresholds for fish DNA barcode analysis. However, such values were derived from comparative analyses among phylogenetically diverse groups. For instance, 2% was used to characterize DNA barcoding of a fish community of a given river (Pereira et al., 2013). When DNA barcoding analyses have focused on a group of closely related species (e.g., a genus), lower threshold values have been reported (Carvalho et al., 2011; Pereira et al., 2011, 2013; Ramirez and Galetti, 2015). Particularly in Anostomidae, a lower threshold of 0.92% was reported to distinguish MOTUs within the genus Laemolyta (Ramirez and Galetti, 2015). Although most of the values obtained herein were above 2% (13 out of 18 MOTUs, Table 2), a maximum threshold of 0.67% for Megaleporinus was detected between the MOTUs obtained. This reinforces that lower genetic distance values might be obtained when intra-genus MOTUs are analyzed, mainly between recently diverged lineages.
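The effect of the threshold choice can be illustrated with a simple single-linkage clustering sketch in Python. It is deliberately simpler than ABGD or the BIN algorithm and is not a reimplementation of either; specimens are merged whenever their pairwise distance falls below a fixed threshold. The function name, specimen identifiers, and distances are hypothetical, with one pair of lineages placed at 0.67% to echo the lowest inter-MOTU distance discussed above.

```python
from itertools import combinations


def cluster_by_threshold(ids, distances, threshold):
    """Single-linkage clusters: merge specimens whose distance < threshold.

    distances: dict mapping frozenset({id1, id2}) -> pairwise distance (%)
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if distances[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=sorted)


# Hypothetical distances (%): a1/a2 nearly identical, b1 recently diverged
# from them (0.67%), c1 clearly distinct (3.0%).
toy_ids = ["a1", "a2", "b1", "c1"]
toy_pairs = {
    ("a1", "a2"): 0.1, ("a1", "b1"): 0.67, ("a2", "b1"): 0.67,
    ("a1", "c1"): 3.0, ("a2", "c1"): 3.0, ("b1", "c1"): 3.0,
}
toy_dist = {frozenset(k): v for k, v in toy_pairs.items()}

for threshold in (2.0, 1.0, 0.6):
    print(threshold, cluster_by_threshold(toy_ids, toy_dist, threshold))
```

With these toy values, thresholds of 2% and 1% both lump the recently diverged lineage b1 with a1 and a2, and only a threshold below 0.67% separates it, which is the situation described for the M. reinhardti and M. piavussu pairs.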
In contrast to previous results (Avelino et al., 2015), evidence of local differentiation was not found here, and all cryptic diversity corresponds to inter-basin differentiation. Analyzing only two samples of M. reinhardti from the Três Marias (MG, Brazil) region (São Francisco basin), Avelino et al. (2015) reported an intraspecific distance of 3.8% between them, suggesting local differentiation. Here we analyzed nine individuals, representing four different localities, including the Três Marias region, and we found no genetic distance (0%) among them. Mitochondrial pseudogenes, sequencing errors, or misidentification could explain such discrepancies, and it would be more cautious to consider M. reinhardti from São Francisco as a single MOTU, as recovered here.
A similar discordance is observed for M. piavussu (upper Paraná). Avelino et al. (2015) included four samples from a single locality and reported a mean intraspecific distance of 2.8%. Our present data set for this species included 18 individuals obtained from six localities and showed a lower maximum intraspecific distance of 0.17%. This strongly suggests that M. piavussu is also a single MOTU.
Incongruences were also observed within the nominal M. obtusidens. While four groups (A-D), showing 0.7-4.1% mean intraspecific distances, were previously reported (Avelino et al., 2015), we found three MOTUs showing 0-0.5% COI distances. Group D, mentioned as part of M. obtusidens by Avelino et al. (2015) and including individuals caught downstream of the Itaipú dam (Paraná basin), was recovered here as a sister group of M. piavussu and was named M. cf. piavussu lower Paraná (Figure 3).
One particular aspect was highlighted in our results. Several individuals clustered in the M. macrocephalus clade were caught in different hydrographic basins, such as the Doce, São Francisco, Tocantins, and Paraná, outside of its original distribution in the Paraguay basin, likely due to aquaculture releases. Similar findings had already been described in the São Francisco basin (Carvalho et al., 2011). This species is a commercially important fish that is extensively farmed throughout the Brazilian territory, and accidental or intentional releases can occur (e.g., Langeani et al., 2007; Vieira, 2010). In such cases, DNA barcoding provides rapid and accurate identification of this species and can be used in management and in monitoring potential ecosystem disturbance caused by an invasive species.
In summary, the use of DNA barcoding points to the need for a taxonomic revision of this genus. A search for morphological traits able to support a taxonomic delimitation could be facilitated if the MOTUs identified here are considered. A morphological trait showing a range of variation when examined within a given nominal species could perhaps be more informative if studied in each MOTU separately. In that case, our results would make an important contribution to the taxonomy of Megaleporinus by facilitating the search for decisive taxonomic characters. This is the most complete DNA barcode analysis of this recently described genus, and considering the economic value of this group, precise species identification is highly desirable and fundamental for conserving the whole biodiversity of this genus.

AUTHOR CONTRIBUTIONS
JR and PG designed the research. JR, DC, PA, PV, HO, MC-A, and JR-P collected data. JR performed the analyses. All authors contributed to the writing of the manuscript.

FUNDING
The authors thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for financial support (Universal