# GENETICS, EVOLUTION, AND CONSERVATION OF NEOTROPICAL FISHES

Edited by: Rodrigo A. Torres and Roberto Ferreira Artoni

Published in: Frontiers in Genetics

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-335-7 DOI 10.3389/978-2-88963-335-7

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# GENETICS, EVOLUTION, AND CONSERVATION OF NEOTROPICAL FISHES

Topic Editors:

Rodrigo A. Torres, Federal University of Pernambuco, Brazil

Roberto Ferreira Artoni, Universidade Estadual de Ponta Grossa, Brazil

Fish represent the most ancestral and speciose group of vertebrates and occupy the most diverse aquatic environments around the world. The ichthyofauna is extremely diverse, especially in megadiverse countries within biogeographical regions such as the Neotropical Region, which covers an extensive area between North and South America. Much of this biodiversity will go extinct before science knows any aspect of its biology. Neotropical fish genetics started at the end of the 1970s with papers studying the chromosomes of Hoplias malabaricus (Family Erythrinidae) and the karyotype variation among three genera of the family Anostomidae. At that time, the topic was concentrated in two institutions in the state of São Paulo, Southeastern Brazil. In the mid-1980s, the first Symposium on Neotropical Fish Cytogenetics was organized. Nowadays, the field of Neotropical fish genetics is present in Brazil, Colombia, Argentina, Uruguay, Venezuela, Chile, and Ecuador, as well as outside South America in Panama, Mexico, USA, Canada, Czech Republic, Germany, and Spain. Research in cytogenetics has focused mainly on karyotype evolution and cytotaxonomy, chromosome structure and, more recently, cytogenomics. Molecular markers have been used to support the management of populations for conservation or for production in captivity. In addition, many studies have aimed to establish supra-specific phylogenetic relationships and to clarify species distribution scenarios by phylogeographic modeling. The genomes and transcriptomes of some model species are beginning to emerge as extremely promising and informative resources for Neotropical fish.

In 2017, the Neotropical fish genetics research community celebrates the 30th anniversary of its main Meeting (today entitled Symposium on Neotropical Fish Genetics and Cytogenetics). This Research Topic is part of this celebration and aims at reporting the state of the art and its current advances in the frontier of knowledge in genetics, evolution, and conservation of neotropical fish, as well as to detect the challenges to be overcome in the next years.

Citation: Torres, R. A., Artoni, R. F., eds. (2020). Genetics, Evolution, and Conservation of Neotropical Fishes. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-335-7

# Table of Contents


*A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)*

Ricardo Utsunomia, Francisco J. Ruiz-Ruano, Duílio M. Z. A. Silva, Érica A. Serrano, Ivana F. Rosa, Patrícia E. S. Scudeler, Diogo T. Hashimoto, Claudio Oliveira, Juan Pedro M. Camacho and Fausto Foresti

Milena Ferreira, Caroline Garcia, Daniele A. Matoso, Isac S. de Jesus, Marcelo de B. Cioffi, Luiz A. C. Bertollo, Jansen Zuanon and Eliana Feldberg

*A Flow Cytometry Protocol to Estimate DNA Content in the Yellowtail Tetra* Astyanax altiparanae

Pedro L. P. Xavier, José A. Senhorini, Matheus Pereira-Santos, Takafumi Fujimoto, Eduardo Shimoda, Luciano A. Silva, Silvio A. dos Santos and George S. Yasui

*Revealing Hidden Diversity of the Underestimated Neotropical Ichthyofauna: DNA Barcoding in the Recently Described Genus* Megaleporinus *(Characiformes: Anostomidae)*

Jorge L. Ramirez, Jose L. Birindelli, Daniel C. Carvalho, Paulo R. A. M. Affonso, Paulo C. Venere, Hernán Ortega, Mauricio Carrillo-Avila, José A. Rodríguez-Pulido and Pedro M. Galetti Jr.

*Low Genetic Diversity and Structuring of the* Arapaima *(Osteoglossiformes, Arapaimidae) Population of the Araguaia-Tocantins Basin*

Carla A. Vitorino, Fabrícia Nogueira, Issakar L. Souza, Juliana Araripe and Paulo C. Venere

*Hidden Diversity in the Populations of the Armored Catfish* Ancistrus *Kner, 1854 (Loricariidae, Hypostominae) From the Paraná River Basin Revealed by Molecular and Cytogenetic Data*

Ana C. Prizon, Daniel P. Bruschi, Luciana A. Borin-Carvalho, Andréa Cius, Ligia M. Barbosa, Henrique B. Ruiz, Claudio H. Zawadzki, Alberto S. Fenocchio and Ana L. de Brito Portela-Castro

*Headwater Capture Evidenced by Paleo-Rivers Reconstruction and Population Genetic Structure of the Armored Catfish (*Pareiorhaphis garbei*) in the Serra do Mar Mountains of Southeastern Brazil*

Sergio M. Q. Lima, Waldir M. Berbel-Filho, Thais F. P. Araújo, Henrique Lazzarotto, Andrey Tatarenkov and John C. Avise

*Genetic Diversity of an Imperiled Neotropical Catfish and Recommendations for its Restoration*

Fernando S. Fonseca, Rodrigo R. Domingues, Eric M. Hallerman and Alexandre W. S. Hilsdorf

*First Chromosomal Analysis in Hepsetidae (Actinopterygii, Characiformes): Insights Into Relationship Between African and Neotropical Fish Groups*

Pedro C. Carvalho, Ezequiel A. de Oliveira, Luiz A. C. Bertollo, Cassia F. Yano, Claudio Oliveira, Eva Decru, Oladele I. Jegede, Terumi Hatanaka, Thomas Liehr, Ahmed B. H. Al-Rikabi and Marcelo de B. Cioffi

*Population Genetic Structure of* Cnesterodon decemmaculatus *(Poeciliidae): A Freshwater Look at the Pampa Biome in Southern South America*

Aline M. C. Ramos-Fregonezi, Luiz R. Malabarba and Nelson J. R. Fagundes

*Genetic Pattern and Demographic History of* Salminus brasiliensis*: Population Expansion in the Pantanal Region During the Pleistocene*

Lívia A. de Carvalho Mondin, Carolina B. Machado, Emiko K. de Resende, Debora K. S. Marques and Pedro M. Galetti Jr.

*Microsatellites Associated With Growth Performance and Analysis of Resistance to* Aeromonas hydrophila *in Tambaqui* Colossoma macropomum

Raquel B. Ariede, Milena V. Freitas, Milene E. Hata, Vito A. Mastrochirico-Filho, Fabiana Pilarski, Sergio R. Batlouni, Fábio Porto-Foresti and Diogo T. Hashimoto

Lenice Souza-Shibatta, Thais Kotelok-Diniz, Dhiego G. Ferreira, Oscar A. Shibatta, Silvia H. Sofia, Lucileine de Assumpção, Suelen F. R. Pini, Sergio Makrakis and Maristela C. Makrakis

*Identification of a New Mullet Species Complex Based on an Integrative Molecular and Cytogenetic Investigation of* Mugil hospes *(Mugilidae: Mugiliformes)*

Mauro Nirchio, Fabilene G. Paim, Valentina Milana, Anna R. Rossi and Claudio Oliveira

Alexandr Sember, Luiz A. C. Bertollo, Petr Ráb, Cassia F. Yano, Terumi Hatanaka, Ezequiel A. de Oliveira and Marcelo de Bello Cioffi

*High-Throughput Sequencing Strategy for Microsatellite Genotyping Using Neotropical Fish as a Model*

Juliana S. M. Pimentel, Anderson O. Carmo, Izinara C. Rosse, Ana P. V. Martins, Sandra Ludwig, Susanne Facchin, Adriana H. Pereira, Pedro F. P. Brandão-Dias, Nazaré L. Abreu and Evanguedes Kalapothakis

Karine O. Bonato, Priscilla C. Silva and Luiz R. Malabarba

Bruno F. Melo, Beatriz F. Dorini, Fausto Foresti and Claudio Oliveira

*Trends in Karyotype Evolution in* Astyanax *(Teleostei, Characiformes, Characidae): Insights From Molecular Data*

Rubens Pazza, Jorge A. Dergam and Karine F. Kavalco

*Molecular Identification of Shark Meat From Local Markets in Southern Brazil Based on DNA Barcoding: Evidence for Mislabeling and Trade of Endangered Species*

Fernanda Almerón-Souza, Christian Sperb, Carolina L. Castilho, Pedro I. C. C. Figueiredo, Leonardo T. Gonçalves, Rodrigo Machado, Larissa R. Oliveira, Victor H. Valiati and Nelson J. R. Fagundes

*Hidden Diversity Hampers Conservation Efforts in a Highly Impacted Neotropical River System*

Naiara G. Sales, Stefano Mariani, Gilberto N. Salvador, Tiago C. Pessali and Daniel C. Carvalho

*Remarkable Geographic Structuring of Rheophilic Fishes of the Lower Araguaia River*

Tomas Hrbek, Natasha V. Meliciano, Jansen Zuanon and Izeni P. Farias

*A Multilocus Approach to Understanding Historical and Contemporary Demography of the Keystone Floodplain Species* Colossoma macropomum *(Teleostei: Characiformes)*

Maria da Conceição Freitas Santos, Tomas Hrbek and Izeni P. Farias

# Editorial: Genetics, Evolution, and Conservation of Neotropical Fishes

Rodrigo Augusto Torres<sup>1</sup> and Roberto Ferreira Artoni<sup>2</sup> \*

<sup>1</sup> Laboratório de Genômica Evolutiva e Ambiental, Departamento de Zoologia, Centro de Biociências, Universidade Federal de Pernambuco, Recife, Brazil, <sup>2</sup> Laboratório de Genética e Evolução, Departamento de Biologia Estrutural, Molecular e Genética, Universidade Estadual de Ponta Grossa, Ponta Grossa, Brazil

Keywords: chromosome, molecular marker, genetic population, evolutionary, fish

**Editorial on the Research Topic**

#### **Genetics, Evolution, and Conservation of Neotropical Fishes**

Fishes are the most basal and speciose vertebrate group, and they occupy the most diverse aquatic environments on Earth. Fish are extremely diverse, especially in megadiverse countries such as those of the Neotropical region, which encompasses a very large area of the Americas. This amazing diversity alone generates great interest in fish biology, ecology, and evolution, but despite that interest, many challenges remain. Today, new genetic methods are available to study the evolutionary and ecological processes that gave rise to this diversity and to the many habitats that these fish occupy. Consequently, these genetic methods can also serve as tools for better conservation and management. We must therefore ask ourselves: Which classic questions have not yet been answered? What new information is available and useful for answering them? Where will this new science lead us in the future? To begin answering those questions, we posed them, among others, to the scientific community, with the goal of gathering the answers in this Frontiers in Genetics Research Topic, which celebrates the 30th anniversary of the first Symposium on Cytogenetics and Fish Genetics.

Traditionally, karyotype analysis has been among the genetic methods most often applied to Neotropical fish. Since the 1970s, karyotype studies have been used to describe diversity and thereby support evolutionary systematics, taxonomy, and the study of karyotype evolution itself. Banding and classical chromosomal markers, now used together with fluorescence *in situ* hybridization (FISH) and its derivatives, have advanced Neotropical fish cytogenetics, allowing more extensive comparative analyses with African fish (Carvalho et al.) and the investigation of incipient speciation (Ferreira et al.), evolutionary divergence of sex chromosomes (Sember et al.), and complex chromosomal rearrangements, a recurrent feature of many Neotropical fish families (Machado et al.). With the integration of cytogenetics and molecular markers, a new cycle of methodological developments has begun, especially with the inclusion of data made available through massive sequencing of DNA and RNA. Trends in karyotype evolution have been refined in several species complexes, including the genera *Astyanax* (Pazza et al.) and *Mugil* (Nirchio et al.), and surprising, previously unknown diversity has been uncovered in the armored catfish *Ancistrus* (Prizon et al.). Although off to a good start, the use of bioinformatics in the selection of satellite DNA markers and their chromosomal localization is still growing (Utsunomia et al.). This tremendous growth of information, the availability of more robust methodological tools, and the discovery of previously unknown diversity have resulted in a growing concern for understanding the evolutionary history of this diversity as well as its conservation (Moritz, 2002).

Because the biodiversity of Neotropical fish is declining rapidly, integrating conservation genetics with systematic and ecological data, including details of the adaptive landscape, will become important for future research. Recently, the application of genetic evolutionary analysis in fish conservation has shown its importance by demonstrating that commercially exploited species are losing genetic variability (Allendorf et al., 2014; Pinsky and Palumbi, 2014). Investigating the taxonomic uncertainties and genetic diversity (evolutionary potential) of the many fish species, diagnosing patterns of gene flow, and identifying the processes that structure and fragment populations are among the most important topics for the conservation of fish in their natural environments. In this volume, we bring attention to analyses of genetic structure, population diversity, and research in evolutionary history. Conservation issues include problems that arise from hydroelectric dams, which lead to the disappearance of the rheophilic fish fauna that previously occupied the dammed river (Hrbek et al.). This has resulted in reduced genetic diversity and changes in the population structure of geographical variants in the catfishes *Steindachneridion scriptum* (Paixão et al.) and *S. parahybae* (Fonseca et al.), both of which are endangered (ICMBio/MMA, 2018). Additionally, dams are detrimental to the pirarucu, *Arapaima gigas*, which is cited by the European Union in regulations that set limits on the exploitation of vulnerable or potentially endangered species (CITES, 2017) (Vitorino et al.).

#### Edited and reviewed by:

Lior David, Hebrew University of Jerusalem, Israel

> \*Correspondence: Roberto Ferreira Artoni rfartoni@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

Received: 04 December 2018 Accepted: 17 October 2019 Published: 04 November 2019

#### Citation:

Torres RA and Artoni RF (2019) Editorial: Genetics, Evolution, and Conservation of Neotropical Fishes. Front. Genet. 10:1124. doi: 10.3389/fgene.2019.01124

The current distribution of Neotropical fish genetic lineages may reflect ancient geological events such as paleo-drainages, or more recent phenomena such as headwater capture. A distinctive hydrological system in southern Brazil (Pampa) resulted from a long history of tectonism and of climate and sea-level changes that directly affected the diversification of fish evolutionary lineages in this biome (Ramos-Fregonezi et al.). Given their migratory behavior and gene flow, *Prochilodus* species show little genetic divergence; however, it is possible to delimit mitochondrial lineages suggestive of distinct species (Melo et al.). Natural populations of *Colossoma macropomum* comprise a single, large panmictic population in the main channel of the Solimões-Amazonas River basin and structured populations in the headwaters of the tributaries, with the greatest genetic differentiation in relation to the Bolivian sub-basin (da Conceição Freitas Santos et al.). In paleo-drainage reconstructions, two putative paleo-rivers in southeastern Brazil include threatened and endemic species (Lima et al.). Pleistocene climate changes were major historical events with a lasting impact on South American biodiversity, including the endangered *Salminus brasiliensis* (ICMBio/MMA, 2018), which underwent a period of significant population expansion in the Pantanal basin (de Carvalho Mondin et al.). The use of microsatellite markers has expanded the conservation tools available for endemic species and populations (Souza-Shibatta et al.) and, when used in conjunction with markers for phylogenetic and phylogeographic analyses, can uncover sometimes complex, often alarming, scenarios (Silva-Santos et al.). Microsatellites derived from transcriptome sequencing can also be important for the genetic management of natural or captive-bred commercial stocks (Jorge et al.; Ariede et al.).

New strands of research constantly expand the set of evolutionary inferences about Neotropical fish. Forensic genetics has also benefitted from new tools, and DNA barcoding has assisted in identifying meat from different shark species (some endangered) sold in the domestic market in Brazil (Almerón-Souza et al.). Sequencing of cytochrome oxidase subunit 1 (COI) has helped unravel trophic relationships through the molecular identification of the stomach contents of parasitic catfishes (Bonato et al.). DNA barcoding has also revealed species complexes, such as *Megaleporinus* (Ramirez et al.) and *Schizolecis guntheri* (Souza et al.), and is a promising tool for future studies of the taxonomy of Neotropical fish. New methodologies and applications for the study of Neotropical fish genetics are constantly being developed and applied in new ways. Chromosome set manipulation by polyploidization has proved to be an important tool for obtaining sterile individuals. Recently, a protocol was developed for using flow cytometry to confirm triploidy in *Astyanax altiparanae* (Xavier et al.). Beyond doubt, the coming years will be stimulated by the popularization of massive sequencing of DNA and RNA, and the development of user-friendly software for analysing large data sets will follow. Improved experimental design and sequencing strategies will be important, and one such strategy, which uses next-generation sequencing technology to genotype many individuals in parallel, is described for Neotropical fish (Pimentel et al.). The integration of omics information (genome, transcriptome, and proteome) with environmental data in the analysis of biological functions and epigenetic responses is an exciting development for understanding the evolutionary biology of Neotropical fish.

For example, under the typical Amazonian condition, where blackwater, clearwater, and whitewater rivers converge, transcriptomic analysis indicated phenotypic plasticity in *Triportheus albus* that allows it to survive in all these environments (Araújo et al.). In this volume, the expression of reference genes was determined by qRT-PCR to set environmental analysis parameters for the study of *Odontesthes humensis* (Silveira et al.). In that study, stable genes (i.e., good controls) were distinguished from genes that are tissue specific and unstable. Another recent advance is the combination of physiological and gene expression analyses to better understand the ecology of Neotropical fishes. Salinity tolerances in the marine *Odontesthes* species demonstrated different physiological adaptations resulting from evolutionary processes that occurred in oceanic and estuarine environments (Silveira et al.). The use of microRNAs is still incipient, but the role of these molecules in the regulation of gene expression is being investigated in fishes (Herkenhoff et al.).

Here, we certainly have not described the universe of evolutionary biology of, or the tools with which to study, Neotropical fish. Yet, we hope that those interested in this area of biological research now have an overview of the state of the art and that they will be encouraged to take additional steps to extend our current understanding of the evolutionary biology, ecology and conservation of Neotropical fish.

We greatly appreciate the invaluable contribution of all reviewers, the efforts of Dr. Samuel A. Cushman, Chief Editor of Frontiers in Genetics: Evolutionary and Population Genetics, with this Research Topic, and the experts and support staff at Frontiers.

#### AUTHOR CONTRIBUTIONS

RA and RT contributed equally to the preparation of this editorial manuscript of the special volume edited by these guest editors.

# REFERENCES


Pinsky, M. L., and Palumbi, S. R. (2014). Meta-analysis reveals lower genetic diversity in overfished populations. *Mol. Ecol.* 23, 29–39. doi: 10.1111/mec.12509

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Torres and Artoni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)

Ricardo Utsunomia<sup>1</sup> \*, Francisco J. Ruiz-Ruano<sup>2</sup> , Duílio M. Z. A. Silva<sup>1</sup> , Érica A. Serrano<sup>1</sup> , Ivana F. Rosa<sup>1</sup> , Patrícia E. S. Scudeler<sup>1</sup> , Diogo T. Hashimoto<sup>3</sup> , Claudio Oliveira<sup>1</sup> , Juan Pedro M. Camacho<sup>2</sup> and Fausto Foresti<sup>1</sup>

<sup>1</sup> Department of Morphology, Institute of Biosciences, São Paulo State University, Botucatu, Brazil, <sup>2</sup> Departamento de Genética, Facultad de Ciencias, Universidad de Granada, Granada, Spain, <sup>3</sup> CAUNESP, São Paulo State University, Jaboticabal, Brazil

Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed sequence variation and abundance of this satDNA family in ten species by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation in the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed us to define 150 haplotypes, which were linked in a common minimum spanning tree in which different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to merely relictual variants.

Keywords: concerted evolution, repetitive DNA, in situ hybridization, satellite DNA, genome evolution

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Marcelo De Bello Cioffi, Federal University of São Carlos, Brazil

Paloma Morán, University of Vigo, Spain

> \*Correspondence: Ricardo Utsunomia utricardo@ibb.unesp.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

Received: 05 July 2017 Accepted: 26 July 2017 Published: 14 August 2017

#### Citation:

Utsunomia R, Ruiz-Ruano FJ, Silva DMZA, Serrano ÉA, Rosa IF, Scudeler PES, Hashimoto DT, Oliveira C, Camacho JPM and Foresti F (2017) A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes). Front. Genet. 8:103. doi: 10.3389/fgene.2017.00103

# INTRODUCTION

Eukaryotic genomes are composed of huge amounts of highly dynamic repetitive DNA sequences that may be dispersed throughout the genome, e.g., transposable elements, or tandemly repeated, such as multigene families or satellite DNA (satDNA; Charlesworth et al., 1994; Jurka et al., 2005). satDNA constitutes a non-coding fraction of the genome, consisting of long arrays of tandemly repeated sequences, preferentially located in the heterochromatin of pericentromeric and subtelomeric chromosome regions, although their presence in euchromatic regions has also been reported (López-Flores and Garrido-Ramos, 2012; Plohl et al., 2012; Garrido-Ramos, 2015; Ruiz-Ruano et al., 2016). In general, satDNA sequences constitute different families that vary in localization, constitution, unit size and abundance (Garrido-Ramos, 2015). Since these sequences are highly dynamic genomic segments susceptible to quick changes, they are generally species- or genus-specific (Vicari et al., 2010; Garrido-Ramos, 2015). According to the "library hypothesis", specific groups of related organisms share a common library of satDNAs that might be independently amplified in those distinct genomes (Fry and Salser, 1977). Such events might cause rapid changes in satDNA distribution and abundance profiles, even in closely related species (Plohl et al., 2012).
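The library hypothesis can be illustrated with a toy abundance table. The sketch below is only an illustration and not the authors' analysis: the species names, variant labels, read counts, and the 5% cutoff separating "relictual" from "amplified" variants are all hypothetical assumptions chosen for the example.

```python
from collections import Counter


def variant_abundance(reads_by_species):
    """Return {species: {variant: fraction of that species' satDNA reads}}."""
    profiles = {}
    for species, variants in reads_by_species.items():
        counts = Counter(variants)
        total = sum(counts.values())
        profiles[species] = {v: n / total for v, n in counts.items()}
    return profiles


def classify(profiles, relict_cutoff=0.05):
    """Label every variant of the shared library per species.

    relict_cutoff is an arbitrary, hypothetical threshold: variants below it
    are called 'relictual', variants at or above it 'amplified'.
    """
    library = sorted({v for p in profiles.values() for v in p})
    labels = {}
    for species, profile in profiles.items():
        labels[species] = {}
        for variant in library:
            fraction = profile.get(variant, 0.0)
            if fraction == 0:
                labels[species][variant] = "absent"
            elif fraction < relict_cutoff:
                labels[species][variant] = "relictual"
            else:
                labels[species][variant] = "amplified"
    return labels


# Hypothetical reads already assigned to satDNA variants.
toy = {
    "species_1": ["varA"] * 97 + ["varB"] * 3,
    "species_2": ["varA"] * 2 + ["varB"] * 98,
}
labels = classify(variant_abundance(toy))
print(labels)
```

In this toy case both species carry the same two variants, but each has amplified a different one, which is the pattern the hypothesis predicts for related genomes.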

Up to now, most studies of satDNAs in fish genomes have focused on the development of chromosomal markers for evolutionary studies on B and sex chromosomes (Mestriner et al., 2000; Jesus et al., 2003; Vicari et al., 2010; Utsunomia et al., 2016). However, the evolutionary trends of satDNAs in closely related fish species have not yet been well evaluated, mainly if we consider that almost all discovered satDNA analyzed until now seemed to represent species- or genus-specific sequences (Garrido-Ramos et al., 1999; Leclerc et al., 1999; de la Herrán et al., 2001; Lanfredi et al., 2001; Robles et al., 2004; Martins et al., 2006).

Next generation sequencing (NGS) has been extensively used for several applications, including the in-depth characterization of satDNA sequences by similarity-based read clustering (Macas et al., 2011; Ruiz-Ruano et al., 2016). Such a strategy has been frequently used for de novo characterization of repetitive DNA sequences in different organisms (Novák et al., 2010; Macas et al., 2011; Pagán et al., 2012; Camacho et al., 2015; García et al., 2015; Ruiz-Ruano et al., 2016; Utsunomia et al., 2016). In a recent study, Utsunomia et al. (2016) used graph-based clustering of sequence reads and isolated seven satDNAs (MS1-MS7) from the characid fish Moenkhausia sanctaefilomenae, two of which (MS3 and MS7) were fully characterized and mapped on chromosomes to unveil the origin of B chromosomes in this species. More recently, one of these satellites, MS1 satDNA (from now on referred to as MsaSat01-177, to follow the nomenclature rules suggested in Ruiz-Ruano et al., 2016), was shown to be present in the genomes of other characid fishes, such as Astyanax paranae and A. mexicanus (Silva et al., submitted), indicating its intergeneric conservation and thus providing an interesting opportunity to investigate the evolutionary dynamics of this satellite in closely related species within Characidae.
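The idea behind similarity-based, graph-based read clustering can be sketched in a few lines. This is a deliberately simplified stand-in, not the pipeline used in the cited studies: it assumes that similarity is measured as the Jaccard index of k-mer sets, that reads above a hypothetical threshold are joined by an edge, and that clusters are the connected components of the resulting graph. The reads, `k`, and `min_jaccard` are illustrative values only.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Set of all overlapping k-mers in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=4, min_jaccard=0.5):
    """Group reads into connected components of a k-mer similarity graph."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [kmers(r, k) for r in reads]
    for i, j in combinations(range(len(reads)), 2):
        union = sets[i] | sets[j]
        # An edge joins two reads whose k-mer sets overlap enough.
        if union and len(sets[i] & sets[j]) / len(union) >= min_jaccard:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical reads: two from one tandem repeat, two from another.
reads = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTGGGGCCCC", "TTTTGGGGCCCA"]
clusters = cluster_reads(reads)
print(clusters)
```

Reads derived from the same tandem repeat share most of their k-mers and therefore fall into one dense cluster, which is why abundant satellites stand out in such graphs.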

Characidae is the largest family of freshwater fishes and comprises more than 1000 species (Eschmeyer and Fong, 2017). The phylogenetic relationships of this family are highly controversial and several species were considered incertae sedis by different authors (Javonillo et al., 2010; Oliveira et al., 2011; Thomaz et al., 2015). During the last few years, different studies using morphological and molecular evidence showed that Characidae is a well supported group which is subdivided into three different monophyletic clades (clades A, B, and C) (Weitzman and Malabarba, 1998; Javonillo et al., 2010; Oliveira et al., 2011; Thomaz et al., 2015). However, phylogenetic hypotheses within each of these clades are still scarce or unavailable and many genera are suspected to be nonmonophyletic (Thomaz et al., 2015; Rossini et al., 2016).

Likewise, numerous cytogenetic studies have been performed on representatives of this family during the last decades, revealing extensive karyotype diversification at intra- and interspecies levels, including changes in diploid numbers, differential chromosomal location of multigene families and multiple origins of supernumerary chromosomes (Oliveira et al., 2009; Arai, 2011). However, the absence of satDNAs shared among species has impeded testing the main evolutionary hypotheses on this kind of repetitive DNA, such as concerted evolution and the library hypothesis (see above). Our main purpose here was to test these hypotheses on a satDNA shared between several Characidae species, using a combination of novel (Illumina sequencing) and traditional (PCR amplification, cloning, Sanger sequencing and FISH) approaches, in 10 species of Characidae fish belonging to clades A, B, and C. Our main objectives were: (i) delimiting the taxonomic spread of this satellite, (ii) comparing its chromosome abundance and localization between species, and (iii) investigating intra- and interspecific variation of MsaSat01-177 at nucleotide and chromosomal levels. All this information provided new insights on concerted evolution and the library hypothesis.
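Nucleotide-level variation of a satellite is commonly summarized by collapsing repeat units into haplotypes and linking them in a minimum spanning tree. The following is a minimal sketch under stated assumptions: the sequences are hypothetical, already aligned to equal length, distances are plain Hamming distances, and the tree is built with Prim's algorithm. It is not the software or the data used in the study.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def haplotypes(seqs):
    """Collapse identical sequences; the counts are haplotype abundances."""
    return Counter(seqs)


def minimum_spanning_tree(haps):
    """Prim's algorithm on the complete graph of haplotypes.

    Edge weights are Hamming distances; returns a list of (a, b, distance).
    """
    haps = sorted(haps)
    if not haps:
        return []
    in_tree = {haps[0]}
    edges = []
    while len(in_tree) < len(haps):
        # Cheapest edge from the current tree to a haplotype outside it.
        d, a, b = min(
            (hamming(a, b), a, b)
            for a in in_tree
            for b in haps
            if b not in in_tree
        )
        edges.append((a, b, d))
        in_tree.add(b)
    return edges


# Hypothetical aligned repeat units.
seqs = ["AAAA", "AAAA", "AAAT", "AATT", "CAAA"]
counts = haplotypes(seqs)
tree = minimum_spanning_tree(counts)
print(counts)
print(tree)
```

Under concerted evolution one would expect haplotypes from the same species to sit close together in such a tree, whereas a shared library would show up as the same haplotypes recurring across species at different abundances.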

# MATERIALS AND METHODS

#### Ethics Statement

Sampling was carried out on private land, and the owners gave permission to conduct this study. The animals were captured using nets, transported to the laboratory, kept in a fish tank and anesthetized before the analyses. The animals were collected in accordance with Brazilian environmental protection legislation (Collection Permission MMA/IBAMA/SISBIO, number 3245), and the procedures for sampling, maintenance and analysis of the fishes were performed in compliance with the Brazilian College of Animal Experimentation (COBEA) and were approved (protocols 405 and 504) by the Bioscience Institute/UNESP Ethics Committee on the Use of Animals (CEUA).

#### Sampling, Chromosomal Preparations and DNA Extraction

In the present study, we analyzed ten allopatric Characidae species. Eight of them belong to clade C, namely Astyanax paranae, A. bockmanni, A. altiparanae, A. fasciatus, A. jordani, M. sanctaefilomenae, Hasemania kalunga and Hyphessobrycon bifasciatus. In addition, Bryconamericus stramineus and Serrapinnus notomelas, classified in clades A and B, respectively, were also analyzed (**Table 1**). The relationship between clades A, B, and C is represented in **Figure 1**. The available internal relationships among clade A species were not considered in this study, as several genera appear to be non-monophyletic. Cell suspensions from all species were already available in our laboratory from previous studies (Silva et al., 2013, 2014, 2016; Utsunomia et al., 2016), except for H. bifasciatus, H. kalunga, B. stramineus and S. notomelas, whose karyotypes were analyzed here for the first time. Metaphase chromosomes were obtained from cell suspensions of the anterior kidney, according to Foresti et al. (1981). Genomic DNA was extracted from muscle or liver using the Wizard Genomic DNA Purification Kit (Promega), following the manufacturer's instructions.

TABLE 1 | Species analyzed in the present study and information regarding MsaSat01-177 distribution patterns.

2n, diploid chromosome number. Sites, number of chromosomes showing the satDNA. c, clustered; nc, non-clustered.

#### Whole-Genome Sequencing and Characterization of Monomers from Raw Reads

MsaSat01-177 was previously discovered in the M. sanctaefilomenae genome using RepeatExplorer (Utsunomia et al., 2016). Here, in order to perform a thorough search for MsaSat01-177 monomers in different genomic libraries, we used gDNA Illumina HiSeq2000 reads (2 × 101 bp) from M. sanctaefilomenae and A. paranae stored in SRA (accession numbers SRR5839692 and SRR5461470, respectively). In addition, two individuals of A. fasciatus were sequenced on the Illumina MiSeq, yielding 2 × 250 bp paired-end reads. First, in these three species, we performed a random subsampling step of 5,000,000 paired-end reads per species. Detailed information about the Illumina libraries used is shown in **Table 2**. In addition, we used other gDNA Illumina MiSeq reads (2 × 250 bp) recently sequenced in our laboratory (data not shown) from several characiform fishes, including Anostomidae (Megaleporinus macrocephalus and Leporinus friderici), Crenuchidae (Characidium gomesi) and Serrasalmidae (Piaractus mesopotamicus), all belonging to the order Characiformes (Oliveira et al., 2011).

TABLE 2 | Genetic variation found in the monomers of MsaSat01-177 extracted from Illumina reads of three different species and PCR-amplified from five other characid species.

Seq, number of sequences in the primary fastq library. N, number of isolated monomers. Hap, number of haplotypes. Hd, haplotype diversity. π, nucleotide diversity.
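The random subsampling of paired-end reads described above can be illustrated with a minimal sketch. The text does not say which tool performed this step, so the reservoir-sampling approach and the names `read_fastq` and `subsample_pairs` are assumptions chosen for illustration only.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if len(record) < 4:  # end of input (or truncated trailing record)
            return
        yield tuple(line.rstrip("\n") for line in record)


def subsample_pairs(r1_lines, r2_lines, n, seed=0):
    """Reservoir-sample up to n read pairs, always keeping mates together.

    Hypothetical helper: one pass over both files, memory bounded by n pairs.
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(read_fastq(r1_lines), read_fastq(r2_lines))
    for i, pair in enumerate(pairs):
        if i < n:
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = pair
    return reservoir
```

With real data, `r1_lines` and `r2_lines` would be open file handles for the forward and reverse FASTQ files, and `n` would be 5,000,000.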

To obtain a detailed and reliable score of haplotype abundance for MsaSat01-177 sequences from the genomic libraries of A. paranae, A. fasciatus and M. sanctaefilomenae, we extracted complete monomers directly from the Illumina raw reads, as this is expected to provide accurate estimates of haplotype abundance without the bias of PCR amplification. For this purpose, we performed a series of bioinformatic workflows that included joining the paired-end reads, aligning them against the MsaSat01-177 sequence and trimming the ends to get full monomers, as described in Utsunomia et al. (2016). Importantly, singletons (i.e., sequence variants found only once) were discarded at this stage of the analysis in order to minimize the impact of possible sequencing errors. Monomers collected from Illumina reads in the three species were aligned separately using the Muscle algorithm (Edgar, 2004), under default parameters, and displayed as sequence logos using the WebLogo 3.3 software (Crooks et al., 2004). The obtained monomers were used for all downstream analyses in this study, except for RepeatExplorer (described below).
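The singleton filter lends itself to a short sketch. It assumes that monomers have already been extracted and trimmed to full length, and the function name `haplotype_abundance` is hypothetical rather than part of the workflow cited above.

```python
from collections import Counter


def haplotype_abundance(monomers, min_count=2):
    """Count identical monomer sequences and drop rare variants.

    With min_count=2, variants observed only once (singletons) are
    discarded as possible sequencing errors.
    """
    counts = Counter(seq.upper() for seq in monomers)
    return {hap: n for hap, n in counts.items() if n >= min_count}


monomers = ["ACGT", "ACGT", "ACGA", "acgt", "TTTT", "TTTT"]
print(haplotype_abundance(monomers))  # "ACGA" occurs once and is removed
```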

In order to investigate possible structural variation of MsaSat01-177 in these three species and to search for possible associations with other repetitive elements, we selected pairs of reads showing homology with this satDNA in each gDNA library separately, by using BLAT (Kent, 2002). This step is implemented in a custom script<sup>1</sup>. We then used the selected read pairs from each library to run RepeatExplorer clustering (Novák et al., 2013) with at least 2 × 2,500 reads.

#### satDNA Amplification, Cloning and Sequencing

After complete characterization of the repeat unit of MsaSat01-177, different sets of divergent primer pairs were designed: MsaSat01F1 (5′-TTTTGACCATTCATGAAACCTTG-3′) and MsaSat01R1 (5′-ACCAGAATCACATACCGCGG-3′); MsaSat01F2 (5′-TGCCCATGCATTTTCCCACT-3′) and MsaSat01R2 (5′-GAARGATTTCATGAAATTTYGC-3′). PCR reactions were performed in 1x PCR buffer, 1.5 mM MgCl2, 200 µM each dNTP, 0.1 µM each primer, 2 pg–10 ng of DNA and 0.5 U of Taq polymerase (Invitrogen). The cycling program for amplification consisted of an initial denaturation at 95 °C for 5 min, followed by 30 cycles at 95 °C for 20 s, 63 °C for 30 s and 72 °C for 20 s, and a final extension at 72 °C for 15 min. The PCR products were visualized in 2% agarose gels, and the fragment obtained from each sample was extracted from the gel and cloned into the pGEM-T Easy Vector (Promega, Madison, WI, United States). DNA sequencing was performed with the Big Dye™ Terminator v3.1 Cycle Sequencing Ready Reaction Kit (Applied Biosystems) following the manufacturer's instructions. Consensus sequences from forward and reverse strands of the sequenced clones were obtained using Geneious Pro v.8.04.

<sup>1</sup>https://github.com/fjruizruano/ngs-protocols/blob/master/mapping\_blat\_gs.py
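The reverse primer MsaSat01R2 contains two degenerate positions (R and Y in IUPAC notation), so it stands for a small mixture of concrete oligonucleotides. The sketch below lists them; the helper name `expand_primer` is an assumption for illustration and is not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


for variant in expand_primer("GAARGATTTCATGAAATTTYGC"):
    print(variant)  # one line per concrete oligonucleotide
```

Because R stands for A or G and Y for C or T, this primer expands to four variants.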

#### DNA Probes and FISH

DNA probes for MsaSat01-177 were obtained by PCR amplification on genomic DNA from all species, except B. stramineus and S. notomelas, using the same conditions described above and labeling DNA with digoxigenin-11-dUTP or biotin-16-dUTP. Complementarily, probes were also obtained directly from single cloned sequences to compare the results. Thus, for every species, FISH was performed using probes obtained from their own genomes.

Fluorescence in situ hybridization was performed under high-stringency conditions using the method described by Pinkel et al. (1986). Pre-hybridization conditions included a 1-h incubation with RNAse (50 µg/ml) followed by chromosomal DNA denaturation in 70% formamide/2x SSC for 5 min at 70 °C. For each slide, 300 µl of hybridization solution (containing 200 ng of labeled probe, 50% formamide, 2x SSC and 10% dextran sulfate) was denatured for 10 min at 95 °C, then dropped onto the slides and allowed to hybridize overnight at 37 °C in a moist chamber containing 2x SSC. Post-hybridization, all slides were washed in 0.2x SSC/15% formamide for 20 min at 42 °C, followed by a second wash in 0.1x SSC for 15 min at 60 °C and a final wash at room temperature in 4x SSC, 0.5% Tween for 10 min. Probe detection was carried out with avidin-FITC (Sigma) or anti-digoxigenin-rhodamine (Roche), and the chromosomes were counterstained with DAPI (4′,6-diamidino-2-phenylindole, Vector Laboratories) and analyzed under an optical photomicroscope (Olympus BX61). Images were captured with an Olympus DP70 digital camera and the Image Pro Plus 6.0 software (Media Cybernetics). From each individual, a minimum of five cells was analyzed for FISH.

#### Nucleotide Analyses

A global alignment of both Illumina-derived and PCR-derived sequences was generated using the Muscle algorithm (Edgar, 2004) under default parameters. DNA diversity analyses, considering indels and all haplotypes, were performed with DnaSP v5.05 (Librado and Rozas, 2009). In order to reduce the number of haplotypes in the minimum spanning tree (MST), we performed a clustering analysis with CD-HIT-EST (Li and Godzik, 2006), selecting a sequence identity level of 99%. The MST was built on the basis of pairwise differences using ARLEQUIN v3.5.1.3 (Excoffier and Lischer, 2010) and was visualized with HAPSTAR (Teacher and Griffiths, 2011).
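The two diversity statistics reported in the tables (Hd and π) can be sketched as follows. This is a simplified illustration that assumes equal-length aligned sequences and treats a gap character as one more state. It is not a reimplementation of DnaSP, whose handling of indels may differ, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences per aligned site."""
    n = len(seqs)
    length = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


aligned = ["AAAA", "AAAA", "AAAT", "AATT"]
print(haplotype_diversity(aligned), nucleotide_diversity(aligned))
```

In this toy alignment, three haplotypes with frequencies 0.5, 0.25 and 0.25 give Hd = 5/6, and seven differences over six pairs and four sites give π = 7/24.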

# RESULTS

# Chromosomal Analysis

Cytogenetic analyses evidenced different diploid chromosome numbers for the analyzed species (**Table 1**). PCR amplification of MsaSat01-177 yielded a ladder pattern in 2% agarose gels for all species within clade C, while no visible banding patterns were detected for the species from clade A (B. stramineus) and clade B (S. notomelas), suggesting that these sequences are not present in these species or were not amplified with the designed primers due to high sequence divergence. Also, FISH with inter-specific probes did not return any visible signal on the chromosomes of these two species (data not shown).

FISH evidenced that MsaSat01-177 shows a non-clustered organization in A. fasciatus, but a clustered distribution in the other C-clade species. Remarkably, all clusters for this satDNA were located pericentromerically (**Figure 2**), but showing extensive variation among species concerning the number of chromosomes carrying it, namely two in H. kalunga and A. bockmanni, 10 in A. altiparanae and A. paranae, 18 in H. bifasciatus and A. jordani, and 36 in M. sanctaefilomenae.

#### Bioinformatic and Molecular Analyses

Selection of Illumina reads showing homology with MsaSat01-177 resulted in 298,622, 3,160 and 24 reads in M. sanctaefilomenae, A. paranae and A. fasciatus, respectively. Those found in the latter species were insufficient for RepeatExplorer analysis and, in A. paranae, we had to use two copies of each read (6,320 reads in total) in order to meet the minimum requirement of 5,000 reads. Finally, for M. sanctaefilomenae, we subsampled the reads from 298,622 to 30,000 to optimize RepeatExplorer calculations. Output data evidenced spherical graphs for MsaSat01-177 in both species (**Supplementary Figure S1**), as expected for satDNAs. Although these results do not exclude the possibility of association with other repetitive sequences, they indicate that this satDNA is not primarily associated with other repetitive elements.

We successfully extracted monomers directly from sequencing reads of A. paranae, A. fasciatus and M. sanctaefilomenae, and the detailed information is summarized in **Table 2**. Conversely, searches for MsaSat01-177 in Characiformes genomes outside Characidae did not yield any result, suggesting that MsaSat01-177 is not present in families other than Characidae within this order. In this context, we restricted our high-throughput analyses to the three Characidae fishes available. The extraction of MsaSat01-177 monomers from overlapping read pairs resulted in a total of 470, 201 and 8 monomers in M. sanctaefilomenae, A. paranae and A. fasciatus, respectively. The eight sequences in the latter species showed the highest nucleotide diversity (π), whereas those in A. paranae showed higher nucleotide diversity than those in M. sanctaefilomenae (**Table 2**). Sequence logos corroborated this result and exhibited different levels of sequence conservation between the analyzed species for MsaSat01-177 monomers, with those in M. sanctaefilomenae showing higher conservation than those in A. paranae and A. fasciatus (**Figure 3**).

PCR amplification in the C-clade species, and subsequent cloning and sequencing, yielded several sequences per species (**Table 3**). Notably, most of the few sequenced clones in A. paranae, A. fasciatus and M. sanctaefilomenae were also found among the Illumina reads. In general, the number of haplotypes was almost equal to the number of sequenced clones for all species, while nucleotide diversity (π) values were variable, with those in A. altiparanae showing the highest values.

In order to obtain a global alignment and generate an MST, we first applied a clustering step with CD-HIT-EST to the Illumina-derived monomers to reduce the number of haplotypes. Thus, a total of 470, 201 and 8 monomers were reduced to a final matrix with 55, 51 and 8 clusters from M. sanctaefilomenae, A. paranae and A. fasciatus, respectively. The final alignment matrix was composed of 150 haplotypes, 114 of which were obtained from Illumina reads and 36 from PCR clones. From this whole alignment, we built an MST that took haplotype relative abundance into account. It evidenced overall species-specific groups of haplotypes, the main exception being A. paranae, which showed several groups linked with those in most remaining species (**Figure 2**). The main steps performed in this study to obtain the described results are represented in **Figure 4**.
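The idea of a minimum spanning tree built on pairwise differences can be shown with a small sketch using Prim's algorithm. This is only an illustration under the assumption of equal-length aligned haplotypes. It does not reproduce the ARLEQUIN computation or the HAPSTAR layout, and when several trees tie for minimum length it returns just one of them.

```python
def pairwise_differences(a, b):
    """Number of aligned positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm; returns edges as (index_i, index_j, differences)."""
    n = len(haplotypes)
    if n == 0:
        return []
    # best[j] = (cheapest known distance from the tree to j, tree node used)
    best = {
        j: (pairwise_differences(haplotypes[0], haplotypes[j]), 0)
        for j in range(1, n)
    }
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])  # closest node outside tree
        dist, i = best.pop(j)
        edges.append((i, j, dist))
        for k in best:  # relax distances through the newly added node
            d = pairwise_differences(haplotypes[j], haplotypes[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


haps = ["AAAA", "AAAT", "AATT", "CCCC"]
for i, j, d in minimum_spanning_tree(haps):
    print(haps[i], "-", haps[j], "differences:", d)
```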

# DISCUSSION

It is generally assumed that satDNA sequences evolve following a pattern of "concerted evolution," as a consequence of intraspecific sequence homogenization and fixation (Dover, 1982, 1986). Notably, the homogenization process is driven by molecular mechanisms such as unequal crossing-over or gene conversion (Smith, 1976; Dover, 1986), which usually lead to quite low sequence divergence among monomers within satDNA arrays (Plohl et al., 2012). Our sequence analysis of MsaSat01-177 in eight Characidae fish species has revealed some interesting features. First, we could indeed observe higher homogenization within species, since most haplotypes in **Figure 2** from the same species tended to group together, with the exception of those in A. paranae, which were distributed into three different groups. This is the expected pattern for concerted evolution of satDNA, but the A. paranae case demands additional explanations (see below).

Second, a comparison between the number of clusters observed by FISH and nucleotide diversity, in the three species analyzed by Illumina sequencing, revealed an interesting pattern: the species showing the highest number of clusters (M. sanctaefilomenae) showed the lowest nucleotide diversity, whereas the species that failed to show clusters (A. fasciatus) showed the highest diversity, with A. paranae showing intermediate values for both parameters. Population demographic events (e.g., bottlenecks) might have contributed to a pattern like this (Ardern et al., 1997; Pons et al., 2002), but between-species differences in the homogenization/mutation balance could also provide an explanation (reviewed in Ugarković and Plohl, 2002). For example, a recent amplification event of MsaSat01-177 in M. sanctaefilomenae could explain its low nucleotide diversity. On the other hand, differential amplification between satellite subfamilies could explain the high diversity in A. paranae (Willard and Waye, 1987).

Our present results have shown the presence of the MsaSat01-177 satDNA, previously described in the characid fish M. sanctaefilomenae (Utsunomia et al., 2016), in eight species from four different Characidae genera, all belonging to clade C (Oliveira et al., 2011). Notably, we could neither amplify it by PCR in species belonging to clades A or B, nor find any trace of this sequence in genomic libraries of other characiform families, suggesting that the MsaSat01-177 satDNA might be restricted to the C-clade species. This suggests the conservation of this satDNA in this fish group and allows testing some features of the satDNA library hypothesis (Fry and Salser, 1977) in these Characidae fish.

The MST shown in **Figure 2** provides a glimpse into a small part of the satDNA library of the characid C-clade species, as it only shows some library volumes for the MsaSat01-177 satDNA in only eight species. Clearly, the complete library should include all haplotypes found in all species for the whole satellitome catalog, with MSTs for each satDNA family and some families connected by common branches if they belong to the same superfamily. Of course, this appears to be an impossible task, but looking at a small corner of the library is also very illustrative. First, **Figure 2** shows that species from four different genera share the MsaSat01-177 satellite, with Astyanax paranae showing connections with all remaining species, but with a higher number of differences from M. sanctaefilomenae. The central position of A. paranae among the five Astyanax species might actually be an artifact due to the higher number of sequences obtained from sequence reads in this species. However, in the case of A. fasciatus, we also employed this approach but found only 24 reads showing homology with the MsaSat01-177 satellite, indicating that its presence in this species is just a relic, with only a few small arrays for eight highly divergent haplotypes scattered through the genome, since it was not apparent by FISH (see **Figure 2**). Ruiz-Ruano et al. (2016) suggested that satDNA follows a three-step evolutionary pathway: birth, dissemination and clustering. It is thus conceivable that, in A. fasciatus, this satDNA has not reached the third stage, and the extremely high divergence shown by the few units found (π = 0.18) suggests that they are not subjected to concerted evolution, probably because they are disseminated across the genome. In sharp contrast, in M. sanctaefilomenae, we found two extremely abundant haplotypes along with many other less abundant ones at few mutational steps, suggesting that sequence homogenization works very efficiently in this species, as also indicated by its low nucleotide diversity (π = 0.013). This indicates that this species has lost many of the satDNA variants that were originally present in the common ancestor of the eight species analyzed here. The case of A. paranae is intermediate (π = 0.055), suggesting that it has preserved a higher proportion of the original satDNA variation, presumably because satDNA homogenization works less efficiently than in M. sanctaefilomenae. This might explain the central position of A. paranae in the tree. As a whole, these observations are consistent with the independent amplification of satDNA variants in different genomes, as suggested by the library hypothesis (Fry and Salser, 1977).

TABLE 3 | Genetic variation found in the monomers of PCR-amplified MsaSat01-177 from eight different characid species.

N, number of sequences. Hap, number of haplotypes. Hd, haplotype diversity. π, nucleotide diversity.

The MST in **Figure 2** also suggests that the Illumina approach is much more informative than the PCR one and that the traditional conclusions on concerted evolution inferred from the latter method could be biased by the unavoidable filtering inherent to the PCR reaction, whose products are enriched in those sequences to which the primers are able to anneal. Illumina sequencing, in contrast, provides a random sample of sequences and thus gives more realistic information. The PCR bias might explain why the vast majority of haplotypes obtained by PCR were grouped per species in **Figure 2**. However, on the basis of the multiple connections shown by the Illumina haplotypes in A. paranae, we can imagine a much more intricate haplotype tree with connections between most species. Therefore, the analysis of satDNA variation through Illumina sequencing can open the satDNA library doors wide in the near future.

At the genus level, it appears that the Astyanax satDNA library keeps more variation in common with those in Hasemania and Hyphessobrycon than with that in Moenkhausia, suggesting a closer relationship between the three former genera. However, this might be a false impression due to the efficient homogenization in M. sanctaefilomenae, which has erased, in the satDNA library, many signs of their common descent.

In general, satDNA accumulation in heterochromatic areas is an overall trend, although recent analyses have revealed that euchromatic areas might also be occupied by this kind of repetitive sequence (Kuhn et al., 2012; Ruiz-Ruano et al., 2016). Here, we found that all clusters of MsaSat01-177 found on the chromosomes of seven species were exclusively located in heterochromatic pericentromeric regions. The large differences between species in the number and size of MsaSat01-177 clusters, in the seven species where this satellite was visualized by FISH, from two in A. bockmanni and H. kalunga to 36 in M. sanctaefilomenae (see **Table 1**), indicate that satDNA clustering has followed different evolutionary pathways in most species, although it is also conceivable that some clusters residing in chromosomes showing synteny among species might have descended from a common ancestor. According to the three-step hypothesis (Ruiz-Ruano et al., 2016), these results suggest that satDNA evolution may follow different pathways in different species by reaching variable degrees of interchromosomal spread. Notably, A. paranae is phylogenetically more closely related to A. bockmanni than to A. altiparanae (Rossini et al., 2016), consistent with our MST. However, the number of sites per genome evidenced by FISH (10, 2 and 10, respectively) would not indicate that. In this context, as with other repetitive DNA sequences, the number of sites and satDNA-bearing chromosomes do not appear to completely reflect phylogenetic relationships, and thus probably reflect historical contingency. Unfortunately, as mentioned before, a complete phylogeny considering the taxa sampled in our study is not available (Oliveira et al., 2011; Thomaz et al., 2015; Rossini et al., 2016).

As a component of the repetitive fraction of genomes, satDNA is highly dynamic and its abundance might rapidly change due to expansion and/or contraction of its sequence arrays (Plohl et al., 2008; Garrido-Ramos, 2015). Therefore, different mechanisms may have led MsaSat01-177 to be highly abundant and homogenized in M. sanctaefilomenae, presumably due to recent amplification on 72% of its chromosomes (36 FISH signals on 50 chromosomes) (**Supplementary Figure S2**). In contrast, this satellite is relictual in A. fasciatus, shows a cluster on a single chromosome pair in A. bockmanni and H. kalunga, and occurs on several chromosome pairs in the remaining species. Such dynamics at the chromosomal level have frequently been reported for several satDNA sequences in a wide range of organisms, at the intra- and interspecific levels (Plohl et al., 2012; Garrido-Ramos, 2015). Although multiple mechanisms have been put forward to explain this variation, such as unequal crossing-over, ectopic recombination, replication slippage, association with transposable elements and extrachromosomal circular DNA (Dover, 1993; McMurray, 1995; Hancock, 1996; Cohen et al., 2010; Milani and Cabral-de-Mello, 2014; Ruiz-Ruano et al., 2015, 2016), the ultimate explanation has not yet been figured out.

Taken together, our present results have provided evidence for the presence of a shared satDNA among several species within Characidae, which probably arose after the split of clade C. The chromosomal distribution of MsaSat01-177 was highly variable and several spreading mechanisms might be acting in this case. As expected, monomers from all species are subjected to concerted evolution, except those in A. fasciatus, where short tandem arrays of MsaSat01-177 are probably scattered across the genome. In addition, sequence homogenization levels differed among species, and our results have also shown the differential amplification of some variants of this satellite in different species. This is highly consistent with the library hypothesis (Fry and Salser, 1977), under which the same satellite family can follow different evolutionary pathways in different species, in terms not only of amplification levels but also of chromosome distribution.

#### AUTHOR CONTRIBUTIONS

RU, DS, PS, ES, and IR collected the samples, performed the cytogenetic analyses, the production of DNA probes and the cloning and FISH experiments. RU, DS, and FR-R performed the bioinformatics analyses. RU, DS, PS, ES, IR, and FR-R drafted the text and designed the figures. DH, JC, CO, and FF critically revised the manuscript and approved the final version.

#### REFERENCES


#### ACKNOWLEDGMENTS

The authors thank Renato Devide for help with obtaining samples. This study was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (grant numbers 2010/17009-2 and 2014/26508-3 to CO) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (grant numbers 306054/2006-0 to CO and 403066/2015-8 to RU).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2017.00103/full#supplementary-material

FIGURE S1 | Cluster-graphs obtained for MsaSat01-177 from A. paranae and M. sanctaefilomenae after analysis with RepeatExplorer.

FIGURE S2 | Metaphases from different Characidae species after FISH with MsaSat01-177 probes.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Utsunomia, Ruiz-Ruano, Silva, Serrano, Rosa, Scudeler, Hashimoto, Oliveira, Camacho and Foresti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Triportheus albus Cope, 1872 in the Blackwater, Clearwater, and Whitewater of the Amazon: A Case of Phenotypic Plasticity?

#### José D. A. Araújo<sup>1,2</sup>\*, Andrea Ghelfi<sup>3</sup> and Adalberto L. Val<sup>1</sup>

<sup>1</sup> Laboratory of Ecophysiology and Molecular Evolution, National Institute of Amazonian Research, Manaus, Brazil, <sup>2</sup> Federal University of Amazonas, Manaus, Brazil, <sup>3</sup> Kazusa DNA Research Institute, Kisarazu, Japan

The Amazon basin includes thousands of bodies of water, which are sorted by color into three types: blackwater, clearwater, and whitewater. These types differ significantly in their physicochemical parameters. More than 3,000 species of fish live in the rivers of the Amazon, among them the sardine, Triportheus albus, which is one of the few species that inhabit all three types of water. The purpose of our study was to analyze whether the gene expression of T. albus is determined by the different types of water, that is, whether the species presents phenotypic plasticity to live in blackwater, clearwater, and whitewater. Gills of T. albus were collected at well-characterized sites for each type of water. Nine cDNA libraries were constructed (three biological replicates for each condition), and the RNA was sequenced (RNA-Seq) on the MiSeq® Platform (Illumina®). A total of 51.6 million paired-end reads were generated, and 285,456 transcripts were assembled. Considering FDR ≤ 0.05 and fold change ≥ 2, 13,754 differentially expressed genes were detected in the three water types. Two mechanisms related to homeostasis were detected in T. albus living in blackwater when compared to those in clearwater and whitewater. The acidic blackwater is a challenging environment for many types of aquatic organisms. The first mechanism is related to the decrease in cellular permeability, highlighting the genes coding for claudin proteins, actn4, itgb3b, DSP, gap junction protein, and Ca2+-ATPase. The second is related to ionic and acid-base regulation [rhcg1, slc9a6a (NHE), ATP6V0A2, Na+/K+-ATPase, slc26a4 (pendrin) and slc4a4b]. We suggest that T. albus is a good species for future studies of ionic and acid-base regulation in Amazonian fishes. We also conclude that T. albus shows well-defined phenotypic plasticity for each water type in the Amazon basin.

Keywords: Rio Negro, Tapajós River, Solimões River, differential expression, RNA-Seq, acidic pH, ionic regulation

#### INTRODUCTION

The rivers of the Amazon are interconnected with the central channel, the Amazon River, sheltering a rich ichthyofauna and allowing the entire Amazon basin to be linked through its waters. These connections make it possible for species to migrate among the rivers of the region (Val and Almeida-Val, 1995). However, these rivers contain different water types, due to the geographical location of each river and the materials that are deposited on their beds (Sioli, 1984; Konhauser et al., 1994).

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Luciana Rodrigues De Souza-Bastos, Institutos LACTEC, Brazil Rafael Mendonça Duarte, Universidade Estadual Paulista Júlio de Mesquita Filho, Brazil Rui Rosa, Universidade de Lisboa, Portugal

> \*Correspondence: José D. A. Araújo deneyaraujo@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 30 June 2017 Accepted: 17 August 2017 Published: 31 August 2017

#### Citation:

Araújo JDA, Ghelfi A and Val AL (2017) Triportheus albus Cope, 1872 in the Blackwater, Clearwater, and Whitewater of the Amazon: A Case of Phenotypic Plasticity? Front. Genet. 8:114. doi: 10.3389/fgene.2017.00114

In many cases, the physicochemical parameters of the waters govern the selection of the species that inhabit them (Val and Almeida-Val, 1995). This selection depends on the capacity of species to adapt to the environmental conditions to which they are exposed (Evans et al., 2005; Wood et al., 2007). In this context, phenotypic plasticity has recently gained significance in transcriptome analyses and can be defined as the ability of a genotype to produce variable phenotypes under different environmental conditions. Thus, the phenotypic plasticity supporting the physiological adjustments necessary for the transition between environments is valuable for many species (Gibbons et al., 2017).

The waters of the Amazon constitute a singular place to analyze that issue, as water bodies containing blackwater, clearwater, and whitewater (Sioli, 1984) are interconnected.

Indeed, the physicochemical parameters of these waters are closely related to their characteristic colors, and three major rivers exemplify these water profiles. The Negro River (blackwater) is considered one of the most challenging environments for aquatic species, due to its naturally acidic water and low level of ions. The presence of significant concentrations of dissolved organic matter (DOM), produced by plant decomposition because of the seasonal flooding of part of the forest (Sioli, 1984; Ertel et al., 1986), is responsible for releasing humic and fulvic acids into the water, which in turn have carboxylic (−COOH) and hydroxyl (−OH) groups in their structures. These groups dissociate and release H<sup>+</sup> ions into the water, thus reducing the pH of the environment (Queiroz et al., 2009). Several studies have attempted to understand how a river posing such an enormous environmental challenge houses a significant diversity of fish, estimated at approximately 1,000 species (Val and Almeida-Val, 1995; Gonzalez et al., 2002; Wood et al., 2014; Duarte et al., 2016).

The Solimões/Amazonas River, the primary representative of whitewater, is the largest freshwater river on the planet. It owes its distinctive color to the high quantity of suspended material derived from Andean sediments. These sediments are transported in large volumes along the whole river and deposited on the banks, forming sandbanks with the clayey material, known in the region as várzea (the area on the riverbanks that is flooded during the flood season) (Furch, 1984). The Solimões River is richer than the Rio Negro in dissolved ions, and its electrical conductivity is the highest of the three water types, with an average of 98.8 µS/cm (Küchler et al., 2000). The predominant ions in its waters are Na<sup>+</sup>, K<sup>+</sup>, Mg<sup>2+</sup>, and Ca<sup>2+</sup>. The pH of whitewater is close to neutrality (6.6) (Sioli, 1984; Gaillardet et al., 1997).

The Tapajós River is known for its crystalline, slightly greenish waters. Its waters contain little clayey sediment because the river drains soils of Precambrian origin (Sioli, 1984). Like the Negro River, it has areas of igapós (flooded forests) on its banks, depending on the seasonal cycle. The pH is around 6.5 and it has low conductivity (14.4 µS/cm), with a reduced amount of suspended material. Although the physicochemical parameters of its waters are referred to as intermediate between those of blackwater and whitewater rivers (Duncan and Fernandes, 2010), clearwater shows the highest annual variation in physicochemical parameters in the Amazon basin, mainly with regard to water pH (Junk and Furch, 1980). Therefore, both the Tapajós River and the Solimões River have a pH close to neutrality, while the Negro River has acidic waters (Sioli, 1984).

Depending on the annual water fluctuation (seasonal ebb or river flooding), the physicochemical parameters can be altered. Observing the environmental conditions of different types of waters makes the challenges that these environments impose on ichthyofauna evident (Furch, 1982).

These environments are home to the most diverse ichthyofauna in the world. Many species simultaneously inhabit two of these environments (reviewed by Val and Almeida-Val, 1995). Few, however, have developed biological mechanisms to live simultaneously in the three types of water in the Amazon. Among these, we highlight the species Triportheus albus, popularly known in the region as sardine. This species is often found in all three types of water in the Amazon basin (INCT/ADAPTA Project Report 2012–2013). The understanding of how this species responds to the different environmental conditions prevailing in the three types of Amazonian environments was the primary factor that challenged us in the present study.

Considering the environmental and physicochemical characteristics of the waters in which T. albus occurs, the purpose of our study was to analyze whether the gene expression of T. albus is determined by the different types of water. Our hypothesis is that T. albus presents phenotypic plasticity to live in blackwater, clearwater, and whitewater.

# MATERIALS AND METHODS

The experimental procedures were approved by the Animal Use Ethics Committee of the Brazilian National Institute for Research of the Amazon (CEUA-INPA) (Protocol 026/2015). The permit for the collection of the biological material to carry out the research was authorized by the Brazilian Institute of Environment and Renewable Natural Resources (IBAMA/SISBio), number 29837-8/2015.

#### Collection of Samples

Specimens of T. albus were captured in their natural environments, covering the different water types of the Amazon basin, in expeditions carried out in July and August 2015. See Supplementary Table S1 for the length and mass of the analyzed fish. Blackwater specimens were captured on the banks of the Negro River, in the Anavilhanas Archipelago (2° 43′ 11.8″ S, 60° 45′ 19.1″ W). Fish collection in whitewater was carried out on the banks of the Solimões River, near the island of Marchantaria (3° 14′ 47.0″ S, 59° 57′ 23.3″ W). The collection of specimens in clearwater was carried out on the banks of the Tapajós River (2° 48′ 46.3″ S, 55° 02′ 21.2″ W).

Gills were the analyzed tissue because they are the primary site for osmotic regulation, excretion of ammonia, as well as an important site for gas exchange and regulation of body fluid pH.


Thus, nine T. albus specimens were captured using fishing line and hook, three individuals for each environmental condition. After capture, they were sacrificed by severing the spinal cord; the gills were removed, immediately stored in RNAlater® (Ambion®), and kept at 4 °C until arrival at the Laboratory of Ecophysiology and Molecular Evolution (LEEM), at INPA's facilities in Manaus, Amazonas, Brazil. Water physicochemical parameters were measured at the same collection sites as the organisms, between 5 and 7 pm. The seasonal "river flooding" period prevailed during the collections. The pH values were measured using a pH meter (Micronal model B374), and dissolved oxygen and conductivity were measured with an oxygen meter (YSI, model 55/12 FT).

#### RNA Extraction and Library Construction

Total RNA extraction was performed individually for each of the nine T. albus gill tissue samples using the TRIzol® reagent protocol (Invitrogen™). After that, 30 µL of nuclease-free ultrapure water was added to each bullet tube containing the RNA extract, which was stored at −80 °C until the time of analysis. The extracted RNA was quantified using a BioAnalyzer Agilent 2100 (Agilent Technologies®). The libraries were built according to the TruSeq RNA Sample Prep v2 LS protocol (Illumina®), following the manufacturer's recommendations.

The mRNA was isolated from the total RNA using magnetic oligo(dT) beads that bind to the poly(A) tail of the mRNA. Then, the first cDNA strand was synthesized using reverse transcriptase and random primers. The second cDNA strand was immediately synthesized using the enzymes DNA Polymerase I and RNase H. A single adenine (A) nucleotide was then added to the 3′ end of the fragments. The adapters were ligated to these fragments, and PCR was then performed to enrich the libraries.

Finally, the quality and quantity of the constructed libraries were assessed using the BioAnalyzer Agilent 2100 (Agilent Technologies®) and Real-Time PCR 7500 (Applied Biosystems®). Three biological replicates of samples from each environmental condition were sequenced on the MiSeq® platform (Illumina®) in three sequencing runs: one paired-end run (2 × 150 cycles) and two paired-end runs (2 × 250 cycles). The data generated in this study are stored in the National Center for Biotechnology Information/Sequence Read Archive (NCBI-SRA), project number PRJNA391967.

#### Quality Control and Reassembly

The bioinformatics analyses were performed at the Bioinformatics Laboratory of LEEM in the facilities of INPA. The FastQC program [v. 0.11.3] (Andrews, 2010) was used to analyze the quality of the sequenced reads. Low-quality bases (Q-score ≤ 20) were trimmed from the 5′ and 3′ ends of the reads, and reads shorter than 50 bp (base pairs) were filtered out, using the program Trimmomatic [v. 0.33] (Bolger et al., 2014).
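As a rough illustration of the filtering criteria described above, the sketch below trims bases with a Phred score ≤ 20 from both ends of a read and discards reads shorter than 50 bp. It is not Trimmomatic's algorithm, which has its own trimming and adapter-clipping steps; the function names and the Phred+33 quality encoding are assumptions made for this example.

```python
MIN_QUALITY = 20  # bases with Phred score <= 20 are trimmed from the read ends
MIN_LENGTH = 50   # reads shorter than 50 bp after trimming are discarded


def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(sequence, quality_string):
    """Trim low-quality bases from the 5' and 3' ends of a single read.

    Returns the trimmed (sequence, quality_string) pair, or None when the
    trimmed read is shorter than MIN_LENGTH.
    """
    scores = phred_scores(quality_string)
    start, end = 0, len(scores)
    # Advance past low-quality bases at the 5' end.
    while start < end and scores[start] <= MIN_QUALITY:
        start += 1
    # Step back over low-quality bases at the 3' end.
    while end > start and scores[end - 1] <= MIN_QUALITY:
        end -= 1
    if end - start < MIN_LENGTH:
        return None
    return sequence[start:end], quality_string[start:end]
```

A read that keeps at least 50 high-quality bases in its core is returned trimmed; any other read is dropped by returning `None`.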

The reassembly of the transcriptome was performed with the Trinity program [version 2.1.1] (Grabherr et al., 2011). In addition, programs that assisted Trinity to assemble the transcriptome and to calculate the abundance of transcripts were used, among them, the Bowtie2 [v. 2.2.6] (Langmead and Salzberg, 2012), RSEM [v. 1.2.19] (Li and Dewey, 2011), and R/Bioconductor packages [v. 3.1] (Gentleman et al., 2004).

TABLE 1 | Physicochemical parameters of the waters of the Negro River (blackwater), Tapajós River (clearwater) and Solimões River (whitewater).

Values expressed as mean ± SD; N = 5 in each condition. Different letters indicate significant differences (P ≤ 0.05), using one-way ANOVA and Tukey test. <sup>∗</sup>DOC values from Holland et al. (2017).

#### Differential Expression Analysis and Gene Annotation

The analysis of differentially expressed genes (DEGs) was performed with the R/Bioconductor package edgeR [v. 3.8.6] (Robinson et al., 2010), with a False Discovery Rate (FDR) ≤ 0.05. After transcript quantification, the data generated by RSEM served as the input to edgeR, which was used to calculate the fold change. The DEGs were annotated with the BLASTx program [v. 2.3.1] (Altschul et al., 1997) by querying the Uniprot/TrEMBL protein database (class Actinopterygii)<sup>1</sup>, with an E-value of 1.0E-5. After functional annotation, the genes were classified according to their gene ontologies (GO), using a script developed at the Multidisciplinary Support Center of the Federal University of Amazonas (CAM/UFAM).
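The selection criteria can be sketched in a few lines of Python. This is only an illustration and not the edgeR workflow itself: it assumes that the FDR is obtained with the Benjamini-Hochberg adjustment (the text does not name the adjustment method) and that the fold-change threshold of 2 reported in the Results applies in both directions. The function names and the toy inputs are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, p_values, log2_fold_changes,
                fdr_cutoff=0.05, min_fold_change=2.0):
    """Label genes passing both cutoffs as 'up' or 'down' regulated."""
    fdr = benjamini_hochberg(p_values)
    threshold = math.log2(min_fold_change)  # fold change >= 2 means |log2FC| >= 1
    calls = {}
    for gene, q_value, lfc in zip(genes, fdr, log2_fold_changes):
        if q_value <= fdr_cutoff and abs(lfc) >= threshold:
            calls[gene] = "up" if lfc > 0 else "down"
    return calls
```

Genes are kept only when they satisfy both the FDR cutoff and the fold-change threshold, which mirrors the two criteria stated in the text.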

# RESULTS

#### Physicochemical Parameters of the Waters

The physicochemical parameters are quite different, with a specific water pattern for each environmental condition (**Table 1**). Conductivity reflects the differences between the waters, being directly connected to the quantity of dissolved ions. It is low in waters that are poor in ions, such as the Negro River (10.5 µS/cm), moderately higher in the Tapajós River (17.3 µS/cm), and very high in the Solimões River (74.2 µS/cm).

<sup>1</sup>www.uniprot.org

Data recently published by Holland et al. (2017), who collected water in the same period as our specimen collections (July 2015) and at locations close to our collection points, provide additional information on some physicochemical parameters of the water. Among them, we highlight the concentration of dissolved organic carbon (DOC): 8.75 mg/L for the Rio Negro, 3.86 mg/L for the Solimões River, and <0.4 mg/L for the Tapajós River. We also highlight the difference in pH among the rivers: the acidic water of the Negro River has a pH of 4.6, and can be as low as pH 3, whereas the clearwater of the Tapajós River has a pH of around 6.0 and the whitewater of the Solimões River a pH of around 6.3, both close to neutrality. The blackwater is low in ions but has a high concentration of H<sup>+</sup> (2.5 × 10<sup>−5</sup> mol/L), derived from the humic and fulvic acids of the plant decomposition that occurs on the Rio Negro bed (reviewed by Val and Almeida-Val, 1995). On the other hand, the Tapajós River and the Solimões River have a low concentration of DOC, with H<sup>+</sup> concentrations of 1.0 × 10<sup>−6</sup> mol/L and 5.0 × 10<sup>−7</sup> mol/L, respectively. The concentration of H<sup>+</sup> in the Negro River is therefore approximately 25 times greater than that observed in the Tapajós River and 50 times greater than that in the Solimões River.
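The H<sup>+</sup> concentrations and ratios quoted above follow from the definition pH = −log<sub>10</sub>[H<sup>+</sup>]. The short sketch below reproduces the arithmetic from the pH values given in the text; the variable and function names are chosen for this example only.

```python
def hydrogen_ion_concentration(ph):
    """Convert pH to hydrogen-ion concentration in mol/L: [H+] = 10 ** (-pH)."""
    return 10.0 ** (-ph)


# pH values quoted in the text for each river.
RIVER_PH = {"Negro": 4.6, "Tapajos": 6.0, "Solimoes": 6.3}

concentrations = {
    river: hydrogen_ion_concentration(ph) for river, ph in RIVER_PH.items()
}

# How many times more H+ the Negro River carries than the other two rivers.
negro_vs_tapajos = concentrations["Negro"] / concentrations["Tapajos"]
negro_vs_solimoes = concentrations["Negro"] / concentrations["Solimoes"]
```

Because the pH scale is logarithmic, differences of 1.4 and 1.7 pH units correspond to ratios of about 25 and about 50, respectively.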

#### Sequencing and Quality Control

Nine T. albus cDNA libraries were constructed, three biological replicates for each environmental condition: blackwater, clearwater, and whitewater. RNA-Seq sequencing on the MiSeq® platform (Illumina®) produced about 51.6 million paired-end reads. During quality control and filtering of the raw data, bases with Q-score ≤ 20 were removed from the ends of the reads and reads shorter than 50 bp were excluded, leaving a total of 45.8 million paired-end reads. Only 11.17% of the total reads sequenced were discarded (Supplementary Table S2).

#### De Novo Assembly and Differential Expression Analysis

The reads resulting from the pre-processing of the data were pooled and the de novo transcriptome assembly was performed. A total of 285,456 transcripts were assembled using de Bruijn graph analysis (Grabherr et al., 2011), yielding contigs with an average length of 584.93 bp. The N50 value was 751 bp, with a total of 166,972,252 bases assembled. The contigs were aligned and the abundance of transcripts was quantified in each environmental condition, for all biological replicates. The analysis of the differential expression of T. albus among environmental conditions was performed using FDR ≤ 0.05 and fold change ≥ 2. A total of 13,754 genes were differentially expressed across the three environmental conditions. In blackwater versus whitewater conditions, 3,956 DEGs were found: 265 up-regulated (6.7%) and 3,691 down-regulated (93.3%). In clearwater versus whitewater conditions, only 30 differentially expressed transcripts were found: 2 up-regulated (6.7%) and 28 down-regulated (93.3%). In blackwater versus clearwater conditions, 9,768 DEGs were found: 4,218 up-regulated (43.2%) and 5,550 down-regulated (56.8%) (**Figure 1**).
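For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows how it is commonly computed from a list of contig lengths. It illustrates the usual definition rather than the code used in the study; the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Accumulate contigs from longest to shortest until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

An N50 of 751 bp therefore means that contigs of at least 751 bp account for at least half of the 166,972,252 assembled bases.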

#### Transcriptome Annotation of T. albus

Using BLASTx, with an E-value of 1.0E-5, against the Uniprot/TrEMBL database (class Actinopterygii), 33,739 genes were identified. The top hits returned by the BLAST search were associated with the following species of fish: Astyanax mexicanus (43%), Danio rerio (14%), Oncorhynchus mykiss (7%), Poeciliopsis prolifica (6%), Ictalurus punctatus (5%) and other species (25%).
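The annotation step can be pictured as keeping, for each transcript, the best database hit that passes the E-value threshold. The sketch below assumes that the BLASTx results are available in the standard 12-column tabular format (`-outfmt 6`) and that the E-value of 1.0E-5 acts as an upper cutoff; the text states neither, and the function name and identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1.0e-5


def best_hits(blast_lines, e_value_cutoff=E_VALUE_CUTOFF):
    """Keep, for each query, the hit with the highest bit score among hits
    whose E-value does not exceed the cutoff.

    Expects BLAST tabular lines (-outfmt 6) with the columns: qseqid, sseqid,
    pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue,
    bitscore.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        e_value, bit_score = float(fields[10]), float(fields[11])
        if e_value > e_value_cutoff:
            continue
        if query not in best or bit_score > best[query][2]:
            best[query] = (subject, e_value, bit_score)
    return best
```

Tallying the species of the retained subjects would then give a top-hit distribution of the kind reported above.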


Differentially expressed genes were grouped according to their GO. In blackwater versus whitewater conditions, 3,206 terms were annotated. Of these, 159 were up-regulated (Biological Process – PB: 54, Molecular Function – FM: 50 and Cell Component – CC: 55) (**Figure 2A**) and 3,047 were down-regulated (PB: 1,100, FM: 1,064 and CC: 883) (**Figure 2B**).

In blackwater versus clearwater conditions, 9,566 terms were annotated – 5,938 up-regulated ones (PB: 2,476, FM: 2,077 and CC: 1,385) (**Figure 3A**) and 3,628 down-regulated ones (PB: 2,476, FM: 2,077 and CC: 1,385) (**Figure 3B**). When it comes to clearwater versus whitewater conditions, only 21 terms were annotated (9 in PB, 9 in FM and 3 in CC, all of them were up-regulated).

#### Common Terms in "Blackwater versus Clearwater" and "Blackwater versus Whitewater"

In the DEG analysis, a greater number of differentially expressed transcripts was observed in the blackwater versus clearwater and blackwater versus whitewater contrasts. We then verified whether these contrasts share the same gene ontology terms, and located 1,551 up-regulated terms in common. Among these, we highlight terms such as integral component of membrane, calcium ion binding, plasma membrane, cell adhesion, cytoskeleton, myosin complex, transporter activity, ammonium transmembrane transporter activity, bicellular tight junction and ATPase activity (Supplementary Figure S1).
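Finding the terms shared by the two contrasts amounts to a set intersection. The sketch below shows the idea with a handful of term names taken from the text; the function name and the grouping of terms into each example list are illustrative assumptions, not the study's data.

```python
def shared_terms(terms_contrast_a, terms_contrast_b):
    """Return the GO terms present in both contrasts, sorted alphabetically."""
    return sorted(set(terms_contrast_a) & set(terms_contrast_b))


# Hypothetical example lists; the real analysis compared the full sets of
# up-regulated terms from the two blackwater contrasts.
blackwater_vs_clearwater = ["cell adhesion", "cytoskeleton", "ATPase activity"]
blackwater_vs_whitewater = ["cytoskeleton", "cell adhesion", "plasma membrane"]

common = shared_terms(blackwater_vs_clearwater, blackwater_vs_whitewater)
```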

After that, we analyzed the genes linked to the annotated terms. The genes identified play fundamental roles in cellular permeability, ionic regulation and acid-base balance (**Table 2**). Among these, we found the claudin genes, actn4 (actinin, alpha 4) and itgb3b (integrin beta) (paracellular junction and cell adhesion); rhcg1 (ammonium transporter Rh type C 1), slc9a6a (sodium/hydrogen exchanger, NHE), ATP6V0A2 (V-type proton ATPase subunit a) and Na<sup>+</sup>/K<sup>+</sup>-ATPase (sodium/potassium-transporting ATPase subunit alpha) (ionic and acid-base regulation); and nr3c1 (glucocorticoid receptor) and prlra (prolactin receptor a) (endocrine response).

# DISCUSSION

The diversity of rivers of the Amazon basin, besides containing the largest number of species of freshwater fish, represents geographical barriers by means of the restricted water profile within each river. This requires biological adjustments of species living in these different environments (De Pinna, 2006). Many fish species live in an environment close to neutrality (Val and Almeida-Val, 1995). However, many fish species live in waters with low pH that may influence several physiological processes (Wood et al., 1998; Matsuo and Val, 2007). The species T. albus is one of the few species that differs from the innumerable other species of fish in the region, since it occurs in all three types of waters (blackwater, clearwater, and whitewater) in the Amazon Basin.

The work of Gibbons et al. (2017), with the fish species Gasterosteus aculeatus, which survives in freshwater and saltwater environments, shows an excellent approach to how the gene expression of fish responds to the most diversified environments. Although living in environments that are different from those of the species studied here, the authors observed that the environment in which the organism was being exposed directly influenced the gene expression pattern, a situation corroborated by the present study. They have also described the occurrence of phenotypic plasticity to respond to the environment to which they are exposed and to maintain the homeostasis under the prevailing conditions, a very similar situation to the one found here for T. albus.

To date, there is no information on the transcriptome profile of another species that is able to survive in all three types of water in the Amazon. In this context, our study is unique since we present the analysis of the transcriptome of T. albus living in all three types of water in the Amazon.

We observed 13,754 differentially expressed transcripts in all three environmental conditions. The DEGs were grouped by their functional categories according to the GOs (Ashburner et al., 2000). Thus, we selected the top 10 terms, both up-regulated and down-regulated, from each functional category (PB, FM and CC).

In blackwater versus whitewater conditions and blackwater versus clearwater conditions, GO enriched up-regulated terms are involved mainly with membrane components, active transport of ions across the membrane, cytoskeletal/cell adhesion change, and synthesis of proteins (**Figures 2A**, **3A**). These functional categories show responses to exposure to an acidic and low-ion environment, such as blackwater, in contrast to clearwater and whitewater, which have a pH close to neutrality (6.0 and 6.4, respectively) and are relatively richer in ions. These responses trigger cellular processes aimed at maintaining the body's homeostasis in relation to the environment to which the animals are exposed. In general, the response is due to the increase in tightness of the paracellular junctions and active transport through membrane proteins (Bonga et al., 1990; Evans et al., 2005; Kumai and Perry, 2011).

In contrast, the down-regulated terms are related to reduced protein synthesis, embryonic development, cell cycle regulation, membrane components, mitochondria, and the respiratory chain (**Figures 2B**, **3B**). These repressed categories may indicate a metabolic readjustment, because in addition to the essential mechanisms of the organism, it has to maintain the homeostasis under the environmental challenges to which it is exposed (Li et al., 2016). In the gene expression pattern, both in blackwater versus whitewater conditions, as in blackwater versus clearwater conditions, up-regulated genes were mainly involved with mechanisms to maintain the body's osmotic and ionic homeostasis (**Table 2**).

Thus, some cellular mechanisms may have been deactivated/reduced to reduce energy expenditure (Li et al., 2016). Vidal-Dupiol et al. (2013) and Logan and Buckley (2015) also observed that in order to maintain the vital functions of the organism, some genes can be readjusted until the establishment of the homeostatic balance. However, future studies should analyze the reaction of repressed genes when they are exposed to the environment that is predominant in the Negro River.

In clearwater versus whitewater conditions, there was no significant difference. Only 30 differentially expressed transcripts were found and 21 terms were annotated. Such terms are related to the immune response, hemoglobin complex and ribosomal RNA. Given the small difference, considering the whole transcriptome, we concentrated on the analyses for contrasts of blackwater versus whitewater, and blackwater versus clearwater, which presented greater difference with regards to gene expression in this study.

From the terms annotated in common for the blackwater versus whitewater and blackwater versus clearwater contrasts (Supplementary Figure S1), we located the associated genes and found that they could be linked to the response to the acidic environment. Thus, we selected 14 genes involved in such a response (**Table 2**) and grouped them into two main response mechanisms to low pH.

The first mechanism is related to the modulation of the branchial epithelium (paracellular junctions - JPs), highlighting the genes coding for claudin proteins, actn4 (actinin, alpha 4), itgb3b (integrin beta), DSP (desmoplakin), Gap junction protein, and Ca<sup>2+</sup>-ATPase (calcium-transporting ATPase) (**Table 2**). Several studies have shown differentiated responses in some Amazon species exposed to low pH (Gonzalez and Wilson, 2001; Matsuo and Val, 2007; Wood et al., 2014). One of the fundamental characteristics for maintaining homeostasis in the acidified waters of the Negro River is the increase in the tightness of JPs, which avoids excessive losses of Na<sup>+</sup> and Cl<sup>−</sup> to the environment (Wood et al., 1998; Gonzalez et al., 2002; Matsuo and Val, 2007). Paracellular junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells. The up-regulated gene claudin, together with others found in this study such as DSP, itgb3b and Gap junction, encodes the primary sealing agents of JPs (Kumai and Perry, 2011; Kwong and Perry, 2013). This mechanism, supported by many genes, had not previously been reported in characids; it had been described in Amazonian cichlids (Gonzalez and Wilson, 2001; Duarte et al., 2013).

It is worth noting that blackwater is rich in H<sup>+</sup>, and this could negatively affect the JPs, increasing permeability, given the important role of Ca<sup>2+</sup> in cell adhesion (McDonald, 1983; Freda et al., 1991). However, blackwater has a unique characteristic that distinguishes it from the other water types: it is rich in DOM, with a high concentration of DOC (Walker, 1995; Johannsson et al., 2016). An important function of DOC in acidic water is its buffering capacity against the damaging effects caused by low pH (Holland et al., 2012).

Campbell et al. (1997) showed that DOC can bind directly to the biological membrane of gill cells and alter the permeability of the cell membrane. In addition to that, Wood et al. (2011) pointed out an important role of DOC on JPs, stating that DOC could act with similar functions to Ca<sup>2+</sup> in JPs, reducing paracellular permeability. This statement about the protective role of DOC has recently been confirmed by Duarte et al. (2016). Duarte and collaborators observed that in Ca<sup>2+</sup>-poor waters such as the ones in the Negro River, DOC can act with functions similar to those of Ca<sup>2+</sup> to protect the integrity of JPs. Thus, our data corroborate the information of Duarte et al. (2016), showing that even in Ca<sup>2+</sup>-poor waters, the genes continued to be expressed so as to maintain the integrity of JPs and, consequently, to maintain the ion balance allowing survival in waters that are challenging for many aquatic species.

The second mechanism related to the up-regulated genes encompasses ion and acid-base regulation [rhcg1 (ammonium transporter Rh type C 1), slc9a6a (sodium/hydrogen exchanger, NHE), ATP6V0A2 (V-type proton ATPase subunit a), Na<sup>+</sup>/K<sup>+</sup>-ATPase (sodium/potassium-transporting ATPase subunit alpha), slc26a4 (pendrin), slc4a4b (anion exchange protein) (**Table 2**)]. One of the essential functions for maintaining homeostasis in fish is ammonia (NH<sub>3</sub>) excretion, mainly through the gills (Wood et al., 2007). Several studies have shown that an organism exposed to an acidic environment can use NH<sub>3</sub> excretion to capture Na<sup>+</sup> from the environment (Kumai and Perry, 2011; Wright and Wood, 2012; Wood et al., 2014). This idea was questioned on thermodynamic grounds (Parks et al., 2008). However, Kumai and Perry (2011) observed that the excretion of ammonia through the gills increased Na<sup>+</sup> uptake through the NHE complex. This interaction was associated with the presence of the rhcg1 glycoprotein, responsible for the dissociation of ammonium (NH<sub>4</sub><sup>+</sup>) into NH<sub>3</sub> and H<sup>+</sup>, creating a favorable microenvironment for the transport of NH<sub>3</sub> and H<sup>+</sup> through rhcg1 and NHE, respectively (Wright and Wood, 2012; Ito et al., 2013). These genes were up-regulated in the analyzed specimens of T. albus from the Rio Negro.

In addition to rhcg1 and slc9a6a (NHE), we also observed the expression of ATP6V0A2 (H<sup>+</sup>-ATPase). Like the rhcg1 and NHE proteins, H<sup>+</sup>-ATPase has been the focus of past studies, since it is responsible for providing an intracellular environment that favors Na<sup>+</sup> uptake during low pH exposure through active H<sup>+</sup> extrusion from the cell (Lin et al., 2006; Chang et al., 2009).

A favorable electrochemical environment is required for continuous ion regulation. In this study, we also observed an up-regulation of the gene encoding Na<sup>+</sup>/K<sup>+</sup>-ATPase. This was expected, since organisms exposed to acidic environments perform Na<sup>+</sup> uptake through the mechanism described above. Thus, for Na<sup>+</sup> uptake through the NHE exchanger, Na<sup>+</sup>/K<sup>+</sup>-ATPase is vital, since excess intracellular Na<sup>+</sup> could disturb ion regulation (Marshall, 2002; Evans et al., 2005; Wood et al., 2014). In addition to the control of cellular permeability and ion regulation, it is necessary to maintain the acid-base balance, in view of the natural tendency of Cl<sup>−</sup> loss to the acidic environment (Evans et al., 2005). We observed the up-regulated expression of the slc26a4 and slc4a4b genes, which are involved in the control of intracellular pH. This finding corroborates Bayaa et al. (2009) and Perry et al. (2009), who proposed that the uptake of Cl<sup>−</sup> from the environment would occur through families of exchange proteins, such as slc26.

However, for this exchange to be possible, an intracellular HCO<sub>3</sub><sup>−</sup> molecule is required for the coupling of the Cl<sup>−</sup>/HCO<sub>3</sub><sup>−</sup> exchanger (Evans et al., 2005). Studies have shown that carbonic anhydrase is responsible for the supply of internal HCO<sub>3</sub><sup>−</sup> through the hydration of CO<sub>2</sub> (Lin et al., 2008; Gilmour and Perry, 2009). In our data, we did not find differential expression of carbonic anhydrase. However, we can infer its activity from its by-product (HCO<sub>3</sub><sup>−</sup>), which is used in the acid-base balance, through the expression of the gene for the protein slc26a4, which acts on the apical membrane of the cell by exchanging Cl<sup>−</sup>/HCO<sub>3</sub><sup>−</sup>, and also for slc4a4b, a protein located in the basolateral membrane, which uses HCO<sub>3</sub><sup>−</sup> and Na<sup>+</sup> to transport HCO<sub>3</sub><sup>−</sup> to the blood, keeping the internal pH balanced (Evans et al., 2005).


The mechanisms described above are essential to maintain the homeostasis of the organism when exposed to acidic environments, such as the waters of the Negro River. For these mechanisms to be triggered, many authors have reported the importance of endocrine responses in fish (as reviewed by Breves et al., 2014; Kwong et al., 2014). We highlighted the genes nr3c1 (glucocorticoid receptor) and prlra (prolactin receptor a) (**Table 2**), both involved in hormonal responses, to cortisol and prolactin, respectively. According to Kwong and Perry (2013), these hormones contribute to reducing epithelial permeability, avoiding excessive losses of Na<sup>+</sup> and Cl<sup>−</sup> to the environment. They can also promote the reabsorption of these ions (Flik and Perry, 1989; Kelly and Wood, 2001; Kumai et al., 2012), and increase H<sup>+</sup>-ATPase activity (Lin and Randall, 1993).

The ability of T. albus to modulate such mechanisms demonstrates a well-developed phenotypic plasticity system for this species. The potential mechanisms described here, based on gene expression, demonstrate the potential of T. albus to survive the challenges presented by the different types of water in the Amazon basin. Depending on the environmental condition to which it is exposed, the genotype of this species can be differentially transcribed (**Figure 4**). The review by Schneider and Meyer (2017) highlights the mechanisms of differential expression. In **Figure 4**, the activation level of the ion and acid-base response mechanisms (genes coding for the respective proteins involved) in clearwater animals (A) is almost null when compared to animals collected in blackwater (B). In blackwater animals, every expressed mechanism for the maintenance of ion homeostasis is well characterized. In whitewater animals, these mechanisms are expressed in an intermediate form (C), when compared to clearwater and blackwater conditions.

Physiological responses of Amazonian fish have been better understood over the years (Matsuo and Val, 2007; Duarte et al., 2013; Wood et al., 2014). The use of new techniques of molecular biology and bioinformatics resources has increased the knowledge of how some species of fish respond to adverse environmental conditions (Lemgruber et al., 2013; Jesus et al., 2016; Prado-Lima and Val, 2016). In this context, we could observe in the present study that T. albus responds differently according to the environment to which it is exposed (**Figure 5**). When analyzing the heatmap, we verified two clusters of differential gene responses, related to the two extremes of aquatic environments.

Although the literature emphasizes that clearwater is classified as intermediate between blackwater and whitewater (Sioli, 1984; Duncan and Fernandes, 2010), we can observe that T. albus presents a larger set of adjustments in gene expression in the Tapajós River (clearwater) and in the Negro River (blackwater). Thus, through the gene response observed in this species, we can conclude that clearwater and blackwater are two distinct extremes, and they present a greater environmental challenge for the survival of the organism. On the other hand, we verified that the whitewater was the one that presented an intermediate level, requiring less quantitative adjustment of the gene response to the environment to which it was exposed. Therefore, the gene expression pattern observed for T. albus suggests that this species presents phenotypic plasticity to live in the three main types of water in the Amazon.

# CONCLUSION

The Negro River is the most critical environment for the survival of many aquatic species in the Amazon basin, due to its high acidity and low ion concentration. As this river system harbors more than 1,000 species, it is possible that other species present phenotypic plasticity similar to that of T. albus, which showed two main mechanisms that allow survival in Amazonian aquatic environments, including those with low pH. The first mechanism is the control of genes involved in paracellular junctions, such as claudin, actn4, itgb3b and DSP, which take part in controlling paracellular permeability and, consequently, in limiting the loss of Na<sup>+</sup> and Cl<sup>−</sup> ions to the environment. This characteristic, which until then had been observed only in Amazon cichlids, was well developed in T. albus, a species of characid.

The second mechanism was attributed to the ability of ionic and acid-base regulation developed by this species. We observed high expression of the genes involved in Na<sup>+</sup> uptake, where excretion of NH<sub>3</sub> through the rhcg1 protein somehow favors Na<sup>+</sup> uptake through the NHE exchanger, in addition to the H<sup>+</sup>-ATPase and the Na<sup>+</sup>/K<sup>+</sup>-ATPase pump. We also found the prlra and nr3c1 genes, responsible for triggering the two mechanisms described above. Therefore, we could verify that T. albus presents phenotypic plasticity with mechanisms that confer the ability to survive in environments considered critical for many species. We suggest that T. albus is a good candidate for future studies involving ion and acid-base regulation processes, as well as for analyzing the activities of the respective enzymes involved in these processes.

#### AUTHOR CONTRIBUTIONS

JA, AG, and AV designed the work. JA and AG analyzed and interpreted the data. JA and AV drafted the work. All the authors approved the final version.

# ACKNOWLEDGMENTS

We thank MSc. Maria de Nazaré Paula da Silva for logistical support during sample collection, and MSc. Luciana Mara Fé Gonçalves and MSc. Erica Martinha Silva de Souza for support with the sequencing of the data. This study is part of INCT-ADAPTA (CNPq/FAPEAM). AV is a recipient of a research fellowship from CNPq. JA was a recipient of an MSc. fellowship from FAPEAM and CNPq.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2017.00114/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Araújo, Ghelfi and Val. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Bunocephalus coracoideus Species Complex (Siluriformes, Aspredinidae). Signs of a Speciation Process through Chromosomal, Genetic and Ecological Diversity

Milena Ferreira<sup>1</sup> \*, Caroline Garcia<sup>2</sup> , Daniele A. Matoso<sup>3</sup> , Isac S. de Jesus<sup>4</sup> , Marcelo de B. Cioffi<sup>5</sup> , Luiz A. C. Bertollo<sup>5</sup> , Jansen Zuanon<sup>6</sup> and Eliana Feldberg<sup>1</sup>

<sup>1</sup> Laboratório de Genética Animal, Coordenação de Biodiversidade, Instituto Nacional de Pesquisas da Amazônia, Manaus, Brazil, <sup>2</sup> Laboratório de Citogenética, Departamento de Ciências Biológicas, Universidade Estadual do Sudoeste da Bahia, Jequié, Brazil, <sup>3</sup> Laboratório de Citogenômica Animal, Instituto de Ciências Biológicas, Departamento de Genética, Universidade Federal do Amazonas, Manaus, Brazil, <sup>4</sup> Laboratório de Fisiologia Comportamental e Evolução, Coordenação de Biodiversidade, Instituto Nacional de Pesquisas da Amazônia, Manaus, Brazil, <sup>5</sup> Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil, <sup>6</sup> Laboratório de Sistemática e Ecologia de Peixes, Coordenação de Biodiversidade, Instituto Nacional de Pesquisas da Amazônia, Manaus, Brazil

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Vladimir Pavan Margarido, Universidade Estadual do Oeste do Paraná, Brazil Mauro Nirchio, Universidad de Oriente, Venezuela

> \*Correspondence: Milena Ferreira milena\_fro@hotmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 17 July 2017 Accepted: 29 August 2017 Published: 21 September 2017

#### Citation:

Ferreira M, Garcia C, Matoso DA, de Jesus IS, Cioffi MB, Bertollo LAC, Zuanon J and Feldberg E (2017) The Bunocephalus coracoideus Species Complex (Siluriformes, Aspredinidae). Signs of a Speciation Process through Chromosomal, Genetic and Ecological Diversity. Front. Genet. 8:120. doi: 10.3389/fgene.2017.00120

Bunocephalus is the most species-rich Aspredinidae genus, corresponding to a monophyletic clade with 13 valid species. However, the classification of many species has been questioned. Here, we analyzed individuals from four Amazonian populations of Bunocephalus coracoideus by cytogenetic and molecular procedures. The geographic distribution, genetic distances and karyotype data indicate that each population represents an Evolutionarily Significant Unit (ESU). Cytogenetic markers showed distinct 2n and karyotype formulas, as well as different numbers and locations of the rDNA sites among ESUs. One of these populations (ESU-D) showed an extensive polymorphic condition, with several cytotypes probably due to chromosomal rearrangements and meiotic non-disjunctions. This resulted in several aneuploid karyotypes, which was also supported by the mapping of telomeric sequences. Phylograms based on Maximum Likelihood (ML) and Neighbor Joining (NJ) analyses grouped each ESU in a distinct, highly supported clade, with estimates of evolutionary divergence ranging from 3.8 to 12.3% among them. Our study reveals a huge degree of chromosomal and genetic diversity in B. coracoideus and strongly points to the existence of four ESUs in allopatric and sympatric speciation processes. In fact, the high divergences found among the ESUs allowed us to delimit lineages with taxonomic uncertainties in this nominal species.

Keywords: chromosomal differentiation, molecular taxonomy, ecological adaptations, evolutionary units, banjo catfish

# INTRODUCTION

Bunocephalus is the most species-rich Aspredinidae genus, corresponding to a monophyletic clade (Cardoso, 2008, 2010) with 13 valid species (Eschmeyer et al., 2017). However, the classification of many species has been questioned (Carvalho et al., 2015), and their taxonomy consequently remains uncertain. Genetic divergence among morphologically indistinguishable specimens within a single "species" raises doubts about their taxonomic status, since such variation implies cladogenesis (Bickford et al., 2007), which may end in speciation. Approaches based only on morphological data may underestimate such variation because of the phenotypic plasticity shown by a significant number of species. Thus, an integrative cytotaxonomic, molecular and morphological analysis is required to elucidate the real taxonomic status of polymorphic species.

Cytogenetic studies, including the molecular organization and cytogenetic mapping of repetitive DNAs, can provide a significant data set for the characterization of particular segments of the biota, as well as important information for phylogenomics (Cioffi et al., 2012). Besides, these sequences seem to escape the selective pressure that acts on the non-repetitive segments, and thus represent good markers for detecting recent evolutionary events, since their number and location may reveal polymorphisms, with intra- and interspecific variations due to rearrangements, even in conserved karyotypes (Cioffi et al., 2009; Matoso et al., 2011; Motta-Neto et al., 2012; Oliveira et al., 2015). Up to now, cytogenetic studies have been conducted in only two Bunocephalus species, B. doriae and B. coracoideus. While the former has 2n = 50 chromosomes, the latter presents 2n = 42, in addition to a multiple X<sub>1</sub>X<sub>1</sub>X<sub>2</sub>X<sub>2</sub>/X<sub>1</sub>Y<sub>1</sub>X<sub>2</sub>Y<sub>2</sub> sex chromosome system (Fenocchio and Swarça, 2012; Ferreira et al., 2016).

Besides, genetic data can unmask distinct populations covered by the same taxonomic status, which are identified as Evolutionarily Significant Units (ESUs) in conservation biology. Thus, an ESU corresponds to a population, or even to a group of populations, genetically distinct within a given species that contributes to biodiversity (Hey et al., 2003). Recognizing ESUs requires agreement among several procedures as an identification criterion.

TABLE 1 | Estimates of evolutionary divergence between sequences using the COI gene and the K2P model.


Values are shown as percentages.

The present study contributes to the knowledge of the biodiversity presented by B. coracoideus, using DNA barcoding and conventional and molecular cytogenetic methodologies. Four allopatric populations from the Amazonian hydrographic basin were analyzed, and the results highlight a huge cryptic diversity both within and among populations, indicating that B. coracoideus corresponds to a species complex.

#### MATERIALS AND METHODS

#### Specimens

Individuals of Bunocephalus coracoideus from four populations of distinct drainages of the Amazon River were analyzed (**Figure 1** and **Table 1**). The specimens were collected under appropriate authorization of the Brazilian environmental agency ICMBIO/SISBIO (License number 48795-1). All specimens were identified by morphological criteria, and voucher samples were deposited in the fish collections of the National Institute for Amazon Research (INPA: Instituto Nacional de Pesquisas da Amazônia). The experiments followed ethical and anesthesia guidelines, in accordance with the Ethics Committee for Animal Use of the National Institute for Amazon Research, under protocol number 010/2015.


FIGURE 2 | Bunocephalus coracoideus karyotypes. On the left, Giemsa staining; on the right, double-FISH evidencing chromosome pairs bearing the 18S rDNA (red) and 5S rDNA (green) sequences. (A,A′) ESU-A Igarapé Jundiá – Cuieiras River, (B,B′) ESU-B Demini River (Cuieiras River) and (C,C′) ESU-C Igarapé Apeú – Guamá River. Bar = 10 µm.

# Mitotic and Meiotic Chromosomal Preparations

Mitotic chromosomes were obtained from kidney cells according to the protocol described by Gold et al. (1990), using RPMI culture medium (Cultilab). Testes from males were used for meiotic preparations, following the protocols described by Bertollo et al. (1978), with changes introduced by Gross et al. (2009). For conventional cytogenetic analysis, the chromosomes were stained with 5% Giemsa solution (pH 6.8).

# Preparation of FISH Probes

The GoTaq Colorless Master Mix (Promega) was used for the polymerase chain reaction (PCR) amplification of the 18S and 5S rRNA genes and telomeric sequences, using the following primers: 18Sf (5′-CCG CTT TGG TGA CTC TTG AT-3′), 18Sr (5′-CCG AGG ACC TCA CTA AAC CA-3′) (Gross et al., 2010), 5S A (5′-TAC GCC CGA TCT CGT CCG ATC-3′), and 5S B (5′-CAG GCT GGT ATG GCC GTA AGC-3′) (Martins and Galetti, 1999). The ribosomal sequence amplification comprised a denaturation step of 2 min at 95°C; 35 cycles of 1 min at 94°C, 30 s at 56°C, and 1.5 min at 72°C; a final extension of 5 min at 72°C; and a cooling period at 4°C. The primers (TTAGGG)<sub>5</sub> and (CCCTAA)<sub>5</sub> (Ijdo et al., 1991) were used to obtain telomeric sequences. PCR was performed with the following profile: 4 min at 94°C; 12 cycles of 1 min at 94°C, 45 s at 52°C, and 1.5 min at 72°C; and 35 cycles of 1 min at 94°C, 1.5 min at 60°C, and 1.5 min at 72°C. The 18S rDNA and the telomeric probes were labeled with digoxigenin-11 dUTP using a DIG-Nick Translation Mix kit (Roche), while the 5S rDNA probe was labeled with biotin-14 dATP using a Biotin-Nick Translation Mix kit (Roche), according to the manufacturer's instructions.
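The two thermal profiles described above can be written down as data, which makes it easy to check the programmed hold time of each run. The following is a minimal Python sketch, not part of the original protocol: the names `Step`, `Stage`, `RIBOSOMAL`, `TELOMERIC`, and `total_minutes` are hypothetical, and the calculation ignores ramp times and the final 4°C hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold: temperature in degrees Celsius, duration in minutes."""
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class Stage:
    """A group of steps repeated `cycles` times."""
    steps: tuple
    cycles: int = 1


def total_minutes(program):
    """Sum the programmed hold times (ramping and the 4 C hold are ignored)."""
    return sum(
        stage.cycles * sum(step.minutes for step in stage.steps)
        for stage in program
    )


# Profile used for the 18S and 5S rDNA amplification, as stated in the text.
RIBOSOMAL = (
    Stage((Step(95, 2.0),)),
    Stage((Step(94, 1.0), Step(56, 0.5), Step(72, 1.5)), cycles=35),
    Stage((Step(72, 5.0),)),
)

# Profile used for the telomeric sequences, as stated in the text.
TELOMERIC = (
    Stage((Step(94, 4.0),)),
    Stage((Step(94, 1.0), Step(52, 0.75), Step(72, 1.5)), cycles=12),
    Stage((Step(94, 1.0), Step(60, 1.5), Step(72, 1.5)), cycles=35),
)

if __name__ == "__main__":
    print("ribosomal:", total_minutes(RIBOSOMAL), "min")
    print("telomeric:", total_minutes(TELOMERIC), "min")
```

Under these assumptions the ribosomal program holds for 112 min and the telomeric program for 183 min; actual run times on a thermocycler will be longer because of ramping.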

Frontiers in Genetics | www.frontiersin.org

# Detection of Repetitive DNA Sequences by FISH

Fluorescence in situ hybridization (FISH) was performed according to the protocol described by Pinkel et al. (1986), with some modifications. The 18S rDNA and telomeric probes were detected with anti-digoxigenin-rhodamine (Roche, Basel, Switzerland), while the 5S rDNA probe was detected with avidin-FITC (Sigma). Chromosomes were counterstained with DAPI (1.2 µg/ml) and slides were mounted with antifade solution (Vector, Burlingame, CA, United States).

# Microscopic Analysis and Image Processing

At least 30 metaphase spreads per individual were analyzed to confirm the 2n, karyotype structure and FISH results. Images were captured using an Olympus BX50 microscope (Olympus Corporation, Ishikawa, Japan) with a CoolSNAP camera and processed using Image Pro Plus 4.1 software (Media Cybernetics, Silver Spring, MD, United States). The chromosome classification followed the method proposed by Levan et al. (1964), with the following limits for the arm ratio (AR): AR = 1.00–1.70, metacentric (m); AR = 1.71–3.00, submetacentric (sm); AR = 3.01–7.00, subtelocentric (st); and AR > 7.00, acrocentric (a). For the number of chromosome arms [fundamental number (FN)], the metacentric, submetacentric, and subtelocentric chromosomes were considered to have two chromosomal arms and the acrocentric chromosomes a single one.
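The classification limits and the FN rule above amount to a small piece of arithmetic. The sketch below, in Python, is illustrative only: the function names `classify`, `parse_formula`, and `diploid_and_fn` are hypothetical, and treating each upper limit as inclusive (for example, AR ≤ 1.70 is metacentric) is an assumption, since the text does not say how values falling between 1.70 and 1.71 are handled.

```python
import re

# Arms contributed by each chromosome type, following the rule in the text:
# m, sm and st count as two-armed, a counts as one-armed.
ARMS = {"m": 2, "sm": 2, "st": 2, "a": 1}


def classify(arm_ratio):
    """Return 'm', 'sm', 'st' or 'a' for an arm ratio (long arm / short arm)."""
    if arm_ratio < 1.0:
        raise ValueError("arm ratio is long/short and cannot be below 1.0")
    if arm_ratio <= 1.70:
        return "m"
    if arm_ratio <= 3.00:
        return "sm"
    if arm_ratio <= 7.00:
        return "st"
    return "a"


def parse_formula(formula):
    """Parse a karyotype formula such as '16m+20sm+4st+2a' into counts."""
    counts = {}
    for number, kind in re.findall(r"(\d+)(sm|st|m|a)", formula):
        counts[kind] = counts.get(kind, 0) + int(number)
    return counts


def diploid_and_fn(formula):
    """Return (2n, FN) implied by a karyotype formula."""
    counts = parse_formula(formula)
    diploid = sum(counts.values())
    fundamental = sum(ARMS[kind] * n for kind, n in counts.items())
    return diploid, fundamental


if __name__ == "__main__":
    # Karyotype formulas reported for ESU-A, ESU-B and ESU-C in the Results.
    for name, formula in [
        ("ESU-A", "16m+20sm+4st+2a"),
        ("ESU-B", "2m+14sm+2st+26a"),
        ("ESU-C", "4m+12sm+6st+34a"),
    ]:
        print(name, diploid_and_fn(formula))
```

Applied to the karyotype formulas reported in the Results, this reproduces the stated values: 2n = 42 with FN = 82, 2n = 44 with FN = 62, and 2n = 56 with FN = 78.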

#### DNA Barcoding Analysis

Representatives of each population were used (**Table 1**). Bunocephalus cf. aloikae, B. amaurus, and Amaralia hypsiura were used as outgroups. Liver and muscle tissues were stored in absolute ethanol for the acquisition of Cytochrome C Oxidase Subunit 1 (COI) sequences. Total DNA was obtained with the Wizard® Genomic DNA Purification Kit. The pair of primers used for the amplification of the COI mitochondrial region in PCR reactions was VF1\_t1 (TGT AAA ACG GCC AGT CAA CCA ACC ACA AAG ACA TTG G) + VR1\_t1 (CAG GAA ACA GCT ATG ACT AGA CTT CTG GGT GGC CAA AGA ATC A) (Ivanova et al., 2006). Each PCR reaction had a final volume of 25 µl containing 1 µl of DNA template [250 ng/µl] + 1 µl of each primer [5 pM]. The GoTaq® Colorless Master Mix (Promega) was used for the PCR. The amplification comprised a denaturation step of 2 min at 95°C; 35 cycles of 1 min at 94°C, 30 s at 56°C, and 1.5 min at 72°C; a final extension of 5 min at 72°C; and a cooling period at 4°C. PCR products were visualized on a 1.7% agarose gel and purified with 20% PEG (Lis, 1980). For sequencing, a "Big Dye Sequence Terminator v.3.1" kit (Applied Biosystems) was used, according to the manufacturer's instructions. The amplification conditions comprised 25 cycles of 30 s at 96°C, 15 s at 50°C, and 4 min at 60°C. After the reaction, the products were precipitated and sequenced (sequencer model ABI PRISM 3100 Genetic Analyzer from Applied Biosystems/made by HITACHI).

FIGURE 4 | FISH with telomeric probe. (A) ESU-A Igarapé Jundiá – Cuieiras River, (B) ESU-B Demini River, (C) ESU-C Igarapé Apeú – Guamá River, and (D1–D14) ESU-D1–D14 Purus River. Arrows indicate the occurrence of interstitial telomeric sites (ITS). Bar = 10 µm.

### Sequence Alignment and Phylogenetic Analysis

COI sequences of 690 bp were used for the barcoding analyses and were aligned using the Geneious® 10.1.3 software. The Kimura 2-parameter distance model (Kimura, 1980) was used to build a Neighbor-Joining (NJ) dendrogram, and a bootstrap analysis was performed (Felsenstein, 1985) with 1,000 replicates. All the aligned sequences were translated into amino acids to detect possible alignment errors. The Maximum Likelihood (ML) model (Tamura et al., 2004) was used to recover the phylogenetic topology. All positions containing gaps and missing data were eliminated. There were a total of 461 positions in the final dataset. Pairwise genetic distance calculations and NJ tree analysis were implemented using Molecular Evolutionary Genetics Analysis version 5 (MEGA5) software (Tamura et al., 2011), applying 1,000 bootstrap replicates.
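For readers unfamiliar with the Kimura 2-parameter (K2P) model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below illustrates this formula together with a complete-deletion step of the kind described above. It is not the MEGA5 implementation; the function names `complete_deletion` and `k2p_distance` are hypothetical, and treating any character other than A, C, G, or T as missing data is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def complete_deletion(sequences):
    """Drop every alignment column in which any sequence has a gap or
    ambiguous character (anything other than A, C, G or T)."""
    sequences = [s.upper() for s in sequences]
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("aligned sequences must have equal length")
    kept = [col for col in zip(*sequences) if all(base in BASES for base in col)]
    return ["".join(col[i] for col in kept) for i in range(len(sequences))]


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two gap-free, aligned
    sequences, returned as substitutions per site."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from this study.
    cleaned = complete_deletion(["ACGTACGTAC-", "ACGTACGTGCA"])
    print(cleaned, round(100 * k2p_distance(*cleaned), 2), "%")
```

Multiplying the returned value by 100 gives the percentage form used in the distance table.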

# RESULTS

# Cytogenetic Data

The four B. coracoideus populations presented distinct karyotypes and were classified as evolutionarily significant units (ESUs), with the following characteristics: ESU-A, 42 chromosomes (16m+20sm+4st+2a, FN = 82) from Igarapé Jundiá – Cuieiras River (**Figure 2A**); ESU-B, 44 chromosomes (2m+14sm+2st+26a, FN = 62) from Demini River (**Figure 2B**); and ESU-C, 56 chromosomes (4m+12sm+6st+34a, FN = 78) from Igarapé Apeú – Guamá River (**Figure 2C**). There was no karyotype differentiation between males and females in these populations. For ESU-D, from Purus River, 14 distinct cytotypes bearing chromosomes variant in number and morphology were observed, with 2n varying from 40 to 46. Due to such variation, chromosomes were not grouped in pairs, since homeology was not usually found among the distinct cytotypes (**Figure 3**).

Significant variations were also found concerning the 18S rDNA carrier chromosomes, and the localization and number of sites. For ESUs A, B, and C, such sequences were found in one or five chromosome pairs, in the pericentromeric or telomeric regions, in the short or long arms (**Figure 2**). Similar results were also detected among the cytotypes of ESU-D. However, in this case, although the telomeric position of the sites was consistent for all cytotypes, they were found in the short arms of two chromosomes in the cytotypes D1, D2, D4, D5, D6, D7, D8, D9, D11, D12, and D14, but in only one chromosome in the cytotypes D3 and D10. In turn, the cytotype D13 showed two chromosomes bearing sites in the short arms and three chromosomes bearing sites in the long arms (**Figure 3**).

FIGURE 5 | Testicular meiotic cell plates from the ESU-D in Giemsa staining. (a) Zygotene cells, with a possible tetravalent arrangement; (b–e) Aneuploid pachytene cells and arrows evidencing chiasmata formation; (f) Diplotene cells, with associated chromosomes likely in a chromosomal chain organization. Bar = 10 µm.

FIGURE 6 | Phylogeny of B. coracoideus inferred by the analysis of (A) Maximum Likelihood (ML) and (B) Neighbor-Joining (NJ) using the mitochondrial gene COI. The bootstrap values for 1,000 replications are shown above the branches.

Likewise, the 5S rDNA showed great variation among ESUs. As for the 18S rDNA, distinct chromosomes carried these sequences, with variations in their number and localization on the chromosomes. ESU-A showed only two chromosome pairs carrying 5S sequences, while ESUs B, C, and D presented a higher number of these sites. Although the preferential telomeric localization was maintained, interstitial positions were also observed, mainly among ESU-D cytotypes. In addition, syntenic localization with the 18S rDNA was evidenced in three chromosome pairs of ESU-B, as well as in two chromosomes of the ESU-D cytotype D13 (**Figures 2**, **3**).

The mapping of telomeric sequences evidenced only the usual terminal marks on the chromosomes of the ESUs A, B, and C. In turn, the ESU-D exhibited additional interstitial sites (ITS) in seven cytotypes, D1, D4, D7, D8, D9, D12, and D14 (**Figure 4**).

Meiotic plates from individuals of the ESU-D showed a variable number of chromosomes, corroborating the diversity found in the mitotic chromosomes. From 18 to 22 bivalents were evidenced, in addition to interstitial chiasmata and synaptic points, and probable tetravalent and chromosomal chain formations (**Figure 5**).

#### DNA Barcoding Analysis

Topologies obtained with the Neighbor-Joining (NJ) and Maximum Likelihood (ML) algorithms were congruent. The major clades were well-supported and it was confirmed that Bunocephalus represents a monophyletic group (**Figure 6**). Each population was also grouped as a monophyletic and well-supported clade, justifying their recognition as four ESUs. ESU-A occupies the most basal position in relation to the other ones, and the ESUs C and D are more closely related to each other, with a more recent divergence (**Figure 6**).

#### DISCUSSION

The geographic distribution, genetic distances, and karyotype data indicated that each B. coracoideus population represents an ESU. In fact, these populations differed by conspicuous karyotype variability, where each ESU shows specificities in its 2n, karyotype formula and distribution of ribosomal sites in the genome (**Figures 7**, **8**). In addition, they have possibly evolved in allopatry due to vicariant events, making their natural contact unfeasible. Oliveira and Gosztonyi (2000) proposed that the ancestral karyotype of Siluriformes contained 2n = 56 chromosomes, mainly two-armed ones. According to our phylogenetic data, ESU-A (2n = 42; 16m+20sm+4st+2a) corresponds to the first karyotype to differentiate among the four populations analyzed. In this way, ESU-A probably retains an ancestral feature of Siluriformes in its large number of bi-armed chromosomes, but with a reduction of the 2n due to chromosomal fusions. In this sense, the other ESUs share a synapomorphic condition by presenting karyotypes mostly composed of acrocentric chromosomes, in which pericentric inversions and/or centric fissions may have played a role. Such a feature is also found in other Bunocephalus species, such as B. doriae (Fenocchio and Swarça, 2012) and the B. coracoideus population from the Negro River (Ferreira et al., 2016), in which acrocentric chromosomes also make up most of the karyotype. Thus, taking into account the proposition of Oliveira and Gosztonyi (2000), it can be considered that the Bunocephalus ESUs A, B, and D present a trend toward the reduction of the chromosome number in relation to other Siluriformes, while ESU-C maintained the probable ancestral 2n = 56 chromosomes.

Chromosomal rearrangements can play a role in speciation processes, as they may contribute to reproductive isolation (Mayr, 1995), generating a useful area of investigation concerning genetic variability (Faria and Navarro, 2010). Indeed, in some distinct Neotropical fish species, such as Hoplias malabaricus (Bertollo et al., 2000), Astyanax fasciatus (Pazza et al., 2006), Astyanax scabripinnis (Moreira-Filho and Bertollo, 1991; Maistro et al., 2000), and Hoplerythrinus unitaeniatus (Giuliano-Caetano et al., 2001), chromosomal diversification raised the hypothesis that they may encompass different species under the same name.

Additionally, both the number and location of rDNA sites were highly variable among ESUs (**Figures 2**, **3**), highlighting their dynamic behavior in the genomes and their role in generating genetic diversity among populations. Besides, it seems that multiple 5S rDNA sites represent a synapomorphy in B. coracoideus, since all populations analyzed present such a condition. Accordingly, ESU-A presents the lowest number of such sites (in only two pairs of chromosomes), thus representing a basal condition (**Figure 8**). Since the accumulation of repetitive sequences in particular genomic areas can cause chromosomal rearrangements (Lim and Simmons, 1994; Dimitri et al., 1997), the dynamic behavior of rRNA genes might also be linked with the huge karyotype diversity presented by this nominal species.

The occurrence of synteny between the 5S and 18S rRNA genes in the ESUs B and D (**Figure 8B**, D13) represents an uncommon condition among vertebrates (Martins and Galetti, 2001), since these genes are transcribed by distinct RNA polymerases, suggesting the need to be distant from each other or located on different chromosomes, avoiding possible harmful rearrangements between them (Amarasinghe and Carlson, 1998; Martins and Galetti, 1999). However, in Siluriformes, the syntenic condition for both rDNA classes has already been found in several species, such as Imparfinis mirini and I. minutus (Ferreira et al., 2014), Ancistrus maximus, A. ranunculus, A. dolichopterus, Ancistrus aff. dolichopterus (Favarato et al., 2016), Hemibagrus wyckii (Supiwong et al., 2014), Corydoras carlae (Rocha et al., 2016), Panaqolus sp. (Ayres-Alves et al., 2017) and in B. coracoideus (present study). Nevertheless, the evolutionary paths leading to the selection of this apparently non-advantageous condition have not yet been revealed, although, at first sight, it does not appear to be a deleterious character.

An outstanding finding in our study is the huge karyotype diversity found in the ESU-D. In fact, this population showed many varying cytotypes living in sympatry. Apparently, such polymorphism does not represent an effective reproductive barrier capable of impairing crosses, at least among some of the different cytotypes, thereby increasing the chromosome diversity inside the population. Similar intrapopulational features were also reported in the characid Astyanax fasciatus, which presented two well-defined cytotypes, 2n = 46 and 2n = 48, but with numeric and structural chromosome variants when they occur in sympatry (Pazza et al., 2006, 2007, 2008). It is noteworthy that in both cases, ESU-D and Astyanax fasciatus, the variant karyotypes apparently do not have deleterious phenotypic effects on the carriers. However, the hypothesis that such a degree of chromosomal diversity may affect, in some way, the homeostasis of segregation cannot be fully discarded.

To better investigate the extent of the polymorphism inside ESU-D, we extended our analyses to the chromosomal behavior during meiosis, since monosomies and trisomies were found in nearly all cytotypes. In order to confirm this condition, meiotic plates of three individuals were analyzed, which attested that, during meiosis I, a clear numeric variation can be observed. In fact, different bivalent numbers were found in pachytene cells of the same individual, as well as probable trivalents with synaptic points. In addition, a typical tetravalent formation and an apparent chromosomal chain were also observed in zygotene and diplotene cells, respectively (**Figure 5**), and such events might have contributed to irregular segregations. It is known that chromosomal rearrangements can alter the pairing of homologs during meiosis and, as a consequence, produce unbalanced gametes (Davisson and Akeson, 1993; Navarro and Ruiz, 1997; Spirito, 1998). In this way, non-disjunction events during meiosis may result in aneuploid individuals, a factor that may, at least in part, explain the polymorphic condition found in the ESU-D population. In addition, a second factor probably related to such biodiversity relies on the ecological conditions in the Purus River basin, where ESU-D occurs. This region is located in a lowland area subject to flooding, influenced by the seasonality of the river level (Haugaasen and Peres, 2006). These flooded forests, which appear in the rainy season, form complex labyrinths made of tree logs, rocks and every type of vegetation common to such environments (Luize et al., 2015). This particular habitat favors fish dispersion and the consequent segregation of subpopulations until their future reconnection during the dry periods.

TABLE 2 | Aspredinidae specimens analyzed in the present study, with their respective collection places, number of individuals, diploid number (2n), and identification.

<sup>∗</sup>Biological Dynamics of Forest Fragments Project.

Thus, the evolutionary scenario for the ESU-D is that chromosomal rearrangements have occurred and that periods of geographic isolation, due to flood pulse cycles, may have favored their fixation in the population. During the flood periods, the reestablishment of the physical connection among the previously isolated aquatic environments allowed gene flow among them and, as a consequence, the variety of cytotypes observed in the population. This hypothesis is reinforced by the ITS found in several cytotypes, indicating the occurrence of chromosomal rearrangements (**Figure 4**).

DNA barcoding analysis is a very informative tool for biodiversity studies. In Salminus, for example, eight distinct lineages were evidenced, increasing the diversity of the genus, currently limited to four species (Machado et al., 2016). Rhamdia voulezi and Rhamdia branneri, considered synonyms of Rhamdia quelen, are currently argued to constitute valid species supported by karyotype, ecomorphology and morphometric data (Abucarma and Martins-Santos, 2001; Garcia et al., 2010; Mise et al., 2013; Garavello and Shibatta, 2016), as well as by DNA barcoding analysis (Ribolli et al., 2017).

Given the karyotype diversity found in B. coracoideus, the DNA barcoding methodology was also useful for analyzing the relationships among populations. In fact, this procedure is a helpful tool for detecting cryptic species (Smith et al., 2008). Theoretically, the nucleotide divergences between populations of a single species (intraspecific variation) are smaller than those between distinct species (interspecific variation), a difference known as the "barcoding gap" (Ward et al., 2005; Hajibabaei et al., 2006). Most congeneric species have shown substantial nucleotide divergences by means of this molecular marker (Hebert et al., 2003). Intraspecific divergences are rarely above 2%, and usually do not exceed 1% (Avise, 2000). For B. coracoideus, the intra-population genetic distance did not exceed 2%, except for the Purus population (ESU-D), which presented divergences among the sequences from 0.2 to 10.3%.
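The barcoding gap logic described above can be expressed as a comparison between the largest within-group distance and the smallest between-group distance. The following Python sketch is illustrative only: the function name `barcoding_gap` is hypothetical, and the labels and distance matrix in the usage example are made-up toy values, not measurements from this study.

```python
from itertools import combinations


def barcoding_gap(labels, distances):
    """Compare within-group and between-group pairwise distances.

    labels    : group name for each sequence (for example, a population)
    distances : symmetric matrix (list of lists) of pairwise distances in %

    Returns (max_intra, min_inter, gap_present), where gap_present is True
    when every between-group distance exceeds every within-group distance.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = intra if labels[i] == labels[j] else inter
        target.append(distances[i][j])
    if not intra or not inter:
        raise ValueError("need at least one within-group and one between-group pair")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter > max_intra


if __name__ == "__main__":
    # Toy values for illustration only (percent divergence), not study data.
    toy_labels = ["X", "X", "Y", "Y"]
    toy_distances = [
        [0.0, 0.4, 9.8, 10.1],
        [0.4, 0.0, 9.9, 10.3],
        [9.8, 9.9, 0.0, 1.2],
        [10.1, 10.3, 1.2, 0.0],
    ]
    print(barcoding_gap(toy_labels, toy_distances))
```

A group with internal divergences as wide as those reported for ESU-D would tend to close this gap, which is consistent with treating that population as a special case.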

Such molecular data, together with the karyotype diversity, allow us to infer that there is a probable ongoing sympatric speciation process within this population. In the NJ analysis, all the ESUs were supported with bootstrap values higher than 96%. The same occurred with the phylogeny based on ML, except for the ESU-D, which presented a bootstrap value of 83%, reflecting once more the karyotype variation present among the specimens of this population. However, the high value observed supports its identity as an ESU.

The ESU-A presented a mean distance of 10.6% from the other ESUs (**Table 2**), a value equivalent to differences between species. The ML bootstrap value of 62% between the ESU-A and the super clade including ESU-B, ESU-C, and ESU-D is much lower than the values of 95% (ML) and 99.8% (NJ) that support the grouping of the latter three. Besides, in the NJ phylogram, the ESU-A is more closely related to B. cf. aloikae and B. amaurus than to the other ESUs. In addition to its particular karyotype features, ESU-A presents strong signs of allopatric speciation and has the potential to be a new species. Following Avise and Walker (1999), the high divergences among the ESUs of B. coracoideus allowed us to delimit lineages with taxonomic uncertainties in this nominal species.

Genetic variability and natural selection are important conditions for evolutionary change. Thus, understanding the interruption of gene flow, or looking for factors that prevent gene exchange, such as vicariance, gene mutations and chromosomal rearrangements, are important steps in explaining evolutionary processes that frequently lead to speciation (Turelli et al., 2001; Kawakami et al., 2011). Indeed, it is well known that mutations and chromosome rearrangements can be fixed by genetic drift, and more easily so in small and isolated populations (Jesus et al., 2016), as is the case for the B. coracoideus populations investigated here. However, the great challenge for analyses of genetic biodiversity is to preserve the connection with natural history and species nomenclature, with consequent implications for management and conservation (Pellens et al., 2016). In this sense, a key question that emerges is how to classify the evolutionary history of a specific population with respect to its genetic diversification. In fact, the description of new species based on genetic diversity still meets some resistance and is not yet fully adopted. In this way, many cryptic species remain undescribed, even after their identification by genetic markers (Schlick-Steiner et al., 2007).


#### CONCLUSION

The diversity of Neotropical freshwater fishes is still largely underestimated (Reis et al., 2016) and requires additional investigation. Nevertheless, an earlier challenge still remains to be overcome: "what is a species and what new information is needed to solve this issue?" (Hey, 2001). Our study reveals a high degree of chromosomal and genetic diversity in B. coracoideus and strongly suggests the existence of four ESUs undergoing allopatric and sympatric speciation processes. We believe that these data are sufficient to reveal the occurrence of a B. coracoideus species complex. This indicates that newly available approaches, such as analyses of genetic variability, can be used in taxonomic procedures.

#### AUTHOR CONTRIBUTIONS

MF and CG performed techniques and analyzed the data; MF, JZ, and EF contributed with reagents, materials and analysis tools; MF, CG, DM, IdJ, MC, LB, JZ, and EF wrote the paper.

#### FUNDING

This study was supported by the Brazilian agencies, Instituto Nacional de Pesquisas da Amazônia (INPA), Fundação de Amparo à Pesquisa do Estado do Amazonas (UNIVERSAL AMAZONAS/FAPEAM 030/2013-062.00663/2015); CAPES pró Amazônia; ADAPTA-II INCT para Adaptação da Biota Aquática da Amazônia; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP 2016/22196-2); and received a productivity grant from CNPq (#313183/2014-7).

#### ACKNOWLEDGMENTS

The authors thank Douglas Bastos for his collaboration during the acquisition of the specimens and Leonardo Goll for help with the meiosis analysis. This is study number 725 of the technical series of the Biological Dynamics of Forest Fragments Project – PDBFF (INPA/STRI).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ferreira, Garcia, Matoso, de Jesus, Cioffi, Bertollo, Zuanon and Feldberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Flow Cytometry Protocol to Estimate DNA Content in the Yellowtail Tetra Astyanax altiparanae

Pedro L. P. Xavier <sup>1</sup> , José A. Senhorini <sup>1</sup> , Matheus Pereira-Santos <sup>2</sup> , Takafumi Fujimoto<sup>3</sup> , Eduardo Shimoda<sup>4</sup> , Luciano A. Silva<sup>5</sup> , Silvio A. dos Santos <sup>6</sup> and George S. Yasui <sup>1</sup> \*

<sup>1</sup> National Center for Research and Conservation of Continental Fish, Chico Mendes Institute of Biodiversity Conservation, Pirassununga, Brazil, <sup>2</sup> Aquaculture Center, Sao Paulo State University, Jaboticabal, Brazil, <sup>3</sup> Faculty of Fisheries Sciences, Hokkaido University, Hakodate, Japan, <sup>4</sup> Department of Pharmacy, Cândido Mendes University, Rio de Janeiro, Brazil, <sup>5</sup> Department of Veterinary Medicine, University of Sao Paulo, Pirassununga, Brazil, <sup>6</sup> AES Tietê, Promissão, Brazil

The production of triploid yellowtail tetra Astyanax altiparanae is a key factor to obtain permanently sterile individuals by chromosome set manipulation. Flow cytometric analysis is the main tool for confirmation of the resultant triploid individuals, but very few protocols are specific for A. altiparanae. The current study developed a protocol to estimate DNA content in this species. Furthermore, a protocol for long-term storage of dorsal fins used for flow cytometry analysis was established. The combination of five solutions with three detergents (Nonidet P-40 Substitute, Tween 20, and Triton X-100) at 0.1, 0.2, and 0.4% concentration was evaluated. Using the best solution from this first experiment, the addition of trypsin (0.125, 0.25, and 0.5%) and sucrose (74 mM) and the effects of increased detergent concentrations (0.6 and 1.2%) were also evaluated. After adjustment of the protocol for flow cytometry, preservation of somatic tissue or isolated nuclei was also evaluated by freezing (at −20°C) and fixation in saturated NaCl solution, acetic methanol (1:3), ethanol, and formalin at 10% for 30 or 60 days of storage at 25°C. Flow cytometry analysis in the yellowtail tetra was optimized using the following conditions: lysis solution: 9.53 mM MgCl2·7H2O; 47.67 mM KCl; 15 mM Tris; 74 mM sucrose, 0.6% Triton X-100, pH 8.0; staining solution: Dulbecco's PBS with DAPI at 1 µg mL<sup>−1</sup>; preservation procedure: somatic cells (dorsal fin samples) frozen at −20°C. Using this protocol, samples may be stored for up to 60 days with good accuracy for flow cytometry analysis.

Keywords: yellowtail tetra, ploidy status, fish, flow cytometry, sample preservation

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Lenin Arias Rodriguez, Universidad Juárez Autónoma de Tabasco, Mexico Diogo Teruo Hashimoto, Sao Paulo State University, Brazil

> \*Correspondence: George S. Yasui yasui@usp.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 27 July 2017 Accepted: 06 September 2017 Published: 25 September 2017

#### Citation:

Xavier PLP, Senhorini JA, Pereira-Santos M, Fujimoto T, Shimoda E, Silva LA, dos Santos SA and Yasui GS (2017) A Flow Cytometry Protocol to Estimate DNA Content in the Yellowtail Tetra Astyanax altiparanae. Front. Genet. 8:131. doi: 10.3389/fgene.2017.00131

# INTRODUCTION

The yellowtail tetra, Astyanax altiparanae, has been highlighted as a good model fish for laboratory studies, aquaculture (Prioli et al., 2002; dos Santos et al., 2016), basic and applied research on reproductive technologies, conservation of valuable genetic resources, and the establishment of gene banks for endangered species using surrogate technologies (Yamaha et al., 2007).

Surrogate propagation permits a fish to produce gametes from another species, and in this case, the yellowtail tetra is strategic because this species can be manipulated to produce gametes from related endangered species. This technology requires sterile host individuals to ensure the production of exogenous gametes. Some protocols established the sterilization of the yellowtail tetra by chronic exposure to high temperatures. However, sterilization was maintained only for a few weeks (de Siqueira-Silva et al., 2015). On the other hand, artificial induction of triploidy by chromosome set manipulation using temperature shock has been promising in several fish species, such as tilapia and the yellowtail tetra, and may be an efficient method to obtain permanently sterile individuals (Mair, 1993; Piferrer et al., 2009). In previous work (Adamov et al., 2016; do Nascimento et al., 2017), a protocol was established to produce triploid yellowtail tetra, and the ploidy status was confirmed by flow cytometry using a general protocol for plants and other animal species, but several samples yielded inaccurate readings and could not be analyzed. This suggests that a specific protocol is needed for this fish species.

Identification of triploids can be achieved using different methods, including nuclear volume of fish erythrocytes (Allen and Stanley, 1978, 1979) and chromosome counts by karyotype (Allen et al., 1982). However, flow cytometry analysis is currently the most effective, rapid, and accurate method to identify the ploidy status. Flow cytometry analysis is a valuable tool for chromosomal studies in fish species, including cell cycle analysis, confirmation of the ploidy status, and determination of genome size in different fish species including the loach (Zhou et al., 2008), Atlantic salmon (Allen, 1983), Poecilia formosa (Lampert et al., 2008), and other teleost species (Ciudad et al., 2002; Zhu et al., 2012). However, most protocols used for those analyses rely on propidium iodide and commercial kits that, despite giving successful results, are usually expensive and not tested for fish species.

In such analyses, tissue sampling is critical, since fresh samples rapidly lose viability and previously fixed samples lose quality for flow cytometry analysis. However, such samples are often obtained under field conditions and have to be transported alive to laboratory facilities for subsequent analysis. Furthermore, it is usually necessary to process large numbers of samples, and such a situation requires the development of protocols for long-term storage of these samples during the analysis. Some protocols are available for species such as the grass carp (Brown et al., 2000), Chinese grass carp (Burns et al., 1986), Atlantic salmon and shellfish (Allen, 1983), but there is no protocol for Neotropical species like the yellowtail tetra.

Therefore, the aim of this study was firstly to establish a two-step protocol to estimate the DNA content in the yellowtail tetra, A. altiparanae by flow cytometry. In addition, an efficient protocol for long-term storage of tissue used for subsequent flow cytometry analysis was established. The combination of both procedures may give rise to a reliable determination of ploidy status in the yellowtail tetra and other important related species.

# MATERIALS AND METHODS

All procedures were approved according to the Guide for the Care and Use of Laboratory Animals of CEPTA (CEUA #02031.000033/2015-11).

#### General Procedure for Flow Cytometry

Adult yellowtail tetra A. altiparanae were collected from the Mogi-Guassu river, São Paulo State, Brazil (21.925706 S, 47.369496 W). Males and females were used for the analysis, as no differences in DNA content between sexes were found for this species (Martinez et al., 2012). A small piece of dorsal fin (∼2 mm<sup>2</sup>) was clipped from the fish and then placed in a 1.5 mL microtube containing 120 µL of lysing solution. This lysing solution was prepared using detergents and other components (see the following experiment sections). The samples were incubated at room temperature for 30 min with occasional mixing. Staining was achieved by adding 800 µL of Calcium-Free Dulbecco's PBS (Sigma #D5773) containing DAPI at 1 µg mL<sup>−1</sup>. The samples were then filtered through a 30-µm mesh (Celltrics, Partec GmbH, Germany). Stained samples were then analyzed by a Partec CyFlow Ploidy Analyzer (Partec GmbH, Germany) with a specific filter set for DAPI excitation (358 nm). As a control group, a commercial kit specific for aquaculture and plants (Partec CyStain DNA 2-step, Partec GmbH, Germany) was used.

TABLE 1 | Continued

Pieces of fin (∼2 mm<sup>2</sup>) were treated with the solution and detergent combinations below, stained with DAPI, and analyzed by flow cytometry. Different letters within the columns indicate significant differences by Tukey test (P < 0.05).
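As a quick check on the staining step of the general procedure (120 µL of lysing solution followed by 800 µL of staining solution with DAPI at 1 µg mL<sup>−1</sup>), the final DAPI concentration in the tube can be estimated from the two volumes. The sketch below assumes that the two liquids mix completely and ignores the volume of the fin piece; the function name is illustrative and not part of the original protocol.

```python
def final_concentration(stock_conc, stock_volume_ul, other_volume_ul):
    """Concentration after mixing a stock with a second volume that
    contains none of the analyte (simple dilution, complete mixing assumed)."""
    total_volume_ul = stock_volume_ul + other_volume_ul
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes from the general procedure: 800 µL of DAPI at 1 µg/mL
# added to 120 µL of lysate.
dapi_final = final_concentration(1.0, 800, 120)
print(f"Final DAPI concentration: {dapi_final:.2f} µg/mL")
```

Under these assumptions the dye is diluted only slightly by the lysate, so the working concentration stays close to the nominal 1 µg mL<sup>−1</sup>.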

#### Solutions and Detergents for Cell Lysing

In this study, five solutions (S1–S5) for cell lysing based on plant flow cytometry were compared: **S1** (Doležel et al., 1989): 15 mM Tris, 2 mM Na2EDTA, 80 mM KCl, 20 mM NaCl; **S2** (Arumuganathan and Earle, 1991): 9.53 mM MgCl2, 47.67 mM KCl, 15 mM Tris; **S3** (Galbraith et al., 1983): 45 mM MgCl2, 30 mM sodium citrate, 2 mM NaHCO3; **S4** (Marie and Brown, 1993): 50 mM glucose, 15 mM KCl, 15 mM NaCl, 5 mM Na2EDTA, 50 mM sodium citrate; **S5** (Pfosser et al., 1995): 200 mM Tris, 4 mM MgCl2.6H2O. To each solution mentioned above, three detergents (Tween-20, Triton-X, and Nonidet P-40 substitute) were added at three concentrations each: 0.1, 0.2, and 0.4%. The combination of five solutions, three detergents at three concentrations, and the control group (commercial kit) gave rise to 46 lysing treatments. Dorsal fin samples were processed in each solution as mentioned above and then checked for the best results according to peak quality. This experiment was performed in triplicate.
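The count of 46 lysing treatments follows from a full factorial design (5 solutions × 3 detergents × 3 concentrations) plus the commercial-kit control. The short sketch below simply enumerates that design; the dictionary keys and labels are illustrative choices, not identifiers used by the authors.

```python
from itertools import product

solutions = ["S1", "S2", "S3", "S4", "S5"]
detergents = ["Tween-20", "Triton-X", "Nonidet P-40 substitute"]
concentrations_pct = [0.1, 0.2, 0.4]

# Full factorial design: every solution with every detergent at every concentration.
treatments = [
    {"solution": s, "detergent": d, "detergent_pct": c}
    for s, d, c in product(solutions, detergents, concentrations_pct)
]

# The commercial kit is the single control treatment.
treatments.append(
    {"solution": "commercial kit (control)", "detergent": None, "detergent_pct": None}
)

print(len(treatments))  # 5 * 3 * 3 + 1 = 46
```

With the experiment run in triplicate, a list like this could also serve as a sample sheet of 46 × 3 tubes.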

#### Evaluation of High Concentrations of Detergents, Trypsin, and Sucrose Additions

Using the best combination of detergents and the solution obtained above (i.e., Arumuganathan and Earle, 1991), detergent concentrations at 0.15, 0.3, 0.6, and 1.2% were evaluated with or without the addition of sucrose (74 mM). Using the best result, the effects of trypsin addition (at 0.125, 0.25, and 0.5%) were also evaluated on peak quality.

#### Fixatives and Preservation Strategies

In this experiment, two kinds of samples (fish fin or isolated nuclei) were evaluated. For each of them, five procedures were evaluated: freezing at −20°C, 70% ethanol, acetic methanol (1 part acetic acid:3 parts methanol), saline solution (saturated NaCl solution), and 10% formalin. For preservation of tissue, ∼2 mm<sup>2</sup> of dorsal fin from A. altiparanae was placed in 1.5 mL microtubes containing 1 mL of each fixative and maintained at 25°C in a BOD incubator under dark conditions. For freezing, the sample was placed in a 1.5 mL microtube containing 200 µL of 0.9% NaCl and then directly frozen at −20°C in a biomedical freezer. Isolated nuclei were obtained from dorsal fins (∼2 mm<sup>2</sup>) using 100 µL of the lysing solution selected in the previous experiment. For preservation of nuclei, 1 mL of each fixative was added to each nuclei suspension, which was then maintained at 25°C. Nuclei suspensions were also frozen at −20°C. Flow cytometry analyses were performed at 30 and 60 days of storage. Fixed tissue was collected and washed in Dulbecco's PBS and then processed in lysing solution before staining. Nuclei suspensions were centrifuged at 12,000 × g for 2 min, and the fixatives were removed by pipetting. The nuclei pellet was re-suspended in staining solution and then analyzed on the flow cytometer. As control groups, fresh (non-fixed) samples from days 0, 30, and 60 were used. For each of the 21 treatments, 10 replicates were used.

#### Statistical Analysis

Data are shown as mean ± SD. Data were checked for normality using the Lilliefors test and compared using ANOVA followed by Tukey's multiple range test (P < 0.05). Statistica version 11 was used for statistical analysis.
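To make the ANOVA step concrete, the following is a minimal sketch of the one-way ANOVA F statistic in plain Python. It is not the authors' workflow (they used Statistica), it covers neither the Lilliefors normality check nor the Tukey post hoc comparisons, and the example readings are hypothetical values chosen only to show the call.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of replicate values per treatment.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of treatment means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of replicates around their own treatment mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative DNA content readings, three replicates per treatment
# (NOT data from the study).
example = [[1.00, 1.02, 0.98], [1.01, 0.99, 1.03], [0.85, 0.87, 0.83]]
f_stat, df_b, df_w = one_way_anova_f(example)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A P value would then be read from the F distribution with the two degrees of freedom, which a statistics package provides directly.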

# RESULTS

#### Solutions and Detergents for Cell Lysing

In the control group that used the commercial kit, two clear peaks arose: one with 2C content and the other with 4C, originating from dividing cells (**Figure 1A**). On the other hand, none of the combinations of the solutions and detergents evaluated generated clear peaks. Problems included noisy peaks (**Figures 1B–F**), deviation in the DNA content (**Figure 1G**), absence of peaks (**Figure 1H**), and in most cases, low amounts of particles were found (**Figure 1I**). As observed in **Table 1**, the only solution that kept the same relative DNA content in comparison to the control, independent of the detergent or concentration, was S2. Furthermore, the S2 solution with Triton-X at 0.1% presented a good number of isolated particles and the best peak quality in terms of the histograms (**Table 1**). All other solutions presented poor results with regard to quality of peaks, DNA content, and number of particles. Considering the quality of peaks, DNA content, and the number of particles, S2 presented the best results and was used in later experiments.

TABLE 2 | Evaluation of trypsin additions for flow cytometry in fish.

S2 with 0.6% Triton-X was combined with trypsin at 0.125, 0.25, and 0.5% and 74 mM sucrose, and the relative DNA content and number of isolated particles were measured. Pieces of fin (2 mm<sup>2</sup>) were treated with the solutions, stained with DAPI, and analyzed by flow cytometry. Different letters within the columns indicate significant differences by Tukey test (P < 0.05).

#### Evaluation of High Concentrations of Detergents, Trypsin and Sucrose Additions

As observed in **Figure 2**, the addition of trypsin at 0.125 (**Figure 2B**), 0.25 (**Figure 2C**), and 0.5% (**Figure 2D**) presented detrimental effects on the flow cytometry analysis, and lower peaks arose with increasing concentrations of trypsin. Such data coincided with lower amounts of particles (**Table 2**), although it was significantly decreased only at 0.5% (74.3 ± 24.0, P = 0.0491). The DNA content did not present any statistical differences (P = 0.0578) and ranged from 0.99 ± 0.03 (at 0.5% trypsin) to 1.04 ± 0.01 (at 0.125% trypsin).

Flow cytometry using S2 solution associated with Triton-X at 0.15, 0.3, and 0.6% gave rise to noisy peaks (**Figures 3A–C**). The addition of sucrose at 74 mM associated with Triton-X at the same concentrations produced clearer peaks: at 0.3% (**Figure 3E**) and 0.6% (**Figure 3F**), but not at 0.15% (**Figure 3D**), clear 2C and 4C peaks arose. Relative DNA content (**Table 3**) was decreased in Triton-X at 1.2% (0.85 ± 0.03 C, P = 0.0155), and in sucrose with Triton-X at 0.15% (0.87 ± 0.02 C, P = 0.0492), 0.3% (0.85 ± 0.01 C, P = 0.0155), and 0.6% (0.84 ± 0.02 C; P = 0.1054). The number of particles from treatment S2 with Triton-X at 0.125% was 927.0 ± 132.4 and did not differ statistically (P = 0.0538) from the control (2,716.3 ± 1,822.2). However, the other treatments showed a significant decrease, ranging from 194.7 ± 37.1 (S2 and Triton-X at 0.6%) to 826.3 ± 67.0 (S2 with 0.3% Triton-X and 74 mM sucrose).

As there were promising results with 0.6% Triton-X, higher concentrations (0.6 and 1.2%) of the other detergents were also evaluated. With S2 and sucrose, Triton-X, Tween-20, and Nonidet P-40 gave similarly good results at both 0.6 and 1.2% (**Figure 4**), with a main peak at 2C and a secondary peak at 4C. The DNA content did not differ significantly (P = 0.0779) among the detergents and concentrations evaluated and ranged from 0.99 ± 0.01 C (1.2% Nonidet P-40) to 1.03 ± 0.01 C (1.2% Tween-20). Similarly, the number of particles did not differ significantly among treatments (P = 0.0602) and ranged from 725.3 ± 119.0 (0.6% Triton-X) to 1095.3 ± 183.4 (1.2% Tween-20).

Based on the results above, Triton-X at 0.6% gave better results in nuclei isolation (**Table 4**) and subsequent flow cytometric analysis (**Figure 4C**) and was therefore chosen for later experiments.

#### Fixatives and Preservation Strategies

After 30 and 60 days of storage, only frozen dorsal fin samples maintained the ability for flow cytometric analysis (**Figure 5**). Very clear peaks arose (**Figures 5B,C**), like those of samples from the control groups with fresh tissue (**Figure 5A**). Dorsal fin preserved in 70% ethanol produced peaks after 30 and 60 days (**Figures 5D,E**), but deviation of the peak occurred in both cases. Other fixatives gave rise to noisy peaks or, in most cases, did not show any peak (**Figure 5F**). The preservation of isolated nuclei as a new strategy failed in all cases, giving rise to noisy peaks or very few particles.

S2 was combined with Triton-X at 0.15, 0.30, 0.60, and 1.20% and sucrose at 74 mM, and the DNA content and number of particles were measured. Pieces of fin (2 mm<sup>2</sup>) were treated with the solutions, stained with DAPI, and analyzed by flow cytometry. Different letters within the columns indicate significant differences by Tukey test (P < 0.05).

After preservation in 70% ethanol, fin samples presented increased DNA content in flow cytometry analysis after 30 days (1.14 ± 0.04; P = 0.0001) and 60 days (1.21 ± 0.06 C; P = 0.0001) when compared with the control group (1.00 ± 0.02 C) (**Table 5**). Nuclei preserved in formaldehyde for 30 days presented decreased DNA content (0.86 ± 0.04; P = 0.0001). A similar result was observed for fin samples preserved with formaldehyde for 60 days (0.80 ± 0.07; P = 0.0001) and nuclei preserved for 60 days with salt (0.89 ± 0.10 C; P = 0.0011), formaldehyde (0.78 ± 0.04 C; P = 0.0001), ethanol (0.72 ± 0.03 C; P = 0.0001), and methanol (0.72 ± 0.05 C; P = 0.0001).
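The size of these shifts is easier to judge as a percentage of the control value. The sketch below converts a few of the mean relative DNA contents reported above into percent deviation from the fresh control (1.00 C); the helper name is illustrative, and only the means are used, so the reported standard deviations are ignored.

```python
def percent_deviation(sample_c, control_c=1.00):
    """Deviation of a relative DNA content from the control, in percent."""
    return 100.0 * (sample_c - control_c) / control_c


# Mean values quoted in the text above (relative DNA content, C).
reported = {
    "fin, 70% ethanol, 30 days": 1.14,
    "fin, 70% ethanol, 60 days": 1.21,
    "fin, formaldehyde, 60 days": 0.80,
    "nuclei, formaldehyde, 30 days": 0.86,
}

for label, value in reported.items():
    print(f"{label}: {percent_deviation(value):+.0f}%")
```

For context, a triploid carries three chromosome sets instead of two and would therefore be expected near 1.5 times the diploid value, so fixation artifacts of this magnitude take up a noticeable share of the margin separating the two ploidy classes.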

The number of particles was reduced in all cases when compared with fresh samples (2215.6 ± 1015.9; P = 0.0001). However, the best results were achieved with frozen dorsal fins after 30 days (892.9 ± 357.9) and 60 days (862.2 ± 467.8).

# DISCUSSION

Successful flow cytometry analysis depends on several aspects, including the interaction of the sample and the components of lysing solutions (Loureiro et al., 2007). An adequate combination of osmolality, buffers, stabilizers, and detergents is then required to achieve good results. In a two-step procedure with cell lysing and nuclear staining, solubilization of the cell membrane with subsequent preservation of the nuclear envelope is the key point for flow cytometry, and the combination of detergents, solutions, and membrane stabilizer was successful for the studied species. The current results indicate that solutions containing magnesium presented better results during flow cytometry analysis. This is assumed to be because magnesium ions work as a chromatin stabilizer (Galbraith et al., 1983), and we therefore conclude that this component is important for preserving the nuclear membrane of the yellowtail tetra. Similarly, sucrose additions presented good results. Sucrose affects osmolality, acts as a membrane stabilizer, and is widely used in several processes, including cell lysis and cryopreservation, in order to maintain membrane characteristics (Medina-Robles et al., 2005). Furthermore, sucrose is important for the maintenance of nuclear integrity (Marie and Brown, 1993). The non-ionic detergents used in this study efficiently solubilized the cell membrane while the nuclei were kept intact, indicating that the lysing and stabilizing process is membrane-specific.

In this work, a protocol for sample storage and subsequent flow cytometry analysis was achieved. Such a procedure is applicable for long-term preservation for future analysis. The preservation of marine bacteria under fixation with paraformaldehyde and storage in liquid nitrogen reduced the cell count and generated multiple peaks in flow cytometry analysis (Kamiya et al., 2007). A similar decrease in quality was observed in marine bacteria and algae (Troussellier et al., 1993). In this study, when a sub-zero temperature of −20°C was used, the peak quality and cell concentration were maintained for 60 days of storage, providing good results in flow cytometry. Such a new procedure for preservation of fish samples may be used in many fields including aquaculture and biomedicine.

TABLE 4 | Evaluation of S2 solution associated with high concentrations of detergents (0.6 and 1.2%) regarding the relative DNA content and number of isolated particles.

Pieces of fin (2 mm<sup>2</sup>) were treated with the solutions, stained with DAPI, and analyzed by flow cytometry. Different letters within the columns indicate significant differences by Tukey test (P < 0.05).

TABLE 5 | Preservation strategies (Salt, Formaldehyde, Ethanol, Methanol, and Freeze) for flow cytometry in fish.

Pieces of fin (2 mm<sup>2</sup>) and isolated nuclei were fixed with different procedures. The samples were analyzed at 0 (control), 30, and 60 days by flow cytometry, and the relative DNA content and number of isolated particles were obtained. Different letters within the columns indicate significant differences by Tukey test (P < 0.05).

A new procedure for fixation of samples for flow cytometry, using isolated nuclei instead of tissue, was also evaluated. However, further work is required to improve the method and its results, as well as to compare different fixatives and ways of storage. Nevertheless, such an approach of preserving isolated nuclei may be interesting because fixation and storage of tissue may affect membrane characteristics (Suganuma and Morioka, 1979). This can also reduce the efficiency of lysing and staining procedures, so that the nuclei cannot be isolated in some cases. Nuclear isolation and preservation may thus be applicable as a preservation procedure for flow cytometry analysis.

In conclusion, a simple and inexpensive protocol for DNA content analysis by flow cytometry in the yellowtail tetra A. altiparanae, together with a sample preservation procedure, was established as follows: a lysing solution for fish samples composed of 9.53 mM MgCl2·7H2O, 47.67 mM KCl, 15 mM Tris, 74 mM sucrose, and 0.6% Triton-X. Fish tissue samples may be preserved at −20°C for 60 days for future analysis using this protocol.
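For bench use, the millimolar concentrations in this recipe can be converted into weighed amounts for a chosen batch volume. The sketch below does this for KCl, Tris, and sucrose using standard molar masses of the anhydrous compounds. It makes several assumptions that the text does not state: the 0.6% detergent is treated as volume per volume, the magnesium salt is left out because its molar mass depends on the exact salt and hydrate printed on the reagent label, and pH adjustment is not covered. All names are illustrative.

```python
# Molar masses in g/mol (numerically equal to mg/mmol).
MOLAR_MASS = {"KCl": 74.55, "Tris": 121.14, "sucrose": 342.30}

# Target concentrations from the protocol, in mM.
TARGET_MM = {"KCl": 47.67, "Tris": 15.0, "sucrose": 74.0}


def mass_mg(conc_mm, molar_mass, volume_ml):
    """Mass in mg needed for `conc_mm` (mmol/L) in `volume_ml` of solution."""
    return conc_mm * (volume_ml / 1000.0) * molar_mass


def recipe_mg(volume_ml):
    """Weighed amounts (mg) of each solid for the requested batch volume."""
    return {
        name: mass_mg(TARGET_MM[name], MOLAR_MASS[name], volume_ml)
        for name in TARGET_MM
    }


def detergent_volume_ul(percent, volume_ml):
    """Detergent volume in µL, ASSUMING the percentage is volume/volume."""
    return percent / 100.0 * volume_ml * 1000.0


batch_ml = 100  # hypothetical batch size
for name, mg in recipe_mg(batch_ml).items():
    print(f"{name}: {mg:.1f} mg per {batch_ml} mL")
print(f"Triton-X: {detergent_volume_ul(0.6, batch_ml):.0f} µL per {batch_ml} mL")
```

The magnesium salt can be added to the two dictionaries once its molar mass has been read from the reagent bottle.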

# AUTHOR CONTRIBUTIONS

PX: Acquisition, analysis, and interpretation of data, draft of the work, development of intellectual content, writing of the manuscript, final approval of the version. JS: Draft of the work, development of intellectual content, final approval of the version. MP: Analysis and interpretation of data, draft of the work, development of intellectual content, final approval of the version. TF: Draft of the work, development of intellectual content, final approval of the version. ES: Analysis of data, development of intellectual content, final approval of the version. LS: Interpretation of data, development of intellectual content, writing of the manuscript, final approval of the version. SdS: Interpretation of data, development of intellectual content, writing of the manuscript, and final approval of the version. GY: Analysis and interpretation of data, draft of the work, development of intellectual content, writing of the manuscript, final approval of the version.

# ACKNOWLEDGMENTS

The authors are grateful to the Sao Paulo Research Foundation (FAPESP) for the financial support of this research (Young Investigators Award Grant #2010/17429-1 and Young Researcher Scholarship #2011/11664-1); to AES Tietê (Research & Development Project #0064-1052/2014) and FUNDIBIO; and to Michael Stablein for the English review of the manuscript. We also acknowledge CEPTA-ICMBio for kindly providing the facilities and experimental fish.


**Conflict of Interest Statement:** SdS was employed by company AES Tietê. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors declare that this study received funding from AES Tietê (Research & Development Project #0064-1052/2014). The funder was not involved in the study design or collection, analysis, or interpretation of the data. The reviewer DH declared a shared affiliation, with no collaboration, with one of the authors MP to the handling Editor.

Copyright © 2017 Xavier, Senhorini, Pereira-Santos, Fujimoto, Shimoda, Silva, dos Santos and Yasui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Revealing Hidden Diversity of the Underestimated Neotropical Ichthyofauna: DNA Barcoding in the Recently Described Genus Megaleporinus (Characiformes: Anostomidae)

Jorge L. Ramirez<sup>1</sup> \*, Jose L. Birindelli<sup>2</sup> , Daniel C. Carvalho<sup>3</sup> , Paulo R. A. M. Affonso<sup>4</sup> , Paulo C. Venere<sup>5</sup> , Hernán Ortega<sup>6</sup> , Mauricio Carrillo-Avila<sup>7</sup> , José A. Rodríguez-Pulido<sup>8</sup> and Pedro M. Galetti Jr.<sup>1</sup>

#### Edited by:

Rodrigo A. Torres, Federal University of Pernambuco, Brazil

#### Reviewed by:

Fábio Fernandes Roxo, Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil Henrik R. Nilsson, University of Gothenburg, Sweden Uedson Pereira Jacobina, Federal University of Alagoas, Brazil

> \*Correspondence: Jorge L. Ramirez jolobio@ufscar.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 09 August 2017 Accepted: 27 September 2017 Published: 12 October 2017

#### Citation:

Ramirez JL, Birindelli JL, Carvalho DC, Affonso PRAM, Venere PC, Ortega H, Carrillo-Avila M, Rodríguez-Pulido JA and Galetti PM Jr. (2017) Revealing Hidden Diversity of the Underestimated Neotropical Ichthyofauna: DNA Barcoding in the Recently Described Genus Megaleporinus (Characiformes: Anostomidae). Front. Genet. 8:149. doi: 10.3389/fgene.2017.00149

<sup>1</sup> Laboratório de Biodiversidade Molecular e Conservação, Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo, Brazil, <sup>2</sup> Departamento de Biologia Animal e Vegetal, Universidade Estadual de Londrina, Londrina, Brazil, <sup>3</sup> Laboratório de Genética da Conservação, Programa de Pós-Graduação em Biologia de Vertebrados, PUC Minas, Belo Horizonte, Brazil, <sup>4</sup> Departamento de Ciências Biológicas, Universidade Estadual do Sudoeste da Bahia, Jequié, Brazil, <sup>5</sup> Departamento de Biologia e Zoologia, Universidade Federal de Mato Grosso, Cuiabá, Brazil, <sup>6</sup> Departamento de Ictiología, Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Peru, <sup>7</sup> Facultad de Ciencias Exactas y Naturales, Universidad Surcolombiana, Huila, Colombia, <sup>8</sup> Grupo de Investigación en Genética y Reproducción Animal, Universidad de los Llanos, Villavicencio, Colombia

Molecular studies have improved our knowledge on the neotropical ichthyofauna. DNA barcoding has successfully been used in fish species identification and in detecting cryptic diversity. Megaleporinus (Anostomidae) is a recently described freshwater fish genus within which taxonomic uncertainties remain. Here we assessed all nominal species of this genus using a DNA barcode approach (Cytochrome Oxidase subunit I) with a broad sampling to generate a reference library, characterize new molecular lineages, and test the hypothesis that some of the nominal species represent species complexes. The analyses identified 16 (ABGD and BIN) to 18 (ABGD, GMYC, and PTP) different molecular operational taxonomic units (MOTUs) within the 10 studied nominal species, indicating cryptic biodiversity and potential candidate species. Only Megaleporinus brinco, Megaleporinus garmani, and Megaleporinus elongatus showed correspondence between nominal species and MOTUs. Within six nominal species, a subdivision in two MOTUs was found, while Megaleporinus obtusidens was divided in three MOTUs, suggesting that DNA barcode is a very useful approach to identify the molecular lineages of Megaleporinus, even in the case of recent divergence (< 0.5 Ma). Our results thus provided molecular findings that can be used along with morphological traits to better define each species, including candidate new species. This is the most complete analysis of DNA barcode in this recently described genus, and considering its economic value, a precise species identification is quite desirable and fundamental for conservation of the whole biodiversity of this fish.

Keywords: cryptic species, freshwater fishes, allopatric speciation, South American basins, cytochrome oxidase subunit I

# INTRODUCTION


Neotropical freshwater fishes have a remarkable diversity, exceeding 8000 species (Reis et al., 2016). However, much taxonomic uncertainty remains, and this diversity is likely underestimated (Pereira et al., 2013; Reis et al., 2016). Molecular studies have been crucial to improve our knowledge of the ichthyofauna, and DNA barcoding has successfully been used in fish species identification and in detecting species of taxonomic concern or cryptic diversity (Pereira et al., 2013; Gomes et al., 2015; Ramirez and Galetti, 2015; Machado et al., 2016). Within the neotropical freshwater fishes, the order Characiformes represents more than 30% of the known species, and Anostomidae is one of the most species-rich families, occurring in all major hydrographic basins, with trans- and cis-Andean distribution in South America (Reis et al., 2003).

The Anostomidae comprises approximately 150 described species distributed in 15 genera (Garavello and Britski, 2003; Sidlauskas and Vari, 2008; Ramirez et al., 2017), and its known diversity has increased in recent years. For instance, 14 species and 1 genus were described in the last 5 years alone (Birindelli et al., 2013; Burns et al., 2014). DNA barcoding has revealed taxonomic uncertainties within the genus Laemolyta (Ramirez and Galetti, 2015), and molecular phylogeny has helped to provide an understanding of the evolutionary history of the Anostomidae (Ramirez and Galetti, 2015; Ramirez et al., 2016, 2017).

Recently, the genus Megaleporinus (Ramirez et al., 2017) was described to include 16 lineages, corresponding to 10 nominal species, previously recognized in Leporinus or Hypomasticus (Ramirez et al., 2017). Megaleporinus is supported by cytogenetic, molecular, and morphological data. It is characterized by having a unique ZZ/ZW sex chromosome system (Galetti et al., 1995), while most cytogenetically known Leporinus species have no sex chromosomes (Galetti et al., 1981, 1991). Its monophyly is also well supported by mitochondrial and nuclear markers, which identified it as the sister group to Abramites (Ramirez et al., 2017). Concerning its morphology, Megaleporinus is characterized by being relatively large (adults usually reaching more than 35 cm standard length, including the largest species of the family), three teeth on each premaxillary and dentary bones, and a color pattern of one to three dark mid-lateral blotches (Ramirez et al., 2017). Because of its large size, Megaleporinus has an economic importance in subsistence fisheries and aquaculture (Garavello and Britski, 2003).

Recent studies indicate that there is a hidden biodiversity within Megaleporinus that needs to be better understood (Avelino et al., 2015; Ramirez et al., 2017). A study based on mitochondrial and nuclear markers, but using few individuals for each species, showed that several nominal species allocated to this genus comprise two or more molecular lineages allopatrically distributed in different basins (Ramirez et al., 2017).

In this study, we used a DNA barcoding approach to generate a reference library for Megaleporinus, assessing all nominal species and lineages previously described. We included a broad sampling for most of the species. Our hypothesis is that DNA barcoding supports the observation that some of the nominal species represent species complexes with most molecular operational taxonomic units (MOTUs) allopatrically distributed in different basins, as proposed by Ramirez et al. (2017). By identifying such hidden biodiversity within this genus, this paper will contribute to a more complete understanding of its diversity and to the conservation of this important fish group.

# MATERIALS AND METHODS

#### Sampling

Animals were collected on public land, handled and killed under permission (ICMBIO/MMA N° 32215) provided by the Environment Ministry (MMA). This study did not involve endangered or protected species. Fish were collected by fishing rods and gillnets. No ethics committee approval is required for these organisms in Brazil. Fish were killed in the field using cold water and immediately transferred onto ice. Tissue samples were collected after fish death was confirmed through lack of operculum movement.

Specimens from several populations of all Megaleporinus species were used in this study, totaling 79 samples of the 10 nominal species, and comprising the 16 molecular lineages described by Ramirez et al. (2017) (**Figures 1**, **2** and **Table 1**). Voucher numbers are provided for the specimens (**Table 1**). Additionally, previous DNA barcode sequences of specimens from the São Francisco (Carvalho et al., 2011), Paraná (Pereira et al., 2013), Paranapanema (Frantine-Silva et al., 2015), and lower Paraná basins (Díaz et al., 2016) were included in our data set, giving a total of 116 sequences (**Figures 1**, **2** and **Table 1**).

#### DNA Extraction, Amplification, and Sequencing

Total DNA was extracted from tissues (fins, muscle, or liver) by the standard phenol–chloroform method (Sambrook et al., 1989). A fragment of Cytochrome Oxidase subunit I (COI; 698 bp) was amplified via polymerase chain reaction (PCR) using primers AnosCOIF and AnosCOIR (Ramirez and Galetti, 2015). PCR products were sequenced for both strands using an ABI 3730xl (Applied Biosystems, Waltham, MA, United States) automatic sequencer. Contigs were assembled and edited using BioEdit (Hall, 1999). All sequences were evaluated manually, deleting regions of low quality. All sequences were verified to represent the COI gene and were checked for indels and stop codons. GenBank (Benson et al., 2017) accession numbers are given in **Table 1**. All information about specimens, sequences, and electropherograms was deposited in a data set of The Barcode of Life Database platform (BOLD) with code DS-MGLEP.
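The stop-codon check mentioned above can be illustrated with a minimal sketch. It assumes the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA, and AGG are stop codons while TGA encodes tryptophan. The function names and the toy sequence are hypothetical and are not part of the original workflow, which relied on manual inspection in BioEdit.

```python
# Stop codons of the vertebrate mitochondrial code (TGA is Trp, not a stop).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codon_positions(seq, frame=0):
    """Return 0-based start positions of in-frame stop codons.

    `seq` is an ungapped nucleotide string; `frame` is 0, 1, or 2.
    """
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in VERT_MITO_STOPS
    ]


def frames_without_stops(seq):
    """List the reading frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_codon_positions(seq, f)]


# Hypothetical toy sequence, for illustration only.
toy = "ATGGCAAGATTT"
print(stop_codon_positions(toy, frame=0))
print(frames_without_stops(toy))
```

A genuine COI barcode fragment is expected to have one reading frame free of stop codons; a sequence with stops in all three frames would be a candidate pseudogene or sequencing error.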

#### DNA Barcode Analysis

The general mixed Yule coalescent (GMYC) model (Pons et al., 2006) with a single threshold, implemented in the splits package in the R 3.3.3 statistical software (R Core Team, 2017), was used to infer MOTUs. For the GMYC input, an ultrametric tree was generated using Beast 2.4.3 (Bouckaert et al., 2014), with a lognormal relaxed clock, a birth and death model, and a GTR+G substitution model, chosen using jModeltest 2 (Darriba et al., 2012), using 50 million MCMC generations and a burn-in of 10%. The Poisson tree processes (PTP) model (Zhang et al., 2013) was used for MOTU delimitation through the bPTP server<sup>1</sup>, using default values. The bPTP server includes a Bayesian implementation of the PTP model and the original maximum likelihood PTP. For the PTP input, a tree was generated using Beast 2.4.6 (Bouckaert et al., 2014), with a strict clock, a birth and death model, and the GTR+G substitution model, using 50 million MCMC generations and a burn-in of 10%.

Additionally, two cluster algorithms were used, the Barcode Index Number System (BIN) (Ratnasingham and Hebert, 2013) and Automatic Barcode Gap Discovery (ABGD) (Puillandre et al., 2012). The BIN was automatically determined in the BOLD Workbench, while the ABGD was performed using Kimura-2 parameter (K2P) distance and default values through the web interface<sup>2</sup> .

COI intraspecific and interspecific genetic distances were estimated using the K2P model implemented in Mega 6.0 (Tamura et al., 2013). These values were used to calculate the mean, minimum, and maximum values for intra- and inter-MOTU distances, and intra- and interspecific distances (nominal species). A genetic distance neighbor-joining (NJ) tree analysis was performed based on the K2P substitution model in Mega 6.0 (Tamura et al., 2013).
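For readers unfamiliar with the K2P model, the pairwise distance can be written as d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The following is a minimal sketch of that formula for two aligned sequences; the function name and the pairwise-deletion handling of gaps are illustrative assumptions and do not reproduce the Mega 6.0 implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites (toy sequences, for illustration only).
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Distances in the text are reported as percentages, i.e., the value returned here multiplied by 100.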

# RESULTS

The alignment of COI sequences resulted in 600 characters with 158 parsimony informative sites (included in the Supplementary Material). The GMYC analysis resulted in 18 MOTUs (Confidence interval: 16–18) (**Table 2**). The GMYC model was preferred over the null model (likelihood ratio = 73.49, P < 0.0001), indicating that GMYC results are reliable. The PTP analyses (maximum likelihood and Bayesian implementation) resulted in the same 18 MOTUs obtained in GMYC. The ABGD analysis found six partitions with 27 (P = 0.001) to 16 groups (P = 0.01), including a partition with the same 18 MOTUs (P = 0.005) obtained in the GMYC and PTP analyses. The BOLD system determined 16 BINs (**Table 2**), showing discordance with our MOTUs in only two BINs, AAB8569 [M. piavussu (Britski et al., 2012) and M. cf. piavussu lower Paraná] and AAD1729 [M. reinhardti (Lütken, 1875) and M. cf. reinhardti]. The clustering of the MOTUs obtained by the analyses is shown in **Figure 3**.

Only Megaleporinus brinco (Birindelli and Britski, 2013), Megaleporinus garmani (Borodin, 1929), and Megaleporinus elongatus (Valenciennes, 1850) showed correspondence between nominal species and MOTUs. Within six nominal species, a subdivision in two MOTUs was found, while Megaleporinus obtusidens (Valenciennes, 1837) was divided in three MOTUs (**Table 2**).

The mean of intra-MOTU and maximum intra-MOTU distances, the nearest neighbor (NN), and the minimum distance to the NN are shown in **Table 2**, for both GMYC MOTUs and nominal species.

The overall mean of intra-MOTU distances was 0.03%, the maximum intra-MOTU distance was 0.5% (M. obtusidens), and the mean of inter-MOTU distances was 9.19%. The lowest and highest values of inter-MOTU distances were 0.67 and 15.31%, respectively. Because the maximum intra-MOTU distance was lower than the minimum inter-MOTU distance, there is a barcode gap that allowed all MOTUs to be identified successfully using COI distance. In contrast, when only the nominal species were considered, the maximum intraspecific distance increased to 15.31% [M. muyscorum (Steindachner, 1900)], and no barcode gap was found.
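The barcode-gap criterion described above (maximum within-group distance lower than minimum between-group distance) can be sketched as follows. The specimen identifiers, group labels, and distance values are made up for illustration and are not taken from the data set; the function name is likewise hypothetical.

```python
from itertools import combinations


def barcode_gap(distances, groups):
    """Return (max_intra, min_inter, has_gap) for a grouping of specimens.

    distances: dict mapping frozenset({id_a, id_b}) to a pairwise distance (%).
    groups: dict mapping specimen id to its group label (MOTU or species).
    """
    intra, inter = [], []
    for a, b in combinations(sorted(groups), 2):
        d = distances[frozenset((a, b))]
        (intra if groups[a] == groups[b] else inter).append(d)
    if not inter:
        raise ValueError("at least two groups are required")
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter)
    return max_intra, min_inter, max_intra < min_inter


# Hypothetical pairwise K2P distances (%) among four specimens.
toy_distances = {
    frozenset(("s1", "s2")): 0.5,
    frozenset(("s1", "s3")): 15.0,
    frozenset(("s2", "s3")): 14.8,
    frozenset(("s1", "s4")): 9.0,
    frozenset(("s2", "s4")): 9.2,
    frozenset(("s3", "s4")): 12.0,
}
by_motu = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
by_species = {"s1": "X", "s2": "X", "s3": "X", "s4": "Y"}

print(barcode_gap(toy_distances, by_motu))
print(barcode_gap(toy_distances, by_species))
```

In this toy case the gap is present when specimens are grouped by MOTU and disappears when two divergent MOTUs are lumped under one nominal species, which mirrors the pattern reported in the text.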

<sup>1</sup>http://species.h-its.org/ptp/

<sup>2</sup>http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html

TABLE 1 | Sampling information and GenBank accession for specimens included in the analysis.



<sup>∗</sup>Obtained from BOLD.

# DISCUSSION

Our hypothesis that some of the nominal species represent species complexes separated in different basins could not be rejected by DNA barcoding analysis, revealing taxonomic uncertainties and a hidden diversity within this recently described genus. The DNA barcode analyses identified 16 (ABGD and BIN) to 18 (ABGD, GMYC, and PTP) different MOTUs (**Figure 3**), with two new MOTUs (M. macrocephalus Paraná and M. cf. piavussu lower Paraná) not analyzed by Ramirez et al. (2017). This high number of MOTUs contrasts with the 10 nominal species recognized in the genus thus far, showing several potential targets for cryptic species to be described, and reinforcing the general idea that there is still a lot of undocumented diversity within the neotropical ichthyofauna (Reis et al., 2016). The difference in the number of MOTUs detected among methods is due to the low genetic distance (0.67%) between two pairs of MOTUs: M. reinhardti and M. cf. reinhardti, separating the genetic lineages from São Francisco and Itapicuru, respectively, and M. piavussu and M. cf. piavussu lower Paraná. These low genetic distance values are likely due to a recent divergence between these MOTUs [<0.5 Ma for M. reinhardti and M. cf. reinhardti according to Ramirez et al. (2017)]. Of note, besides presenting an allopatric distribution, these MOTUs were also recovered by the monophyly criterion (**Figure 3**). MOTUs with recent origin have had less time to accumulate genetic differences than species with ancient origin, hindering their identification. Despite this low genetic distance, the species delimitation methods could delimit these MOTUs, especially those based on phylogenetic trees (GMYC and PTP).

TABLE 2 | Genetic K2P distances of Megaleporinus species.

The mean and the maximum of intra-group distances, the nearest neighbor (NN), and the minimum distance to the NN for MOTUs (ABGD, GMYC, and PTP) and Nominal species.

A key aspect implicit in DNA barcoding analysis is the genetic distance threshold used to define MOTUs. COI distances of 1% (Hubert et al., 2008) to 2% (Pereira et al., 2013) have been proposed as thresholds for fish DNA barcode analysis. However, such values were derived from comparative analyses among phylogenetically diverse groups. For instance, 2% was used to characterize DNA barcoding of a fish community of a given river (Pereira et al., 2013). When DNA barcoding analyses have focused on a group of closely related species (e.g., a genus), lower threshold values have been reported (Carvalho et al., 2011; Pereira et al., 2011, 2013; Ramirez and Galetti, 2015). Particularly in Anostomidae, a lower threshold of 0.92% was reported to distinguish MOTUs within the genus Laemolyta (Ramirez and Galetti, 2015). Although most of the values obtained herein were above 2% (13 out of 18 MOTUs, **Table 2**), a maximum threshold of 0.67% for Megaleporinus was detected between the MOTUs obtained. This reinforces that lower genetic distance values might be obtained when intra-genus MOTUs are analyzed, mainly between recently diverged lineages.

Five nominal species, M. conirostris (Steindachner, 1875), M. macrocephalus (Garavello and Britski, 1988), M. muyscorum, M. obtusidens, and M. trifasciatus (Steindachner, 1876), showed high COI distance values (> 1.8%, **Table 2**) between individuals from different basins, indicating a scenario of potential allopatric speciation within these species.

In contrast to previous results (Avelino et al., 2015), evidence of local differentiation was not found here, and all cryptic diversity corresponds to inter-basin differentiation. Analyzing only two samples of M. reinhardti from the Três Marias (MG, Brazil) region (São Francisco basin), Avelino et al. (2015) reported an intraspecific distance of 3.8% between them, suggesting a local differentiation. Here we analyzed nine individuals, representing four different localities, including the Três Marias region, and we found no genetic distance (0%) among them. Mitochondrial pseudogenes, sequencing errors, or misidentification could explain such discrepancies, and it would be more cautious to consider M. reinhardti from São Francisco as a single MOTU, as recovered here.

Similar discordance is observed for M. piavussu (upper Paraná). Avelino et al. (2015) included four samples from a single locality and reported a mean intraspecific distance of 2.8%. Our present data set for this species included 18 individuals obtained from six localities and showed a lower maximum intraspecific distance of 0.17%. It is strongly suggested that M. piavussu is also a single MOTU.

Incongruences were also observed within the nominal M. obtusidens. While four groups (A–D), showing 0.7–4.1% mean intraspecific distances, were previously reported (Avelino et al., 2015), we found three MOTUs showing 0–0.5% COI distances. The group D mentioned as part of M. obtusidens by Avelino et al. (2015), which included individuals caught downstream the Itaipú dam (Paraná basin), was recovered here as a sister group of M. piavussu, and was named M. cf. piavussu lower Paraná (**Figure 3**).

One particular aspect was highlighted in our results. Several individuals clustered in the M. macrocephalus clade were caught in different hydrographic basins, such as the Doce, São Francisco, Tocantins, and Paraná, outside of its original distribution in the Paraguay basin, likely due to releases from aquaculture. Similar findings had already been described in the São Francisco basin (Carvalho et al., 2011). This species is a commercially important fish that is extensively farmed throughout the Brazilian territory, and accidental or intentional releases can occur (e.g., Langeani et al., 2007; Vieira, 2010). In such cases, DNA barcoding provides a rapid and accurate identification of this species and can be used in managing and monitoring potential ecosystem disturbance caused by an invasive species.

In summary, the use of DNA barcoding points to the need for a taxonomic revision of this genus. A search for morphological traits able to support a taxonomic delimitation could be facilitated if the MOTUs identified here are considered. A morphological trait showing a range of variation when examined within a given nominal species could perhaps be more informative if studied in each MOTU separately. In such a case, our results would make an important contribution to the taxonomy of Megaleporinus by facilitating the search for decisive taxonomic characters. This is the most complete analysis of DNA barcode in this recently described genus, and considering the economic value of this group, a precise species identification is quite desirable and fundamental for conservation of the whole biodiversity of this genus.

#### AUTHOR CONTRIBUTIONS

JR and PG designed the research. JR, DC, PA, PV, HO, MC-A, and JR-P collected data. JR performed the analyses. All authors contributed to the writing of the manuscript.

#### FUNDING

The authors thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for financial support (Universal 473474/2011-5 and 405309/2016-3 to PG, Universal 420255/2016-8 to JB and Rede BrBOL 564953/2010-5). JR received a fellowship grant from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP 2011/21836-4). Authors received productivity research grants from CNPq (304440/2009-4 to PG) and Fundação Araucária (641/2014 to JB).

#### ACKNOWLEDGMENTS

We are grateful to C. Cramer, C. Doria, C. Nolorbe, D. Motta, H. Sanchez, J.C. Riofrio, and W. Troy for help in obtaining part of the tissue samples and MMA/ICMBIO for collection authorization (32215-1). The authors thank the three reviewers for suggestions and comments which improved the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2017.00149/full#supplementary-material

#### REFERENCES

(Pisces, Anostomidae). Cytogenet. Genome Res. 29, 138–142. doi: 10.1159/000131562

(Characiformes, Anostomidae) through molecular analysis. J. Fish Biol. 88, 1204–1214. doi: 10.1111/jfb.12906

Reise nach Südamerika 1898 gesammelte neue Fischarten. Anz. Akad. Wiss. Wien 37, 3.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ramirez, Birindelli, Carvalho, Affonso, Venere, Ortega, Carrillo-Avila, Rodríguez-Pulido and Galetti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Low Genetic Diversity and Structuring of the Arapaima (Osteoglossiformes, Arapaimidae) Population of the Araguaia-Tocantins Basin

Carla A. Vitorino<sup>1</sup> , Fabrícia Nogueira<sup>2</sup> , Issakar L. Souza<sup>3</sup> , Juliana Araripe<sup>2</sup> and Paulo C. Venere<sup>1</sup> \*

<sup>1</sup> Instituto de Biociências, Universidade Federal de Mato Grosso, Cuiabá, Brazil, <sup>2</sup> Instituto de Estudos Costeiros, Universidade Federal do Pará, Bragança, Brazil, <sup>3</sup> Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, Florianópolis, Brazil

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Łukasz Kajtoch, Institute of Systematics and Evolution of Animals (PAN), Poland Daniel Cardoso Carvalho, Pontifícia Universidade Católica de Minas Gerais, Brazil

#### \*Correspondence:

Paulo C. Venere pvenere@uol.com.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

Received: 22 May 2017 Accepted: 10 October 2017 Published: 24 October 2017

#### Citation:

Vitorino CA, Nogueira F, Souza IL, Araripe J and Venere PC (2017) Low Genetic Diversity and Structuring of the Arapaima (Osteoglossiformes, Arapaimidae) Population of the Araguaia-Tocantins Basin. Front. Genet. 8:159. doi: 10.3389/fgene.2017.00159

The arapaima, Arapaima gigas, is a fish whose populations are threatened by both overfishing and the ongoing destruction of its natural habitats. In the Amazon basin, varying levels of population structure have been found in A. gigas, although no data are available on the genetic diversity or structure of the populations found in the Araguaia-Tocantins basin, which has a topographic profile, hydrological regime, and history of fishing quite distinct from those of the Amazon. In this context, microsatellite markers were used to assess the genetic diversity and connectivity of five wild A. gigas populations in the Araguaia-Tocantins basin. The results of the analysis indicated low levels of genetic diversity in comparison with other A. gigas populations studied in the Amazon basin. The AMOVA revealed that the Arapaima populations of the Araguaia-Tocantins basin are significantly structured. No correlation was found between pairwise FST values and the geographical distance among populations. The low level of genetic variability and the evidence of restricted gene flow may both be accounted for by overfishing, as well as the other human impacts that these populations have been exposed to over the years. The genetic fragility of these populations demands attention, given that future environmental changes (natural or otherwise) may further reduce these indices and eventually endanger these populations. The results of this study emphasize the need to take the genetic differences among the study populations into account when planning management measures and conservation strategies for the arapaima stocks of the Araguaia-Tocantins basin.

Keywords: genetic structure, microsatellite, conservation, habitat fragmentation, genetics population

# INTRODUCTION

Worldwide, the populations of many fish species are declining rapidly (Allan et al., 2005), with the communities occupying lakes, rivers and floodplains being the most affected (Abell et al., 2008). In the Neotropical region, the biodiversity of aquatic ecosystems is under intense pressure, primarily from human activities, such as overfishing, pollution, habitat fragmentation, deforestation and the introduction of exotic species (Agostinho et al., 2005).

The arapaima, Arapaima gigas (Schinz, 1822), is a fish of considerable economic importance in the neotropics, and is listed in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). The species is listed as "Data Deficient" by the IUCN (World Conservation Monitoring Centre, 2014). As the arapaima prefers lentic environments, such as the flooded forests, rivers and lakes of the Amazon, Araguaia-Tocantins and Essequibo basins (Castello and Stewart, 2010; Castello et al., 2013), its populations become increasingly concentrated during the dry season, and the high densities of fish that accumulate during this period greatly increase their vulnerability to capture. Given this vulnerability of the species to fishing and the reduction of its stocks in recent years, researchers have focused increasingly on its genetic diversity and population structure (Farias et al., 2003; Hrbek et al., 2005, 2007; Hrbek and Farias, 2008; Hamoy et al., 2008; Araripe et al., 2013; Vitorino et al., 2015), chromosomal evolution (Marques et al., 2006), and other aspects of its biology (Gomes, 2007; Castello, 2008a,b; Arantes et al., 2010; Fernandes et al., 2012; Farias et al., 2015).

Despite this recent interest, many features of the biology of arapaima are still relatively poorly known, in particular the structure of its natural populations. Up until now, in addition, most studies have focused on the populations of the Amazon basin, and little is known about the genetic diversity of the populations that inhabit the Araguaia-Tocantins basin, even though this system is considered to be a priority area for the conservation of the aquatic biodiversity of the Cerrado biome.

As the characteristics of the Araguaia-Tocantins basin are quite distinct from those of the Amazon basin, the data available for Amazonian Arapaima gigas cannot be extrapolated reliably to the Araguaia-Tocantins. For example, Latrubesse and Stevaux (2002) identified three principal topographical sectors in the Araguaia basin (the principal focus of the present study): (i) the upper Araguaia River, which extends from the headwaters to Registro do Araguaia, located mainly on Pre-Cambrian rocks, with a V-shaped valley and many rapids, (ii) the middle Araguaia River, between Registro do Araguaia and Conceição do Araguaia, which is characterized by a well-developed alluvial plain located on the lowland Bananal Plain, a prominent geomorphological and sedimentary unit, with some isolated rock formations along the main channel, which form small rapids, and (iii) the lower Araguaia River, which runs downstream from the Bananal Plain, over an area of crystalline rocks until its confluence with the Tocantins River, and which has no well-defined alluvial plain (Latrubesse and Stevaux, 2002).

Because of these characteristics, the hydrological regime of the Araguaia River is also distinctive, and is determined by well-defined dry and rainy seasons (Aquino et al., 2008). The flood pulse is extremely rapid, with floodplain lakes being connected for only short periods of time. The Araguaia-Tocantins basin also includes portions of the two principal Brazilian biomes, the Amazon to the north and the Cerrado to the south (Aquino et al., 2005). However, these important centers of biodiversity have been impacted intensively by industrial-scale farming operations and extensive cattle ranching, the construction of reservoirs for the production of hydroelectric energy (Aquino et al., 2008, 2009; Latrubesse et al., 2009), and the establishment of the Tocantins-Araguaia waterway (Almeida and Peres, 2007). All these impacts have resulted in extensive alterations of the region's natural environments, in particular the fragmentation of habitats, which reduces gene flow among populations through processes such as siltation and the restriction of aquatic environments (Agostinho et al., 2005).

While arapaima stocks have been reduced by both fishing and habitat loss, few populations have been studied in the Araguaia-Tocantins basin, and there is a pressing need for the expansion of the database to provide a reliable assessment of the conservation status of these populations. The first study to use genetic markers in arapaima from the Araguaia-Tocantins basin was conducted by Vitorino et al. (2015), who found low levels of expected heterozygosity and a distinct structure among the four study populations. The present study provides further advances in the understanding of the genetic characteristics of this important species of Neotropical fish, through the investigation of wild arapaima populations in the Araguaia-Tocantins basin using microsatellite markers. The present study aimed to confirm the findings of Vitorino et al. (2015) and, in particular, to determine the existence of significant structuring in the arapaima populations of this basin. The prediction that significant population structure exists within the study area is based on the observation that, under natural conditions, arapaima typically form relatively stable family groups, which migrate laterally over short distances (Castello and Stewart, 2010; Castello et al., 2013). In this context, habitat structure is also expected to contribute to genetic diversity, given that marginal lakes increase in number and size moving downstream from the sampling point furthest upstream (point 1, **Figure 1**), with a larger number of environments being expected to support larger numbers of fish and family groups.

While the study of Vitorino et al. (2015) analyzed the genetic diversity of the arapaima populations of the Araguaia-Tocantins basin using ISSR markers, microsatellites are more appropriate for population analyses, given their co-dominant inheritance (vs. dominant alleles in the ISSR markers) and high degree of polymorphism. Microsatellite markers thus provide a much greater potential for statistical analyses, and were expected to provide a detailed database for the more reliable definition of the genetic diversity of populations. These analyses have important implications for the definition of priority areas for conservation and the implementation of management units for this species.

# MATERIALS AND METHODS

Two hundred and ninety-four samples of the muscle tissue and fins of Arapaima gigas were obtained from five different locations on the Araguaia and Tocantins rivers, three in the state of Mato Grosso (Araguaiana, Novo Santo Antônio, and São Félix do Araguaia), one in Goiás (Luiz Alves), and one in Pará, Itupiranga (**Figure 1** and **Table 1**). The municipality of Araguaiana represents the region furthest upstream of the middle Araguaia River, where the river first forms marginal lakes along its course, appropriate for A. gigas. Araguaiana is also the sector with the smallest number of lakes (Alves and Carvalho, 2007).

Luiz Alves is just upstream from the bifurcation in the river that creates the Javaés River, and forms Bananal Island. It is important to note that this sector of the river is very popular with sport fishermen, due to the large number of lakes found in this area (Alves and Carvalho, 2007). Novo Santo Antônio, a municipality located between the Mortes and Araguaia rivers, is the region's principal arapaima fishery center, and the lakes found along the course of the Mortes River provide the bulk of the catch landed on the Araguaia River (Kirsten et al., 2012). The municipality of São Félix do Araguaia is located at the confluence of the Mortes and Araguaia rivers, on the Bananal plain (de Melo et al., 2007). Most of the arapaima fishermen of Mato Grosso are resident in this municipality (Kirsten et al., 2012). The municipality of Itupiranga is located downstream of the confluence of the Araguaia with the Tocantins River, 170 km upstream from the Tucurui hydroelectric dam.

The samples collected in Araguaiana were obtained during a rescue operation, which translocated fish from seasonal lakes to a perennial body of water. Small pieces of fin were collected from the live fish during this operation. All the other tissue samples were obtained from specimens caught and marketed by local fishermen at each site, so no specimens were euthanized specifically for the purposes of the study. All the samples (fin and muscle) were preserved in 100% alcohol, and deposited at the Cytogenetics and Animal Genetics Laboratory of the Federal University of Mato Grosso (LabGen/GEPEMA/UFMT) in Cuiabá, Brazil. When it was necessary to handle animals, all procedures adhered to the recommendations of the Guide for the Care and Use of Laboratory Animals. The collection and transportation of biological specimens by the authors of the present study is authorized by the Brazilian Institute for the Environment and Renewable Resources (IBAMA) through permanent license number 15226-1, issued by the Chico Mendes Institute for the Conservation of Biodiversity (ICMBio).

TABLE 1 | Location and number of individuals (n) of Arapaima analyzed in this study.

The total DNA was extracted according to the saline extraction protocol of Aljanabi and Martinez (1997), with minor modifications. The amount and quality of the DNA obtained through this procedure were analyzed in a Biophotometer Plus (Eppendorf Hamburg, Hamburg, Germany), and the samples were diluted to a final concentration of 5 ng/µL.

Seven primer pairs (AgCTm4, AgCTm7, AgCAm2, AgCAm15, AgCAm16, AgCAm20, and AgCAm26) were selected based on previous studies and because they are highly polymorphic in the Arapaima populations of the Amazon Basin (Farias et al., 2003, 2015; Araripe et al., 2013). These fluorescently labeled primers were used to amplify the microsatellite regions by Polymerase Chain Reaction (PCR), following the conditions described by Farias et al. (2003). For genotyping, the PCR product was mixed with a standard molecular weight marker (MegaBACE ET-550R Size Standard), and the mixture was injected into a MegaBACE 1000 automatic sequencer (Amersham Biosciences). The alleles were identified using Fragment Profiler 1.2 (Amersham Biosciences).

Once the database was assembled, it was checked in MicroChecker (Van Oosterhout et al., 2004) for possible genotyping errors, such as null alleles or scoring errors. To test whether the populations were in Hardy-Weinberg equilibrium, Genepop (Raymond and Rousset, 1995) was used to estimate the intrapopulation fixation index or coefficient of inbreeding (FIS) for each locus, which was compared with the null hypothesis (FIS = 0). Genepop was also used to determine allele frequencies, and the presence of polymorphic loci, and exclusive and rare alleles. Allele richness was estimated in Fstat 2.9.3.2 (Goudet, 2002). Expected and observed heterozygosity were determined by Arlequin 3.5.1.2 (Excoffier and Lischer, 2010). The significance of the differences between expected and observed heterozygosity was evaluated using a one-way ANOVA, run in PAST 2.17c (Hammer et al., 2001).
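As a rough illustration of what the fixation index measures, the sketch below uses the simple relation FIS = 1 − Ho/He. This is only an approximation of the idea: Genepop computes FIS with the Weir and Cockerham (1984) estimator, which is not reproduced here, and the function name `fis` is introduced purely for illustration. The input values are the mean heterozygosities reported for Itupiranga in the Results.

```python
def fis(h_obs: float, h_exp: float) -> float:
    """Simple inbreeding coefficient, F_IS = 1 - Ho/He.

    Positive values suggest a deficit of heterozygotes, negative values
    an excess. This is NOT the Weir and Cockerham estimator used by Genepop.
    """
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Mean observed and expected heterozygosity reported for Itupiranga.
fis_itu = fis(0.428, 0.449)
print(f"Approximate F_IS for Itupiranga: {fis_itu:.3f}")
```

A value close to zero, as here, is what would be expected for a population near Hardy-Weinberg proportions when all loci are averaged.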

Genetic differentiation among populations was evaluated using an Analysis of Molecular Variance (AMOVA), the molecular fixation index (FST), and the divergence parameter (RST), which were all determined by Arlequin 3.5.1.2 (Excoffier and Lischer, 2010). A Mantel test was used to verify possible correlation between genetic differentiation (FST, RST and FST/(1 − FST)) and geographical distance, which was measured following the main channel of the river.
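The logic of a Mantel test can be sketched as a permutation test on the correlation between two distance matrices. The version below is a minimal NumPy sketch, not the Arlequin implementation; the function name `mantel`, the river positions, and the pairwise differentiation matrix are all hypothetical values chosen for illustration and are not the data of this study.

```python
import numpy as np


def mantel(x, y, permutations=999, seed=0):
    """One-sided Mantel test: Pearson r between the upper triangles of two
    square distance matrices, with a p-value from permuting the rows and
    columns of the first matrix together."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    upper = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(x[upper], y[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        x_perm = x[order][:, order]  # relabel sites, keep matrix structure
        if np.corrcoef(x_perm[upper], y[upper])[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical positions (km along the main channel) of five sites.
river_km = np.array([0.0, 300.0, 565.0, 676.0, 1500.0])
geo = np.abs(river_km[:, None] - river_km[None, :])

# Hypothetical pairwise differentiation values (symmetric, zero diagonal).
gen = np.array([
    [0.00, 0.55, 0.60, 0.62, 0.50],
    [0.55, 0.00, 0.10, 0.06, 0.20],
    [0.60, 0.10, 0.00, 0.08, 0.25],
    [0.62, 0.06, 0.08, 0.00, 0.22],
    [0.50, 0.20, 0.25, 0.22, 0.00],
])

r, p = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns together preserves the dependence among pairwise distances that share a site, which is why a Mantel test is used instead of an ordinary correlation test.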

The probability of a given number of stocks was based on a Bayesian approach run in STRUCTURE 2.3.3 (Pritchard et al., 2000). The number of presumed populations (K) was set from 1 to 7. The analyses had a burn-in of 100,000 iterations, followed by 1,000,000 Markov Chain Monte Carlo (MCMC) iterations, using a model without admixture and allele frequencies. The number of populations was defined by the ΔK value, obtained using the STRUCTURE HARVESTER program (Earl and VonHoldt, 2012).
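The ΔK statistic can be sketched as the absolute second-order change in the mean log-likelihood across successive values of K, divided by the standard deviation of the log-likelihood among replicate runs at that K. The code below follows that common formulation under stated assumptions: the function name `delta_k` is introduced for illustration, and the log-likelihood values are hypothetical, not the STRUCTURE output of this study.

```python
import numpy as np


def delta_k(ln_prob):
    """Delta K for K = 2 .. n-1.

    ln_prob: 2-D array, one row per K (K = 1 .. n), one column per
    replicate run, holding the log-likelihood of each run.
    """
    ln_prob = np.asarray(ln_prob, dtype=float)
    mean_l = ln_prob.mean(axis=1)
    sd_l = ln_prob.std(axis=1, ddof=1)
    # |L(K+1) - 2 L(K) + L(K-1)| evaluated on the means
    second_diff = np.abs(mean_l[2:] - 2.0 * mean_l[1:-1] + mean_l[:-2])
    values = second_diff / sd_l[1:-1]
    return {k: float(v) for k, v in zip(range(2, len(mean_l)), values)}


# Hypothetical log-likelihoods: rows are K = 1..5, columns are 3 replicate runs.
runs = [
    [-3000.0, -3001.0, -2999.0],
    [-2500.0, -2501.0, -2499.0],
    [-2490.0, -2495.0, -2485.0],
    [-2485.0, -2495.0, -2475.0],
    [-2480.0, -2500.0, -2460.0],
]
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, "-> most likely K:", best_k)
```

Because ΔK depends on the values at K − 1 and K + 1, it is undefined for the smallest and largest K tested, so it cannot by itself identify K = 1 as the best solution.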

The estimated effective size (N<sub>e</sub>) of each population was derived from the theta values (θ) generated in MIGRATE-n 3.2.6 (Beerli and Felsenstein, 2001), using the formula N<sub>e</sub> = θ/4µ, with a microsatellite mutation rate (µ) of 5.56 × 10<sup>−4</sup> per locus per generation (Whittaker et al., 2003; Yue et al., 2007).
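The conversion from θ to effective size is simple arithmetic, and can be sketched as follows. The function names are introduced here for illustration, and the θ value of 0.5 is a hypothetical input; only the mutation rate and the Ne range (128 to 351) come from the text.

```python
MU = 5.56e-4  # microsatellite mutation rate per locus per generation (from the text)


def effective_size(theta: float, mu: float = MU) -> float:
    """N_e = theta / (4 * mu) for a diploid nuclear marker."""
    return theta / (4.0 * mu)


def theta_from_ne(ne: float, mu: float = MU) -> float:
    """Inverse relation: theta = 4 * N_e * mu."""
    return 4.0 * ne * mu


# Hypothetical theta value, for illustration only.
print(round(effective_size(0.5), 2))

# Theta values implied by the smallest and largest N_e reported in the Results.
print(round(theta_from_ne(128), 4), round(theta_from_ne(351), 4))
```

Because µ enters the denominator directly, any uncertainty in the assumed mutation rate translates proportionally into the N<sub>e</sub> estimate.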

The MIGRATE-n program was also used to calculate migration rates between pairs of populations by the coalescence method. The number of migrants per generation (Nm) was obtained by multiplying the migration rates obtained by the program (M = m/µ, where m is the fraction of new immigrants from the population per generation) by the θ values of the receptor population in each pairwise comparison. This analysis was performed using the Brownian model, with a uniform distribution and a constant mutation rate between the loci. The standard search strategy values of the MIGRATE program were used, except for the use of 20 short chains and 5 long chains.

The BOTTLENECK program, version 1.2.02 (Cornuet and Luikart, 1996) was used to verify the existence of recent demographic events, such as population bottlenecks. The Wilcoxon test was run using the Two-Phased Mutation (TPM) model, established with 30% for the Infinite Allele Mutation (IAM) model and 70% for the Stepwise Mutation Model (SMM).

# RESULTS

### Genetic Diversity

A total of 25 alleles were identified, with an average of 3.57 alleles per locus. While all the loci were polymorphic, four were monomorphic in at least one of the arapaima populations analyzed from the Araguaia-Tocantins basin. The number of alleles, observed and expected heterozygosity, allelic richness, and the fixation index (FIS) were calculated for all seven microsatellite loci analyzed, together with the mean parameters for all loci (**Table 2**).

The population from Novo Santo Antônio had the highest genetic diversity, with 72% of the identified alleles and an allelic richness of 2.429. At the opposite extreme, the arapaima from Araguaiana were the least diverse (44.0%) with an allelic richness of only 1.521. The specimens from Araguaiana also returned the lowest heterozygosity (Ho = 0.133, He = 0.133), while those from Itupiranga had the highest values, with observed heterozygosity of 0.428 and an expected heterozygosity of 0.449 (**Table 2**).

Five loci deviated from Hardy-Weinberg equilibrium, with heterozygote frequencies being either lower or higher than expected (**Table 2**). Only four of the 28 values estimated for the inbreeding coefficient (FIS) were significantly positive, which indicates a deficit of heterozygotes. Three of these loci were recorded in the population from Novo Santo Antônio, while the other was from São Félix do Araguaia. A single locus from Itupiranga was significantly negative, indicating an excess of heterozygotes. When the whole set of loci is considered, a significant FIS value (p < 0.001) was found only for the Novo Santo Antônio population, indicating the occurrence of inbreeding in this population.

One to three alleles, with high frequencies, were recorded for most loci. All loci except AgCAm15 presented exclusive alleles for the different study populations, at frequencies ranging from 0.012 to 1.00. No exclusive alleles were detected in the population from São Félix do Araguaia (**Table 3**). No new alleles were detected in this study, given that all the alleles recorded here had been reported previously by Farias et al. (2003) and Araripe (2008).

The MicroChecker analysis (**Table 3**) indicated the presence of null alleles in the populations from Luís Alves (for loci AgCAm20 and AgCTm4), Novo Santo Antônio (loci AgCAm2, AgCTm4, AgCAm16 and AgCAm20), São Félix do Araguaia (AgCAm15), and Itupiranga (AgCTm4). No systematic pattern was observed in the occurrence of null alleles in the different study populations, however. No evidence was found of either the misidentification of stutters as alleles or dropouts (dominance of small alleles).

#### Population Structure

The values recorded for Wright's fixation index (FST) ranged from 0.061 to 0.669, while those for the Slatkin divergence parameter (RST) varied from 0.025 to 0.688. Both indices were significant for all pairs of locations (**Table 4**). The results of the Mantel test did not support the hypothesis that genetic differentiation was related to geographical distance, considering the values of either FST (r<sup>2</sup> = 0.466115, p = 0.236), RST (r<sup>2</sup> = 0.452354, p = 0.240) or FST/(1 − FST) (r<sup>2</sup> = 0.219112, p = 0.221). The results of the Bayesian inference based on the mean likelihood [Ln (K)] of the ΔK (Evanno et al., 2005) indicate the presence of two clusters within the Araguaia-Tocantins basin, one formed by the populations of Luís Alves, Novo Santo Antônio, São Félix do Araguaia, and Itupiranga, and the other by that of Araguaiana, which forms a distinct group (**Figure 2**). These findings were used to establish the clusters for the analysis of molecular variance (AMOVA).

TABLE 2 | Genetic diversity indexes for Arapaima gigas, using microsatellite markers.

N, number of alleles; AR, allelic richness; HO, observed heterozygosity; HE, expected heterozygosity; Fis, coefficient of inbreeding computed as in Weir and Cockerham (1984); PL, polymorphic locus; –, monomorphic locus. <sup>∗</sup>P < 0.05. Ara, Araguaiana; LAl, Luís Alves; NSA, Novo Santo Antônio; SFA, São Félix do Araguaia and Itu, Itupiranga.

The results of the AMOVA indicated a lack of significant variation between the two groups (ΦST = 44.15%, RST = 40.07%). The variation among populations within each group contributed only 12.35% (ΦST) and 11.17% (RST) of the variance, while the variance within populations was 43.51% for ΦST and 48.76% for RST.


Ara, Araguaiana; LAl, Luís Alves; NSA, Novo Santo Antônio; SFA, São Félix do Araguaia and Itu, Itupiranga. <sup>∗</sup>Exclusive allele.

TABLE 4 | FST /RST values and geographic distance in kilometers (above the diagonal) between A. gigas populations from Araguaia-Tocantins basin.


Ara, Araguaiana; LAl, Luís Alves; NSA, Novo Santo Antônio; SFA, São Félix do Araguaia and Itu, Itupiranga.

The overall ΦST (0.564; p = 0.000) and RST (0.512; p = 0.000) values were both relatively high among the study populations.
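The overall fixation indices can be related to the AMOVA variance partition reported above. The sketch below assumes the standard hierarchical definition, in which the overall index is the share of the total variance that lies among groups plus among populations within groups. The function name is introduced for illustration, and small differences from the published values are expected because the inputs are rounded percentages.

```python
def overall_fixation(among_groups: float, among_pops: float, within_pops: float) -> float:
    """Overall fixation index from hierarchical AMOVA variance components:
    (among groups + among populations within groups) / total variance."""
    total = among_groups + among_pops + within_pops
    return (among_groups + among_pops) / total


# Percentages of variance reported in the AMOVA results.
phi_st = overall_fixation(44.15, 12.35, 43.51)
r_st = overall_fixation(40.07, 11.17, 48.76)
print(f"Phi_ST ~ {phi_st:.3f}, R_ST ~ {r_st:.3f}")
```

Under this definition, more than half of the total variance lies among populations, which is consistent with the strong differentiation described in the text.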

Effective population size ranged from 128 to 351 individuals, with the lowest value being recorded at Araguaiana and the highest at São Félix do Araguaia (**Table 5**). The MIGRATE-n coalescence analysis (Beerli and Felsenstein, 2001) indicated low levels of gene flow between populations, and only six of the 20 estimates of migrant numbers were higher than one migrant per generation.

The Bottleneck program did not detect any significant deficit of heterozygotes. However, a significant (p < 0.05) excess of heterozygotes was found in the Itupiranga population using the TPM (Wilcoxon test) model.

#### DISCUSSION

This study is the first to use microsatellite markers to examine the genetic diversity of the arapaima populations of the Araguaia-Tocantins basin. Overall, the results indicated significantly lower levels of genetic diversity and heterozygosity than those found in previous studies of the same genetic markers in populations from other areas (Farias et al., 2003, 2015; Hamoy et al., 2008; de Alencar Leão, 2009; Araripe et al., 2013).

The arapaima populations that inhabit the study region are affected by natural processes, in particular the hydrological regime, that are quite distinct in comparison with the populations found in the Amazon basin, the region in which the species has been investigated in most detail (Vitorino et al., 2015). The unique features of the Araguaia-Tocantins basin may influence the reduced genetic diversity (number of alleles, heterozygosity, and allelic richness) of its populations in comparison with those from the Amazon basin. This is also true of the allelic richness of the arapaima populations of Tucuruí (Araripe, 2008) and Bananal Island (de Alencar Leão, 2009), two other areas located within the Araguaia-Tocantins basin, which present low values in comparison with populations from the Amazon basin. The levels of genetic diversity recorded in the Arapaima populations of the Araguaia-Tocantins basin are lower than expected for the species, but are consistent with the findings of Vitorino et al. (2015), based on ISSR markers. Using mitochondrial markers, da Penha (2014) also found relatively low genetic variability in arapaima populations of the Araguaia-Tocantins basin in comparison with those of the Amazon basin.

The points sampled in the present study are exploited intensively by local fisheries. In fact, most of the arapaima marketed locally are harvested from natural populations (Çiftci and Okumus, 2002), including not only adults, but also the capture of fry as stock for rearing on fish farms. This type of overexploitation often results in a decline in effective population size (Ne), provoking the loss of genetic diversity, which leads to a reduced resilience of populations to environmental stresses and climate change, and a loss of resistance to pathogens (Allendorf and Luikart, 2007; Hughes et al., 2015).

The variation in the N<sub>e</sub> values estimated for the different populations analyzed in the present study indicates potential differences in their population dynamics, with distinct patterns of fluctuation over time in the number of individuals reproducing, a process that will have knock-on effects for the sustainability of the population (Carson et al., 2011). In other words, each arapaima population may have a distinct demographic history, based on the differences in variables such as the number of mature individuals, the sex ratio, offspring survival rates, and the availability and quality of habitats. However, the results of the Bottleneck analysis indicate a significant impact on genetic structure only in the case of the population from Itupiranga. This may be related to the construction of the hydroelectric dam at Tucuruí, and the creation of one of the world's largest reservoirs, which caused considerable impacts on the local fish fauna of the Tocantins River.

The reduced genetic variability found in the arapaima populations of the Araguaia-Tocantins basin may also be related to the anthropogenic impacts that have altered the natural features of the basin extensively throughout most of its length. The hydrological cycle of this basin is more intense than that of the Amazon, and this cycle has been modified by local farming and ranching activities, and the establishment of the Araguaia-Tocantins waterway (Latrubesse and Stevaux, 2006). These modifications have had a major impact, principally on the region's lacustrine environments, causing a reduction in the size of its lakes, and altering their flood cycle, which affects the lateral migrations typical of this fish (Castello and Stewart, 2010; Castello et al., 2013). These changes may force the arapaima to remain in the same lakes, unable to migrate in search of reproductive partners, eventually creating small, isolated populations susceptible to inbreeding. Genetic drift tends to have a greater impact in smaller populations, which also favor inbreeding, exacerbating the loss of genetic diversity. It seems likely that the combined effects of these processes have reduced the reproductive potential of the local arapaima populations, impacting their effective size. The geomorphology and flood cycle of the Araguaia-Tocantins and Amazon basins are quite distinct, and this appears to be the principal factor determining the variation in the genetic diversity of the stocks analyzed from the Amazon (Farias et al., 2003, 2015; Hamoy et al., 2008; de Alencar Leão, 2009; Araripe et al., 2013) and the Tocantins-Araguaia basin (Vitorino et al., 2015).

However, the possibility that low genetic variability is a natural characteristic of the arapaima populations of the Araguaia-Tocantins basin cannot be ruled out altogether. But whatever the determining factors, this genetic fragility is a cause for concern, given that future environmental impacts (natural or otherwise) may further reduce the diversity of these populations, and threaten their long-term viability.

This fragility is further reinforced by the differences in the genetic variability of each population in the Araguaia-Tocantins basin. Specimens collected at Araguaiana, for example, had the lowest genetic diversity of any population, indicating the smallest effective size of any population, which implies that this population is the most vulnerable to anthropogenic interference from the expansion of the agricultural frontier occurring within the basin (Latrubesse and Stevaux, 2006), as well as being the sector of the basin that has the smallest number of lakes, the preferred habitat of the arapaima (Alves and Carvalho, 2007).

The most likely explanation for the deviations from Hardy-Weinberg equilibrium detected in the arapaima populations of the Araguaia-Tocantins basin is inbreeding, given that significant FIS values (indicating a deficiency of heterozygotes) were found in six of the eight deviations recorded. A number of studies have recorded a deficiency of heterozygotes in fish populations (Castric et al., 2002; Langen et al., 2011; O'Leary et al., 2013; Ferreira et al., 2015), indicating that inbreeding may be relatively common in these vertebrates. In the arapaima this may be related to the relatively sedentary behavior of the species and its parental care, as observed in a number of other species of Neotropical fish (Sofia et al., 2006, 2008; Ferreira et al., 2015).


TABLE 5 | MIGRATE analysis showing the pairwise estimates of gene flow among the Arapaima populations of the Araguaia-Tocantins region.

The same pair of populations may have different numbers of immigrant and migrant individuals. Ln(L) = −2473.954. θ, population genetic parameter defined as 4Neµ (effective population size multiplied by the mutation rate); Ne, estimate of effective population size; Nm, estimate of the number of migrants defined as NeM (effective population size multiplied by the migration rate); Ln(L), natural log of the probability of the estimated parameters. Ara, Araguaiana; LAl, Luís Alves; NSA, Novo Santo Antônio; SFA, São Félix do Araguaia and Itu, Itupiranga.

In addition to these behavioral traits, arapaima is intolerant of lotic environments, so areas of strong rapids represent effective barriers to the dispersal of this species (Castello and Stewart, 2010; Castello et al., 2013). This may lead to the formation of family groups in the different micro-regions of the Araguaia-Tocantins basin. Genetic differences between subpopulations related to geographic distance or the presence of physical barriers, such as waterfalls, have been identified in Amazonian arapaima (Araripe et al., 2013) and in other fish species (Hughes et al., 2015).

The pairwise FST values indicate moderate (0.05–0.15) to extreme (>0.25) genetic differentiation between populations. However, the Bayesian analysis points to the presence of only two genetic stocks in the Araguaia-Tocantins basin, one at Araguaiana, and the other formed by the remaining populations, at Luís Alves, Novo Santo Antônio, São Félix do Araguaia and Itupiranga (**Figure 2**). This arrangement contrasts with that recorded by Vitorino et al. (2015), based on ISSR markers, which indicated that the Araguaiana and Novo Santo Antônio populations shared the same genetic stock, while São Félix do Araguaia and Itupiranga form separate populations (Luís Alves was not included in this analysis).

The structure proposed by the Bayesian analysis is also supported by the estimates of gene flow and the number of migrants between populations, given that the populations that share the same genetic stock had the highest migration rates. It is important to note that the gene flow detected here may reflect ancestral processes, rather than the recent exchange of individuals, because the number of migrants is a somewhat abstract quantity, which cannot be distinguished from the effective population size (Balloux and Lugon-Moulin, 2002).

While the genetic variation found in the present study is not related systematically to geographic distance, the geographically closer populations (Luís Alves, Novo Santo Antônio and São Félix do Araguaia) returned the lowest FST and RST values. Reduced genetic differentiation was expected between the samples from Novo Santo Antônio and São Félix do Araguaia, which are the geographically closest sites (separated by a distance of only 111 km), although the FST values indicate that the populations of São Félix do Araguaia and Luís Alves (265.1 km apart) are the most similar. In addition to the basic differences in comparison with the Amazonian arapaima populations, then, the relationship between population structure and geographic distance is also distinct from that recorded by Araripe et al. (2013).

The present study confirmed the low genetic diversity of the Arapaima gigas populations of the Araguaia-Tocantins basin, which can be linked directly to the environmental fragility of this river system, and reinforces the need for a better understanding of the processes that may further reduce the viability of these populations. The findings of this study also indicated that the genetic diversity of the populations is distributed heterogeneously within the study area, and that the establishment of a single protected area would be insufficient for the preservation of the genetic diversity of the arapaima populations of the Araguaia-Tocantins basin as a whole. In particular, future management measures should consider the population from Araguaiana as an independent unit, distinct from the other Araguaia-Tocantins populations. The Itupiranga region should also be defined as a priority area for conservation, given the high allelic richness found in this population.

### ETHICS STATEMENT

This study was carried out in strict accordance with the recommendations provided in the Guide for the Care and Use of Laboratory Animals. During the development of this work, no animals were sacrificed. Collection was authorized by SEMA (license number 104358/2011/SEMA-MT), Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais Renováveis - IBAMA and Instituto Chico Mendes de Conservação da Biodiversidade – ICMBio (License number 15226-1).

### AUTHOR CONTRIBUTIONS

CV performed the molecular genetic studies, performed the statistical analyses and drafted the manuscript. FN performed some molecular genetic studies and contributed to the correction of the text. JA and PV conceived and coordinated the study, participated in its elaboration and helped to draft the manuscript. All authors read and approved the final manuscript.

### ACKNOWLEDGMENTS

The authors are grateful to FAPEMAT (Fundação de Amparo à Pesquisa do Estado de Mato Grosso – Process: 841147/2009/FAPEMAT/PRONEX), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico/INAU), and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for financial support; Drs. Fabio Porto Foresti and Claudio Oliveira for providing biological samples. We are also very grateful to Mr. Irineu Pirani and Wagner Alves de Santana, for permitting the collection of arapaima tissue samples during rescue operations.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Vitorino, Nogueira, Souza, Araripe and Venere. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Hidden Diversity in the Populations of the Armored Catfish Ancistrus Kner, 1854 (Loricariidae, Hypostominae) from the Paraná River Basin Revealed by Molecular and Cytogenetic Data

Ana C. Prizon<sup>1</sup>, Daniel P. Bruschi<sup>2</sup>\*, Luciana A. Borin-Carvalho<sup>1</sup>, Andréa Cius<sup>1</sup>, Ligia M. Barbosa<sup>1</sup>, Henrique B. Ruiz<sup>3</sup>, Claudio H. Zawadzki<sup>3</sup>, Alberto S. Fenocchio<sup>4</sup> and Ana L. de Brito Portela-Castro<sup>1</sup>

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Sandra Mariotto, Instituto Federal de Educação, Ciência e Tecnologia de Mato Grosso, Brazil Shane Lavery, University of Auckland, New Zealand

#### \*Correspondence:

Daniel P. Bruschi danielpachecobruschi@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 12 August 2017 Accepted: 07 November 2017 Published: 24 November 2017

#### Citation:

Prizon AC, Bruschi DP, Borin-Carvalho LA, Cius A, Barbosa LM, Ruiz HB, Zawadzki CH, Fenocchio AS and Portela-Castro ALdB (2017) Hidden Diversity in the Populations of the Armored Catfish Ancistrus Kner, 1854 (Loricariidae, Hypostominae) from the Paraná River Basin Revealed by Molecular and Cytogenetic Data. Front. Genet. 8:185. doi: 10.3389/fgene.2017.00185

<sup>1</sup> Laboratório de Citogenética de Vertebrados, Departamento de Biotecnologia, Genética e Biologia Celular, Universidade Estadual de Maringá, Maringá, Brazil, <sup>2</sup> Laboratório de Citogenética Animal e Mutagênese Ambiental, Departamento de Genética, Universidade Federal do Paraná, Curitiba, Brazil, <sup>3</sup> Departamento de Biologia/Nupélia, Universidade Estadual de Maringá, Maringá, Brazil, <sup>4</sup> Facultad de Ciencias Exactas, Químicas y Naturales, Universidad Nacional de Misiones, Posadas, Argentina

Only one species of armored catfish, Ancistrus cirrhosus Valenciennes 1836, has been historically described in the basin of the Paraná River, from Misiones (Argentina). However, the ample variation found in the morphology and coloration of the populations sampled in the tributaries of the Brazilian state of Paraná makes it difficult to establish the real taxonomic status and evolutionary history of the Ancistrus specimens, suggesting that A. cirrhosus is not the only species found in this basin. By combining data on mitochondrial DNA (COI gene) and chromosomal markers from different Ancistrus populations, totaling 144 specimens, in the tributaries of the Paraná, and specimens from Misiones (type-locality of A. cirrhosus), we detected five distinct evolutionary lineages. All the specimens were 2n = 50, but had four distinct karyotype formulae. The genetic distances (uncorrected p-distances) between the lineages delimited by the Generalized Mixed Yule Coalescent (GMYC) analysis ranged from 3 to 5%. Clusters of 18S rDNA were observed in a single chromosome pair in seven populations of Ancistrus, but at different positions, in some cases, in synteny with the 5S rDNA sites. Multiple 5S sites were observed in all populations. Overall, the cytogenetic data reinforce the genetic evidence of the diversification of lineages, and indicate the existence of candidate species in the study region. The evidence indicates that at least four candidate species of Ancistrus may coexist in the Paraná basin besides A. cirrhosus. Overall, our results provide a comprehensive scenario for the genetic variation among Ancistrus populations and reinforce the conclusion that the true diversity of the freshwater fish of the Neotropical regions has been underestimated.

Keywords: Ancistrus genus, cytotaxonomy, species delimitation, candidate species, chromosomal evolution

### INTRODUCTION

The inclusion of genetic data in studies of taxonomy and evolution has had a profound influence on our understanding of the unique diversity of species in the Neotropical region (Pereira et al., 2011, 2013). The combination of approaches has altered our perceptions of the region's biological diversity and contributed to an increase in the rate of discovery of new species, especially in cryptic lineages. In this context, the delimitation of species using DNA sequences can be highly efficient, enabling the more systematic documentation of species diversity (e.g., Yang and Rannala, 2010; Ence and Carstens, 2011; Fujisawa and Barraclough, 2013). For example, the Generalized Mixed Yule Coalescent (GMYC) method has been designed to delimit potential lineages using information on a single locus, that is, this method considers that the mutations arising in one species cannot spread readily into another species (Ahearn and Templeton, 1989; Barraclough et al., 2003; De Queiroz, 2007). The premise of the GMYC method is that independent evolution leads to the emergence of distinct genetic clusters, separated by longer internal branches, optimizing the set of nodes that defines the transition between inter- and intra-specific processes (Barraclough et al., 2003).

Fish are an excellent candidate group for the application of integrative taxonomical methods, which combine different lines of evidence (DNA sequences, chromosomal data, and morphological features) to define taxonomic status. Most surveys of Neotropical freshwater fish have focused on major river basins, even though a large proportion of the total diversity comprises small species, found in minor rivers and streams. These species are often highly endemic and occupy a wide variety of microhabitats, providing enormous potential for diversification (Viana et al., 2013). Castro (1999) referred to the identification of the cryptic diversity of small freshwater fish as a major challenge for Neotropical Ichthyology. The reliable definition of species and their ranges is also essential to conservation strategies (Angulo and Icochea, 2010). In this context, the rivers and streams of the Paraná River basin provide an interesting study area for the evaluation of the degree to which the taxonomic diversity of these environments has been underestimated.

The Ancistrini is composed of 29 genera, with a total of 217 recognized species (Fisch-Muller, 2003). With 69 species, Ancistrus Kner, 1854 is one of the most diverse Ancistrini groups, the second richest in species of the Loricariidae (Ferraris, 2007; Bifi et al., 2009; Froese and Pauly, 2017). Cytogenetic data on Ancistrus are still scarce, and restricted to species found in the basins of the Paraguay River in Mato Grosso, and the Amazon, in Manaus (de Oliveira et al., 2009; Mariotto et al., 2011, 2013; Favarato et al., 2016; Prizon et al., 2016). While the cytogenetics of Ancistrus species from other river basins are still unknown, considerable variability has been found in this genus, with diploid (2n) numbers of 34, 38, 40, 42, 44, 48, 50, and 54 chromosomes. Surveys of the upper Paraná River have revealed the presence of a single species of Ancistrus in this basin, identified as Ancistrus cirrhosus (Langeani et al., 2007). However, the considerable variation in the morphology and coloration observed in the specimens collected in the tributaries of the Paraná River hampers the reliable identification of the Ancistrus species found in this region. Thus, given the karyotypic diversity of Ancistrus and the assumption that cryptic diversity exists in the rivers and streams of the Upper Paraná River basin, we combine chromosomal data and DNA sequences to evaluate the taxonomic status of these populations.

#### MATERIALS AND METHODS

#### Biological Samples

A total of 144 Ancistrus specimens were collected in ten rivers of the Paraná River basin (**Table 1**, **Figure 1**). Specimen collection was authorized by the Brazilian Environment Ministry through its Biodiversity Information and Authorization System (SISBIO), under license number 36575-1. The protocols used in this study were submitted to the Ethics Committee on the use of animals in research (CEUA) of the Universidade Estadual de Maringá (UEM) and approved under case number 013/2009. Voucher specimens were deposited in the ichthyological collection of the Limnology, Ichthyology and Aquaculture Research Center (Nupélia) at Universidade Estadual de Maringá, Paraná, Brazil. The catalog numbers are provided in **Table 1**.


TABLE 1 | Details of the Ancistrus populations and specimens sampled in the study area of the upper Paraná basin.

NUP, catalog number of the voucher specimen in the Nupélia collection; ♂, male; ♀, female.

# Isolation, Amplification, and Sequencing of the DNA

Genomic DNA was extracted from the liver, muscle tissue, or a cell suspension of a subset of the sample (**Table 1**) using the TNES method, as applied by Bruschi et al. (2012). A fragment of the mitochondrial cytochrome C oxidase subunit I (COI) gene was amplified by polymerase chain reaction (PCR) using the primers FishF1 (5′-TCAACCAACCACAAAGACATTGGCAC-3′) and FishR1 (5′-TAGACTTCTGGGTGGCCAAAGAATCA-3′) (Ward et al., 2005). The solution for the amplification reaction included 20 ng/µl of the DNA template, 7 pmol of the forward and reverse primers, 10 mM of dNTPs, 1 U Taq DNA Polymerase, 1.5 mM MgCl<sub>2</sub>, and 1x PCR buffer (200 mM Tris, pH 8.4, 500 mM KCl). The amplification protocol consisted of an initial step at 94 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, 60 °C for 1 min, and 72 °C for 30 s, and a final step at 72 °C for 10 min. The amplified products were purified using Exonuclease I (10 units) and SAP (1 unit), incubated for 45 min at 37 °C, followed by denaturation at 85 °C for 10 min (Applied Biosystems, Santa Clara, CA, USA), as recommended by the manufacturer. The samples were then used directly as templates for sequencing in an automatic ABI/Prism DNA sequencer (Applied Biosystems, Foster City, CA, USA) with the BigDye Terminator kit (Applied Biosystems, Foster City, CA, USA), as recommended by the manufacturer. The DNA samples were sequenced bidirectionally and were edited in BioEdit version 7.2.5 (http://www.mbio.ncsu.edu/bioedit/page2.html) (Hall, 1999).

# Phylogenetic Inferences and the Delimitation of Species

The phylogenetic relationships among the populations were inferred from a matrix of 459-bp sequences of the COI gene. The dataset was complemented with 35 sequences of Ancistrus and one sequence of the sister group Lasiancistrus available in GenBank (Supplementary File 1). The outgroup was Pseudolithoxus sp., which was chosen based on the arrangement reported by Lujan et al. (2015). The sequences were aligned using Clustal W in BioEdit, version 7.2.5.0 (Thompson et al., 1994). The initial alignments were checked visually and adjusted wherever necessary. The dataset was used for phylogenetic reconstruction by Bayesian inference (BI) and the Maximum Parsimony (MP) approach.

The BI method was applied to the dataset, which was divided into three partitions according to codon position for mit-COI. The best model of nucleotide evolution for each partition was determined using the Akaike Information Criterion (AIC) with the software jModelTest v2.1.6 (Guindon and Gascuel, 2003; Darriba et al., 2012). The BI was performed with the software MrBayes 3.2.6 (Ronquist and Huelsenbeck, 2003), as available in the CIPRES Science Gateway 3.1 (Miller et al., 2010). BI was implemented using two independent runs, each starting from random trees, with four simultaneous independent chains, run for 10,000,000 generations and sampling one tree every 1,000 generations. Of all trees sampled, 20% were discarded as burn-in. Convergence was checked by the average standard deviation of split frequencies (<0.01) and with Tracer v.1.6 (Rambaut et al., 2014). The remaining trees were used to reconstruct a 50% majority-rule consensus tree and to estimate Bayesian posterior probabilities (BPP) of the branches. A node was considered to be strongly supported if it had a BPP ≥ 0.95, and moderately supported if it had a BPP ≥ 0.9.
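The model choice by AIC mentioned above can be illustrated with a minimal sketch. It is not the jModelTest implementation; the candidate models, log-likelihoods, and parameter counts below are hypothetical values chosen only to show how the criterion ranks models.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the name of the lowest-AIC model and the AIC score of every model.

    `candidates` maps a model name to (log-likelihood, number of free parameters).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one codon partition (illustrative numbers only).
candidates = {
    "JC": (-1250.0, 0),
    "HKY": (-1210.0, 4),
    "GTR+G": (-1200.0, 9),
}

best, scores = best_by_aic(candidates)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:6s} AIC = {score:.1f}")
print("Selected model:", best)
```

In the study this comparison was made separately for each of the three codon-position partitions; the sketch shows the calculation for a single partition.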

The MP analysis was implemented in TNT v1.1 (Goloboff et al., 2003) using a heuristic search method with tree bisection-reconnection (TBR) swapping and 100 random addition replicates. The bootstrap values of the branches inferred in this analysis were calculated with 1,000 non-parametric pseudoreplicates.
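The non-parametric bootstrap used to assess branch support resamples alignment columns with replacement. The sketch below shows only that resampling step on a toy alignment; the sequences and names are hypothetical, and the tree search that TNT performs on each pseudoreplicate is not reproduced.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by sampling columns with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    The same sampled columns are used for every sequence.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences, 10 sites each).
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGAAC",
    "seqC": "ACCTACGAAT",
}

rng = random.Random(42)  # fixed seed so the example is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "pseudoreplicates of", len(replicates[0]["seqA"]), "sites")
```

Each pseudoreplicate would then be analyzed with the same search settings as the original matrix, and the support for a branch is the proportion of pseudoreplicate trees that contain it.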

TABLE 2 | Uncorrected pairwise distances between the mitochondrial COI sequences of the Ancistrus populations from the Paraná basin.

\*Codes: L1, Mourão River; L2, 19 Stream; L3, Keller River; L4, Patos River; L5, São João River; L6, São Francisco Verdadeiro River; L7, Arroyo Iguaçu; L8, Ancistrus cirrhosus; L9, Ocoí River; L10, São Francisco Falso River. JN988666, GU701865, and GU701865: GenBank accession numbers.

The genetic distances among and within species were calculated using the Kimura 2-parameter (K2P) model (Kimura, 1980) and the p-distance, as implemented in MEGA v 6.0. A neighbor-joining (NJ) tree of K2P distances was created with the software MEGA v 7.0 (Kumar et al., 2016) to provide a graphic representation of the pattern of divergence between species. We also applied the General Mixed Yule-coalescent (GMYC) method to delineate species using single-locus sequence data. The GMYC requires a fully resolved and ultrametric tree as input and combines a coalescence model of intraspecific branching with a Yule model of interspecific branching to estimate species boundaries and provide statistical confidence intervals to evaluate the clusters recovered. Ultrametric trees were constructed by BI in BEAST2 2.4.0 (Drummond et al., 2006; Drummond and Rambaut, 2007). We conducted three independent runs using different priors, that is, the Yule, relaxed clock, and constant coalescent models. An ultrametric gene tree was obtained for each prior. An XML file was produced using the BEAUti2 v2.4.5 interface with the following settings: GTR+G+I substitution model, previously inferred by MrMODELTEST (Nylander et al., 2004), empirical base frequencies, four gamma categories, and all codon positions partitioned with unlinked base frequencies and substitution rates. The MCMC chain was 10 million generations long and was logged every 1,000 generations. The effective sample sizes (ESS) and trace files of the runs were evaluated in Tracer v1.6. The resulting tree logs were summarized in TreeAnnotator 2.4.4, with a 25% burn-in, to obtain maximum clade credibility trees with a 0.5 posterior probability limit and the node heights of the target tree. The splits package (https://r-forge.r-project.org/R/?group\_id=333) in R was used for the GMYC calculations, using the single-threshold strategy and default scaling parameters.
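The two distance measures named above can be sketched for a single pair of aligned sequences as follows. This is a minimal illustration rather than the MEGA implementation: sites with gaps or ambiguous bases are simply skipped (an assumption about missing-data handling), and the toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Count transitions, transversions, and compared sites for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumed handling)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions, transversions, compared


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    ts, tv, n = count_differences(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance: d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]."""
    ts, tv, n = count_differences(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy pair differing by one transition (A/G) and one transversion (C/A).
s1 = "ACGTACGTACGTACGTACGT"
s2 = "GAGTACGTACGTACGTACGT"
print(f"p-distance = {p_distance(s1, s2):.4f}")
print(f"K2P        = {k2p_distance(s1, s2):.4f}")
```

Because K2P corrects for multiple substitutions at the same site, it is always at least as large as the uncorrected p-distance; at the low divergences reported in this study the two measures are nearly identical.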

### Cytogenetic Analysis

All specimens were anesthetized and euthanized by an overdose of clove oil (Griffiths, 2000). Mitotic chromosomes were obtained from kidney cells according to Bertollo et al. (1978). The AgNORs were revealed by the silver nitrate impregnation technique (Howell and Black, 1980). The regions of heterochromatin were determined by the C-banding technique (Sumner, 1972) and stained with propidium iodide according to the method of Lui et al. (2012). Physical mapping of the 5S rDNA and 18S rDNA sequences was carried out by fluorescence in situ hybridization (FISH) according to Pinkel et al. (1986), with probes obtained from Leporinus elongatus Valenciennes, 1850 (Martins and Galetti, 1999) and Prochilodus argenteus Spix et Agassiz, 1829 (Hatanaka and Galetti, 2004). We also isolated and cloned the 5S rDNA gene of the Ancistrus sample from Keller River to obtain a more specific DNA probe for this gene. The genomic DNA extracted for the molecular analyses was used as the template for the amplification reaction, using the primers 5S-A (5′-TACGCCCGATCTCGTCCGATC-3′) and 5S-B (5′-CAGGCTGGTATGGCCGTAAGC-3′) (Pendas et al., 1994). The products of this amplification were isolated in a 1.5% agarose gel, then purified with an EasyPure® Quick Gel Extraction kit. These sequences were inserted into a linearized cloning vector (pJET1.2/blunt) with the CloneJET PCR Cloning kit (Thermo Scientific) and cloned in Escherichia coli. Ten clones with the insert containing the 5S rDNA sequence were selected for sequencing. The 5S rDNA nucleotide sequences were edited using BioEdit software and compared with sequences from the GenBank database (www.ncbi.nlm.nih.gov).

Hybridization was conducted under high stringency conditions (77%), and the probes were labeled by nick translation with digoxigenin-11-dUTP (5S rDNA) and biotin-16-dUTP (18S rDNA). The 5S rDNA probes obtained from the recombinant plasmids were labeled by PCR with digoxigenin-11-dUTP. The labeling reaction included 20 ng/µl of the DNA template, 7 pmol of the forward and reverse primers, 4 mM of dNTPs, 1 mM of dig-11-dUTP, 1 U Taq DNA Polymerase, 1.5 mM MgCl2, and 1x PCR buffer (200 mM Tris, pH 8.4, 500 mM KCl). The hybridization signals were detected using anti-digoxigenin-rhodamine for the 5S rDNA probe and avidin-FITC (fluorescein isothiocyanate) for the 18S rDNA probe. The chromosomes were counterstained with DAPI. Double staining was carried out with chromomycin A3 (CMA3) and DAPI, according to Schweizer (1976). The metaphases were photographed using an epifluorescence microscope and adjusted for best contrast and brightness using the Adobe Photoshop CS6 software.

#### RESULTS

#### Phylogenetic Inferences and Delimitation of Species

The phylogenetic reconstructions based on the MP and BI approaches produced the same basic topology (**Figures 2**, **3**). The topologies inferred from the Bayesian analysis performed with MrBayes 3.2.6 and from the neighbor-joining analysis are presented in the Supplementary Material (**Figures S1**, **S2**, respectively). These analyses recovered five clades from the Ancistrus populations of the Paraná basin. The first clade (clade I) comprises five populations: Ancistrus sp. "Mourão River" (L1) + Ancistrus sp. "19 Stream" (L2) + Ancistrus sp. "Keller River" (L3) + Ancistrus sp. "Patos River" (L4) + Ancistrus sp. "São João River" (L5). The genetic distance analysis returned low uncorrected p-distances among these populations, ranging from 0.0 to 1.8% (**Table 2**). The second clade (clade II) included the Ancistrus populations from São Francisco Verdadeiro River (L6) and Arroyo Iguaçu (L7), separated by a genetic distance of 2.0% (**Table 2**). Two sequences deposited in GenBank as A. "cirrhosus" and included in our dataset were recovered in clade III. Clade IV included only the population from Arroyo San Juan (L8), Argentina, and was considered to represent the nominal A. cirrhosus due to its proximity to the type-locality of this species. Clade V consisted of the Ancistrus sp. "Ocoí River" (L9) + Ancistrus sp. "São Francisco Falso River" (L10) populations. The uncorrected p-distance between these populations (1.2%) was also relatively low (**Table 2**). By contrast, the genetic distances (uncorrected p-distances) between clades were at least 3% in all cases: 3% between clades I–II, I–III, II–III, II–IV, and III–IV; 4% between clades I–IV, II–V, III–V, and IV–V; and 5% between clades I–V.

TABLE 3 | Cytogenetic parameters of the Ancistrus populations sampled in the different rivers of the Paraná basin.

\*Synteny between the 18S and 5S rDNA sites; \*\*in only one homolog; m, metacentric; sm, submetacentric; st, subtelocentric; a, acrocentric; ♂, male; ♀, female.

The ultrametric trees obtained using the Yule and Constant Coalescent priors were congruent in the GMYC analysis, recognizing 13 separate entities and identifying nine clusters in our dataset (including outgroup sequences). Considering only the Ancistrus populations, the focus of the present study, the sequences were grouped into five well-supported clusters (**Figure 3**), which corresponded exactly to the major clades recovered in our phylogenetic inferences (BI and MP). With the Yule and Constant Coalescent priors, the likelihood of the null model (i.e., that all the sequences belong to the same species) was 412.5381 and 421.5504, respectively, compared with 421.9409 and 430.6968, respectively, for the GMYC model (i.e., the existence of distinct species). The difference is highly significant, indicating the presence of more than one species in our sample. The analysis based on the relaxed clock prior presented low scores in Tracer and did not produce a reliable interpretation of the phylogenetic relationships among the populations, so it was not considered in the species delimitation tests.
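The comparison between the null and GMYC models reported above is a likelihood ratio test. The sketch below recomputes the test statistic from the reported values under two assumptions that the text does not state explicitly: that the reported likelihoods are log-likelihoods, and that the statistic is referred to a chi-square distribution with 2 degrees of freedom. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (LR statistic, p-value) assuming a chi-square with 2 d.f.

    LR = 2 * (lnL_alt - lnL_null); for 2 d.f. the upper-tail
    probability of a chi-square variate x is exp(-x / 2).
    """
    lr = 2.0 * (lnl_alt - lnl_null)
    p_value = math.exp(-lr / 2.0)
    return lr, p_value


# (null model, GMYC model) values as reported in the text, by tree prior.
reported = {
    "Yule": (412.5381, 421.9409),
    "Constant Coalescent": (421.5504, 430.6968),
}

results = {prior: likelihood_ratio_test(*values) for prior, values in reported.items()}
for prior, (lr, p_value) in results.items():
    print(f"{prior}: LR = {lr:.4f}, p = {p_value:.2e}")
```

Under these assumptions the statistics are about 18.8 (Yule) and 18.3 (Constant Coalescent), both with p-values well below 0.001, which is consistent with the statement that the difference is highly significant.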

#### Cytogenetic Analysis

Cytogenetic data were obtained for the populations from the Mourão, Keller, São Francisco Verdadeiro, São Francisco Falso, and Ocoí Rivers and 19 Stream (Brazil), and from Arroyo San Juan (Argentina). All the specimens analyzed presented a diploid number of 2n = 50 chromosomes, although four distinct karyotype formulae were detected (**Table 3**, **Figure 4**), together with variation in the location of the 18S and 5S rDNA sites (**Figure 5**). Interestingly, these formulae corresponded strongly with the major clades recovered in our genetic analysis, with the exception of clade III, for which no karyotype is available. A chromosome heteromorphism found in all the males of the 19 Stream (L2) and Keller River (L3) populations was consistent with an XX/XY system. The X chromosome is a large metacentric (second pair), while the Y chromosome is a small subtelocentric (**Figure 4**).

The analysis of the heterochromatin revealed C-positive blocks at the pericentromeric and subterminal positions in a number of different chromosomal pairs of the populations allocated to clade I, with a conspicuous block coinciding with the NOR site of pair 12 in all karyotypes (**Figures 6A–C**). The X chromosome (pair 2, metacentric) from 19 Stream had a C-positive block in the pericentromeric region, while we detected heterochromatin blocks in the pericentromeric, interstitial, and subterminal regions of the X chromosome from Keller River. The Y chromosome of the Keller River population had a weak heterochromatin block in the subterminal region of the long arm, which was not found in the Y chromosome from 19 Stream (**Figures 6B,C**). In clades II, IV, and V, considerable variation was found in the amount and distribution of constitutive heterochromatin, which was concentrated primarily in the pericentromeric and subterminal regions of the chromosomes (**Figures 6D–G**). In the population from the São Francisco Falso River (clade V), in addition, we detected an interstitial heterochromatin block in pair 17, as well as a much larger amount of heterochromatin distributed throughout the chromosomes in comparison with the other populations (**Figure 6G**). Conspicuous blocks of heterochromatin were also found in the nucleolar pairs of all the populations analyzed in clades II and IV (pair 18) (**Figures 6D,E**).

The base-specific fluorochrome staining enhanced the distinctive composition of some heterochromatin blocks. In particular, staining with Chromomycin A3 revealed a richness of G and C in the subterminal and pericentromeric heterochromatin blocks of all the populations analyzed, and provided additional features for the comparative analysis. As expected, conspicuous C-positive areas were detected in the NOR-bearing chromosomes (pairs 12, 17, and 18) (**Figure S3**).

# DISCUSSION

### Molecular Phylogenetic Inferences Reveal a Number of Distinct Lineages in the Paraná Basin

The phylogenetic reconstructions and cytogenetic analyses presented in this study both detected five major clades among the Ancistrus populations surveyed, pointing to the existence of at least five previously undetected lineages within the Paraná basin in Brazil. Historically, all Ancistrus from the Paraná River basin were assigned to a single species, A. cirrhosus, which was described by Valenciennes (1836) from specimens collected in the Provinces of Misiones and Buenos Aires (Argentina), although the diagnostic traits are weakly defined (Langeani et al., 2007). Despite their morphological variation and wide geographic distribution, the Ancistrus populations of the Paraná basin have invariably been identified as A. cirrhosus. This has left ichthyologists, who have relied only on morphological and meristic characters, uncertain about the taxonomic status of many specimens. The data obtained in the present study recovered at least five independent lineages (clades I–V) in the Paraná basin. Given the proximity of the Arroyo San Juan to the type-locality of A. cirrhosus, this population (clade IV) was identified as the nominal species A. cirrhosus. In this context, the other four clades (I, II, III, and V) can be categorized as "candidate species" in the terminology of Vieites et al. (2009). This interpretation is also supported by the results of the GMYC method and by the genetic distances detected among populations, considering a 2% threshold for interspecific differentiation.

FIGURE 4 | Karyotypes of the Ancistrus populations sampled in different rivers of the Paraná basin, stained with Giemsa. The configuration of the silver nitrate-stained nucleolar organizing regions (Ag-NORs) is shown in the box. Each color of the side bars represents one of the evolutionary lineages recovered in the analysis, according to Figure 2. (A) L1: Mourão River; (B) L2: 19 Stream; (C) L3: Keller River; (D) L6: São Francisco Verdadeiro River; (E) L8: Ancistrus cirrhosus; (F) L9: Ocoí River; (G) L10: São Francisco Falso River. Bar = 10 µm.

FIGURE 5 | Chromosomes of the Ancistrus populations after dual-color FISH showing the 5S rDNA (red) and 18S rDNA (green) sites. Each color of the side bars represents one of the evolutionary lineages recovered in the analysis, according to Figure 2. (A) L1: Mourão River; (B) L2: 19 Stream; (C) L3: Keller River; (D) L6: São Francisco Verdadeiro River; (E) L8: Ancistrus cirrhosus; (F) L9: Ocoí River; (G) L10: São Francisco Falso River. Bar = 10 µm.

FIGURE 6 | Karyotypes of the Ancistrus populations showing the distribution of the heterochromatin after C-banding. Each color of the side bars represents one of the evolutionary lineages recovered in the analysis, according to Figure 2. (A) L1, Mourão River; (B) L2, 19 Stream; (C) L3, Keller River; (D) L6, São Francisco Verdadeiro River; (E) L8, Ancistrus cirrhosus; (F) L9, Ocoí River; (G) L10, São Francisco Falso River. Bar = 10 µm.

The GMYC method has become one of the most popular tools for the delimitation of species based on single-locus data, and has been applied to the analysis of a number of poorly known groups of organisms (Barraclough et al., 2009; Monaghan et al., 2009; Marshall et al., 2011; Vuataz et al., 2011; Roxo et al., 2015). The GMYC method uses an ultrametric tree derived from the sequences to identify shifts in the branching rate from the Yule model (species) to the coalescent (population) process, with the algorithm computing the probability of splits between lineages in relation to speciation rates, thus identifying a cutoff value at which species and populations split from one another (Powell, 2012). This approach was highly effective in the present study, allowing us to delimit five groups, four of which corresponded exactly with the chromosomal data (unfortunately, no chromosomal data are available for clade III), reinforcing the hypothesis that four "candidate species" exist in the Paraná River basin, in addition to the previously reported A. cirrhosus.
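Because the GMYC method depends on an ultrametric input tree, it can be useful to verify that property before running the analysis. The sketch below is a minimal illustration, not part of the splits package: the nested-tuple tree representation, the function names, and the toy trees are all hypothetical. A tree is ultrametric when every tip lies at the same distance from the root.

```python
def tip_depths(node, depth_so_far=0.0):
    """Return the root-to-tip distance of every tip below `node`.

    A node is a pair (content, branch_length): `content` is a tip label
    (str) for a leaf, or a list of child nodes for an internal node.
    """
    content, length = node
    depth = depth_so_far + length
    if isinstance(content, str):
        return {content: depth}
    depths = {}
    for child in content:
        depths.update(tip_depths(child, depth))
    return depths


def is_ultrametric(tree, tol=1e-9):
    """True if all tips are equidistant from the root, within `tol`."""
    depths = tip_depths(tree).values()
    return max(depths) - min(depths) <= tol


# Toy trees (hypothetical): ((A:1.0, B:1.0):0.5, C:1.5) and a variant with C:2.0.
ultrametric_tree = ([([("A", 1.0), ("B", 1.0)], 0.5), ("C", 1.5)], 0.0)
non_ultrametric_tree = ([([("A", 1.0), ("B", 1.0)], 0.5), ("C", 2.0)], 0.0)

print(is_ultrametric(ultrametric_tree))
print(is_ultrametric(non_ultrametric_tree))
```

Trees estimated under a clock model, such as those produced in BEAST, are ultrametric by construction, whereas trees from unconstrained analyses generally are not and would need to be converted before a GMYC analysis.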

Armbruster (2004) found a sister group relationship between the Ancistrini and Pterygoplichthini tribes, which both belong to the subfamily Hypostominae. Ancistrus is the most species-rich genus of the tribe Ancistrini, although phylogenetic analyses of this genus are still scarce. Lujan et al. (2015) adopted a broad approach to the phylogenetic relationships of the Loricariidae, a Neotropical catfish family, but included few Ancistrus species, which reinforces the need for further research. Studies in systematics based on osteological (Schaefer, 1987) and molecular (Montoya-Burgos et al., 1998) data show that Ancistrus constitutes a monophyletic group of species. The lack of any robust phylogenetic tree for Ancistrus still limits our understanding of its intrageneric relationships, and more detailed analyses, with a more representative dataset, are needed to provide a more comprehensive understanding of the phylogenetic relationships of this genus.

#### The Chromosomal Data Reinforce the Hypothesis of Complete Lineage Divergence

Cytogenetic studies, together with molecular and biochemical analyses, may be useful for the identification of cryptic species (Nakayama et al., 2001; Milhomen et al., 2007). In the present study, the inclusion of a cytogenetic approach was crucial to the recognition of the "candidate species," given that the distinct chromosomal formulae found in the four lineages (clades I, II, IV, and V) emphasize their reciprocal monophyly and the lack of gene flow between them. The results obtained also complement the cytogenetic data available for Ancistrus species, since these studies are currently restricted to the taxa found in the Paraguay (Mato Grosso) and Amazon (Manaus) basins (de Oliveira et al., 2007, 2008, 2009; Mariotto et al., 2011, 2013; Favarato et al., 2016; Prizon et al., 2016).

In the present study, a diploid number of 2n = 50 chromosomes was recorded in all the samples from the Paraná River basin. This number is within the range recorded for the genus, which varies from 2n = 34 in Ancistrus cuiabae to 2n = 54 in Ancistrus claro (Mariotto et al., 2009, 2013; Favarato et al., 2016; Prizon et al., 2016). Despite the homogeneity of the diploid number, the karyotype formula varied considerably among populations. The number of acrocentric chromosomes, for example, varied from three to four pairs in the populations of clades I, II, and V, to seven pairs in clade IV (Arroyo San Juan). This points to the occurrence of chromosomal rearrangements, such as translocations and pericentric inversions, which did not affect the diploid number. Unfortunately, while we were able to identify these features, we were unable to trace the pathways of the transformations due to the lack of resolution in the internal topology of the Ancistrus lineages recognized here. Further phylogenetic studies, based on a larger set of characters and a multi-locus dataset, may provide more definitive insights into the evolution of this group. Among all the Ancistrini species studied so far, the karyotypic evolution of Ancistrus is invariably associated with chromosomal rearrangements, which typically involve variation in the diploid number, given that this number ranges from 34 to 54 in this genus. As Artoni and Bertollo (2001) considered 2n = 54 to be the ancestral diploid number of the Loricariidae, karyotypic evolution in Ancistrus appears to have been associated with a reduction in the chromosome number. In fact, de Oliveira et al. (2009) suggested that the karyotypic evolution of this genus predominantly involved centric fusions.

In Ancistrus, the occurrence of heteromorphic sex chromosomes has been well documented in some species, including simple (Mariotto et al., 2004; Alves et al., 2006; Mariotto and Miyazawa, 2006) and multiple systems (de Oliveira et al., 2007, 2008), which have also contributed to karyotype evolution in the genus (Favarato et al., 2016). One peculiar feature observed in clade I was the heteromorphic sex chromosomes found in the populations of 19 Stream (L2) and the Keller River (L3), which are consistent with an XX/XY system. Surprisingly, however, this feature was not observed in the specimens from the Mourão River (L1), which were included in clade I and separated by low genetic distances from the populations of 19 Stream and the Keller River. The karyotypic formula of the Mourão River population differs from those of these two populations only by the absence of the pair of sex chromosomes, and the GMYC analysis also identified these populations as a single taxonomic unit. A similar result was obtained by Henning et al. (2011), who combined cytogenetic and molecular data for different species of Eigenmannia, including two populations of E. virescens (Mogi-Guaçu and Tietê rivers) whose karyotypes, with 2n = 38, differ by the presence of an XX/XY sex chromosome pair (Tietê river population). The acrocentric X chromosome possesses a heterochromatinized distal region (Almeida-Toledo et al., 2001) and, according to Henning et al. (2011), the two populations (Mogi-Guaçu and Tietê rivers) were considered sister species. Furthermore, these authors concluded that it seems likely that suppression of recombination in the homologous pair of acrocentric chromosomes and accumulation of heterochromatin on the X chromosome occurred after a recent geographical separation.
In this context, we hypothesize that the heteromorphic sex chromosomes found in the Ancistrus populations of 19 Stream and the Keller River represent a recent event, which may have occurred after the geographical isolation of these populations from that of the Mourão River, together with the behavioral characteristics of these fish, which occupy specific microhabitats, form territories, and do not normally migrate (Power, 1984, 1990; Buck and Sazima, 1995). All these characteristics favor the fixation of chromosomal rearrangements and could contribute to allopatric speciation on a micro scale (de Oliveira et al., 2009).

The hypothesis of the recent differentiation of the sex chromosome pair in the Ancistrus populations of clade I was also supported by the C-banding data. Heterochromatin is widely used in the identification of sex chromosomes, and the addition or deletion of heterochromatin or the occurrence of a pericentric inversion involving one of the chromosomes have been postulated as important mechanisms in the origin of simple sex chromosome systems in Neotropical fish (Almeida-Toledo et al., 2000). If we compare the Y chromosomes of males from 19 Stream (L2) and Keller River (L3), it is possible to detect a discrete heterochromatin block in the subterminal region of the long arm in the Keller River males, which was not detected in the Y chromosomes from 19 Stream. The presence of the heteromorphic pair in the Ancistrus populations of 19 Stream and Keller River suggests that chromosomal rearrangements (inversions), the loss of chromosomal material and, in the specific case of the Keller River, the presence of constitutive heterochromatin in the heteromorphic pair, may all be evidence of the recent origin of the Y chromosome, derived from a large metacentric similar to the X chromosome. Pericentric inversions, followed by a loss of chromosomal material, have been suggested as a mechanism to explain the origin of the ZZ/ZW chromosome system in Ancistrus sp. Piagaçu (de Oliveira et al., 2007) and the XX/XY chromosome systems in Ancistrus sp. Purus and Ancistrus sp. Macoari (de Oliveira et al., 2009), given the absence of heterochromatin blocks in either the X or Y chromosomes.

The NOR mapping provided an excellent marker in the present study, being found in a single chromosome pair in each clade, a condition shared with most other Ancistrus species (Medeiros et al., 2016). The variation among clades in the NOR-bearing chromosome may be the result of chromosomal rearrangements occurring during chromosomal evolution. We also recorded synteny between the 18S and 5S rDNA sites in most populations of clades I, II, and V, except for the population from Mourão River (clade I). This synteny of the rDNA may represent the basal condition for the genus (Mariotto et al., 2011), as it is found in A. claro (2n = 54), although synteny of the ribosomal sites was not found in the population from Arroyo San Juan (clade IV). The position and distribution of the 5S rDNA sites varied considerably among the four clades, as they do in other Ancistrus species, occupying multiple sites in pericentromeric, interstitial, or terminal positions. This variation is considered to be an important reflection of the enormous karyotypic diversity found in Ancistrus, which is seen as evidence of the apomorphic condition of this group (Medeiros et al., 2016). This variability, together with the disjunction of the ribosomal sites caused by rearrangements or mobile genetic elements, appears to be a common condition among Neotropical fish species (de Oliveira et al., 2009). We observed heterochromatin in association with the ribosomal DNA sites (18S and 5S), a recurrent characteristic in the karyotypes of Neotropical fishes (Vicari et al., 2003). The presence of heterochromatin, which contains large quantities of transposable elements (Dimitri et al., 2009), may facilitate transposition events, moving ribosomal genes around the genome (Moreira-Filho et al., 1984; Vicari et al., 2008; Gross et al., 2009, 2010). This may be one of the factors responsible for the presence of the multiple 5S ribosomal DNA sites found in the present study.

### CONCLUSION

The cytogenetic data available for the genus Ancistrus indicate a highly heterogeneous pattern of chromosome evolution, marked by Robertsonian and non-Robertsonian rearrangements. While we do not have an exact understanding of the mechanisms that determine these rearrangements in natural populations, their fixation may either initiate or contribute to the divergence process, with specific implications for the utility of chromosomal characters for phylogenetic inference (Sites and Kent, 1994). As in other fish groups, the sex chromosomes present in some Ancistrus species may have contributed to high rates of evolution. The inferences obtained in the present study from a combined approach of molecular and cytogenetic analyses further corroborate the taxonomic complexity of this genus. This approach was especially important due to the lack of diagnostic features in the morphology of these fishes. The hidden diversity of the study populations was nevertheless decoded successfully by the combined approach, which allowed us to differentiate five distinct lineages of Ancistrus, reinforcing the hypothesis of the presence of at least four candidate species in the upper Paraná River basin, in addition to the previously described A. cirrhosus. Finally, our findings reinforce the observation that the true diversity of the freshwater fish of the Neotropical region has been underestimated, and improve our understanding of regional diversity within the genus Ancistrus.

# AUTHOR CONTRIBUTIONS

AP: provided chromosomal and molecular data, and drafted the manuscript; DB: designed and coordinated the study of molecular data and helped draft the manuscript; HR and CZ: collected specimens from Paraná state and helped to identify the specimens; AF: collected and processed material of specimens from Arroyo San Juan, Argentina; LB-C, AC, and LB: assisted in the execution and analysis of chromosomal banding; AdBP-C: designed and coordinated the study of cytogenetic data and helped draft the manuscript. All authors have read and approved the final manuscript.

### FUNDING

This work was funded by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brazil) and Fundação Araucária.

# ACKNOWLEDGMENTS

We thank the Brazilian agency CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brazil) for financial support, Maringá State University (UEM) and Limnology, Ichthyology and Aquaculture Research Center (Nupélia) for the logistic support, the collection, and identification of species. We also thank Prof. Dr. Mateus Arduvino Reck and Prof. Dr. Alessandra Valéria de Oliveira for their assistance and contribution in the programs of phylogeny.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2017.00185/full#supplementary-material

Figure S1 | Topology inferred from the Bayesian analysis performed with the software MrBayes 3.2.6. Posterior probabilities are shown at each node. Scale bar represents the number of substitutions per site. The terminal species with alphanumerical identifiers were obtained from GenBank (Supplementary File 1). Codes: L1, Mourão River; L2, 19 Stream; L3, Keller River; L4, Patos River; L5, São João River; L6, São Francisco Verdadeiro River; L7, Arroyo Iguaçu; L8, Ancistrus cirrhosus; L9, Ocoí River; L10, São Francisco Falso River.

Figure S2 | Chromosomes of the Ancistrus populations after CMA3 staining. Each color of the side bars represents one of the evolutionary lineages recovered in the analysis, according to Figure 2. (a) L1, Mourão River; (b) L2, 19 Stream; (c) L3, Keller River; (d) L6, São Francisco Verdadeiro River; (e) L8, Ancistrus cirrhosus; (f) L9, Ocoí River; (g) L10, São Francisco Falso River. Bar = 10 µm.

Figure S3 | NJ dendrogram of the Ancistrus specimens. Node values (bootstrap test, 1,000 pseudoreplicates) are shown next to the branches. The terminal species with alphanumerical identifiers were obtained from GenBank (Supplementary File 1). Codes: L1, Mourão River; L2, 19 Stream; L3, Keller River; L4, Patos River; L5, São João River; L6, São Francisco Verdadeiro River; L7, Arroyo Iguaçu; L8, Ancistrus cirrhosus; L9, Ocoí River; L10, São Francisco Falso River.

# REFERENCES


Kner, 1854 (Loricariidae, Ancistrini) from three hydrographic basins of Mato Grosso, Brazil. Comp. Cytogenet. 5, 289–300. doi: 10.3897/compcytogen.v5i4.1757


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Prizon, Bruschi, Borin-Carvalho, Cius, Barbosa, Ruiz, Zawadzki, Fenocchio and Portela-Castro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Headwater Capture Evidenced by Paleo-Rivers Reconstruction and Population Genetic Structure of the Armored Catfish (Pareiorhaphis garbei) in the Serra do Mar Mountains of Southeastern Brazil

Sergio M. Q. Lima1,2 \*, Waldir M. Berbel-Filho1,3, Thais F. P. Araújo<sup>1</sup> , Henrique Lazzarotto4,5, Andrey Tatarenkov<sup>2</sup> and John C. Avise<sup>2</sup>

<sup>1</sup> Laboratório de Ictiologia Sistemática e Evolutiva, Departamento de Botânica e Zoologia, Centro de Biociências, Universidade Federal do Rio Grande do Norte, Natal, Brazil, <sup>2</sup> Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, United States, <sup>3</sup> Department of Biosciences, College of Science, Swansea University, Swansea, United Kingdom, <sup>4</sup> Laboratório de Ecologia de Peixes, Departamento de Ecologia, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil, <sup>5</sup> California Academy of Sciences, San Francisco, CA, United States

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Tomas Hrbek, Federal University of Amazonas, Brazil Peter A. Ritchie, Victoria University of Wellington, New Zealand

> \*Correspondence: Sergio M. Q. Lima smaialima@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 September 2017 Accepted: 21 November 2017 Published: 05 December 2017

#### Citation:

Lima SMQ, Berbel-Filho WM, Araújo TFP, Lazzarotto H, Tatarenkov A and Avise JC (2017) Headwater Capture Evidenced by Paleo-Rivers Reconstruction and Population Genetic Structure of the Armored Catfish (Pareiorhaphis garbei) in the Serra do Mar Mountains of Southeastern Brazil. Front. Genet. 8:199. doi: 10.3389/fgene.2017.00199

Paleo-drainage connections and headwater stream-captures are two main historical processes shaping the distribution of strictly freshwater fishes. Recently, bathymetric-based methods of paleo-drainage reconstruction have opened new possibilities to investigate how these processes have shaped the genetic structure of freshwater organisms. In this context, the present study used paleo-drainage reconstructions and single-locus cluster delimitation analyses to examine genetic structure across the whole distribution of Pareiorhaphis garbei, a 'near threatened' armored catfish from the Fluminense freshwater ecoregion in Southeastern Brazil. Sequences of two mitochondrial genes (cytochrome b and cytochrome c oxidase subunit 1) were obtained from five sampling sites in four coastal drainages: Macaé (KAE), São João (SJO), Guapi-Macacu [sub-basins Guapiaçu (GAC) and Guapimirim (GMI)], and Santo Aleixo (SAL). Pronounced genetic structure was found, involving 10 haplotypes for cytB and six for coi, with no haplotypes shared between localities. Coalescent-based delineation methods as well as distance-based methods revealed genetic clusters corresponding to each sample site. Paleo-drainage reconstructions showed two putative paleo-rivers: an eastern one connecting KAE and SJO; and a western one merging in the Guanabara Bay (GAC, GMI, and SAL). A disagreement was uncovered between the inferred past riverine connections and current population genetic structure. Although KAE and SJO belong to the same paleo-river, the latter is more closely related to specimens from the Guanabara paleo-river. This discordance between paleo-drainage connections and phylogenetic structure may indicate an ancient stream-capture event in headwaters of this region. Furthermore, all analyses showed high divergence between KAE and the other lineages, suggesting at least one cryptic species in the latter, and that the nominal species should be restricted to the Macaé river basin, its type locality. In this drainage, impacts such as invasive species and habitat loss can be especially threatening for such species with a narrow range. Our results also suggest that freshwater fishes from headwaters in the Serra do Mar mountains might have different biogeographical patterns than those from the lowlands, indicating a complex and dynamic climatic and geomorphological history.

Keywords: biogeography, Atlantic Forest, phylogeography, conservation genetics, Loricariidae

# INTRODUCTION


Two alternative hypotheses are typically invoked to explain disjunct distributions of strictly freshwater species that encompass more than one drainage basin: past anastomoses of coastal basins due to marine regression episodes that resulted in paleo-drainage connections, or headwater stream-captures caused by tectonic adjustments (Albert and Reis, 2011). Stream capture occurs when a river changes its course and connects with another drainage system as a result of geomorphological changes (Bishop, 1995; Wilkinson et al., 2006), whereas paleo-drainage connections happen when sea levels are lowered by climatic changes, causing basins that previously were isolated to coalesce (Voris, 2000; Dias et al., 2014). To assess how these processes have influenced the genetic structure of freshwater fishes, recently developed GIS-based methods have been used to reconstruct paleo-river systems (Dias et al., 2014; Thomaz et al., 2015, 2017). These methods use topographical and bathymetrical data to reconstruct land exposure, fine-scale depth and steepness of exposed areas (which represent putative riverbeds), as well as putative flow and basin limits (Thomaz et al., 2015). Bathymetric-based reconstructions proved to be highly concordant with the available geological record for paleo-drainages (Dias et al., 2014), supporting GIS-based reconstruction as a powerful and reliable tool for paleo-drainage reconstructions, especially in regions with scarce geological data (Thomaz et al., 2015).

Headwater regions usually exhibit high levels of endemism and microhabitat specialization (Albert et al., 2011; Buckup, 2011). One such example of headwater taxa involves loricariid catfishes of the genus Pareiorhaphis. These rheophilic (fastwater) fishes usually show a narrow distribution, with 20 of 25 described species (80%) restricted to a single basin (Pereira et al., 2017). One of the exceptions within the genus is Pareiorhaphis garbei (Ihering, 1911), a species found in four coastal basins of the Rio de Janeiro State in southeastern Brazil: Macaé, São João, Guapi-Macacu, and Santo Aleixo drainages (Maia et al., 2013). This armored catfish is restricted to clear-water streams with fast-flowing high-oxygen waters and predominance of rocky (bouldered) substrate (Lazzarotto et al., 2007), and these habitats only occur in the headwaters or upper reaches of coastal basins (Maia et al., 2013). The geographic distribution of P. garbei lies entirely within the Atlantic Forest biodiversity hotspot (Myers et al., 2000), which in turn is situated within the Fluminense freshwater ecoregion, a small area comprising about 110 freshwater fish species and a high proportion (42%) of endemic species (Albert et al., 2011).

Due to the intense modification of natural areas by rural and urban expansion, many freshwater fish species in the Fluminense ecoregion are threatened. Previously, P. garbei was classified as an 'endangered' species [Ministério do Meio Ambiente [MMA], 2004], but it is currently considered 'near threatened' and is, thus, a priority species for research [Instituto Chico Mendes de Conservação da Biodiversidade [ICMBio], 2014]. Deforestation, chemical pesticides, and the introduction of rainbow trout Oncorhynchus mykiss (Walbaum, 1792) were shown to be threats for P. garbei (Pereira and Brito, 2008), even though most records for the species are in protected areas (Maia et al., 2013). However, molecular studies indicate that some taxa putatively distributed along coastal basins of the Fluminense ecoregion may be species complexes with individually narrow distributions (Villa-Verde et al., 2012; Cherobim et al., 2016; Roxo et al., 2017).

Application of molecular phylogenetic analyses and paleo-drainage reconstruction to headwater taxa might reveal their population histories, including information relevant for taxonomy and conservation. Single-locus cluster delineation, often based on mitochondrial DNA, is drastically changing investigations of global biodiversity (Hajibabaei et al., 2007; Pereira L.H. et al., 2013). Clusters revealed by this analysis are considered operational taxonomic units (OTUs). Units that are supported by several clustering methods with different assumptions offer a starting point for taxonomic reevaluation (Kekkonen and Hebert, 2014). This study addresses the genetic structure of P. garbei to infer historical processes that may have influenced the species' disjunct distribution along coastal basins of the Serra do Mar. In particular, we address possible paleo-drainage connections and headwater captures. If the lineage patterns are consistent with reconstructed paleo-rivers, then anastomosis of coastal basins by regressive glacial episodes would likely explain the current distribution. However, if the observed phylogeographic pattern is incongruent with the paleo-river reconstruction, then stream capture would likely explain lineage patterns along the headwaters of distinct basins. Furthermore, the species delimitation analyses will provide complementary data for systematic studies, as well as useful information for evaluating the conservation status of P. garbei.

#### MATERIALS AND METHODS

#### Sampling

Specimens of P. garbei were captured in five localities of four hydrographic basins covering the entire known species distribution (Maia et al., 2013): Macaé (KAE), São João (SJO), Guapi-Macacu [sub-basins Guapiaçu (GAC) and Guapimirim (GMI)], and Santo Aleixo (SAL) (**Table 1**). Whereas the rivers Santo Aleixo and Guapi-Macacu flow into the Guanabara Bay in the same mangrove area (and their river mouths are less than 2 km apart), the São João and Macaé rivers flow to the east coast of Rio de Janeiro State directly into the Atlantic Ocean, about 100 km from Guanabara Bay (Costa, 2014) (**Figure 1**). Muscle tissues or fin clips were obtained from specimens preserved in absolute ethanol. These individuals are stored at the ichthyological collections of the Universidade Federal do Rio Grande do Norte and of the Departamento de Ecologia of the Universidade Federal do Rio de Janeiro (**Table 1**).

TABLE 1 | Sampling sites of Pareiorhaphis garbei in the Serra do Mar coastal basins, Rio de Janeiro State, southeastern Brazil.

Conservation Units (CU) are as follows: (1) Três Picos State Park, (2) Bacia do Rio São João Environmental Protection Area, and (3) Petrópolis Environmental Protection Area/Serra dos Órgãos National Park. Vouchers: (UFRN) Universidade Federal do Rio Grande do Norte fish collection, (DEPRJ) Departamento de Ecologia Universidade Federal do Rio de Janeiro fish collection. Columns cytB and coi show number of specimens studied at each gene.

#### Paleo-Drainages Reconstruction

To assess paleo-drainage connections during the Last Glacial Maximum (LGM), topographical and bathymetrical data were retrieved from the digital elevation model at 30 arc-second resolution<sup>1</sup> . This layer was uploaded into ArcGIS 10 software, and the Hydrological tools add-in was used to reconstruct the paleo-drainages, following the steps described in Thomaz et al. (2015). First, the area exposed during the LGM (−125 m) was identified using the Contour tool followed by Mask. Afterwards, the tools Fill, Flow Direction, and Basin were used, respectively, to fill depressions on the surface, identify the steepness within each cell, and define the basin borders. Finally, the Stream Order function was used to estimate putative paleo-rivers.
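The first steps of this workflow (masking the surface exposed at −125 m and deriving a flow direction for each cell) can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation: the elevation grid is hypothetical, the function names are invented for illustration, and the eight-neighbour steepest-descent rule is a simplifying assumption. The Fill, Basin, and Stream Order steps are omitted.

```python
from typing import List, Optional, Tuple

# Sea level at the LGM relative to present, in metres (value taken from the text).
LGM_SEA_LEVEL_M = -125.0

# Hypothetical 4 x 4 elevation/bathymetry grid in metres (negative = below present sea level).
DEM: List[List[float]] = [
    [300.0, 200.0, 150.0, 100.0],
    [200.0, 80.0, 40.0, 10.0],
    [90.0, 20.0, -30.0, -60.0],
    [10.0, -40.0, -100.0, -200.0],
]


def exposed_mask(dem: List[List[float]], sea_level: float) -> List[List[bool]]:
    """Mark every cell that lies above the given sea level (i.e., exposed land)."""
    return [[z > sea_level for z in row] for row in dem]


def steepest_descent(
    dem: List[List[float]], mask: List[List[bool]]
) -> List[List[Optional[Tuple[int, int]]]]:
    """For each exposed cell, return the (row, column) offset of the neighbour
    with the steepest downhill slope, or None for sea cells and local pits."""
    n_rows, n_cols = len(dem), len(dem[0])
    flow: List[List[Optional[Tuple[int, int]]]] = [
        [None] * n_cols for _ in range(n_rows)
    ]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c]:
                continue
            best_slope = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= rr < n_rows and 0 <= cc < n_cols):
                        continue
                    # Diagonal neighbours are farther away than orthogonal ones.
                    distance = (dr * dr + dc * dc) ** 0.5
                    slope = (dem[r][c] - dem[rr][cc]) / distance
                    if slope > best_slope:
                        best_slope = slope
                        flow[r][c] = (dr, dc)
    return flow


present_land = exposed_mask(DEM, 0.0)
lgm_land = exposed_mask(DEM, LGM_SEA_LEVEL_M)
flow_lgm = steepest_descent(DEM, lgm_land)
```

In this toy grid, lowering the sea level exposes shelf cells that are submerged today, so flow paths that now end at separate river mouths can continue across the shelf and potentially merge, which is the logic behind a paleo-river reconstruction.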

### DNA Extraction and Amplification

Genomic DNA samples were extracted using a phenol/chloroform/isopropanol protocol with ethanol precipitation (Milligan, 1998), and mitochondrial DNA fragments of genes encoding cytochrome oxidase subunit I (coi) and cytochrome b (cytB) were amplified and sequenced using primers and conditions proposed by Villa-Verde et al. (2012). Overall, 18 and 25 specimens were amplified and sequenced for cytB (1056 bp) and coi (878 bp), respectively (GenBank MG251217-251259) (**Table 1**).

#### Phylogenetic Analysis and Species Delimitation

Sequences were edited in MEGA6 (Tamura et al., 2013), then aligned using the MUSCLE algorithm (Edgar, 2004). P-distances (cytB) and Kimura-2-parameter (K2P) distances (coi) were also calculated in MEGA6. A Bayesian phylogenetic reconstruction for both cytB and coi data was done in BEAST v. 1.7 (Drummond et al., 2012) using HKY+G (cytB) and K2P+G (coi) as nucleotide substitution models, as suggested by jModeltest 2 (Darriba et al., 2012). An uncorrelated relaxed lognormal clock model with estimated rate was used, with the ucld.mean parameter assigned a uniform prior distribution (0 and 10 as lower and upper bounds). Remaining parameters were set as default. The length of the MCMC chain was 10,000,000 generations, with sampling every 1,000 generations. Effective sample size (ESS) values (> 200) were checked using Tracer v. 1.5 (Rambaut, 2009). The initial 2,000 trees were discarded as burn-in, and a final tree was reconstructed using TreeAnnotator v.1.50. Each haplotype network was inferred using the TCS method in PopART (Leigh and Bryant, 2015). Pareiorhaphis cf. bahianus from the Contas river basin in Bahia State was used as out-group in an additional Bayesian phylogeny for each gene, using the same parameters described above, to corroborate the relationships among P. garbei lineages (**Supplementary Figure S1**).
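The two distance measures named above can be written out explicitly. The following Python sketch is only an illustration, not the MEGA6 implementation: the function names and the two short sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped (an assumption). The p-distance is the proportion of differing sites, and the K2P distance is d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitions and transversions.

```python
import math
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1: str, seq2: str) -> Tuple[int, int, int]:
    """Return (compared sites, transitions, transversions) for two aligned sequences.
    Sites with gaps or ambiguous bases in either sequence are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared sites at which the two sequences differ."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance, correcting for multiple substitutions."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 10-bp alignment: one transition (C/T) and one transversion (C/A).
seq_a = "ACGTACGTAC"
seq_b = "ATGTACGTAA"
print(p_distance(seq_a, seq_b), k2p_distance(seq_a, seq_b))
```

Because the K2P model corrects for multiple substitutions at the same site, its estimate is never smaller than the uncorrected p-distance for the same pair of sequences.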

For increased robustness in delimiting OTUs, five single-locus species-delimitation analyses were performed using both markers separately: single-threshold Generalized Mixed Yule-Coalescent (sGMYC) (Fujisawa and Barraclough, 2013); multiple-threshold GMYC (mGMYC) (Fujisawa and Barraclough, 2013); Bayesian implementation of the Poisson Tree Process (bPTP) (Zhang et al., 2013); multiple-rate PTP (mPTP) (Kapli et al., 2017); and Automatic Barcode Gap Discovery (ABGD) (Puillandre et al., 2012). Ultrametric trees generated from cytB and coi data in the Bayesian phylogeny were used as input files for sGMYC and mGMYC. Previous studies have shown that the clock model and tree prior have low impact on the results of both sGMYC and mGMYC (Tavalera et al., 2013). These two analyses were conducted in R (R Core Team, 2017), using the package SPLITS (Ezard et al., 2009). We re-ran the phylogenetic reconstructions in MrBayes v. 3.2.6 (Huelsenbeck and Ronquist, 2001), under the same parameters used in BEAST, to generate a bifurcating phylogram (with branch lengths representing the number of substitutions) as the input file for the PTP analyses. bPTP analyses were performed on the online server<sup>2</sup> . The analysis length was 500,000 generations with sampling of 500 and burn-in of 0.1. Convergence was assessed visually on plots of MCMC iterations vs. log-likelihood. mPTP was run using the online server<sup>3</sup> , under the same parameters as for bPTP. ABGD distance-based analyses were run through the software command line, with a gap width value of 1.0 for all the distances available (p-distance, K2P, and Jukes–Cantor). The ABGD partition taken into account was the one with a prior intraspecific divergence (P) of ∼0.01, as advocated by previous studies (Puillandre et al., 2012; Blair and Bryson, 2017). To have a species-delimitation partition based on a genetic distance threshold value, we used cut-off values of 5% divergence for cytB and 2% divergence for coi as indicators of distinct species. These values were based on previous reviews of genetic distances in fish species (Ward, 2009; Kartavtsev, 2011).

<sup>1</sup>http://www.gebco.net/

<sup>2</sup>http://species.h-its.org

<sup>3</sup>http://mptp.h-its.org

#### RESULTS

A total of 10 cytB and six coi haplotypes were found, none of which were shared among localities. For both mtDNA markers, the haplotype networks showed multiple mutational steps among all haplotypes (**Figure 1**). Mutations were especially abundant along the branch connecting KAE and the other populations (**Figure 1**). Genetic p-distances and K2P distances are presented in **Table 2**. Between KAE and the other localities, divergence values ranged from 7.5 to 8.0% for cytB and 4.9 to 5.0% for coi; between SJO and rivers that flow to the Guanabara Bay (GAC, GMI, and SAL) from 3.6 to 4.4% (cytB) and 2.2% (coi); and, among the watersheds that run into Guanabara Bay, from 2.6 to 3.2% (cytB) and 1.4% (coi) (**Table 2**).
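How the distance cut-offs from the Methods (5% for cytB, 2% for coi) translate these divergence ranges into putative species can be sketched in Python. The pairwise values below are hypothetical placeholders chosen to fall within the ranges reported above; they are not the entries of Table 2. The single-linkage grouping rule and the function names are likewise assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

SITES = ["KAE", "SJO", "GAC", "GMI", "SAL"]


def pairwise(entries: Dict[Tuple[str, str], float]) -> Dict[FrozenSet[str], float]:
    """Store distances under unordered site pairs."""
    return {frozenset(pair): value for pair, value in entries.items()}


# Hypothetical percent divergences, placed within the ranges reported in the text.
CYTB = pairwise({
    ("KAE", "SJO"): 7.5, ("KAE", "GAC"): 7.8, ("KAE", "GMI"): 8.0, ("KAE", "SAL"): 7.9,
    ("SJO", "GAC"): 3.6, ("SJO", "GMI"): 4.4, ("SJO", "SAL"): 4.0,
    ("GAC", "GMI"): 3.2, ("GAC", "SAL"): 3.0, ("GMI", "SAL"): 2.6,
})
COI = pairwise({
    ("KAE", "SJO"): 4.9, ("KAE", "GAC"): 5.0, ("KAE", "GMI"): 5.0, ("KAE", "SAL"): 4.9,
    ("SJO", "GAC"): 2.2, ("SJO", "GMI"): 2.2, ("SJO", "SAL"): 2.2,
    ("GAC", "GMI"): 1.4, ("GAC", "SAL"): 1.4, ("GMI", "SAL"): 1.4,
})


def threshold_groups(
    sites: List[str], dist: Dict[FrozenSet[str], float], cutoff: float
) -> List[List[str]]:
    """Group sites whose divergence does not exceed the cutoff (single linkage)."""
    parent = {site: site for site in sites}

    def find(site: str) -> str:
        while parent[site] != site:
            site = parent[site]
        return site

    for a, b in combinations(sites, 2):
        if dist[frozenset((a, b))] <= cutoff:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for site in sites:
        groups.setdefault(find(site), []).append(site)
    return sorted(sorted(group) for group in groups.values())


print(threshold_groups(SITES, CYTB, 5.0))  # [['GAC', 'GMI', 'SAL', 'SJO'], ['KAE']]
print(threshold_groups(SITES, COI, 2.0))   # [['GAC', 'GMI', 'SAL'], ['KAE'], ['SJO']]
```

With these placeholder values, the 5% cytB cut-off isolates only KAE, whereas the 2% coi cut-off additionally separates SJO from the Guanabara Bay sites.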

Our phylogenetic reconstructions showed high genetic structuring within P. garbei, with all sampling sites representing monophyletic clusters in both the cytB and the coi data. According to this result, SJO is more closely related to the Guanabara Bay populations (GAC, GMI, and SAL) than to KAE. Within the Guanabara Bay watersheds, GMI is more closely related to SAL (from an adjacent basin) than it is to GAC (which belongs to the same drainage, Guapi-Macacu), although this relationship is supported by lower posterior probabilities (**Figure 2**).

TABLE 2 | P-distances of cytochrome b gene (below diagonal) and K2P distances of cytochrome oxidase subunit I gene (above diagonal) for Pareiorhaphis garbei in the Serra do Mar coastal basins, Rio de Janeiro State, southeastern Brazil.

The sGMYC and ABGD delimitation analyses indicated five distinct OTUs, corresponding to each sampling site for cytB. mGMYC split the SJO clade into two different OTUs, resulting in six genetic clusters. Both PTP methods (bPTP and mPTP) indicated potential distinct lineages within the KAE clade. The strict species threshold (p-distance > 5%) separates KAE from the other sampling sites (**Figure 2A**). Regarding the coi reconstruction, four OTUs corresponding to sampling sites were shown as the best genetic partition in three of the five species-delineation methods (sGMYC, bPTP, and ABGD). However, some incongruences were found. Similarly to the cytB reconstruction, mGMYC tended to split lineages, dividing the GAC clade into two different clusters. Furthermore, mPTP found putative distinct genetic lineages within the SJO and KAE clades, resulting in a total of seven OTUs. Using the K2P distance species threshold value (>2%), the Guanabara Bay drainages represent one species while SJO and KAE each represent different putative species (**Figure 2B**).

Our paleo-drainage reconstruction places Macaé (KAE) and São João (SJO) in the same paleo-drainage (the Macaé-São João paleo-river), whereas Guapiaçu (GAC), Guapimirim (GMI), and Santo Aleixo (SAL) all reside in the Guanabara paleo-river (**Figure 1A**). This paleo-reconstruction does not corroborate the phylogenetic result, according to which SJO is more closely related to the Guanabara Bay river populations than to KAE.
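The comparison between paleo-river membership and tree topology amounts to a monophyly check, which can be sketched in Python. The nested-tuple tree below encodes only the branching order described in the text (KAE sister to the rest; SJO sister to the Guanabara sites; GMI closest to SAL); branch lengths and support values are ignored, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Iterator, Union

Tree = Union[str, tuple]

# Rooted topology as described in the Results (support values not represented).
TOPOLOGY: Tree = ("KAE", ("SJO", ("GAC", ("GMI", "SAL"))))

# Paleo-river membership from the paleo-drainage reconstruction.
PALEO_RIVERS = {
    "Macaé-São João": {"KAE", "SJO"},
    "Guanabara": {"GAC", "GMI", "SAL"},
}


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of terminal labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every node in the rooted tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, group: Iterable[str]) -> bool:
    """A group is monophyletic if some node has exactly that leaf set."""
    return frozenset(group) in set(clades(tree))


for river, sites in PALEO_RIVERS.items():
    print(river, is_monophyletic(TOPOLOGY, sites))
```

Under a pure paleo-drainage scenario both groups would be expected to form clades. Here the Guanabara group does, but the Macaé-São João group does not, which is the incongruence interpreted in the Discussion.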

#### DISCUSSION

Our results revealed a discordance between past paleo-river connections and current genetic structure, which might indicate an ancient stream capture event in the headwaters of the Serra do Mar mountains. According to the paleo-drainage scenario, it was expected that SJO would be closely related to KAE; however, it appears to be a sister group to lineages from the Guanabara paleo-river. This result corroborates the river-capture event suggested by Maia et al. (2013) based on the occurrence of P. garbei in a single tributary adjacent to the headwaters of Macaé river basin. These drainage rearrangements are the result of tectonic reactivation that started in the Paleogene and continues to the present (Ribeiro et al., 2006; Lima and Ribeiro, 2011; Lima et al., 2016).

Although located within the Guapi-Macacu river basin, the water path connecting GAC and GMI is a lowland area where micro- and meso-habitats suitable for P. garbei are absent. The fact that GAC and GMI share no haplotypes, and belong to distinct lineages, reveals that any dispersal of P. garbei through the lower portions of rivers is very limited (and might explain in part why the current genetic structure of P. garbei does not perfectly reflect the paleo-drainage reconstruction).

Along the geographic range of P. garbei, a geographic barrier known as the Cabo Frio Magmatic Lineament (CFML) (Riccomini et al., 2005) may have influenced the evolution and diversification of many freshwater fishes. Based on molecular data, Pereira T.L. et al. (2013) suggested that this barrier had a vicariant effect isolating an eastern lineage (including a clade in the São João and Macaé basins) from a southeastern lineage (from Guanabara Bay to Paranaguá Bay) in the trahira, Hoplias malabaricus (Bloch, 1794). This pattern is also corroborated by the geographic distribution of Atlantirivulus rivulids in lowland areas (Costa, 2014). Although the CFML seems to be an effective barrier to P. garbei (because different lineages occur on either side of it), specimens from the São João basin are more closely related to Guanabara Bay drainages than to the Macaé drainage, suggesting that different biogeographic forces may apply to headwater versus lowland fishes. Although not exclusively headwater species, Hisonotus loricariids show a biogeographic pattern similar to that found in P. garbei, with H. notatus Eigenmann and Eigenmann, 1889 occurring from the São João river basin to the south (including Guanabara Bay's drainages), whereas H. thayeri Martins and Langeani, 2016 occurs from the Macaé river northward (Martins and Langeani, 2016). Altogether, these various patterns indicate a complex and dynamic history of the coastal basins of the Fluminense ecoregion. Curiously, the CFML is also the boundary between two marine provinces: the Tropical Southwestern Atlantic and the Warm Temperate Southwestern Atlantic (Spalding et al., 2007).

Pareiorhaphis garbei was the subject of a taxonomic review (Pereira and Reis, 2002; Pereira, 2005) that did not include molecular data, preventing the discovery of deep genetic structure. Several other demersal Neotropical freshwater fishes that display only subtle morphological differences have proved to show substantial genetic divergences that support the description of new and often endemic species (Benine et al., 2009; Melo et al., 2011; Cherobim et al., 2016), and some of these support an ancient split between rivers flowing to the Guanabara Bay versus the São João and Macaé drainages (Villa-Verde et al., 2012; Costa, 2014; Roxo et al., 2017). All molecular analyses performed herein indicate high divergence of the KAE population of P. garbei from all others, strongly suggesting that individuals from these two groups represent different species. If this outcome is formally recognized, P. garbei (sensu stricto) would be confined to the Macaé river basin (its type locality), with another species distributed along the headwaters of São João, Guapi-Macacu, and Santo Aleixo river basins. Moreover, according to high divergence at coi, the São João lineage could also be a distinct species from the one inhabiting drainages entering Guanabara Bay. In either case, P. garbei itself would be limited to the Macaé river basin, in which the invasive rainbow trout poses a risk factor (Lazzarotto et al., 2007; Pereira and Brito, 2008). Currently, P. garbei is classified as 'near threatened' [Instituto Chico Mendes de Conservação da Biodiversidade [ICMBio], 2014], a category that probably stems from the recent record of this species in the São João river and the confirmation of its occurrence in protected areas (Maia et al., 2013). However, if the deep phylogenetic lineages found here corroborate a species complex in P. garbei, the conservation status of the entire assemblage may need reexamination, mainly because taxonomic splitting would reduce the range of each species.

In summary, our molecular analysis together with paleo-drainage reconstructions revealed that the current phylogeographic patterns of the rheophilic catfish P. garbei were substantially influenced by headwater stream-captures. This taxon proved to encompass at least two highly divergent lineages, and furthermore, each headwater seems to represent a genetically diagnosable OTU. The integrative approach employed here offers a useful way to test hypotheses regarding the distribution and conservation of Neotropical freshwater fishes.

# AUTHOR CONTRIBUTIONS

SL contributed substantially to the acquisition of samples, the analyses, the interpretation of the results, and the writing of the manuscript. WB-F and TA also contributed to the analyses, writing, and figures. HL helped with the fish collection, study design, and writing of the manuscript. AT and JA provided logistic support for molecular data during SL's postdoctoral activities at UCI. All authors contributed to the conception and elaboration of the manuscript, and read and approved its final version.

# FUNDING

SL and HL received postdoctoral scholarships from Science without Borders Program/CNPq (203476/2014-0 and 202253/2015-5, respectively). WB-F receives a Ph.D. scholarship from Science without Borders Program/CNPq (233161/2014-7).

# ACKNOWLEDGMENTS

We are grateful to Edson H. L. Pereira for receiving and analyzing specimens for morphological differences.

We are further indebted to CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for supporting the research and to ICMBio (Instituto Chico Mendes de Conservação da Biodiversidade) and INEA/RJ (Instituto Estadual do Ambiente do Rio de Janeiro) for providing collecting permits to SL (28332-1, 30532-1, and 049/2011). We also thank Thiago Barros and Érica Caramaschi for providing specimens from the Guapiaçu sub-basin, and Andrea Thomaz for helpful advice on paleo-drainage reconstruction.

# REFERENCES


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2017.00199/full#supplementary-material

FIGURE S1 | Bayesian phylogenetic reconstructions for cytochrome b (cytB) and cytochrome oxidase I (coi) in Pareiorhaphis garbei, using HKY+I (cytB) and TrN+G (coi) as nucleotide substitution models. Pareiorhaphis cf. bahianus (MG496258 and MG496259) from Contas river basin (northeastern Brazil) was used as out-group. Posterior probabilities are shown above the nodes.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Lima, Berbel-Filho, Araújo, Lazzarotto, Tatarenkov and Avise. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Diversity of an Imperiled Neotropical Catfish and Recommendations for Its Restoration

#### Fernando S. Fonseca1,2, Rodrigo R. Domingues<sup>3</sup> , Eric M. Hallerman<sup>4</sup> and Alexandre W. S. Hilsdorf<sup>1</sup> \*

<sup>1</sup> Laboratório de Genética de Organismos Aquáticos e Aquicultura, Unidade de Biotecnologia, Núcleo Integrado de Biotecnologia, Universidade de Mogi das Cruzes, Mogi das Cruzes, Brazil, <sup>2</sup> Instituto de Pesca, Agência Paulista de Tecnologia dos Agronegócios, Secretaria de Agricultura e Abastecimento do Estado de São Paulo, São José do Rio Preto, Brazil, <sup>3</sup> Instituto do Mar, Universidade Federal de São Paulo, Santos, Brazil, <sup>4</sup> Department of Fish and Wildlife Conservation, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Carolina Machado, Federal University of São Carlos, Brazil Natalia Martinkova, Academy of Sciences of the Czech Republic, Czechia

> \*Correspondence: Alexandre W. S. Hilsdorf wagner@umc.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 September 2017 Accepted: 20 November 2017 Published: 12 December 2017

#### Citation:

Fonseca FS, Domingues RR, Hallerman EM and Hilsdorf AWS (2017) Genetic Diversity of an Imperiled Neotropical Catfish and Recommendations for Its Restoration. Front. Genet. 8:196. doi: 10.3389/fgene.2017.00196

The long-whiskered catfish Steindachneridion parahybae (Family Pimelodidae) is endemic to the Paraíba do Sul River basin in southeastern Brazil. This species was heavily exploited by artisanal fisheries and faces challenges posed by dams, introduced species, and deterioration of critical habitat. The remaining populations are small, the species has been extirpated from some locales, and it is listed as critically endangered in Brazil. Screening variation at a partial mitochondrial control region sequence (mtCR) and 20 microsatellite loci, we: (i) describe the patterns of genetic diversity along its current distributional range; (ii) test the null hypothesis of panmixia; (iii) investigate the main factors driving its current population structure, and (iv) propose management of broodstock for fostering recovery of wild populations through genetically cognizant restocking. Our microsatellite data for 70 individuals from five collections indicate moderate levels of heterozygosity (H<sub>O</sub> = 0.45) and low levels of inbreeding (F<sub>IS</sub> = 0.016). Individual-based cluster analyses showed clear genetic structure, with three clusters of individuals over the collection area with no mis-assigned individuals, suggesting no recent migration among the three clusters. Pairwise D<sub>EST</sub> values showed moderate and significant genetic differentiation among all populations so identified. The MUR population may have suffered a recent demographic reduction. mtCRs for 70 individuals exhibited 36 haplotypes resulting from 38 polymorphic sites. Overall, mitochondrial haplotype diversity was 0.930 (±0.023) and nucleotide diversity was 0.011 (±0.002). Significant population structure was observed, with φ<sub>ST</sub> = 0.226. Genetic markers could be used in a hatchery-based restoration program emphasizing breeding of pairs with low kinship values in order to promote retention of genetic diversity and avoid inbreeding. Individual average kinship relationships showed 87.3% advised matings, 11.0% marginal matings, and 1.7% advised against. While these results comprise a contribution toward planning better breeding management and monitoring, parallel actions to be undertaken include surveying healthy riverine habitats for reintroduction and continued searching for wild individuals to introduce new variation into the captive broodstock to avoid adaptation to captivity and to minimize inbreeding.

Keywords: Steindachneridion parahybae, surubim-do-paraíba, Paraíba do Sul River, Siluriformes, population structure, conservation aquaculture

#### INTRODUCTION

The Atlantic Forest is a globally important biome; with high freshwater fish diversity, a high degree of endemism, and threats of many impending extinctions, it is considered a conservation hotspot (Myers et al., 2000; Hubert and Renno, 2006). An important component of the fish fauna within this biome is the genus Steindachneridion (Order Siluriformes: Family Pimelodidae), which encompasses six recognized species of migratory long-whiskered catfishes that inhabit deep water in mid-sized streams with rocky bottoms (Agostinho et al., 2003; Garavello, 2005).

Commonly referred to as "surubim-do-paraíba," Steindachneridion parahybae (Steindachner, 1877) is a medium-sized, dorsoventrally flattened, whiskered catfish endemic to the Paraíba do Sul River basin in southeast Brazil (Garavello, 2005). This species was heavily exploited by artisanal fisheries along the Paraíba do Sul River in the early 1950s (Machado and Abreu, 1952). Records of historical harvest, however, are scarce, and survival of this species faces ongoing anthropogenic impacts (Hilsdorf and Petrere, 2002; Honji et al., 2009). Because of overfishing, construction of dams, introduced species, and deterioration of critical habitat (Loris, 2008; Moraes et al., 2017), the species is listed as "Critically Endangered" in the official lists of imperiled species in Brazil (Caneppele et al., 2008; Oyakawa et al., 2009).

Population decline is a prominent concern for conservation genetics because of potentially harmful consequences for species. Indeed, small and fragmented populations can undergo changes such as loss of genetic variation, fixation of deleterious alleles, inbreeding, and reduced fitness (Frankham et al., 2009). Critically, such outcomes can limit the ability of a given species to adapt to future environmental changes, increasing the risk of extinction (Frankham, 2005). Molecular markers are powerful tools for quantifying genetic variation at the individual, population, and species levels, and screening of markers can contribute to identification of units of relevance for the management and conservation of target species (Allendorf et al., 2010; Chauhan and Rajiv, 2010). Screening of molecular markers in studies of Neotropical freshwater fishes has contributed to assessment and conservation of fisheries genetic resources (Piorski et al., 2008; Hilsdorf and Hallerman, 2017). In particular, assessing genetic diversity and characterizing population structure is of primary importance for framing management strategies to minimize the likelihood of population extinction.

Once abundant in artisanal fisheries, S. parahybae is currently on the verge of extinction, with only a few remaining small and isolated populations (Honji et al., 2017). Management to foster recovery of the species is imperative due to its ecological importance as an apex predator and its economic importance for artisanal fisheries. Against this background, screening variation at a partial mitochondrial control region sequence (mtCR) and 20 microsatellite loci, we address for the first time issues important to the genetic conservation of the endemic Neotropical catfish S. parahybae. Specifically, we: (i) describe the patterns of genetic diversity of the S. parahybae along its current distributional range; (ii) test the null hypothesis of panmixia in this region; (iii) investigate the main factors driving its current population structure, and (iv) propose ex situ genetic breeding management for recovery of wild populations through genetically cognizant restocking programs.

#### MATERIALS AND METHODS

#### Ethics Statement

All field work for fish sampling complied with the legal regulations of Brazil (collection permit SISBIO: 22140, Project: P&D CESP/ANEEL: 0061-017/2006).

#### Sampling Sites and DNA Extraction

A total of 70 individual S. parahybae were captured through an extensive occurrence survey along the tributaries of the Paraíba do Sul River basin in Brazil. Sites of occurrence were located with the help of local artisanal fishers and the Companhia Energética de São Paulo (São Paulo State Electric Power Company, CESP). Wild S. parahybae were collected at five different locations (**Figure 1**) within the Paraíba do Sul River basin in 2004 and between 2010 and 2015, consisting of: 1 individual from the Paraíba do Sul River (PSP) in São Paulo State (22°34′ S; 44°53′ W), 24 individuals from the Paraíba do Sul River (PRJ) in Rio de Janeiro State (22°13′ S; 43°25′ W), 5 individuals from the Preto River (PRE) in Rio de Janeiro State (22°00′ S; 43°20′ W), 3 individuals from the Pomba River (POM) in Rio de Janeiro State (21°32′ S; 42°09′ W), and 37 individuals from the Muriaé River (MUR) in Rio de Janeiro State (21°12′ S; 41°55′ W). All individuals were transferred live to aquaculture facilities belonging to CESP and kept as an ex situ germplasm bank (Supplementary Images S1, S2), where they were fin-clipped and individually marked with passive integrated transponder tags.

Total genomic DNA was extracted from approximately 20 mg of fin-clip tissue using a phenol-chloroform protocol adapted from Taggart et al. (1992), with an STE buffer (0.1 M NaCl, 0.05 M Tris-HCl, 0.01 M EDTA) containing a lower concentration of EDTA.

#### Microsatellite Genotyping and Data Analysis

Genetic variability was screened at 20 di-, tri-, and tetranucleotide microsatellite loci (Spa2, Spa3, Spa4, Spa5, Spa6, Spa7, Spa8, Spa11, Spa12, Spa14, Spa15, Spa16, Spa17, Spa18, Spa19, Spa20, Spa22, Spa23, Spa28, and Spa42) described by Ojeda et al. (2016) using primer pairs developed by those authors. Sequences of all microsatellites were deposited in GenBank under accession numbers KU821557 to KU821572 and KX578067 to KX578070. Microsatellite alleles were genotyped using a 6.5% denaturing polyacrylamide gel matrix (KB plus 6.5% Matrix Gel, Li-Cor Biosciences, Lincoln, NE, United States) and a Li-Cor 4300 DNA Analyzer (IR2, Lincoln, NE, United States) using the IRDye® 700 marker and universal M13 tail primer described by Schuelke (2000). The sizes of alleles were estimated by interpolating their position relative to molecular weight markers (50–350 bp DNA Sizing Standard IRDye® 700) using the SagaGT Client program (Li-Cor Biosciences, Lincoln, NE, United States). All recommendations for minimizing genotyping errors originating from DNA quality and handling procedures (Pompanon et al., 2005) were implemented, including semi-automated scoring followed by visual inspection by two independent people.

The presence of null alleles and allelic dropout were tested for using MICRO-CHECKER 2.2.3 (Van Oosterhout et al., 2004). Deviations of genotype frequencies from Hardy–Weinberg expectations (HWE) were calculated by an exact test using the Markov chain randomization approach (Guo and Thompson, 1992) with HW-QUICKCHECK (Kalinowski, 2006). The significance for both types of tests was assessed employing a Markov chain Monte Carlo (MCMC) procedure based on 1,000 dememorizations, 100 batches, and 10,000 iterations per batch, and the critical value for significance was adjusted for multiple tests using the Holm–Bonferroni method (Holm, 1979). The nuclear diversity for each collection area and locus was quantified as the number of alleles per locus (A), allelic richness (Ar), observed heterozygosity (HO), expected heterozygosity (HE), and fixation index (FIS) using the Adegenet package (Jombart, 2008) in R v 3.1.2 (R Development Core Team, 2014). An individual assignment approach was performed using the Neighbor-Joining (NJ) algorithm with Euclidean distances between matrices of allele frequencies, using the Adegenet package in R. In addition, discriminant analysis of principal components (DAPC, Jombart et al., 2010) was used to assign individuals to genetic cluster(s), using the Adegenet package in R. This multivariate approach was designed to identify and describe clusters of genetically related individuals in a manner similar to that performed by the Structure package (Pritchard et al., 2000). DAPC has the advantage that it does not require populations to be in Hardy–Weinberg equilibrium or linkage equilibrium between loci (Jombart et al., 2010). The optimal number of principal components retained followed the indication of α-scores. The cluster assignments were pre-defined to correspond to a priori defined sample locations. Population genetic structure was estimated among clusters identified by DAPC using the population pairwise differentiation index DEST (Jost, 2008) using the DEMEtics package in R (R Development Core Team, 2014). Holm–Bonferroni (Holm, 1979) adjustments of α were used to correct for multiple tests.
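As an illustration of the per-locus summary statistics named above (A, HO, HE, and FIS), the following is a minimal Python sketch, not the Adegenet implementation used in the study. It assumes diploid genotypes given as allele-size pairs, uses Nei's unbiased estimator of expected heterozygosity (an assumption; the R package may apply a different estimator), and computes FIS as 1 − HO/HE. The function name `locus_diversity` and the example genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (A, Ho, He, Fis) for a single locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    He uses Nei's unbiased estimator (an assumption of this sketch).
    """
    n = len(genotypes)
    allele_counts = Counter(a for pair in genotypes for a in pair)
    total_alleles = 2 * n

    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity from allele frequencies, with small-sample correction.
    sum_p2 = sum((c / total_alleles) ** 2 for c in allele_counts.values())
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)

    # Fixation index: heterozygote deficit (positive) or excess (negative).
    fis = 1 - ho / he if he > 0 else float("nan")
    return len(allele_counts), ho, he, fis


# Hypothetical genotypes (allele sizes in bp), not data from the study.
example = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(locus_diversity(example))
```

Averaging these values over loci gives population-level summaries of the kind reported in the Results; allelic richness (Ar) additionally requires rarefaction to a common sample size and is not covered here.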

The program BOTTLENECK 1.2.02 (Cornuet and Luikart, 1997) was used to evaluate the possibility of changes in the recent effective size of a population. In this analysis, heterozygote excesses were checked using three different models: the infinite alleles model (IAM), the stepwise mutation model (SMM), and the intermediate two-phase model of microsatellite evolution (TPM) with 70% SMM and 30% IAM. The TPM is most powerful when fewer than 20 loci are used (Piry et al., 1999). The probability of significant heterozygosity excess was determined using 10,000 replications and a one-tailed Wilcoxon signed-rank test (α = 0.05).

To guide design of matings for the restocking program, we performed average kinship assessment using the program COANCESTRY version 1.0.1.5 (Wang, 2011). This analysis is based on allele frequencies in the generation of matrices (F0) using means and variances of kinship coefficients. Three relationship categories (unrelated, half-sibling, and full-sibling) were used, with 5,000 pairs of simulated individuals for each category. In this evaluation, seven rxy (KE) kinship estimators were used as follows: trioml (triadic likelihood estimator – Wang, 2007), wang (Wang, 2002), lynchLi (Li et al., 1993), lynchrd (Lynch and Ritland, 1999), ritland (Ritland, 1996), quellergt (Queller and Goodnight, 1989), and dyadml (dyadic likelihood estimator – Milligan, 2003). In addition, we applied the ML-Relate package (Kalinowski et al., 2006) to make a conservative estimation of the level of kinship. We then compared the results of both approaches.

#### Mitochondrial DNA Amplification, Sequencing, and Data Analysis

A partial mitochondrial DNA control region (mtCR) was amplified by polymerase chain reaction (PCR) using two external primers, SPf (5′-TCTAACTCCCAAAGCTAGAATC-3′) and SPr (5′-GGAACTTTCTAGGGTTCATCTTAAC-3′), developed in this study. Amplification was performed in a 20-µl reaction containing: 20 ng/µl of genomic DNA, 0.5 IU Taq DNA polymerase (Thermo Fisher Scientific, Inc., São Paulo, Brazil), 1x buffer (100 mM Tris-HCl pH 8.8, 500 mM KCl), 1.5 mM MgCl2, 2.5 mM dNTPs, 10 mM of each primer, and ultrapure water. The PCR thermal-cycling profile consisted of initial denaturation at 94°C for 3 min; followed by 35 cycles of denaturation at 94°C for 1 min, annealing at 53°C for 1 min, and extension at 72°C for 1 min 30 s; and a final extension step at 72°C for 10 min.

The amplicons were purified using ExoSAP enzymes (Affymetrix, Cleveland, OH, United States) following the manufacturer's instructions, except that the shortest time for deactivation of the enzymes was 15 min. The amplicons were sequenced on an Applied Biosystems 3130 Genetic Analyzer, and bases were called using Applied Biosystems Sequencing Analysis Software version 5.2. The sequences were edited manually when necessary using CodonCode (v.5.1.5, Codon Code Corporation, Centerville, MA, United States).

Summary statistics for genetic diversity – including haplotype (Hd) and nucleotide (π) diversities, number of haplotypes (h) and polymorphic sites (S) – were calculated in DnaSP v. 5.10 (Librado and Rozas, 2009). The relationships among haplotypes and their geographic distribution were assessed using the median-joining algorithm in NETWORK 4.6.1.1. To test the null hypothesis of matrilineal panmixia of S. parahybae along the Paraíba do Sul River basin, φST (Weir and Cockerham, 1984) was calculated by analysis of molecular variance (AMOVA, Excoffier et al., 1992). To assess genetic structure across the collections, 10,000 permutations of the data were used to test the significance of hierarchical differentiation using Arlequin version 3.5.1.3 (Excoffier and Lischer, 2010).
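The two mitochondrial diversity measures above can be illustrated with a short Python sketch. It is not the DnaSP implementation; it assumes aligned sequences of equal length with no gaps or missing data, and uses Nei's (1987) standard definitions: haplotype diversity as n/(n − 1) · (1 − Σp²) and nucleotide diversity as the mean number of pairwise differences per site. The function names and example sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Probability that two randomly sampled sequences are different haplotypes."""
    n = len(seqs)
    freqs = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in freqs.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise nucleotide differences per site.

    Assumes all sequences are aligned and of equal length (no gaps).
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical 4-bp alignment, not data from the study.
example = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(haplotype_diversity(example), nucleotide_diversity(example))
```

High haplotype diversity combined with low nucleotide diversity, as discussed later in the text, corresponds to many distinct haplotypes that differ from one another at only a few sites.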

The estimated probabilities were corrected using Holm–Bonferroni sequential adjustments for multiple tests (Holm, 1979). Due to small sample sizes at the PSP (n = 1), PRE (n = 5), and POM (n = 3) sites, data from those collection areas were included only to provide preliminary insights into how these collections may be related to the others.
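The Holm–Bonferroni sequential adjustment (Holm, 1979) referred to throughout the methods can be sketched as follows. This is a generic Python illustration of the step-down procedure, not the code used by the authors; the function name `holm_bonferroni` and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    The smallest p-value is compared with alpha/m, the next with alpha/(m-1),
    and so on; testing stops at the first non-significant comparison.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also retained
    return reject


# Hypothetical p-values, not results from the study.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Compared with a plain Bonferroni correction, the sequential procedure is never less powerful while still controlling the family-wise error rate at α.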

Two neutrality metrics were calculated to infer the population demographic history of S. parahybae, R<sub>2</sub> (Ramos-Onsins and Rozas, 2002) and F<sub>S</sub> (Fu, 1997). Both metrics were calculated and their statistical significance tested using 10,000 permutations under the coalescent process implemented in DnaSP v.5.10 (Librado and Rozas, 2009).

#### RESULTS

#### Microsatellite Intra- and Inter-population Diversity

All samples of S. parahybae from the five collection areas along the Paraíba do Sul River basin were successfully genotyped at 20 microsatellite loci. All loci within all collections were in Hardy–Weinberg equilibrium after correction for multiple tests. The linkage disequilibrium tests presented significant values after sequential Bonferroni correction (p < 0.05) for seven pairs of loci in a single population (MUR), which may be the consequence of many factors, including limited numbers of parents in the preceding generation, as well as natural selection, gene flow, assortative mating, and linkage. There was no evidence of genotyping errors, such as stuttering, null alleles, or large allele drop-out. The highest number of alleles (13) was amplified at the Spa18 locus, while Spa3, Spa7, Spa16, Spa23, and Spa28 all exhibited only two alleles. The number of alleles per locus within particular populations ranged between 2 and 13, with a mean of 4.7 ± 3.2. The allelic richness values ranged between 6.1 and 10.4, with an average of 2.65 ± 2.07. The observed heterozygosity values ranged between 0.0 and 0.89, with an average of 0.45 ± 0.24. Private alleles were observed in the MUR (14), PRE (1), and PRJ (2) collections. The FIS inbreeding coefficient values ranged between −0.002 and 0.356, with an average of 0.016. The summary statistics for all loci are shown in Supplementary Table S1.

Results of the individual-based DAPC and NJ cluster analyses showed clear genetic structure, with three clusters of individuals over the entire collection (**Figure 2**). Overall, individuals from the MUR and PRJ sites showed a 100% likelihood of assignment to their original collection area (hereafter termed populations). Individuals from the POM, PSP, and PRE collections formed a group of individuals of mixed ancestry (**Figure 2A**). Even though they do not form a single genetic cluster, we treated them as one cluster because of the small sample sizes. Neighbor-joining analysis (**Figure 2B**) also corroborated the three clusters of individuals: MUR, PRJ, and the POM/PSP/PRE sites. No misassigned individuals were observed, suggesting that there may have been no recent migration among these well-defined clusters.

FIGURE 2 | (B) Unrooted tree from a Neighbor-joining analysis of distance data.

The genetic differentiation indices were highly significant (DEST = 0.082, p = 0.001). The pairwise DEST values for populations identified previously by NJ and DAPC analysis were congruent with the results of individual-based analysis, showing moderate and significant genetic differentiation among all populations (**Table 1**).

In silico analyses performed in BOTTLENECK 1.2.02 showed significant values (p < 0.05) for the MUR population using the SMM mutation model (**Table 2**). However, there were no significant values for the TPM model, which is more powerful for fewer than 20 loci (Piry et al., 1999). These results suggest that among all populations, only the MUR population may have suffered recent demographic reduction. Because the POM, PSP, and PRE collections had few specimens, assessment of demographic bottlenecks in these populations had insufficient power.

TABLE 1 | Pairwise mtDNA φST values (above diagonal) and pairwise Jost's DEST values using microsatellites (below the diagonal) among three population clusters of Steindachneridion parahybae.

<sup>ns</sup>Not significant, <sup>∗</sup>highly significant (p < 0.01).

#### Mitochondrial Partial Control Region Sequence Variation

A total of 791 bp of the mtCR were resolved from 70 S. parahybae. Sequences of all haplotypes were deposited in GenBank under accession numbers MG012754–MG012789. The sequences exhibited 36 haplotypes (Supplementary Tables S2, S3) resulting from 38 polymorphic sites and 41 mutations (34 transitions and 7 transversions). The nucleotide composition was 34.14% A, 19.66% C, 20.98% G, and 25.23% T. The overall h and π diversities were 0.930 (±0.023) and 0.011 (±0.002), respectively. The haplotype diversity values ranged from 0.868 ± 0.049 (MUR) to 0.99 ± 0.272 (POM), and nucleotide diversity values ranged from 0.005 ± 0.004 (POM) to 0.013 ± 0.007 (PRJ) (**Table 3**). Haplotype H2 (n = 13) was the most common, occurring mainly in MUR, and shared by all populations except for PSP (Supplementary Table S3). A total of 33 haplotypes were singletons, 16 occurring in MUR and 12 in PRJ. An unrooted median-joining tree of mtCR haplotypes (**Figure 3**) showed mainly singleton haplotypes in populations MUR and PRJ. Significant population structure was observed for S. parahybae along the Paraíba do Sul River basin based on mtCR data (φST = 0.226, p < 0.05).

TABLE 2 | Results of the BOTTLENECK 1.2.02 program (Cornuet and Luikart, 1997) used to evaluate the possibility of changes in the effective size of the population.<sup>1</sup>

<sup>1</sup> p is the probability value, with significant values (p < 0.05) shown in bold; IAM, the infinite alleles model; SMM, the stepwise mutation model; TPM, the two-phase model of microsatellite evolution; and SD, standard deviation.

Fu's F<sub>S</sub> metric presented high values, but none were significant (**Table 3**). In contrast, the R<sub>2</sub> values were high and significant (p < 0.05), indicating a significant departure from neutrality. According to Ramos-Onsins and Rozas (2002), the R<sub>2</sub> test is more powerful in detecting demographic events in small sample sizes, whereas Fu's F<sub>S</sub> is better for large sample sizes.


(N) = number of sequenced individuals; (S) = number of polymorphic sites; H = haplotypes; (h) haplotype diversity; (π) nucleotide diversity; (SD) standard deviation; and F<sub>S</sub> (Fu, 1997) and R<sub>2</sub> (Ramos-Onsins and Rozas, 2002) indices of neutrality. <sup>ns</sup>Not significant, <sup>∗</sup>highly significant (p < 0.01).

#### Genetic Relatedness Among Steindachneridion parahybae in the Ex Situ Germplasm Bank

Among prospective broodstock in the ex situ germplasm bank, sampling variances for the kinship estimators (KE) ranged from 0.0198 to 0.1046. The lowest variance was found for the triadic likelihood estimator (TrioML). Considering that the KEs had high correlations and little influence on the inference of relatedness, paired kinship coefficients are reported only for the TrioML. Kinship values estimated by TrioML were low, indicating a low inbreeding index for 67 individuals, and thus, a range of permissible pairings among the broodstock candidates. Only three individuals with the highest values of inbreeding were classified as moderately inbred. Means and variances for kinship likelihood estimators are shown in Supplementary Table S4.

When we matched the 44 females with the 21 males among the broodstock candidates, we obtained 924 interactions that were added to the possible interactions with the five individuals of indeterminate sex (MUR12, MUR15, MUR20, MUR28, and MUR37), obtaining 1,269 possible matings. The index obtained for kinship values was split into three categories according to expected average theoretical values (Wang, 2011). That is, the relatedness values for simulated pairs were considered to be high above 0.5 (full-sib and parent-offspring), intermediate between 0.25 and 0.5 (half-sib or other kinship), and low below 0.25. Considering all possible matings, the calculated kinship relationships indicated 87.3% (1,108/1,269) advisable matings, 11.0% (140/1,269) marginal matings, and 1.7% (21/1,269) inadvisable matings. A table of consensus outcomes from the COANCESTRY and ML-Relate analyses is showcased in **Table 4**. Supplementary Tables S5, S6 show the results of the respective analyses.
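The three-way classification of candidate matings described above can be sketched in Python. This is an illustration only, not the COANCESTRY or ML-Relate workflow: it assumes that pairwise relatedness values are already available, applies the thresholds stated in the text (low below 0.25, intermediate from 0.25 to 0.5, high above 0.5), and maps them to the "advisable", "marginal", and "inadvisable" labels, a mapping inferred from the text. The function names and the example values are hypothetical.

```python
from collections import Counter


def classify_pair(r, low=0.25, high=0.5):
    """Classify a candidate mating from its pairwise relatedness value r."""
    if r < low:
        return "advisable"    # low kinship
    if r <= high:
        return "marginal"     # intermediate kinship (e.g., half-sibs)
    return "inadvisable"      # high kinship (e.g., full-sibs, parent-offspring)


def summarize(relatedness):
    """Return {category: (count, percent)} for a dict of pair -> r values."""
    counts = Counter(classify_pair(r) for r in relatedness.values())
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


# Hypothetical relatedness values, not data from the study.
example = {("F1", "M1"): 0.02, ("F1", "M2"): 0.31,
           ("F2", "M1"): 0.55, ("F2", "M2"): 0.00}
print(summarize(example))

# Percentages implied by the counts reported in the text (total 1,269 matings).
for label, n in [("advisable", 1108), ("marginal", 140), ("inadvisable", 21)]:
    print(label, round(100 * n / 1269, 1))
```

The second loop simply recomputes the reported proportions from the reported counts, which sum to the 1,269 possible matings.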

#### DISCUSSION

#### Genetic Diversity and Population Structure

Our assessment of the extant S. parahybae populations reflected the endangered status of this Neotropical catfish in its area of occurrence. Attempts to locate wild populations proved the first indicator of the imperiled status of S. parahybae in the Paraíba do Sul watershed. Many efforts have been made during recent years to find and collect S. parahybae individuals to be kept in an ex situ gene bank so that a genetically cognizant restocking program (Miller and Kapuscinski, 2003) could be implemented. However, relatively few specimens have been found and collected until now. Although intensive efforts were made to collect specimens in localities where their occurrence was recorded by local anglers, at sites POM, PRE, and PSP, just a few individuals were collected; in the case of PSP, just one individual was collected in 2004. According to Honji et al. (2009), this species is regionally extinct in São Paulo State. At one collection site (PRJ), 24 individuals were collected in 2004 before an organochlorine insecticide (Endosulfan) spill into the Paraíba do Sul River killed fish and other riverine animals (Brito et al., 2012). After that, several unsuccessful attempts to capture S. parahybae have been made; hence, the species was likely extirpated at this site. The MUR is a locality where the species seems to be present, although in a small population. Despite the difficulty of catching this species in artisanal fisheries, local fishermen report the rare presence of S. parahybae in the Muriaé River.

Assessing the genetic differentiation of wild populations and individuals kept in ex situ gene banks is pivotal for developing conservation strategies to save a given species from extinction (FAO, 2008). Summary statistics for genetic diversity for the 20 microsatellite loci screened are shown in Supplementary Table S1. Heterozygosity (H<sub>E</sub> = 0.47) and number of alleles per locus (A = 4.7) for wild S. parahybae populations were low across the 20 loci screened in this study.

Despite this low genetic variability, the inbreeding index (FIS = 0.016) was below the average value (FIS = 0.05) for 18 endangered animal species kept in ex situ conservation programs (Witzenberger and Hochkirch, 2011). The relatively low values of genetic diversity at microsatellite loci, a molecular marker type whose dynamics are more closely associated with contemporary events (Selkoe and Toonen, 2006; Weese et al., 2012), could be associated with the current state of fragility of populations likely impacted by anthropogenic habitat alterations and overfishing. These findings were corroborated by in silico analyses performed in BOTTLENECK 1.2.02.

The discreteness of current populations of S. parahybae along the Paraíba do Sul River basin, evident from analyses of microsatellite loci and mitochondrial DNA sequences, may be the result of isolation due to environmental alterations or of the characteristic short-distance reproductive migration behavior of this catfish species. No data are specifically available about the migration intensity of S. parahybae, but studies on catfishes in the Uruguay River, including the congeneric S. inscripta, show apparent short-distance migration (Zaniboni-Filho and Schulz, 2003). As the mobility of species is a factor determining the degree of homogenization of allele frequencies by gene flow (Avise, 2004), the contemporary genetic structure found for this species may then be a combination of short migration and river disruption by dams, which may act as barriers for gene flow (Dehais et al., 2010; Roberts et al., 2013; Gouskov et al., 2016). Indeed, this recent effect may be supported by the observed differentiation values for the nuclear markers, indicating low to moderate differentiation within the contemporary period.

The differentiation shown for mitochondrial D-loop sequences (φST = 0.226) and microsatellites (DEST = 0.082) can be interpreted as reflecting historical and recent structuring, especially for the female component of the population, with some degree of homogenizing gene flow for males, with or without the effects of recent anthropogenic isolation. Cluster analyses (**Figure 2**) showed considerable population structure, indicating the division of the S. parahybae gene pool into three distinct groups, MUR, PRJ, and a mixed group formed by POM, PSP, and PRE. The strongly supported assigned clusters for MUR and PRJ (q > 0.9) denoted virtual absence of migration between the two groups. The identification and characterization of an Evolutionarily Significant Unit (ESU), based on population genetic structure and adaptive differentiation, would provide a useful basis for conservation, as well as a more complete understanding of the role of adaptive evolution in the natural history of the species (Casacci et al., 2014). Carvalho et al. (2012) reported two ESUs within the widely distributed Pseudoplatystoma corruscans, a commercially important Neotropical siluriform catfish. However, despite the presence of private alleles in the MUR sample, populations of S. parahybae may more appropriately be considered Management Units (MUs, Moritz, 1994), which can be defined as populations with significant divergence of allele frequencies at nuclear or mitochondrial loci, regardless of the phylogenetic distinctiveness of the alleles. Taking another approach, Palsbøll et al. (2007) proposed that MUs should be determined by the level of interpopulation genetic divergence in general; however, the threshold values to establish populations' genetic connectivity must be flexible to reflect each specific biological and conservation context. This is particularly the case for the Muriaé River population, where S. parahybae individuals still seem to occur.

TABLE 4 | Consensus pairwise relatedness of broodstock candidates among the Steindachneridion parahybae individuals maintained in the germplasm bank, as assessed by both the COANCESTRY (TrioML estimator) and the ML-Relate packages.

Crosses in red and yellow cells should be avoided due to high or moderate level of kinship. Shown in green cells are the consensus best crosses for producing offspring for restocking due to low kinship. Females are denoted (F) and males (M). N/A – data not available, for individuals of indeterminate sex.

#### Phylogeography and Demographic History

Despite the difficulty of comparing among studies the average divergence between mtDNA sequences due to lack of standardization of sequence length, small nucleotide diversity combined with considerable haplotype diversity (Grant and Bowen, 1998) might suggest a demographic expansion after a period of low effective population size for S. parahybae populations. Mesquita et al. (2001) assessed Chondrostoma lusitanicum, an endangered fish species in Portugal, and found the same combination of molecular genetic variation among marker types, suggesting that high mitochondrial haplotype diversity with low nucleotide divergence may indicate non-equilibrium evolutionary action or non-neutral forces such as "founder-flush" population differentiation or even the persistence of slightly deleterious mutations in small populations due to ineffectiveness of "purifying" selection. In the present study, the hypothesis of demographic expansion is supported by results of the R<sub>2</sub> test, a test to detect population growth. The positive, significant values refuted the null hypothesis of constant population size under the neutral model. Notably, the haplotype network (**Figure 3**) does not have the star-like shape typical of a species that has expanded its numbers from a modest number of founders (Avise, 2000). It is likely that throughout the evolutionary history of S. parahybae, different mitochondrial lineages colonized and developed in each locality of the basin, and that carriers of distinct mitochondrial lineages migrated along the basin when no artificial barriers, such as dams, were present. Such dams are barriers that prevent the reproductive migration of many fish species (Pelicice et al., 2015). As human occupation expanded along the watershed, overfishing, pollution, introduction of species, and dam construction for electric power generation impacted S. parahybae population size and geographic range, accompanied by the local and global loss of many mitochondrial haplotypes. For instance, the only specimen captured from PSP has a unique haplotype at the terminus of a branch of the network, not present in any other sampling site in this survey. In addition, the existence of many different mitochondrial haplotypes in the MUR and PRJ populations, many separated by inferred mutational differences not represented by observations in living individuals, supports this interpretation.

On the other hand, the result of the BOTTLENECK test suggests that the MUR population was subject to a recent population bottleneck. This result is supported by observation of excess heterozygosity in the population as a whole. The bottleneck process, which the extant S. parahybae population may have undergone, suggests the loss of rare alleles during the population reduction. The loss of alleles may ultimately jeopardize the population due to loss of genetic variability crucial for meeting future ecological challenges; however, the presence of private alleles may also suggest local adaptation (Santos et al., 2016). The absence of S. parahybae individuals in the artisanal fisheries in localities where they were common as recently as the early 1950s (Machado and Abreu, 1952) and in scientific samples at sites of previous occurrence in the Paraíba do Sul River basin clearly suggests that S. parahybae populations underwent a genetic bottleneck process and that some populations were extirpated.

#### A Restoration Genetics Approach to Recovery of S. parahybae Populations

Restocking of freshwater fishes is often conducted to restore populations impacted by the construction of hydroelectric power dams (Marmulla, 2001; Palmer et al., 2007; Piorski et al., 2008). While the effectiveness of the approach has been questioned (Snyder et al., 1996; Fraser, 2008; Neff et al., 2011), restoration of imperiled or extirpated populations must be considered when other efforts have proven unsuccessful (Attard et al., 2016). Restoration genetics attempts to use the tools of genomics to support proper management of captive populations and propagation of genetically healthy fingerlings (Frankham, 2010). Human-driven environmental changes in the Paraíba do Sul watershed date to the onset of agricultural production (sugarcane in the 17th century, coffee in the late 18th–19th centuries), and industrialization from the 1950s (Pádua, 2017). More than 120 hydropower stations in the Paraíba do Sul River and its tributaries have disrupted fluvial connectivity, leading to losses of local populations and species (Agostinho et al., 2008; Loris, 2008). Steindachneridion parahybae is among the endemic fishes of Paraíba do Sul negatively impacted by human actions, and is listed as threatened by the environmental agencies of the Brazilian government (Honji et al., 2017). The recovery of these imperiled species has been promoted by establishing ex situ germplasm banks and genetics-based captive breeding programs. The reintroduction programs for S. parahybae follow to some degree the framework proposed by Attard et al. (2016), which accounts for past, present, and future components of captive-breeding and reintroduction. The geographical localization and genetic characterization of extant S. parahybae populations described herein, and their tagging and maintenance in the ex situ gene bank facilities encompass the past and present components of the holistic framework for successful reintroductions.

Successful captive breeding depends upon keeping levels of inbreeding low and optimizing genetic variability so that fingerlings intended for reintroduction reflect as much as possible the genetic makeup of the species. To achieve this goal, we assessed the genetic relatedness of all captively bred individuals to generate guidance for the hatchery manager to select genetically unrelated fish to be crossed (Saura et al., 2013). A set of 20 species-specific microsatellite markers (Ojeda et al., 2016) was used successfully to assess the mean kinship within and among the remaining populations of S. parahybae, generating relatedness indices between each captive individual. Keeping the minimum number of founders to minimize a loss of genetic diversity, avoiding both inbreeding and outbreeding depression, and searching for new wild populations to constantly renew the captive stock are all pivotal to the success of ex situ conservation projects (Miller and Kapuscinski, 2003; Witzenberger and Hochkirch, 2011; IUCN/SSC, 2013). Neff et al. (2011) argued that just taking into account multilocus heterozygosity to measure the likely effectiveness of reintroduction is too simplistic. According to these authors, it is more important to come to better comprehend the complexity of the genetic architecture of fitness, so that the importance of genetic diversity to the evolutionary potential of populations of fishes and their subsequent adaptation after reintroduction can be gauged. Therefore, they recommended that effective management and conservation of populations in imminent peril of extirpation should include: (i) rehabilitation and maintenance of habitat and ecological function; (ii) incorporation of natural ecological processes into artificial breeding programs, and (iii) use of a full or partial factorial breeding design in artificial breeding programs. 
This last recommendation regarding breeding design is intended to maximize the amount of genetic differentiation throughout the species' occurrence.
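As a minimal sketch of how pairwise relatedness indices could guide the choice of crosses, the Python snippet below pairs males and females greedily, lowest relatedness first. The fish identifiers, the relatedness values, the `pick_crosses` helper and the 0.125 cut-off are hypothetical illustrations and are not taken from this study.

```python
from itertools import product


def pick_crosses(males, females, relatedness, max_r=0.125):
    """Greedily choose male x female crosses with the lowest relatedness.

    Each fish is used at most once; pairs whose relatedness exceeds
    max_r (an assumed cut-off) are never proposed.
    """
    candidates = sorted(
        (relatedness[(m, f)], m, f) for m, f in product(males, females)
    )
    used, crosses = set(), []
    for r, m, f in candidates:
        if r > max_r:
            break  # all remaining pairs are more related than allowed
        if m in used or f in used:
            continue
        crosses.append((m, f, r))
        used.update((m, f))
    return crosses


# Hypothetical pairwise relatedness estimates (not data from the study)
relatedness = {
    ("M1", "F1"): 0.50, ("M1", "F2"): 0.02,
    ("M2", "F1"): 0.05, ("M2", "F2"): 0.25,
}
crosses = pick_crosses(["M1", "M2"], ["F1", "F2"], relatedness)
print(crosses)
```

A greedy choice is used here only for brevity; it does not guarantee the lowest possible mean kinship across all crosses.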

Use of estimators for pairwise relatedness and individual inbreeding coefficients based on multilocus microsatellite markers has proven an efficient strategy to achieve better responses in reintroduction programs. Sriphairoj et al. (2007), working with the critically endangered Mekong giant catfish Pangasianodon gigas, tested different scenarios of broodstock recruitment and mating strategies. The authors suggested that using minimal kinship in a random mating scheme could keep the effective population size (Ne) at 100 so that allelic diversity can be retained at above 90% of current allelic diversity for at least four generations. In the present study of endangered S. parahybae populations, we used this same approach, also based on multilocus microsatellite markers, to genetically characterize the few remaining wild populations and the broodstock currently maintained in an ex situ germplasm bank. These genetic data contribute toward planning better breeding management and monitoring any changes in the genetic makeup of the broodstock, as well as in the reintroduced fingerlings in their new environments. Parallel actions to be taken would include surveying healthy riverine habitats for reintroduction and continued searching for wild individuals to introduce new variation into the captive broodstock, to avoid adaptation to captivity and to prevent inbreeding. Successes in other genetic marker-assisted restoration programs suggest that application of this approach might also prove effective for S. parahybae. Olsen et al. (2000) used microsatellite DNA markers to identify a targeted stock of pink salmon Oncorhynchus gorbuscha in a supportive breeding program on the Dungeness River in Washington State, United States, providing the basis for hatchery-based supplementation.
Following cessation of stocking from outside sources, microsatellite marker-assisted hatchery-based supplementation of walleye Sander vitreus (Palmer et al., 2007) contributed to a rebuilding of the native population in the New River, Virginia, United States and its recognition as a premier recreational fishery for the species.
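To give a feel for the role of effective population size, the sketch below applies the standard drift expectation for heterozygosity, H_t/H_0 = (1 - 1/(2Ne))^t. This is an assumption-laden illustration: it concerns expected heterozygosity rather than the allelic diversity figure cited above from Sriphairoj et al. (2007), and the function name is hypothetical.

```python
def heterozygosity_retained(ne, generations):
    """Expected fraction of initial heterozygosity left after drift.

    Uses the textbook approximation (1 - 1/(2*Ne))**t for an idealized
    population of constant effective size Ne.
    """
    return (1 - 1 / (2 * ne)) ** generations


retained = heterozygosity_retained(ne=100, generations=4)
print(f"{retained:.4f}")  # 0.9801
```

Allelic diversity is usually lost faster than heterozygosity because rare alleles drift out first, so this value should be read as an optimistic upper bound rather than a reproduction of the cited result.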

#### AUTHOR CONTRIBUTIONS

AH conceived and supervised the project. FF performed the experiments, analyzed data, and drafted the manuscript. RD performed statistical analyses and contributed to manuscript writing. AH and EH wrote and critically edited the final manuscript. All authors read and approved the final version of the manuscript.

# FUNDING

The authors thank the São Paulo Research Foundation (FAPESP # 11/23752-2) and AGEVAP (#10/2012) for granting funds to support this project, as well as FUNDEPAG and Instituto de Pesca for funding FF to develop this work.

# ACKNOWLEDGMENTS

The authors thank Danilo Caneppele from the Companhia Energética de São Paulo (CESP) for access to the S. parahybae samples kept in the CESP ex situ germplasm bank and Dr. Guilherme Gui from Fundação Piabanha for the great effort to find and sample S. parahybae in rivers of the Paraíba do Sul watershed in the State of Rio de Janeiro. They also thank Vívian Uhlig, M.Sc., from ICMBIO, for the map drawing, and Leo Menino for helping to format figures. Funding for EH's participation in this work was provided in part by the Virginia Agricultural Experiment Station and the United States Department of Agriculture National Institute of Food and Agriculture. This work was developed as part of the requirements for the doctoral dissertation of FF in biotechnology at the University of Mogi das Cruzes/FAEP. The manuscript was strengthened by attention to the comments of two peer reviewers.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2017.00196/full#supplementary-material

# REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Fonseca, Domingues, Hallerman and Hilsdorf. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# First Chromosomal Analysis in Hepsetidae (Actinopterygii, Characiformes): Insights into Relationship between African and Neotropical Fish Groups

Pedro C. Carvalho<sup>1</sup>, Ezequiel A. de Oliveira<sup>1,2</sup>, Luiz A. C. Bertollo<sup>1</sup>, Cassia F. Yano<sup>1</sup>, Claudio Oliveira<sup>3</sup>, Eva Decru<sup>4</sup>, Oladele I. Jegede<sup>5</sup>, Terumi Hatanaka<sup>1</sup>, Thomas Liehr<sup>6</sup>, Ahmed B. H. Al-Rikabi<sup>6</sup> and Marcelo de B. Cioffi<sup>1</sup>\*

<sup>1</sup> Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil, <sup>2</sup> Secretaria de Estado de Educação de Mato Grosso (Seduc-MT), Cuiabá, Brazil, <sup>3</sup> Departamento de Morfologia, Instituto de Biociências, Universidade Estadual Paulista, Botucatu, Brazil, <sup>4</sup> Section Vertebrates, Ichthyology, Royal Museum for Central Africa, Tervuren, Belgium, <sup>5</sup> Department of Fisheries and Aquaculture, Adamawa State University, Mubi, Nigeria, <sup>6</sup> Institute of Human Genetics, University Hospital Jena, Jena, Germany

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Maelin Silva, Universidade Federal da Fronteira Sul, Brazil Lukas Kratochvil, Charles University, Czechia

> \*Correspondence: Marcelo de B. Cioffi mbcioffi@ufscar.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 18 October 2017 Accepted: 22 November 2017 Published: 12 December 2017

#### Citation:

Carvalho PC, de Oliveira EA, Bertollo LAC, Yano CF, Oliveira C, Decru E, Jegede OI, Hatanaka T, Liehr T, Al-Rikabi ABH and Cioffi MB (2017) First Chromosomal Analysis in Hepsetidae (Actinopterygii, Characiformes): Insights into Relationship between African and Neotropical Fish Groups. Front. Genet. 8:203. doi: 10.3389/fgene.2017.00203

Hepsetidae is a small fish family comprising only the genus Hepsetus, with six described species distributed throughout the South, Central and Western regions of Africa, and showing a close relationship with the Alestidae and some Neotropical fish families. However, no cytogenetic information is available for either Hepsetidae or Alestidae species, thus preventing any comparative evolutionary studies at the chromosomal level. In the present study, we provide new cytogenetic data for Hepsetus odoe, including the standard karyotype, C-banding, repetitive DNA mapping, comparative genomic hybridization (CGH) and whole chromosome painting (WCP), providing chromosomal patterns and a basis for comparative cytogenetics with other characiform families. Both males and females of H. odoe have 2n = 58 chromosomes (10m + 28sm + 20st/a), with most of the C-band positive heterochromatin localized in the centromeric and subtelomeric regions. Only one pair of chromosomes bears proximal 5S rDNA sites in the short arms, in contrast with the 18S rDNA sequences, which are located in the terminal regions of four chromosome pairs. Clear interstitial hybridization signals are evidenced for the U1 and U2 snDNA probes, but in only one and two chromosome pairs, respectively. Microsatellite motifs are widely distributed in the karyotype, with the exception of the (CGG)<sub>10</sub>, (GAA)<sub>10</sub> and (GAG)<sub>10</sub> probes, which highlight conspicuous interstitial signals on a single pair of chromosomes. Comparative data from conventional and molecular cytogenetics, including CGH and WCP experiments, indicate that H. odoe and some Erythrinidae species, particularly Erythrinus erythrinus, share similar chromosomal sequences, suggesting some relatedness among them, although they bear genomic specificities in view of their divergent evolutionary histories.

Keywords: fishes, molecular cytogenetics, chromosomal painting, comparative genomic hybridization (CGH), karyotype evolution

# INTRODUCTION

fgene-08-00203 December 10, 2017 Time: 16:5 # 2

Characiformes comprises 24 families and more than 2100 species (Eschmeyer and Fong, 2017), distributed in many Neotropical and Ethiopian rivers (Nelson et al., 2016). As they are exclusively freshwater fishes, their evolutionary history is related to the fragmentation and settlement of continents and to the development of natural barriers during their dispersion throughout secondary habitats (Vari and Malabarba, 1998; Oliveira et al., 2007).

The most primitive characiforms are the African citharinoids (Arroyave et al., 2013), and the relationship between the Neotropical and Ethiopian species may be closely linked with the Gondwana break-up, with a fast diversification established in a new habitat free of competition (Calcagnotto et al., 2005). Despite significant efforts in morphological and molecular analyses, phylogenetic relationships are still uncertain for several groups, and even the monophyly of Characiformes is still debated (Arcila et al., 2017).

The wide diversification of the characiforms is highlighted by the high karyotype variability found within distinct Neotropical groups, showing the fast evolution of these fishes, as expected from the high fragmentation observed in the South American rivers, in contrast with the African ones, which present lower fragmentation and variability (Ortí and Meyer, 1997; Oliveira et al., 2007). One example of such a scenario concerns the Erythrinidae, a small family widely distributed throughout South America, consisting of the genera Erythrinus, Hoplerythrinus, and Hoplias (Oyakawa, 2003). The cytogenetics of erythrinid fishes has been extensively investigated over the years, especially for H. malabaricus and E. erythrinus, where a variety of chromosomal features occurs even within the same nominal species, thus supporting the presence of species complexes (Bertollo, 2007; Cioffi et al., 2012). In fact, erythrinids hold a variety of different karyomorphs, with diploid numbers (2n) varying from 2n = 39 in Hoplias malabaricus (karyomorph D) to 2n = 54 in Erythrinus erythrinus (karyomorph A), in addition to distinct sex chromosome systems with independent origins and particular evolutionary trajectories (Cioffi et al., 2013). The diploid number found for most Erythrinus species (2n = 54) is also the most common one observed for Characiformes, and possibly represents the ancestral condition for this order (Oliveira et al., 2007; Cioffi et al., 2012). However, the evolutionary relationships among its families are still not fully understood. A recent phylogeny based on 1,051 genetic markers showed that the African families Hepsetidae and Alestidae are closely related to each other and, to a lesser degree, to some Neotropical families, such as Erythrinidae, Cynodontidae and Hemiodontidae, but not to Lebiasinidae and Ctenoluciidae (Arcila et al., 2017).
This result is not fully consistent with some previous phylogenetic proposals (Ortí and Meyer, 1997; Buckup, 1998; Calcagnotto et al., 2005), in which some of the above families were found to be related.

Nevertheless, except for the Erythrinidae (see above), the karyotypes of most of these families remain poorly analyzed, thus limiting any comparative evolutionary studies among them at the chromosomal level. Karyological data for Hemiodontidae are mainly restricted to chromosome numbers, with all species presenting the same diploid number (2n = 54) and bi-armed chromosomes (Arefjev, 1990; Porto et al., 1992, 1993; Arai, 2011). Concerning Cynodontidae, the only species analyzed to date (Rhaphiodon vulpinus) also presents the same 2n and karyotype structure (Pastori et al., 2009). In turn, the available chromosome data for Lebiasinidae are also mainly restricted to chromosome numbers (Scheel, 1973; Arai, 2011), with the exception of a few species (Arefjev, 1990; Oliveira et al., 1991). Despite this limitation, their diploid numbers are highly diverse, ranging from 2n = 22 in Nannostomus unifasciatus to 2n = 46 in N. trifasciatus (Oliveira et al., 2007; Arai, 2011). The occasional occurrence of large metacentric pairs, such as in N. unifasciatus (Arefjev, 1990), points to Robertsonian fusions in karyotype differentiation. Pyrrhulina australis and Pyrrhulina aff. australis share 2n = 40 (4st + 36a); however, a significant genomic divergence was found between them, evidencing that they correspond to distinct evolutionary units (Moraes et al., 2017). As regards Ctenoluciidae, four species of the genus Boulengerella from the Amazon River basin (Brazil) showed 2n = 36 and a very similar karyotype organization. A conspicuous chromosomal heteromorphism in male specimens points to a possible XX/XY sex chromosome system in these species (de Souza E Sousa et al., 2017). In addition, Ctenolucius hujeta (2n = 36) is the only other species of Ctenoluciidae whose chromosome number has been analyzed (Arefjev, 1990), coinciding with that found for the Boulengerella species.

The Hepsetidae family contains only a single genus (Hepsetus) and, for a long time, H. odoe was considered the only valid species. However, five additional species have been described by recent studies: H. kingsleyae, H. lineatus, H. occidentalis, H. cuvieri, and H. microlepis, distributed throughout the South, Central and Western regions of Africa (Decru et al., 2012, 2013a,b, 2015), where they have great significance for local economy (Kareem et al., 2016). Despite the economic and evolutionary importance of this group, no chromosome data are available for any Hepsetidae species.

In the present study, we provide, for the first time, cytogenetic data for Hepsetus odoe, including the standard karyotype, C-banding, repetitive DNA mapping, comparative genomic hybridization (CGH) and whole chromosome painting (WCP), in order to investigate its chromosomal patterns and provide a basis for comparative analyses with some Neotropical fish families. This study is the first of a series focusing on the cytogenetics and cytogenomics of the African species and their karyoevolutionary processes.

#### MATERIALS AND METHODS

### Specimens, Chromosome Preparations, C-banding and DNA Samples

Eleven specimens of Hepsetus odoe (6 males and 5 females) from the Opa Reservoir, Obafemi Awolowo University, Nigeria (6° 51′ 45″ N, 4° 79′ 00″ E) were analyzed (**Figure 1**). The specimens were transferred to laboratory aquaria and kept under standard conditions for 1 day prior to the experiments. As H. odoe is not a CITES-listed threatened species, no specific authorization was required for sampling and/or transportation. All specimens were deposited in the Museu de Zoologia of the Universidade de São Paulo (MZUSP), under the accession No. 119844. Mitotic chromosomes were obtained by the protocols described in Bertollo et al. (2015), and the experiments followed ethical standards in accordance with the Ethics Committee on Animal Experimentation of the Universidade Federal de São Carlos (Process number CEUA 1853260315). The C-positive heterochromatin was detected using the barium hydroxide protocol (Sumner, 1972). The genomic DNA was extracted according to standard phenol–chloroform procedures (Sambrook and Russell, 2001).

adapted from http://geografiahistoriajodar.blogspot.com.br/.

# Probes for Chromosome Hybridization

A total of 11 repetitive DNA sequences, including four multigene families (U1 and U2 snDNA, 5S and 18S rDNAs) and seven microsatellite repeat motifs, (A)<sub>30</sub>, (CA)<sub>15</sub>, (GA)<sub>15</sub>, (CAC)<sub>10</sub>, (CGG)<sub>10</sub>, (GAA)<sub>10</sub> and (GAG)<sub>10</sub>, were used as probes for FISH experiments. The oligonucleotide probes were directly labeled with Cy3 during synthesis according to Kubat et al. (2008). The other four tandemly arrayed DNA sequences were obtained via PCR from the nuclear DNA of H. odoe. The 5S rDNA repeat copy included 120 base pairs (bp) of the 5S rRNA transcribing gene and 200 bp of the non-transcribed spacer (NTS), produced according to Pendás et al. (1994). The second probe contained 1,400-bp repeats of the 18S rRNA gene, obtained according to Cioffi et al. (2009). Both rDNA probes were cloned into plasmid vectors and propagated in DH5α Escherichia coli competent cells (Invitrogen, San Diego, CA, United States). The U1 and U2 snDNA sequences were produced by PCR, according to Cross et al. (2005) and Silva et al. (2015), respectively. All these probes were directly labeled with Spectrum Orange-dUTP by nick translation, according to the manufacturer's recommendations (Roche, Mannheim, Germany), with the exception of 5S rDNA, which was directly labeled with Spectrum Green-dUTP, also by nick translation (Roche, Mannheim, Germany).

# Fluorescence in Situ Hybridization (FISH) for Repetitive DNA Mapping

Fluorescence in situ hybridization (FISH) was performed under high stringency conditions on metaphase chromosome spreads, as described in Yano et al. (2017a). The chromosome slides were incubated with RNase (10 µg/mL) for 1 h at 37 °C in a wet chamber, washed for 5 min in 1x PBS and incubated with 0.005% pepsin for 10 min at room temperature. This was followed by a wash in 1x PBS, fixation with 1% formaldehyde for 10 min at room temperature, and another 1x PBS wash. The slides were then passed through an ethanol series of 70, 85, and 100%, 2 min each, followed by DNA denaturation in 70% formamide/2x SSC for 3 min at 75 °C. After denaturation, the chromosome spreads were dehydrated in an ethanol series of 70, 85, and 100% at room temperature, 2 min each. Then, 20 µL of the hybridization mixture (100 ng probes, 50% deionized formamide, 10% dextran sulfate) was applied to the slides, and hybridization was performed for 16–18 h at 37 °C in a wet chamber containing 2x SSC. A post-hybridization wash was carried out with 2x SSC for 5 min, followed by another wash in 1x SSC at 42 °C for 5 min. A final washing series was then performed at room temperature, consisting of 1x PBS for 5 min and ethanol at 70, 85, and 100% for 2 min each. Finally, the chromosomes were counterstained with DAPI (1.2 µg/mL) and the slides mounted with an antifading solution (Vector, Burlingame, CA, United States).
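The pre-hybridization part of this protocol can be written down as a simple checklist, for example to estimate bench time. The Python sketch below is only an illustration: the data layout and helper names are hypothetical, and the two PBS washes whose duration is not given in the text are marked `None` rather than guessed.

```python
# (step description, minutes); None marks a duration not stated in the text
PRE_HYBRIDIZATION = [
    ("RNase 10 ug/mL, 37 C, wet chamber", 60),
    ("1x PBS wash", 5),
    ("0.005% pepsin, room temperature", 10),
    ("1x PBS wash", None),
    ("1% formaldehyde, room temperature", 10),
    ("1x PBS wash", None),
    ("ethanol series 70/85/100%", 3 * 2),
    ("denaturation in 70% formamide/2x SSC, 75 C", 3),
    ("ethanol series 70/85/100%", 3 * 2),
]


def timed_minutes(steps):
    """Sum the durations that are explicitly stated."""
    return sum(minutes for _, minutes in steps if minutes is not None)


def untimed_steps(steps):
    """List the steps whose duration still has to be looked up."""
    return [name for name, minutes in steps if minutes is None]


print(timed_minutes(PRE_HYBRIDIZATION), "min stated;",
      len(untimed_steps(PRE_HYBRIDIZATION)), "steps without a stated time")
```

The overnight hybridization (16–18 h) and the post-hybridization washes are left out of this list on purpose, since they are usually scheduled separately.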

# Chromosomal Microdissection, Probe Preparation and Labeling

Fifteen copies of the following chromosomes were isolated by microdissection and amplified using the procedure described in Yang et al. (2009): (i) X chromosome of Hoplias malabaricus karyomorph B (HMB-X); (ii) Y<sub>1</sub> chromosome of H. malabaricus karyomorph G (HMG-Y1) and (iii) Y chromosome of Erythrinus erythrinus karyomorph D (ERY-Y). These probes were labeled with Spectrum Orange-dUTP (ERY-Y) or Spectrum Green-dUTP (HMB-X and HMG-Y1) (Vysis, Downers Grove, IL, United States) in a secondary DOP-PCR using 1 µL of the primarily amplified product as template DNA, following Yang et al. (2009).

### FISH of Whole Chromosome Specific Probes (WCP)

Chromosomal preparations of males and females of H. odoe were used for Zoo-FISH experiments with all the above-mentioned probes. The hybridization procedures followed Yano et al. (2017a). To block the hybridization of high-copy repeat sequences, 60 µg of C<sub>0</sub>t-1 DNA directly isolated from H. malabaricus (karyomorphs B and G) and E. erythrinus (karyomorph D) male genomes was prepared according to Zwick et al. (1997). Hybridization was performed for 144 h at 37 °C in a moist chamber. The post-hybridization wash was carried out with 1x SSC for 5 min at 65 °C and in 4x SSC/Tween using a shaker at room temperature, and the slides were then rinsed quickly in 1x PBS. Subsequently, the slides were dehydrated in an ethanol series (70, 85, and 100%), 2 min each. Finally, the chromosomes were counterstained with DAPI (1.2 µg/mL) and mounted in antifade solution (Vector).

# Probes for Comparative Genomic Hybridization (CGH)

The gDNA of H. odoe was used for comparative analyses with the gDNAs of several Erythrinidae species, namely E. erythrinus (karyomorph D), Hoplias lacerdae, H. malabaricus (karyomorph A) and Hoplerythrinus unitaeniatus (karyomorph D). The gDNA of H. odoe was labeled with biotin-16-dUTP using BIO-nick-translation Mix (Roche), while the male-derived gDNAs of E. erythrinus, H. malabaricus, H. unitaeniatus, and H. lacerdae were labeled with digoxigenin-11-dUTP using DIG-nick-translation Mix (Roche, Mannheim, Germany). In all experiments, C<sub>0</sub>t-1 DNA (i.e., the fraction of genomic DNA enriched for highly and moderately repetitive sequences), prepared according to Zwick et al. (1997), was used to block common genomic repetitive sequences. The final probe was composed of 500 ng of H. odoe gDNA plus 500 ng of the corresponding gDNA for each Erythrinidae species. The probe was ethanol-precipitated and the dry pellet dissolved in a hybridization buffer (20 µL per slide) containing 50% formamide, 2x SSC, 10% SDS, 10% dextran sulfate and Denhardt's buffer, pH 7.0.

### Fluorescence in Situ Hybridization for CGH

CGH experiments were performed according to Symonová et al. (2013). Slides with the metaphase plates were stored overnight in a freezer and passed through an ethanol series of 70, 85, and 100%, 3 min each, before and after storage. After that, the slides were aged for 1–2 h at 60 °C, washed in 2x SSC for 5 min, treated with RNase (200 µg/mL) for 90 min at 37 °C in a wet chamber and then washed in 2x SSC for 30 s. This was followed by another ethanol series, a wash in 1x PBS for 5 min, a pepsin (50 µg/mL) treatment, a wash in 1x PBS for 5 min and an additional ethanol series. Finally, the material was denatured in 75% formamide/2x SSC at 74 °C for 3 min, followed by an ethanol series starting with cold 70% ethanol. Then, 20 µL of the probe mixture was applied to the slides, which were then incubated at 37 °C in a dark humid chamber for 3 days, with rubber-sealed coverslips. The rubber cement and coverslips were removed in a solution of 4x SSC/0.1% Tween. The slides were then washed twice in 50% formamide/2x SSC for 10 min each, three times in 1x SSC, rinsed in 2x SSC at room temperature, and incubated for 20 min in a humid chamber with 500 µL of 3% BSA/4x SSC/Tween, with coverslips. The hybridization signal was detected with anti-digoxigenin-Rhodamine (Roche) diluted in 0.5% bovine serum albumin (BSA) in PBS, and avidin-FITC (Sigma) diluted in PBS containing 10% normal goat serum (NGS). Four final washes were performed at 44 °C in 4x SSC/0.1% Tween, 7 min each. Finally, the chromosomes were counterstained with DAPI (1.2 µg/mL) and mounted in an antifade solution (Vector, Burlingame, CA, United States).

#### Microscopic Analyses


At least 30 metaphase spreads per individual were analyzed to confirm the diploid number, karyotype structure and FISH results. Images were captured using an Olympus BX50 microscope (Olympus Corporation, Ishikawa, Japan) with CoolSNAP and the images processed using Image Pro Plus 4.1 software (Media Cybernetics, Silver Spring, MD, United States). Chromosomes were classified as metacentric (m), submetacentric (sm), subtelocentric (st) or acrocentric (a), according to their arm ratios (Levan et al., 1964).
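The arm-ratio classification can be sketched in a few lines of Python. The cut-offs used below (1.7, 3.0 and 7.0 for long arm/short arm) are the values commonly cited for the Levan et al. (1964) nomenclature; how the exact boundary values are assigned, and the function name itself, are assumptions made for illustration.

```python
def classify_chromosome(long_arm, short_arm):
    """Classify a chromosome from its arm lengths (same units for both).

    Returns "m", "sm", "st" or "a" using arm ratio = long / short.
    Boundary handling (<=) is an assumption of this sketch.
    """
    if short_arm <= 0:
        return "a"  # no measurable short arm
    ratio = long_arm / short_arm
    if ratio < 1:
        raise ValueError("long_arm must be at least as long as short_arm")
    if ratio <= 1.7:
        return "m"
    if ratio <= 3.0:
        return "sm"
    if ratio <= 7.0:
        return "st"
    return "a"


# Hypothetical measurements in micrometers
for arms in [(1.0, 1.0), (2.0, 1.0), (5.0, 1.0), (8.0, 1.0)]:
    print(arms, classify_chromosome(*arms))
```

Because st and a chromosomes are often hard to separate on condensed fish chromosomes, the two classes are reported together as st/a in the karyotype formula.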

# RESULTS

#### Karyotype Composition and C-banding

All specimens, both males and females, have 2n = 58 (10m + 28sm + 20st/a). The C-positive heterochromatin is mostly localized in the centromeric and subtelomeric regions, with a more conspicuous block present in the 28th chromosome pair of the karyotype (**Figure 2**).
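A karyotype formula of this kind can be checked against the diploid number with a short script. The sketch below is illustrative only; the `parse_karyotype` helper is hypothetical and simply sums the chromosome counts per morphological class.

```python
import re


def parse_karyotype(formula):
    """Turn a formula such as '10m + 28sm + 20st/a' into a dict of counts."""
    return {
        label: int(count)
        for count, label in re.findall(r"(\d+)\s*([a-z/]+)", formula)
    }


counts = parse_karyotype("10m + 28sm + 20st/a")
total = sum(counts.values())
print(counts, total)
```

Here the counts refer to individual chromosomes, so their sum should equal 2n; a formula written in chromosome pairs would sum to n instead.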

#### Chromosomal Mapping of Repetitive DNAs

The 5S rDNA occurs in the proximal region of the short arms of only one sm chromosome pair, while the 18S rDNA is located in the terminal region of the long arms of four chromosome pairs (1m + 2sm + 1st) (**Figure 2**). Clear hybridization signals were observed in one pair of chromosomes for the U1 snDNA and in two chromosome pairs for the U2 snDNA, located interstitially and telomerically, respectively (**Figure 3**). The microsatellite motifs showed widely distributed signals. Signals were mainly telomeric, but some were also interstitial, as for the (A)<sub>30</sub>, (CA)<sub>15</sub>, (GA)<sub>15</sub>, (CAC)<sub>10</sub> and (GAG)<sub>10</sub> probes. Exceptions to this general pattern were presented by the (CGG)<sub>10</sub>, (GAA)<sub>10</sub> and (GAG)<sub>10</sub> probes, which highlighted a conspicuous interstitial signal on a single pair of chromosomes (**Figure 3**).

### Comparative Genomic Hybridization (CGH)

The comparative genomic hybridization showed that the gDNA of H. odoe shares some homologies with those of the Erythrinidae species analyzed. Despite some scattered hybridization, labeled telomeric and pericentromeric regions were evident, varying with the species. The hybridization pattern with E. erythrinus stands out, with some whole chromosome pairs labeled in addition to telomeric overlaps in others. An acrocentric chromosome of H. odoe that presented hybridization signals only with the gDNA of H. unitaeniatus was also highlighted (**Figure 4**).

### Detection of Chromosomal Homeologies by Zoo-FISH Experiments

Hybridization with the HMB-X probe (X chromosome from Hoplias malabaricus karyomorph B) strongly painted one small st/a chromosome of H. odoe (**Figure 5a**). Both the HMG-Y1 (Y<sub>1</sub> chromosome from H. malabaricus karyomorph G) and ERY-Y (Y chromosome from Erythrinus erythrinus karyomorph D) probes painted the p arms of medium-sized st/a chromosomes of H. odoe (**Figures 5b,c**). In addition, the ERY-Y probe produced a faint scattered hybridization pattern on several other chromosomes of H. odoe (**Figure 5c**).

# DISCUSSION

#### General Chromosome Features of Hepsetus odoe

The lack of karyotype data for several fish groups impairs comparative analyses of their evolutionary trends and chromosomal relationships. This is the case for the African family Hepsetidae, whose chromosomal characteristics were completely unknown. This study is the first to provide classical and molecular cytogenetic data for one of its representative species, H. odoe.

Both male and female specimens of H. odoe have the same karyotype structure, with 2n = 58 (10m + 28sm + 20st/a) and no evidence of differentiated sex chromosomes. The heterochromatin distribution follows the general pattern found in many other fish species, with preferential centromeric localization. Only one chromosome pair bears proximal 5S rDNA sites in its short arms, in contrast to the 18S rDNA sequences, which are located in the telomeric regions of four different pairs in the karyotype. This distribution of the two multigene families is shared among many fish groups (Pendás et al., 1994; Gornung, 2013), where the clustering of the 5S and 18S rDNAs on different chromosomes may avoid unwanted chromosomal changes between them (Martins and Galetti, 1999). In addition, this differential clustering also holds for the U2 snDNA sequences, since the cytogenetic mapping of the different genes that compose this multigene family, although scarce among fishes, shows a preferential distribution among distinct chromosomes (reviewed in Yano et al., 2017b), as also observed in H. odoe.

With respect to microsatellites, although the scattered distribution of some of them is of limited use for comparative approaches, the conspicuous interstitial bands that the (GAG)<sub>10</sub>, (CGG)<sub>10</sub> and (GAA)<sub>10</sub> probes highlighted in the genome of H. odoe constitute important markers for comparative evolutionary analyses with other Hepsetidae and closely related species. In fact, the clustering of microsatellites represents important evolutionary stages by composing non-coding genome regions, as well as relevant steps in the sex chromosome differentiation process (Bergero and Charlesworth, 2009).

DNA sequence analysis strongly suggests that Hepsetidae and Alestidae are phylogenetically closely related families (Oliveira et al., 2011; Arcila et al., 2017). The present data set for H. odoe is therefore a useful tool for complementary investigations covering other Hepsetidae and Alestidae species. In fact, this study is the first of a series focusing on the cytogenetics and cytogenomics of these African families, aimed at investigating their karyoevolutionary processes and relatedness.

FIGURE 2 | Hepsetus odoe male karyotypes under standard Giemsa staining, C-banding and double-FISH with 18S rDNA (red) and 5S rDNA (green) probes. Both males and females have the same karyotypes. Bar = 5 µm.

FIGURE 3 | Metaphase plates of Hepsetus odoe hybridized with repetitive DNA sequences, including mono-, di- and trinucleotide microsatellites and the multigene families U1 and U2 snDNAs. Bar = 5 µm.

### Comparative Cytogenetics of Hepsetus odoe with Other Characiformes Species

Some previous phylogenetic studies (Ortí and Meyer, 1997; Buckup, 1998; Calcagnotto et al., 2005) have suggested a relationship between Hepsetidae and some other Neotropical groups, such as the Erythrinidae, Ctenoluciidae and Lebiasinidae, although without a full consensus among them. Using new sequencing technology together with phylogenetic reconstructions, a new scenario was evidenced, discarding relationships of Hepsetidae with Lebiasinidae and Cnetoluciidae and, instead off, placing Hepsetidae and Alestidae in a closer clade which has a near position in the phylogenetic tree to some other Neotropical families, such as Erythrinidae, Cynodontidae and Hemiodontidae (Oliveira et al., 2011; Arcila et al., 2017). In this way, as the cytogenetic studies among Cynodontidae and Hemiodontidae families are until now restricted to 2n descriptions in few species, and Alestidae species are still unavailable in spite of recent collecting efforts, we performed a comparative analysis among H. odoe and Erythrinidae, Ctenoluciidae and Lebiasinidae

FIGURE 4 | Comparative genomic hybridization (CGH) in metaphase plates of Hepsetus odoe. First column: DAPI images (blue); Second column: hybridization pattern with Hepsetus odoe (Hep) gDNA probe; Third column: Hybridization patterns with Hoplias malabaricus (HMA) gDNA, Hoplias lacerdae (HLA) gDNA, Erythrinus erythrinus (ERY) gDNA and Hoplerythrinus unitaeniatus (HPL) gDNA probes; Fourth column: merged images of each genomic probes and DAPI staining. The common genomic regions are depicted in yellow. Bar = 5 µm.

FIGURE 5 | Whole chromosome painting (WCP) in metaphase plates of Hepsetus odoe showing the chromosomes hybridized with (a) the X chromosome of Hoplias malabaricus karyomorph B (HMB-X), (b) the Y<sup>1</sup> chromosome of Hoplias malabaricus karyomorph G (HMG-Y1) and (c) the Y chromosome of Erythrinus erythrinus karyomorph D (ERY-Y).

FIGURE 6 | Representative idiograms of Hepsetus odoe and Erythrinidae species: Erythinus erythrinus (ERY) karyomorphs (Kar) A, C, D; Hoplias malabaricus (HMA) karyomorphs A. B, C, D, F; Hoplias lacerdae (HLA); Hoplias aimara (HAI) and Hoplerythrinus unitaeniatus (HPL) karyomorphs A.C, D. The distribution of the 18S and the 5S rDNAs for each species are highlighted in red and green, respectively. The sex chromosomes are boxed. Data from Cioffi et al. (2009), Martins et al. (2013), Martinez et al. (2015), and Oliveira et al. (2015).

representatives. In this sense, **Figures 6**, **7** depict some data, including chromosome number, karyotype organization, sex chromosome systems and distribution of the major and minor rDNA sequences in some Erythrinidae, Lebiasinidae, and Ctenoluciidae species. A general overview clearly indicates that Erythrinidae retains more characters resembling those of H. odoe than do Lebiasinidae and Ctenoluciidae species. Indeed, Erythrinus erythrinus (2n = 54/52), Hoplias lacerdae and H. aimara (2n = 50) and Hoplerythrinus unitaeniatus (2n = 48/52) show diploid numbers closer to that of H. odoe (2n = 58) than Pyrrhulina (2n = 40; Lebiasinidae) and Boulengerella (2n = 36; Ctenoluciidae) species.

In particular, within Erythrinidae, E. erythrinus stands out as having more chromosomal similarities with H. odoe than the other species, taking into account the broad organization of the karyotype and the number of mono-armed chromosomes. In fact, E. erythrinus karyomorph A shows the most basal karyotype within this genus, considering that the other Erythrinus karyomorphs display clearly derived features, such as the differentiation of a multiple X1X1X2X2/X1X2Y sex chromosome system (Bertollo et al., 2004) and the huge dispersion of the 5S rDNA sequences in the genome (Cioffi et al., 2010; Martins et al., 2013). In addition, like H. odoe, E. erythrinus karyomorph A presents only one chromosome pair bearing 5S rDNA sequences at a similar position on the chromosomes, as well as a number of exclusively telomeric 18S rDNA sites. However, whereas in H. odoe the major rDNA sequences are distributed only in the long arms of the chromosomes, in E. erythrinus they are found in both the short and the long arms (Cioffi et al., 2010). This is not an unexpected condition in view of the differential distributions that can arise along the evolutionary history of the species. In fact, repetitive DNAs have played a particular role in fish karyotyping (Cioffi and Bertollo, 2012), and variations in the amount and types of several classes of repetitive DNAs are expected considering the inherent dynamism of these sequences during the evolutionary history of different taxa (Kubat et al., 2008; Cioffi et al., 2010, 2012; Pokorná et al., 2011; Yano et al., 2016). In spite of this, the distribution pattern of the (GAG)<sub>10</sub> microsatellites in H. odoe also shows a significant accumulation on the E. erythrinus chromosomes (Yano et al., 2014).

Considering the above correlations between Hepsetus and Erythrinidae, comparative genomic hybridization (CGH) and whole chromosome painting (WCP) were also performed to obtain additional informative markers for comparative cytogenetics. Among fishes, CGH has already been applied for several purposes, such as to compare genomes of closely related species (Zhu and Gui, 2007; Knytl et al., 2013; Majtánová et al., 2016; Moraes et al., 2017), to detect parental genomes in hybrids (Symonová et al., 2013; Pereira et al., 2014), and to elucidate the origin and evolution of B and sex chromosomes (Fantinatti et al., 2011; Freitas et al., 2017; Yano et al., 2017c), among others. In the present case, CGH with four Erythrinidae species evidenced the co-localization of scattered signals in almost all chromosomes of H. odoe, together with preferential signals in the terminal parts of some chromosomes, thus indicating the shared repetitive content of such regions. However, the hybridization pattern with E. erythrinus stands out, as some whole chromosome pairs were painted, in addition to telomeric overlaps in others. Furthermore, the hybridization with H. odoe gDNA revealed the occurrence of conspicuous species-specific regions, very likely as a result of its particular evolutionary history, given that the resolution of the CGH method predominantly relies on the presence of species-specific (or sex-specific) repetitive DNA sequences and the evolutionary distance of the compared genomes.

Besides CGH, WCP experiments were also performed using microdissected sex chromosomes from H. malabaricus karyomorphs B (HMB-X) and G (HMG-Y1) and E. erythrinus karyomorph D (ERY-Y) as probes, in order to verify the occurrence of putative sex chromosomes in H. odoe. As a control experiment, all probes were previously hybridized in male chromosomal preparations of H. malabaricus (karyomorphs B and G) and E. erythrinus (karyomorph D), clearly demonstrating the hybridization signals on the sex chromosomes of these karyomorphs, thus corroborating previous data (Cioffi et al., 2013; Oliveira et al., 2017). When these probes were hybridized to chromosomal preparations of H. odoe, HMB-X strongly painted one small st/a chromosome, while HMG-Y1 and ERY-Y probes painted the p arms of medium-sized st/a chromosomes. These results highlight that such linkage groups are shared by H. odoe and Erythrinidae species, corroborating the CGH experiments, which also demonstrated the sharing of a considerable genomic fraction among such groups. The maintenance of such linkage groups is somewhat surprising considering the phylogenetic distance between these clades. However, chromosome homology across phylogenetically distant clades has also been detected in several mammal (Balmus et al., 2007; Dementyeva et al., 2010; Kulemzina et al., 2011), bird (Oliveira et al., 2008, 2010; Tagliarini et al., 2011) and lizard (Pokorná et al., 2011) species. In the latter, Zoo-FISH experiments using a Z-derived probe from Gallus gallus showed that the fraction of the reptile genome that is homologous to the avian Z chromosome exhibits a conserved synteny, despite the very ancient time (∼275 Mya) of their divergence (Pokorná et al., 2011).

### CONCLUSION

This study, focusing on standard and molecular cytogenetic approaches in H. odoe, represents the first data set for a Hepsetidae species. Our data support the likely proximity between African and Neotropical families, such as Hepsetidae and Erythrinidae. Our experiments, including CGH and WCP, indicate that H. odoe and some Erythrinidae species, especially from the genus Erythrinus, share similar chromosomal sequences, thus reflecting some degree of relationship among them. In fact, Erythrinus seems to carry the most basal karyotype organization within Erythrinidae, and likely the closest to that shown by H. odoe. This study is the first in a series of investigations focusing on the chromosomal and genomic characteristics of African Characiformes, allowing a broader and more detailed view of the evolutionary history of this group through a cytogenetic approach. Such additional data will certainly improve our knowledge about the relatedness of the African and the Neotropical characiform families.

#### AUTHOR CONTRIBUTIONS


PC carried out the cytogenetic analysis and drafted the manuscript. EdO, CY, AA-R, and TH helped in the cytogenetic analysis, drafted and revised the manuscript. CO, ED, OJ, and TL drafted and revised the manuscript. MC and LB coordinated the study, drafted and revised the manuscript. All authors read and approved the final version of the manuscript.

#### REFERENCES


#### ACKNOWLEDGMENTS

This study was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (Proc. nos. 304992/2015-1 and 401575/2016-0) and Fundação de Amparo à Pesquisa do Estado de São Paulo – FAPESP (Proc. nos. 2016/21411-7, 2016/17556-0 and 2017/08471-3) (FAPESP 2014/26508-3 and CNPq 306054/2006-0 to CO).

the fish, Hoplias malabaricus. BMC Genetics 10:34. doi: 10.1186/1471-2156-10-34



An ongoing process narrated by repetitive sequences. J. Heredity 107, 342–348. doi: 10.1093/jhered/esw021


Zwick, M. S., Hanson, R. E., McKnight, T. D., Islam-Faridi, M. N., Stelly, D. M., Wing, R. A., et al. (1997). A rapid procedure for the isolation of C0t-1 DNA from plants. Genome 40, 138–142. doi: 10.1139/g97-020

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Carvalho, de Oliveira, Bertollo, Yano, Oliveira, Decru, Jegede, Hatanaka, Liehr, Al-Rikabi and Cioffi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Population Genetic Structure of Cnesterodon decemmaculatus (Poeciliidae): A Freshwater Look at the Pampa Biome in Southern South America

#### Aline M. C. Ramos-Fregonezi<sup>1,2</sup>, Luiz R. Malabarba<sup>3</sup> and Nelson J. R. Fagundes<sup>1</sup>\*

<sup>1</sup> Laboratory of Medical and Evolutionary Genetics, Department of Genetics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil, <sup>2</sup> Laboratory of Bioinformatics and Evolution, Department of General Biology, Federal University of Viçosa, Viçosa, Brazil, <sup>3</sup> Laboratory of Ichthyology, Department of Zoology, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Paulo Affonso, Universidade Estadual do Sudoeste da Bahia, Brazil Tomas Hrbek, Federal University of Amazonas, Brazil

> \*Correspondence: Nelson J. R. Fagundes nelson.fagundes@ufrgs.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 31 August 2017 Accepted: 04 December 2017 Published: 19 December 2017

#### Citation:

Ramos-Fregonezi AMC, Malabarba LR and Fagundes NJR (2017) Population Genetic Structure of Cnesterodon decemmaculatus (Poeciliidae): A Freshwater Look at the Pampa Biome in Southern South America. Front. Genet. 8:214. doi: 10.3389/fgene.2017.00214

The Pampas is a Neotropical biome formed primarily by low-altitude grasslands that encompasses the southernmost portion of Brazil, Uruguay, and part of Argentina. Despite the high level of endemism and its significant environmental heterogeneity, Pampean species are underrepresented in phylogeographic studies, especially aquatic organisms. The Pampean hydrological system resulted from a long history of tectonism, climate, and sea level changes since the Neogene. In this study, we examined the population genetic structure of Cnesterodon decemmaculatus, a freshwater fish species that occurs throughout most of the Pampa biome. We characterized mitochondrial and autosomal genetic lineages in populations sampled from Southern Brazil and Uruguay to investigate (1) the correspondence between current drainage systems and evolutionary lineages, (2) the demographic history for each genetic lineage, and (3) the temporal depth of these lineages. Overall, we found that the major evolutionary lineages in this species are strongly related to the main Pampean drainage systems, even though stream capture events may have affected the distribution of genetic lineages among drainages. There was evidence for recent population growth in the lineages occupying drainages closest to the shore, which may indicate the effect of Quaternary sea-level changes. In general, divergence time estimates among evolutionary lineages were shallow, ranging from 20,000 to 800,000 years before present, indicating a geologically recent history for this group, as previously reported in other Pampean species. A Bayesian phylogeographical reconstruction suggested that an ancestral lineage probably colonized the Uruguay River Basin, and then expanded throughout the Pampas. This evolutionary scenario may represent a useful starting model for other freshwater species having a similar distribution.

Keywords: Bayesian phylogeography, mitochondrial DNA, neotropical ichthyology, Pampa biome, stream capture

# INTRODUCTION


The South American Pampa, or "Pampas," is a Neotropical biome dominated by natural grasslands spreading over plains of Uruguay, Northern Argentina, Southern Brazil, and part of Paraguay (Pallarés et al., 2005). However, from the biological standpoint, the Pampas are far from homogeneous. Indeed, many proposals for small biological "provinces" have been made, especially regarding the different plant communities (Cabrera and Willink, 1980; Overbeck et al., 2007), which reflect the mosaic of soil types resulting from the biome's complex geological history. At a smaller scale, such heterogeneity leads to high species endemism and significant genetic structure in the Pampean species studied so far (e.g., Freitas et al., 2012; Turchetto et al., 2014; Felappi et al., 2015). While some phylogeographic studies in the Pampas have found little geographic structure and shallow gene trees [<0.1 million years ago (mya)] (Speranza et al., 2007; Turchetto et al., 2014), other studies found strong geographic structure and a deep mitochondrial gene tree (>2.5 mya) (Felappi et al., 2015). However, these studies did not include freshwater organisms, and, therefore, it is difficult to predict the level and depth of the genetic structure exhibited by these species. Species occurring in the Pampa biome have been underrepresented in studies of phylogeography and conservation genetics (Lawler et al., 2006; Beheregaray, 2008; Turchetto-Zolet et al., 2013), which further complicates the understanding of general drivers of biological diversification in this biome.

The Pampas hydrological system resulted from a long history of tectonism, climate, and sea level changes since the Neogene (Casciotta et al., 1999). During the Quaternary, soil erosion and other marine regression/transgression cycles have continued to shape hydrological systems (Tonni and Cione, 1997; Quattrocchio et al., 2008) by altering the relationship among tributaries, isolated drainages, lagoons, and estuaries (Martin and Dominguez, 1994; Thomaz et al., 2015). These geomorphological processes promoted successive stream capture events, allowing obligate freshwater species to disperse using temporary connections (e.g., Loureiro et al., 2011; Candela et al., 2012; Grassi et al., 2017). Stream capture (also known as river capture, headwater capture, or drainage rearrangement) is a geomorphological process that consists of contact between neighboring drainages. It occurs when the tributary of a river basin starts flowing toward a neighboring basin, and results in ichthyofauna dispersal between them (Bishop, 1995). Temporary connections between adjacent basins caused by fluctuations in the sea level during glacial–interglacial cycles may have also influenced the current distribution of several freshwater fish genera (e.g., Ketmaier et al., 2004; Swartz et al., 2007, 2009; Thomaz et al., 2015), including the Pampean taxa Australoheros, Cnesterodon, Jenynsia, and Corydoras (e.g., Bruno et al., 2011, 2016; Ponce et al., 2011).

It is well accepted that the distribution of freshwater fish lineages mainly reflects the paleogeography of a specific region (Bermingham and Avise, 1986; Bernatchez and Wilson, 1998; Avise, 2000; Lévêque et al., 2008). Cnesterodon decemmaculatus (Jenyns 1842) is endemic to the Pampa biome and is one of the most widespread freshwater fish species in this region. It is found in the freshwater Ecoregions 332 (lower Uruguay) and 334 (laguna dos Patos basin) described by Abell et al. (2008), but exclusively in grassland environments in the southern part of these ecoregions, associated with the Pampa biome, being absent from the northern portion of these ecoregions located in the Atlantic Forest Biome. It is further found in the freshwater ecoregions 345 (Subtropical Potamic axis) and 347 (Bonaerensean Atlantic), also in the Pampas. Thus, this species constitutes an excellent model to examine historical processes that played a major role in shaping the genetic structure of freshwater fishes in the Pampas. In addition, this species can be found in a wide range of habitats, including rivers, ponds, and shallow wetlands, even though it has a poor swimming capacity in high-speed currents (Trenti et al., 1999). A recent study using C. decemmaculatus populations from the southern Pampean region showed low genetic divergence and evidence of recent colonization of this area (Bruno et al., 2016). However, this study focused on the coastal drainages from Argentina (ecoregion 347 – Bonaerensean Atlantic), which represent a small portion of the natural distribution of C. decemmaculatus in the Pampas.

In this study, we used mitochondrial and nuclear genetic markers to evaluate the genetic structure of C. decemmaculatus over its distribution. More specifically, we were interested in answering the following questions: (1) How strong is the correspondence between current drainage systems and evolutionary lineages? (2) Is there evidence of ancient population growth or reduction for each genetic lineage in this species? (3) What is the temporal depth for the evolutionary divergence among lineages and populations? (4) What is the colonization history for this taxon in the Pampas?

# MATERIALS AND METHODS

We sampled 99 individuals from 37 localities (**Figure 1**), covering a significant part of the species' range. Our sampling ranged from one to eight individuals per locality (Supplementary Table S1). All individuals included in our analyses were fixed in 96% ethanol and deposited in the fish collection at the Federal University of Rio Grande do Sul (Universidade Federal do Rio Grande do Sul – UFRGS). All collections were performed under the approval of the government authorities of Brazil and Uruguay (Ministério do Meio Ambiente, Brazil – SISBIO, license number 12038-2, and Dirección de Recursos Naturales del Ministerio de Ganadería, Agricultura y Pesca, Uruguay). The collection and euthanasia of the specimens were approved by the ethics committee of the UFRGS, license number 24434.

Total genomic DNA was isolated from muscle tissue with cetyltrimethyl ammonium bromide (CTAB) as described by Doyle and Doyle (1987). We tested two mitochondrial (mtDNA) genes: cytochrome oxidase subunit I (COI) (Herbert et al., 2003) and NADH dehydrogenase 2 (ND2) (Sorenson, 2003) and two nuclear (nDNA) genes: SH3 and PX domain containing 3-like protein (SH3PX3) and myosin heavy chain 6 (Myh6) (Li et al., 2007). However, SH3PX3 and COI had insufficient variation in a preliminary sample of 10 individuals per basin and, therefore, were excluded from further analysis. PCR amplification protocols for ND2 and Myh6 followed Sorenson (2003) and Li et al. (2007), respectively. PCR products were checked on a 1% agarose gel, purified with Exonuclease I and Shrimp Alkaline Phosphatase (GE Healthcare®) and sequenced using the Sanger method at Macrogen Inc. (Seoul, South Korea). Both DNA strands were sequenced, checked, and aligned with GeneiousPro v.4.8 (Drummond et al., 2007). Haplotypes of the Myh6 gene were estimated in PHASE 2.1 (Stephens et al., 2001) using 10,000 steps, sampling every 10 steps and discarding the first 1000 steps as burn-in. We ran PHASE several times using different starting seeds to ensure the reliability of the final estimate. Mitochondrial haplotypes were defined using DnaSP 5.10 (Rozas et al., 2003). All sequences obtained in the present study were deposited in GenBank (KU214332–KU214430 and KU214252–KU214331, for ND2 and Myh6 genes, respectively).

Dispersal of obligate freshwater fish requires connections of aquatic habitats between adjacent basins, and most taxa have distribution ranges that closely coincide with, if not equal, the boundaries of the hydrographic basins in which they live (Ward et al., 1994; Albert and Reis, 2011). Based on this, and also because we found a general genetic structure associated with major Pampean drainages, we defined four geographical groups for population analysis: Uruguay River Basin (URU), Negro River Basin (NEG), Mirim Lagoon Basin (MIR), and Southern coastal Basins of Uruguay (SOU) (**Figure 1** and Supplementary Table S1). Because independent drainage systems represent plausible boundaries for isolated biological populations of freshwater organisms, individuals collected in different localities within each basin were merged into a single population, resulting in a sample of 62 individuals for URU, 11 for NEG, 15 for MIR, and 11 for SOU. Following Abell et al. (2008), URU and NEG represent the freshwater Ecoregion 332 (lower Uruguay), MIR represents Ecoregion 334 (laguna dos Patos), while SOU has populations located in both Ecoregions 334 and 345 (lower Paraná).

To determine the correspondence between current drainage systems and evolutionary lineages, we inferred the evolutionary relationship among haplotypes using a median-joining network (Bandelt et al., 1999) estimated in NETWORK 4.1.0.9 (www.fluxus-engineering.com). The resulting genetic structure was quantified using an analysis of molecular variance (AMOVA) (Excoffier et al., 1992) and pairwise Φ<sub>ST</sub> values calculated in Arlequin 3.5 (Excoffier and Lischer, 2010). In addition, we tested for population structure without assuming any hypothesis based on hydrography using the Bayesian clustering algorithm implemented in BAPS (Corander et al., 2008). We used the mixture population model and set the maximum number of populations (K) to 37, the number of sampled local populations. Because stream capture events may affect gene genealogies and molecular diversity patterns within drainages, we decided to run several analyses excluding localities for which we may have evidence for stream capture. We inferred a possible stream capture event whenever we found shared haplotypes between neighboring drainages, especially when populations located close to watersheds were involved (see the section "Discussion" for further details for each case).
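The screening criterion described above (haplotypes shared between drainages) can be illustrated with a minimal sketch. The record layout, the function name `shared_haplotypes`, and the toy haplotype labels are assumptions made for illustration; they are not the study's data, and a shared haplotype only flags a candidate locality that would still need to be evaluated against geography.

```python
from collections import defaultdict
from itertools import combinations


def shared_haplotypes(records):
    """Map each pair of drainages to the set of haplotypes found in both.

    `records` is an iterable of (drainage, locality, haplotype) tuples.
    Only drainage pairs sharing at least one haplotype are returned.
    """
    haps_by_drainage = defaultdict(set)
    for drainage, _locality, haplotype in records:
        haps_by_drainage[drainage].add(haplotype)

    shared = {}
    for a, b in combinations(sorted(haps_by_drainage), 2):
        common = haps_by_drainage[a] & haps_by_drainage[b]
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical toy records (not the study's data).
toy = [
    ("URU", "loc1", "Ha"),
    ("URU", "loc2", "Hb"),
    ("NEG", "loc3", "Hb"),
    ("MIR", "loc4", "Hc"),
    ("SOU", "loc5", "Hc"),
    ("SOU", "loc5", "Hd"),
]
result = shared_haplotypes(toy)
for pair, haps in result.items():
    print(pair, sorted(haps))
```

In this toy input, the URU/NEG and MIR/SOU pairs each share one haplotype, so the corresponding localities would be the ones to inspect further.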

To test for evidence of ancient population growth or reduction for each population (URU, NEG, MIR, and SOU) we first estimated descriptive statistics of genetic diversity, such as nucleotide (π) and haplotype diversity (H), followed by Tajima's D (Tajima, 1983) and Fu's F<sub>S</sub> (Fu, 1997) neutrality tests, which were calculated in Arlequin 3.5 (Excoffier and Lischer, 2010). We performed neutrality tests under two schemes: considering all individuals for a given basin, and excluding localities for which there was evidence of headwater capture. We also estimated the effective population size (N<sub>E</sub>) and the population growth parameter (G) for each population without the localities with evidence for headwater capture (namely URU 7, URU 8, NEG 35, NEG 36, MIR 31), using both markers in the program LAMARC v. 2.1.6 (Kuhner, 2006). We performed a maximum-likelihood search based on three replicates of 10 initial chains and two final chains. The initial chains consisted of 250 samples drawn every 20 steps, and a burn-in of 1000 samples for each chain. The two final chains used the same burn-in and sampling interval, but consisted of 10,000 samples. We assumed a generation time of 1 year based on estimates of sexual maturity for the poeciliid Poecilia reticulata (Reznick and Bryga, 1987).
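For readers unfamiliar with the two descriptive statistics named above, the following minimal sketch computes haplotype diversity as H = n/(n − 1) × (1 − Σp<sub>i</sub><sup>2</sup>) and nucleotide diversity as the mean number of pairwise differences per site. It assumes aligned sequences of equal length without gaps or missing data; the function names and toy sequences are illustrative, and this is not the implementation used by Arlequin.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")
    total_diffs = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / n_pairs / length


# Hypothetical toy alignment (not the study's data).
toy_seqs = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(toy_seqs)
pi = nucleotide_diversity(toy_seqs)
print(f"H = {h:.3f}, pi = {pi:.3f}")
```

For the toy alignment, three haplotypes among four sequences give H = 5/6, and seven differences across six pairs and four sites give π = 7/24.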

Finally, we inferred the divergence times among populations and the colonization history for C. decemmaculatus based on two approaches. First, we estimated a time-calibrated gene-tree genealogy for mtDNA lineages using Bayesian inference in BEAST 1.7.5 (Drummond and Rambaut, 2007). We used a coalescent constant-size tree prior with a random starting tree and the TN93+G model of sequence evolution, as determined by the corrected Akaike Information Criterion (AICc) in jModelTest2 (Darriba et al., 2012). We used each drainage as a discrete trait to allow the estimation of the most likely location of all ancestors in the mtDNA tree. While this may be indicative of dispersal events, it should not be viewed as a formal test of stream capture hypotheses. The MCMC was run twice for 10 million generations each, and 10% of the samples were discarded as burn-in. A strict molecular clock was assumed, calibrated with a normal prior distribution for the substitution rate of 8.6 ± 0.1 × 10<sup>−9</sup> substitutions per site per year (s/s/y), which has been suggested in the literature for Cyprinodontiformes based on a dataset that included the ND2 gene (Hrbek and Meyer, 2003).
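To give a sense of scale for the calibration rate, the sketch below applies the simple strict-clock relation t = d / (2µ), where d is the pairwise divergence per site between two lineages and µ is the per-lineage rate. This is only back-of-the-envelope arithmetic under the assumptions of a constant rate, no saturation, and no ancestral polymorphism; it is not the Bayesian coalescent inference performed in BEAST, and the function names are illustrative.

```python
RATE = 8.6e-9  # substitutions per site per year, the rate stated in the text


def divergence_time_years(pairwise_distance, rate=RATE):
    """Time since two lineages split, given their per-site pairwise distance."""
    return pairwise_distance / (2.0 * rate)


def expected_pairwise_distance(time_years, rate=RATE):
    """Per-site distance expected between two lineages after `time_years`."""
    return 2.0 * rate * time_years


# Distance expected between two lineages that split 0.8 million years ago,
# and the corresponding number of differences over a 980 bp alignment.
d = expected_pairwise_distance(800_000)
print(f"expected divergence: {d:.5f} per site (~{d * 980:.1f} differences in 980 bp)")
print(f"time for 1% divergence: {divergence_time_years(0.01):,.0f} years")
```

Under these simplifying assumptions, lineages separated for 0.8 million years would differ at roughly 1.4% of sites, which illustrates why a genealogy of this depth is considered shallow.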

Second, we used a coalescent-based species-tree analysis (Heled and Drummond, 2010) to infer the colonization history of each river basin using the StarBeast model in the program BEAST 1.7.5 (Drummond and Rambaut, 2007). For this analysis, we used populations as terminals, and used both markers with unlinked gene trees (mitochondrial – ND2 and nuclear – Myh6). We excluded localities showing evidence for headwater capture, as the species-tree method does not allow gene flow among terminals. We used a rate-reference prior to calibrate the molecular clock for Myh6 (Ferreira and Suchard, 2008). Under this approach, a time-calibrated genealogy (i.e., ND2) is taken as a reference to estimate the relative molecular clock associated with alternative gene trees (i.e., Myh6). This is a useful approach when different partitions evolve at different rates, but there is no prior information on the molecular clock for all independent partitions in the matrix (Ferreira and Suchard, 2008). All remaining parameters and priors followed the Bayesian phylogenetic analyses mentioned above. In all Bayesian analyses, sampling sufficiency was evaluated by monitoring the effective sample size (ESS) and ensuring that all values were higher than 200.
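The ESS criterion mentioned above can be illustrated with a simple autocorrelation-based estimator, ESS = N / (1 + 2Σρ<sub>k</sub>), where the sum over lags is truncated at the first non-positive autocorrelation. This is a minimal sketch of one common estimator, chosen here for clarity; it is an assumption of this example and not necessarily the algorithm used by the software that summarizes BEAST output. The simulated traces are hypothetical.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Hypothetical traces: independent draws versus a strongly autocorrelated chain.
rng = random.Random(42)
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ar = [0.0]
for _ in range(1999):
    ar.append(0.95 * ar[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
print(f"ESS (independent draws): {ess_iid:.0f}")
print(f"ESS (autocorrelated chain): {ess_ar:.0f}")
```

Both traces contain 2000 samples, but the autocorrelated chain carries far less independent information, which is the situation the ESS > 200 threshold is meant to detect.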

# RESULTS

Ninety-nine sequences were obtained for the mtDNA ND2 gene (980 bp), yielding 31 different haplotypes defined by 42 variable sites (28 parsimony-informative). For the autosomal Myh6 gene we had 188 sequences (681 bp), resulting in 25 different haplotypes defined by 21 variable sites (16 parsimony-informative). In general, the mtDNA haplotype network showed little haplotype sharing among populations, and a close relationship among haplotypes from the same population (**Figure 2A**). Populations of MIR and SOU exhibited related haplotypes, except for two SOU localities that had haplotypes H21 and H30, which are more closely related to haplotypes found in URU. Concerning Myh6 haplotypes (**Figure 2B**), URU, MIR, and SOU shared the most frequent haplotypes (H5 and H1), and H9 was the only haplotype present in populations of all drainage systems. The close relationship among haplotypes for both markers is suggestive of a shallow genealogical history for this species (see below).

Considering all samples, the Bayesian clustering analysis found K = 6 genetic populations (**Figure 2C**). There was a marked difference in the number of genetic populations occurring in each major drainage. Five genetic populations occurred in URU, three in NEG, and two in both SOU and MIR. Not considering samples possibly affected by stream capture events, we found four genetic populations in URU, two in SOU, and one in both NEG and MIR, suggesting significant genetic structure within URU mainly due to three genetic groups with a restricted geographical distribution in this drainage (**Figure 2C**).

The mtDNA genealogy for C. decemmaculatus (**Figure 3**) suggests that the most recent common ancestor (MRCA) of all mitochondrial lineages dates to 0.8 mya [95% credible interval (CI) 0.5–1.2 mya] and was most probably located in URU (PP = 0.58). A well-supported clade (posterior probability PP = 1.00), sister to all other lineages, occurred in isolated populations inhabiting elevated plains from three basins: URU, MIR, and NEG (haplotypes H11, H12, H14, and H19). These haplotypes represented one of the Bayesian genetic populations. However, the location of the MRCA for this clade was uncertain (PP < 0.5 for all locations). Its sister clade, which had moderate support (PP = 0.82), had URU as the most likely location (PP = 0.75), and showed H25 (another Bayesian genetic population) sister to a well-supported clade (PP = 0.99) that also has URU as its most likely location (PP = 0.80). In turn, this clade harbored four Bayesian genetic populations that corresponded to

FIGURE 2 | Median-joining networks for (A) mtDNA ND2 haplotypes, and (B) nDNA Myh6 haplotypes. Circle sizes are proportional to the observed frequency of each haplotype. Cross marks represent mutational differences between haplotypes. (C) Geographic location of each mtDNA haplotype. Different colors represent the six genetic groups identified in the BAPS analysis.

three well-supported clades (PP > 0.90) plus a "mixed" genetic population that included H1, H7, H8, H30, and H21. Regarding the most likely ancestral location, all remaining genetic groups had URU as the most likely location (PP > 0.85), with one exception. The clade, which was almost exclusive to MIR and SOU (though H10 also occurred in NEG), had SOU or MIR as its most likely ancestral location (PP = 0.46, PP = 0.30, respectively).

Considering the full dataset, both markers showed high haplotype diversities (H = 0.94 for ND2 and H = 0.79 for Myh6). Both haplotype and nucleotide diversity were higher for URU and SOU compared with NEG and MIR (**Table 1**). There was significant genetic population structure (ND2 Φ<sub>ST</sub> = 0.26, and Myh6 Φ<sub>ST</sub> = 0.30, P < 0.001). Excluding localities that may represent stream capture events increased Φ<sub>ST</sub> estimates (ND2 Φ<sub>ST</sub> = 0.59, and Myh6 Φ<sub>ST</sub> = 0.35, P < 0.001). For ND2, this was still lower than the Φ<sub>ST</sub> estimate among the genetic populations delimited with BAPS (Φ<sub>ST</sub> = 0.75, P < 0.001), which is expected, given that Bayesian populations were defined based on ND2 data. However, this trend was not followed by Myh6 (Φ<sub>ST</sub> = 0.19, P < 0.001). Pairwise Φ<sub>ST</sub> values indicated that geographic groups were significantly different from each other except MIR vs. SOU (**Table 2**). Taking into account possible events of stream capture resulted in increased Φ<sub>ST</sub> values between NEG and the remaining populations for both genetic markers, and between the other pairs only for ND2 (**Table 2**). Because localities from URU might represent different genetic populations, we also quantified the genetic structure between them. For both ND2 and Myh6, we found significant genetic structure in URU (0.81 and 0.39, respectively, P < 0.001), which was robust to the exclusion of some localities due to stream capture events (0.64 and 0.31, respectively, P < 0.001).
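As an aid to interpreting these values, the sketch below computes Φ<sub>ST</sub> for a single hierarchical level following the variance-component logic of AMOVA (Excoffier et al., 1992), using the raw number of pairwise differences as the squared distance between haplotypes. It assumes aligned sequences of equal length; the choice of distance, the function names, and the toy populations are assumptions of this example, and the permutation test used to obtain P values is omitted.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def sum_of_squared_deviations(seqs):
    """Sum of pairwise distances divided by the number of sequences."""
    total = sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    return total / len(seqs)


def phi_st(populations):
    """Phi_ST for one level of structure; `populations` is a list of sequence lists."""
    sizes = [len(pop) for pop in populations]
    n_pops, n_total = len(populations), sum(sizes)
    if n_pops < 2 or min(sizes) < 1 or n_total <= n_pops:
        raise ValueError("need >= 2 non-empty populations and more sequences than populations")
    pooled = [s for pop in populations for s in pop]
    ssd_within = sum(sum_of_squared_deviations(pop) for pop in populations)
    ssd_among = sum_of_squared_deviations(pooled) - ssd_within
    msd_within = ssd_within / (n_total - n_pops)
    msd_among = ssd_among / (n_pops - 1)
    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    var_total = var_among + msd_within
    if var_total == 0.0:
        raise ValueError("no variation in the data; Phi_ST is undefined")
    return var_among / var_total


# Hypothetical toy populations (not the study's data).
fixed = [["AAAA", "AAAA"], ["TTTT", "TTTT"]]  # no shared haplotypes
mixed = [["AAAA", "TTTT"], ["AAAA", "TTTT"]]  # identical composition
phi_fixed = phi_st(fixed)
phi_mixed = phi_st(mixed)
print(f"Phi_ST (fixed differences): {phi_fixed:.2f}")
print(f"Phi_ST (identical composition): {phi_mixed:.2f}")
```

Values close to 1 indicate that nearly all variation lies among populations, whereas values at or below zero indicate no detectable differentiation; the estimates reported above fall between these extremes.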

Neutrality tests provided mixed evidence for population growth in some populations; the outcome depended on which statistic or genetic marker was used (**Table 1**). On the other hand, maximum likelihood-based demographic estimates revealed population growth for SOU and MIR (**Table 3**). Effective population size was also larger for SOU and MIR (∼3 × 10<sup>6</sup> effective individuals), while NEG had the lowest value (∼180,000 effective individuals; **Table 3** and **Figure 4**). The combined analysis of mtDNA and nDNA markers suggested that the split

FIGURE 3 | Maximum clade credibility tree among ND2 lineages found in Cnesterodon decemmaculatus individuals. Posterior probabilities (PP) are shown above the branches for selected nodes with PP > 0.8. Pie charts beside selected ancestral nodes represent the PP of ancestral location among each of the four populations according to the color scheme shown in the inset.

TABLE 1 | Diversity indexes obtained for the mitochondrial and nuclear markers, respectively (ND2 values/Myh6 values), for the four populations considered in the study.


N, number of individuals sampled for ND2/Myh6 markers; H, haplotype diversity; π, nucleotide diversity; <sup>∗</sup>P < 0.05; ∗∗P < 0.02; ‡excluding localities showing evidence of stream capture (see text).

between NEG and the clade URU+MIR+SOU, which was also the root of the tree, dates back to ∼0.6 mya (95% CI 0.27–0.95 mya). This was followed by the divergence between URU and MIR+SOU around 0.12 mya (95% CI 0.04–0.19 mya), and the subsequent divergence between MIR and SOU <30,000 years ago (95% CI 23,000–57,000) (**Figure 4**).

#### DISCUSSION

In this study, we characterized the major genetic structure patterns of populations of C. decemmaculatus in most of its distribution in the Pampa biome. Unlike the star-like pattern found for Argentinean populations of C. decemmaculatus in the southern Pampas (Bruno et al., 2016), which indicates a very recent evolutionary history, our results reveal a significant genetic structure that, in general, paralleled broad drainage systems, especially considering mtDNA lineages (**Figures 2A**, **3**). However, there is no simple relationship between clades and basins, such that a single basin may contain more than one mtDNA clade, and different basins may share the same mtDNA haplotype (**Figures 2A**, **3**). Thus, understanding population

TABLE 2 | Pairwise Φ<sub>ST</sub> values for mtDNA ND2 (lower diagonal) and nDNA Myh6 (upper diagonal) markers.


P < 0.05 for all values except when noted by an asterisk (<sup>∗</sup> ). <sup>1</sup>Excluding locations showing evidence of stream capture (see text).

structure imposed by the drainage system is necessary, but not sufficient, to account for the genetic patterns observed in our study. This is reinforced by the Bayesian clustering of populations, which suggested K = 6 as the best number of genetic populations. Three alternative (but not exclusive)


TABLE 3 | Estimates for the effective population size (NE) and growth parameter (G) in all populations.

Theta (θ) = 4NEµ, where µ is the mutation rate per site per generation. 95% CI–95% credible interval for the estimates. There was no evidence for growth in URU and NEG (see the section "Materials and Methods").

scenarios that may explain the discrepancy between drainage and genetic structure are: (1) migration among neighboring populations (active dispersal), (2) headwater capture (geodispersal), or (3) shared ancestral polymorphism. The first scenario seems unlikely because C. decemmaculatus has a poor swimming capacity in high-speed currents (Trenti et al., 1999) and is therefore unable to disperse actively across long distances. On the other hand, Cnesterodon species are commonly found in shallow wetland habitats (Maltchik et al., 2014), suggesting that geodispersal may represent a more likely alternative for these species. In this case, wetland habitats close to the watersheds between different drainages may facilitate headwater capture events, as minor environmental changes may reshape the hydrological network at these sites, impeding a clear-cut relationship between genetic lineages and drainages.

However, distinguishing between the geodispersal and ancestral polymorphism hypotheses may be more nuanced. This is further complicated by the fact that it is very difficult to set up an explicit test of stream capture hypotheses. One example of a likely case of stream capture involving closely related haplotypes occurring in different drainages is represented by mtDNA haplotypes H11, H14, and H19 (**Figure 3**). Even though URU has received higher support as the most probable location of the MRCA, mtDNA network analysis indicates that this clade evolved in NEG with subsequent stream capture events toward URU and MIR. First, there are seven exclusive substitutions in this clade (**Figure 2A**), suggesting it may have evolved in relative isolation. Second, the sampling points in URU and MIR containing these lineages are adjacent to NEG. Third, H14 and H19 are probably descendent from H11, which occurs in NEG (**Figure 2A**). Under this interpretation, this clade would reflect three independent stream capture events: NEG would have been colonized from URU between 0.2 and 0.8 mya (see below), while MIR and URU would have received migrants from NEG more recently (<0.2 mya). Even in the alternative scenario in which URU is the true location for the MRCA of this clade, two headwater capture events would have occurred, since this lineage would have reached NEG from URU by ∼0.2 mya, dispersing to MIR later on. Two other examples of haplotype sharing that may reflect stream capture toward NEG include H8 (from URU) and H10 (from SOU). In both cases, these haplotypes are more closely related to other lineages from specific drainages, and their sequence identity suggests a recent arrival in NEG. However,

while H10 occurs in a site close to SOU, H8 occurs relatively distant from the watershed with URU. An alternative hypothesis for the occurrence of H8 in NEG would involve upstream active dispersal from lower URU populations despite the low dispersal ability of Cnesterodon (Trenti et al., 1999).

The tectonic activity associated with stream capture events in eastern South America may be as old as 1.6 mya (Saadi et al., 2002). The presence of H11 and H19 in MIR and H8 in NEG is in agreement with a pattern in which more eastern drainages capture upland shield drainages. However, the presence of H14 in URU and H10 in NEG indicates that these events could also occur in the opposite direction, suggesting a highly dynamic scenario in the Pampas, possibly because this biome is largely flat. Indeed, Loureiro et al. (2011) suggested drainage rearrangements in both directions across the MIR/NEG watershed based on the distribution pattern of the killifish Austrolebias in Uruguay. Like Cnesterodon, Austrolebias inhabits shallow wetland habitats and has little capacity for active dispersal (Maltchik et al., 2014), reinforcing the relevance of passive geodispersal in the Pampas. In contrast, the most parsimonious explanation for the presence of haplotypes H21 and H30 in SOU is incomplete lineage sorting, given the low phylogenetic signal associated with location and clade structure for these haplotypes, and because the sampling points

in which these haplotypes were found are distant from the URU watershed.

Our results suggest that the MRCA of C. decemmaculatus reached the lower Uruguay before 0.6 mya (**Figure 4**). The mtDNA genealogy suggests that from URU, C. decemmaculatus would have colonized NEG and SOU basins independently (**Figure 3**). The high genetic diversity in URU is consistent with its role as the ancestral location. Even though we have found a case of headwater capture from NEG to MIR, the most likely colonization route leading to MIR is through SOU, given that most haplotypes found in SOU and MIR belong to the same mtDNA haplogroup. The Bayesian delimitation of genetic populations also suggested that most localities in MIR and SOU belong to the same genetic population, which is also in agreement with pairwise indices of genetic structure (**Table 2**). These findings also highlight that, for this species, the genetic diversity in the lower Uruguay ecoregion would be much higher than in Laguna dos Patos (sensu Abell et al., 2008). Furthermore, given that URU neighbors SOU, but not MIR, a colonization route from URU to MIR and then SOU is less likely. Indeed, Bruno et al. (2016) suggested that the Argentinean coastal populations of C. decemmaculatus descend from upstream URU populations associated with the Río de la Plata mouth, which highlights the mouth of the Uruguay River as a putative source for populations currently inhabiting isolated drainages flowing to Río de la Plata. Alternatively, SOU/MIR could have been colonized from NEG. This hypothesis would be strengthened if we assume that H10 evolved in situ in NEG before dispersing eastward. Additional sampling efforts, especially in NEG, but also in regions not sampled by this study, such as the Argentinian coast, will be required to refine these scenarios, as well as to explicitly test hypotheses of stream capture or incomplete lineage sorting.

While the Pampean region was dominated by grasslands throughout the Pleistocene (Behling et al., 2005), changes in precipitation regimes and in sea level may have affected populations of C. decemmaculatus. For example, it could be expected that periods of increased humidity would have favored the formation of new wetland areas and promoted population expansion (Zemlak et al., 2008; Jones and Johnson, 2009). On the other hand, marine transgressions would have caused local extinctions (Villwock and Tomazelli, 1995). In this regard, the high genetic diversity in SOU was surprising, since this area has undergone at least three marine transgression events since the Pleistocene (Villwock and Tomazelli, 1995) that could have led to declines in population size and local extinction. The high



N<sub>E</sub> and the significant negative values of Fu's neutrality test for mtDNA indicate that either local extinction did not affect these populations, or that the high genetic diversity resulted from recent and strong population growth (**Table 3**). We also found evidence for population growth in MIR. However, differences in genetic diversity and N<sub>E</sub> estimates between MIR and SOU may reflect different population histories after marine transgressions. Alternatively, some founder effect may have reduced genetic diversity in MIR following its colonization from SOU. Coastal populations from SOU may have also benefited from an extended coastal plain during marine regressions, facilitating population growth during these periods, as has been suggested for other species occurring along the coastal rivers of southern and southeastern Brazil (Thomaz et al., 2015).

#### AUTHOR CONTRIBUTIONS

AR-F, LM, and NF designed the study, interpreted the results, and contributed to the final version of the manuscript. AR-F did the laboratory work. AR-F and NF analyzed the data.

#### FUNDING

This research was supported by Conselho Nacional de Desenvolvimento Científico (CNPq), Programa de Pósgraduação em Biologia Animal (PPGBAN-UFRGS), and Programa de Pós Graduação em Genética e Biologia Molecular (PPGBM-UFRGS).

#### ACKNOWLEDGMENTS

The authors would like to thank Marcelo Loureiro Barrela (Universidad de la República, Uruguay), Juliana Wingert, Alice Hirschmann, and Juliano Ferrer, who provided specimens for study or assistance during field work. They also thank Jéferson Fregonezi, Andréa Thomaz, and three reviewers for contributions to a preliminary version of this manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2017.00214/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ramos-Fregonezi, Malabarba and Fagundes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Pattern and Demographic History of Salminus brasiliensis: Population Expansion in the Pantanal Region during the Pleistocene

Lívia A. de Carvalho Mondin<sup>1</sup> , Carolina B. Machado<sup>2</sup> \*, Emiko K. de Resende<sup>3</sup> , Debora K. S. Marques<sup>3</sup> and Pedro M. Galetti Jr.<sup>2</sup>

<sup>1</sup> Departamento de Ciências Biológicas, Universidade do Estado de Mato Grosso, Tangará da Serra, Brazil, <sup>2</sup> Laboratório de Biodiversidade Molecular e Conservação, Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil, <sup>3</sup> Embrapa Pantanal, Empresa Brasileira de Pesquisa Agropecuária, Corumbá, Brazil

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Rubens Pazza, Federal University of Viçosa, Brazil Silvia Helena Sofia, Universidade Estadual de Londrina, Brazil

> \*Correspondence: Carolina B. Machado carolbioms@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 September 2017 Accepted: 03 January 2018 Published: 17 January 2018

#### Citation:

Mondin LAC, Machado CB, Resende EK, Marques DKS and Galetti PM Jr. (2018) Genetic Pattern and Demographic History of Salminus brasiliensis: Population Expansion in the Pantanal Region during the Pleistocene. Front. Genet. 9:1. doi: 10.3389/fgene.2018.00001

Pleistocene climate changes were major historical events that impacted South American biodiversity. Although the effects of such changes are well-documented for several biomes, it is poorly known how these climate shifts affected the biodiversity of the Pantanal floodplain. Fish are one of the most diverse groups in the Pantanal floodplains and can be taken as a suitable biological model for reconstructing paleoenvironmental scenarios. To identify the effects of Pleistocene climate changes on Pantanal's ichthyofauna, we used genetic data from multiple populations of a top-predator long-distance migratory fish, Salminus brasiliensis. We specifically investigated whether Pleistocene climate changes affected the demography of this species. If this was the case, we expected to find changes in population size over time. Thus, we assessed the genetic diversity of S. brasiliensis to trace the demographic history of nine populations from the Upper Paraguay basin, which includes the Pantanal floodplain, that form a single genetic group, employing approximate Bayesian computation (ABC) to test five scenarios: constant population, old expansion, old decline, old bottleneck followed by recent expansion, and old expansion followed by recent decline. Based on two mitochondrial DNA markers, our inferences from ABC analysis, the results of the Bayesian skyline plot, the implications of star-like networks, and the patterns of genetic diversity (high haplotype diversity and low-to-moderate nucleotide diversity) indicated a sudden population expansion. ABC allowed us to make strong quantitative inferences about the demographic history of S. brasiliensis. We estimated a small ancestral population size that underwent a drastic fivefold expansion, probably associated with the colonization of newly formed habitats. The estimated time of this expansion was consistent with a humid and warm phase as inferred by speleothem growth phases and travertine records during Pleistocene interglacial periods. The strong concordance between our genetic inferences and these historical data could represent the first genetic record of a humid and warm phase in the Pantanal in the period from the Last Interglacial to 40 ka.

Keywords: approximate Bayesian computation, mitochondrial markers, interglacial period, neotropical fish, Upper Paraguay basin

#### INTRODUCTION

fgene-09-00001 January 13, 2018 Time: 15:57 # 2

The genetic diversity and population structure of species are largely determined by intrinsic traits, contemporary factors (including anthropogenic activities), and historical events that have affected species over geological time (Avise, 2000; Osborne et al., 2014). Phylogeographic studies have shown that the genetic diversity patterns of several populations were shaped by past geological and climatic events (Hewitt, 1996; Bonaccorso et al., 2006; Carnaval et al., 2009; Thomaz et al., 2015; Cabanne et al., 2016).

The Pleistocene (2.58–0.01 million years ago) was marked by worldwide climatic changes involving multiple successive cycles of glacial and interglacial events, associated with abrupt and dramatic changes in temperature and precipitation (Webb and Bartlein, 1992). Overall, the environmental conditions varied from warm and wet during the interglacial to cold and dry during the glacial periods (EPICA Community Members, 2004).

Although there are no records of large ice sheets covering the landscape, as happened in the Northern Hemisphere, climatic changes during the Pleistocene also drastically affected Southern Hemispheric biomes (Vuilleumier, 1971; Hewitt, 1996). Several studies have reported range contractions and expansions in the Amazon and the Atlantic Forest (Harris and Mix, 1999; Carnaval et al., 2009; Ribas et al., 2012; Cabanne et al., 2016; Ledo and Colli, 2017), replacement by savannas and arid lands (Wang et al., 2004; Carnaval and Bates, 2007; Thomé et al., 2016), climatic disturbances in humid and arid areas in the Andes (Bräuning, 2009), and fluctuations in sea levels affecting the Atlantic coast (Thomaz et al., 2015; Ponce and Rabassa, 2016). Although the magnitude of climatic oscillations varied in different parts of South America, all of these events impacted biodiversity, driving extinction episodes, speciation, intraspecific divergence, and demographic oscillations (Vuilleumier, 1971; Hewitt, 1996; Beheregaray, 2008). However, it remains poorly understood how these historical events affected the Pantanal biome and its biodiversity.

The Pantanal is a large seasonal floodplain located in the center of South America. This region is entirely contained in the Upper Paraguay basin, which is a complex drainage network due to its geological history (Assine, 2015). Pantanal was formed recently and has mainly been associated with the Andean uplift (Assine, 2015). However, geomorphological (Assine and Soares, 2004) and palynological (Ferraz-Vicentini and Salgado-Labouriau, 1996) data have shown that the paleoclimatic fluctuations have promoted considerable landscape changes since the Late Pleistocene. Although there are historical records of environmental changes, little is known of the effects of these changes on biodiversity in the Pantanal (Márquez et al., 2006; Lopes I.F. et al., 2007; Santos et al., 2008).

Fish are one of the most diverse groups in the Pantanal floodplains (Alho, 2008). Because their distribution is restricted to freshwater drainages, which in turn reflect geomorphological or climatic changes, these animals are a suitable biological model for reconstructing paleoenvironmental scenarios (Montoya-Burgos, 2003). To identify the effects of Pleistocene climate changes on fish, we used genetic data from Salminus brasiliensis sampled at multiple sites. As a top predator distributed throughout the La Plata basin (Lima and Britski, 2007), this species represents an important resource in the Pantanal floodplains. Although it plays an important ecological role in controlling the structure of the ecosystem, the genetic diversity of S. brasiliensis in the Upper Paraguay basin is unknown. Assessing the population genetic pattern of the existing species can identify how its genetic variation was affected by past climate changes and can also provide insights into its demographic response (Pil et al., 2017).

Here, we focused on assessing the genetic diversity and pattern of S. brasiliensis sampled in nine rivers to trace its demographic history in the Upper Paraguay basin using a statistical framework to test the following five scenarios: constant population (null hypothesis), old expansion, old decline, old bottleneck followed by recent expansion, and old expansion followed by recent decline. We specifically investigated whether Pleistocene climate changes affected past S. brasiliensis demography. If so, we expected to find a population reduction signature during glacial periods due to climatic conditions that reduced the habitat, when the Pantanal region was almost dry. Further, a population expansion signature was expected in the interglacial periods, when the wetlands were re-established, increasing fluvial discharge and suitable habitats for the biological model. After reconstructing the most probable demographic history for S. brasiliensis based on two mitochondrial markers (CytB and Dloop), we associated changes in population size over time with known regional climatic fluctuations. Our findings might contribute to the conservation and management of natural populations of S. brasiliensis from the Upper Paraguay basin by being used to predict whether these populations will be able to persist in future scenarios of environmental change.

#### MATERIALS AND METHODS

### Ethics Statement, Sampling Collection, DNA Extraction, and Amplification

This study was developed in accordance with Brazilian law, approved by the ethics committee on animal use at the Federal University of São Carlos (CEUA number 5765010416), and carried out under temporary scientific collection license number 32217-4 provided by ICMBio-SISBIO. A total of 52 S. brasiliensis individuals were sampled in nine rivers from the Upper Paraguay basin (**Figure 1** and Supplementary Table S1). We sampled a small fragment of the caudal fin and then returned the fish to the river. All tissue samples were preserved in absolute ethanol and stored at −20°C. Total genomic DNA was extracted using the saline precipitation protocol (Aljanabi and Martinez, 1997).

We amplified partial sequences of two mitochondrial DNA markers: Cytochrome B (CytB) and Dloop (see Supplementary Table S2 for primers and references). Polymerase chain reactions (PCR) for both markers were performed in a final volume of 25 µL containing 16.9 µL of Milli-Q water, 2.5 µL of 10x Invitrogen PCR buffer (1x), 2.5 µL of dNTPs mix (0.25 mM), 1 µL of MgCl<sub>2</sub> (2 mM), 0.5 µL of each primer (0.2 mM), 0.1 µL of Platinum® Taq Polymerase (0.5 unit), and 1 µL of DNA template (50 ng/µL). The annealing temperature of the PCR thermal profile is given in Supplementary Table S2. All PCR products were purified using the 20% polyethylene glycol protocol (Lis and Schleif, 1975). Fragments were sequenced on an ABI3730XL automatic sequencer. All sequences were aligned automatically using the ClustalW multiple alignment method, and minor manual adjustments were made in BioEdit (Hall, 1999) to improve the alignment. The GenBank accession numbers are given in Supplementary Table S1.
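The reaction recipe above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the authors' protocol: it confirms that the listed per-reaction volumes add up to the stated 25 µL and scales the shared reagents into a master mix. The dictionary keys, the function names, and the 10% pipetting overage are illustrative assumptions.

```python
# Per-reaction volumes in µL, copied from the recipe in the text.
# Key names are illustrative labels only.
REACTION_UL = {
    "Milli-Q water": 16.9,
    "10x PCR buffer": 2.5,
    "dNTP mix": 2.5,
    "MgCl2": 1.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "Taq polymerase": 0.1,
    "DNA template": 1.0,
}


def total_volume(mix):
    """Sum of all component volumes, rounded to avoid float noise."""
    return round(sum(mix.values()), 3)


def master_mix(mix, n_reactions, overage=0.10):
    """Scale every shared reagent for n_reactions.

    The template is left out because it differs per sample. The 10%
    overage is an assumed bench habit, not a value from the paper.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in mix.items()
        if name != "DNA template"
    }


final_volume = total_volume(REACTION_UL)   # should match the stated 25 µL
mix_for_52 = master_mix(REACTION_UL, 52)   # one reaction per sampled fish
```

The sample count of 52 comes from the sampling description; in practice each marker would need its own mix because the primers differ.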

### Genetic Diversity and Population Structure

The genetic diversity indices for each population, including number of haplotypes, haplotype diversity, and nucleotide diversity, were calculated in DnaSP 5.0 (Librado and Rozas, 2009). We also reconstructed the haplotype networks for each molecular marker using the median-joining method (Bandelt et al., 1999) included in PopART (Population Analysis with Reticulate Trees – Leigh and Bryant, 2015).
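For readers unfamiliar with the two diversity indices, the sketch below shows how they are commonly defined: haplotype diversity as Nei's unbiased estimator, n/(n − 1) × (1 − Σp<sub>i</sub><sup>2</sup>), and nucleotide diversity as the mean number of pairwise differences per site. This is a simplified illustration and not the DnaSP implementation; it assumes aligned sequences of equal length without gaps or missing data, and the four toy sequences are invented for the example.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): three haplotypes among four sequences.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGT"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```

A pattern of high `h` combined with low `pi`, as reported later for CytB, arises when many haplotypes differ from each other by only one or two substitutions.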

We investigated whether there were distinct mitochondrial genetic clusters by assigning individuals to populations using a Bayesian model-based approach implemented in Bayesian Analysis of Population Structure (BAPS) v. 6.0 (Corander et al., 2008). BAPS identifies clusters based on nucleotide frequencies in DNA sequences. We concatenated mitochondrial markers and conducted a mixing analysis without spatialization (option chosen "clustering of group of individuals"). This method determines which combination of predetermined samples is best supported by the data. The program was run for 10 replicates for each K (1–10). The best clustering partition was determined by the highest value of likelihood and highest probability determined by the program.

### Inference and Hypotheses Tests of Demographic Histories

Because no genetic population structuring was found in S. brasiliensis using BAPS (Supplementary Figure S1 and Supplementary Table S3), we designed all scenarios based on historical demographic changes in a single population. Demographic history was inferred through a Bayesian skyline plot (BSP) analysis (Drummond et al., 2005) implemented in BEAST 2.4.4 (Bouckaert et al., 2014). BSP assumes a single panmictic population and uses inferred patterns of coalescence to fit a demographic model to a set of sequence data (Drummond et al., 2005).

First, we determined the best evolutionary model for each molecular marker in JModelTest 2.1.4 (Posada, 2008) using the Bayesian information criterion. The CytB and Dloop markers were set to evolve according to the HKY and HKY+I models, respectively. In BEAST 2.4.4, we chose the coalescent Bayesian skyline as the tree prior, with both mitochondrial DNA markers linked, to perform the BSP analysis. Priors used in this analysis were kept at their default values. To date the past demographic events, a molecular clock was calibrated using the mutation rates defined for the Salminus genus (Machado et al., unpublished) under the relaxed lognormal model: 6.31 × 10<sup>−3</sup> (CytB) and 1.935 × 10<sup>−2</sup> (Dloop) mutations per site per million years. These rates were based on a previously calibrated topology of the genus rooted with Brycon specimens (Machado et al., unpublished). This topology was reconstructed using two mitochondrial and two nuclear markers and calibrated using a biogeographic event, the uplift of the Eastern Andes Cordillera (Hoorn et al., 1995), which separated the Magdalena and Amazon paleobasins and is consistent with the divergence of S. affinis from all other Salminus.

Three independent Markov chains were initiated from random trees, run separately for 10 million generations, and sampled every 10,000 steps, with 25% of the samples discarded as burn-in. The log and tree files for each independent Markov chain were combined using LogCombiner in BEAST 2.4.4. Tracer v1.5 (Rambaut et al., 2014) was used to check the convergence of the runs according to the effective sample size (ESS > 200) and to reconstruct the population dynamics over time through the Bayesian Skyline reconstruction option.
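The chain settings above determine how many posterior samples remain once the burn-in is removed and the runs are combined. The short Python sketch below works through that arithmetic; the function name is hypothetical, and the count ignores the initial state that samplers often log at step 0.

```python
def retained_samples(chain_length, sample_every, burnin_fraction, n_chains):
    """Return (samples per chain, kept per chain, kept after combining)."""
    per_chain = chain_length // sample_every        # logged states per chain
    discarded = int(per_chain * burnin_fraction)    # burn-in states dropped
    kept = per_chain - discarded
    return per_chain, kept, kept * n_chains


# Settings taken from the text: 10 million generations, sampling every
# 10,000 steps, 25% burn-in, three independent chains.
per_chain, kept, combined = retained_samples(10_000_000, 10_000, 0.25, 3)
```

Under these settings each chain contributes 750 samples and the combined log holds 2,250, which is the pool against which an effective sample size above 200 is judged.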

To corroborate the findings on the population expansion of S. brasiliensis in the Upper Paraguay basin suggested by the BSP analysis, we tested five demographic scenarios that could have happened during the Late Pleistocene (expansions and bottlenecks) using an approximate Bayesian computation (ABC) framework in DIYABC v2.1.0 (Cornuet et al., 2014): constant population (null hypothesis), old expansion, old decline, old bottleneck followed by expansion, and old expansion followed by decline (**Figure 2**). The ABC approach allows a quantitative evaluation of the demographic and evolutionary history by strictly contrasting realistic models defined a priori and estimating relevant parameters (Beaumont et al., 2002).

All scenarios assumed an initial ancestral population (Na), demographic changes occurring during the Late Pleistocene (2–120 ka for t1, and 4–120 ka for t2), and recent populations represented in t0 (**Figure 2**). The prior distribution of demographic parameters and the mutation rate are listed in Supplementary Table S4. For each tested scenario, we calculated the following summary statistics: the number of haplotypes, number of segregating sites, variance of pairwise differences, and variance of numbers of the rarest nucleotides at segregating sites. The summary statistics were used for comparison between simulated and observed datasets.

We ran one million simulations for each scenario and used summary statistics and principal component analysis to confirm the good fit of all scenarios with the observed data (Supplementary Figure S2). We then compared the competing scenarios by estimating their posterior probabilities using polychotomous logistic regression on the 1% of simulated datasets closest to the observed data (Cornuet et al., 2008). The best scenario was the one that showed the highest posterior probability with a 95% confidence interval (CI) that did not overlap with those of the other scenarios. To evaluate confidence in each scenario, we also calculated the posterior predictive error. Subsequently, we estimated the posterior distribution of each parameter under the best scenario using a local linear regression on the 1% closest simulated datasets and applying a logit transformation to parameter values as suggested by Cornuet et al. (2008). We assessed the precision of parameter estimation through the relative median of the absolute error (RMAE) based on 500 pseudo-observed datasets (pods) under the best scenario. Low RMAE values indicated that the estimated parameter is trustworthy (Cornuet et al., 2008). After these steps, we performed a model verification by evaluating the goodness-of-fit of the best scenario with respect to the empirical dataset. This was carried out by simulating 1000 pods under the best scenario. In this step, we used different summary statistics to avoid over-estimating the fit by using the same statistics twice (Cornuet et al., 2010). We selected the mean of pairwise differences, Tajima's D, private segregating sites, and the mean number of the rarest nucleotide at segregating sites.
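The core idea behind ABC (draw parameters from a prior, simulate data, and keep the simulations whose summary statistics fall closest to the observed ones) can be illustrated with a toy example. The Python sketch below is the plain rejection variant with a single summary statistic, and it is not the DIYABC procedure, which adds logistic regression for scenario choice and local linear regression for parameter estimation. It estimates the scaled mutation parameter θ of a constant-size neutral coalescent from the number of segregating sites, using the CytB values reported in the Results (52 sequences, 20 polymorphic sites). The uniform prior bounds, the number of simulations, and all function names are assumptions made for illustration.

```python
import random


def simulate_segregating_sites(n, theta, rng):
    """Simulate S for n sequences under a constant-size neutral coalescent."""
    # Total tree length in coalescent units: while k lineages remain,
    # the waiting time is exponential with rate k(k-1)/2.
    total_length = 0.0
    for k in range(n, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2.0)
    # Mutations fall on the tree as a Poisson process with rate theta/2
    # per unit of branch length; count the events that fit on the tree.
    rate = theta / 2.0
    s = 0
    position = rng.expovariate(rate)
    while position < total_length:
        s += 1
        position += rng.expovariate(rate)
    return s


def abc_rejection(observed_s, n, n_sims=20000, keep_fraction=0.01, seed=1):
    """Keep the prior draws whose simulated S is closest to observed_s."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(0.1, 20.0)  # arbitrary illustrative prior
        s = simulate_segregating_sites(n, theta, rng)
        draws.append((abs(s - observed_s), theta))
    # Stable sort on distance only, so ties stay in random simulation order.
    draws.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_sims * keep_fraction))
    return sorted(theta for _, theta in draws[:n_keep])


accepted = abc_rejection(observed_s=20, n=52)
posterior_median = accepted[len(accepted) // 2]
```

Keeping the closest 1% mirrors the tolerance described above. A scenario comparison would repeat the simulation step under each demographic model and compare how often each model appears among the accepted simulations.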

#### RESULTS

Fragments of 710 base pairs (bp) of CytB and 571 bp of Dloop were obtained from 52 individuals of S. brasiliensis from the Upper Paraguay basin. CytB had 20 polymorphic sites, of which 9 were parsimony-informative, while Dloop had 37 polymorphic sites, of which 26 were parsimony-informative. No insertions, deletions, or stop codons (for CytB sequences) were observed for either marker.

We identified 21 and 33 haplotypes for CytB and Dloop, respectively (Supplementary Table S5 and Supplementary Figure S1). No haplotype was shared among all populations for either marker, and a high proportion of private haplotypes (61.9% for CytB and 66.7% for Dloop) was observed. The distribution of the haplotypes in the network indicated no population structuring in the rivers sampled (Supplementary Figure S1). The haplotype and nucleotide diversity values for each population are shown in Supplementary Table S5. Overall, although nucleotide diversity was low (0.00365 for CytB) to moderate (0.01434 for Dloop), haplotype diversity was high (0.928 for CytB and 0.975 for Dloop).

The BAPS approach clustered all individuals into only one mitochondrial genetic group (Supplementary Figure S1), and the BSP analysis indicated that the effective population size underwent an expansion that started approximately 75 ka (**Figure 3**). The greatest expansion was underway until around 40 ka and continued gradually until 20 ka, when the effective population size became stable.

Approximate Bayesian computation analysis also corroborated the old expansion scenario as the most probable one to explain the current genetic diversity of S. brasiliensis from the Upper Paraguay basin (Supplementary Figure S3). The best scenario showed a posterior probability (PP) of 0.5466 (95% CI: 0.5341–0.5590), while the second highest PP was observed for scenario 4 (old bottleneck followed by expansion), with 0.2589 (95% CI: 0.2466–0.2712) (Supplementary Table S6). The posterior error rate was 0.303. The non-overlapping 95% CIs of the scenarios' PPs, together with the expansion signal found in the BSP analysis and in the genetic pattern (see section "Discussion"), give greater confidence that scenario 2 (old population expansion) is the best of the proposed scenarios.

The best scenario recovered an ancestral population of S. brasiliensis (Na = 161,000) that expanded 31,000 generations ago (i.e., about 62 ka). At present, the median effective population size is estimated at N<sub>2</sub> = 811,000 (**Table 1**). Large confidence intervals are common in such demographic inferences. All our RMAE values were <0.2, except for the current effective population size (RMAE = 0.299), indicating that the parameters estimated by the ABC analysis were reliable. This was expected because mitochondrial DNA is reliable for estimating past demographic parameters (Na and t2) (Cornuet et al., 2010). When we applied the model checking option to the best scenario, we observed on each PCA plane a wide cloud of datasets simulated from the prior, with the observed dataset in the middle of a small cluster of datasets from the posterior predictive distribution (Supplementary Figure S4). This indicates that scenario 2 fits the empirical data well.
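The figures in this paragraph can be tied together with a short calculation. The Python sketch below derives the size of the expansion from the two median effective population sizes and converts the expansion time from generations to thousands of years. The generation time of 2 years is an assumption inferred from the paragraph's own conversion (31,000 generations to about 62 ka); the variable names are illustrative.

```python
NA = 161_000               # ancestral effective size (median), from the text
N_CURRENT = 811_000        # current effective size (median), from the text
T_GENERATIONS = 31_000     # time of expansion in generations, from the text
GENERATION_TIME_YEARS = 2  # assumption implied by 31,000 generations ~ 62 ka

# Fold change in effective population size across the expansion.
expansion_fold = N_CURRENT / NA

# Expansion time expressed in ka (thousands of years before present).
expansion_ka = T_GENERATIONS * GENERATION_TIME_YEARS / 1000
```

The ratio comes out at roughly five, in line with the fivefold expansion mentioned in the abstract, and the converted time of 62 ka falls within the 75 to 40 ka window of strongest growth recovered by the BSP analysis.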

#### DISCUSSION

High haplotype diversity and no spatial genetic structuring among populations of S. brasiliensis were observed here. The results also revealed that its current genetic diversity was strongly influenced by an expansion event that happened during the Pleistocene, likely associated with warm and humid weather conditions during interglacial periods. All of these findings are discussed in detail below.

TABLE 1 | Parameter estimates generated from a local linear regression on simulated datasets generated from the best scenario.


The precision of parameter estimations was assessed by computing the relative median of absolute error (RMAE). ka, 1000 years before present.

# Genetic Diversity and Structure of S. brasiliensis


Previous studies in different hydrographic systems and using other molecular markers, such as RAPD and microsatellites, have determined a great genetic variability for S. brasiliensis (Ramella et al., 2006; Lopes C.M. et al., 2007; Gomes et al., 2013; Ribeiro et al., 2016; Ribolli et al., 2017). High levels of haplotype diversity seem to be common in migratory freshwater fishes with large populations that occupy heterogeneous environments (see Oliveira et al., 2009). A large effective population size and high migration rates may minimize the effects of genetic drift as a force that decreases intraspecific genetic diversity (Allendorf and Luikart, 2007). Furthermore, the pattern of genetic diversity of S. brasiliensis can also be attributed to its demographic history. According to Grant and Bowen (1998), high haplotype diversity combined with low nucleotide diversity is associated with an expansion event after a period of small effective population size. Thus, this process could have introduced and maintained new mutations in the population of S. brasiliensis, increasing its genetic variability.

No mtDNA genetic structuring was detected in S. brasiliensis. Its high migratory capacity (it can move 1000 km during the reproductive season; Petrere, 1985) probably played an important role in causing the lack of spatial genetic differentiation between populations. It is expected that long-distance migratory fishes within a hydrographic basin such as the Upper Paraguay, without physical, abiotic, and/or biotic barriers, will not show spatial genetic structuring (Pil et al., 2017). However, it seems that S. brasiliensis has undergone a different process of population differentiation. Ribolli et al. (2017) identified distinct temporal genetic populations of S. brasiliensis from the Upper Uruguay basin during the reproductive season. According to those authors, individuals or populations of S. brasiliensis that occupy the same geographic location might be adapted to spawning at different times during the same reproductive season (Ribolli et al., 2017). Because our sample lacks more precise temporal information for some sampled individuals, no time-related genetic population structuring can be inferred here.

# Revisiting the Pleistocene in the Pantanal Region

The demographic history of S. brasiliensis populations in the Upper Paraguay basin was addressed for the first time in this study, and it was found that a demographic expansion event probably occurred in its early evolutionary history. This took place in the late Pleistocene, before the Last Glacial Maximum (LGM, 23–19 ka). Our inference from ABC analysis, the results of BSP, the implications of star-like networks, and the patterns of genetic diversity (high haplotype diversity and low to moderate nucleotide diversity) indicated a sudden population expansion.

The ABC approach was an efficient tool for reconstructing and testing alternative demographic scenarios; this allowed us to determine the most plausible evolutionary history of S. brasiliensis, beyond making strong quantitative inferences about its demographic history. We estimated a small ancestral population size that underwent a drastic fivefold expansion, probably associated with the colonization of newly formed habitats. The time estimated for this expansion was consistent with the humid and warm phase inferred from speleothem growth phases and travertine records during Pleistocene interglacial periods (Wang et al., 2004).

Pleistocene climatic oscillations in South America drastically altered freshwater systems by changing hydrological flow and lake distributions as well as rerouting rivers (Stevaux, 2000; Cross et al., 2001; Fritz et al., 2004). As an alluvial plain dominated by rivers, the Pantanal region reveals geomorphological relicts overprinted on its modern landscape that indicate the occurrence of intense historical hydrological rearrangement (Assine and Soares, 2004). Paleochannels, changes in the levels of marginal lakes, and discontinuous sedimentation caused by intermittent flows are examples of relict landforms, which reflect environmental and climatic conditions different from those currently observed (Assine, 2015). Radiocarbon dating and palynological data support the conclusion that during colder and arid events in the Late Pleistocene/Holocene, the Pantanal experienced desert-like conditions with sparse dry vegetation, intermittent flow along the alluvial fan, and the development of paleochannels and erosional surfaces. Further, humid and warm conditions would have influenced the development of lakes and river systems and consequently expanded freshwater surface area (Ferraz-Vicentini and Salgado-Labouriau, 1996; Assine and Soares, 2004; Assine, 2015).

The pulses of reduction and expansion of the landscape left a phylogenetic signature in the regional fauna, leading to population expansion in bird (Lopes I.F. et al., 2007; Santos et al., 2008) and mammal species (Márquez et al., 2006) stemming from the humid cycles of the Pleistocene. The signature of population expansion in S. brasiliensis was dated to 62 ka. Although there is a lack of regional paleoenvironmental data extending further than 40 ka, speleothem data from the Brazilian Northeast region reveal a pluvial maximum phase at about 61.2–59.1 ka (Wang et al., 2004), reinforcing our findings from S. brasiliensis. The strong concordance between our genetic inferences and these historical data could represent the first genetic record of a humid and warm phase in the Pantanal in the period from the Last Interglacial to 40 ka.

# CONCLUSION

In summary, we aimed to identify the effect of Pleistocene climatic changes on the demographic history of S. brasiliensis from the Upper Paraguay basin. Our findings suggest that Pleistocene climate fluctuations fundamentally shaped the genetic diversity and pattern of this species in this region. Coalescent-based analyses, particularly the statistical framework provided by the ABC method, supported an ancient expansion event before the LGM as the best scenario to explain the current genetic diversity of S. brasiliensis based on two mitochondrial markers.

Understanding how historical events impact genetic diversity is important for predicting whether populations will persist under future environmental changes (Pauls et al., 2013). High genetic variation in populations is desirable, because it provides the basis for evolutionary change in a species, thereby improving the chances of survival in a dynamic environment (Frankham, 1995). Although populations of S. brasiliensis from the study area show high levels of genetic diversity, our concern over its conservation status cannot wane, because its genetic pool is unique and completely different from the S. brasiliensis found in the Upper Paraná basin (Machado et al., 2017). We suggest that conservation management of this species should concentrate on maximizing the retention of diversity, and preventing severe population decline due to anthropogenic influence.

# AUTHOR CONTRIBUTIONS

LCM, CM, ER, DM, and PG conceived the ideas and designed the experiments. LCM, ER, and DM collected the samples. CM and PG performed the bioinformatics analyses. LCM, CM, and PG led the writing, with assistance from ER and DM. All authors read and approved the final manuscript.

# REFERENCES


### FUNDING

This study was funded by the Brazilian agencies Fundação de Amparo à Pesquisa do Estado de Mato Grosso (FAPEMAT), Fundação CAPES, and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

# ACKNOWLEDGMENTS

The authors thank EMBRAPA Pantanal for the use of some tissue samples.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00001/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Mondin, Machado, Resende, Marques and Galetti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Microsatellites Associated with Growth Performance and Analysis of Resistance to *Aeromonas hydrophila* in Tambaqui *Colossoma macropomum*

Raquel B. Ariede<sup>1</sup>, Milena V. Freitas<sup>1</sup>, Milene E. Hata<sup>1</sup>, Vito A. Mastrochirico-Filho<sup>1</sup>, Fabiana Pilarski<sup>1</sup>, Sergio R. Batlouni<sup>1</sup>, Fábio Porto-Foresti<sup>2</sup> and Diogo T. Hashimoto<sup>1</sup>\*

<sup>1</sup> Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, Brazil, <sup>2</sup> School of Sciences, São Paulo State University (Unesp), Bauru, Brazil

#### *Edited by:*

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### *Reviewed by:*

Luis Antonio Inoue, Embrapa Western Agriculture, Brazil Carolina Sousa Sá Leitão, National Institute of Amazonian Research, Brazil

> *\*Correspondence:* Diogo T. Hashimoto diogo@caunesp.unesp.br

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> *Received:* 01 September 2017 *Accepted:* 04 January 2018 *Published:* 18 January 2018

#### *Citation:*

Ariede RB, Freitas MV, Hata ME, Mastrochirico-Filho VA, Pilarski F, Batlouni SR, Porto-Foresti F and Hashimoto DT (2018) Microsatellites Associated with Growth Performance and Analysis of Resistance to Aeromonas hydrophila in Tambaqui Colossoma macropomum. Front. Genet. 9:3. doi: 10.3389/fgene.2018.00003

Tambaqui, Colossoma macropomum, is the main native fish species produced in Brazil, and is an important species for genetic improvement in aquaculture. In addition, breeding studies on this species can be optimized with the use of molecular markers associated with productive phenotypes. The objective of the present study was to test the performance of growth traits and resistance to the bacterium Aeromonas hydrophila, in association with microsatellite markers in C. macropomum. In this study, three full-sib families were subjected to bacterial challenge and morphometric growth assessments. Tambaqui families subjected to the bacterial challenge differed significantly in death time and mortality rate. There was, however, no association between resistance to bacteria and microsatellite markers. In relation to growth traits, we observed a marker/phenotype association in two microsatellites. The marker in the 6b isoform x5 gene (TNCRC6b) was associated with length, whereas an anonymous marker was associated with height. The present study highlighted the evaluation of molecular markers associated with growth traits, and can serve as the basis for future marker-assisted selection (MAS) of tambaqui.

Keywords: animal improvement, aquaculture, bacterial challenge, MAS, QTL

# INTRODUCTION

Tambaqui (Colossoma macropomum) is a freshwater fish belonging to the family Serrasalmidae, with natural occurrence in the Amazon and Orinoco basins (Jégu, 2003). Its body shape is similar to that of piranhas (round-shaped fish), reaching up to 1 m in length and a weight of up to 30 kg (Goulding and Carvalho, 1982). The natural food of tambaqui, as opposed to the carnivorous piranhas, is composed mainly of fruits and seeds (Lucas, 2008). Aquaculture production of tambaqui has increased over the past 10 years in several countries of South America, and particularly in Bolivia, Brazil, Colombia, Ecuador, Peru, and Venezuela (reviewed in Valladão et al., 2016). Tambaqui is currently considered the most important native fish produced in Brazilian aquaculture, with approximately 135 thousand tons produced in 2015 (IBGE, 2016). Production of tambaqui is also widespread throughout Brazil, particularly in the north, northeast, midwest, and southeast regions of the country (MPA, 2013).

In South America, most aquaculture fish production comes from exotic species such as tilapia, Oreochromis niloticus, and salmon, Salmo salar (Valladão et al., 2016). However, unlike tilapia and salmon, which have 20 and 13 breeding programs for genetic selection, respectively, tambaqui has been the subject of few studies aimed at selecting individuals with better performance for aquaculture (Gjedrem et al., 2012; Marcos et al., 2016). Tilapia is the main fish produced in Brazil (approximately 220 thousand tons per year) (IBGE, 2016), and most of this production is a result of the genetically improved farmed tilapia (GIFT) program, a genetic selection project developed over several generations (Ponzoni et al., 2011). Knowledge about growth performance and disease resistance in aquaculture is often derived from research using model fish, such as rainbow trout (Oncorhynchus mykiss), salmon, and tilapia (Gjedrem, 2000, 2010; LaFrentz et al., 2016; Shoemaker et al., 2017). Similar studies, however, are scarce for native species such as tambaqui. Any additional information on the genetic improvement of this species will contribute significantly toward increasing its productivity in aquaculture operations.

Growth performance (e.g., weight gain and morphometric parameters) is the only trait that has been studied for improving the breeding of tambaqui (Marcos et al., 2016; Mello et al., 2016). However, disease resistance is another important characteristic that can be evaluated for the genetic selection of this species. In South Brazil, where temperature variations during seasonal changes are frequent, tambaquis suffer from outbreaks of disease, resulting in economic losses to the producers. The bacterium Aeromonas hydrophila is one of the main opportunistic pathogens that cause disease in tambaqui (Valladão et al., 2016). This microorganism is often associated with hemorrhagic septicemia, fin erosion, and distention of the abdominal cavity, among other pathological conditions (Paniagua et al., 1990), and causes mass mortality in several fish species (Abdel-Tawwab et al., 2008; Sirimanapong et al., 2014).

Molecular markers associated with productive traits are being increasingly used to improve breeding programs (Houston et al., 2012; Yáñez et al., 2014). Marker-assisted selection (MAS) can increase the genetic gain of target species in approximately 25–50% of cases, compared to traditional selection techniques (Ma et al., 2014). The MAS approach associates genotypes with phenotypes to increase accuracy when predicting the genetic values of selection candidates (Yáñez et al., 2014). Moreover, MAS can be performed during the early life stages, and is indicated for traits such as disease resistance that are difficult to measure directly in selection candidates (Ma et al., 2014; Taylor, 2014). Therefore, numerous studies have been performed on fish species to identify, characterize, and validate molecular markers associated with different traits, including growth performance and disease resistance (Song et al., 2012; Rodríguez-Ramilo et al., 2013).

The objective of this study was to evaluate growth parameters and resistance to A. hydrophila in three full-sib families of tambaqui, C. macropomum. Moreover, we tested a set of microsatellites, including anonymous and gene-associated markers, to verify the association of tambaqui genotypes with its growth performance and resistance to A. hydrophila.

# MATERIALS AND METHODS

#### Ethics Statement

This study was approved by the Ethics Committee on Animal Use (CEUA number 18.764/16) of Faculdade de Ciências Agrárias e Veterinárias, UNESP, Campus Jaboticabal, SP, Brazil. Fish were anesthetized during the experiments with benzocaine (40 mg/L) and all efforts were made to minimize suffering.

# Family Obtainment

Three full-sib families of tambaqui were obtained by mating three breeding pairs from the Aquaculture Center of UNESP, Campus Jaboticabal, SP, Brazil. Spawning was induced using carp pituitary extract that was dissolved in saline solution (0.9% NaCl) and applied in two dosages at an interval of 12 h (first and second dosage of 0.6 and 5.4 mg/kg, respectively), according to the protocol described by Pinheiro et al. (1988). After hatching in 20 L conical incubators over 15 days, approximately 150 larvae per family were transferred to 60 L tanks and maintained there until they reached approximately 3 g of body weight. Juveniles were identified by pit-tags placed posteriorly, and transferred to three different 750 L tanks and maintained until 6 months after hatching.

### Bacterial Challenge

Aeromonas hydrophila was previously isolated from diseased Colossoma macropomum and stored in 30% glycerol at −80 °C. This pathogenic strain was subsequently cultured in a nutrient tryptic soy agar (TSA), Vegitone (Sigma-Aldrich, India), for 24 h at 28 °C. The colony was then transferred to nutrient tryptic soy broth (TSB) (Fluka, Sigma-Aldrich, India) and cultured for 24 h at 28 °C. After growth, the culture broth was centrifuged at 3000 × g for 10 min at 4 °C (Eppendorf Centrifuge 5810). The supernatant was discarded and the bacterial pellet was resuspended in sterile phosphate-buffered saline (PBS). This washing procedure was repeated twice. The optical density (OD) of the solution (previously determined for LD<sub>50</sub> tests, i.e., the lethal dosage in 50% of fish) was adjusted to 0.8 at 600 nm (CDCP, WHO, 2003) in a spectrophotometer (2100 Unico, Japan). A sample of 100 µl of this solution was removed from the inoculum (used for the fish challenge test) for conducting serial dilutions and plate counts in duplicate on TSA. An LD<sub>50</sub> test was previously conducted on 30 individuals (10 from each family), using the following dosages: 10<sup>6</sup>, 10<sup>7</sup>, and 10<sup>8</sup> CFU (colony forming units)/ml of A. hydrophila.

In the final experiment, 48 full-sib animals were used from each family (3 families × 48 fish = 144 fish). Each family was further separated into triplicate treatment groups of 12 fish (12 fish × 3 treatment replicates = 36 fish), plus an additional 12 fish were used for the control group. Replicate and control groups were composed of 36 individuals (i.e., 12 × 3 families). Each replicate was maintained in 750 L tanks, using a recirculation system fitted with mechanical and biological filters. Water temperature was maintained at 30 ± 1 °C.
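The allocation described above can be laid out explicitly to confirm that the counts are consistent. The following is a minimal sketch; the tank labels (`challenge_1`, `control`, and so on) are hypothetical names introduced for illustration.

```python
FAMILIES = (1, 2, 3)
REPLICATES = 3          # challenged replicate groups
FISH_PER_GROUP = 12     # fish from one family in one group


def build_design():
    """Return (tank, family, n_fish) allocations for the challenge trial."""
    design = []
    for rep in range(1, REPLICATES + 1):
        for family in FAMILIES:
            design.append((f"challenge_{rep}", family, FISH_PER_GROUP))
    for family in FAMILIES:
        design.append(("control", family, FISH_PER_GROUP))
    return design


design = build_design()

# Fish contributed by each family across all groups (expected: 48 each).
per_family = {
    family: sum(n for _, f, n in design if f == family) for family in FAMILIES
}

# Fish held in each group (expected: 36 each, i.e., 12 x 3 families).
per_tank = {}
for tank, _, n in design:
    per_tank[tank] = per_tank.get(tank, 0) + n

total_fish = sum(n for _, _, n in design)
print(per_family, per_tank, total_fish)
```

Each family contributes 36 challenged fish and 12 controls, and each of the four groups mixes the three families in equal numbers.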

After adjustment of the LD<sub>50</sub> (1 × 10<sup>8</sup> CFU), infection was induced via intraperitoneal inoculation with a dose of 0.1 ml of bacteria per 10 g of fish body weight. Fish in the control group were inoculated with saline solution. The experiment was conducted over a period of 15 days, in order to evaluate the resistance traits of mortality rate and time of death. To evaluate resistance among families, differences in death time were tested using Tukey's test (p < 0.05) and the mortality rate was assessed by a chi-square test (p < 0.01).
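The weight-scaled dose translates into an injection volume per fish as sketched below. The function names are hypothetical, and the CFU calculation assumes that the 1 × 10<sup>8</sup> figure is a concentration per millilitre, as in the LD<sub>50</sub> test dosages; the text does not state this explicitly for the final inoculum.

```python
def inoculum_volume_ml(body_weight_g, ml_per_10g=0.1):
    """Injection volume for a fish, at 0.1 ml per 10 g of body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return ml_per_10g * body_weight_g / 10.0


def delivered_cfu(body_weight_g, cfu_per_ml=1e8):
    """Bacterial load injected, ASSUMING the inoculum is 1e8 CFU per ml."""
    return inoculum_volume_ml(body_weight_g) * cfu_per_ml


# Example with an illustrative 30 g juvenile.
volume = inoculum_volume_ml(30)
print(f"Volume: {volume:.2f} ml, load: {delivered_cfu(30):.1e} CFU")
```

Scaling the volume by body weight keeps the dose per gram constant across fish of different sizes.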

#### Growth Traits

Six growth traits were measured in 96 individuals (32 fish from each family). Traits assessed included head length (HL), head height (HH), standard length (SL), height from the beginning of the dorsal fin to the beginning of the ventral fin (H1), height from the end of the dorsal fin to the beginning of the anal fin (H2), and body weight (BW). The five morphometric traits were measured in millimeters using the Image J/Fiji 1.46 software and BW was measured by a precision digital balance (accuracy of 0.01 g). After measurements, correlation of traits was estimated using the Pearson correlation (r). We did not compare growth performance among the families because each family grew in different environments (individual tanks).
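For reference, the Pearson correlation between two traits can be computed as in the sketch below. The measurements are invented for illustration and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical standard lengths (mm) and body weights (g) for five fish.
standard_length_mm = [92.0, 98.5, 104.0, 110.5, 118.0]
body_weight_g = [24.0, 33.0, 31.0, 47.0, 52.0]

print(f"r = {pearson_r(standard_length_mm, body_weight_g):.2f}")
```

Values of r close to 1 indicate that two traits increase together, which is the basis for using one measurement as a proxy when selecting for another.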

#### Evaluation of Microsatellite Markers

We used 13 gene-associated microsatellite markers in total (c5009, c5837, c4604, c3592, c3842, c4296, c2311, c3818, c841, c4706, C2647, c3843, c3905) and 4 anonymous markers (r1366, r912, r415, r3808) for evaluating associations with growth and resistance traits (Ariede et al., 2018). All microsatellites were previously tested for polymorphism in the progenitors. However, only heterozygous markers in at least one of the progenitors were selected for analysis in the progeny. Only nine microsatellites (c5837, c4604, c3592, c3842, c4296, c2311, c3818, r1366, r912) were selected with the previous characteristics, seven of which were gene-associated and two of which were anonymous markers. Individuals from family 1 were genotyped with the microsatellites c2311, c3592, c5837, c3818, and c4296; family 2 with c3842; and family 3 with c4604, c3842, r1366, and r912.

Genomic DNA was extracted from all full-sibs and parents, following the Wizard Genomic DNA Purification Kit (Promega) protocol. The sequencing strategy adopted in this study followed the protocols described by Schuelke (2010), and used the CAGtag primer (5′-CAGTCGGGCGTCATCA-3′) (Shirk et al., 2013) labeled with the fluorochromes HEX, FAM, or NED. The genotyping by polymerase chain reaction (PCR) was performed with the following reagents: 100 µM of dNTP, 1.5 mM MgCl<sub>2</sub>, 1 × Taq DNA buffer, 0.1 µM of each primer (F and R), 0.01 µM of CAGtag primer, 0.5 units of Taq Polymerase (Invitrogen), and 10–50 ng of genomic DNA. The cycling program for amplification consisted of: 9 cycles of 95 °C for 30 s, 55–60 °C for 30 s (adjusted for each primer set), 72 °C for 20 s; followed by 30 cycles of 95 °C for 30 s, 50 °C for 30 s and 72 °C for 20 s. During the first nine cycles, the annealing temperature of 55–60 °C allowed for the incorporation of the primers (F and R) from the microsatellite loci. In the following 30 cycles, the temperature of 50 °C facilitated the annealing of the fluorescent dye-labeled CAGtag primer. PCR products were analyzed by capillary electrophoresis in the equipment ABI3730 XL, using the DS-30 matrix with the GeneScan 500 ROX dye Size Standard (Thermo). We used the program GeneMapper 4.0 (Applied Biosystems) to analyze allele sizes.
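The two-stage cycling program can be written down as a small data structure, which makes the cycle count and hold time easy to verify. This is a sketch under stated assumptions: the `Stage` class and function names are hypothetical, and only the steps given in the text are modeled (any initial denaturation, final extension, or ramp time is not described and is therefore omitted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # ((temperature_c, seconds), ...)


def cycling_program(locus_annealing_c):
    """Build the two-stage program for a locus-specific annealing temperature."""
    if not 55 <= locus_annealing_c <= 60:
        raise ValueError("locus annealing temperature must be 55-60 C")
    return (
        # Stage 1: locus-specific F and R primers anneal.
        Stage(9, ((95, 30), (locus_annealing_c, 30), (72, 20))),
        # Stage 2: the dye-labeled CAGtag primer anneals at 50 C.
        Stage(30, ((95, 30), (50, 30), (72, 20))),
    )


def total_cycles(program):
    return sum(stage.cycles for stage in program)


def hold_time_seconds(program):
    """Sum of programmed hold times only (no ramping between steps)."""
    return sum(
        stage.cycles * sum(seconds for _, seconds in stage.steps)
        for stage in program
    )


program = cycling_program(58)  # 58 C is an illustrative value in the range
print(total_cycles(program), hold_time_seconds(program) / 60)
```

Lowering the annealing temperature in the second stage lets the short tag primer bind products that already carry the tag sequence from the first stage.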

Chi-square tests (p < 0.01) were performed on progeny genotypes to verify if they were distributed according to the Mendelian pattern. The association between microsatellite markers and growth traits and time of death was evaluated with the generalized linear model procedure of the analysis of variance (PROC-GLM ANOVA), using the statistical program, SAS (Statistical Analysis System, version 9.2). A linear animal model was used as follows: Y = m + G + e, where **Y** is the observed value of a given continuous characteristic (growth traits or time of death); **m** is the overall mean of the characteristic; **G** is the effect of the genotype; and **e** the random error effect (p < 0.05). The statistical model was based on that described by Ma et al. (2014), with some modifications. The mortality rate was considered as a binary variable (live = 1 and dead = 0); thus, this characteristic was evaluated by means of a contingency table (chi-square test, p < 0.01).
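The Mendelian segregation check can be illustrated with a chi-square goodness-of-fit sketch. The analysis in this study was run in SAS; the code below is a stand-in written for illustration, the function names are hypothetical, and the genotype counts are invented. The critical values are the standard chi-square quantiles at α = 0.01 for 1 to 3 degrees of freedom.

```python
# Chi-square critical values at alpha = 0.01, keyed by degrees of freedom.
CRITICAL_CHI2_ALPHA_001 = {1: 6.635, 2: 9.210, 3: 11.345}


def mendelian_chi_square(observed, ratio):
    """Return (chi-square statistic, degrees of freedom).

    observed: genotype -> count in the progeny
    ratio:    genotype -> expected Mendelian proportion (e.g., 1:1 or 1:2:1)
    """
    if observed.keys() != ratio.keys():
        raise ValueError("observed and ratio must list the same genotypes")
    n = sum(observed.values())
    ratio_total = sum(ratio.values())
    statistic = 0.0
    for genotype, count in observed.items():
        expected = n * ratio[genotype] / ratio_total
        statistic += (count - expected) ** 2 / expected
    return statistic, len(observed) - 1


def deviates_from_mendelian(observed, ratio):
    """True if the segregation is rejected at p < 0.01."""
    statistic, df = mendelian_chi_square(observed, ratio)
    return statistic > CRITICAL_CHI2_ALPHA_001[df]


# Hypothetical progeny counts from a homozygous x heterozygous cross (1:1).
observed = {"183/183": 18, "183/186": 14}
ratio = {"183/183": 1, "183/186": 1}
statistic, df = mendelian_chi_square(observed, ratio)
print(statistic, df, deviates_from_mendelian(observed, ratio))
```

Markers whose statistic exceeds the critical value would be excluded from the association analysis, since distorted segregation can produce spurious genotype effects.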

# RESULTS

### Analysis of Resistance to *A. hydrophila*

All infected animals presented clinical signs of A. hydrophila infection. The average death time observed in families 1, 2, and 3 was 17.20, 9.13, and 18.25 h, respectively, indicating significant differences between families 1 and 2, and families 2 and 3 (**Figure 1A**). Moreover, mortality rates were significantly different between the families (**Figure 1B**); the mortality rate of family 2 (57.6%) was higher than the mortality rates of family 1 (30.5%) and family 3 (28.5%). There was no fish mortality observed in the control treatment.

#### Analysis of Growth Performance

The average BW in families 1, 2, and 3 was 44.22, 38.22, and 30.68 g, respectively. Body weights in families 1, 2, and 3 ranged between 12–70 g, 16–67 g, and 21–57 g, respectively. The average SL in families 1, 2, and 3 was 106.47, 103.69, and 96.01 mm, respectively. Other growth traits are described in **Table 1**. Phenotypic correlation indicated that all characteristics were significantly correlated (p < 0.01). Measurements of SL and H1, however, were poorly correlated with a correlation value of 0.37 (**Table 2**).

### Analysis of Association with Microsatellite Loci

Genotypes of the parents with their respective microsatellites are described in **Table 3**. The markers c4604, c3842, and r1366 that were applied in family 3 did not follow the Mendelian segregation pattern (p < 0.01), and were therefore excluded from the association analysis. In relation to the analysis of resistance to A. hydrophila, significant association was not detected between microsatellites and death time or mortality rate. In contrast, growth traits had significant association with markers c3842 and r912 in families 2 and 3, respectively. Progeny genotypes and growth trait averages are described in **Table 4**. The gene-associated marker c3842 showed an association with SL, and the anonymous marker r912 was associated with H2. For the marker c3842, individuals with the genotype 183/183 had an average SL of 111.56 mm, whereas those with the genotype 183/186 showed an average SL of 98.33 mm (**Table 4**). Moreover, for the marker r912, fish with the genotypes 175/175 and 175/185 demonstrated an average H2 of 27.44 and 29.65 mm, respectively.
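The genotype effect in the model Y = m + G + e is tested with a one-way analysis of variance. The sketch below computes the F statistic by hand; it is not the SAS PROC GLM procedure used in the study, the function name is hypothetical, and the trait values are invented for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of samples, one per genotype class.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > number of groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by genotype (G) versus residual variation (e).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical standard lengths (mm) grouped by genotype at one marker.
sl_by_genotype = {
    "183/183": [108.0, 112.5, 115.0, 110.0],
    "183/186": [97.0, 101.5, 96.0, 99.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(sl_by_genotype.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic would then be compared with the F distribution on the stated degrees of freedom to obtain the p-value for the genotype effect.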

#### DISCUSSION

Challenge experiments of A. hydrophila in this study revealed significant variations in death times and mortality rates between the three families of tambaqui. Our results indicated that family 2 was more susceptible to A. hydrophila than families 1 and 3; death time was lower and the mortality rate higher in family 2, compared with those observed in the other two families. These results indicate genetic variation for this trait in the families of tambaqui analyzed in this study; selective breeding could thus be applied for resistance to A. hydrophila in tambaqui, as has been done in other model aquaculture species such as salmon, tilapia and rainbow trout (Silverstein et al., 2009; Ødegård et al., 2011; Wiens et al., 2013; Evenhuis et al., 2015; LaFrentz et al., 2016; Shoemaker et al., 2017). For example, Nile tilapia families challenged with the pathogen Streptococcus iniae showed an additive genetic component to resistance, indicated by the variation in survival rate; genetic selection methods can be thus applied in the production of animals resistant to this pathogen (LaFrentz et al., 2016).

mortality rate) indicating significant differences between the tambaqui families.

TABLE 1 | Analysis of growth performance in three families of tambaqui Colossoma macropomum, including average values and standard deviation for weight and morphometric measures.


BW, body weight (g); HL, head length (mm); HH, head height (mm); SL, standard length (mm); H1, height from the beginning of the dorsal fin to the beginning of the ventral fin (mm); H2, height from the end of the dorsal fin to the beginning of the anal fin (mm).

TABLE 2 | Analysis of Pearson correlation (r) between growth traits (weight and five morphometric measures) in three families of tambaqui Colossoma macropomum.


All values represented significant correlations (p < 0.05). BW, body weight (g); HL, head length (mm); HH, head height (mm); SL, standard length (mm); H1, height from the beginning of the dorsal fin to the beginning of the ventral fin (mm); H2, height from the end of the dorsal fin to the beginning of the anal fin (mm).

TABLE 3 | Characterization of heterozygous microsatellites in at least one of the progenitors in three families of tambaqui Colossoma macropomum.


Genotypes are represented by allele size in base pairs (bp).

TABLE 4 | Analysis of the microsatellite loci in the progeny and average values of growth traits (weight and five morphometric measures) by clustering individuals according to the respective genotype in three families of tambaqui Colossoma macropomum.

Genotypes are represented by allele size in base pairs (bp). BW, body weight (g); HL, head length (mm); HH, head height (mm); SL, standard length (mm); H1, height from the beginning of the dorsal fin to the beginning of the ventral fin (mm); H2, height from the end of the dorsal fin to the beginning of the anal fin (mm). \*Markers that did not follow the Mendelian segregation pattern (p < 0.01).

Standard methods for disease prevention and control in the aquaculture industry include the use of vaccines, antibiotics, and management strategies (fish density, water flow, oxygen control, etc.), all of which are only partially effective. Vaccination is generally expensive and impractical for large-scale fish production, as it requires that all fish are individually treated (Yáñez et al., 2014). The indiscriminate use of antibiotics can lead to the development of populations of antibiotic-resistant bacteria, and pose a risk to human health and the environment (Vivekanandhan et al., 2002; Taylor et al., 2011). Conversely, selective breeding for disease resistance is a more effective, sustainable, and low-cost approach to improving fish health and performance, without the risks and inefficiencies posed by the other standard methods (Stear et al., 2001; Bishop, 2010).

This study did not identify the association of molecular markers with resistance to A. hydrophila in tambaqui. In general, as this trait is usually controlled by polygenic architecture, studies to search for association with resistance to pathogens have previously been conducted using hundreds of molecular markers on species such as turbot (Scophthalmus maximus), rainbow trout, salmon, Nile tilapia, and cod (Gadus morhua) (Pardo et al., 2008; Ødegård et al., 2010; Yáñez et al., 2014; Evenhuis et al., 2015; LaFrentz et al., 2016). Therefore, studies specifically on tambaqui are required to search for quantitative trait loci (QTL) of resistance to A. hydrophila, using a higher marker density.

The results of phenotypic correlation between morphometric measures in this study were similar to those previously reported for tambaqui by Mello et al. (2016); these authors demonstrated that selection for filet weight can be made on the basis of morphometric measures due to positive correlation between these characteristics. In both studies, weight had a higher phenotypic correlation with standard length and head length (r = 0.70 in both studies). Moreover, the highest phenotypic correlation was observed between head length and standard length (r = 0.84 in the present study, and 0.90 in Mello et al., 2016). This pattern of high correlation between body measurements and weight (r = 0.90) was also observed in Nile tilapia (Rutten et al., 2005), indicating that the selection for a specific trait would also result in the gain of other traits, as is the case for tambaqui.

The morphometric data observed in C. macropomum indicates that this species shows a large head length, which is not attractive for the commercial sector; a larger head length lowers the filet yield (Mello et al., 2016). As observed in both studies, there is a positive correlation between head length and weight; therefore, it is not possible to select individuals of greater body weight without them also having a larger head length.

Several studies of growth markers have targeted genes that are directly linked to this phenotype, such as the growth hormone (GH) gene and the insulin-like growth factor 1 gene (IGF1) (Yue and Orban, 2002; Tsai et al., 2014). In a study with two tilapia species, six microsatellites were detected in four genes that were involved in growth and reproduction processes (Yue and Orban, 2002). The c3842 marker, identified in this study in association with growth, is located in the 5'-UTR of the trinucleotide repeat-containing 6b isoform x5 gene (TNCRC6b). This gene has already been described in mammals (humans and rats) and in fish (Maylandia zebra), and its function may be related to various biological processes such as the silencing of genes guided by microRNAs, and the regulation of positive messenger RNA, among others. It is thus necessary to increase our understanding of the role of this gene and the possible relation between its molecular functions and differential growth in tambaqui. The results obtained from the association analysis with the gene-associated marker c3842 suggest that allele 183, when homozygous, increases the mean SL of the animal.

The second microsatellite that showed association with growth was the anonymous marker r912. In general, anonymous markers are often used in QTL identification studies, because such markers are physically close to these loci and therefore closely linked (Reid et al., 2005; Bouza et al., 2012). Our results suggest that r912 is related to a second morphometric characteristic (Height 2). Moreover, and contrary to that observed in the gene-associated marker, increased height occurs when the r912 marker is heterozygous, suggesting that the presence of allele 185 exerts influence on animal height. To date, the reference genome of tambaqui is not available; therefore, it is not possible to characterize the loci of the r912 marker and the putative linked gene. A similar strategy of gene prospection was carried out in salmon, whose reference genome is sequenced, allowing the prediction of two markers in association with the genes MEP1A (meprin A subunit beta-like) and PCNT (pericentrin) (Houston et al., 2014; Tsai et al., 2016).

Currently, 100% of the tambaqui production is still based on stocks without genetic improvement. The identification of two markers (c3842 and r912) associated with growth in tambaqui will support and accelerate the selection of superior genotypes using MAS. A classic example of QTL identification is the detection of a haplotype that is associated with approximately 80% of the genetic variation in resistance to infectious pancreatic necrosis in salmon (Houston et al., 2008, 2012). Owing to their efficacy, these markers are currently being successfully incorporated into breeding programs of Atlantic salmon in Norway and Scotland (Moen et al., 2009; Houston et al., 2010). The results of this study will thus likely increase our knowledge base of tambaqui aquaculture, and facilitate the development of future tambaqui breeding programs.

#### CONCLUSION

This study provides new information on tambaqui (C. macropomum) growth traits to guide future breeding programs for this species. The prospective microsatellites highlighted in this study could be used in the validation of MAS in families of C. macropomum. This study also contributes to the development of a bacterial challenge protocol with Aeromonas hydrophila in order to characterize pathogen-resistant tambaqui families for selective breeding.

#### AUTHOR CONTRIBUTIONS

RA–Acquisition, analysis and interpretation of data, draft of the work, development of intellectual content, writing of the manuscript, final approval of the version. MF, VM-F, FP, and SB–Draft of the work, development of intellectual content, final approval of the version. MH–Analysis and interpretation of data, draft of the work, final approval of the version. FP-F–Acquisition, interpretation of data, development of intellectual content, final approval of the version. DH–Draft of the work, development of intellectual content, writing of the manuscript, final approval of the version.

#### ACKNOWLEDGMENTS

This work was supported by grants from FAPESP (2014/03772-7) and CNPq (446779/2014-8, 130629/2015-4 and 305916/2015-7).

#### REFERENCES

Abdel-Tawwab, M., Abdel-Rahman, A. M., and Ismael, N. E. M. (2008). Evaluation of commercial live baker's yeast, Saccharomyces cerevisiae, as a growth and immunity promoter for Fry Nile tilapia, Oreochromis niloticus challenged in situ with Aeromonas hydrophila. Aquaculture 280, 185–189. doi: 10.1016/j.aquaculture.2008.03.055


IBGE (2016). Produção da Pecuária Municipal 2015. Rio de Janeiro 41, 1–108.


Atlantic salmon (Salmo salar): population-level associations between markers and trait. BMC Genomics 10:368. doi: 10.1186/1471-2164-10-368


rearing environment on survival phenotype. Aquaculture 388–391, 128–136. doi: 10.1016/j.aquaculture.2013.01.018


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ariede, Freitas, Hata, Mastrochirico-Filho, Pilarski, Batlouni, Porto-Foresti and Hashimoto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Extensive Karyotype Reorganization in the Fish *Gymnotus arapaima* (Gymnotiformes, Gymnotidae) Highlighted by Zoo-FISH Analysis

Milla de Andrade Machado<sup>1</sup> , Julio C. Pieczarka<sup>1</sup> , Fernando H. R. Silva<sup>1</sup> , Patricia C. M. O'Brien<sup>2</sup> , Malcolm A. Ferguson-Smith<sup>2</sup> and Cleusa Y. Nagamachi <sup>1</sup> \*

<sup>1</sup> Laboratório de Citogenética, Centro de Estudos Avançados da Biodiversidade, Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém-Pará, Brazil, <sup>2</sup> Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom

The genus Gymnotus (Gymnotiformes) contains over 40 species of freshwater electric fishes that are widely distributed throughout Central and South America and particularly prevalent in the Amazon basin. Cytogenetics has been an important tool in the cytotaxonomy and elucidation of evolutionary processes in this genus, including unraveling the variety of diploid chromosome numbers (2n = 34 to 54), the high karyotype diversity among species with a shared diploid number, different sex chromosome systems, and variation in the distribution of several repetitive DNAs and the colocation and association between those sequences. Recently, whole chromosome painting (WCP) has been used for tracking the chromosomal evolution of the genus, showing highly reorganized karyotypes and the conserved synteny of the NOR-bearing pair within the G. carapo clade. In this study, painting probes derived from the chromosomes of G. carapo (GCA, 2n = 42, 30 m/sm + 12 st/a) were hybridized to the mitotic metaphases of G. arapaima (GAR, 2n = 44, 24 m/sm + 20 st/a). Our results uncovered chromosomal rearrangements and a high number of repetitive DNA regions. Of the 12 chromosome pairs of G. carapo that can be individually differentiated (GCA 1–3, 6, 7, 9, 14, 16, and 18–21), six pairs (GCA 1, 9, 14, 18, 20, 21) show conserved homology with GAR, five pairs (GCA 1, 9, 14, 20, 21) are also shared with the cryptic species G. carapo 2n = 40 (34 m/sm + 6 st/a) and only the NOR-bearing pair (GCA 20) is shared with G. capanema (GCP 2n = 34, 20 m/sm + 14 st/a). The remaining chromosomes are reorganized in the karyotype of GAR. Despite the close phylogenetic relationships of these species, our chromosome painting studies demonstrate an extensive reorganization of their karyotypes.

Keywords: chromosome painting, WCP, *Gymnotus*, FISH, cytotaxonomy, karyotype evolution

# INTRODUCTION

Gymnotus (Gymnotiformes) is a monophyletic genus of freshwater electric fishes (Albert, 2001; Lovejoy et al., 2010; Tagliacollo et al., 2016) distributed throughout South America (Albert et al., 2005). It is the most speciose genus (40 species; Ferraris et al., 2017) and has the widest distribution in the order, with prevalence in the Amazon basin, where several species of Gymnotus co-occur in sympatry (Albert and Crampton, 2003; Crampton et al., 2005).

#### *Edited by:*

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### *Reviewed by:*

Alexandr Sember, Institute of Animal Physiology and Genetics (ASCR), Czechia Marcelo De Bello Cioffi, Federal University of São Carlos, Brazil

*\*Correspondence:*

Cleusa Y. Nagamachi cleusanagamachi@gmail.com; cleusa@ufpa.br

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> *Received:* 23 September 2017 *Accepted:* 08 January 2018 *Published:* 26 January 2018

#### *Citation:*

Machado MA, Pieczarka JC, Silva FHR, O'Brien PCM, Ferguson-Smith MA and Nagamachi CY (2018) Extensive Karyotype Reorganization in the Fish Gymnotus arapaima (Gymnotiformes, Gymnotidae) Highlighted by Zoo-FISH Analysis. Front. Genet. 9:8. doi: 10.3389/fgene.2018.00008


Based on the integrated data from DNA sequencing of six genes, coupled with 223 morphological characters and with Model-Based Total Evidence phylogenetic analyses, Tagliacollo et al. (2016) divided the genus into six clades: G. pantherinus, G. coatesi, G. anguillaris, G. tigre, G. cylindricus, and G. carapo. The Gymnotus carapo group is regarded as monophyletic and is located in a derived position within the genus (Albert, 2001; Lovejoy et al., 2010; Tagliacollo et al., 2016). Craig et al. (2017) described seven subspecies for G. carapo.

Cytogenetics has been an important tool in cytotaxonomy and has proved to be very useful in understanding the evolutionary processes behind the diversification of Gymnotus. The order Gymnotiformes shows considerable variation, not only in diploid number (from 2n = 24 in Apteronotus albifrons, Howell, 1972; Almeida-Toledo et al., 1981; Mendes et al., 2012; to 2n = 74 in Rhabdolichops cf. eastwardi, Suárez et al., 2017) but also in the karyotype formula and location of repetitive sequences (Fernandes et al., 2005; Almeida-Toledo et al., 2007; Silva et al., 2009; da Silva et al., 2013; Jesus et al., 2016; Araya-Jaime et al., 2017; Batista et al., 2017; Sousa et al., 2017; Takagui et al., 2017). Recently, fluorescence in situ hybridization (FISH) has played an important role in understanding the genome structure of fish species (Yi et al., 2003; Cabral-de-Mello and Martins, 2010; Martins et al., 2011; Vicari et al., 2011; Gornung, 2013; Knytl et al., 2013; Yano et al., 2017). Molecular cytogenetic studies in Gymnotiformes have shown dynamic reorganization, including pericentric inversions observed through repetitive DNA position (Fernandes et al., 2017), sequence dispersion via transposable elements and the association between different repetitive sequences (Utsunomia et al., 2014; da Silva et al., 2016; Machado et al., 2017), and the presence of different sex chromosome systems (Margarido et al., 2007; Henning et al., 2008, 2011; da Silva et al., 2011, 2014; Almeida et al., 2015). This evolutionary plasticity of the karyotype is seen in Gymnotus (**Table 1**), a genus that has high interspecific variability in chromosome numbers (**Figure 1**, **Table 1**), ranging from 2n = 34 in Gymnotus capanema (Milhomem et al., 2012a) to 2n = 54 in G. carapo (Foresti et al., 1984), G. mamiraua (Milhomem et al., 2007), G. paraguensis (Margarido et al., 2007) and G. inaequilabiatus (Scacchetti et al., 2011). Gymnotus arapaima is located within the G. carapo clade, with 2n = 44 (24 m/sm + 20 st/a; Milhomem et al., 2012b).

Whole chromosome painting (WCP) techniques use specific painting probes of whole chromosomes, chromosome arms or chromosome regions to find homologous segments in other species (Yang and Graphodatsky, 2017). Nagamachi et al. (2010) produced whole chromosome probes from G. carapo (GCA, 2n = 42) by chromosome sorting using flow cytometry and made a comparative genomic map against the chromosomal background of the cytotype with 2n = 40 chromosomes. The results uncovered a high degree of chromosomal repatterning between these cytotypes, with only eight pairs showing conserved synteny (GCA 1, 2, 6, 9, 14, 19, 20, 21). Nagamachi et al. (2013) used the same set of probes for G. capanema (GCP, 2n = 34) and the results showed that the degree of genomic reorganization was much higher, with only four pairs (GCA 6, 7, 19, 20) showing conserved synteny with GCA 2n = 42 and three pairs (GCA 6, 19, 20) with GCA 2n = 40. Of these, GCA 7 and 19 are associated with other chromosomes in the karyotype of GCP. The study of Milhomem et al. (2013), with the probe derived from the NOR-bearing pair of GCA, 2n = 42, shows that there is a possible synapomorphy of the NOR-bearing pair within the G. carapo clade.

We use the same set of probes produced by Nagamachi et al. (2010) to analyze the karyotype of G. arapaima and to compare the results with our previous studies of species in the genus Gymnotus. Our findings confirm and extend our understanding of the extensive karyotype reorganization within this genus.

#### MATERIALS AND METHODS

#### Sampling

Samples of G. arapaima (GAR, 2n = 44, 24 m/sm + 20 st/a) were collected in the Mamiraua Reserve (Reserva de Desenvolvimento Sustentável Mamiraua) in the Amazon basin, Brazil (03° 02′ 11.8″ S, 064° 51′ 16.6″ W). These samples were previously analyzed by conventional cytogenetic methods (Milhomem et al., 2012b). The animals collected were handled following procedures recommended by the American Fisheries Society. JCP has a permanent field permit, number 13248 from "Instituto Chico Mendes de Conservação da Biodiversidade." The Cytogenetics Laboratory of UFPa has permit number 19/2003 from the Ministry of Environment for sample transport and permit 52/2003 for using the samples for research. The Ethics Committee of the Federal University of Para (Comitê de Ética Animal da Universidade Federal do Pará) approved this research (Permit 68/2015).

#### WCP

WCP probes from G. carapo (2n = 42; 30 m/sm + 12 st/a) described in Nagamachi et al. (2010) were hybridized onto metaphases of G. arapaima (GAR, 2n = 44, 24 m/sm + 20 st/a). The chromosomes of GCA, 2n = 42 were flow-sorted into four regions (R1–R4), from which probes were produced. R1 represented the NOR-bearing chromosome (GCA 20); R2 contains the four largest pairs (1–3 and 16); R3 contains the eight medium-sized pairs (4–8 and 17–19) and R4 the eight smallest pairs (9–15 and 21). Additional sorting produced subregion probes (S) from each of the three regions with multiple chromosome pairs included (R2, R3 and R4). R2: S2A (GCA 1, 2 and 16); S2B (GCA 2 and 16) and S2C (GCA 1 and 16). R3: S3A (GCA 5–7 and 17); S3B (GCA 6 not 7; re-analyzed in Nagamachi et al., 2013, GCA 19); S3C (GCA 7); and S3D (GCA 5–7, 17 and 18). R4: S4A (GCA 12, 13 and 15); S4B (GCA 12–15); S4C (GCA 10–13, 15 and 21); and S4D (GCA 12–15 and 21). For details, see **Figure 2** and **Table 1** in Nagamachi et al. (2010).

\*Number of chromosomes with signals; m, metacentric; sm, submetacentric; st, subtelocentric; a, acrocentric.

To find out the corresponding segments between GAR and GCA (2n = 42), we used dual-color FISH with probes from R3 and R4. The other non-hybridized chromosomes or segments correspond to R1 (GAR 19, Milhomem et al., 2013) or R2. For a more refined identification of the chromosomes from R2, R3 and R4, we employed dual-color FISH using probes from the subregions as specified in **Table 2** of Nagamachi et al. (2010), with some modifications related to the identification of the chromosomes of S3B made in Nagamachi et al. (2013). With those experiments (as illustrated in **Figure 2**) it was possible to identify individually GCA pairs 1–3, 6, 7, 9, 14, 16, and 18–21, while it was not possible to distinguish the pairs [4, 8], [10, 11], [5, 17], and [12, 13, 15].
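The logic behind this identification can be sketched in a few lines: each GCA pair is characterized by the set of region and subregion probes that paint it, and two pairs can be told apart only if those sets differ. The sketch below is illustrative rather than the authors' procedure. The probe contents are transcribed from the WCP description above, except that S3B is assumed to contain GCA 6 and GCA 19 (our reading of the re-analysis note); the function names are hypothetical.

```python
from collections import defaultdict

# Probe composition for G. carapo (2n = 42), transcribed from the WCP section.
REGIONS = {
    "R1": {20},
    "R2": {1, 2, 3, 16},
    "R3": {4, 5, 6, 7, 8, 17, 18, 19},
    "R4": {9, 10, 11, 12, 13, 14, 15, 21},
}
SUBREGIONS = {
    "S2A": {1, 2, 16},
    "S2B": {2, 16},
    "S2C": {1, 16},
    "S3A": {5, 6, 7, 17},
    "S3B": {6, 19},  # ASSUMPTION: GCA 6 plus GCA 19 after the 2013 re-analysis
    "S3C": {7},
    "S3D": {5, 6, 7, 17, 18},
    "S4A": {12, 13, 15},
    "S4B": {12, 13, 14, 15},
    "S4C": {10, 11, 12, 13, 15, 21},
    "S4D": {12, 13, 14, 15, 21},
}


def probe_signatures(regions, subregions):
    """Map each chromosome pair to the set of probes that contain it."""
    probes = {**regions, **subregions}
    all_pairs = sorted(set().union(*regions.values()))
    return {
        pair: frozenset(name for name, members in probes.items() if pair in members)
        for pair in all_pairs
    }


def resolve(signatures):
    """Split pairs into uniquely identifiable ones and indistinguishable groups."""
    groups = defaultdict(list)
    for pair, signature in signatures.items():
        groups[signature].append(pair)
    individual = sorted(g[0] for g in groups.values() if len(g) == 1)
    ambiguous = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    return individual, ambiguous


individual, ambiguous = resolve(probe_signatures(REGIONS, SUBREGIONS))
print(individual)  # pairs with a unique probe signature
print(ambiguous)   # groups of pairs sharing the same signature
```

Under the stated assumption about S3B, the unique signatures correspond to the 12 individually identified pairs and the shared signatures to the four unresolved groups listed in the text.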

#### FISH

Chromosome painting techniques followed Yang et al. (1995) with adaptations. Slides were digested with 1% pepsin to remove excess cytoplasm, treated with 1% formaldehyde, and dehydrated in an ethanol series (2 × 2 min in 70%, 2 × 2 min in 90%, and 1 × 4 min in 100%). Subsequently, the slides were aged overnight at 37°C. The probes were prepared following Nagamachi et al. (2010), denatured for 15 min at 70°C and applied onto a slide with chromosomes that were previously denatured at 70°C for 4 min in 70% formamide/2× SSC [pH 7.0]. The hybridization lasted 72 h at 37°C. The slides were washed once in a solution of 50% formamide/2× SSC, once in 2× SSC and once in 4× Tween, 5 min each.

TABLE 2 | Chromosome homologies between G. carapo (2n = 42), G. carapo (2n = 40), G. capanema (2n = 34), and G. arapaima (2n = 44).

dist, distal; prox, proximal; int, interstitial.

<sup>a</sup>According to Nagamachi et al. (2010).

<sup>b</sup>According to Nagamachi et al. (2013).

The dual-color FISH experiments were made with probes that were either directly labeled or biotinylated and detected with avidin (Vector Laboratories, Burlingame, CA, USA) linked to Cy3 or FITC (Amersham, Piscataway, NJ, United States). DAPI (4′,6-diamidino-2-phenylindole) was used as a counterstain.

#### Microscopy and Image Processing

Images were acquired with NIS-Elements software on a Nikon H550S microscope. Chromosomes were morphologically classified according to Levan et al. (1964). The karyotype was organized according to Milhomem et al. (2012b).

# RESULTS

The whole chromosome probes from G. carapo were hybridized to chromosomes of G. arapaima. The regions of homology (hereafter designated as R1-4) obtained with GCA (2n = 42) probes against the chromosomes of GAR are indicated on the karyotype of GAR arranged from DAPI-stained chromosomes (**Figure 3**). Dual color FISH with the probes of R3 (red) and R4 (green) defined the chromosome groups in GAR that corresponded to the four groups of regions in GCA (**Figure 3**), as R3 and R4 do not share chromosome pairs. Any chromosome segments hybridizing simultaneously with two colors indicate repetitive DNA sequences that are common to both regions. The chromosomes or segments in blue (DAPI) represent the NOR-bearing chromosomes (R1, GCA20) and the chromosomes corresponding to R2 (pairs 1–3 and 16). **Table 2** shows the correspondence of the GCA (2n = 42) chromosomes with the previously published karyotypes of GCA (2n = 40) and GCP (2n = 34), and GAR (2n = 44, present study).

From the 12 chromosome pairs of G. carapo that can be individually differentiated (GCA 1–3, 6, 7, 9, 14, 16, and 18–21), six pairs (GCA 1, 9, 14, 18, 20, 21) have conserved homology within GAR. GCA 20 hybridizes to one whole chromosome, pair 19, as described by Milhomem et al. (2013). Six chromosome pairs (GCA 2, 3, 6, 7, 16, and 19) show two signals on GAR chromosomes.

Among the GCA probes that represent two chromosome pairs, [4, 8] revealed two signals, whereas [10, 11] and [5, 17] each revealed three signals. The probe representing three pairs [12, 13, 15] also revealed three signals on GAR chromosomes.

The following associations were found: GAR 4: [10, 11]/C/6, GAR 7: 19/C/[10, 11], GAR 13: [12, 13, 15]/C/[12, 13, 15]/\*/3, GAR 14: \*/C/16/\*/2, GAR 16: 7/C/[5, 17]/6/[5, 17] (where C = centromere and \* = repetitive sequences).
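The association notation above is compact enough to parse mechanically. The following sketch is an illustration only, with hypothetical function names: it splits each GAR entry into ordered segments and tallies which associated GAR chromosomes carry a given GCA probe. It covers only the five associated chromosomes listed here, so the tallies are not total signal counts.

```python
import re
from collections import defaultdict

# Associations transcribed from the text: C = centromere, * = repetitive sequences.
ASSOCIATIONS = {
    4: "[10, 11]/C/6",
    7: "19/C/[10, 11]",
    13: "[12, 13, 15]/C/[12, 13, 15]/*/3",
    14: "*/C/16/*/2",
    16: "7/C/[5, 17]/6/[5, 17]",
}


def parse_association(text):
    """Turn 'a/C/b' notation into an ordered list of (kind, GCA pairs) segments."""
    segments = []
    for token in text.split("/"):
        token = token.strip()
        if token == "C":
            segments.append(("centromere", None))
        elif token == "*":
            segments.append(("repetitive", None))
        else:
            pairs = tuple(int(n) for n in re.findall(r"\d+", token))
            segments.append(("GCA", pairs))
    return segments


def gar_chromosomes_by_probe(associations):
    """List, for each GCA probe, the GAR chromosomes on which it appears."""
    hits = defaultdict(list)
    for gar, text in associations.items():
        for kind, pairs in parse_association(text):
            if kind == "GCA":
                hits[pairs].append(gar)
    return dict(hits)


hits = gar_chromosomes_by_probe(ASSOCIATIONS)
for probe, gar_list in sorted(hits.items()):
    print(probe, "->", gar_list)
```

For example, GCA 6 appears on GAR 4 and GAR 16, which agrees with the two signals reported for that pair.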

# DISCUSSION

Our results demonstrate that the genomic reorganization in the analyzed species of Gymnotus is greater than that assumed by classical cytogenetics (Milhomem et al., 2008, 2012a,b).

equivalent homeologous parts on GAR chromosomes are DAPI-stained (blue) only. For each of the 22 GAR chromosome pairs, the DAPI-only stained homolog is depicted on the left, while the dual-color FISH hybridization pattern is present on the right. The correspondence to G. carapo (GCA) homeologous chromosomes is indicated by chromosome pair numbers on the left side of the DAPI-stained GAR chromosomes, while the correspondence to the particular GCA regions (R1–4) is indicated on the right side of FISH-painted chromosomes. \*Repetitive sequences.

Whole chromosome probes from GCA 2n = 42 have been used for comparative genomic mapping (CGM) of the karyotype of (i) cryptic species GCA 2n = 40 (Nagamachi et al., 2010), (ii) GCP 2n = 34 (Nagamachi et al., 2013) and, in the present work, (iii) onto the karyotype of GAR 2n = 44 (**Figure 4**). Similar to the observations in the two previously mapped species (Nagamachi et al., 2010, 2013), GAR also presents a highly reorganized karyotype (**Figures 3**, **4**, **Table 2**) in relation to GCA 2n = 42 and also in relation to GCA 2n = 40 and GCP 2n = 34. From the 12 chromosome pairs of GCA 2n = 42 that can be individually differentiated (GCA 1–3, 6, 7, 9, 14, 16, 18–21), GAR shows conserved synteny of six pairs (GCA 1, 9, 14, 18, 20, 21); five pairs (GCA 1, 9, 14, 20, 21) with the cryptic species GCA 2n = 40 and only one pair with GCP (GCA 20). On the other hand, GCA 2n = 40 shares with GCA 2n = 42, eight pairs (GCA 1, 2, 6, 9, 14, 19, 20, 21) and with GCP, three pairs (GCA 6, 19, 20) (**Figure 4**). It is also worth noting that the probes representing GCA [4, 8] and [12, 13, 15] show two and three signals, respectively, in three species (GCA 42, GCP and GAR) indicating that these chromosomes may have retained their homology.
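The pairwise counts in this paragraph follow from simple set intersections over the GCA (2n = 42) pairs that each species retains as conserved synteny. The sketch below restates those sets from the text (the GCP set is the one reported by Nagamachi et al., 2013, as cited in the Introduction); the variable and function names are hypothetical.

```python
from itertools import combinations

# GCA (2n = 42) pairs with conserved synteny in each karyotype, as stated in the text.
CONSERVED = {
    "GCA 2n=40": {1, 2, 6, 9, 14, 19, 20, 21},
    "GCP 2n=34": {6, 7, 19, 20},
    "GAR 2n=44": {1, 9, 14, 18, 20, 21},
}


def shared_pairs(conserved):
    """Return the conserved GCA pairs shared by every pair of species."""
    return {
        (a, b): sorted(conserved[a] & conserved[b])
        for a, b in combinations(conserved, 2)
    }


shared = shared_pairs(CONSERVED)
for species, pairs in shared.items():
    print(species, pairs)

# Pairs conserved in all three karyotypes.
shared_by_all = sorted(set.intersection(*CONSERVED.values()))
print(shared_by_all)
```

Only GCA 20, the NOR-bearing pair, is common to all three sets, which matches the point made about that chromosome throughout the article.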

A comparative analysis of the WCP data described above shows that the karyotypes of both GCP and GAR are related to the karyotypes of GCA. GCP, although part of the carapo group (Milhomem et al., 2012a), has an uncertain position inside the phylogeny of the clade, while GCA and GAR are closely related. GCP and GAR do not share the same chromosome rearrangements (**Table 2**), meaning that these rearrangements must have occurred after their speciation. The results of the CGM suggest either a divergence prior to that of GAR or a recent divergence characterized by fast karyotype evolution and fixation of a high number of chromosomal rearrangements.

It is also clear that the karyotype of GAR is evolutionarily closer to the GCA karyotype than to the GCP karyotype. However, GAR is located 2000 km away from the other species, while GCP and GCA (2n = 42) are 200 km apart (**Figure 5**). This might suggest that the karyotypes of GCA and GAR are more conserved while GCP changed over a shorter period of time. Another explanation for this huge differentiation of the GCP karyotype might lie in the fact that this species inhabits the Rio Açaiteuazinho drainage in northeastern Pará, which is not connected with the Amazon basin, while GCA and GAR are part of the same hydrographic basin, despite the long distance between them (**Figure 5**).

Freshwater fishes in general have a higher rate of chromosomal rearrangements than marine fishes. Natural barriers in the freshwater environment reduce gene flow, whereas the open marine biome supports bigger populations with a high potential for dispersion and higher gene flow, which reduces the chance for karyotype changes to become fixed in the population (Molina, 2007; Nirchio et al., 2014; Artoni et al., 2015). Lande (1977) theorizes that the rates of chromosomal rearrangement are proportional to selection and inversely proportional to the effective size of the population, and Araya-Jaime et al. (2017) suggest that this could be considered a general model of chromosomal evolution within Gymnotiformes, since populations with little or no gene flow may facilitate the fixation of chromosomal rearrangements within a particular species in a shorter evolutionary time. This may be a contributory factor to speciation within the group and may also contribute to the higher number of rearrangements found. It is worth remembering that the high number of rearrangements observed in the present study could be detected only through WCP, and groups with a more stable diploid number and karyotypic formula potentially could have fixed a higher number of rearrangements that did not cause major structural changes.

of chromosomes in (B–D) show the homology with the karyotype (A) of G. carapo. Each color in the karyotypes (B–D) represents the correspondent chromosome colored in (A). Chromosomes groups [4, 8]; [5, 17]; [10, 11], and [12, 13, 15] share the same color within each group.

As Region 3 was labeled with a red fluorochrome and Region 4 with a green one, all yellow regions in **Figure 3** are the result of hybridization of both probes to the same region. Although R3 and R4 do not share the same chromosome pair, they share the same or highly similar repetitive DNA. The hybridization of both probes to the same regions of GAR chromosomes confirms that this sequence is also present in this species. Since repetitive sequences evolve quickly by concerted evolution with significant differences between species (Pons and Gillespie, 2004), the presence of the highly similar repetitive DNA sequence in different species clearly shows that these species diverged recently, without sufficient time to accumulate sequence differences. Despite the huge amount of rearrangement, the repetitive DNA sequence strongly suggests that these species diverged recently and also that the rearrangements responsible for the karyotypic differences are also recent.

Taken together, the results might explain the difficulty in finding synapomorphies among the species compared so far, since most of the rearrangements might have become fixed after the species became isolated. On the other hand, because the G. carapo clade is a derived one (Tagliacollo et al., 2016, **Figure 1**) and because few species of Gymnotus have been studied by chromosome painting to date, we currently cannot conclusively resolve whether the homologous chromosomes present a symplesiomorphic or synapomorphic character. An example is the NOR-bearing pair that maps to GCA 20 using rDNA probes in species of the carapo group, but this location is different in species outside this group (Milhomem et al., 2013), which suggests that it is a synapomorphy. This matter will be better understood once species outside the carapo group are mapped with all the GCA whole chromosome probes.

#### AUTHOR CONTRIBUTIONS

MM, JP, FS, PO, MF-S, and CN: gave substantial contributions to the conception of the work; the acquisition, analysis, and interpretation of data for the work; participated in the draft of the work or revised it critically for important intellectual content; gave final approval of the version to be published; and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

#### FUNDING

This research was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) through the Edital Universal (Proc. 475013/2012-3) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) through the Edital 047/2012 PRÓ-AMAZÔNIA: Biodiversidade e Sustentabilidade on a project coordinated by CYN; by Fundação Amazônia Paraense de Amparo à Pesquisa (FAPESPA) through the National Excellence on Research Program (PRONEX, TO 011/2008) and Banco Nacional de Desenvolvimento Econômico e Social – BNDES (Operação 2.318.698.0001) on a project coordinated by JP.

### ACKNOWLEDGMENTS

This study is part of the Master's dissertation of MM, who was a recipient of a CAPES Scholarship in Genetics and Molecular Biology, UFPA. CYN (308428/20013-7) and JP (308401/2013-1) are grateful to CNPq for Productivity Grants. The authors are grateful to members of the team of the cytogenetics laboratory UFPA for the fieldwork and chromosomal preparations, and to MSc. Jorge Rissino, MSc. Shirley Nascimento and Maria da Conceição for assistance in laboratory work. We also thank the Instituto Chico Mendes de Conservação da Biodiversidade (ICMBio) for the collection permit 020/2005 (Registration: 207419).

#### REFERENCES

inferences about chromosomal evolution of Gymnotidae. J. Hered. 106, 177–183. doi: 10.1093/jhered/esu087


carapo (Gymnotidae, Gymnotiformes). Cytogenet. Genome Res. 141, 163–168. doi: 10.1159/000354988


complex (Apteronotidae–Gymnotiformes) implications in cytotaxonomy and karyotypic evolution. Caryologia 70, 1–4. doi: 10.1080/00087114.2017.1306385


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Machado, Pieczarka, Silva, O'Brien, Ferguson-Smith and Nagamachi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Genetic Diversity of the Endangered Neotropical Cichlid Fish (Gymnogeophagus setequedas) in Brazil

Lenice Souza-Shibatta<sup>1</sup> \*, Thais Kotelok-Diniz<sup>1</sup> , Dhiego G. Ferreira<sup>2</sup> , Oscar A. Shibatta<sup>3</sup> , Silvia H. Sofia<sup>1</sup> , Lucileine de Assumpção<sup>4</sup> , Suelen F. R. Pini<sup>4</sup> , Sergio Makrakis<sup>4</sup> and Maristela C. Makrakis<sup>4</sup>

<sup>1</sup> Laboratório de Genética e Ecologia Animal, Departamento de Biologia Geral, Universidade Estadual de Londrina, Londrina, Brazil, <sup>2</sup> Laboratório de Genética e Conservação, Universidade Estadual do Norte do Paraná, Cornélio Procópio, Brazil, <sup>3</sup> Museu de Zoologia, Departamento de Biologia Animal e Vegetal, Universidade Estadual de Londrina, Londrina, Brazil, <sup>4</sup> Grupo de Pesquisa em Tecnologia em Ecohidráulica e Conservação de Recursos Pesqueiros e Hídricos – GETECH, Universidade Estadual do Oeste do Paraná, Toledo, Brazil

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Paulo Affonso, Universidade Estadual do Sudoeste da Bahia, Brazil Charles Masembe, Makerere University, Uganda

> \*Correspondence: Lenice Souza-Shibatta lenicesouza@hotmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 31 October 2017 Accepted: 10 January 2018 Published: 02 February 2018

#### Citation:

Souza-Shibatta L, Kotelok-Diniz T, Ferreira DG, Shibatta OA, Sofia SH, de Assumpção L, Pini SFR, Makrakis S and Makrakis MC (2018) Genetic Diversity of the Endangered Neotropical Cichlid Fish (Gymnogeophagus setequedas) in Brazil. Front. Genet. 9:13. doi: 10.3389/fgene.2018.00013

Gymnogeophagus setequedas is a rare and rheophilic species of tribe Geophagini, considered endangered in Brazilian red lists. Its previously known geographical distribution range was the Paraná River basin, in Paraguay, and a tributary of the Itaipu Reservoir in Brazil. Since its description no specimens have been collected in the original known distribution area. However, recent records of G. setequedas in the lower Iguaçu River, in a region considered highly endemic for the ichthyofauna, extended the known geographical distribution and may represent one of the last remnants of the species. The aim of this study was to estimate the genetic diversity and population structure of G. setequedas, using microsatellite markers and mitochondrial haplotypes, in order to test the hypothesis of low genetic diversity in this restricted population. Muscular tissue samples of 86 specimens were obtained from nine locations in the Lower Iguaçu River basin, between upstream of the Iguaçu Falls and downstream of the Salto Caxias Reservoir. Seven microsatellite loci were examined and a total of 120 different alleles were obtained. The number of alleles per locus (NA) was 17.429, effective alleles (NE) 6.644, expected heterozygosity (HE) 0.675, observed heterozygosity (HO) 0.592, and inbreeding coefficient (FIS) 0.128. Twelve haplotypes in the D-Loop region were revealed, with values of h (0.7642) and π (0.00729), suggesting a large and stable population with a long evolutionary history. Thus, both molecular markers revealed high levels of genetic diversity and indicated the occurrence of a single G. setequedas population distributed along a stretch of approximately 200 km. The pattern of mismatch distribution was multimodal, which is usually ascribed to populations in demographic equilibrium. Nevertheless, the construction of a new hydroelectric power plant, already underway between the Salto Caxias Reservoir and Iguaçu Falls, could fragment this population, causing loss of genetic diversity and population decline. For this reason it is necessary to maintain the Iguaçu River tributaries and the area downstream from the Lower Iguaçu Reservoir free of additional dams, to guarantee the survival of this species.

Keywords: conservation, freshwater, Lower Iguaçu River, Gymnogeophagus, endangered species

# INTRODUCTION

fgene-09-00013 January 31, 2018 Time: 17:43 # 2

The largest biodiversity asset in the world is located in Brazil (ICMBio, 2017). However, more than 1,170 species in Brazil have been classified as threatened with extinction. Of these, more than 26% are freshwater Actinopterygii (ICMBio, 2017), including Gymnogeophagus setequedas Reis et al. (1992), the only one of the 17 species of the genus described so far that is considered threatened (Abilhoa and Duboc, 2004; Pavanelli and Reis, 2008). This led the Brazilian Environmental Ministry to list the species as Endangered (EN) (decree #445, International Union for Conservation of Nature [IUCN], 2014).

Gymnogeophagus setequedas was described based on specimens collected in tributaries of the Paraná River in Paraguay and Brazil, near the Sete Quedas region, an area currently submerged due to construction of the Itaipu Hydroelectric Power Plant (Reis et al., 1992; Pavanelli and Reis, 2008). Since its description, however, no specimens have been collected in the originally known distribution area (Abilhoa and Duboc, 2004; Agostinho et al., 2004; Pavanelli and Reis, 2008). Nevertheless, 15 specimens of G. setequedas were recently collected in the Lower Iguaçu River, both upstream and downstream of Iguaçu Falls in the Iguaçu National Park (Paiz et al., 2017). According to the authors, finding this species in that region was quite unexpected, as the Iguaçu waterfalls have led to effective geographic isolation of the ichthyofauna of the Iguaçu River (Zawadzki et al., 1999), resulting in a high degree of endemism, estimated at between 51 and 71% (Abell et al., 2008). In addition, due to its high ecological importance, the Iguaçu River basin is considered an ecoregion, separate from the rest of the Paraná River basin (Abell et al., 2008).

The Iguaçu River basin covers an area of approximately 72,000 km<sup>2</sup>, representing part of the landscape of the three Paraná plateaus, and is subdivided into three regions: Upper Iguaçu (1st plateau, Curitiba region), Middle Iguaçu (2nd plateau, Ponta Grossa region), and Lower Iguaçu (3rd plateau, Guarapuava region) (Maack, 2001). The portion of the 3rd plateau that includes the Lower Iguaçu is characterized by the presence of numerous waterfalls, such as Salto Grande (13 m), Salto Santiago (40 m), Salto Osório (30 m), and Iguaçu Falls (Maack, 1981). The region is very attractive for hydroelectric use due to its high gradient; thus, the original rapids and waterfalls have been transformed into a sequence of reservoirs that flooded approximately 656 km<sup>2</sup>, remarkably altering the landscape (Júlio Júnior et al., 1997).

It was believed that G. setequedas preferred lentic environments, as do the other species of the genus (Pavanelli and Reis, 2008). However, this species seems to behave differently from its congeners, preferring fast waters. This was corroborated by its recent capture in the Lower Iguaçu River, in unimpounded stretches with fast waters (Paiz et al., 2017). In addition, this species disappeared after construction of the Itaipu reservoir, being collected only twice, which suggests its dependence on lotic environments (Agostinho et al., 2004). Pavanelli and Reis (2008) consider that this species no longer occurs in the Itaipu reservoir, possibly because it did not succeed in colonizing the environment formed after construction of the reservoir. In Paraguay this species is also considered Endangered (EN) (Liotta, 2010), for the same reasons as in Brazil.

According to Wu et al. (2015), understanding the diversity and genetic structure of endangered species is fundamental to effective environmental conservation and management actions. Genetic diversity is essential if populations are to evolve in response to environmental changes. For instance, due to the effects of anthropogenic disturbances, a small and isolated population is more likely to lose genetic diversity, and consequently to decline, than a large population with high genetic diversity (Frankham et al., 2010; Allendorf et al., 2012).

The genetic diversity status of a species is the starting point for systematic planning of the actions needed to ensure its survival and reduce its risk of extinction. No studies are known that focus on the biology (Pavanelli and Reis, 2008) or genetic diversity of G. setequedas. In addition, the diploid number has only recently been reported (Paiz et al., 2017). Thus, the aim of this study was to estimate the genetic diversity and population structure of G. setequedas along its recently discovered area of occurrence, using microsatellite markers and mitochondrial haplotypes (D-loop), providing the first population-level data for this species threatened with extinction.

# MATERIALS AND METHODS

#### Study Area and Sampling

Our study area comprises a stretch of the Lower Iguaçu River basin, between upstream of Iguaçu Falls and downstream of the Salto Caxias Reservoir (**Figure 1**).

Samples of 86 G. setequedas were collected at nine different points, some of them located in the Iguaçu National Park (PNI): two points in the main channel of the Iguaçu River (IGU 1 and IGU 2 – PNI), near the Iguaçu Falls, and seven tributaries of the Iguaçu River (STO-Santo Antônio, SIL-Silva Jardim – PNI, FLO-Floriano – PNI, GON-Gonçalves Dias – PNI, CAP-Capanema, AND-Andrada, and COT-Cotegipe) (**Figure 1** and **Table 1**). The samples were collected in 2012 (November), 2013 (November and December), and 2014 (January, February, March, April, July, August, September, November, and December). The specimens were captured using nets of different mesh sizes and electrofishing. Samples of muscle and rayed fins were taken from the fish, stored in microtubes containing 100% ethanol, and kept at −20 °C. Specimens were fixed in 10% formalin, preserved in 70% ethanol, and deposited in the fish collection of the Zoology Museum at the Universidade Estadual de Londrina under catalog numbers MZUEL 16332, 16353, 16354, 17094–17096.

#### DNA Extraction and Quantification

Total DNA was extracted from muscle or rayed fins preserved in 95% EtOH following the phenol/chloroform protocol of Almeida et al. (2001). DNA concentrations were determined with a NanoDrop™ 1000, and samples were diluted in ultrapure water to 10 ng/µL for microsatellite markers and 5 ng/µL for mtDNA D-loop markers.

#### Microsatellite Amplification and Genotyping

Cross-amplification tests were conducted for seven loci described for Geophagus brasiliensis (Gbra6, Gbra16, Gbra17, Gbra62, Gbra63, Gbra80, and Gbra96) (Ferreira et al., 2013). Reagent concentrations and PCR conditions followed Ferreira et al. (2015), with the modifications proposed by Schuelke (2000). The thermal profile consisted of an initial denaturation step at 94 °C for 4 min, followed by 35 cycles of denaturation at 94 °C for 40 s, annealing at a locus-specific temperature for 1 min, and extension at 72 °C for 1 min, with a final extension at 72 °C for 30 min. The annealing temperatures of the successfully cross-amplified loci were 48 °C (Gbra16, Gbra62, Gbra63, Gbra80), 54 °C (Gbra06, Gbra17, Gbra70), or 60 °C (Gbra96). PCR products were analyzed on an ABI PRISM 3500-XL automated sequencer (Applied Biosystems) using GeneScan 600 Liz (Applied Biosystems) as a molecular weight marker.

#### mtDNA (D-Loop) Marker

Part of the control region (D-loop) of the G. setequedas mitochondrial DNA was amplified using PCR. The primers used were L 5′-AGAGCGTCGGTCTTGTAAACC-3′ (Cronin et al., 1993) and H 5′-CTGAAGTAGGAACCAGATG-3′ (Meyer et al., 1990). PCR reactions were performed in a 25 µL final volume containing 1X GoTaq Master Mix (Promega), 1 µM of each primer, 15 ng DNA, and ultrapure water to volume. The thermal profile included an initial denaturation at 94 °C for 4 min, followed by 41 cycles at 94 °C for 15 s, annealing at 56 °C for 30 s, and extension at 72 °C for 2 min, with a final extension at 72 °C for 10 min. The PCR products were purified using ExoSAP IT (Prodimol Biotecnologia S.A, Belo Horizonte, Minas Gerais, Brazil). Sequencing was performed with the Big Dye Terminator v 3.1 kit (Applied Biosystems) on an ABI-PRISM 3500 XL automated sequencer (Applied Biosystems). Multiple alignment was carried out using the ClustalW application (Thompson et al., 1994) in BioEdit 7.1.3.0 (Hall, 1999). NCBI's BLAST search (Basic Local Alignment Search Tool, Altschul et al., 1990) was used to confirm the origin of the fragment. An online version of tRNAscan-SE (Lowe and Eddy, 1997), available at http://lowelab.ucsc.edu/tRNAscan-SE, was used to search for possible tRNA sequences. Sequences of the 12 different haplotypes were deposited in GenBank (MG581478 to MG581489).

TABLE 1 | Sampling sites for Gymnogeophagus setequedas, including sample sizes per site (N) and location.

#### Genetic Analyses (Microsatellites)

#### Population Structure

The first step of the genetic analyses was to define the number of existing populations. For this we used population analyses based on Bayesian approaches, mainly "assignment methods." These methods calculate the probability of the different genotypes being observed in each population and assign individuals to populations according to the probability that their genotypes belong to them, without any a priori inference. Such analyses thus make it possible to infer which population an individual belongs to, regardless of its collection site (Beaumont and Rannala, 2004).

To evaluate the relationship between samples, we conducted a Bayesian cluster analysis using STRUCTURE v.2.3.3 (Pritchard et al., 2000). The number of populations (K) was estimated using the admixture model with correlated allele frequencies among populations, with K ranging from 1 to 10 (Evanno et al., 2005). For each value of K, 20 independent runs were performed, each with 100,000 Markov Chain Monte Carlo (MCMC) iterations discarded as burn-in followed by 1,000,000 MCMC iterations. The best-fit number of groupings was evaluated using ln Pr(X|K) (Pritchard et al., 2000) and the ad hoc ΔK statistic (Evanno et al., 2005), as calculated by Structure Harvester v.0.6.7 (Earl and VonHoldt, 2012). Graphs representing the membership coefficient of each sampled individual were plotted using Distruct 1.1 (Rosenberg, 2004). Genetic differentiation was assessed from pairwise ΦST values obtained in ARLEQUIN v.3.5.1.3 (Excoffier and Lischer, 2010), with significance based on 10,000 permutations. P-values were then adjusted at alpha = 0.05 with the Holm-Bonferroni correction for multiple tests (Holm, 1979).
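The ΔK statistic used above can be illustrated with a short sketch. It assumes one common way of computing it: the absolute second-order difference of the mean ln Pr(X|K) across runs, divided by the standard deviation of ln Pr(X|K) at that K. The function name `evanno_delta_k` and the toy log-likelihoods are hypothetical and are not values from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the list of ln Pr(X|K) values from replicate runs.
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    sd_l = {k: stdev(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # delta K is undefined at the smallest and largest K
        if sd_l[k] == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_diff / sd_l[k]
    return delta


# Hypothetical toy values: two runs for each of K = 1, 2, 3.
toy_runs = {1: [-100.0, -102.0], 2: [-90.0, -92.0], 3: [-95.0, -97.0]}
toy_delta = evanno_delta_k(toy_runs)
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at K = 1, which is why ln Pr(X|K) is inspected alongside it.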

#### Genetic Diversity

The number of alleles per locus (NA), effective number of alleles (NE), and observed and expected heterozygosity (HO, HE) were obtained with POPGENE v. 1.31 (Yeh et al., 2000). The inbreeding coefficient (FIS) was obtained with Fstat v2.9.3 (Goudet, 2001). Deviations from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium between pairs of loci were tested with GENEPOP v.1.2 (Raymond and Rousset, 1995), and the significance levels (P-values) were adjusted with the sequential Bonferroni correction (Rice, 1989). MICRO-CHECKER 2.2.1 (Van Oosterhout et al., 2004) was used to test for the possible presence of null alleles or other genotyping errors, such as allelic dropout and reading errors due to stutter peaks.
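As a rough illustration of what these summary statistics measure, the sketch below computes them for a single locus from diploid genotypes. It uses the simplest textbook estimators (HE = 1 − Σp², NE = 1/Σp², FIS = 1 − HO/HE), which is an assumption: the programs cited above apply their own estimators and sample-size corrections, so their output can differ slightly. The function name `locus_diversity` and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summary statistics for one locus; genotypes is a list of (allele, allele)."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    total = len(alleles)
    sum_p2 = sum((count / total) ** 2 for count in counts.values())
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    he = 1 - sum_p2  # uncorrected expected heterozygosity
    return {
        "NA": len(counts),          # number of distinct alleles
        "NE": 1 / sum_p2,           # effective number of alleles
        "HO": ho,                   # observed heterozygosity
        "HE": he,                   # expected heterozygosity
        "FIS": 1 - ho / he if he > 0 else float("nan"),
    }


# Hypothetical genotypes (allele sizes in bp) for four individuals.
toy_locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
toy_stats = locus_diversity(toy_locus)
```

Multi-locus values such as those reported for the whole sample are averages over loci, which is one reason a ratio of the mean HO and HE need not reproduce the reported FIS exactly.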

#### Gene Flow

Contemporary migration rates over the last few generations, and the direction of migration among the samples studied, were estimated using BayesAss v 3.0.3 (Wilson and Rannala, 2003), with 95% confidence intervals. Ten runs with different random starting seeds were analyzed, each with 3,000,000 MCMC iterations, of which the first 999,999 were discarded as burn-in. After the burn-in, every 2000th iteration was sampled. The delta values (maximum amount by which parameter values are allowed to change between iterations) were 0.15 for allele frequencies, 0.025 and 0.05 for migration rate, and 0.15 for inbreeding value.

#### Demographic Analyses

Signs of recent population bottlenecks were evaluated from the microsatellite data with Bottleneck v.1.2.02 (Piry et al., 1999), which tests for deviations from mutation-drift equilibrium. Three tests were used. Two of them, the "sign test" (Cornuet and Luikart, 1996) and the "Wilcoxon signed-rank test" (Luikart and Cornuet, 1998), detect bottlenecks through a significant heterozygosity excess and were run under the Infinite Alleles Model (IAM), Stepwise Mutation Model (SMM), and Two-Phase Model (TPM, with 90% SMM and 10% IAM), with significance set at P < 0.05. The third, the "mode-shift test," detects bottlenecks from distortions in the distribution of allele frequencies (Luikart et al., 1998).
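The idea behind the mode-shift test can be sketched as a histogram of allele frequencies: in a population that has not been through a recent bottleneck, rare alleles dominate and the histogram is L-shaped. The code below is only an illustration of that idea, not the Bottleneck program's implementation; the ten classes of width 0.1, the bin-edge convention, and the function names are assumptions.

```python
from collections import Counter


def allele_frequency_classes(loci, n_classes=10):
    """Count alleles per frequency class, pooled over loci.

    loci is a list of loci, each a list of diploid genotypes (allele, allele).
    """
    histogram = [0] * n_classes
    for genotypes in loci:
        alleles = [allele for genotype in genotypes for allele in genotype]
        total = len(alleles)
        for count in Counter(alleles).values():
            frequency = count / total
            index = min(int(frequency * n_classes), n_classes - 1)
            histogram[index] += 1
    return histogram


def is_l_shaped(histogram):
    """True when the rarest-allele class is strictly the most abundant one."""
    return all(histogram[0] > other for other in histogram[1:])


# Hypothetical locus: one common allele and nine rare ones (20 allele copies).
copies = [1] * 11 + list(range(2, 11))
rare_rich_locus = list(zip(copies[0::2], copies[1::2]))
# Hypothetical locus with two alleles at equal frequency.
even_locus = [(1, 2)] * 5
```

With these toy inputs, `is_l_shaped(allele_frequency_classes([rare_rich_locus]))` holds, whereas the locus with two equally frequent alleles shifts the mode to an intermediate class.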

#### Genetic Analyses (mtDNA)


#### Genetic Diversity

Haplotype number, haplotype diversity (h) and nucleotide diversity (π) were obtained from DnaSP v.5 program (Librado and Rozas, 2009). Arlequin v.3.5.1.3 program (Excoffier and Lischer, 2010) was used to conduct the selective neutrality tests based on the infinite sites model of Tajima (D) (Tajima, 1989) and Fu (Fs) (Fu, 1997). The Network v.4.6.1.1 program (Fluxus Technology Ltd.<sup>1</sup> ) was used to construct haplotype networks from mtDNA data based on the median-joining algorithm (Bandelt et al., 1999).
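For readers unfamiliar with these indices, a minimal sketch of haplotype diversity (h) and nucleotide diversity (π) is given below. It assumes aligned sequences of equal length with no gaps or missing data, and uses the standard definitions h = n/(n − 1) · (1 − Σp²) and π = mean pairwise differences per site; DnaSP's handling of gaps is not reproduced. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    frequencies = [count / n for count in Counter(sequences).values()]
    return n / (n - 1) * (1 - sum(p * p for p in frequencies))


def nucleotide_diversity(sequences):
    """pi = average number of pairwise differences divided by alignment length."""
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / len(pairs) / len(sequences[0])


# Hypothetical four-sequence alignment with three haplotypes.
toy_alignment = ["ACGT", "ACGT", "ACGA", "TCGA"]
toy_h = haplotype_diversity(toy_alignment)
toy_pi = nucleotide_diversity(toy_alignment)
```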

# RESULTS

#### Population Structure

Bayesian clustering analysis (Structure) applied to the microsatellite data indicated that the most probable number of clusters was K = 1, based on ln Pr(X|K). The graphic representation of ΔK showed no well-defined groups: the ancestry coefficients were distributed homogeneously among individuals and samples, indicating the occurrence of a single G. setequedas population (**Figure 2A**). Analysis of Molecular Variance (AMOVA), conducted for all samples, showed very little variation among them (ΦST = 0.025, P > 0.05). Following this result, all other analyses were based on a single population.

<sup>1</sup>http://www.fluxus-engineering.com

In the entire sample, a total of 120 different alleles were obtained from seven microsatellite loci. The mean number of alleles per locus (NA) was 17.429, the effective number of alleles (NE) 6.644, expected heterozygosity (HE) 0.675, observed heterozygosity (HO) 0.592, and the inbreeding coefficient (FIS) 0.128.

After applying the sequential Bonferroni correction, no significant deviations (P < 0.05) from Hardy–Weinberg equilibrium (HWE) were found at most microsatellite loci; only locus Gbra96 showed a significant deviation. The same correction was applied to the linkage disequilibrium (LD) tests, and a significant value was found only for locus Gbra96. The Micro-Checker program found no null alleles among the samples.
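The sequential Bonferroni procedure referred to here (Holm, 1979; Rice, 1989) can be sketched in a few lines: P-values are ranked from smallest to largest and each is compared with alpha divided by the number of tests not yet rejected, stopping at the first non-significant test. The function name and the seven P-values below are hypothetical and are not the values obtained in this study.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    rejected = [False] * n_tests
    for rank, index in enumerate(order):
        threshold = alpha / (n_tests - rank)  # alpha/m, alpha/(m-1), ...
        if p_values[index] <= threshold:
            rejected[index] = True
        else:
            break  # all larger P-values are also retained
    return rejected


# Hypothetical HWE P-values for seven loci.
toy_p = [0.001, 0.04, 0.03, 0.2, 0.5, 0.6, 0.7]
toy_rejected = sequential_bonferroni(toy_p)
```

In this toy case only the first locus remains significant: 0.001 is below 0.05/7, but the next smallest value, 0.03, exceeds 0.05/6.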

From the amplification and sequencing of mtDNA of 82 G. setequedas individuals, a 449 bp fragment from the D-loop region was obtained. Twenty polymorphic sites (17 transitions and three transversions) and four indel sites were found. Twelve different haplotypes were revealed, of which four (H5, H7, H10, and H12) were singletons (**Figure 2B**). The H2 haplotype was the most frequent, observed in 37 samples from four different locations (IGU1, IGU2, STO, and FLO). Although SIL, FLO, GON, COT, and CAP present only one haplotype each, these haplotypes are shared with other locations, except for H5, found only in CAP. IGU2 had the highest number of haplotypes (N = 8), followed by STO (N = 6), AND (N = 3), and IGU1 (N = 2). Haplotype (h) and nucleotide (π) diversity values were 0.7642 and 0.00729, respectively (**Table 2**).

FIGURE 2 | (A) Graphical representation of Bayesian cluster analysis from K = 1. Each column represents a different individual of G. setequedas, and the colors represent the probability of the ancestral coefficient of individual in each genetic cluster. Numbers represent the collection points: (1) Iguaçu 1; (2) Iguaçu 2; (3) Santo Antônio; (4) Silva Jardim; (5) Floriano; (6) Gonçalves Dias; (7) Capanema; (8) Andrada; (9) Cotegipe. (B) Haplotype network based on partial sequencing of the D-loop region (mtDNA) of 82 individuals of G. setequedas from the Lower Iguaçu River. Circle sizes are proportional to haplotype frequency. Numbers between haplotypes denote mutational steps between sequences.

#### Demographic Analyses


The signed-rank test did not produce significant values under any of the mutational models (IAM, SMM, or TPM). In the mode-shift test, the allele frequencies showed the typical L-shaped distribution expected for a non-bottlenecked population (**Table 3**). The mismatch distribution of the haplotypes was multimodal (**Figure 3**), which is usually ascribed to populations in demographic equilibrium. In the neutrality tests, Tajima's D and Fu's Fs were negative and not significant (**Table 2**).
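A mismatch distribution is simply the histogram of the number of differences between all pairs of sequences. The sketch below builds that histogram and counts its peaks in a deliberately naive way (strict local maxima, so plateaus are ignored). It assumes aligned, gap-free sequences of equal length; the function names and toy data are hypothetical and do not reproduce the analysis shown in Figure 3.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Return counts of sequence pairs differing at 0, 1, 2, ... sites."""
    counts = Counter(
        sum(a != b for a, b in zip(first, second))
        for first, second in combinations(sequences, 2)
    )
    return [counts.get(d, 0) for d in range(max(counts) + 1)]


def count_modes(histogram):
    """Count strict local maxima; the ends are compared against a -1 padding."""
    padded = [-1] + list(histogram) + [-1]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical four-sequence alignment.
toy_mismatch = mismatch_distribution(["ACGT", "ACGT", "ACGA", "TCGA"])
```

A single peak is the pattern usually associated with recent expansion, whereas several peaks, as reported here, are the pattern usually associated with demographic equilibrium.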

#### Gene Flow

Bayesian gene flow analysis of the microsatellite data revealed contemporary migration values among the samples within the 95% confidence interval. The proportion of non-immigrants in each sample ranged from 79 to 82.6%. Migration estimates were similar among the majority of samples. The lowest migration estimates were from IGU 1 to COT and from IGU 2 to AND, both 1.9%. The highest were from FLO to IGU2 and to GON, and from COT to CAP, all 2.7%. Among all the samples, FLO had the highest percentage of migrants to the largest number of sites (**Table 4**).

# DISCUSSION

#### Genetic Diversity and Population Structure

The study of molecular markers, such as microsatellites and mtDNA, generates important information on the genetic variation and structure of fish species and is a significant step toward the conservation of species in their natural populations (Carvalho-Costa et al., 2008; Piorski et al., 2008; Garcez et al., 2011; Abdul-Muneer, 2014). According to Wu et al. (2015), understanding the diversity and genetic structure of endangered species is essential to effective environmental conservation and management actions. The long-term persistence of species depends on sufficient genetic diversity to adapt and survive in variable or changing environments (Hughes et al., 2008).

According to DeWoody and Avise (2000), based on a meta-analysis of microsatellite polymorphisms, freshwater fish have, on average, 9.1 ± 6.1 alleles and an expected heterozygosity of 0.54 ± 0.25 per population. Therefore, based on microsatellites, the genetic diversity of the G. setequedas population in terms of number of alleles (17.14) and expected heterozygosity (0.67) is as high as, or higher than, expected for freshwater fish. In addition to the high diversity in the nuclear markers, the G. setequedas analyzed showed significant variation in mitochondrial DNA, exhibiting high levels of genetic diversity. According to Freeland (2005), h is considered the haploid equivalent of HE in data on diploids. The similarity between these estimates suggests that the current variation in nuclear and mitochondrial DNA is evenly distributed throughout the G. setequedas population.
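One simple way to read this comparison is to express each observed value as a distance from the freshwater-fish mean in units of the reported standard deviation. The sketch below does only that arithmetic with the figures quoted in the paragraph above; treating the mean ± SD as a yardstick in this way is an assumption made for illustration, and the function name is hypothetical.

```python
def standardized_difference(observed, reference_mean, reference_sd):
    """Distance of an observed value from a reference mean, in SD units."""
    return (observed - reference_mean) / reference_sd


# Values quoted in the text: G. setequedas versus the freshwater-fish averages.
alleles_z = standardized_difference(17.14, 9.1, 6.1)
heterozygosity_z = standardized_difference(0.67, 0.54, 0.25)

print(f"Number of alleles: {alleles_z:.2f} SD above the mean")
print(f"Expected heterozygosity: {heterozygosity_z:.2f} SD above the mean")
```

Both values lie above the freshwater-fish means, by roughly 1.3 and 0.5 standard deviations, respectively.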

A high level of genetic diversity is an important attribute for species and may confer the basis for adaptation to environmental change (Piorski et al., 2008), especially when it comes to endangered species such as G. setequedas (Abilhoa and Duboc, 2004; Pavanelli and Reis, 2008; International Union for Conservation of Nature [IUCN], 2014).

#### Demographic History

A 449 bp fragment of the D-loop region, one of the most variable regions of mtDNA (Frankham et al., 2010), revealed 12 haplotypes and high values of π (0.00729) and h (0.750). In addition, analysis of the microsatellite data using the Bottleneck program showed no significant recent bottlenecks. The absence of recent bottlenecks is corroborated by the high haplotype (h > 0.5) and nucleotide (π > 0.5%) diversity values in the mtDNA. According to Grant and Bowen (1998), high haplotype diversity combined with high nucleotide diversity represents a large and stable population with a long evolutionary history, or secondary contact between different lineages. A stable population with a long evolutionary history seems a very plausible possibility for G. setequedas, as the Iguaçu waterfalls have effectively isolated the ichthyofauna of the Iguaçu River (Zawadzki et al., 1999), resulting in a high degree of endemism, of more than 70% (Abell et al., 2008). In addition, analysis of the distribution of substitution differences between pairs of haplotypes (mismatch distribution) (Cunha and Solé-Cava, 2012) shows a multimodal distribution, which is generally attributed to populations in demographic equilibrium (Rogers and Harpending, 1992).

TABLE 2 | Genetic diversity of G. setequedas in the Lower Iguaçu River basin, based on microsatellite markers and mitochondrial haplotypes (D-Loop).

N, number of individuals examined; A, total number of alleles; NA, mean number of alleles; NE, mean number of effective alleles; HO, observed heterozygosity; HE, expected heterozygosity; FIS, inbreeding coefficient; Nh, number of haplotypes; h, haplotype diversity; π, nucleotide diversity; D, Tajima's neutrality test; Fs, Fu's neutrality test.

TABLE 3 | Bottleneck tests in 86 samples of G. setequedas from the Lower Iguaçu River Basin.

N, number of individuals; He, number of loci exhibiting excess heterozygosity; Hd, number of loci exhibiting deficient heterozygosity. Normal L-shaped distribution = non-bottlenecked population. <sup>a</sup>Infinite allele model, <sup>b</sup>two-phase model, <sup>c</sup>stepwise mutation model.

However, the negative values of Tajima's D and Fu's Fs, even if not significant, could suggest population expansion after an ancient bottleneck (Slatkin and Hudson, 1991; Grant and Bowen, 1998), indicating that all the current haplotypes are closely related and derived from a single main haplotype (H2). Signs of old bottlenecks may be less evident at microsatellite loci, since these tend to recover variation more rapidly than mitochondrial sequences. At the same time, at mtDNA, π recovers more slowly than h after a genetic bottleneck (McCusker and Bentzen, 2010).

#### Gene Flow

The individuals of the Floriano River (FLO) presented the highest rates of migration, and the highest levels of admixture were found in Iguaçu 2 and the Gonçalves Dias River. The specimens from rivers further upstream in the drainage (FLO, GON, CAP, and COT) appear to have higher levels of admixture among them. This pattern might suggest that entry into the upper tributaries is more likely than into the lower ones. However, the highest rates of migrants to Iguaçu 1 are from the most upstream tributaries, suggesting that populations of G. setequedas maintain satisfactory gene flow in all stretches of the river studied. According to Palstra and Ruzzante (2008), if local populations are small, as is the case in the present study, gene flow is the key factor in preventing the stochastic loss of genetic diversity, besides providing the required alleles to subpopulations under selection that lack favorable genotypes (Kinnison and Hairston, 2007). Although these results allow gene flow between localities to be inferred, according to Wilson and Rannala (2003) a more robust estimate can be reached with a larger sample size per locality. This could be a goal for further studies, but it is difficult to achieve in the short term because the species is not abundant and is mainly distributed in a protected area.

The bold values along the diagonal represent non-migrants within a putative source subpopulation.

According to Fagan et al. (2002, 2005), riverine populations are predicted to be particularly vulnerable to fragmentation due to their dendritic structure, which may be exacerbated by unidirectional migration. Natural barriers (rapids and waterfalls) and man-made structures, such as dams, also fragment riverine populations, influencing dispersal rates and migration patterns (Wofford et al., 2005), even in rheophilic fish species with strong swimming abilities such as G. setequedas (Paiz et al., 2017). The construction of a new hydroelectric power plant (Baixo Iguaçu HPP), already underway between the Salto Caxias Reservoir and Iguaçu Falls, could therefore fragment this population and prevent gene flow. As a consequence, there may be loss of genetic diversity and population decline, especially in the area of the future reservoir. Moreover, the isolated population could go extinct, as has already happened with another population of G. setequedas after the construction of the Itaipu Hydroelectric Power Plant. That disappearance was attributed to the lentic waters of the Itaipu Reservoir, which isolated populations of this rheophilic species that previously occurred in tributaries of both river banks, in Paraguay and Brazil, and probably in the Paraná River (Paiz et al., 2017).

#### Conservation Implications

Abundance, dispersal, and population size are reduced in populations fragmented by barriers such as dams, thereby increasing the risk of extinction (Gross et al., 2004; Letcher et al., 2007). This fragmentation can lead to the total or partial isolation of a population, conditioning the response of the individuals. Thus, in the recently found population of G. setequedas in the Iguaçu River, a drastic reduction in size and a loss of genetic diversity due to inbreeding must be avoided by preserving the lotic characteristics of the environment.

Owing to the effects of anthropogenic disturbance, small and isolated populations are more likely to suffer loss of genetic diversity and population decline than a large population with high genetic diversity (Frankham et al., 2010; Allendorf et al., 2012). According to Frankham (2003), inbreeding reduces reproduction and survival rates, and loss of genetic diversity reduces the ability of populations to evolve to cope with environmental changes, increasing extinction risk.

The type locality and most of the records of G. setequedas are in Paraguay, in tributaries of the right bank of the Paraná River, in the region of influence of the Itaipu reservoir and downstream (Reis et al., 1992). Since the species description, despite several attempts, no new specimens have been collected from the known geographic range (Agostinho et al., 2004; Pavanelli and Reis, 2008). According to Pavanelli and Reis (2008), this species no longer occurs in the Itaipu reservoir or in the floodplain upstream of the reservoir. Despite several collection efforts on the Iguaçu River (Pavanelli and Reis, 2008), mainly in the Lower Iguaçu upstream of the National Park (Baumgartner et al., 2012), the species was not collected there. For this reason, the conservation status of G. setequedas has invariably been assigned to a threatened category (Abilhoa and Duboc, 2004; Pavanelli and Reis, 2008; International Union for Conservation of Nature [IUCN], 2014; Paiz et al., 2017). However, Paiz et al. (2017) and the present study report the presence of G. setequedas in the Lower Iguaçu in the National Park region. Thus, the population of G. setequedas of the Lower Iguaçu River may be one of the last remnants of this species and, according to Pavanelli and Reis (2008), as G. setequedas is a naturally rare species, it is advisable that any anthropogenic changes in its original ecosystem be discouraged.

The results presented here demonstrate that the population of G. setequedas of the Iguaçu River still maintains satisfactory levels of genetic diversity. However, in terms of conservation management plans, to guarantee the survival of this species, it is necessary to maintain the tributaries of the Iguaçu River and the downstream area from the future reservoir (Baixo Iguaçu Reservoir) without additional dams. Long-term monitoring of genetic diversity and inbreeding could also help conserve this population and provide a basis for future decisions.

# ETHICS STATEMENT

This study was carried out in strict accordance with the recommendations provided in the Guide for the Care and Use of Laboratory Animals. Collection was authorized by the System of Authorization and Information on Biodiversity – SISBIO (SISBIO No. 25648-3 and 25648-4), by the Chico Mendes Institute for Biodiversity Conservation (ICMBio 003/2014 and Official SEI No. 63/2016-DIBIO/ICMBio), and by the Environmental Institute of Paraná – IAP (No. 37788 and 43394). The sampling protocol was approved by the Ethics Committee on the Use of Animals – CEUA of the Universidade Estadual do Oeste do Paraná (No. 62/09).

# AUTHOR CONTRIBUTIONS

LS-S, DF, MM, and OS designed the research. LA, SP, SM, and MM collected data. LS-S, TK-D, and DF performed the molecular genetic studies. All authors contributed to the writing of the manuscript.

# ACKNOWLEDGMENTS

We are grateful to ICMBio-Parque Nacional do Iguaçu; Consórcio Empreendedor Baixo Iguaçu (CEBI); Universidade Estadual do Oeste do Paraná (UNIOESTE); and Universidade Estadual de Londrina (UEL).

# REFERENCES



Neotropical fish Geophagus brasiliensis (Perciformes, Cichlidae). J. Fish Biol. 83, 1430–1438. doi: 10.1111/jfb.12227


Freeland, J. R. (2005). Molecular Ecology. Chichester: John Wiley & Sons Ltd.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Souza-Shibatta, Kotelok-Diniz, Ferreira, Shibatta, Sofia, de Assumpção, Pini, Makrakis and Makrakis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Identification of a New Mullet Species Complex Based on an Integrative Molecular and Cytogenetic Investigation of Mugil hospes (Mugilidae: Mugiliformes)

Mauro Nirchio<sup>1</sup> , Fabilene G. Paim<sup>2</sup> , Valentina Milana<sup>3</sup> , Anna R. Rossi<sup>3</sup> and Claudio Oliveira<sup>2</sup> \*

<sup>1</sup> Facultad de Ciencias Agropecuarias, Universidad Técnica de Machala, Machala, Ecuador, <sup>2</sup> Departamento de Morfologia, Instituto de Biociências, Universidade Estadual Paulista "Júlio de Mesquita Filho", São Paulo, Brazil, <sup>3</sup> Dipartimento di Biologia e Biotecnologie "C. Darwin", Sapienza Università di Roma, Rome, Italy

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Marcelo R. S. Briones, Federal University of São Paulo, Brazil

Marcelo De Bello Cioffi, Federal University of São Carlos, Brazil

> \*Correspondence: Claudio Oliveira claudio@ibb.unesp.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 08 July 2017 Accepted: 15 January 2018 Published: 05 February 2018

#### Citation:

Nirchio M, Paim FG, Milana V, Rossi AR and Oliveira C (2018) Identification of a New Mullet Species Complex Based on an Integrative Molecular and Cytogenetic Investigation of Mugil hospes (Mugilidae: Mugiliformes). Front. Genet. 9:17. doi: 10.3389/fgene.2018.00017

Mullets are very common fishes of the family Mugilidae (Mugiliformes), which is characterized by remarkably uniform external morphology and internal anatomy. Recently, different species complexes were molecularly identified within Mugil, a genus characterized by lineages that sometimes show very different karyotypes. Here we report the results of cytogenetic and molecular analyses conducted on Mugil hospes, commonly known as the hospe mullet, from Ecuador. The study aims to verify whether the originally described species from the Pacific Ocean corresponds to that identified in the Atlantic Ocean, and to identify species-specific chromosome markers that can add new comparative data about Mugilidae karyotype evolution. The karyotype of M. hospes from Ecuador is composed of 48 acrocentric chromosomes and shows two active nucleolar organizer regions (NORs). In situ hybridization, using different types of repetitive sequences (rDNAs, U1 snDNA, telomeric repeats) as probes, identified species-specific chromosome markers that were compared with those of other species of the genus Mugil. Cytochrome c oxidase subunit I (COI) sequence analysis shows only 92–93% similarity with sequences previously deposited under this species name in GenBank, all of which were from the Atlantic Ocean. Phylogenetic reconstructions indicate the presence of three well-supported hospe mullet lineages whose molecular divergence is compatible with the presence of distinct species. Indeed, the first lineage includes samples from Ecuador, whereas the other two lineages include the Atlantic samples and correspond to M. brevirostris from Brazil and Mugil sp. R from Belize/Venezuela. The results provided here reiterate the pivotal importance of an integrative molecular and cytogenetic approach in the reconstruction of the relationships within Mugilidae.

Keywords: fish cytogenetics, fish molecular phylogeny, COI, chromosomal evolution, FISH, Mugilidae

# INTRODUCTION

Mullet is the common name for fishes of the Mugilidae, a species-rich family that is the only representative of the order Mugiliformes. These fishes are distributed in several coastal aquatic habitats in tropical, subtropical and temperate regions of the world, where they are ecologically, recreationally and commercially important (Thomson, 1966). According to different authors (see González-Castro and Ghasemzadeh, 2016 and references therein), the family has approximately 26 genera, but Eschmeyer and Fong (2017) ascribe 20 genera and 75 valid species to Mugilidae.

In Mugilidae, most of the classical morphological characters used in species identification and/or systematics have poor diagnostic power and morphometric variability is limited (Schultz, 1946; Thomson, 1997; González-Castro, 2007; González-Castro and Ghasemzadeh, 2016). These characteristics are associated with the wide distribution of most of the species, which raises questions about their actual taxonomic status. Cytogenetic and molecular studies have provided important data for understanding the systematic relationships and evolutionary pathways among mullet species (Harrison et al., 2007; Sola et al., 2007; Durand et al., 2012; Durand and Borsa, 2015). These studies have also shown that it is necessary to use integrative approaches to study mugilids. Indeed, the use of repetitive sequences such as ribosomal genes (18S rDNA and 5S rDNA) as probes in FISH mapping has been shown to be a very informative cytotaxonomic tool in revealing different lineages/species within Mugilidae (Nirchio et al., 2007, 2017; Sola et al., 2007). On the other hand, the utility of molecular markers in this family to identify the species, better define the genera, and reconstruct their phylogenetic relationships is well represented by the large body of literature on this topic published in the last 15 years (see Rossi et al., 2016 for a review and Durand et al., 2017). In addition, molecular phylogenetic analyses have been used successfully to investigate chromosome evolution in some fish groups, such as the genera Characidium (Pansonato-Alves et al., 2014) and Triportheus (Yano et al., 2014), and in Geophagus brasiliensis (Alves-Silva and Dergam, 2015).

Mugil, which presently includes 16 valid species (Eschmeyer and Fong, 2017), is the most cytogenetically studied genus among the Mugilidae. Nine species have been investigated to date (see section "Discussion" and **Figure 6**). Nonetheless, the number of species is probably underestimated currently, as recent molecular data have indicated that there are different species complexes within this genus. For example, the cosmopolitan M. cephalus was found to be composed of 15 well supported mitochondrial lineages (Durand and Borsa, 2015), including the one sampled in the type-locality (Mediterranean Sea); six of these lineages have already been cytogenetically analyzed (Rossi et al., 1996, 2016). However, these lineages lack formal descriptions and species name attribution.

Very recently, Durand et al. (2017) reported the presence of two well-supported mitochondrial lineages in the hospe mullet "Mugil hospes," a species that, according to Barletta and Dantas (2016), is distributed in the western Atlantic from Belize to Brazil and in the eastern Pacific from Mexico to Ecuador. The first molecular lineage includes sequences from Brazil and corresponds to the resurrected species Mugil brevirostris, which is distributed from the northern Brazilian coast (Amapá) to the southern Brazilian coast (Rio Grande do Sul) (Menezes et al., 2015); the second lineage is represented by haplotypes collected in the Gulf of Mexico (Belize/Venezuela) and was named Mugil sp. R. Samples from the eastern Pacific were not included in these analyses or in any other molecular study. The karyotype of the species remains undescribed.

In this research, specimens of M. hospes from Ecuador have been collected and their morphological characters accurately analyzed to make sure of the correct species identification. Cytogenetic and mitochondrial cytochrome c oxidase subunit I (COI) sequence analyses were performed aiming to (a) verify whether the original described M. hospes from the Pacific Ocean corresponds to one of the two lineages identified in the Atlantic Ocean or represents a third lineage, (b) estimate if the divergence among lineages is sufficient to attribute them to different species, (c) identify species-specific chromosome markers and add new comparative data that allow cytotaxonomic inferences on Mugilidae karyotype evolution.

# MATERIALS AND METHODS

Fourteen specimens of Mugil hospes (four males, four females, six immature) were collected with a cast net from a reservoir that provides water to a shrimp pool located at Barbones, El Oro Province, Ecuador (3°09′14.0″ S, 79°53′53.1″ W). Fishes were transported to the laboratory in sealed plastic bags (32′) containing two gallons of water, and the air in the bags was replaced with pure oxygen. All 14 individuals were used to prepare cell suspensions. A subsample of eight individuals was used for molecular and morphological analyses. Voucher specimens were deposited in the fish collection of the Laboratório de Biologia e Genética de Peixes (LBP), UNESP, Botucatu (São Paulo State, Brazil) (collection number LBP 23325) and Universidad Técnica de Machala (UTMACH-174-UTMACH-182; UTMACH-187; UTMACH-191-UTMACH-194). All experiments were conducted according to the Ethical Committee of Instituto de Biociências/UNESP/Botucatu, under protocol number 1057.

#### Morphological Analysis

Each fish was measured. Measurements and counts were taken as described by Menezes et al. (2010). Mouth width and mouth depth were measured as described by Thomson (1997). Twenty morphometric characters (Supplementary Table 1) and nine meristic characters (Supplementary Table 2) were recorded for each fish.

#### Molecular Analysis

Genomic DNA was extracted from muscle tissue that was preserved in 95% ethanol. DNA samples were obtained for eight specimens (one male, two females, five immature), according to procedures described by Aljanabi and Martínez (1997). A 655 bp fragment of the mitochondrial COI was amplified by PCR and sequenced using primers and protocols reported by Nirchio et al. (2017). DNA sequences were aligned using the software Clustal X (Thompson et al., 1997) and deposited in GenBank (Accession numbers: KY964500-KY964504). The basic local alignment search tool (BLAST<sup>1</sup>) was used to search for similar sequences to confirm species assignment.

For phylogenetic tree reconstruction, a subset of the COI sequences of Mugil, previously analyzed by Durand et al. (2017), was considered. Those sequences that showed greater than 90% similarity (i.e., the six sequences of M. brevirostris and the seven sequences of Mugil sp. R) were also included; Agonostomus monticola (Bancroft, 1834) (JQ060401) was used as an outgroup.

Three types of phylogenetic reconstructions were conducted: neighbour-joining (NJ), maximum-likelihood (ML) and Bayesian inference (BI) analyses. NJ and ML analyses (1000 bootstrap pseudoreplicates) were performed using MEGA7 (Kumar et al., 2016) and PhyML 3.0 (Guindon et al., 2010), respectively. Bayesian analyses were carried out as implemented in MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001). Two independent runs of four Markov chains, each of 1,000,000 generations, were performed. ModelTest 3.7 (Posada and Crandall, 1998) and MrModelTest 2.3 (Nylander, 2008) were used to select, according to the Akaike information criterion, the evolutionary models that best fit the data set for the ML (GTR + I + G, with nst = 6, gamma shape = 4.682, and proportion of invariant sites = 0.637) and the BI (GTR + I + G) analyses, respectively. Genetic distances were calculated with MEGA7 using the Kimura 2-parameter substitution model (Kimura, 1980).
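As an illustration of the distance measure named above, the following minimal sketch computes a pairwise Kimura 2-parameter distance from two pre-aligned sequences. It is not the MEGA7 implementation; the function name `k2p_distance` and the choice to skip any site containing a gap or ambiguous base are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Hypothetical helper for illustration. Sites where either sequence has
    a gap or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    if 1.0 - 2.0 * q <= 0.0 or 1.0 - 2.0 * p - q <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")

    # d = -1/2 * ln[(1 - 2P - Q) * sqrt(1 - 2Q)]
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))
```

For closely related sequences the corrected distance is only slightly larger than the raw proportion of differing sites, because few multiple substitutions at the same site are expected.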

#### Cytogenetic Analysis

Each fish received an intra-abdominal injection of 0.0125% colchicine (1.0 ml/100 g body weight) 50 min before being sacrificed by administering a numbing overdose of benzocaine (250 mg/L) as recommended by the Guidelines for the Euthanasia of Animals of the American Veterinary Medical Association (AVMA, 2013). Kidney cells were suspended, and chromosomes were prepared by following the conventional air-drying method, as described by Nirchio and Oliveira (2006). Classical staining techniques (Giemsa, Ag-staining, C-banding) and fluorescence in situ hybridization (FISH) were used to map ribosomal gene clusters (5S rDNA and 18S rDNA) and U1 snRNA gene clusters (U1 snRNA is a non-coding RNA that forms part of the spliceosome) (Nilsen, 2003). Telomeric probes were also applied. For the conventional karyotype, slides were stained for 20 min with 10% Giemsa in phosphate buffer at pH 6.88. Active nucleolus organizer regions (NORs) were revealed by silver (Ag) staining as described by Howell and Black (1980); this was performed after Giemsa staining (Rábová et al., 2015). C-banding was performed following the method of Sumner (1972).
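The colchicine dosing stated above scales with body weight, so the arithmetic can be sketched as follows. The example assumes that "0.0125%" means weight/volume (0.125 mg/ml), which the text does not state explicitly, and the function name and the 80 g body weight are hypothetical.

```python
def colchicine_injection(body_weight_g: float,
                         percent_w_v: float = 0.0125,
                         ml_per_100_g: float = 1.0) -> tuple[float, float]:
    """Return (injection volume in ml, colchicine mass in mg).

    Assumes the concentration is weight/volume, so 1% = 10 mg/ml.
    """
    volume_ml = body_weight_g / 100.0 * ml_per_100_g
    mg_per_ml = percent_w_v * 10.0  # 0.0125% w/v -> 0.125 mg/ml
    return volume_ml, volume_ml * mg_per_ml


# Hypothetical 80 g fish: 0.8 ml of solution, about 0.1 mg of colchicine
volume, mass = colchicine_injection(80.0)
print(f"{volume:.2f} ml, {mass:.3f} mg")
```

Under the same assumption, the regime corresponds to roughly 1.25 mg of colchicine per kg of body weight.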

The 5S rDNA, 18S rDNA, U1 snRNA genes and telomeric repeats were mapped onto chromosomes by FISH using the method described by Pinkel et al. (1986). Sequences of 5S rDNA, 18S rDNA, U1 snDNA and telomeric repeats were obtained by polymerase chain reaction (PCR) from the genome of Hypsolebias flagellatus and used as probes. The primers used for amplification were 5SA (5′-TCA ACC AAC CAC AAA GAC ATT GGC AC-3′) and 5SB (5′-TAG ACT TCT GGG TGG CCA AAG GAA TCA-3′) (Pendás et al., 1995), 18S6F (5′-CTC TTT CGA GGC CCT GTA AT-3′) and 18S6R (5′-CAG CTT TGC AAC CAT ACT CC-3′) (Utsunomia et al., 2016), U1F (5′-GCA GTC GAG ATT CCC ACA TT-3′) and U1R (5′-CTT ACC TGG CAG GGG AGA TA-3′) (Silva et al., 2015) and (TTAGGG)<sub>5</sub> and (CCCTAA)<sub>5</sub> (Ijdo et al., 1991). The 5S rDNA and telomeric probes were labeled with biotin-16-dUTP (2′-deoxyuridine 5′-triphosphate), and the 18S rDNA and U1 snRNA gene probes were labeled by including digoxigenin-11-dUTP in the PCR. Hybridization was detected with fluorescein-conjugated avidin (FITC, Sigma–Aldrich<sup>2</sup>) and anti-digoxigenin-rhodamine conjugate (Roche Applied Science<sup>3</sup>), respectively. Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI), which was included in the Vectashield mounting medium (Vector Laboratories<sup>4</sup>).
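The primer and probe sequences listed above can be sanity-checked with a few lines of code. The sketch below copies the sequences from the text and checks that the two telomeric oligonucleotides are reverse complements of each other, then reports primer length and GC content. The helper names (`clean`, `reverse_complement`, `gc_fraction`) are hypothetical and not part of any published pipeline.

```python
# Primer sequences as given in the text (5'->3'), spaces kept for readability
PRIMERS = {
    "5SA": "TCA ACC AAC CAC AAA GAC ATT GGC AC",
    "5SB": "TAG ACT TCT GGG TGG CCA AAG GAA TCA",
    "18S6F": "CTC TTT CGA GGC CCT GTA AT",
    "18S6R": "CAG CTT TGC AAC CAT ACT CC",
    "U1F": "GCA GTC GAG ATT CCC ACA TT",
    "U1R": "CTT ACC TGG CAG GGG AGA TA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove spaces and upper-case a sequence."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


# The two telomeric oligonucleotides pair with each other
telomere_fwd = "TTAGGG" * 5
telomere_rev = "CCCTAA" * 5
print(reverse_complement(telomere_fwd) == telomere_rev)

for name, seq in PRIMERS.items():
    print(name, len(clean(seq)), f"GC={gc_fraction(seq):.2f}")
```

Because (TTAGGG)<sub>5</sub> and (CCCTAA)<sub>5</sub> are mutually complementary, the pair can prime on each other in a PCR, which is consistent with their use for generating telomeric probes.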

Conventionally stained metaphase cells were photographed using a Motic B400, equipped with a Moticam 5000C digital camera using Motic Images Plus 2.0 ML software. FISH images were captured with an Olympus BX61 photomicroscope equipped with a DP70 digital camera using Image-Pro plus 6.0 software (Media Cybernetics). Images were merged and edited for optimization of brightness and contrast using Photoshop (Adobe Systems, Inc.) Version 2015.0.0.

# RESULTS

#### Meristic and Morphometric Characters

The fresh specimens were gray on the dorsal side and white/silver on the ventral side. The pelvic fins had a yellowish tone, and the base of each pectoral fin had a visible dark spot. The dorsal fins and caudal fins were dusky. The distal tips of the anterior rays of the second dorsal fin were slightly darker. The pelvic and anal fins were pale. The body was elongated, with a slightly pointed snout (see **Figure 1**). The origin of the first dorsal fin was midway between the tip of the snout and the base of the caudal fin. The second dorsal fin and anal fin were profusely covered with scales. One row of small teeth was visible on the upper and lower lips (viewed under the microscope). There were adipose eyelids and widely separated spiny-rayed dorsal fins with four spines in the first dorsal fin and one spine plus eight soft rays in the second dorsal fin (small specimen with nine soft rays). Pelvic fins were sub-abdominal with one spine and 5–6 branched soft rays (commonly I+5). Pectoral fins were long, reaching the level of the origin of the first dorsal fin or extending just beyond, with two spines (the first spine very small) and 11–13 soft rays (commonly 12 rays). The anal fin had three spines and nine soft rays (first spine very short, and hidden by overlying scales). There was a large pectoral axillary scale, with 37–38 scales in longitudinal series (commonly 38), 11–14 scales in an oblique row extending to the origin of the pelvic fin (commonly 13) and 13 scales in a transversal series, as well as 17–22 scales in a circum-peduncular series (commonly 19) (Supplementary Table 2).

<sup>1</sup>https://blast.ncbi.nlm.nih.gov/Blast.cgi

<sup>2</sup>www.sigma-aldrich.com

<sup>3</sup>https://lifescience.roche.com/

<sup>4</sup>https://vectorlabs.com/

#### Molecular Analysis

BLAST searches showed that the COI nucleotide sequences obtained here share 92–93% similarity with GenBank sequences from specimens originally identified as Mugil hospes, M. trichodon, and Mugil sp., all of which were collected in the Atlantic Ocean (Brazil and Belize). Similarity values with other Mugil species were all below 90%.

The phylogenetic tree obtained by NJ, ML (lnL = −3276.75382), and BI (lnL = −3986.503923) analyses (**Figure 2**) shows three well-supported lineages of "M. hospes." The first two correspond to the M. brevirostris (Brazil) and Mugil sp. R (Belize and Venezuela) lineages identified by Durand et al. (2017), whereas the third, referred to hereafter as Mugil hospes (see Discussion), includes all the sequences from Ecuador obtained in this study (**Figure 2**). The genetic distance is 0.077 between M. hospes/M. brevirostris and between M. hospes/Mugil sp. R, and 0.073 between M. brevirostris/Mugil sp. R.

#### Cytogenetic Analysis

All individuals showed a diploid number of 2n = 48 and a karyotype composed entirely of uniformly decreasing acrocentric chromosomes. Thus, the fundamental arm number (NF) was 48. Only two pairs of homologous chromosomes can be identified with certainty: pair 5, due to a clear interstitial secondary constriction, and pair 24, which is distinctly small (**Figure 3A**). Sequential Giemsa-silver (Ag) nitrate staining enabled the identification of two actively transcribing NORs, interstitially located on the secondary constriction of chromosome pair 5 (**Figures 3B,C**). C-banding showed that constitutive heterochromatin is restricted to the centromeric regions of all chromosomes, and there is a pericentromeric heterochromatin block on the secondary constriction of chromosome pair 5 (**Figure 3D**).
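The relationship between the diploid number and the fundamental arm number can be made explicit with a small calculation. The sketch below assumes one common convention, in which metacentric (m) and submetacentric (sm) chromosomes count as two-armed and subtelocentric (st) and acrocentric (a) chromosomes as one-armed. Conventions for st chromosomes vary between authors, and the dictionary and function names are hypothetical.

```python
# Arms contributed by each chromosome type (assumed convention; some
# authors count subtelocentric chromosomes as two-armed instead)
ARMS_PER_TYPE = {"m": 2, "sm": 2, "st": 1, "a": 1}


def diploid_number(counts: dict[str, int]) -> int:
    """Total chromosome count (2n) from per-type counts."""
    return sum(counts.values())


def fundamental_number(counts: dict[str, int]) -> int:
    """Fundamental arm number (NF) from per-type chromosome counts."""
    return sum(ARMS_PER_TYPE[kind] * n for kind, n in counts.items())


# Karyotype reported here for M. hospes: 48 acrocentric chromosomes
m_hospes = {"a": 48}
print(diploid_number(m_hospes), fundamental_number(m_hospes))
```

With an all-acrocentric complement every chromosome contributes a single arm, so NF equals 2n, as reported in the text.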

Double FISH experiments using 5S and 18S rDNA as probes revealed two positive sites detected for each probe (**Figure 4A**), located on different chromosome pairs. The 18S rDNA positive sites correspond to the AgNO<sub>3</sub> sites on the secondary constriction of chromosome pair 5. The 5S rDNA probes hybridized interstitially on a pair of medium-sized chromosomes. Double FISH using the U1 snDNA and 5S rDNA probes revealed positive U1 snDNA signals on the telomeric region of a pair of medium-sized chromosomes distinct from the 5S-rDNA-bearing chromosome pair (**Figure 4B**). When chromosomes treated by double FISH were sorted by decreasing size, it was possible to assign the 5S rDNA sites to pair 15 and the U1 snDNA signals to pair 10 (**Figure 5**). Telomeric repeats were located at both ends of each chromosome, although signal intensities varied between chromosomes (**Figure 4C**).

# DISCUSSION

Meristic and morphometric data of samples from Ecuador agree with the original description of M. hospes (Jordan and Culver 1895 in Jordan, 1895): this species has pectoral fins whose tips reach and extend slightly past the vertical line passing through the origin of the first dorsal fin (with four spines). This morphological character is shared with M. brevirostris, which inhabits the opposite side of the Americas (i.e., the Atlantic coast).

Sequence analysis showed that samples of the hospe mullet from Ecuador are genetically very different from those collected in the Atlantic Ocean, all of which were originally identified under the same name, M. hospes. Thus, in addition to the two lineages identified by Durand et al. (2017) in the Atlantic Ocean, M. brevirostris and Mugil sp. R, a third lineage is present in the Pacific Ocean. The genetic distances between the Pacific and the two Atlantic lineages are higher than the 2% COI threshold value that discriminates different species (Ward, 2009), and in the phylogenetic reconstruction, the three species form a monophyletic and well-supported clade. Cryptic species are defined as distinct evolutionary lineages not detectable with traditional taxonomic approaches, due to the absence of morphological differences (Avise, 2000; Mallet, 2010). In the last decade, barcoding methods based on COI sequences have made their identification possible in several marine and freshwater fish species (Ward et al., 2008; Lara et al., 2010; Puckridge et al., 2013; Mateussi et al., 2017; Okamoto et al., 2017; Ramirez et al., 2017; Shimabukuro-Dias et al., 2017). In Mugilidae, evidence of cryptic species was inferred from mitochondrial tree topology, from independent data from nuclear markers, and on the basis of the geographic distribution of sister lineages (Durand and Borsa, 2015). Our results indicate that the genetic distances between the different hospe mullet lineages are comparable to those reported among species within both the M. cephalus and M. curema species complexes (Durand et al., 2017), and the three lineages inhabit different geographic areas. Thus, we hypothesize that besides the Mugil cephalus and Mugil curema species complexes, there is an additional putative one, which should be identified as the M. hospes species complex; the name M. hospes should be kept for the Pacific samples, because Mazatlán (in the eastern Pacific) is the type locality of the species.
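The threshold argument above can be restated numerically. The short sketch below takes the three pairwise distances reported in the Results and compares each with the 2% COI threshold; the variable and function names are hypothetical, and the comparison is only a restatement of the reasoning, not a species-delimitation method.

```python
COI_THRESHOLD = 0.02  # 2% COI divergence threshold cited from Ward (2009)

# Pairwise genetic distances reported in the Results
DISTANCES = {
    ("M. hospes", "M. brevirostris"): 0.077,
    ("M. hospes", "Mugil sp. R"): 0.077,
    ("M. brevirostris", "Mugil sp. R"): 0.073,
}


def exceeds_threshold(distances: dict, threshold: float = COI_THRESHOLD) -> dict:
    """Map each lineage pair to True if its distance is above the threshold."""
    return {pair: d > threshold for pair, d in distances.items()}


for pair, above in exceeds_threshold(DISTANCES).items():
    fold = DISTANCES[pair] / COI_THRESHOLD
    print(f"{pair[0]} vs {pair[1]}: above threshold={above} ({fold:.2f}x)")
```

All three comparisons exceed the threshold by more than a factor of three, which is the quantitative basis for treating the lineages as putative distinct species.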

Cytogenetic analysis shows that the 48 acrocentric chromosome karyotype detected in M. hospes is consistent with the generally available data on diploid chromosome number and karyotype structure in Mugilidae (Sola et al., 2007; Rossi et al., 2016). This confirms that the only exception is represented by the mullets belonging to the Mugil curema species complex (Nirchio et al., 2017).

Apart from the number of chromosomes, many microstructural changes are evident in the variable locations of ribosomal genes in Mugil. For example, 5S rDNA cistrons are always localized to an interstitial position, although they are on different chromosomes in different species (**Figure 6**). The 18S rDNA cistrons seem to be more variable and can be found in the telomeric or interstitial regions of a long chromosome arm, or even on the short arms of different chromosomes. The variability in the localization of the major ribosomal genes could be attributable to their association with heterochromatin, as observed in Mugil cephalus, M. margaritae (formerly M. curema), M. rubrioculus, M. curema, M. liza, M. trichodon, M. incilis, Mugil sp. O (Rossi et al., 1996, 2005; Nirchio et al., 2005a,b, 2007, 2017; Hett et al., 2011), and M. hospes (present study). Heterochromatin is known to evolve rapidly, and its composition, which includes highly repetitive simple sequences like satellite DNA and transposable elements, is often different even between closely related species. This characteristic might promote rearrangements of the associated genes and might play an important role in reproductive isolation between sister species (Hughes and Hawley, 2009).

FIGURE 4 | Somatic metaphase chromosomes of Mugil hospes assayed by FISH and counterstained with DAPI: (A) 5S rDNA (arrows) and 18S rDNA (arrowheads); (B) U1 snRNA (asterisks), 5S rDNA (arrows), chromosome pair 5 (circle), and (C) telomeric repeats. Enlargements of selected chromosome pairs after DAPI staining (left) and FISH (right) are shown in the insets: (A) chromosome pairs 5 and 15, with probes showing 18S rDNA (above) and 5S rDNA (below) positive sites; (B) chromosome pair 5 (above), chromosome pair 10 (center) showing U1 snDNA positive sites and chromosome pair 15 (below) showing 5S rDNA.

FIGURE 5 | FISH karyotype. Interstitial secondary constriction corresponding to NOR (chromosome pair 5); 5S rDNA (chromosome pair 15) and U1 snDNA (chromosome pair 10) positive sites are evident.

Cytogenetic mapping of U1 snDNA probes in M. hospes showed the presence of a single U1 gene cluster, located in the terminal position of a chromosome pair different from the 18S rDNA- and 5S rDNA-bearing chromosomes (**Figures 4B**, **5**). There are no data available on the chromosome mapping of these sequences in other Mugilidae; thus, it is not possible, at this stage, to compare our results with those of other species in the family. However, the mapping of these sequences, combined with other repetitive sequences, in other Mugil species might allow the identification of other chromosome rearrangements. The analysis of chromosome localization of these sequences is restricted to a few other fish species. In Merluccius merluccius (Merlucciidae), multiple interstitial U1 sites are present (García-Souto et al., 2015). In 19 species of cichlids (Cabral-de-Mello et al., 2012), these sites could be either interstitial or terminal on a single st/a chromosome pair, and represent good chromosomal markers that allow the detection of many microstructural chromosomal rearrangements. In contrast, in five species of Astyanax (Characidae), there is a conserved pattern in the number of U1 sites per genome, and these sequences are frequently associated with 5S rDNA sequences (Silva et al., 2015).

Telomeric DNA repeat sequences were found at the very ends of chromosomes, as observed in 15 different orders of teleosts (Ocalewicz, 2013). In mugilids, telomeric repeats have been mapped in nine species (Gornung et al., 2004; Rossi et al., 2005; Nirchio et al., 2017) and were found also to be interspersed in NORs. Signal intensity variability between chromosomes, as observed in M. hospes, has been previously reported in other fishes (Rocco et al., 2002; Ocalewicz and Dobosz, 2009; Pomianowski et al., 2012), including Mugil species such as M. cephalus (Gornung et al., 2004), M. liza and M. margaritae (Rossi et al., 2005), and Mugil sp. O (Nirchio et al., 2017). This variability is probably due to differing copy numbers of these sequences in the different sites (Lansdorp et al., 1996).

# CONCLUSION

The data presented here confirm that complex dynamics have shaped the karyotype evolution of Mugil, and they reiterate the usefulness of cytogenetic and molecular data in the reconstruction of relationships among taxa within Mugilidae. Species of this family are usually characterized by morphological features that are "insufficient to describe its actual species diversity" (Durand and Borsa, 2015). The combined use of morphological, molecular and cytogenetic analysis is necessary in these fishes to avoid species misidentification and to reconcile the confused picture obtained by morphology-based taxonomy with molecular-based taxonomy. In the case of the M. hospes species complex, Mugil sp. R, which is distributed in the Caribbean Sea, still requires a formal morphological description and specific name attribution. This species, along with the Brazilian species M. brevirostris, also requires a karyotype description. Thus, at this stage, it is not possible to determine whether this complex is characterized by karyotypes that differ in the total number and morphology of chromosomes, like the M. curema species complex (Nirchio et al., 2017), or whether it is characterized by karyotype homogeneity, like the M. cephalus species complex (Rossi et al., 1996).

# AUTHOR CONTRIBUTIONS

MN, CO, and AR designed the study. FP and VM conducted the lab work. MN, AR, and VM designed and conducted the analyses. All authors analyzed the results and wrote the manuscript.

# FUNDING

Funding was provided by: Centro de Investigación of Universidad Técnica de Machala, Ecuador; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil; Sapienza University, Rome (Progetto Ricerca Università 2016).

# ACKNOWLEDGMENTS

The authors are grateful to Dr. Naercio Menezes and Dr. Ricardo Britzke, who helped with the identification of M. hospes. All experiments comply with the current laws of Ecuador and Italy.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00017/full#supplementary-material

# REFERENCES

of Grey Mullet (Mugilidae), eds D. Crosetti and S. J. M. Blaber (Boca Raton, FL: CRC Press), 1–20.



neotropical ichthyofauna: DNA barcoding in the recently described genus Megaleporinus (Characiformes: Anostomidae). Front. Genet. 8:149. doi: 10.3389/fgene.2017.00149


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Nirchio, Paim, Milana, Rossi and Oliveira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Gene and Blood Analysis Reveal That Transfer from Brackish Water to Freshwater Is More Stressful to the Silverside Odontesthes humensis

Tony L. R. Silveira<sup>1</sup>†, Gabriel B. Martins<sup>2</sup>†, William B. Domingues<sup>1</sup>, Mariana H. Remião<sup>3</sup>, Bruna F. Barreto<sup>1</sup>, Ingrid M. Lessa<sup>1</sup>, Lucas Santos<sup>1</sup>, Danillo Pinhal<sup>4</sup>, Odir A. Dellagostin<sup>5</sup>, Fabiana K. Seixas<sup>3</sup>, Tiago Collares<sup>3</sup>, Ricardo B. Robaldo<sup>2</sup> and Vinicius F. Campos<sup>1</sup>\*

<sup>1</sup> Laboratory of Structural Genomics, Technological Development Center, Federal University of Pelotas, Pelotas, Brazil, <sup>2</sup> Laboratory of Physiology, Institute of Biology, Federal University of Pelotas, Pelotas, Brazil, <sup>3</sup> Laboratory of Cancer Biotechnology, Technological Development Center, Federal University of Pelotas, Pelotas, Brazil, <sup>4</sup> Genomics and Molecular Evolution Laboratory, Department of Genetics, Institute of Biosciences of Botucatu, São Paulo State University, Botucatu, Brazil, <sup>5</sup> Laboratory of Vaccinology, Technological Development Center, Federal University of Pelotas, Pelotas, Brazil

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Daniele Aparecida Matoso, Federal University of Amazonas, Brazil

Ricardo Shohei Hattori, Agência Paulista de Tecnologia dos Agronegócios, Brazil

#### \*Correspondence:

Vinicius F. Campos fariascampos@gmail.com

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 26 October 2017 Accepted: 22 January 2018 Published: 06 February 2018

#### Citation:

Silveira TLR, Martins GB, Domingues WB, Remião MH, Barreto BF, Lessa IM, Santos L, Pinhal D, Dellagostin OA, Seixas FK, Collares T, Robaldo RB and Campos VF (2018) Gene and Blood Analysis Reveal That Transfer from Brackish Water to Freshwater Is More Stressful to the Silverside Odontesthes humensis. Front. Genet. 9:28. doi: 10.3389/fgene.2018.00028

Silversides are fish that inhabit marine coastal waters, coastal lagoons, and estuarine regions in southern South America. The freshwater (FW) silversides have the ability to tolerate salinity variations. Odontesthes humensis has habitats and biological characteristics similar to those of the congeneric O. bonariensis, the most studied silverside species and one of great economic importance. Studies revealed that O. bonariensis is not fully adapted to FW, despite inhabiting hyposmotic environments in nature. However, there is little information about stressful environments for the cultivation of the silverside O. humensis. Thus, the aim of this study was to evaluate the stress and osmoregulation responses triggered by osmotic transfers in the silverside O. humensis. Silversides were acclimated to FW (0 ppt) and to brackish water (BW, 10 ppt) and then exposed to the opposite salinity treatment. Silverside gills and blood were sampled before transfer (D0) and 1, 7, and 15 days (D1, D7, and D15) after the change in environmental salinity. The expression levels of the genes atp1a3a, slc12a2b, kcnh1, and hspa1a were determined by quantitative reverse transcription-PCR to evaluate osmoregulatory and stress responses. Furthermore, glycemia, hematocrit, and osmolality were also evaluated. The expression of atp1a3a was up- and down-regulated at D1 after the FW–BW and BW–FW transfers, respectively. Slc12a2b was up-regulated after FW–BW transfer. Similarly, kcnh1 and hspa1a were up-regulated at D1 after the BW–FW transfer. O. humensis blood osmolality decreased after the exposure to FW. It remained stable after exposure to BW, indicating an efficient hyposmoregulation. Glycemia peaked at D1 after BW–FW transfer. No changes were observed in hematocrit. The return to pre-transfer levels at D7, after significant increases in almost all evaluated molecular and blood parameters, indicated that this period is enough for acclimation to the experimental conditions. In conclusion, our results suggest that BW–FW transfer is more stressful to O. humensis than FW–BW transfer and that the physiology of O. humensis is only partially adapted to FW.

Keywords: acclimation, blood, brackish water, fish, freshwater, genes, salt, transfer

# INTRODUCTION


Odontesthes spp., popularly known as pejerreyes or silversides, form a fish genus endemic to South American waters (Bemvenuti, 2006). This genus comprises the largest number of species in Atherinopsidae and has the broadest distribution, with species inhabiting marine coastal waters, coastal lagoons, and estuarine regions in southern South America (Dyer, 2006). Currently, some of these species occupy FW environments, although all Odontesthes spp. share a recent common marine origin (Heras and Roldán, 2011; Campanella et al., 2015). Thus, the FW silversides have an interesting ability to tolerate salinity variations. This euryhaline characteristic has drawn the attention of researchers to their application in aquaculture in estuarine regions, where water salinity changes continuously (Piedras et al., 2009).

The most studied silverside species, with economic importance and favorable projections for production, is Odontesthes bonariensis. In Argentina, it is important for flesh commercialization and sport fishing (Menone et al., 2000; Somoza et al., 2008). In Japan, O. bonariensis cultivation has been established, owing to efforts to improve farming conditions in recent years (Tsuzuki et al., 2000b, 2001, 2007). This species has also been introduced in Chile, Bolivia, and Italy (Dyer, 2000, 2006). The species is also exotic and economically important in Peru, where O. bonariensis was introduced into Lake Titicaca in 1946 and is still caught today (Loubens and Osorio, 1991).

On the other hand, the congeneric Odontesthes humensis has received little attention from researchers and fish farmers. Studies on captive specimens are limited to fry growth (Pouey et al., 2012), embryo salinity tolerance (Piedras et al., 2009), growth in an intensive system (Tavares et al., 2014), and toxicology (Zebral et al., 2017). O. bonariensis and O. humensis have similar habitats and biological characteristics (including morphology) and are capable of interbreeding. Because of this, both species have similar potential for cultivation in captivity (Dyer, 2006; Piedras et al., 2009). The main differences between the two species are as follows: O. bonariensis is the largest species in Atheriniformes; it has 32–38 gill rakers on the lower branch, vomerine teeth are present, and prognathism can be present. Furthermore, this species is native to Brazil and Argentina, but not to Uruguay. In contrast, O. humensis is smaller, has 13–19 gill rakers on the lower branch, lacks vomerine teeth, never shows prognathism, and is native to Brazil, Uruguay, and Argentina (Dyer, 2006). Although the commercial production of O. humensis has not increased in recent years, this species has potential as a bioindicator in toxicological studies (Zebral et al., 2017) because of its demands for water quality: it survives only within a narrow range of water parameters.

Thus, information about the basic biology of O. humensis is necessary for the adequate maintenance of this silverside species under farm and laboratory conditions. Some studies revealed that the silverside O. bonariensis is not yet fully adapted to FW, despite inhabiting hyposmotic environments in nature (Tsuzuki et al., 2000b, 2001). However, there is no information about which environment, with or without salt, is less stressful for the cultivation of the silverside O. humensis. We hypothesized that O. humensis shares with O. bonariensis, besides habitats and reproductive physiology, the same pattern of response to environmental salinity. Corroboration of this hypothesis will help in the maintenance of laboratory or commercial cultivation, in addition to reinforcing the theory of invasion of FW habitats by marine ancestors of silversides. Thus, the aim of this study was to evaluate, using cellular, biochemical, and molecular analyses, the stress and osmoregulation responses triggered by osmotic transfers in the silverside O. humensis.

# MATERIALS AND METHODS

#### Animals and Conditions

The silversides O. humensis used in this study came from eggs collected in nature (Arroio Grande, Brazil – 32°14′15″S/53°05′13″W) and hatched in tanks. The eggs were collected from wild breeders and incubated under laboratory conditions in FW (0‰) at pH values near 7.5, the same conditions applied to larviculture. The fish were 1.5 years old and had a mean weight of 27.3 ± 9.5 g at the time of the experiment. The silversides were maintained in 1,000 L cylindrical plastic tanks within an experimental room, fed three times a day with commercial feed (Supra, 38% crude protein) until satiety, under the natural photoperiod of autumn (11 h light/13 h dark), with water at pH 7.5 ± 0.3 and 19 ± 0.3°C for acclimation. The tank sides were opaque to reduce visual stress, and light entered through the top, following the natural daily variation. The acclimation period was 4 weeks for silversides in FW (salinity of 0 ppt) and in BW (salinity of 10 ppt), under the same conditions as before. The salinity levels were achieved by dissolving non-iodized sea salt in the water.
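As a rough illustration of the salt preparation described above, the following minimal sketch estimates the mass of salt needed for a target salinity. It assumes that 1 ppt corresponds to approximately 1 g of salt per liter of water and that a tank is filled to its nominal 1,000 L; the function name `salt_mass_kg` and the fill volume are illustrative assumptions, not details taken from the protocol.

```python
def salt_mass_kg(volume_l: float, salinity_ppt: float) -> float:
    """Approximate salt mass (kg) for a target salinity.

    Assumption: 1 ppt is treated as ~1 g of salt per liter of water,
    ignoring the small change in solution density.
    """
    if volume_l < 0 or salinity_ppt < 0:
        raise ValueError("volume and salinity must be non-negative")
    grams = volume_l * salinity_ppt  # g/L times L
    return grams / 1000.0


# Hypothetical full 1,000 L tanks at the two experimental salinities.
bw_salt = salt_mass_kg(1000, 10)  # brackish water, 10 ppt
fw_salt = salt_mass_kg(1000, 0)   # freshwater, 0 ppt
print(f"BW tank: {bw_salt} kg salt; FW tank: {fw_salt} kg salt")
```

Under these assumptions, a full tank at 10 ppt would need about 10 kg of salt; the actual amount depends on the real water volume and should be checked with a salinometer.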

The fish continued to be fed three times a day during the experimental period. Once a week, the tank water was completely renewed. Dissolved oxygen was maintained at 7.8 ± 0.3 mg L<sup>−1</sup>, total ammonia below 0.6 ± 0.2 mg L<sup>−1</sup>, temperature at 20.01 ± 0.34°C, and pH at 7.78 ± 0.03.

#### Experimental Design and Protocol

The experimental setup consisted of two treatments performed in quadruplicate, totaling eight tanks with 15 fish each. Four tanks contained silversides acclimated to FW and the other four contained silversides acclimated to BW. Both FW and BW groups were then transferred to the opposite salinity treatment. At each of four time points, three fish per tank were sampled: a control before the osmotic transfer (D0) and on day 1 (D1), day 7 (D7), and day 15 (D15) after the transfer. Hence, 96 silversides were analyzed across the different times of salt exposure.
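The sample count stated above follows directly from the design. The short sketch below reproduces that arithmetic; the constant and function names are illustrative, and the treatment labels simply mirror the FW–BW and BW–FW notation used in this article.

```python
TREATMENTS = ("FW-BW", "BW-FW")          # direction of the osmotic transfer
TANKS_PER_TREATMENT = 4                   # quadruplicate
FISH_PER_TANK = 15
TIME_POINTS = ("D0", "D1", "D7", "D15")
FISH_SAMPLED_PER_TANK_PER_TIME = 3


def design_summary() -> dict:
    """Return the basic counts implied by the experimental design."""
    total_tanks = len(TREATMENTS) * TANKS_PER_TREATMENT
    sampled_per_tank = FISH_SAMPLED_PER_TANK_PER_TIME * len(TIME_POINTS)
    return {
        "total_tanks": total_tanks,
        "total_fish_stocked": total_tanks * FISH_PER_TANK,
        "sampled_per_tank": sampled_per_tank,
        "sampled_per_treatment_per_time": (
            TANKS_PER_TREATMENT * FISH_SAMPLED_PER_TANK_PER_TIME
        ),
        "total_sampled": total_tanks * sampled_per_tank,
    }


summary = design_summary()
print(summary)
```

With three fish taken from each of four tanks, each treatment contributes 12 fish per time point, and 12 of the 15 fish in every tank are sampled over the whole experiment.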

#### Sampling

When captured, at all time points, the silversides were anesthetized with benzocaine (50 mg L<sup>−1</sup>). Blood samples were collected by caudal puncture using a heparinized syringe for hematocrit, glycemia, and osmolality evaluation. While anesthetized, the fish were euthanized by medullary section and brain excision for post-mortem gill collection. Gill samples from three fish of each tank, at the four time points, were collected using sterilized material. Left second branchial arches were collected and immediately stored in liquid nitrogen (N<sub>2</sub>) until subsequent RNA preparation. All experimental procedures were previously approved by the Ethics Committee on Animal Experimentation of the Federal University of Pelotas (Process no. 23110.007018/2015-85).

**Abbreviations:** BW, brackish water; FW, freshwater; SW, seawater.

#### Analyzed Genes

The Na<sup>+</sup>/K<sup>+</sup>-ATPase (NKA) enzyme actively transports three Na<sup>+</sup> out of and two K<sup>+</sup> into animal cells. In fishes, NKA is directly involved in the osmoregulation process (Hiroi and McCormick, 2012). Among NKAα1–α4, we selected NKAα3 (encoded by atp1a3), which has responses similar to those of NKAα1 (Stefansson et al., 2007; Whitehead et al., 2012; Armesto et al., 2014; Li et al., 2014; Taugbøl et al., 2014) and is expressed in similar tissues (Guynn et al., 2002). NKAα1 is the most studied isoform of NKA (Havird et al., 2013); however, its mRNA expression can be influenced by stress (Kiilerich et al., 2011), which could confound the interpretation of results.

The Na<sup>+</sup>/K<sup>+</sup>/2Cl<sup>−</sup> cotransporter 1 (NKCC1) mediates the intake of one Na<sup>+</sup>, one K<sup>+</sup>, and two Cl<sup>−</sup> ions into the ionocytes down the electrochemical gradient generated by NKA (Hwang et al., 2011). In teleosts, there are two isoforms each of NKCC1 and NKCC2: NKCC1a/b and NKCC2a/b (Kakumura et al., 2015). NKCC1 (encoded by slc12a2) is more associated with ion secretion in teleost gills (Tipsmark et al., 2002; Nilsen et al., 2007), whereas NKCC2 mRNA is not present in large amounts in fish gills (Hiroi and McCormick, 2012).

The voltage-gated potassium channels (KCNH) are gated by changes in membrane potential and play an important role in the repolarization of cells (Morais-Cabral and Robertson, 2015). KCNH1 has been linked to myoblast fusion (Occhiodoro et al., 1998) and to embryonic development (Stengel et al., 2012), and has been used as a target in cancer research (Pardo and Stühmer, 2013), but the precise role of KCNH1 in normal tissues is poorly known. The kcnh1 gene has two copies in fishes, kcnh1a and kcnh1b, which are expressed differently in the organism (Stengel et al., 2012). KCNH4 has already been proposed as an osmoregulation marker (Taugbøl et al., 2014).

The heat shock 70 kDa proteins (HSP70) are mainly responsible for protein folding, degradation of misfolded proteins, membrane translocation, and activation of the immune system. HSP70 are expressed under normal conditions but also in response to stress conditions (Morimoto, 1998; Yamashita et al., 2010; McCallister et al., 2016). The genes encoding HSP70 in fishes are hspa1a and hspa1b (He et al., 2013). Thus, both hspa1 genes and the HSP70 protein are considered stress indicators in fish (Basu et al., 2001; Larsen et al., 2012; Taugbøl et al., 2014).

#### Molecular Cloning and Sequencing

Total RNA was isolated from gill samples of each fish separately, using the RNeasy® Mini Kit (Qiagen, United States) according to the manufacturer's protocol. DNase treatment of RNA samples was conducted with a DNA-Free® Kit (Ambion, United States) following the manufacturer's protocol. The RNA concentration and quality were verified by spectrophotometry using a NanoVue™ Plus (GE Healthcare Life Sciences, United States), and only samples with an A<sub>260</sub>/A<sub>280</sub> absorption ratio ≥2.0 were used in reverse transcription (RT) reactions.

First-strand cDNA was synthesized from 2 µg of RNA using random primers and SuperScript® III Reverse Transcriptase (Invitrogen, United States) according to the manufacturer's protocol. Primer sets (**Table 1**) were designed to clone partial cDNA sequences of the atp1a3, slc12a2, kcnh1, and hspa1 genes from gills of silverside fish based on the alignment of sequences of other fish species. The PCR parameters were: an initial denaturation for 1 min at 94°C, followed by 35 cycles of 94°C for 30 s; annealing for 30 s at 57°C (atp1a3), 62°C (slc12a2), 64.6°C (kcnh1), or 61°C (hspa1); and 72°C for 1 min, with a final extension of 5 min at 72°C. PCR products were inserted into the pCR™4-TOPO® TA cloning vector and transformed into the electrocompetent Escherichia coli strain DH5α. These procedures were performed according to the manufacturer's specifications (Invitrogen, United States). The positive clones were sequenced using a Big Dye® V3.1 Terminator Kit and M13 Primers (Applied Biosystems, United States). The cloned fragments were sequenced using an Applied Biosystems 3500 Genetic Analyzer® automatic sequencer (Life Technologies, United States).

TABLE 1 | Sequences of primers for Odontesthes humensis used in this study.
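The PCR cycling parameters given in the cloning protocol above can be written down as a small data structure, which makes the gene-specific annealing temperatures explicit. This is only an illustrative sketch: the names `ANNEALING_C`, `cycling_program`, and `hold_time_seconds` are hypothetical, and the computed time counts programmed hold steps only, ignoring thermocycler ramping.

```python
# Gene-specific annealing temperatures (degrees C) from the cloning protocol.
ANNEALING_C = {"atp1a3": 57.0, "slc12a2": 62.0, "kcnh1": 64.6, "hspa1": 61.0}
CYCLES = 35


def cycling_program(gene: str) -> list:
    """Return (step, temperature_C, seconds, repeats) tuples for one gene."""
    anneal = ANNEALING_C[gene]
    return [
        ("initial denaturation", 94.0, 60, 1),
        ("denaturation", 94.0, 30, CYCLES),
        ("annealing", anneal, 30, CYCLES),
        ("extension", 72.0, 60, CYCLES),
        ("final extension", 72.0, 300, 1),
    ]


def hold_time_seconds(program: list) -> int:
    """Sum the programmed hold times, excluding temperature ramping."""
    return sum(seconds * repeats for _, _, seconds, repeats in program)


program = cycling_program("kcnh1")
total_s = hold_time_seconds(program)
print(f"kcnh1 program: {total_s} s of hold time ({total_s / 60:.0f} min)")
```

Because only the annealing temperature differs between genes, the programmed hold time is the same for all four targets under this sketch.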

#### In Silico and Phylogenetic Analysis

The identity of all sequenced cDNA was confirmed using the BLAST tool of the NCBI database<sup>1</sup>. The translation of sequenced nucleotides to amino acid sequences, as well as the open-reading frame (ORF) identification, was performed using the ExPASy bioinformatics resource portal<sup>2</sup>. Conserved domains and sites were mapped from the UniProt database<sup>3</sup>. Amino acid sequences of NKA, NKCC, KCNH, and HSP70 from various vertebrate species were obtained from GenBank and aligned to the newly deduced amino acid sequences of O. humensis using ClustalW. Phylogenetic analyses were carried out using MEGA version 6 (Tamura et al., 2013). Phylogenetic trees were constructed based on alignments of amino acid sequences using the Neighbor-Joining method. Bootstrap analyses were done with 10,000 replications to test the tree reliability. The outgroups were Caenorhabditis elegans (for NKA and HSP70 analysis), Strongylocentrotus purpuratus (for NKCC analysis), and Toxocara canis (for KCNH analysis).
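Pairwise identity between aligned amino acid sequences, as reported for alignments of this kind, can be computed in a few lines. The sketch below is one common convention and not necessarily the one applied by the software used here: it assumes the two sequences are already aligned to equal length and skips any column containing a gap. The example sequences are invented for illustration.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up aligned fragments: 6 ungapped columns, 5 of them identical.
identity = percent_identity("MKT-LLV", "MKTALIV")
print(f"{identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are present.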

#### Evaluation of Gene Expression after Salinity Changes

Quantitative RT-PCR (qRT-PCR) was run on a CFX96™ Real-Time PCR Detection System (Bio-Rad Laboratories, United States) using SYBR® Green PCR Master Mix (Applied Biosystems, United States). Primers (**Table 1**) for the identified genes atp1a3, slc12a2, kcnh1, and hspa1 and for the reference genes β-actin (actb, GenBank Accession No. EF044319) and histone h3a (h3a, GenBank Accession No. KX060037) were designed using the Primer3 online software<sup>4</sup>. Initial validation experiments were conducted to ensure that all primer pairs had equivalent PCR efficiencies, calculated by standard dilution curve. Amplification was carried out under the standard cycling conditions of 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 60 s, followed by conditions to calculate the melting curve. All PCR runs for each cDNA sample were performed in triplicate. The qRT-PCR data were analyzed using the 2<sup>−ΔΔCt</sup> method considering primer amplification efficiencies (Livak and Schmittgen, 2001).
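For readers unfamiliar with the 2<sup>−ΔΔCt</sup> calculation, a minimal sketch is given below. It implements the basic form of the method, which assumes amplification efficiencies close to 100%, and does not apply any efficiency correction. Averaging the Ct values of the two reference genes is an assumption made here for illustration, and all Ct values in the example are invented.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_refs: list) -> float:
    """Ct of the target gene minus the mean Ct of the reference genes."""
    return ct_target - mean(ct_refs)


def fold_change(
    ct_target_sample: float,
    ct_refs_sample: list,
    ct_target_calibrator: float,
    ct_refs_calibrator: list,
) -> float:
    """Relative expression by 2^(-ddCt), assuming ~100% PCR efficiency."""
    ddct = delta_ct(ct_target_sample, ct_refs_sample) - delta_ct(
        ct_target_calibrator, ct_refs_calibrator
    )
    return 2.0 ** (-ddct)


# Invented Ct values: calibrator (e.g., a D0 control) versus a treated sample,
# each with two reference genes (standing in for actb and h3a).
fc = fold_change(
    ct_target_sample=24.0,
    ct_refs_sample=[18.0, 20.0],
    ct_target_calibrator=25.0,
    ct_refs_calibrator=[18.0, 20.0],
)
print(f"fold change relative to calibrator: {fc}")
```

In this example the target Ct drops by one cycle while the reference genes are unchanged, so ΔΔCt is −1 and the target is estimated to be twice as abundant as in the calibrator.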

#### Blood Analysis

After blood collection from fasted animals, glycemia was measured using an Accu Chek Glucometer (Roche Diagnostics, United Kingdom). The samples from each fish were immediately transferred to microcapillaries and centrifuged at 12,000 × g for 5 min for hematocrit analysis. The blood was transferred to sterilized microtubes and centrifuged at 1,500 × g at 4°C for 10 min to separate plasma for osmolality measurements in a Vapro 5520® Vapor Pressure Osmometer (Wescor, United States).

#### Statistical Analysis

Quantitative data were expressed as means ± standard error of the mean. Significant differences among means were evaluated by two-way ANOVA followed by Tukey's test, setting the significance level at 5% (P < 0.05). The exception was the expression of reference genes, which was expressed as means of cycle threshold (Ct) values ± standard deviation, and the differences were analyzed by one-way ANOVA followed by Tukey's test.
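The summary statistic used for most data, the mean ± standard error of the mean (SEM), can be sketched as follows. The function name and the example values are illustrative only; the SEM is taken as the sample standard deviation (n − 1 denominator) divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values: list) -> tuple:
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    return mean(values), stdev(values) / sqrt(n)


# Invented measurements for one group at one time point.
m, sem = mean_sem([4.0, 6.0, 8.0, 6.0])
print(f"{m:.2f} ± {sem:.2f}")
```

The ANOVA and Tukey comparisons themselves are normally run in a statistics package and are not reproduced here.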

# RESULTS

#### cDNA Cloning and Characterization

The atp1a3, slc12a2, kcnh1, and hspa1 cloned fragments from O. humensis were, respectively, 975, 423, 560, and 286 bp (base pairs) in length. They were sequenced and deposited under GenBank accession numbers KR920364, KT001464, KX035016, and KU639716, respectively. The cloned fragments of atp1a3 and kcnh1 belong, respectively, to the middle of the ORF +2 and to the initial part of the ORF +3. The fragments of slc12a2 and hspa1 are part of the center of ORF +1. The atp1a3 fragment encodes 324 amino acids belonging to the P-type ATPase family. The slc12a2 fragment encodes 141 amino acid residues belonging to the Na/K/Cl co-transporter 1 family. The cloned fragment of kcnh1 encodes 186 amino acids belonging to the potassium channel, voltage-dependent family. The cloned fragment of hspa1 encodes 95 amino acids belonging to the heat shock protein 70 family.
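The amino acid counts above are consistent with the fragment lengths and reading frames. The sketch below checks this, assuming that frame +n means translation starts at the n-th nucleotide of the fragment and that an incomplete trailing codon is dropped; the function name is illustrative.

```python
def residues_in_frame(length_bp: int, frame: int) -> int:
    """Number of complete codons in a fragment read in forward frame 1, 2, or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    usable = length_bp - (frame - 1)  # skip the nucleotides before the frame start
    return usable // 3                # incomplete trailing codon is discarded


# (fragment length in bp, reading frame) as reported for each cloned fragment.
fragments = {
    "atp1a3": (975, 2),
    "slc12a2": (423, 1),
    "kcnh1": (560, 3),
    "hspa1": (286, 1),
}
residues = {gene: residues_in_frame(bp, fr) for gene, (bp, fr) in fragments.items()}
print(residues)
```

Under these assumptions the four fragments yield 324, 141, 186, and 95 residues, matching the reported values.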

The percentage identity values between the putative NKAα3, NKCC1, KCNH, and HSP70 sequences of silverside and those of the other analyzed species were, respectively, 77–98, 55–92, 26–99, and 37–97% (**Supplementary Figures S1**–**S4**). The phylogenetic tree constructed based on the alignment of NKAα-subunit amino acid sequences (**Supplementary Figure S1**) reveals that α1, α2, α3, and α4 form different clusters with mammal and fish sequences in each, except in the α4 group, which was devoid of fish NKA sequences. Furthermore, the silverside NKAα3 was grouped in the monophyletic α3 cluster, showing that it was more closely related to NKAα3a than to NKAα3b and NKAα2. Based on this evidence, the sequence found in the gills of O. humensis was identified as atp1a3a. NKCC1 and NKCC2 were grouped in different clusters (**Supplementary Figure S2**).

The paralogous NKCC1a and NKCC1b were grouped together in the NKCC1 group. Furthermore, the tree provided evidence that silverside NKCC is closer to fish NKCC1b than to NKCC1a or NKCC2. Thus, the cloned and sequenced cDNA from silverside gills was identified as slc12a2b. The KCNH phylogenetic tree (**Supplementary Figure S3**) forms two main monophyletic groups: the ELK family cluster, composed of vertebrate KCNH3, KCNH4, and KCNH8, and the EAG family cluster, composed of vertebrate KCNH5 and KCNH1, in which the new sequence of silverside KCNH1 was grouped. Based on this analysis, the cloned KCNH sequence from silverside gills was confirmed as kcnh1.

<sup>1</sup>http://blast.ncbi.nlm.nih.gov/

<sup>2</sup>https://www.expasy.org/

<sup>3</sup>http://www.uniprot.org/

<sup>4</sup>http://primer3.ut.ee

The HSP70 phylogenetic analysis (**Supplementary Figure S4**) separated HSPA1 and HSPA2 sequences into different clusters. HSPA2 was composed only of mammalian sequences, while HSPA1 was composed of mammalian and teleost sequences. The new sequence of HSPA1 from silverside was clustered within fish HSPA1a. Based on these results, the HSPA sequence from silverside gills was identified as hspa1a.

#### Gene Expression after Salinity Changes

Primer sets for atp1a3a, slc12a2b, kcnh1, hspa1a, actb, and h3a had amplification efficiencies in qRT-PCR of 1.14, 1.07, 1.06, 1.02, 0.98, and 1.05, respectively. No difference was observed in the expression of reference genes among the four time points or between the treatments (**Supplementary Table S1**). The relative expression of atp1a3a (**Figure 1A**) was not different (P > 0.05) between FW- and BW-acclimated fish in the D0 control. However, after the transfer of FW-acclimated fish to BW (FW–BW) and vice versa (BW–FW), a change in atp1a3a expression was observed. In FW–BW-transferred fish, there was an increase (P < 0.05) in atp1a3a expression, while in BW–FW there was a decrease at D1 (P < 0.05). The levels of atp1a3a expression returned close to the initial conditions in both FW–BW and BW–FW groups, without difference between groups (P > 0.05), at D7 and D15.

The relative expression of slc12a2b (**Figure 1B**) was not different between FW- and BW-acclimated fish at D0 (P > 0.05). However, after the FW–BW and BW–FW transfers, a change in slc12a2b mRNA expression was observed. In FW–BW-transferred fish, there was an increase (P < 0.05) in slc12a2b mRNA levels at D1. The gene expression decreased (P < 0.05) to intermediate levels at D7 and remained without difference until D15 (P > 0.05). The slc12a2b expression levels did not change (P > 0.05) after the transfer in the BW–FW group at any experimental time. A difference in slc12a2b relative expression between the BW–FW and FW–BW groups (P < 0.05) was observed at D1 and D7.

No significant difference was observed in the relative expression of kcnh1 (**Figure 1C**) at D0 between FW- and BW-acclimated silversides (P > 0.05). After the transfers, at D1, a significant increase (P < 0.05) was observed in the kcnh1 expression of the BW–FW group. The expression levels decreased (P < 0.05), returning to the initial patterns at D7 and D15. The kcnh1 expression levels did not change significantly (P > 0.05) after the transfer in the FW–BW group at any experimental time. A difference in kcnh1 relative expression between the BW–FW and FW–BW groups (P < 0.05) was observed only at D1.

The relative expression of hspa1a (**Figure 1D**) was not different between FW- and BW-acclimated fish at D0 (P > 0.05). After the transfers, at D1, a significant increase (P < 0.05) was observed in the hspa1a relative expression of the BW–FW-transferred fish. The expression levels decreased (P < 0.05) to the initial pattern at D7 and remained unchanged until D15 (P > 0.05). The hspa1a expression levels did not change (P > 0.05) after the transfer in the FW–BW group at any experimental time. A difference in hspa1a relative expression between the BW–FW and FW–BW groups (P < 0.05) was observed only at D1.

#### Glycemia

No difference in blood glucose concentration (P > 0.05) was observed between FW- and BW-acclimated fish at D0 (**Figure 2A**). The BW–FW-transferred fish presented an increase (P < 0.05) in glycemia values at D1 and a decrease (P < 0.05) to pre-transfer levels at D7, and remained without difference (P > 0.05) until D15. No difference (P > 0.05) between time points was observed in the FW–BW transfer group. A significant difference between the two groups was observed only at D1.

#### Hematocrit

No difference (P > 0.05) in hematocrit (**Figure 2B**) was observed in any group of analyzed silversides or at any experimental time. Only trends of an increase in hematocrit soon after BW–FW transfer and a decrease after FW–BW transfer were observed.

#### Osmolality

No difference (P > 0.05) between the blood osmotic concentration (**Figure 2C**) of FW- and BW-acclimated fishes was observed at D0. After transfer, there was a reduction (P < 0.05) in osmolality of BW–FW-transferred fish that was kept the same until D15. Differences were not observed in blood osmolality of FW–BW-transferred fish. Differences (P < 0.05) between groups were found right after the transfer (D1) and at the final measurements (D15).

# DISCUSSION

Salinity is a frequent abiotic stressor that restrains fish growth and development by inducing osmotic stress responses. When fishes are subjected to salinity stress, related genes are activated to induce salinity stress tolerance. This study was successful in the identification and characterization of atp1a3a, slc12a2b, kcnh1, and hspa1a mRNA sequences in the silverside O. humensis. The molecular cloning made it possible to adequately monitor changes in the expression of these genes after salinity challenges. In addition, alterations in blood glucose and osmolality were demonstrated after environmental salinity change.

In FW–BW-transferred fish, atp1a3a expression increased soon after salinity stress, whereas in the BW–FW group it was downregulated. Similar effects on atp1a3a expression were also observed in gills of Mozambique tilapia challenged by salinity stress, which presented higher expression levels in SW and lower levels in FW (Li et al., 2014). In threespine stickleback, SW transfer also generated an increase in atp1a3a mRNA expression (Taugbøl et al., 2014). An increase in mRNA expression and protein activity of the NKAα-subunit in gills, in response to SW transfer, occurred in Atlantic salmon (D'Cotta et al., 2000), in brown trout (Tipsmark et al., 2002), and in killifish (Scott and Schulte, 2005; Scott et al., 2008). In contrast, transfer to lower salinity generates a decrease in mRNA expression and protein activity of the NKAα-subunit in gills of Mozambique tilapia (Lin et al., 2004). In the present study, the downregulation in atp1a3a expression indicates a reduction in osmoregulatory activity due to the lack of salt in the water.

same group. The asterisk represents the significant difference between groups in a same experimental time. The vertical broken line indicates the transfer from BW to FW and from FW to BW moments.

In silverside, slc12a2b expression in gills in response to increased salinity was readily detectable, evidencing the high importance of NKCC1b to hyposmoregulation in O. humensis. While NKA can act in both ion absorption and secretion, NKCC1 is more associated with ion secretion in teleost gills. Here, the increased water salt concentration led O. humensis to increase its slc12a2b mRNA expression after FW–BW transfer. This fact, in association with the stabilization of blood osmolality even after transfer to BW, indicates an efficient ion secretion activity. Furthermore, the transfer to lower-salinity medium did not affect slc12a2b mRNA expression. Although opposite responses of NKCC in hypo- and hyperosmotic environments are the most commonly reported in fishes, the absence of variation after salinity changes is also possible, especially in transfers to low salinities. In gills of brown trout (Tipsmark et al., 2002) and striped bass (Tipsmark et al., 2004), both gene and protein expression of NKCC1 were positively correlated with environmental salinity, with increases after FW–SW transfer and decreases after SW–FW transfer. In brackish medaka (Kang et al., 2010) and climbing perch (Loong et al., 2012), slc12a2b mRNA expression and NKCC protein quantity followed the saline concentration of the water inhabited by FW- and SW-acclimated fish. However, no response was reported in slc12a2a mRNA expression after the transfer of Mozambique tilapia from BW to FW (Breves et al., 2010).

The branchial tissue of the silversides in this study showed an increase in kcnh1 mRNA expression soon after transfer to FW. A similar result was obtained in threespine stickleback, in which kcnh4 mRNA expression was higher in FW- than in SW-acclimated fish (Taugbøl et al., 2014). Furthermore, the return of mRNA expression to pre-transfer levels indicates acclimation to the medium. Although the functions of kcnh1 in organisms are not well known, our results suggest that this gene may play a role in osmoregulation.

The HSP70 overexpression is induced by some environmental stressors, such as temperature changes, UV and γ-irradiation, and chemical exposure (Place and Hofmann, 2005; Yamashita et al., 2010; Rajeshkumar et al., 2013). In O. humensis, the salinity change was able to induce a response of hspa1a gene. The same was observed in threespine stickleback (Taugbøl et al., 2014), in North Sea cod and Baltic Sea cod (Larsen et al., 2012), and in Kaluga (Peng et al., 2016). The up-regulation in hspa1a mRNA after BW–FW transfer indicates an increase in stress. The significant down-regulation observed after 7 days in FW indicates a reduction in stress levels, indicating acclimation. On the other hand, the unchanged hspa1a mRNA levels after the FW–BW transfer reveal that this salinity challenge is less stressful to O. humensis.

Blood glucose has been used to estimate acute stress conditions in fishes (Cataldi et al., 2005), as reported for the silverside O. bonariensis (Tsuzuki et al., 2001). Furthermore, hematocrit levels can also serve as a stress indicator in fishes. An increase in hematocrit may indicate the presence of an active stress factor (Hudson et al., 2008). The quick increase in glycemia and the trend of increase in hematocrit, together with the increase in hspa1a expression after BW–FW transfer, reveal that this "salt-free" environment is significantly more stressful to O. humensis than the BW medium. In O. bonariensis, the transfer from FW to 20 ppt BW triggered a decrease in stress followed by a decrease in glycemia and hematocrit, indicating that FW is more stressful than BW in this species as well (Tsuzuki et al., 2001). Moreover, the unchanged blood glucose and hematocrit corroborate the hspa1a expression results, which indicate that FW–BW transfer leads to lower levels of stress in O. humensis.

Reduction in atp1a3a expression and blood osmolality in BW–FW-transferred silversides indicates a decrease in osmoregulation activity, which is energetically expensive to fishes. Even so, the stress in the BW–FW group was higher, suggesting that the physiology of O. humensis is only partially adapted to FW. In the FW–BW transfer group, blood osmolality was maintained. In O. bonariensis, the transfer to 20 ppt BW generates a significant increase in osmolality until 24 h post-transfer (Tsuzuki et al., 2001, 2007). The transfer to 5 ppt BW also generated an increase in blood osmolality, but not as pronounced as at 20 ppt (Tsuzuki et al., 2007). Here, O. humensis osmolality remained without significant variation after FW–BW transfer. The 10 ppt medium may have been insufficient to significantly increase the blood osmolality of O. humensis, even though half that salt concentration is able to increase this parameter in O. bonariensis. However, we hypothesize that FW–BW transfer causes increased recruitment of both atp1a3a and slc12a2b, which would work together in the control of ion secretion and stabilization of blood osmolality. This may indicate that the silverside O. humensis is more efficient than O. bonariensis in the hyposmoregulation process.

The return of osmoregulatory gene expression and blood glucose to near pre-transfer levels after a significant increase indicates efficient acclimation (Grutter and Pankhurst, 2000; Havird et al., 2013). Blood glucose and atp1a3a, slc12a2b, kcnh1, and hspa1a mRNA levels stabilized within 7 days. These markers of osmoregulation and stress corroborated acclimation after 1 week, despite the high stress that occurred after the saline challenge.

Our results suggest that the BW–FW transfer is more stressful to O. humensis than the FW–BW transfer. Even though O. humensis is not widely cultivated in aquaculture today, the information that this species follows the responses of O. bonariensis and remains less stressed when in brackish environments is very important. Salinity levels were shown to modulate the energy supply available for growth and reproduction in farmed fishes (Altinok and Grizzle, 2001; Chand et al., 2015), and its optimal adjustment can benefit O. humensis production in captivity for aquaculture purposes.

The brackish medium has the potential to decrease economic losses from mortality due to handling, transport, crowding, and poor water quality (Strüssmann et al., 1996; Tsuzuki et al., 2000b, 2001) and to increase the survival rate of embryos in farm production (Piedras et al., 2009). Keeping O. humensis in this near-isosmotic environment can potentially allow for better growth rates, food conversion ratio, and energy absorption efficiency, among other parameters of interest. Furthermore, cultivating O. humensis in conditions closer to the ideal decreases stress and the physiological variations it causes, and favors the use of this species as a model in scientific research.

Although regarded as FW species, these silversides are commonly in contact with salt and BW in estuaries and coastal lagoons of South America and have better development and survival in saline environments (Tsuzuki et al., 2000a, 2001; Piedras et al., 2009). The coastal plain system of southern Brazil originated from successive transgression and regression events since the upper Pleistocene (Bemvenuti, 2006). The formation and radiation of Odontesthes spp. is also recent, having occurred during the Pleistocene–Holocene (Beheregaray et al., 2002; Lovejoy et al., 2006; Heras and Roldán, 2011). Furthermore, there is strong evidence that various genera of Atherinopsidae, including Odontesthes, have an evolutionary tendency to invade continental FWs (Bloom et al., 2013; Campanella et al., 2015). Thus, the same events that originated the continental lakes and lagoons of southeastern South America may have triggered the speciation process of Odontesthes spp.

The previous and the present studies corroborate the theory that FW is not the ideal environment for O. humensis and lend further support to the theory of repeated invasion of FW habitats by marine ancestors of silversides. The conquest of the FW environment is very recent, so there has not been enough time for the selection of adaptive mechanisms that would eliminate the basal stress of life in this condition.

# AUTHOR CONTRIBUTIONS

TS, GM, and VC were responsible for experimental design, data analysis, and manuscript writing. TS, GM, MR, and RR were responsible for fish acclimation and maintenance. TS, GM, BB, IL, LS, MR, RR, and WD were responsible for the biological collections. TS, BB, IL, LS, WD, and VC were responsible for the molecular biology, from RNA extraction to sequencing, and qRT-PCR analysis. OD, TC, and FS were also responsible for qRT-PCR analysis. DP and RR were also responsible for data analysis and language review.

#### FUNDING

This study was supported by the Ministério da Ciência (422292/2016-8), Tecnologia e Inovação/Conselho Nacional de Desenvolvimento Científico e Tecnológico (Edital Universal No. 472210/2013-0) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (AUXPE 2900/2014). TS, GM, WD, and MR are individually supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. DP, OD, TC, FS, RR, and VC are also individually supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico.

# ACKNOWLEDGMENTS

We are very grateful to Mr. Nilton Link and Mr. Vítor Colvara for their help with the maintenance of the animals and to Janaína Pedron for her help with the blood osmolality analysis.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00028/full#supplementary-material

FIGURE S1 | Phylogenetic analysis of NKAα-subunit amino acid sequences of mammals and teleosts. The tree was generated by MEGA v6 software using the Neighbor-Joining method. The bootstrap values from 10,000 replicates are shown at each node. Values between parentheses represent the identity (in %) of the sequence with NKAα3a of Odontesthes humensis. Scale bar units represent the number of amino acid substitutions per site. NKAα1 GenBank ID: Homo sapiens, CAA27840.1. NKAα1a GenBank ID: Danio rerio, NP\_571762.1; Solea senegalensis, BAN17690.1; Fundulus heteroclitus, AAL18002.1; Oncorhynchus masou, BAJ13363.1; Oncorhynchus mykiss, NP\_001117933.1. NKAα1b GenBank ID: O. mykiss, NP\_001117932.1; O. masou, BAJ13362.1; S. senegalensis, BAN17691.1; D. rerio, NP\_571765.1. NKAα2 GenBank ID: Rattus norvegicus, NP\_036637.1; O. mykiss, NP\_001117930.1; D. rerio, AAF98359.1; S. senegalensis, BAO02373.1; F. heteroclitus, AAL18003.1. NKAα3 GenBank ID: H. sapiens, NP\_689509.1; R. norvegicus, NP\_036638.1; Oreochromis mossambicus, AAF75108.1; O. mykiss, NP\_001118102.1. NKAα3a GenBank ID: D. rerio, NP\_571759.2; S. senegalensis, BAN17693.1; Monopterus albus, AGV06213.1. NKAα3b GenBank ID: M. albus, AGV06214.1; S. senegalensis, BAN17692.1. NKAα4 GenBank ID: H. sapiens, Q13733.3; R. norvegicus, NP\_074039.1. NKAα1 from Caenorhabditis elegans, NP\_506269.1, was used as outgroup.

FIGURE S2 | Phylogenetic analysis of NKCC amino acid sequences of mammals and teleosts. The tree was generated by MEGA v6 software using the Neighbor-Joining method. The bootstrap values from 10,000 replicates are shown at each node. Values between parentheses represent the identity (in %) of the sequence with NKCC1b of O. humensis. Scale bar units represent the number of amino acid substitutions per site. NKCC1 GenBank ID: H. sapiens, P55011.1; Mus musculus, NP\_033220.2. NKCC1a GenBank ID: Anguilla anguilla, CAD31111.1; Salmo salar, NP\_001117155.1; Oryzias dancena, ADN18710.1; O. mossambicus, AAR97731.1; Anabas testudineus, AFK29496.1. NKCC1b GenBank ID: O. dancena, ADK47392.1; M. albus, AGX01628.1; O. mossambicus, AAR97732.1; A. anguilla, CAD31112.1. NKCC2 GenBank ID: H. sapiens, Q13621.2; M. musculus, P55014.2; O. mossambicus, AAR97733.1; Takifugu obscurus, BAH20440.1. NKCC1 from Strongylocentrotus purpuratus, NP\_001106707.1, was used as outgroup.

FIGURE S3 | Phylogenetic analysis of KCNH amino acid sequences from the EAG and ELK subfamilies of mammals and teleosts. The third existing subfamily, ERG, was discarded in order to keep the analysis brief. The tree was generated by MEGA v6 software using the Neighbor-Joining method. The bootstrap values from 10,000 replicates are shown at each node. Values between parentheses represent the identity (in %) of the sequence with KCNH1 of O. humensis. Scale bar units represent the number of amino acid substitutions per site. KCNH1 GenBank ID: H. sapiens, NP\_758872.1; M. musculus, NP\_034730.1; D. rerio, XP\_009291371.1; Oreochromis niloticus, XP\_005474450.1. KCNH3 GenBank ID: H. sapiens, NP\_001300959.1; D. rerio, XP\_001919436.3; Nothobranchius furzeri, SBP57798.1. KCNH4 GenBank ID: H. sapiens, NP\_036417.1; Poecilia reticulata, XP\_008415281.2; O. niloticus, XP\_019213276.1; Maylandia zebra, XP\_012774834.1. KCNH5 GenBank ID: H. sapiens, NP\_647479.2; M. musculus, NP\_766393.2; D. rerio, NP\_001263209.1; O. niloticus, XP\_003451242.1. KCNH8 GenBank ID: H. sapiens, NP\_653234.2; M. musculus, NP\_001026981.2; O. niloticus, XP\_003448945.1; M. zebra, XP\_004547937.1; P. reticulata, XP\_008419899.1. KCNH1 from Toxocara canis, KHN74999.1, was used as outgroup.

FIGURE S4 | Phylogenetic analysis of HSP70 amino acid sequences of mammals and teleosts. The tree was generated by MEGA v6 software using the Neighbor-Joining method. The bootstrap values from 10,000 replicates are shown at each node. Values between parentheses represent the identity (in %) of the sequence with HSP70 of Odontesthes humensis. Scale bar units represent the number of amino acid substitutions per site. HSP70-1A GenBank ID: H. sapiens, NP\_005336.3; M. musculus, NP\_034609.2; Notothenia coriiceps, XP\_010769991.1; M. albus, AGO01980.1; Nothobranchius korthausae, SBQ68887.1. HSP70-1B GenBank ID: M. albus, AGO01981.1; D. rerio, NP\_001093532.1. HSP70-2 GenBank ID: H. sapiens, AAH36107.1; M. musculus, EDL36460.1. HSP70-1 from C. elegans, NP\_503068.1, was used as outgroup.

TABLE S1 | Expression of reference genes of Odontesthes humensis before (D0) and after (D1, D7, and D15) hypo- and hyperosmotic shock. Identical letters indicate no difference within the same column. Abbreviations: Ct, cycle threshold values; D0, day zero; D1, day one; D7, day seven; D15, day fifteen; FW–BW, transfer from freshwater to brackish water; BW–FW, transfer from brackish water to freshwater.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Silveira, Martins, Domingues, Remião, Barreto, Lessa, Santos, Pinhal, Dellagostin, Seixas, Collares, Robaldo and Campos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Molecular Evidences of a Hidden Complex Scenario in Leporinus cf. friderici

#### Rosane Silva-Santos\*, Jorge L. Ramirez, Pedro M. Galetti Jr. and Patrícia D. Freitas

Laboratório de Biodiversidade Molecular e Conservação, Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo, Brazil

The megadiversity of the neotropical ichthyofauna has been associated with recent diversification processes, which are reflected in subtle or absent morphological differentiation between species and challenge classical taxonomic identification. Leporinus friderici occurs in several river basins of South America, and its nominal taxonomic validity has been questioned. Its wide distribution within the Brazilian Shield suggests that this species could be genetically structured among the hydrographic basins, despite a marked morphological similarity. In this study, we performed phylogenetic analyses based on three nuclear (recombination activating gene 1, RAG1; recombination activating gene 2, RAG2; and myosin heavy chain 6 cardiac muscle alpha gene, Myh6) and two mitochondrial (cytochrome oxidase subunit 1, COI, and cytochrome b, Cytb) markers in specimens morphologically similar to L. friderici and in related species from different hydrographic basins in South America. Our phylogenetic tree identified four well-supported clades, which point to the existence of taxonomic inconsistencies within this fish group. A clade named L. cf. friderici sensu stricto included eight Molecular Operational Taxonomic Units that recently diversified in the Brazilian Shield basins. These results were also confirmed by a single-gene species delimitation analysis. We suggest that this clade includes a species complex, characterizing taxonomic uncertainties. Another clade recovered only L. friderici from the Suriname rivers, validating this nominal species in its type locality. A third, no-named clade, characterized by deeper species divergence, recovered five different nominal species interleaved with other undescribed forms previously also recognized as L. cf. friderici, indicating taxonomic errors. The fourth clade included only L. taeniatus. Our results revealed a complex scenario involving the morphotype L. cf. friderici and allowed us to address aspects of the evolutionary diversification of this fish group and the historical processes involved, highlighting the importance of revealing hidden biodiversity for the taxonomy and conservation action plans of these fish.

Keywords: neotropical fish, MOTUs, fish phylogeny, taxonomic uncertainties, cryptic species

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Izeni Pires Farias, Federal University of Amazonas, Brazil Marcelo Ricardo Vicari, Ponta Grossa State University, Brazil

> \*Correspondence: Rosane Silva-Santos rosanesantos.gen@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 30 October 2017 Accepted: 31 January 2018 Published: 15 February 2018

#### Citation:

Silva-Santos R, Ramirez JL, Galetti PM Jr and Freitas PD (2018) Molecular Evidences of a Hidden Complex Scenario in Leporinus cf. friderici. Front. Genet. 9:47. doi: 10.3389/fgene.2018.00047

### INTRODUCTION

South American freshwater fishes represent one-third of the world's continental ichthyofauna (Reis et al., 2016). However, this huge biodiversity is relatively recent, mostly due to extensive speciation events during the last 10 Ma (Hubert et al., 2007; Albert and Reis, 2011). Several taxa diverged <1 Ma ago, during the late Pleistocene, because of Quaternary activity that led to topographic changes such as recent drainage rearrangements (Ribeiro, 2006). These events are probably reflected in subtle or absent morphological differentiation between the resulting species, which challenges classical taxonomic identification.

Molecular analyses have been widely used to aid species identification and delimitation in neotropical fish (e.g., Carvalho et al., 2011; Pereira et al., 2011, 2013; Ramirez and Galetti, 2015; Machado et al., 2016), helping to reveal hidden biodiversity (Pires et al., 2017; Ramirez et al., 2017a). Both DNA barcoding and phylogenetic studies have contributed to a better understanding of the phylogenetic relationships within taxa and of the historical and evolutionary processes involved in species diversification (e.g., Hebert et al., 2003; Carvalho-Costa et al., 2011; Ramirez et al., 2017b). In this way, mitochondrial and nuclear DNA data can be used to define discrete genetic lineages by detecting reciprocal monophyly and characterizing Molecular Operational Taxonomic Units (MOTUs; clusters of orthologous sequences generated by an explicit algorithm), each representing a monophyletic lineage that may or may not correspond to a taxon (Blaxter et al., 2005; Jones et al., 2011).

Leporinus, of the family Anostomidae, is considered one of the most species-rich genera within Characiformes, a predominant order of freshwater fish in South America (Garavello and Britski, 2003). Recent morphological and molecular studies have shown Leporinus to be a non-monophyletic genus (Sidlauskas and Vari, 2008; Ramirez et al., 2016), highlighting the need for a thorough taxonomic revision of this group. Recently, an integrated morphological, chromosomal, and molecular approach described the new genus Megaleporinus, gathering the largest-bodied Leporinus species in a monophyletic clade (Ramirez et al., 2017b).

Among the remaining Leporinus species, Leporinus friderici Bloch (1794) has a wide geographic distribution, occurring in most rivers of South America (Garavello et al., 1992). This wide distribution has been investigated by several authors, who have found morphological (Géry et al., 1987; Renno et al., 1990, 1991; Garavello et al., 1992; Sidlauskas and Vari, 2012) and genetic (Renno et al., 1990, 1991) variation among populations, suggesting that the L. friderici morphotype may contain a species complex (Sidlauskas and Vari, 2012). While L. friderici from the Suriname and French Guiana rivers has been recognized as the type species (Sidlauskas and Vari, 2012), the provisional nomenclature Leporinus cf. friderici has been used to refer to the remaining individuals of this morphotype. However, because of its wide distribution within the Brazilian Shield, one could expect some level of genetic differentiation within L. cf. friderici among hydrographic basins. Morphological variation in the eyes, body length, and color patterns has already been reported between L. cf. friderici from the Amazon and Paraná-Paraguay basins (Géry et al., 1987; Garavello et al., 1992).

In this study, we tested the hypothesis that L. cf. friderici consists of a monophyletic group formed by different MOTUs currently separated in distinct basins within the Brazilian Shield. We performed a phylogenetic analysis using mitochondrial and nuclear markers to assess whether L. cf. friderici is a monophyletic group, and a single-gene species delimitation analysis to characterize MOTUs. Finally, we make inferences about the historical evolutionary processes possibly responsible for the diversification within this group.

# MATERIALS AND METHODS

#### Ethics Statements

This research was conducted under protocols approved by the Ethics Committee on Animal Experimentation (CEUA, Federal University of São Carlos, São Carlos, São Paulo, Brazil) and SISBIO-ICMBio (Authorization System and Biodiversity Information, Chico Mendes Institute for Biodiversity Conservation, Ministry of Environment, Brazil). The biological samples were obtained under permits ICMBIO/MMA No. 32215 and CEUA No. 3893250615, following all legal requirements. Samples from Colombia and Peru were obtained by Jose Ariel Rodriguez and Hernan Ortega, respectively, who provided DNA aliquots for this study.

#### Biological Specimens and Data Sampling

Samples of fin, muscle, or liver tissue were collected from 53 L. cf. friderici specimens from eight hydrographic basins of the Brazilian Shield. **Figure 1** illustrates the studied river basins. Samples of Leporinus agassizii Steindachner (1876), L. boehlkei Garavello (1988), L. piau Fowler (1941) from the Jaguaribe and São Francisco rivers, and L. cf. parae Eigenmann and Ogle (1907) were also obtained. In addition, DNA sequences of Leporinus desmotes Fowler (1941), Leporinus fasciatus Bloch (1794), Leporinus lacustris Campos (1945), Leporinus octomaculatus Britski and Garavello (1993), Leporinus taeniatus Lütken (1875), Leporinus venerei Britski and Birindelli (2008), and Hypomasticus pachycheilus Britski (1976) were downloaded from GenBank (Ramirez et al., 2016). All these species were included in our study because they have been considered closely related to L. friderici (Garavello, 1988; Britski and Birindelli, 2008; Ramirez et al., 2016). Sequences of L. friderici from Suriname, which is considered the type locality, were also obtained from GenBank (Melo et al., 2016). Information on the specimens, vouchers, IDs, and site localities is recorded in the Supplementary Material (**Supplementary Tables S1**, **S2**).

### DNA Extraction, Gene Amplification, and Sequencing

Total DNA was extracted using the conventional phenol–chloroform/proteinase K protocol (Sambrook et al., 1989). For the phylogenetic analysis, at least one individual from each major sampled hydrographic basin was used. The cytochrome b (Cytb) and cytochrome oxidase subunit 1 (COI) mitochondrial regions were amplified according to Ramirez and Galetti (2015). The nuclear recombination activating gene 1 (RAG1), recombination activating gene 2 (RAG2), and myosin heavy chain 6 cardiac muscle alpha gene (Myh6) were amplified following Oliveira et al. (2011). PCR products obtained for both DNA strands were sequenced on an ABI 373xl sequencer (Applied Biosystems, Little Chalfont, United Kingdom).

#### Data Analysis

The obtained sequences were manually edited and aligned using Bioedit (Hall, 1999) and Clustal W (Thompson et al., 1994), respectively. All sequences were checked for indels and stop codons. The haplotypes of the nuclear genes were combined into a consensus sequence by coding polymorphic sites with the IUPAC ambiguity codes (IUPAC, 1974).
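The consensus coding described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, only an assumed equivalent: the function name `iupac_consensus` and the example haplotypes are hypothetical, and only the two-base IUPAC ambiguity codes are handled.

```python
# Two-base IUPAC ambiguity codes (standard nomenclature).
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_consensus(hap1: str, hap2: str) -> str:
    """Merge two aligned haplotypes of one individual into a consensus.

    Sites where the haplotypes agree keep their base; polymorphic sites
    receive the matching two-base IUPAC code. (Hypothetical helper.)
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    consensus = []
    for a, b in zip(hap1.upper(), hap2.upper()):
        if a == b:
            consensus.append(a)
            continue
        code = IUPAC_TWO_BASE.get(frozenset((a, b)))
        if code is None:
            raise ValueError(f"cannot encode site with states {a!r}/{b!r}")
        consensus.append(code)
    return "".join(consensus)


# Made-up example: one heterozygous site (G/A) becomes R.
print(iupac_consensus("ACGT", "ACAT"))
```

Sites with gaps or pre-existing ambiguity codes raise an error here rather than being guessed at, which keeps the sketch conservative.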

The phylogenetic analyses were conducted on concatenated sequences of all mitochondrial and nuclear genes using maximum parsimony (MP), implemented in PAUP<sup>∗</sup> 4.0 (Swofford, 2003), with 1000 bootstrap replicates. We also performed maximum-likelihood (ML) analyses using RAxML on XSEDE (Stamatakis, 2006; Stamatakis et al., 2008) through the CIPRES Science Gateway web server (Miller et al., 2010), with a partitioning scheme determined by PartitionFinder (Lanfear et al., 2012), the GTR+G model, and 1000 bootstrap replicates.

A multilocus Bayesian species tree (BST) was estimated with <sup>∗</sup>BEAST (Star-BEAST) (Heled and Drummond, 2010) using 150 million generations, sampled every 5000, and a burn-in of 300. Nucleotide substitution models were selected based on the Bayesian information criterion using jModelTest 2 (Darriba et al., 2012). The models chosen were HKY+I+G, HKY+G, K80+I, K80+I, and K80+I for COI, Cytb, Myh6, RAG1, and RAG2, respectively. The two mitochondrial gene tree topologies were linked and set to have an effective population size one-quarter that of the nuclear genes. A lognormal relaxed clock was used for all partitions. Bayesian trees using all sequences were also estimated for each gene separately. These trees were estimated with 10 million generations, sampling every 5000, and a burn-in of 10%. A Yule speciation model was used, and the nucleotide substitution models followed the criteria cited above (**Supplementary Figures S1**–**S5**). Convergence (effective sample size >200) and stationarity of the values were checked in TRACER v1.6 (Rambaut et al., 2014).
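As a rough guide to what these chain settings imply, the following Python sketch counts the logged and retained samples. The helper `retained_samples` is hypothetical, and the sketch assumes that the initial state is not counted and that a percentage burn-in is applied to the logged samples.

```python
def retained_samples(generations: int, sample_every: int, burnin_percent: int):
    """Return (logged, discarded, retained) sample counts for one MCMC run.

    Assumes the initial state is not counted and that the burn-in is a
    percentage of the logged samples. (Hypothetical helper.)
    """
    logged = generations // sample_every
    discarded = logged * burnin_percent // 100
    return logged, discarded, logged - discarded


# Single-gene trees: 10 million generations, sampled every 5000, 10% burn-in.
print(retained_samples(10_000_000, 5000, 10))

# Species tree: 150 million generations sampled every 5000 (logged samples only;
# the units of the reported burn-in of 300 are not interpreted here).
print(150_000_000 // 5000)
```

Under these assumptions each single-gene run keeps 1800 of 2000 logged samples, and the species-tree run logs 30,000 samples before burn-in.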

Two different analyses, using a single-gene species delimitation approach based on COI sequences, were performed to determine the number of MOTUs within the clade widely distributed in the Brazilian Shield (L. cf. friderici sensu stricto, see below). First, the General Mixed Yule-Coalescent (GMYC) model (Pons et al., 2006) was used to determine the clustering of the COI haplotypes. This analysis was performed with a single threshold, as implemented in the SPLITS package in R 3.3.3 (R Core Team, 2017). For this analysis, an ultrametric tree was generated using BEAST 2.3.2 (Bouckaert et al., 2014), with a lognormal relaxed clock, a birth and death model, and an HKY substitution model chosen by jModelTest 2 (Darriba et al., 2012). A total of 50 million MCMC generations and a burn-in of 10% were used. Second, a Bayesian analysis of genetic structure was implemented using BAPS software (Corander et al., 2008). The maximum number of genetically diverged groups (K) was first set to 10, with the analysis replicated 10 times. The resulting groups containing samples from different basins were submitted to a second layer of analysis in BAPS using K = 1–3, also replicated 10 times. This hierarchical approach to DNA sequence clustering provides a useful way to increase statistical power and detect separate haplogroups that are assigned to conservative clusters (Cheng et al., 2013). The most likely K was chosen based on log (likelihood) and posterior probability values. Finally, we took as the final set of MOTUs the groups that were concordant between the two analyses, together with the groups showing allopatry and reciprocal monophyly.

The genetic distances between MOTUs were calculated based on the Kimura two-parameter (K2P) model using MEGA 6.06 (Tamura et al., 2013). Finally, a haplotype network was generated using the Median Joining method (Bandelt et al., 1999) in the POPART software (Leigh and Bryant, 2015).
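For readers unfamiliar with the K2P model, the pairwise distance can be sketched in a few lines of Python. This is an illustrative implementation of the standard formula d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions; it is not the MEGA code, the function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguity codes is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence is not A, C, G, or T are skipped
    (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# Made-up example: one transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Distances between MOTUs would then be averages of such pairwise values over all between-group sequence pairs.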

### RESULTS

We successfully obtained a total of 127 sequences for the specimens studied herein [63 for the COI gene (557 bp) and 16 for each remaining amplified marker: Cytb (1005 bp), Myh6 (754 bp), RAG1 (1477 bp), and RAG2 (1023 bp)]. The GenBank accession numbers are shown in the Supplementary Material section (**Supplementary Table S2**).

The phylogenetic trees generated by the MP, ML, and BST analyses (**Figure 2**) strongly supported four clades with maximum support values. One clade recovered L. friderici from the Suriname basin alone. A monophyletic clade characterized by recent diversification within the Brazilian Shield, named herein L. cf. friderici sensu stricto, included specimens of L. cf. friderici from the Amazonas (main channel), Madeira, Upper Tapajós, Tocantins, Paraguay, and Paraná basins, L. agassizii, and specimens of L. piau from the São Francisco basin. A no-named clade, showing older diversification, joined specimens of L. cf. friderici from the Mearim, Tocantins, Turiaçu, Xingu, and Madeira basins, interleaved with L. boehlkei, L. lacustris, L. cf. parae, L. piau from the Jaguaribe basin, and L. venerei. Lastly, a fourth clade recovered only L. taeniatus.

The GMYC analysis, considering 23 parsimony-informative sites and no insertions or deletions within the COI sequences, identified seven MOTUs (CI: 6–7) within L. cf. friderici sensu stricto, with a significant likelihood ratio of 10.97 (P < 0.005). Of these, six MOTUs corresponded to L. agassizii, L. cf. friderici Amazon 1, L. cf. friderici Madeira 1, L. cf. friderici Paraná, L. cf. friderici Paraguay, and L. cf. friderici Upper Tapajós. The seventh MOTU joined L. piau São Francisco and L. cf. friderici Tocantins 1 (**Figure 3**). The first BAPS layer indicated four as the most likely K, with log (ml) = −554.0008 and a posterior probability of 0.93. The results of the second BAPS layer were similar to the GMYC MOTUs, but recovered L. piau São Francisco and L. cf. friderici Tocantins as two different MOTUs. The hierarchical BAPS analysis could separate these populations because information on sample locations was provided. In contrast, the GMYC method uses no prior information. Despite this minor discrepancy in MOTU detection, both of these recently diverged lineages are reciprocally monophyletic and geographically isolated in nature.
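The reported likelihood ratio can be related to its P value with a short calculation. The Python sketch below assumes that the ratio is compared against a chi-square distribution with 2 degrees of freedom; this is an assumption about the test settings, not something stated in the text. For 2 degrees of freedom, the upper-tail probability has the closed form exp(−x/2).

```python
import math


def chi2_sf_2df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    return math.exp(-statistic / 2.0)


likelihood_ratio = 10.97  # value reported for the GMYC analysis
p_value = chi2_sf_2df(likelihood_ratio)  # 2 d.f. is an assumption of this sketch
print(f"P = {p_value:.4f}")
```

Under this assumption the result is about 0.004, which is consistent with the reported P < 0.005.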

The mean COI K2P genetic distances among the MOTUs ranged from 0.4 to 2.4%. The maximum intra-MOTU distance (0.5%) was observed in L. agassizii, while the minimum inter-MOTU distance (0.4%) was between L. cf. friderici Tocantins and L. piau São Francisco (**Supplementary Table S3**).

A total of 27 haplotypes was obtained within L. cf. friderici sensu stricto, in which each MOTU was represented by a haplogroup, except for the MOTUs from Paraná and Paraguay that shared one haplotype. L. piau São Francisco and L. cf. friderici Upper Tapajós were separated by only one mutational step, while the other haplogroups were connected by at least two mutational steps (**Figure 4**).

### DISCUSSION

Our phylogenetic analyses showed that specimens morphologically identified as L. friderici constitute a polyphyletic group widely distributed across South America (**Figure 2**). The individuals collected as L. friderici across the Brazilian Shield basins are not conspecific with L. friderici from the type locality and represent a different species. L. cf. friderici sensu stricto constitutes a monophyletic species complex distributed in the Amazon, Madeira, Upper Tapajós, Tocantins, São Francisco, Paraná, and Paraguay river basins. This finding may represent a typical situation of recent diversification forming a closely related group composed of potential cryptic species, revealing typical taxonomic uncertainties (Ramirez et al., 2017a). On the other hand, in the no-named clade the five nominal species L. boehlkei, L. cf. parae, L. lacustris, L. piau Jaguaribe, and L. venerei were interleaved with individuals that morphologically fit the description of L. cf. friderici. In this clade, characterized by an older diversification, the use of the term L. cf. friderici hides undescribed cryptic species. All species of this clade share a general diagnostic morphological pattern with L. friderici (i.e., one to three spots on the body along the lateral line, and a dental formula of 4/4, except for L. venerei, which has 4/3, unique in Anostomidae), hindering the identification of these cryptic species. Moreover, three of the five nominal species of this clade (L. lacustris, L. parae, here L. cf. parae, and L. venerei) have already been considered very similar in morphology because of their deep body, terminal mouth, long and dark anal fin, and three blotches on the lateral line (Britski and Birindelli, 2008).

Overall, the results obtained for L. cf. friderici sensu stricto confirm our hypothesis that L. cf. friderici comprises different MOTUs, currently separated in distinct river basins, that can be joined in a single monophyletic group, although not all of them are provisionally recognized as L. cf. friderici.

Within the clade L. cf. friderici sensu stricto, L. agassizii is clearly recognized as a valid species different from L. friderici, mainly due to the presence of a longitudinal stripe extending from the dorsal fin to just before the caudal fin (Birindelli et al., 2013). L. agassizii was first described from the Iça river, Upper Amazon basin (Steindachner, 1876), and was later also found in the Tefe lake and the Nanay, Negro, and Branco rivers (Birindelli et al., 2013), in the same basin. This species has been described as restricted to the Upper Amazon basin. A parsimony analysis of endemism in South America reported the Upper Amazon as a separate clade from the Amazon drainages (Hubert and Renno, 2006), suggesting that uplift of the paleoarches promoted allopatric divergence in the ichthyofauna of this region, which was enhanced by marine incursions.

approximately 4900 bp). The topology corresponds to the Bayesian tree. The numbers on the branches are bootstrap values for maximum parsimony and maximum likelihood, and posterior probability for the Bayesian species tree. The scale bar indicates nucleotide substitutions per site.

In our study, all individuals collected in rivers above the Madeira river falls were joined in the single MOTU L. cf. friderici Madeira 1, while individuals from downstream in the Amazon basin were recovered in the MOTU L. cf. friderici Amazon 1 (**Figure 3**), with the exception of individuals from the Upper Tapajós (see below), indicating that these falls are a possible barrier limiting fish species distribution in the region. Previous studies had already reported evidence of structuring along the Amazon basin (Goulding, 1979; de Queiroz et al., 2013). This subdivision was attributed to geomorphological agents that allowed allopatric fragmentation and diversification events (Albert and Reis, 2011). In the Madeira river, the Teotônio fall seems to play a relevant role in the diversification of the ichthyofauna of the Amazon basin. This fall, together with other rapids, has kept the rivers of the Upper and Lower Madeira apart and has been considered a geographic barrier limiting fish species distribution in the region (Zanata and Toledo-Piza, 2004; Hubert et al., 2007; Torrente-Vilara et al., 2011).

Barrier effects can also explain the presence of the MOTU L. cf. friderici Upper Tapajós, which joins individuals caught in the Juruena – Teles Pires sub-basin, in the Upper Tapajós (**Supplementary Tables S1**, **S2**). Waterfalls and rapids along the Tapajós river and its tributaries seem to act as barriers to fish dispersal (Britski and Garavello, 2005; Britski and Lima, 2008; Dagosta and de Pinna, 2017). The region above the confluence of the Juruena and Teles Pires rivers has been characterized by an endemic ichthyofauna different from that of other Amazon rivers (Carvalho and Bertaco, 2006; Britski and Lima, 2008), which could account for the separation of L. cf. friderici Upper Tapajós.

In turn, the Tocantins basin is considered a system independent of the Amazon basin, since its waters flow directly into the Atlantic Ocean (Albert and Reis, 2011). This fact was reflected in our analyses, in which the Tocantins individuals corresponded to a different genetic group (L. cf. friderici Tocantins 1). The final establishment of the modern course of the Tocantins (1.8 Ma) definitively separated this basin from the Amazon (Rossetti and Valeriano, 2007), and the differentiation of the Tocantins ichthyofauna has often been associated with the rise of the Gurupá arch, the Tucurui rapids, or the limited connectivity (Hubert et al., 2007; Hrbek et al., 2014).

The relationship between L. piau São Francisco and L. cf. friderici Tocantins 1 observed here can be explained by a shared biogeographic history of the Tocantins and São Francisco basins, and the low genetic divergence (0.4%) between them likely represents a recent diversification. These two hydrographic basins share an extensive watershed, where the Sapão river (São Francisco basin) shares headwaters with the Galheiros river (Tocantins basin) (Lima and Caires, 2011). The existence of these common headwaters can allow faunal exchange between the basins. Geological evidence shows that the western border of the Serra Geral on the Goiás plateau has been gradually eroded, which could have promoted headwater capture events between the São Francisco and Tocantins rivers (Lima and Caires, 2011). Geodispersal events (i.e., headwater capture) from the Amazon river to eastern basins of the Brazilian Shield (such as the São Francisco river) have already been proposed in studies using molecular approaches (Hubert et al., 2007; Ramirez et al., 2017b).

While the L. piau specimens from the São Francisco were linked to the L. cf. friderici sensu stricto clade, L. piau from the Jaguaribe was grouped in the no-named clade (**Figure 2**), revealing a clear taxonomic inconsistency. Fowler (1941) designated the Salgado river (Jaguaribe basin), in Ceará state, as the type locality of L. piau, and included one paratype from the Jatobá river (São Francisco basin). Consequently, specimens from the São Francisco river have usually been cited as L. piau (Garavello and Britski, 2003; Carvalho et al., 2011). Our results indicate that specimens from the São Francisco basin indeed constitute a different species from L. piau from the Jaguaribe river, the type basin.

Still within L. cf. friderici sensu stricto, a well-supported differentiation between specimens from the Upper Paraná and Paraguay basins was also observed, although some individuals from the Upper Paraná showed haplotypes from the Paraguay basin (**Figures 3**, **4**). It is possible that both L. cf. friderici Paraná and L. cf. friderici Paraguay reached their current distribution through ancient geodispersal events, such as headwater captures between Amazon rivers and the Paraná and Paraguay basins. In the modern river basin landscape, the Paraguay basin shares a watershed with the Guaporé, Tapajós, and Xingu rivers, while the Upper Paraná shares a watershed with the headwaters of the Tocantins basin (Albert and Reis, 2011). These hydrographic systems have experienced a long history of major capture events and formation of semipermeable barriers (Lundberg et al., 1998) that can support this hypothesis. The Upper Paraná ichthyofauna was separated from that of the Lower Paraná by the Sete Quedas Falls, a natural geographic barrier that no longer exists. In the past this barrier isolated the Upper Paraná, where the ichthyofauna has been diverging, as already reported in Megaleporinus (Ramirez et al., 2017b) and Salminus (Machado et al., 2016). The haplotypes shared between L. cf. friderici Paraná and L. cf. friderici Paraguay probably result from the removal of this natural barrier when the Itaipu hydroelectric dam was built. The resulting reservoir flooded an extensive area, including the no longer existent Sete Quedas Falls, connecting the ichthyofaunas of the Lower and Upper Paraná and facilitating contact between mitochondrial lineages that had been isolated since the formation of the Sete Quedas Falls (Júlio et al., 2009; Prioli et al., 2012).

# CONCLUSION

Our study showed that L. cf. friderici, as provisionally used, conceals at least two distinct situations. First, L. cf. friderici sensu stricto, a monophyletic clade comprising eight MOTUs, potentially includes a true species complex, characterized by recent diversification across the Brazilian Shield basins. In line with our initial expectations, L. cf. friderici sensu stricto is genetically structured along the Brazilian Shield basins, and this structure appears to be related to the geomorphological agents that determined the current hydrographic structure. Its taxonomic significance remains an open question and requires complementary studies to resolve this typical case of taxonomic uncertainty. Second, a no-named clade, characterized by relatively older diversification, in our opinion conceals undescribed cryptic species under the L. cf. friderici denomination, likely because of the morphological similarities that characterize the clades studied here (except the L. taeniatus clade). These new MOTUs show deep phylogenetic divergence and are interleaved with other valid nominal species (L. venerei, L. boehlkei, L. lacustris, L. piau, and L. cf. parae), supporting them as potential new species.

Overall, our results are significant for the taxonomy and evolutionary knowledge of this fish group as well as for its conservation. Moreover, this scenario indicates that L. cf. friderici sensu stricto can constitute an excellent phylogeographic model for studying the evolutionary and speciation processes acting in South American basins. Despite its migratory behavior, L. cf. friderici cannot be considered a single genetic stock even within the same basin (i.e., Amazon basin), and it needs to be better characterized so that its whole diversity is considered in any conservation effort. For a more complete understanding, the taxonomic status of each MOTU revealed herein needs to be evaluated, preferably using morphological and molecular data in an integrative approach.

### AUTHOR CONTRIBUTIONS

RS-S and JR collected the data, reviewed the literature, and performed the bioinformatic analyses. All authors contributed to the research design, to the writing and discussion of the article, and approved the final version of the manuscript.

# FUNDING

This study was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, 304440/2009-4 and 473474/2011-5), SISBIOTA-Brazil Program (CNPq, 563299/2010-0; FAPESP, 10/52315-7), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, número to RS-S), and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, 2011/21836-4 to JR).

# ACKNOWLEDGMENTS

We are grateful to C. Cramer, C. Doria, D. Carvalho, H. Ortega, J. A. Rodriguez, J. C. Riofrio, P. Venere, W. Troy, and U. Lopes for help in obtaining part of the tissue or DNA samples and to ICMBIO/MMA for the fish sampling authorization (32215-1). We also thank H. Britski, P. Venere, J. Zuanon, N. Priorski, and H. Ortega for specimen identification and the two reviewers for their comments and suggestions, which have improved the manuscript.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00047/full#supplementary-material

FIGURE S1 | Bayesian tree for the cytochrome oxidase subunit 1 (COI) gene. Values on nodes represent the posterior probability.

FIGURE S2 | Bayesian tree for the cytochrome b (Cytb) gene. Values on nodes represent the posterior probability.

FIGURE S3 | Bayesian tree for the myosin heavy chain 6 cardiac muscle alpha (Myh6) gene. Values on nodes represent the posterior probability.

FIGURE S4 | Bayesian tree for the recombination activating gene 1 (RAG1) gene. Values on nodes represent the posterior probability.

FIGURE S5 | Bayesian tree for the recombination activating gene 2 (RAG2) gene. Values on nodes represent the posterior probability.


TABLE S1 | Vouchers and collection sites for all analyzed Leporinus cf. friderici.

TABLE S2 | GenBank accession numbers for all analyzed species.

TABLE S3 | Pairwise mean inter- and intra-MOTU (in bold) genetic distances within the Leporinus cf. friderici sensu stricto clade, using the K2P model. Values are percentages.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Silva-Santos, Ramirez, Galetti and Freitas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Variation of the Endangered Neotropical Catfish Steindachneridion scriptum (Siluriformes: Pimelodidae)

Rômulo V. Paixão1,2 \*, Josiane Ribolli1,2 and Evoy Zaniboni-Filho1,2

<sup>1</sup> Laboratório de Biologia e Cultivo de Peixes de Água Doce, Departamento de Aquicultura, Universidade Federal de Santa Catarina, Florianópolis, Brazil, <sup>2</sup> Programa de Pós-Graduação em Aquicultura, Centro de Ciências Agrárias, Universidade Federal de Santa Catarina, Florianópolis, Brazil

Steindachneridion scriptum is an important resource for fisheries and aquaculture; it is currently threatened and has a reduced occurrence in South America. The damming of rivers, overfishing, and contamination of freshwater environments are the main threats to the maintenance of this species. We assessed the genetic diversity and structure of S. scriptum using DNA barcode and control region (D-loop) sequences of 43 individuals from the Upper Uruguay River Basin (UUR) and 10 sequences from the Upper Paraná River Basin (UPR), which were obtained from GenBank. S. scriptum from the UUR and the UPR were assigned to two distinct molecular operational taxonomic units (MOTUs), with an inter-MOTU K2P distance higher than the optimum threshold (OT = 0.0079). The COI intra-MOTU distances of S. scriptum specimens from the UUR ranged from 0.0000 to 0.0100. The control region showed a high number of haplotypes and low nucleotide diversity, consistent with a young population undergoing recent expansion. Genetic structure was observed, with high differentiation between the UUR and UPR basins, identified by BAPS, haplotype network, AMOVA (FST = 0.78, p < 0.05), and Mantel test. S. scriptum from the UUR showed slight differentiation (FST = 0.068, p < 0.05), but not isolation-by-distance. Negative values of Tajima's D and Fu's Fs suggest recent demographic oscillations. The Bayesian skyline plot analysis indicated a possible population expansion beginning about 2,500 years ago and a recent reduction in population size. Low nucleotide diversity, spatial population structure, and the reduction of effective population size should be considered when planning strategies aimed at the conservation and rehabilitation of this important fisheries resource.

Keywords: control region mitochondrial DNA, conservation of natural resources, DNA barcode, endangered species, freshwater fishes

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Jorge Luis Ramirez, Federal University of São Carlos, Brazil Jorge Abdala Dergam, Universidade Federal de Viçosa, Brazil

> \*Correspondence: Rômulo V. Paixão romulo.veiga.paixao@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 17 October 2017 Accepted: 31 January 2018 Published: 19 February 2018

#### Citation:

Paixão RV, Ribolli J and Zaniboni-Filho E (2018) Genetic Variation of the Endangered Neotropical Catfish Steindachneridion scriptum (Siluriformes: Pimelodidae). Front. Genet. 9:48. doi: 10.3389/fgene.2018.00048

# INTRODUCTION

Freshwater ecosystems are among the most endangered ecosystems (Dudgeon et al., 2006). Habitat degradation, hydrologic alterations, habitat fragmentation, sediment deposition, and overfishing are the principal causes of declines and extinctions of freshwater fishes (Dudgeon et al., 2006; Agostinho et al., 2008; Helfman, 2008; Hoeinghaus et al., 2009). In addition, species with geographically restricted distributions are more susceptible to erosion of genetic diversity due to habitat fragmentation (Vrijenhoek et al., 1985). Fishery resources are an integral part of most societies and make important contributions to economic and social health and well-being in many countries and areas (FAO, 2002). Understanding the genetic diversity and structure of wild fish populations is important for the regulation of fisheries and for conservation management strategies (Carvalho and Hauser, 1994; Iervolino et al., 2010).

Steindachneridion scriptum (Miranda Ribeiro, 1918) is a large catfish belonging to the family Pimelodidae. This potamodromous species has a restricted distribution in the Upper Uruguay River (UUR) and Upper Paraná River (UPR) basins (Marques et al., 2002). S. scriptum is an important fishing resource for riverine fishermen (Schork et al., 2013); nonetheless, it was recently classified as an endangered species by the Chico Mendes Institute for Biodiversity Conservation (ICMBIO, 2014). Human activities (e.g., damming of rivers, illegal fishing, and industrial waste) are the main threats to S. scriptum in the UUR (Fontana et al., 2003) and are the principal reason for population reduction in this basin (Beux and Zaniboni-Filho, 2008).

The correct management of fish stocks depends on the precise identification of the target species, which may be morphologically very similar to other species. The latest revision of the genus Steindachneridion recognizes six valid species (Garavello, 2005). Using museum specimens, Garavello (2005) suggested the presence of S. punctatum in the UUR and UPR, with few characteristics that distinguish it from S. scriptum. Despite the description of the two species of the genus Steindachneridion, ichthyofaunal studies have never reported the presence of S. punctatum in the UUR (Schork et al., 2012, 2013). Beyond affecting ichthyofaunal surveys, uncertainties in taxonomic identification may be a problem for stock management (e.g., formation of in vivo and in vitro banks, restocking programs) and adequate fisheries control.

Taxonomic uncertainties are common in fishes (Pereira et al., 2013) and can be investigated using DNA barcode methodology, which permits the unambiguous identification of the majority of fish species (Ward, 2009). Particularly for endangered species, prior knowledge of the distribution of genetic variability within and among natural populations, as well as the implementation of an efficient management plan based on genetic features, are important measures for their maintenance and recovery (Cross, 2000). The distribution of genetic variation within and between populations can be assessed using the mitochondrial control region (Sivasundar et al., 2001; Iervolino et al., 2010; Ochoa et al., 2015). Given these considerations, we tested the null hypothesis that S. scriptum from the UUR represents a single molecular operational taxonomic unit (MOTU). Subsequently, we investigated the genetic diversity and population structure of S. scriptum from the UUR and UPR basins using the mitochondrial control region.

# MATERIALS AND METHODS

# Study Area and Sampling

The Uruguay River (Uru) originates in the Serra Geral Mountains in Southern Brazil and, together with the Paraná and Paraguay Rivers, forms the La Plata Basin (Zaniboni-Filho and Schulz, 2003). Samples of S. scriptum were collected by scientific fishing and by local fishermen between 2006 and 2015, with authorization of the Brazilian Institute of the Environment and Renewable Natural Resources (IBAMA; protocol number: 02026.005762/2004-71). A total of 19 individuals from the Uru River and 24 from the Canoas River (Can), UUR basin, were sampled (**Figure 1**). Tissues were preserved in 95% ethanol until extraction. This research was conducted under Animal Care Protocol PP00788 of the Federal University of Santa Catarina (UFSC).

# DNA Extraction and Amplification of the Mitochondrial Fragments

Total DNA was obtained from fin clips following a salt extraction method (Aljanabi and Martinez, 1997). For the DNA barcode analyses, a 652 bp fragment of Cytochrome Oxidase subunit I (COI) was amplified by polymerase chain reaction (PCR), using primers FishF1/FishR1 (Ward et al., 2005) and following Bellafronte et al. (2013). The partial amplification of the mitochondrial control region (D-loop) was performed using primers FTTP-L and DLR1-H according to Huergo et al. (2011). Amplification of the PCR products was checked by electrophoresis on 1% agarose gels, and the products were purified using 20% PEG (Lis, 1980). Sequencing reactions were performed using BigDye™ Terminator v3.1 (Applied Biosystems), and the PCR products were sequenced on both strands on an ABI 3500XL (Applied Biosystems).

# Data Analysis

DNA sequences of both genes from each individual were edited using Geneious 5.4.4 (Wu and Drummond, 2011) to generate a consensus sequence. For the DNA barcode analysis, we combined our data with the COI reference sequences of S. scriptum (accession FUPR686-09, PDCAP027-14, and PDCAP028-14) from the UPR basin and S. parahybae (accession FPSR293-10–FPSR297-10) from the Paraíba do Sul Basin, plus one specimen of Pseudoplatystoma corruscans and one specimen of Zungaro jahu to root our phylogenetic analyses. All sequences are available in the Barcode of Life Data System (BOLD). Intra- and inter-specific genetic distances based on the Kimura 2-parameter (K2P) evolution model were calculated using Mega 6 (Tamura et al., 2013).
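For readers unfamiliar with the K2P model, the following is a minimal Python sketch of how a pairwise K2P distance can be computed from two aligned sequences. It is an illustration only, not the Mega 6 implementation; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases (pairwise deletion) are assumptions made for this example. With P the proportion of transitions and Q the proportion of transversions, the distance is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases are skipped (pairwise deletion), which is an assumption here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition (A -> G) in ten aligned sites
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

Because the correction is logarithmic, the K2P distance is always at least as large as the raw proportion of differing sites, and the two are nearly identical at the small divergences typical of intra-specific COI comparisons.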

We used the General Mixed Yule Coalescent (GMYC) approach, a phylogenetic method based on single-locus data that is a relatively robust tool for species delimitation (Pons et al., 2006; Fujisawa and Barraclough, 2013). The ultrametric tree was generated in BEAST v.2.2.1 (Bouckaert et al., 2014), with the substitution model selected in jModelTest 2.1.4 (HKY+G; Darriba et al., 2012), using a relaxed molecular clock with a lognormal distribution and a birth–death model. Three independent runs were carried out with 20 million generations each. Subsequently, the runs were combined using LogCombiner v.1.8.3 (Drummond et al., 2012), with a burn-in of 25. Data mixing and effective sample size (ESS) were verified in Tracer v1.5. GMYC was carried out in Species Limits by Threshold Statistics (SPLITs; Monaghan et al., 2009) with RStudio<sup>1</sup>, using the single-threshold method to detect the transition point between intra- and inter-specific relationships.

In addition to the standard threshold adopted for Neotropical fishes (Pereira et al., 2013), we calculated an optimum threshold (OT; Collins et al., 2012) directly from the full dataset, using the local minima function in the R package SPIDER (SPecies IDentity and Evolution in R; Brown et al., 2012). The OT value was used to define the MOTUs in the software jMOTU (Jones et al., 2011). The MOTUs were represented graphically by a neighbor-joining (NJ) analysis using the K2P model with Mega 6.6 (Tamura et al., 2013). Clade support was tested by the bootstrap method with 10,000 pseudo-replicates.

<sup>1</sup>http://r-forge.r-project.org/projects/splits

The D-loop sequences were combined with 16 S. scriptum sequences downloaded from GenBank, from specimens collected between 1995 and 2002 in the Uru River (accession EU930029.1–EU930038.1) and in the Tibagi River (UPR) (accession EU930039.1–EU930044.1). The overall genetic diversity was estimated in DnaSP (Rozas et al., 2003) using the following parameters: nucleotide diversity (π) (Nei, 1987), haplotype diversity (Hd) (Nei and Tajima, 1981), and number of polymorphic sites (S). Genetic variation within and between sample sites was tested hierarchically by Analysis of Molecular Variance (AMOVA) (Excoffier et al., 1992), and pairwise population comparisons (FST) were tested with 10,000 permutations, using Arlequin 3.1 (Excoffier and Lischer, 2010). Spatial genetic structure was inferred using Bayesian Analysis of Population Structure 6.0 (BAPS) software (Corander et al., 2013). First, BAPS was run with 10 replicates for every level of k (1–6) without origin information ("clustering of individuals") and subsequently using "clustering of groups of individuals." We used Mantel tests as implemented in Alleles In Space (AIS) 1.0 (Miller, 2005) to test for a correlation between geographic stream distance and genetic distance (isolation-by-distance; IBD) expressed as FST, with 10,000 permutations to assess significance. Tajima's (1989) D, Fu's (1997) Fs, and mismatch distributions were estimated with DnaSP (Rozas et al., 2003). A median-joining haplotype network was generated in PopART (Bandelt et al., 1999). Demographic history was investigated using a Bayesian Skyline Plot (BSP) in BEAST 2.1.3 (Drummond et al., 2012) with the evolutionary model obtained in jModelTest. The plot was generated in Tracer 1.66 (Rambaut et al., 2014).
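As a rough illustration of two of the diversity statistics named above, the sketch below computes haplotype diversity (Hd) and nucleotide diversity (π) from a list of aligned sequences. It is a simplified stand-in, not the DnaSP implementation: the function names are hypothetical, gaps and missing data are not handled, and the four short sequences are invented toy data rather than values from this study. Hd is taken as n/(n − 1)·(1 − Σp<sub>i</sub>²), where p<sub>i</sub> is the frequency of haplotype i, and π as the mean number of pairwise differences per site.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)  # identical sequences count as one haplotype
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """pi = mean number of pairwise differences, divided by alignment length."""
    n = len(seqs)
    length = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


# Invented toy alignment: three haplotypes among four sequences
toy = ["AAAA", "AAAT", "AAAT", "AATT"]
print(haplotype_diversity(toy), nucleotide_diversity(toy))
```

The combination reported in this study (high Hd with low π) arises when many haplotypes are present but differ from one another at only a few sites.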

The COI sequences were deposited in BOLD Systems (accession UUR001-17–UUR052-17), and the D-loop sequences were deposited in GenBank (accession MF045370–MF045412). The voucher corresponding to S. scriptum from the UUR basin was deposited in the Zoology Museum of the Universidade Estadual de Londrina, MZUEL 15569 (Type locality: Itaqui, RS – Uru Basin).

#### RESULTS

#### Mitochondrial DNA Barcoding

A consensus alignment of 43 COI sequences was obtained from samples identified morphologically as S. scriptum, with a total length of 611 bp, 7 polymorphic sites, and 6 haplotypes. The mean nucleotide frequencies were 26.02% Cytosine, 28.97% Thymine, 27.19% Adenine, and 17.82% Guanine. No stop codons, insertions, or deletions were observed in the COI sequences, indicating that they represent fragments of functional mitochondrial genes and not nuclear mitochondrial pseudo-genes (Numts). Considering P. corruscans and Z. jahu as the outgroup, the maximum likelihood of the GMYC model (L = 628.3242) was significantly higher than the likelihood of the null model (L0 = 614.7156, p < 0.0001). The single-threshold GMYC model suggested the presence of two clusters (confidence interval 2–13) and four ML entities (GMYC 'species,' named 'MOTUs' herein) with a confidence interval of 4–17 (S. scriptum, S. parahybae, and two outgroups).
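The comparison between the GMYC and null models is a likelihood ratio test, and the arithmetic can be checked with a short Python sketch. The two log-likelihoods are the values reported above; the use of a chi-square distribution with 2 degrees of freedom is an assumption made for this illustration (the degrees of freedom applied by the software are not stated in the text), and the function name is hypothetical.

```python
import math


def likelihood_ratio_test_df2(loglik_model: float, loglik_null: float):
    """Return (LR statistic, p-value), assuming a chi-square with 2 df.

    The 2-df reference distribution is an assumption for this sketch.
    """
    lr = 2.0 * (loglik_model - loglik_null)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    p_value = math.exp(-lr / 2.0)
    return lr, p_value


# Log-likelihoods reported for the GMYC model and the null model
lr, p = likelihood_ratio_test_df2(628.3242, 614.7156)
print(f"LR = {lr:.4f}, p = {p:.2e}")
```

Under this assumption the statistic is about 27.2, which gives a p-value well below the reported bound of 0.0001.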

The optimum threshold calculated for all S. scriptum sequences used in this study was OT = 0.0079 (0.79% divergence). From the set of Steindachneridion sequences available on BOLD, three MOTUs were identified using the OT value in jMOTU: S. scriptum from the UUR, S. scriptum from the UPR, and S. parahybae. COI inter-MOTU distances between S. scriptum from the UUR and S. scriptum from the UPR were larger than the OT (mean 0.012, minimum 0.010), while intra-MOTU values for fishes from both basins were, on average, lower than the OT (UUR = 0.000 to 0.010; UPR = 0.000) (**Table 1**). The minimum inter-specific distances between the S. scriptum MOTUs (UUR and UPR) and S. parahybae were 0.100 (10%) and 0.090 (9%), respectively.
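To make the idea of threshold-based MOTU assignment concrete, the sketch below groups sequences by single-linkage clustering: any two sequences whose distance is at or below the threshold end up in the same group. This is a simplified stand-in and not jMOTU's actual algorithm; the function name, the sequence labels, and the distance values are hypothetical (the values only loosely echo the magnitudes reported above).

```python
def cluster_motus(labels, distances, threshold):
    """Single-linkage grouping of sequences at a distance threshold.

    `distances` maps (label_a, label_b) pairs to pairwise distances.
    Illustrative only; jMOTU itself may cluster differently.
    """
    parent = {label: label for label in labels}

    def find(x):
        # Follow parent links to the group representative (with path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical labels and distances, for illustration only
labels = ["UUR_1", "UUR_2", "UPR_1", "Spar_1"]
distances = {
    ("UUR_1", "UUR_2"): 0.002,
    ("UUR_1", "UPR_1"): 0.012,
    ("UUR_2", "UPR_1"): 0.010,
    ("UUR_1", "Spar_1"): 0.100,
    ("UUR_2", "Spar_1"): 0.100,
    ("UPR_1", "Spar_1"): 0.090,
}
print(cluster_motus(labels, distances, threshold=0.0079))
```

With these toy values the threshold of 0.0079 yields three groups, whereas a looser threshold such as 0.02 would merge the two S. scriptum groups into one, which shows how sensitive MOTU counts are to the threshold chosen.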

The NJ-K2P (Supplementary Figure SI 1) and Bayesian Inference (BI) topologies (**Figure 2**) resolved two principal clusters, corresponding to the species S. scriptum and S. parahybae. The S. scriptum clade appeared divided into two well-supported sub-clades formed by UUR and UPR individuals in both the NJ and BI phylogenetic trees. Based on these results, all specimens from the UUR formed a single MOTU, named here as S. scriptum.

#### Mitochondrial DNA D-Loop

The final alignment of the 59 D-loop consensus sequences of S. scriptum specimens from the UUR and UPR (43 newly sequenced, 16 downloaded from GenBank) was 865 bp long. A total of 56 variable sites were found in the region, defining a total of 36 haplotypes. Overall, 30 haplotypes identified in the analyses were unique and exclusive. Of these, 20 were from the Uru, 4 from the Can, and 6 from the UPR basin (Tibagi River). The most common haplotype was H8, which was recorded 12 times and was shared by samples from the Uru and Can rivers; haplotypes H4 and H27, both with three records, were exclusive to the Uru and Can rivers, respectively. The haplotype network (**Figure 3**) revealed a high degree of similarity between the specimens from the Uru and Can rivers, albeit with slight differences in haplotype frequencies and with haplotypes exclusive to each river. These specimens were also differentiated from the UPR samples by 12 mutations.

TABLE 1 | Intra-MOTU distances (in bold) and inter-MOTU genetic distances using the COI gene and K2P model.

Distance range in parentheses. UUR = Upper Uruguay River; UPR = Upper Paraná River.

The average nucleotide frequencies were 34.38% Adenine, 33.17% Thymine, 12.19% Guanine, and 20.26% Cytosine. Genetic variability, expressed as haplotype diversity (Hd) and nucleotide diversity (π), was higher in S. scriptum from the Uru (Hd = 0.959/π = 0.007) than in fish from the Can River (Hd = 0.837/π = 0.004) (**Table 2**). In addition, the samples from the UPR had high diversity indices in comparison with samples from the UUR (N = 6; Nh = 6; Hd = 1.0000, π = 0.00698; D = −0.88901, p = 0.24002; Fs = −1.81313, p = 0.07768).

The patterns of genetic variability found within and between populations in the AMOVA were based on two principal groupings: Uruguay vs. Paraná Basins and Uru vs. Can rivers (**Table 3**). When the populations were considered as two basin groups, the among-group component of variation in the AMOVA was 78.17% and the FST value was highly significant (FST = 0.781; p = 0.000). Genetic divergence between individuals within the Uruguay Basin (Uru vs. Can rivers) was low but significant (FST = 0.0682, p = 0.00475). The population groupings generated by BAPS without origin information for the samples revealed the existence of three clusters (K = 3, Supplementary Figure SI 2), with slight differentiation between Can and Uru individuals. On the other hand, the analysis with individuals identified by sample group indicated two clusters (K = 2, **Figure 4**), corresponding to the Uruguay and Paraná Basins. The IBD analysis showed a significant positive correlation (r<sup>2</sup> = 0.85, p < 0.001) between geographical distance and the corresponding FST for S. scriptum from the UUR and UPR (Supplementary Figure SI 3). In contrast, FST values plotted against distance revealed no pattern of isolation by distance for S. scriptum within the UUR (r = −0.052, p = 0.682).
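The isolation-by-distance result rests on a Mantel test, which correlates two distance matrices and assesses significance by permuting the rows and columns of one of them. The sketch below shows the general procedure in plain Python; it is not the Alleles In Space implementation, and the function names, the one-sided p-value, the fixed random seed, and the toy matrices are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel_test(dist_a, dist_b, permutations=9999, seed=0):
    """One-sided Mantel test between two square symmetric distance matrices."""
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))
    a = [dist_a[i][j] for i, j in pairs]
    observed = pearson(a, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the sites of the second matrix
        permuted = [dist_b[order[i]][order[j]] for i, j in pairs]
        if pearson(a, permuted) >= observed:
            hits += 1
    # Count the observed arrangement itself among the permutations
    return observed, (hits + 1) / (permutations + 1)


# Toy data: four sites on a line; "genetic" distance grows with geography
positions = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[2.0 * d for d in row] for row in geo]
r, p = mantel_test(geo, gen, permutations=999)
print(r, p)
```

Permuting whole sites, rather than individual matrix cells, preserves the dependence among distances that share a site, which is why an ordinary correlation test is not appropriate for this kind of data.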

Tajima's (1989) D neutrality test, applied to detect evidence of strong selective pressures, and Fu's (1997) Fs test, used specifically to detect population expansion, revealed significant negative values for all individuals from the UUR (D = −1.907, p < 0.05; Fs = −19.246, p < 0.01; **Table 3**). Non-significant negative values of D and Fs were estimated for specimens from the Can River and the UPR, whereas fish sampled in the Uru showed significant negative values of both indexes. The BSP analysis (Supplementary Figure SI 4), used to explore past demographic signals of S. scriptum, indicated an early demographic expansion approximately 2,500 years ago as well as a fairly recent population reduction.
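For context on how the D statistic is obtained, the following Python sketch implements the standard formulas from Tajima (1989): D compares the mean number of pairwise differences (k) with the number of segregating sites (S) scaled by a<sub>1</sub> = Σ 1/i for i = 1 to n − 1. It is an illustration rather than the DnaSP implementation; the function name is hypothetical, gaps are not handled, and the toy alignment is invented. Significance testing, which the software performs separately, is not reproduced.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned sequences of equal length."""
    n = len(seqs)
    # S: number of alignment columns with more than one state
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # k: mean number of pairwise differences
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2))
    k = total / (n * (n - 1) / 2)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Invented toy alignment, not data from this study
print(round(tajimas_d(["AAAA", "AAAT", "AAAT", "AATT"]), 2))
```

A negative D means that k is smaller than expected from S, that is, an excess of low-frequency variants, which is the pattern usually interpreted as a sign of population expansion or purifying selection.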

#### DISCUSSION

The DNA barcode confirmed the identification of all the individuals of S. scriptum from UUR as a single MOTU.

FIGURE 3 | Median-joining network of Steindachneridion scriptum, based on haplotypes of mtDNA control region. The colors indicate locality according to the legend, size of the circles illustrates the number of identical haplotypes, and small black circles, hypothetical ancestors or unsampled haplotypes. Hatch marks represent the number of mutations by which haplotypes differ.

TABLE 2 | Genetic diversity of Steindachneridion scriptum from the Uruguay and Canoas rivers (Upper Uruguay River Basin) estimated using D-loop control region.


N = number of sequenced individuals; Nh = number of haplotypes; Hd = haplotype diversity; π = nucleotide diversity; D = Tajima's D-test; Fs = Fu's Fs-test. Significant values to D-test and Fs-test: <sup>∗</sup>p < 0.05, ∗∗p < 0.01.

The different methods, jMOTU and GMYC, were congruent in delimiting S. scriptum and S. parahybae. However, within S. scriptum, jMOTU identified two distinct MOTUs, one in the UUR basin and one in the UPR basin, while GMYC identified only one, despite the strongly supported sub-clades. Inter-MOTU divergence between S. scriptum from the UUR and the UPR was higher than both the OT and the mean intra-specific divergence found for freshwater fish (0.3%) (Ward et al., 2005; Lara et al., 2010; Pereira et al., 2013). MOTUs do not necessarily represent species (Blaxter et al., 2005) but can indicate molecular entities (Casiraghi et al., 2010). The two MOTUs found within S. scriptum (UUR and UPR) could be explained by the geographic isolation between watersheds that occurred during the Miocene epoch (between 5 and 24 million years ago), when the UUR and the UPR became isolated (Albert and Reis, 2011). Based on these results, S. scriptum from these two hydrographic systems are most likely undergoing incipient allopatric speciation, since genetic structuring in fish is often influenced principally by geological, ecological, and behavioral factors (Allan and Flecker, 1993). The congeners S. scriptum and S. parahybae showed mean inter-MOTU distances 10 times greater than the OT, indicating the existence of a barcode gap (Hebert et al., 2003, 2004) that allows us to assign an unknown Steindachneridion specimen to its species using a genetic distance criterion with an insignificant error rate.

UUR = Upper Uruguay River; UPR = Upper Paraná River. <sup>∗</sup>Significant value: p-value (FST) < 0.05.

Although they belong to the same watershed, individuals from the Can River and the Uru River showed slight genetic differentiation, probably due to the topography of the region and the interaction between the species' biology and environmental characteristics (e.g., the Can River is located at a higher altitude and has a lower water temperature than the Uru River). The haplotype network indicated high genetic similarity between the specimens from the Can and Uru, although differences in haplotype frequencies and haplotypes exclusive to each river can be observed. Recent studies with potamodromous fish species reported genetic structure in hydrographic systems without apparent physical barriers, resulting from behaviors related to IBD (Hardy and Vekemans, 1999; Primmer et al., 2006; Han et al., 2010), homing (Windle and Rose, 2005; Batista and Alves-Gomes, 2006; Neville et al., 2006), and isolation-by-time (IBT) (Hendry and Day, 2005; Braga-Silva and Galetti, 2016; Ribolli et al., 2017).

Nucleotide diversity of S. scriptum from the UUR was low in comparison with the values found in Neotropical freshwater fish (π = 1.5%) (Batista and Alves-Gomes, 2006; Iervolino et al., 2010; Ashikaga et al., 2015). The high haplotype diversity and low nucleotide diversity are consistent with a young population undergoing recent expansion, similar to what is shown in the BSP analyses. This pattern may be the signature of an expansion that began long enough ago for new haplotypes to arise by mutation, but too recently for large differences between sequences to accumulate (Avise, 2000).

Individuals from the Uru River were genetically more diverse than fishes from the Can River, indicating that the main channel of the Uru River allows individuals from different areas (or tributaries) to meet, favoring the maintenance of a higher level of genetic diversity. Although S. scriptum is a relevant fishing resource, useful molecular markers such as microsatellites, extensively employed in fish genetic studies, have not yet been developed for any Steindachneridion species, and knowledge about the genetic characteristics of the genus is incipient. RAPD markers indicated low genetic diversity for S. scriptum from the UUR, as reported by Ramella et al. (2006), as well as for S. melanodermatum from the Iguaçu River Basin (Matoso et al., 2011). Low genetic diversity may indicate a recent or historic reduction of this diversity; however, some endangered populations may have historically maintained small effective population sizes (Matocq and Villablanca, 2001). Thus, the low diversity detected in S. scriptum from the UUR can be attributed to the following: (1) population history of this species: a Bayesian skyline plot analysis revealed a subtle increase in effective population size over time and demographic swings in the recent past, with a notable increase in effective population size between 1,000 and 2,500 years ago and a subsequent reduction in the effective size of females; and (2) evolutionary history of the species: according to Garavello (2005) and Swarça et al. (2005), S. scriptum notably maintains conserved morphological and cytogenetic patterns.

In general, conservation concerns and actions are more related to the perceived disappearance of a given species than to the reduction of genetic diversity (Frankham, 2005). Therefore, given the low genetic diversity associated with the current scenario of fragmentation of the UUR basin and the population reduction of S. scriptum in some stretches of the Itá and Machadinho reservoirs (Schork et al., 2012, 2013), this study highlights the necessity of mitigation measures and more intense monitoring of illegal fishing to avoid the collapse of this important fishing resource. In addition, our results were congruent in identifying great differentiation between individuals from the UPR and UUR Basins. Further studies with a larger number of samples and with morphological analyses are needed to better define the taxonomic status of the endangered S. scriptum.

#### AUTHOR CONTRIBUTIONS

RP made substantial contributions to the design of the work and acquisition, analysis, and interpretation of data for the work, and drafted it until the approval for publication of the content. JR made substantial contributions to the conception and design of the work, analysis and interpretation of the data, and drafted and revised it critically for important intellectual content. EZ-F made substantial contributions to the conception of the work, revised it critically for important intellectual content, and provided approval for publication of the content. The authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

#### FUNDING

This work was supported by the Engie Tractebel Energia, Consórcio Machadinho e Consórcio Itá. RP thanks Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); JR acknowledges the Programa Nacional de Pós-Doutorado (PNPD/CAPES); EZ-F thanks Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Grant No. 302860/2014-2).


#### ACKNOWLEDGMENTS

We are grateful to Laboratório de Biologia e Cultivo de Peixes de Água Doce (LAPAD) of Universidade Federal de Santa Catarina (UFSC), Pedro Iaczinki, and local fishermen for help with fish collections. We thank Dra. Carolina Machado, Leonardo Porto Ferreira, and Ueslei Lopes for their contributions. We also thank the two reviewers for their valuable contributions and suggestions, which improved the manuscript. Research was conducted under Animal Care Protocol PP00788.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00048/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Paixão, Ribolli and Zaniboni-Filho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Characterization of the Fish Piaractus brachypomus by Microsatellites Derived from Transcriptome Sequencing

Paulo H. Jorge<sup>1</sup>, Vito A. Mastrochirico-Filho<sup>1</sup>, Milene E. Hata<sup>1</sup>, Natália J. Mendes<sup>1</sup>, Raquel B. Ariede<sup>1</sup>, Milena Vieira de Freitas<sup>1</sup>, Manuel Vera<sup>2</sup>, Fábio Porto-Foresti<sup>3</sup> and Diogo T. Hashimoto<sup>1</sup>\*

<sup>1</sup> Aquaculture Center of Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo State University, Jaboticabal, Brazil, <sup>2</sup> Veterinary Faculty, University of Santiago de Compostela, Lugo, Spain, <sup>3</sup> School of Sciences, São Paulo State University, Bauru, Brazil

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Pedro Manoel Galetti Jr, Federal University of São Carlos, Brazil

Maria Raquel Moura Coimbra, Federal Rural University of Pernambuco, Brazil

> \*Correspondence: Diogo T. Hashimoto diogo@caunesp.unesp.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 September 2017 Accepted: 31 January 2018 Published: 22 February 2018

#### Citation:

Jorge PH, Mastrochirico-Filho VA, Hata ME, Mendes NJ, Ariede RB, Freitas MV, Vera M, Porto-Foresti F and Hashimoto DT (2018) Genetic Characterization of the Fish Piaractus brachypomus by Microsatellites Derived from Transcriptome Sequencing. Front. Genet. 9:46. doi: 10.3389/fgene.2018.00046

The pirapitinga, Piaractus brachypomus (Characiformes, Serrasalmidae), is a fish from the Amazon basin and is considered to be one of the main native species used in aquaculture production in South America. The objectives of this study were: (1) to perform liver transcriptome sequencing of pirapitinga through NGS and then validate a set of microsatellite markers for this species; and (2) to use polymorphic microsatellites for analysis of genetic variability in farmed stocks. The transcriptome sequencing was carried out through the Roche/454 technology, which resulted in 3,696 non-redundant contigs. Of this total, 2,568 contigs had similarity in the non-redundant (nr) protein database (Genbank) and 2,075 sequences were characterized in the categories of Gene Ontology (GO). After the validation process of 30 microsatellite loci, eight markers showed polymorphism. The analysis of these polymorphic markers in farmed stocks revealed that fish farms from North Brazil had a higher genetic diversity than fish farms from Southeast Brazil. AMOVA demonstrated that the highest proportion of variation was presented within the populations. However, when comparing different groups (1: Wild; 2: North fish farms; 3: Southeast fish farms), a considerable variation between the groups was observed. The FST values showed the occurrence of genetic structure among the broodstocks from different regions of Brazil. The transcriptome sequencing in pirapitinga provided important genetic resources for biological studies in this non-model species, and microsatellite data can be used as the framework for the genetic management of breeding stocks in Brazil, which might provide a basis for a genetic pre-breeding programme.

Keywords: aquaculture, genetic structure, NGS, Pirapitinga, Serrasalmidae

# INTRODUCTION

The pirapitinga (Piaractus brachypomus) is a native fish from the Amazon and Orinoco Rivers and can reach up to 20 kg of weight (Alcântara et al., 1990). This species is used for fish farming, is valued for its meat and has fast growth performance (Fresneda et al., 2004). In Brazil, pirapitinga farming represents the third largest fish production operation (about 10,000 tons) among the native fish species (MPA, 2013a). Furthermore, this species has been widely used for the production of interspecific hybrids, particularly the tambatinga (female tambaqui Colossoma macropomum × male pirapitinga P. brachypomus), and patinga (female pacu Piaractus mesopotamicus × male pirapitinga P. brachypomus; IBGE, 2016). The aquaculture production of pirapitinga in Brazil is concentrated mainly in the Midwest and North (87%), followed by the Southeast (9%), Northeast (3%) and South (1%) (MPA, 2013b). This species also has economic importance for aquaculture in other countries in South America (Colombia, Peru, and Venezuela) and in Asia (China, Myanmar, Thailand, and Vietnam; Flores Nava, 2007; Honglang, 2007; Lin et al., 2015).

However, despite this share of aquaculture production, few scientific studies have focused on understanding the biology of pirapitinga, especially its genetics. The generation of genetic resources for this species is therefore fundamental to advancing studies of breeding and genetic management, as has occurred in model species used in aquaculture, such as salmon, catfish, carp, and tilapia (Lien et al., 2011; Liu et al., 2011; Guyon et al., 2012; Ji et al., 2012).

In model fish species, such as zebrafish Danio rerio, more than 26,000 genes have been described (Howe et al., 2013). However, few genes and their metabolic pathways have been characterized for non-model species without reference genomes, as is the case for pirapitinga. In the field of genetics and molecular biology, Next-Generation Sequencing (NGS) technologies are causing a revolution, allowing the genome and transcriptome of any organism to be sequenced quickly and at low cost (Seeb et al., 2011). RNA-seq (transcriptome sequencing) is considered one of the most widely used NGS strategies for transcript analysis (Qian et al., 2014), wherein all the messenger RNA (mRNA) of a specific tissue or set of tissues is used as a source for sequencing. Moreover, RNA-seq is an effective tool for the discovery of molecular markers, particularly for prospecting gene-associated microsatellites (Teacher et al., 2012; Xu et al., 2013).

Because they reveal genetic variation among individuals (Liu and Cordes, 2004), microsatellite markers have proven to be efficient for the genetic characterization of wild populations and breeding stocks of farmed fish (Koljonen et al., 2002; Lehoczky et al., 2005), for example to prevent inbreeding (Ponzoni et al., 2008), to identify and preserve live gene banks (Machado-Schiaffino et al., 2007), to detect genetic structure (Do Prado et al., 2018), to direct matings during the formation of the base population of breeding programmes (Fernández et al., 2014), and to perform marker-assisted selection (MAS) for economic traits (Houston et al., 2010). However, these markers are not available for pirapitinga, one of the most important species for aquaculture in South America.

For the aquaculture of pirapitinga, analysis of genetic variability in farmed stocks still needs to be performed, which will allow three hypotheses to be tested: (1) farmed stocks of pirapitinga have lower genetic diversity than wild stocks; (2) farmed stocks of pirapitinga in Brazil are genetically structured; and (3) gene-linked microsatellites can be associated with economic traits of pirapitinga, such as growth and disease resistance. These analyses will support the creation of a breeding programme to increase the productivity of pirapitinga, by directed matings which lead to the formation of families, avoiding the problems of bottlenecks and inbreeding in the base population (Fernández et al., 2014), and by the identification of quantitative trait loci (QTL), which will assist the selection of superior genotypes by MAS (Houston et al., 2010).

Thus, the objective of the present study was to characterize genetic resources for the proper management of this non-model species in aquaculture, through transcriptome characterization and genetic variability analysis of stocks using microsatellite markers.

# MATERIALS AND METHODS

#### Ethics Statement

This study was carried out in strict accordance with the animal welfare guidelines of the National Council for Control of Animal Experimentation (Brazilian Ministry for Science, Technology, and Innovation). The present study was performed under authorization No. 33435-1, issued through ICMBio (Chico Mendes Institute for the Conservation of Biodiversity, Brazilian Ministry for Environment). No animal was housed or cared for in the laboratory. Fish were euthanized by benzocaine anesthetic overdose for collection of liver tissue for transcriptome sequencing. For microsatellite validation and genetic variability analysis, fin fragments were collected from each fish under benzocaine anesthesia and all efforts were made to minimize suffering.

#### Samples for Transcriptome Sequencing

To perform the transcriptome sequencing, samples of liver tissue were taken from 10 individual fish from three different Brazilian fish farms and one wild population: Aquaculture Center of São Paulo State University, CAUNESP, Jaboticabal, SP (n = 3); Projeto Peixe fish farm, Sales Oliveira, SP (n = 1); Fazenda São Paulo fish farm, Brejinho de Nazaré, TO (n = 5); and Tocantins River, Lajeado, TO (n = 1). Individuals from different origins were used in order to achieve the highest genetic variability in microsatellite discovery analysis. Liver samples were selected for transcriptome studies because the liver plays a critical role in coordinating various physiological processes, including digestion, metabolism, detoxification, endocrine function, and immune response (Martin et al., 2010).

#### Samples for Genetic Variability Analysis

Microsatellite validation was performed in 22 individual pirapitinga collected from the Tocantins River (TO) at Lajeado City, Tocantins State, Brazil. We then used the microsatellite markers to study the genetic variability in samples collected from four commercial fish farms: TO1 (n = 25) and TO2 (n = 26), from Tocantins State (North Brazil); and SP1 (n = 36) and SP2 (n = 20), from São Paulo State (Southeast Brazil). To maintain the confidentiality of these fish farms, their names and identifying information have been withheld.

#### Analysis of Genetic Purity in Pirapitinga Individuals

According to Hashimoto et al. (2014), interspecific hybrids have been detected in broodstocks of Brazilian fish farms. The pirapitinga can be crossed with tambaqui C. macropomum or pacu P. mesopotamicus, resulting in viable and fertile hybrids (Hashimoto et al., 2012, 2014). Therefore, in the present study, special care was taken to analyze only pure pirapitinga, and not interspecific hybrids. The genetic purity of all animals studied here was assessed using the mitochondrial genes Cytochrome C Oxidase subunit I (mt-co1) and Cytochrome b (mt-cyb), and the nuclear genes α-Tropomyosin (tpm1) and Recombination Activating Gene 2 (rag2), according to the protocols and methods of Hashimoto et al. (2011). Fish identified as interspecific hybrids were excluded from further analysis in this study.

#### cDNA Library Construction and Roche 454 Platform Sequencing

Total RNA was extracted from samples of ∼100 mg of liver fixed in RNAlater using the RNeasy Mini Kit (Qiagen). Each sample was quantified by spectrophotometry using a NanoDrop ND-1000 and its quality (integrity) was checked with a 2100 Bioanalyzer. An equimolar pool of the total RNA samples (from 10 individuals) was then prepared for mRNA enrichment with the µMACS mRNA Isolation Kit (Miltenyi Biotec).

A non-normalized cDNA library was prepared using the cDNA Synthesis System Kit with random primers, the GS Rapid Library Prep Kit and the GS Rapid Library MID Adaptors Kit (Roche). The High Sensitivity DNA LabChip Kit (Agilent Technologies) with the 2100 Bioanalyzer was used for quality analysis of the cDNA library. The concentration of the sample (molecules/µL) was obtained with a QuantiFluor™-ST fluorimeter (Promega). Titration of emPCR (emulsion PCR) was performed with the GS FLX Titanium SV emPCR Kit (Lib-L) (Roche), according to the emPCR Amplification Method Manual, Lib-L SV, GS FLX+ Series, to identify the optimal number of DNA molecules per bead (cpb = copies per bead). After titration, the emPCR was performed with the GS FLX Titanium LV emPCR Kit (Lib-L) (Roche), according to the emPCR Amplification Method Manual, Lib-L LV, GS FLX+ Series. The transcriptome sequencing was conducted at the HELIXXA company (Campinas, SP, Brazil) using the Roche/454 technology (GS FLX Titanium Sequencing Kit XL+), which has been used for transcriptome analysis of non-model fish species (Renaut et al., 2010).

#### Bioinformatic Analysis

Initial quality filtering of the 454 sequences in sff format was performed using the Roche Newbler programme. Sequence analysis was performed using the high-throughput sequencing module of CLC Genomics Workbench (version 7.5.1; CLC bio, Aarhus, Denmark). The raw reads were cleaned by trimming low-quality sequences with quality scores of <20. Terminal nucleotides (five nucleotides at each of the 5′ and 3′ ends), ambiguous nucleotides, adapter sequences and reads <15 base pairs (bp) were discarded. For de novo assembly, contigs <200 bp were also discarded and the default local alignment settings were used to rank potential matches (mismatch cost of 2, insertion cost of 3, deletion cost of 3). The highest scoring matches that shared ≥50% of their length with ≥80% similarity were included in the alignment. The assembled transcripts were subjected to the cd-hit-est programme with an identity threshold of 90% to remove redundancy (Li and Godzik, 2006; Duan et al., 2012). In order to remove any mitochondrial and ribosomal contamination, sequences were compared against the pacu mitochondrial genome and zebrafish ribosomal RNA RefSeqs (NCBI database) using CLC Genomics Workbench (version 8.0.3; CLC bio, Aarhus, Denmark).
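
The read-cleaning rules described above can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench procedure: the function name `clean_read`, the simple end-trimming strategy, and the choice to discard any read that still contains an ambiguous base are assumptions made only for illustration. The thresholds (quality 20, five terminal nucleotides, 15 bp minimum) are those stated in the text.

```python
from typing import List, Optional, Tuple

MIN_QUALITY = 20      # Phred threshold stated in the text
TERMINAL_TRIM = 5     # nucleotides removed at each of the 5' and 3' ends
MIN_READ_LENGTH = 15  # shorter reads are discarded


def clean_read(seq: str, quals: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Return the cleaned read, or None if the read should be discarded."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Remove the fixed number of terminal nucleotides at both ends.
    end = len(seq) - TERMINAL_TRIM
    seq, quals = seq[TERMINAL_TRIM:end], quals[TERMINAL_TRIM:end]
    # Trim low-quality bases from both ends (assumed strategy, hypothetical).
    start, stop = 0, len(seq)
    while start < stop and quals[start] < MIN_QUALITY:
        start += 1
    while stop > start and quals[stop - 1] < MIN_QUALITY:
        stop -= 1
    seq, quals = seq[start:stop], quals[start:stop]
    # Discard reads that contain ambiguous bases or are too short (assumption).
    if "N" in seq.upper() or len(seq) < MIN_READ_LENGTH:
        return None
    return seq, quals
```

A read that passes returns its trimmed sequence and qualities; a read that becomes too short after trimming returns `None`.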

Functional annotation of the unique consensus sequences was performed by homology searches against the National Center for Biotechnology Information (NCBI) non-redundant protein database (nr) (cutoff E-value of 1E-3) using BLAST2GO software (Conesa et al., 2005) to obtain the putative gene identity. All BLASTx hits were filtered for redundancy in protein accessions. The gene ontology (GO) terms were assigned to each unique gene based on the GO terms annotated to the corresponding homologs in the NCBI database (e-value cutoff 1e-6). The transcripts were further annotated in InterPro, Enzyme code (EC), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways analysis through the Bi-directional Best Hits (BBH) method.

Microsatellites were identified in the contigs using msatcommander software (Faircloth, 2008). Primers flanking the microsatellite loci were designed with Primer3plus software (Rozen and Skaletsky, 2000). The six possible reading frames of the consensus sequence of each functionally annotated contig containing microsatellite were compared against the NCBI protein database using BLASTx (e-value 1e-10) in order to find Open Reading Frame (ORF) regions. These approaches allowed us to locate microsatellites in coding sequences (CDS) or untranslated regions (5′UTR and 3′UTR) through graphical sequence viewer Tablet (Milne et al., 2013).
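
As a rough illustration of how perfect microsatellite repeats can be located in contigs, the following Python sketch scans a sequence with regular expressions. It is not msatcommander and does not reproduce its settings: the function name `find_microsatellites` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for this example.

```python
import re
from typing import List, Tuple

# Minimum number of tandem copies per motif length (assumed thresholds,
# not the msatcommander settings used in the study).
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4}


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. ACAC, AAA)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_microsatellites(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect repeat found."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # The group captures one motif; \1{m,} requires at least m further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound(motif):
                continue  # already covered by a shorter motif, or a homopolymer
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

For a hypothetical contig containing (AC)8 and (AGC)5, the function reports one dinucleotide and one trinucleotide locus with their start positions and repeat counts.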

#### Microsatellite Genotyping and Validation

DNA was extracted from fin fragments using the Wizard Genomic DNA Purification Kit (Promega), according to the manufacturer's protocol. Microsatellite validation was performed in 30 loci, selected according to the motif and functional annotation of the contigs. Amplifications were performed by polymerase chain reaction (PCR) in a total volume of 25 µl containing 100 µM of each dNTP (dATP, dTTP, dGTP, and dCTP), 1.5 mM MgCl2, 1X Taq DNA buffer (20 mM Tris-HCl, pH 8.4, and 50 mM KCl), 0.1 µM of each primer, 0.5 units of Taq Polymerase (Invitrogen) and 10–50 ng of genomic DNA. The reactions were performed in a thermocycler (ProFlex™ PCR System, Life Technologies) following initial denaturing for 10 min at 95°C; 35 cycles of 30 s at 95°C, 30 s at 55–60°C (adjusted for each primer set), 20 s at 72°C; and a final extension at 72°C for 20 min.

Microsatellites that showed polymorphism in 6% polyacrylamide gels were analyzed in a 3130xl sequencer (Life Technologies) to improve the accuracy of allele determination. The genotyping strategy adopted in this study followed the protocols described by Schuelke (2000), using the CAGtag primer (5′-CAGTCGGGCGTCATCA-3′; Shirk et al., 2013) labeled with the fluorochromes HEX or FAM. The genotyping PCR was performed with the following reagents: 100 µM of each dNTP, 1.5 mM MgCl2, 1X Taq DNA buffer, 0.1 µM of each primer (F and R), 0.01 µM of the CAGtag primer, 0.5 units of Taq Polymerase (Invitrogen), and 10–50 ng of genomic DNA. The cycling programme for amplification consisted of: nine cycles at 95°C for 30 s, 55–60°C for 30 s (adjusted for each primer set), 72°C for 20 s; then, 30 cycles at 95°C for 30 s, 50°C for 30 s, and 72°C for 20 s. During the first nine cycles, the annealing temperature of 55–60°C allows incorporation of the primers (F and R) from the microsatellite loci. Then, in the following 30 cycles, the temperature of 50°C facilitates the annealing of the fluorescent dye-labeled CAGtag primer. PCR products were analyzed by capillary electrophoresis with a 3130xl genetic analyzer, using the DS-30 matrix, with the GeneScan 500 ROX dye Size Standard (Thermo). The programme GeneMapper 3.7 (Applied Biosystems) was used to determine the allele sizes.

#### Microsatellite Diversity and Population Analysis

For statistical analysis, we first used GenAlEx 6.1 software (Peakall and Smouse, 2012) to convert the data arrays into the specific format required by each programme. The observed (Ho) and expected (He) heterozygosity, Hardy-Weinberg Equilibrium (HWE) and Analysis of Molecular Variance (AMOVA) (Excoffier et al., 1992) were calculated using the Arlequin 3.5 programme (Excoffier and Lischer, 2010). The levels of significance for the HWE test were adjusted with the Bonferroni correction (Rice, 1989). The inbreeding coefficient (FIS) was estimated using Genepop 4.0.11 (Rousset, 2008), based on Weir and Cockerham (1984) estimates. The fixation index (FST) was calculated using FSTAT 9.3.2 software (Goudet, 1995). The threshold values of Wright (1965) were adopted for FST: little genetic differentiation (0–0.05); moderate genetic differentiation (0.05–0.25); and high genetic differentiation (>0.25). The programme Cervus v.3.0.7 (Marshall et al., 1998) was applied to verify the presence of null alleles. Linkage disequilibrium (LD) was estimated using Arlequin v.3.5.2.2. The levels of significance were adjusted for multiple tests using the Bonferroni correction.
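
To make the diversity statistics concrete, the sketch below computes Ho and He for a single locus and classifies an FST value with the Wright (1965) thresholds quoted above. It is only an illustration: the function names are hypothetical, the use of the unbiased (sample-size corrected) estimator for He is an assumption about how expected heterozygosity is commonly computed, and the genotypes in the usage note are invented.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) of one diploid individual


def heterozygosity(genotypes: List[Genotype]) -> Tuple[float, float]:
    """Return (Ho, He) for one locus; He uses the unbiased estimator (assumption)."""
    n = len(genotypes)
    # Ho: proportion of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over the 2n gene copies sampled.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]
    # He: 2n / (2n - 1) * (1 - sum of squared allele frequencies).
    he = (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs))
    return ho, he


def wright_category(fst: float) -> str:
    """Classify FST with the thresholds quoted in the text."""
    if fst <= 0.05:
        return "little"
    if fst <= 0.25:
        return "moderate"
    return "high"
```

With four invented genotypes, `heterozygosity([(100, 100), (100, 104), (104, 104), (100, 104)])` gives Ho = 0.5, and `wright_category(0.379)` places the global FST reported in this study in the "high" class.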

After LD analysis, the level of admixture among population samples was inferred by estimating the optimum number of clusters (K), as suggested by Evanno et al. (2005), using the programme STRUCTURE version 2.3.4 (Pritchard et al., 2000) without prior information about population. First, we determined the distribution of ΔK, an ad hoc statistic based on the rate of change in the log probability of data between successive K values. The range of clusters (K) was predefined from 1 to 5. The analysis was performed in 25 replicated runs using 200,000 iterations after a burn-in period of 50,000 iterations. The K value most likely to explain the population structure is the modal value of ΔK. The outputs of the STRUCTURE analysis were visualized through the STRUCTURE HARVESTER programme (Earl, 2012).
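
The ΔK statistic can be sketched in a few lines of Python. The version below takes the absolute second-order difference of the mean log probability across replicate runs and divides it by the standard deviation of the replicates at that K; this follows a commonly used formulation and is an assumption, not a reproduction of the STRUCTURE HARVESTER code. The function name `delta_k` and the log-probability values in the usage note are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta K for each K that has both neighbours K-1 and K+1.

    ln_prob maps K to the ln P(D) values of its replicate runs.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(
            mean(ln_prob[k + 1]) - 2 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        # Note: stdev is zero if all replicates are identical (division error).
        result[k] = second_diff / stdev(ln_prob[k])
    return result
```

For invented values such as `{1: [-1000, -1002], 2: [-900, -902], 3: [-700, -702], 4: [-695, -697], 5: [-694, -696]}`, ΔK is defined for K = 2 to 4 and peaks at K = 3, which would be taken as the most likely number of clusters.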

Population bottlenecks were tested using BOTTLENECK (Cornuet and Luikart, 1996; Piry et al., 1999), under mutation–drift equilibrium and assuming the two-phase model (TPM) with 70% stepwise mutation model (SMM) and 30% infinite allele model (IAM). Deviations between the observed and expected frequency distributions were tested using Wilcoxon's signed rank test. BOTTLENECK was run for 10,000 iterations.

# RESULTS

#### Transcriptome Sequencing

Liver transcriptome sequencing in pirapitinga yielded a total of 192,373 reads, which were deposited in the Short Read Archive (SRA) of NCBI under the accession number SRR6303971. The raw reads had an average length of 395.5 bp, comprising a total of ∼76 Mbp. After the trimming process, the average read length was 362.1 bp, resulting in a total of ∼69 Mbp (192,077 reads; **Table 1**). As P. brachypomus is a non-model organism without a reference genome, a de novo assembly strategy was used for transcriptome analysis, which yielded 3,696 non-redundant contigs from 174,272 overlapping reads (63,460,229 bp). The size characteristics of the contigs are presented in **Table 1**. The remaining 17,805 reads (6,084,530 bp) were considered singletons and were therefore not used for subsequent analysis.
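
The read counts reported above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Figures as reported in the text.
raw_reads, raw_mean_len = 192_373, 395.5
trimmed_reads, trimmed_mean_len = 192_077, 362.1
assembled_reads, assembled_bp = 174_272, 63_460_229
singleton_reads, singleton_bp = 17_805, 6_084_530

# Total sequence, in megabase pairs, implied by read count x mean length.
raw_mbp = raw_reads * raw_mean_len / 1e6
trimmed_mbp = trimmed_reads * trimmed_mean_len / 1e6

# Assembled reads plus singletons should account for every trimmed read.
reads_accounted = assembled_reads + singleton_reads
bp_accounted = assembled_bp + singleton_bp

print(f"raw: {raw_mbp:.2f} Mbp, trimmed: {trimmed_mbp:.2f} Mbp")
print(f"reads accounted for: {reads_accounted} of {trimmed_reads}")
print(f"bases accounted for: {bp_accounted / 1e6:.2f} Mbp")
```

The assembled and singleton reads sum exactly to the 192,077 trimmed reads, and their combined length (about 69.5 Mbp) agrees with the total implied by the mean trimmed read length.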

Non-redundant sequences were annotated with the BLASTx algorithm against the NCBI databases: non-redundant protein (nr), and the protein RefSeq of zebrafish and fugu. A total of 2,568 unique protein accessions (69.4% of transcripts) had significant similarity in the nr database. For the protein RefSeq of zebrafish and fugu, we found similar numbers of annotated genes: 2,498 (67.6%) and 2,419 (65.4%), respectively. No sequence showed homology with known pirapitinga protein sequences deposited in the NCBI database, because the sequences available for this species are still mostly mitochondrial.

Of the 2,568 contigs with correspondence in the nr database, 2,075 (80.8%) were annotated in the categories of Gene Ontology (GO). A total of 1,831 assignments to Biological Process (88.2%) were found, followed by 1,757 to Molecular Function (84.6%) and 1,378 to Cellular Component (66.4%). In relation to the GO subcategories, the most abundant terms were related to: metabolic process, cellular process, and single-organism process of the Biological Process category; binding, catalytic activity, and transporter activity of the Molecular Function category; and cell, organelle, and membrane of the Cellular Component category (**Figure 1**). In the present study, genes assigned to the immune system, growth and reproduction were found, and therefore these data will serve as support for future studies on the aquaculture of pirapitinga.

TABLE 1 | Data of de novo assembly from liver transcriptome of pirapitinga Piaractus brachypomus.

Transcript characterization in the KEGG database showed that 1,122 sequences were identified in 106 metabolic pathways. Genes involved in the biosynthesis of antibiotics, purine metabolism and glycolysis/gluconeogenesis stood out (**Figure 2**).

#### Microsatellite Diversity and Population Analysis

The search for short sequence repeats (SSR) in the 3,696 contigs resulted in the discovery of 130 microsatellite markers distributed in 95 contigs. In total, 75 pairs of primers were designed adjacent to the microsatellite loci, including the following sequence repeats: 56 di-, 13 tri-, 4 tetra-, and 2 pentanucleotide. Among the dinucleotide motifs, the main repeats were the types AC (48.28%), AG (39.65%), AT (10.35%), and CG (1.72%). Among the trinucleotide motifs, we identified seven types (AGC, AGG, ATC, AAT, ACG, CCG, and AAG). The tetranucleotide (ATCT, AAAG, AATG, and AAAT) and pentanucleotide (ACTAT and ATAGT) sequences comprised four and two types of motifs, respectively. In relation to gene position, 26.76% of the microsatellite markers were found in the 3′UTR (untranslated region), 19.71% in the 5′UTR, and 29.58% in the CDS (coding sequence).

In the process of microsatellite validation, 30 markers were evaluated in 22 samples of pirapitinga collected from the wild. Of these markers, eight microsatellite loci showed polymorphism (GenBank accession numbers MG595996–MG596003), revealed by the presence of different fragment sizes (**Table 2**). The number of alleles was low, ranging from 2 (loci C25, C64, C410, and C1832) to 5 (C1376), with a mean of 2.750 ± 0.366. The expected (He) and observed (Ho) heterozygosity in the wild population averaged 0.466 ± 0.061 and 0.355 ± 0.076, respectively. Most of the loci showed positive values for FIS, except the locus C64. Three microsatellite loci (C13, C25, and C1716) showed significant deviation from the Hardy–Weinberg Equilibrium (HWE) after Bonferroni correction (adjusted p = 0.00625).
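
The adjusted significance level quoted above follows from dividing the nominal level of 0.05 by the eight loci tested (0.05 / 8 = 0.00625). A minimal Python sketch of this filter is given below; the function name is hypothetical and the p-values in the example are invented for illustration (they are NOT the values in **Table 2**).

```python
from typing import Dict, List


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the loci whose p-value is below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [locus for locus, p in p_values.items() if p < threshold]


# Hypothetical HWE p-values for the eight loci (not the published values).
example = {
    "C13": 0.001, "C25": 0.0005, "C64": 0.40, "C410": 0.03,
    "C1005": 0.02, "C1376": 0.01, "C1716": 0.002, "C1832": 0.55,
}
significant = bonferroni_significant(example)
print(significant)
```

Note that loci with p-values between 0.00625 and 0.05, such as the invented values for C410 or C1376, would be significant without the correction but not after it.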

The results of genetic variability in farmed stocks revealed that the North fish farms TO1 and TO2 had higher diversity than the wild population, as shown by the number of alleles (mean of 4.500 ± 0.423 and 3.375 ± 0.596, respectively) and the average values of He (0.589 ± 0.033 and 0.488 ± 0.044, respectively) and Ho (0.520 ± 0.060 and 0.447 ± 0.040, respectively; **Table 3**). The Southeast fish farms SP1 and SP2 showed the lowest genetic variability when compared to the other populations, with lower allele numbers (mean of 2.250 ± 0.313 and 3.125 ± 0.581, respectively) and lower average He (0.226 ± 0.077 and 0.278 ± 0.085, respectively; p < 0.05) and Ho (0.259 ± 0.103 and 0.251 ± 0.079, respectively; **Table 3**). Most of the microsatellite loci had positive values of FIS, except in SP1 and SP2. The mean values of FIS and of null alleles were positive in most populations, with the exception of SP1 (−0.071 ± 0.076 and −0.011 ± 0.080, respectively). The majority of the markers conformed to HWE after Bonferroni correction, with the exception of C25 (TO1, SP1, and SP2), C64 (SP1 and SP2) and C1376 (SP1) (**Table 3**). Linkage disequilibrium was found between the microsatellites C410 and C1005 (p < 0.00625). Although markers in linkage disequilibrium are not normally used in genetic variability studies, this information can be useful in future genetic mapping analyses.

In bottleneck analyses, evidence for recent reductions in population size (bottleneck) using TPM was not found, except for the wild population of Tocantins River (p = 0.027).

In the evaluation of the level of admixture among stocks by STRUCTURE, the model-based clustering analyses detected K = 3, allowing the identification of 3 main clusters between the populations: Group 1 (SP1 and SP2), Group 2 (TO1 and TO2), and Group 3 (wild) (**Figure 3**).

The global FST was 0.379, which showed high genetic differentiation among the populations (p < 0.05). Pairwise FST detected a higher genetic differentiation between the wild population and all farmed stocks, particularly when compared to SP1 (FST = 0.538, p < 0.05) and SP2 (FST = 0.537, p < 0.05). Additionally, high genetic structure was found between the populations from North and Southeast Brazil, as observed between TO2 with SP1 (FST = 0.463, p < 0.05) and TO1 with SP2 (FST = 0.380, p < 0.05; **Table 4**). Moreover, values of pairwise FST after stock clustering detected a higher genetic differentiation when comparing Group 1/Group 2 (FST = 0.379, p < 0.05), Group 1/Group 3 (FST = 0.549, p < 0.05), and Group 2/Group 3 (FST = 0.144, p < 0.05).

The results of AMOVA showed that a considerable proportion of the genetic variation (29.11%, FCT = 0.291, p < 0.001) occurred between groups (according to STRUCTURE clustering), while the variation among individuals within populations was only 8.21% (FIS = 0.126, p < 0.001) and that among populations within groups was 6.06% (FSC = 0.085, p < 0.001).
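
The fixation indices reported for the AMOVA can be related to the percentages of variation with a short calculation. The Python sketch below assumes a four-level hierarchy in which the three reported components plus a within-individual component sum to 100%; that remaining component (about 56.6%) is derived here and is not stated in the text. The function name is hypothetical.

```python
from typing import Dict


def amova_indices(among_groups: float, among_pops: float,
                  among_inds: float, within_inds: float) -> Dict[str, float]:
    """Fixation indices from hierarchical variance components (any common unit)."""
    total = among_groups + among_pops + among_inds + within_inds
    return {
        # Groups relative to the total.
        "FCT": among_groups / total,
        # Populations relative to the variation inside groups.
        "FSC": among_pops / (among_pops + among_inds + within_inds),
        # Individuals relative to the variation inside populations.
        "FIS": among_inds / (among_inds + within_inds),
    }


# Percentages reported in the text; the last term is derived (assumption).
within_individuals = 100.0 - (29.11 + 6.06 + 8.21)
indices = amova_indices(29.11, 6.06, 8.21, within_individuals)
print({name: round(value, 3) for name, value in indices.items()})
```

Under this assumption the recomputed indices agree with the reported FCT, FSC, and FIS to within rounding, which suggests that most of the variation lies within individuals, consistent with the statement in the abstract that the highest proportion of variation was found within populations.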

# DISCUSSION

#### Transcriptome Sequencing

Currently, genetic resources for pirapitinga P. brachypomus are limited to sequences of the mitochondrial genome (Chen et al., 2016). Thus, one of the main results of this study was the data generated through transcriptome sequencing, because little knowledge was available about the genes of this species. The efficiency of the Roche/454 sequencing system in the functional genomics analysis of pirapitinga is shown by the 3,696 transcripts that were generated in this study. According to Seeb et al. (2011), genome reduction strategies for NGS sequencing (e.g., transcriptome sequencing) are more viable when the objective is to prospect molecular markers and genetic information for use in aquaculture in a fast and low-cost way. Roche/454 sequencing technology is one of the main methods used in NGS transcriptome studies of non-model fish (Renaut et al., 2010; Shin et al., 2012; Calduch-Giner et al., 2013; Mutz et al., 2013).

The results of functional annotation showed that the sequences of pirapitinga had a high proportion of annotated genes when compared to the database of zebrafish and fugu proteins. The gene annotation allowed identification of genomic regions responsible for ontogenetic development processes, biological regulation, the immune system, and regions involved in processes of growth and reproduction. Consequently, the present data can be used as the basis of further biological studies of other areas of aquaculture or for future breeding programmes. In addition, through transcriptome sequencing, the discovery of gene-associated microsatellites can be considered to be the main result which can be applied to pirapitinga aquaculture, as already demonstrated in previous studies of fish (Renaut et al., 2010; Helyar et al., 2012; Shin et al., 2012). The use of gene-associated markers becomes even more important in the construction of genetic maps (Shin et al., 2012) because, by comparative genomics using fish genome references already sequenced, it is possible to presume the location of each studied locus.


TABLE 2 | Characterization of the genetic diversity of eight polymorphic microsatellites in the wild population of pirapitinga (Piaractus brachypomus).

The wild population analyzed corresponds to 22 individuals collected on the Tocantins River. TA, annealing temperature (°C); Na, number of alleles per locus; Ho, observed heterozygosity; He, expected heterozygosity; P (HWE), p-value from Hardy-Weinberg equilibrium; FIS, inbreeding coefficient; F(Null), Null alleles; ND, not performed.

Moreover, some examples have demonstrated that gene-linked microsatellite markers can be correlated with traits of productive interest, especially growth performance. In the fish Sparus aurata, a dinucleotide microsatellite in the 5′UTR of the growth hormone gene (GH) is linked with faster growth rate, especially the alleles 250 and 254, which can be used for breeding management and genetic selection for this trait (Almuly et al., 2005). In other fish species, such as Oreochromis niloticus and Lates calcarifer (Yue et al., 2001; Yue and Orban, 2002), microsatellites have also been reported for genes of interest (prolactin, GH and igf2) and, therefore, they can be used in marker-assisted selection (MAS) programmes. In the present study, eight polymorphic microsatellite loci were validated, some of them located in gene regions that may be useful for productive characteristics in aquaculture. For example, a microsatellite locus was found in the gene tetraspanin-3 isoform x1 (C1832), which plays a role in viral infection pathology (Martin et al., 2005; Shoshana and Shoham, 2005). Another microsatellite lies in the gene cytosolic 5′-nucleotidase 3a-like (NTC5C3) (C25), which contributes to the production of red blood cells and whose mutation can cause hemolytic anemia and influence the immune system (Aksoy et al., 2009). Thus, the microsatellites described in this study will also be important in future analyses of QTL linked to traits of disease resistance, which have received special attention in aquaculture species such as turbot (Scophthalmus maximus), rainbow trout (Oncorhynchus mykiss), salmon (Salmo salar), Nile tilapia (O. niloticus), and cod (Gadus morhua) in studies of resistance to pathogens (Pardo et al., 2008; Ødegård et al., 2010, 2011; Yáñez et al., 2014; Evenhuis et al., 2015).
Furthermore, one microsatellite locus was also detected in the gene apolipoprotein e (C410), which is associated with the central nervous system and the senescence process (Wang et al., 2014). These markers can provide useful information for studies of the biology of the pirapitinga, besides serving as a framework for other native species.

#### Population Analysis

The validation of eight microsatellites showed a low level of genetic diversity at these loci, both in wild and farmed stocks. In the wild population, the observed heterozygosity (Ho) ranged from 0.000 to 0.545, with an average of 2.750 alleles per locus. These values confirm the low genetic variability when compared with related species, such as pacu P. mesopotamicus (Ho ranging from 0.068 to 0.911 and an average of 8.5 alleles per locus) and tambaqui C. macropomum (Ho ranging from 0.430 to 0.880 and an average of 12.8 alleles per locus; Calcagnotto and DeSalle, 2009; Fazzi-Gomes et al., 2017). In contrast to neutral markers (microsatellites in noncoding regions), gene-associated microsatellites might be more susceptible to selection pressure and, therefore, show low values of gene diversity.
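As a minimal sketch of how the per-locus statistics discussed here are defined, the Python snippet below computes the number of alleles (Na), observed heterozygosity (Ho), expected heterozygosity (He) and the inbreeding coefficient (FIS = 1 - Ho/He) for a single diploid locus. The genotypes are hypothetical and serve only to illustrate the calculation; the function name and the use of Nei's unbiased He estimator are assumptions, not the settings of the software used in the study.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Return (Na, Ho, He, FIS) for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    alleles = Counter(a for g in genotypes for a in g)
    na = len(alleles)                                  # number of alleles
    ho = sum(1 for a, b in genotypes if a != b) / n    # observed heterozygosity
    total = 2 * n                                      # gene copies sampled
    sum_p2 = sum((c / total) ** 2 for c in alleles.values())
    he = (total / (total - 1)) * (1 - sum_p2)          # unbiased expected heterozygosity
    fis = 1 - ho / he if he > 0 else float("nan")      # inbreeding coefficient
    return na, ho, he, fis

# Hypothetical genotypes (allele sizes in bp), for illustration only
example = [(250, 250), (250, 254), (254, 254), (250, 250)]
na, ho, he, fis = locus_diversity(example)
```

A positive FIS, as in this toy example, indicates fewer heterozygotes than expected under Hardy-Weinberg proportions.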

Analysis of the genetic diversity in pirapitinga farmed stocks showed significant differences between fish farms in different regions of Brazil, two from the Southeast (São Paulo State: SP1 and SP2) and two from the North (Tocantins State: TO1 and TO2). In general, farmed stocks were expected to have low genetic variability as a result of genetic decline, genetic drift, selection and inbreeding (Theodorou and Couvet, 2015). However, the results of this study showed higher genetic variability in breeding stocks from North fish farms in relation to the wild stocks (p < 0.05; higher values of allelic frequency and heterozygosity), which was also observed in studies with other related species (Barroso et al., 2005; Panarari-Antunes et al., 2011). This result could be considered from three different perspectives: (1) North fish farms originated from different wild stocks, resulting in a high level of genetic variability; (2) problems of sample size bias, such as the small number of microsatellite loci and individuals analyzed; (3) evidence for recent genetic bottlenecks in the wild population. Some studies of fish have reported bottlenecks in natural populations, particularly due to habitat loss and fragmentation by human disturbance (Brauer et al., 2016). In the case of pirapitinga, the fragmentation of the Tocantins River by hydroelectric dams in the 1980s and 1990s (e.g., Tucuruí and Luiz Eduardo Magalhães dams, where wild fish were collected for this study) could be responsible for a population reduction and the subsequent loss of genetic variation detected by our microsatellite analysis. There are considerable numbers of hydropower dams in the basin, which can affect the reproduction, migratory routes, and egg and larvae drift of fish (Agostinho et al., 2008). Alteration of the migratory flow consequently leads to a decrease in or interruption of gene flow and reduces the population size, which makes the fish more susceptible to the effects of genetic drift (Hatanaka and Galetti, 2003) and results in genetic structure for some fish species (Calcagnotto and DeSalle, 2009; Do Prado et al., 2018).

TABLE 3 | Values of genetic diversity of eight microsatellite loci of Piaractus brachypomus.

Wild, population from the Tocantins River; TO1 and TO2, fish farms from Tocantins; SP1 and SP2, fish farms from São Paulo; Na, number of alleles; Ho, observed heterozygosity; He, expected heterozygosity; P (HWE), P-value from Hardy-Weinberg equilibrium; FIS, inbreeding coefficient; F(Null), null alleles; ND, not performed.

TABLE 4 | Analysis of pairwise FST based on eight microsatellite loci between populations of Piaractus brachypomus.

Wild, population from the Tocantins River; TO1 and TO2, fish farms from Tocantins; SP1 and SP2, fish farms from São Paulo. All FST values were statistically significant (p < 0.05).

STRUCTURE and pairwise FST analyses suggested a high genetic structure between the stocks analyzed herein, particularly as a result of the fixation of specific alleles at some loci, which resulted in three clusters (**Figure 3**). There are three hypothetical explanations for these genetic patterns: (1) differentiation of the wild population in relation to farmed stocks, which could be due to the selection of the fittest individuals for farming systems or a low number of founders for the establishment of the farmed broodstocks; (2) lower genetic structure in North/wild than Southeast/wild, which suggests that North fish farms had frequent broodstock renovation from the wild; (3) fish farms were genetically clustered according to their geographic distribution, i.e., the degree of genetic similarity is higher when one fish farm is closer to the other, indicating interchange of individuals between nearby fish farms, common origin of the farmed broodstocks, or fixation/selection of specific alleles for the different climatic conditions found in Brazil (North and South). However, these genetic patterns should also be evaluated using neutral markers (microsatellites in noncoding regions) and techniques with higher genome coverage (SNP, single-nucleotide polymorphism).

Through AMOVA analysis, the main genetic variation was found to be present within populations (64.8%). This genetic pattern has also been reported in studies carried out with pacu (Calcagnotto and DeSalle, 2009; Iervolino et al., 2010) and tambaqui (Aguiar et al., 2013). Moreover, highly significant genetic variation was associated with differences between groups (Wild, SP, and TO), which represented 29.11% of genetic variation, in contrast to low differences among populations within groups (6.06%).

In general, our study of genetic characterization in pirapitinga farmed stocks provides important insights which can lead to better management of this species in aquaculture. Our results are fundamental to beginning a breeding programme, since the genetic structure should be taken into consideration when composing an initial base population, where matings between farmed individuals from North and Southeast Brazil are shown to result in higher genetic variability in the families. Moreover, the data suggested levels of genetic diversity that were higher in farmed stocks than in wild fish, ruling out the occurrence of inbreeding. In general, lack of knowledge on the genetic variability of stocks can result in inbreeding and fixation of deleterious genes, reduced growth rates, disease resistance problems and reduced ability to adapt to new environments (Arkush et al., 2002; Gallardo et al., 2004; Neira et al., 2006; Hillen et al., 2017). Therefore, besides the identification of QTL to assist in the selection of superior genotypes by MAS, studies of microsatellites are important for genetic monitoring, supporting pirapitinga aquaculture and increasing its productivity.

# FINAL CONSIDERATIONS

The prospection of genetic data for pirapitinga is one of the priority issues for aquaculture, since this species is of high economic importance in national and global fish farming. The identification of gene-associated microsatellites by NGS is fundamental to understanding the genetic structure of wild and farmed populations, providing support for further management programmes and genetic pre-breeding programmes. Moreover, the microsatellites described herein are interesting targets used to find QTL markers, specifically related to the immune system of pirapitinga.

# AUTHOR CONTRIBUTIONS

PJ: Acquisition, analysis and interpretation of data, draft of the work, final approval of the version; VM-F, RA, and MdF: Draft of the work, development of intellectual content, final approval of the version; MH: Analysis and interpretation of data, draft of the work, final approval of the version; NM: Analysis and interpretation of data, draft of the work, final approval of the version; MV: Analysis and interpretation of data, draft of the work, final approval of the version; FP-F: Acquisition, Interpretation of data, development of intellectual content, final approval of the version; DH: Draft of the work, development of intellectual content, writing of the manuscript, final approval of the version.


#### ACKNOWLEDGMENTS

This work was supported by grants from FAPESP (2014/03772-7 and 2014/05732-2), CNPq (446779/2014-8 and 305916/2015-7), and PROPE/UNESP.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Jorge, Mastrochirico-Filho, Hata, Mendes, Ariede, Freitas, Vera, Porto-Foresti, and Hashimoto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic and Morphological Analyses Demonstrate That Schizolecis guntheri (Siluriformes: Loricariidae) Is Likely to Be a Species Complex

Camila S. Souza<sup>1</sup>, Guilherme J. Costa-Silva<sup>1,2</sup>, Fábio F. Roxo<sup>1</sup>\*, Fausto Foresti<sup>1</sup> and Claudio Oliveira<sup>1</sup>

<sup>1</sup> Departamento de Morfologia, Universidade Estadual Paulista "Júlio de Mesquita Filho", Instituto de Biociências de Botucatu, Botucatu, Brazil, <sup>2</sup> Departamento de Biologia, Universidade Santo Amaro, São Paulo, Brazil

Schizolecis is a monotypic genus of Siluriformes widely distributed throughout isolated coastal drainages of southeastern Brazil. Previous studies have shown that fish groups found in isolated river basins tend to differentiate over time in the absence of gene flow, resulting in allopatric speciation. In this study, we used partial sequences of the mitochondrial gene COI, analyzed with the General Mixed Yule Coalescent model (GMYC) and the Automatic Barcode Gap Discovery (ABGD) for single-locus species delimitation, and a Principal Component Analysis (PCA) of external morphology to test the hypothesis that Schizolecis guntheri is a complex of species. We analyzed 94 samples of S. guntheri for GMYC and ABGD, and 82 samples for PCA, from 22 coastal rivers draining to the Atlantic in southeastern Brazil from the Paraná State to the north of the Rio de Janeiro State. As a result, the GMYC model and the ABGD delimited five operational taxonomic units (OTUs, the term used in the present study for the possible new species delimited by the genetic analysis), a much higher number than that of the traditional alpha taxonomy, which recognizes only S. guntheri across the isolated coastal rivers of Brazil. Furthermore, the PCA suggests that S. guntheri is highly variable in aspects of external body proportions, including dorsal-fin spine length, pectoral-fin spine length, pelvic-fin spine length, lower caudal-fin spine length, caudal peduncle depth, anal width and mandibular ramus length. However, no exclusive character was found among the isolated populations that could be used to describe a new species of Schizolecis. Therefore, we can conclude, based on our results of PCA contrasting with the results of GMYC and ABGD, that S. guntheri represents a complex of species.

Keywords: coastal drainages, catfish, molecular identification, COI gene, GMYC model

#### Edited by:

Samuel A. Cushman, United States Forest Service (USDA), United States

#### Reviewed by:

Gonzalo Gajardo, University of Los Lagos, Chile; Chandra Shekhar Prabhakar, Bihar Agricultural University, India

> \*Correspondence: Fábio F. Roxo roxoff@hotmail.com.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 25 August 2017 Accepted: 15 February 2018 Published: 02 March 2018

#### Citation:

Souza CS, Costa-Silva GJ, Roxo FF, Foresti F and Oliveira C (2018) Genetic and Morphological Analyses Demonstrate That Schizolecis guntheri (Siluriformes: Loricariidae) Is Likely to Be a Species Complex. Front. Genet. 9:69. doi: 10.3389/fgene.2018.00069

# INTRODUCTION

The distribution pattern of single fish species throughout independent hydrographic systems (i.e., currently unconnected rivers) is unusual among fishes of the Atlantic rainforest rivers (Menezes et al., 2007) as well as among members of Otothyrinae (Reis et al., 2003). Recently, several genetic studies focusing on freshwater fishes, such as Rineloricaria (Costa-Silva et al., 2015), Curimatopsis (Melo et al., 2016), Piabina (Pereira et al., 2011), and Astyanax (Ornelas-Garcia et al., 2008), have shown that species of these groups may present large discontinuities in their distribution patterns with high genetic divergences, but with low morphological variability among geographically isolated populations. These results suggest that these groups may represent species complexes, i.e., they consist of two or more morphologically variable species that are erroneously classified (and hidden) under one species name (Brown et al., 1995). Usually, studies focused on morphology alone are inadequate to recognize species complexes. Integrative studies using molecular markers (e.g., DNA sequencing and allozymes), in addition to morphological comparison, are more powerful for the recognition of possible new species in such complex groups (Sytsma and Schaal, 1985).

The fast development of DNA sequencing and advances in molecular techniques in the last few years have been effective in recognizing species in several organism groups: birds (e.g., Tavares et al., 2011; Saitoh et al., 2015), fishes (e.g., Ward et al., 2005; Pereira et al., 2011, 2013; Roxo et al., 2012, 2015; Shimabukuro-Dias et al., 2016), insects (e.g., Hebert et al., 2004; Versteirt et al., 2015; Batovska et al., 2016), mammals (e.g., Borisenko et al., 2008; Li et al., 2015), plants (Kress et al., 2005; Lahaye et al., 2008; Braukmann et al., 2017), fungi (Begerow et al., 2010; Kelly et al., 2011), archaea (Bates et al., 2011), and bacteria (Sogin et al., 2006). The use of DNA sequences combined with analytical methods such as General Mixed Yule Coalescent (GMYC; Pons et al., 2006) and Automatic Barcode Gap Discovery (ABGD; Puillandre et al., 2012) supports species delineation with single-locus data. The GMYC method is a likelihood method that seeks to determine the threshold between speciation and coalescent events on an ultrametric gene tree, whereas the ABGD method uses the gap between the genetic distances among organisms belonging to the same species and those among organisms from different species as the limit for species delimitation.

Schizolecis was described by Britski and Garavello (1984), with Microlepidogaster guntheri Miranda Ribeiro (1918) as its type species. Currently, Schizolecis guntheri is the only species of Schizolecis. This species is a descendant of a very ancient lineage that arose during the Middle Eocene, approximately 42 Mya (Roxo et al., 2014), and inhabits small to medium-sized streams with rocky and sandy bottoms, mostly in shallows and backwaters up to 30 cm deep, with slow water flow (Burgess, 1989). Britski and Garavello (1984) detected morphological differences among populations only in the orbits of the eyes, body depth and head depth, without enough evidence to support the hypothesis that some of the analyzed populations could represent a new species. Therefore, although Schizolecis guntheri is widely distributed across adjacent, unconnected Atlantic coastal rivers from the north of Santa Catarina to the north of Rio de Janeiro States (Menezes et al., 2007) and presents small morphological variations among isolated populations, the question of whether S. guntheri represents a complex of species remains open.

In the present study, we used genetic data of 94 samples of Schizolecis guntheri from 22 coastal rivers draining directly to the Atlantic in southeastern Brazil and employed analytical methods to support species delineation with single-locus data (GMYC and ABGD), and analyzed the body shape variation among isolated populations using a PCA to test whether this species represents a complex of species.

#### MATERIALS AND METHODS

# Morphological Analysis

Principal Component Analysis (PCA) (Jolliffe, 2002) was used to examine the variation in external morphology among 82 samples of Schizolecis guntheri from the regions of the genetic groups (Supplementary Table S3) using the program Past v1.28 (Hammer et al., 2004). Landmark distances followed those originally proposed by Carvalho and Reis (2009) and were measured for adult specimens (>26.2 mm SL). Prior to the PCA, we followed the method of Dryden and Mardia (1998) to minimize the influence of body size on the morphometric data. We normalized the first two coordinate dimensions, divided all coordinate values by the centroid size for each specimen, and conducted a Procrustes superimposition of the left half to a mirrored version of the right half. After that, we also log-transformed the data (base 10). The PCA loadings are presented in **Table 1**.

TABLE 1 | Variable loadings in the first and second axes of the size-free principal component analysis (Axis 1 and Axis 2) of samples of Schizolecis guntheri.

Bold values represent the characters with the highest variation.
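The final steps of this procedure (base-10 log transformation followed by PCA) can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the measurement matrix is randomly generated and hypothetical, the function name is invented for the example, and the centroid-size scaling and Procrustes superimposition performed in Past are not reproduced here.

```python
import numpy as np

def pca_log10(measurements):
    """PCA on log10-transformed measurements (rows = specimens, columns = variables)."""
    x = np.log10(np.asarray(measurements, dtype=float))
    x_centered = x - x.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(x_centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)   # proportion of variance per axis
    scores = u * s                    # specimen coordinates on each axis
    loadings = vt.T                   # variable loadings, one column per axis
    return scores, loadings, explained

rng = np.random.default_rng(0)
# Hypothetical data: 82 specimens x 5 positive measurements (mm)
data = rng.uniform(5.0, 30.0, size=(82, 5))
scores, loadings, explained = pca_log10(data)
```

The first two entries of `explained` correspond to the proportions of variance that a study would report for PC1 and PC2, and the first two columns of `loadings` correspond to the kind of values listed in a loadings table.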

#### Taxon Sampling for Genetic Analysis

We analyzed 94 Schizolecis guntheri specimens from 22 coastal rivers draining directly to the Atlantic, from the south of Paraná to the north of Rio de Janeiro States (**Figure 1**), comprising almost the entire distribution of this species. Information about the samples used in the present study is available in BOLD Systems, with the accession numbers for individuals listed in Supplementary Table S1. The vouchers and tissues are deposited in the fish collection of the LBP (Laboratório de Biologia e Genética de Peixes), Universidade Estadual Paulista, Botucatu, São Paulo, Brazil. A sequence from one additional species of Loricariidae (Hypostomus ancistroides) was used as an outgroup to root our tree.

The abbreviation OTU (operational taxonomic unit) was used in the text to refer to the possible new species identified by the molecular analyses with the GMYC model and the ABGD method (sensu Blaxter et al., 2005). These authors defined the term OTU to refer to genetic clusters of unknown organisms grouped by DNA sequences.

#### Ethics Statement

All fishes used in this study were collected in accordance with Brazilian laws, under a scientific collection license in the name of Dr. Claudio Oliveira (SISBIO). Furthermore, our laboratory has special federal permission to keep animals and tissues from a public collection under our care. To work with the animals, we followed all the ethical prescriptions stated by our internal committee of ethics (protocol number 388), called the "Comissão de Ética na Experimentação Animal" (CEEA), involving animal experiments that were approved for this study. After collection, animals were anesthetized with benzocaine, and a piece of muscle tissue was extracted from the right side of the body and preserved in 95% ethanol. Specimens were fixed in 10% formalin for 2 weeks, and then transferred to 70% ethanol for permanent storage.

FIGURE 1 | Map of the coastal rivers of southeastern Brazil following the paleodrainage inference of Thomaz et al. (2015), with the addition of the rivers of Guanabara Bay, the Paraiba do Sul basin and adjacent rivers. Numbers on the map represent collection sites: 1 – Itaboraí-RJ; 2 – Bom Jardim-RJ; 3 – Morretes-PR; 4 – Ubatuba-SP; 5 – Caraguatatuba-SP; 6 – São Sebastião-SP; 7 – Bertioga-SP; 8 – Cajati-SP; 9 – Parati-RJ; and 10 – Angra dos Reis-RJ. Each number can represent more than one collection site.

# DNA Extraction, Amplification, and Sequencing

We extracted total genomic DNA using the protocol described by Ivanova et al. (2006). Partial sequences of the cytochrome c oxidase subunit I (COI) gene (approximately 655 bp) were amplified using the primers Fish F1 and Fish R1 (Ward et al., 2005). Amplifications were performed in a total reaction mixture volume of 12.5 µl. Each reaction included 1.25 µl of 10X buffer, 0.25 µl of MgCl<sub>2</sub> (50 mM), 0.2 µl of dNTPs (2 mM), 0.5 µl of each primer (5 mM), 0.1 µl of Pht Taq DNA polymerase (Phoneutria Biotecnologia e Serviços Ltda, Belo Horizonte, Brazil), 1 µl of genomic DNA (200 ng) and 8.7 µl of ddH<sub>2</sub>O. The conditions for each PCR reaction consisted of an initial denaturation (5 min at 94°C), followed by 30 cycles of chain denaturation (40 s at 94°C), primer hybridization (30 s at 50–54°C) and nucleotide extension (1 min at 68°C, considering the optimum temperature of the Pht Taq DNA polymerase), and a final extension (8 min at 72°C, to stabilize the reaction). The amplified products were checked on 1% agarose gels and then purified using ExoSap-IT (USB Corporation, Cleveland, OH, United States) following the manufacturer's instructions. We performed the sequencing reactions using the BigDye™ Terminator v3.1 Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Austin, TX, United States), and the products were purified again by ethanol precipitation. DNA sequencing was conducted in an ABI 3130 DNA Analyzer automatic sequencer (Applied Biosystems, Foster City, CA, United States).
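As a quick arithmetic check of the reaction mixture described above, the following Python sketch sums the listed reagent volumes, taking the water volume as 8.7 µl, and confirms that they add up to the stated 12.5 µl total. The `master_mix` helper and its 10% pipetting overage are hypothetical additions for illustration and are not part of the published protocol.

```python
# Per-reaction reagent volumes in microlitres, as listed in the protocol above
reagents_ul = {
    "10X buffer": 1.25,
    "MgCl2 (50 mM)": 0.25,
    "dNTPs (2 mM)": 0.2,
    "primer Fish F1": 0.5,
    "primer Fish R1": 0.5,
    "Pht Taq DNA polymerase": 0.1,
    "genomic DNA": 1.0,
    "ddH2O": 8.7,
}

def master_mix(reagents, n_reactions, overage=0.10):
    """Scale per-reaction volumes to a master mix (template DNA is added separately).

    The 10% overage is an assumed safety margin, not a value from the study.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in reagents.items() if name != "genomic DNA"}

total_ul = round(sum(reagents_ul.values()), 2)   # expected to equal 12.5
mix_94 = master_mix(reagents_ul, 94)             # e.g., one mix for 94 samples
```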

#### Genetic Analysis

The consensus sequences were obtained using the program Geneious 7.1.4<sup>1</sup> (Kearse et al., 2012) and the alignment was generated with the algorithm Muscle (Edgar, 2004) under default parameters. To evaluate the occurrence of substitution saturation in our molecular data, we estimated whether the Iss (index of substitution saturation) is significantly lower than Iss.cAsym (assuming asymmetrical topology) using the method described by Xia et al. (2003) with the software DAMBE 5.3.38 (Xia, 2013). After the identification of the OTUs by the GMYC model and ABGD analysis, we calculated the genetic variation within and among the OTUs delimited by each method using the Kimura-2-parameter (K2P) model in the MEGA v.6.06 software (Tamura et al., 2013).
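For readers unfamiliar with the Kimura-2-parameter model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below implements this formula directly; the function name and the pairwise-deletion treatment of gaps and ambiguous bases are assumptions made for the example and do not necessarily match the MEGA settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with gaps or ambiguous bases are skipped (pairwise deletion).
    """
    valid = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        valid += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / valid
    q = transversions / valid
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy 10-bp sequences (hypothetical): one C->T transition at the last site
d_example = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
```

Multiplying such a distance by 100 gives the percentage divergence in which barcode comparisons are usually expressed.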

# GMYC Model

The GMYC method requires an ultrametric tree as input for the analysis. To estimate the ultrametric tree, we used BEAST v1.8.2 (Drummond et al., 2012), employing an uncorrelated lognormal relaxed clock, a birth-death speciation process, and the General Time Reversible (GTR) model (Lanave et al., 1984; Tavaré, 1986). The Bayesian topology reconstruction started with a UPGMA tree, and the Markov Chain Monte Carlo (MCMC) method was run for 100 million generations, with a tree sampled every 20,000 generations. We used the software Tracer v1.6 (Rambaut et al., 2014) to check the convergence of the values. All sampled topologies beneath the asymptote (20,000,000 generations) were discarded as part of a burn-in procedure, and the remaining trees were used to construct a 90% majority-rule consensus tree using TreeAnnotator v1.8.2 (Drummond et al., 2012). The GMYC analysis was performed with the package Species Limits by Threshold Statistics ("splits") (Fujisawa and Barraclough, 2013) using R v3.0.0 (R Development Core Team, 2014) and included only the ingroup (the outgroup Hypostomus ancistroides was excluded).

<sup>1</sup>http://www.geneious.com

# ABGD Analysis

Automatic Barcode Gap Discovery analysis (Puillandre et al., 2012) was run using the "graphic" web version available at http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html, under the default parameters of Pmin = 0.001 to Pmax = 0.1, steps = 10, X (relative gap width) = 1.5, Nb bins (for distance distribution) = 20, and the Kimura (K80) molecular model. For this analysis, the outgroup (H. ancistroides) was excluded from the input data.

# RESULTS

# Morphological Analysis

The first (PC1) and second (PC2) principal component axes of our analysis explained 28.1% and 12.4%, respectively, of the variation in body shape for all analyzed Schizolecis guntheri specimens. The variation is distributed partly within populations and partly between populations, and it apparently represents a continuous distribution of external morphology, as we can observe in the PCA scatter plot (**Figure 1**). Our results also showed that the measurements with the greatest variation were, respectively: dorsal-fin spine length, pectoral-fin spine length, pelvic-fin spine length, lower caudal-fin spine length, caudal peduncle depth, anal width and mandibular ramus length, as we can observe in the PCA loading values (**Table 1**).

# Statistics of the DNA Matrix

We obtained a matrix with 533 characters, 230 of which were variable. The nucleotide frequencies were A (25.4%), G (16.7%), T (29.3%), and C (28.7%). No insertions, deletions, stop codons or contamination were detected in the sequences. The data were not saturated, considering that the Iss.cAsym values are higher than the Iss values for the different numbers of OTUs (NumOTU) analyzed (4 OTUs: Iss = 0.187, Iss.cAsym = 0.763, P = 0; 8 OTUs: Iss = 0.177, Iss.cAsym = 0.643, P = 0; 16 OTUs: Iss = 0.179, Iss.cAsym = 0.516, P = 0; and 32 OTUs: Iss = 0.180, Iss.cAsym = 0.383, P = 0) in the total matrix (all molecular characters including gaps).

# GMYC Model and ABGD Analysis

The phylogenetic reconstruction resulted in a tree with high values of posterior probability among the clades (>95%), highlighting the existence of five lineages within Schizolecis guntheri (**Figure 2**). The analysis of species delimitation using the GMYC model, on a phylogenetic tree estimated using a Birth–Death model prior of branching rates, showed that the threshold time was −0.0055687, indicating the time before which all nodes reflect diversification events and after which all nodes in the tree reflect coalescent events. The likelihood of the null model was 958.2078 and the maximum likelihood of the GMYC model was 962.0088. The standard log-likelihood ratio test is used to assess whether the alternative model (GMYC model) provides a better fit than the null model in the branching process (see Goldman, 1993 and Pons et al., 2006 for a more detailed explanation of how the GMYC model assesses species delimitation).

Values below 0.95 are not shown. The vertical red line represents the threshold time (–0.0055687), the limit between species and population according to the GMYC model in the tree. The colors of the branches to the right of the red line correspond to the colors of the paleodrainages in Figure 1, and the numbers after them correspond to the collection sites, also shown in Figure 1. LBPV is the code for samples of the fish collection of the Laboratório de Biologia e Genética de Peixes on the BOLD Systems site.
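The comparison between the null and GMYC models can be illustrated with a short likelihood-ratio calculation in Python, using the two log-likelihood values reported above. This is a sketch only: the choice of 2 degrees of freedom is an assumption (the value applied by the "splits" package may differ between versions), and the function name is hypothetical.

```python
import math

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (LR statistic, p-value) for two nested models.

    Uses the closed-form chi-square survival function, valid for even df only.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    lr = 2.0 * (lnl_alt - lnl_null)
    half = lr / 2.0
    # P(X > lr) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
    p = math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))
    return lr, p

# Log-likelihoods reported in the text; df = 2 is an assumption
lr, p_value = likelihood_ratio_test(958.2078, 962.0088, df=2)
```

The statistic is simply twice the difference between the two log-likelihoods; whether it is judged significant depends on the degrees of freedom assumed for the test.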

The analysis of ABGD partitioned the data from 19 groups (Pmin = 0.0010003) to 3 groups (Pmax = 0.021544). The value of P = 0.004642 delimited five groups, the same number of groups delimited by the GMYC model (see Supplementary Table S1). Therefore, the OTUs for Schizolecis guntheri were divided following the genetic delimitation of five clusters (OTUs) according to the results of both genetic methods, and named according to their localities: OTU I – Itaboraí-RJ and Bom Jardim-RJ; OTU II – Morretes-PR; OTU III – Ubatuba-SP, Caraguatatuba-SP, São Sebastião-SP and Bertioga-SP; OTU IV – Cajati-SP; and OTU V – Angra dos Reis and Parati (**Figures 1**, **2** and Supplementary Table S1). The values of genetic distance among the five OTUs ranged from 1.1% (OTU III and OTU IV) to 8.7% (OTU II and OTU V) (**Table 2**).
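The prior values quoted above are consistent with a logarithmically spaced grid between Pmin and Pmax. The Python sketch below generates such a grid for the stated settings (Pmin = 0.001, Pmax = 0.1, steps = 10); the assumption that the ABGD web tool spaces its priors geometrically is inferred from the reported values rather than taken from the tool's documentation, and the function name is hypothetical.

```python
def abgd_prior_grid(p_min=0.001, p_max=0.1, steps=10):
    """Log-spaced prior intraspecific divergence values from p_min to p_max."""
    ratio = p_max / p_min
    return [p_min * ratio ** (k / (steps - 1)) for k in range(steps)]

grid = abgd_prior_grid()
# grid[3] and grid[6] fall close to the reported P = 0.004642 and P = 0.021544
```

Under this assumption, the five-group partition corresponds to the fourth prior value on the grid and the three-group partition to the seventh.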

### DISCUSSION

The results of the Bayesian and GMYC analyses (**Figure 2**) of the samples from 22 coastal river localities draining directly to the Atlantic in southeastern Brazil (**Figure 1**) highlighted the existence of five monophyletic and highly supported (>95%) lineages (OTUs sensu Blaxter et al., 2005) within Schizolecis guntheri (**Figure 2**), the same result found by the ABGD method (Supplementary Table S2), which also divided the data into five clusters (or five OTUs). Several authors (e.g., Hänflig and Brandl, 1998; Paggi et al., 1998; Baric and Sturmbauer, 1999; Ferguson, 2002) have summarized the arguments for using genetic divergence to identify separate species. According to these authors, sufficient genetic distance indicates reproductive isolation between probable/possible species (sensu Mayr, 1963): isolated lineages gradually accumulate genetic differences and, after long periods of time, accumulate sufficient genetic divergence for a speciation event to be detected.

TABLE 2 | Genetic divergences based on the Kimura-2-parameter (K2P) nucleotide model for OTUs delimitated by GMYC analysis.


In the main diagonal are the values of intragroup genetic divergences highlighted in bold. Below the main diagonal are the values of intergroup (OTUs) genetic divergences. The values are shown in percentages.

However, despite the agreement between the results of the GMYC and ABGD methods in the present study, if we apply the 2% threshold of interspecific genetic divergence as the limit between populations and species (as proposed by Smith et al., 2005), OTU III and OTU IV (1.1%) and OTU IV and OTU V (1.9%) (**Table 2**) should be considered members of the same cluster. The different numbers of species delimited by different molecular methods are also a problematic issue in species delimitation using single-locus genes (Pons et al., 2006). However, the two analytical molecular methods used in the present study (i.e., GMYC and ABGD) resulted in the same number of OTUs (i.e., five OTUs), and the application of these methods has been encouraging, highlighting hidden genetic diversity in several Neotropical fish species (e.g., Costa-Silva et al., 2015; Roxo et al., 2015; Melo et al., 2016). Furthermore, molecular studies in several Neotropical fish groups (e.g., Pereira et al., 2011; Costa-Silva et al., 2015; Melo et al., 2016) have shown that widely distributed nominal species isolated in independent hydrographic systems can sometimes represent species complexes (i.e., species with continuous morphological variation but discontinuous genetic variation), and the combined use of DNA barcoding and morphological data has provided support to recognize and describe possible new species (Melo et al., 2011; Amaral et al., 2013; Silva et al., 2013).
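The threshold argument above can be made concrete with a small Python sketch that flags the OTU pairs falling below a 2% divergence cutoff. Only the three inter-OTU values quoted in the text are included; the remaining pairwise values of Table 2 are not reproduced here, and the function name is hypothetical.

```python
# Inter-OTU K2P divergences (%) quoted in the text; other pairs are omitted
reported_divergence = {
    ("OTU III", "OTU IV"): 1.1,
    ("OTU IV", "OTU V"): 1.9,
    ("OTU II", "OTU V"): 8.7,
}

def pairs_below_threshold(divergences, threshold=2.0):
    """Return the pairs whose divergence (in %) falls below the threshold."""
    return sorted(pair for pair, d in divergences.items() if d < threshold)

merged_candidates = pairs_below_threshold(reported_divergence)
```

With the quoted values, the two pairs involving OTU IV fall below the cutoff, which is why a fixed 2% rule would merge clusters that GMYC and ABGD keep apart.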

The morphological analysis of Schizolecis guntheri revealed high external morphological variation across its distribution, including color pattern (**Figures 1**, **3**), but especially in morphometric characteristics, such as dorsal-fin spine length, pectoral-fin spine length, pelvic-fin spine length, lower caudal-fin spine length, caudal peduncle depth, anal width and mandibular ramus length, as we can observe in the PCA loading values (**Table 1**). However, this diversity is distributed partly within populations and partly among populations, and represents a continuous variation in external morphology, as we can observe in the PCA scatter plot (**Figure 1**). This continuous morphological variation is typically found in a species complex (Brown et al., 1995). Therefore, we did not find any exclusive character to support a possible new species for any of the OTUs distributed across the isolated coastal drainages of the present study. A similar result was previously found by Britski and Garavello (1984). These authors found differences in the morphometric characters of eye orbits, body depth and head depth, but no exclusive character that could support a new species of the genus Schizolecis among the analyzed samples.

FIGURE 3 | Pictures showing the variation in external morphology among Schizolecis guntheri populations. Colored circles represent the paleodrainages proposed by Thomaz et al. (2015) shown in Figure 1.

De Queiroz (2007) argued that the confusion surrounding species delimitation arises because different methods, relying on different properties (molecular or morphological), focus on different stages of the speciation process. Our results yielded different numbers of OTUs of S. guntheri: the GMYC model and ABGD (Genotypic Cluster) recognized five monophyletic groups, whereas the morphological analysis (Phenetic Cluster) recognized only S. guntheri, with a continuous variation in external morphology, as shown by the PCA (a typical result for a species complex). With the passing of time, two independent lineages increasingly acquire different properties relative to each other; i.e., they become phenetically distinct, reach reciprocal monophyly, and become ecologically distinct or reproductively incompatible. Before the first property is acquired, everybody will recognize only one species, and after several properties have been acquired, everyone will recognize two species. In between, however, there will be confusion. De Queiroz (2007) called this interval, where different species criteria come into conflict and the boundaries among species are unclear, the "gray zone." On either side of the gray zone, there will be consistent agreement about the number of species, but within the gray zone there is conflict. Therefore, the conflict in the number of recognized species between the genetic methods (GMYC and ABGD) and the morphological analysis in S. guntheri could be associated with the fact that this species is in the gray zone suggested by De Queiroz (2007). Furthermore, a species complex could be interpreted as species that are in the gray zone (e.g., molecularly distinguishable but morphologically indistinguishable).

Therefore, considering the lack of phenotypic discontinuities and the presence of relatively high levels of genetic divergence among some local populations of Schizolecis guntheri, we hypothesize that this species may represent a species complex, as suggested for other freshwater fish species (Pereira et al., 2011; Bellafronte et al., 2013; Marques et al., 2013).

### AUTHOR CONTRIBUTIONS

CS, CO, GC-S, and FR designed the ideas of the research. CS collected data. CS, GC-S, and FR performed the analyses. CS, CO, GC-S, and FR contributed to the writing of the manuscript. FF provided physical structure of the laboratory to develop the work.

#### FUNDING

This research was supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico proc. 150415/2015–0 to GC-S), FAPESP (Fundo de Amparo à Pesquisa do Estado de São Paulo, proc. 2014/05051–5 and 2015/00691–9 to FR), and MCT/CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) (Edital Universal, proc. N. 441347/2014–2 coord. FR).

#### ACKNOWLEDGMENTS

The authors are grateful to Priscila Camelier and Renato Devidé for their help during the collection expeditions. The authors also wish to thank Gabriel S. C. Silva for his help with the identification of the specimens.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00069/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Souza, Costa-Silva, Roxo, Foresti and Oliveira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Sex Chromosome Evolution and Genomic Divergence in the Fish Hoplias malabaricus (Characiformes, Erythrinidae)

Alexandr Sember<sup>1</sup>, Luiz A. C. Bertollo<sup>2</sup>, Petr Ráb<sup>1</sup>, Cassia F. Yano<sup>2</sup>, Terumi Hatanaka<sup>2</sup>, Ezequiel A. de Oliveira<sup>2,3</sup> and Marcelo de Bello Cioffi<sup>2</sup>\*

<sup>1</sup> Laboratory of Fish Genetics, Institute of Animal Physiology and Genetics, Czech Academy of Sciences, Liběchov, Czechia, <sup>2</sup> Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil, <sup>3</sup> Secretaria de Estado de Educação de Mato Grosso (SEDUC-MT), Cuiabá, Brazil

The Erythrinidae family (Teleostei: Characiformes) is a small Neotropical fish group with a wide distribution throughout South America, in which Hoplias malabaricus is the most widespread and cytogenetically studied taxon. This species possesses significant genetic variation, as well as huge karyotype diversity among populations, as reflected by the seven major karyotype forms (i.e., karyomorphs A-G) identified up to now. Although morphological differences in their bodies are not outstanding, H. malabaricus karyomorphs are easily identified by differences in 2n, morphology and size of chromosomes, as well as by distinct evolutionary stages of sex chromosome development. Here, we performed comparative genomic hybridization (CGH) to analyse both the intra- and inter-genomic status in terms of repetitive DNA divergence among all but one (E) of the H. malabaricus karyomorphs. Our results indicated that they are closely related, but with evolutionary divergences among their genomes, yielding a range of non-overlapping karyomorph-specific signals. In addition, male-specific regions were uncovered on the sex chromosomes, confirming their differential evolutionary trajectories. In conclusion, the hypothesis that H. malabaricus karyomorphs are the result of speciation events was strengthened.

Keywords: fish cytogenetics, multiple sex chromosomes, sex-determining region, sex chromosome turnover, CGH, intraspecific variability, species complex, speciation

### INTRODUCTION

The Erythrinidae family (Teleostei: Characiformes) is a small group of Neotropical fishes with a wide distribution throughout South America. This family currently consists of three well-recognized genera, Erythrinus (Scopoli, 1777), Hoplias (Gill, 1903), and Hoplerythrinus (Gill, 1895), with at least 15 species recognized to date (Oyakawa, 2003; Oyakawa and Mattox, 2009). Erythrinids live in diverse habitats, from small lakes and lagoons to large rivers (Oyakawa, 2003). However, unlike the large migratory Neotropical fishes, they are usually not able to overcome obstacles such as waterfalls and large rapids, due to their sedentary lifestyle (Oyakawa, 2003). This situation may have contributed to reduced gene flow between sub-populations in the same hydrographic basin. Consequently, great genome diversity has been documented within the Erythrinidae family, reflected in the noticeable diversity of karyotypes, particularly in the diploid number (2n), karyotype structure and sex chromosome systems (reviewed in Bertollo, 2007).

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Julio Cesar Pieczarka, Universidade Federal do Pará, Brazil Daniel Pacheco Bruschi, Universidade Federal do Paraná, Brazil

> \*Correspondence: Marcelo de Bello Cioffi mbcioffi@ufscar.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 29 August 2017 Accepted: 16 February 2018 Published: 05 March 2018

#### Citation:

Sember A, Bertollo LAC, Ráb P, Yano CF, Hatanaka T, de Oliveira EA and Cioffi MdB (2018) Sex Chromosome Evolution and Genomic Divergence in the Fish Hoplias malabaricus (Characiformes, Erythrinidae). Front. Genet. 9:71. doi: 10.3389/fgene.2018.00071

Hoplias malabaricus is the most widespread and most cytogenetically investigated taxon, with analyzed populations from the north to the south of Brazil, Uruguay, Argentina, and Suriname. Despite its wide distribution, this taxon is characterized by low vagility, tends to form small populations, and is able to survive under low-oxygen conditions and to adapt to new environments (Rantin et al., 1992, 1993; Rios et al., 2002). Such characteristics are probably associated with the significant genetic variation and enormous karyotype diversity evidenced by the seven major karyotype forms (i.e., karyomorphs A-G; Bertollo et al., 2000; Cioffi et al., 2012). Despite the rather uniform morphology of their body plan, H. malabaricus karyomorphs are easily distinguished by 2n, morphology and size of chromosomes, as well as by different evolutionary stages of distinct sex chromosome systems, indicating the occurrence of unrecognized species diversity (Bertollo et al., 2000; Bertollo, 2007). Among the seven karyomorphs examined to date, only A and E do not show heteromorphic sex chromosomes. Indeed, a well-differentiated XY sex chromosome system occurs in karyomorph B, while karyomorphs C and F possess such a system in an early stage of differentiation, and karyomorphs D and G harbor X<sub>1</sub>X<sub>2</sub>Y and XY<sub>1</sub>Y<sub>2</sub> multiple sex chromosome systems, respectively (reviewed in Freitas et al., 2017). These findings indicate independent origins of the sex chromosome systems, as evidenced by whole chromosome painting (WCP) data (Cioffi et al., 2013). Altogether, H. malabaricus provides a suitable model for evolutionary, cytotaxonomic and biodiversity analyses (for review see Cioffi et al., 2012).

The development of recent molecular methodologies has allowed a qualitative improvement in chromosome research across different biological taxa. Among them, genomic in situ hybridization (GISH) and comparative genomic hybridization (CGH), originally developed for clinical studies (Kallioniemi et al., 1992), are now successfully applied for several other purposes, such as the identification of parental genomes in hybrids/allopolyploids (Bi and Bogart, 2006; Knytl et al., 2013; Symonová et al., 2013a, 2015; Doležálková et al., 2016; Majtánová et al., 2016), the detection of sex-specific content on homomorphic sex chromosomes (Ezaz et al., 2006; Altmanová et al., 2016; Rovatsos et al., 2016; Freitas et al., 2017) and genome comparisons among related species (Valente et al., 2009; Symonová et al., 2013b; Majka et al., 2016; Carvalho et al., 2017; Moraes et al., 2017). All these (and many other) studies showed that GISH and CGH, despite being rather "rough" molecular tools, can provide clues about genome evolution, with their resolution being based on the differential distribution of already divergent genome-specific repetitive DNA classes (Kato et al., 2005; Chester et al., 2010), as these sequences are generally highly abundant in eukaryotic genomes and display faster evolutionary rates than single-copy ones (Charlesworth et al., 1994; Cioffi and Bertollo, 2012; López-Flores and Garrido-Ramos, 2012). Hence, in the present study, we performed analyses including CGH procedures to explore both inter- and intra-genomic divergences (within the range defined above) among H. malabaricus karyomorphs. Our results provided new insights for a better understanding of the ongoing processes of karyomorph differentiation, as well as of their sex chromosome systems.

### MATERIALS AND METHODS

#### Individuals and Mitotic Chromosome Preparations

The analyzed representatives of H. malabaricus karyomorphs are listed in **Table 1**. The individuals were collected under appropriate authorization of the Brazilian environmental agency ICMBIO/SISBIO (License numbers 48628-2 and 10538-1) and deposited in the fish collection of the Cytogenetic Laboratory, Departamento de Genética e Evolução, Universidade Federal de São Carlos. Mitotic chromosomes were obtained by protocols described in Bertollo et al. (2015). The experiments followed ethical and anesthesia procedures in accordance with the Ethics Committee on Animal Experimentation of the Universidade Federal de São Carlos (Process number CEUA 1853260315).

AM, Amazonas; SP, São Paulo; MT, Mato Grosso; MG, Minas Gerais States.

**Abbreviations:** 2n, diploid chromosome number; CGH, comparative genomic hybridization; DAPI, 4′,6-diamidino-2-phenylindole; dUTP, 2′-deoxyuridine-5′-triphosphate; FISH, fluorescence in situ hybridization; gDNA, total genomic DNA; m, metacentric chromosome; NFDM, non-fat dried milk; NOR, nucleolar organizer region; PCR, polymerase chain reaction; rDNA, ribosomal DNA; SD, sex determination; sm, submetacentric chromosome; TEs, transposable elements; WCP, whole chromosomal painting.

#### Preparation of Probes for Comparative Genomic Hybridization (CGH)

The total genomic DNAs (gDNAs) from male and female specimens of all karyomorphs listed in **Table 1** were extracted from liver tissue by the standard phenol-chloroform-isoamyl alcohol method (Sambrook and Russell, 2001). Two different experimental designs were used for this study, as outlined in **Figure 1**. In the first set of experiments (inter-karyomorph genomic comparisons), the gDNA of karyomorph B male specimens was chosen as a reference due to its well-differentiated XY sex chromosomes and used for hybridization against metaphase chromosomes of the other karyomorphs (**Figure 1A**). For this purpose, male-derived gDNAs of karyomorphs A, C, D, F, and G were labeled with digoxigenin-11-dUTP using DIG-nick-translation Mix (Roche, Mannheim, Germany), while male-derived gDNA of karyomorph B was labeled with biotin-16-dUTP using BIO-nick-translation Mix (Roche). For blocking the repetitive sequences in all experiments, we used unlabeled C0t-1 DNA (i.e., the fraction of genomic DNA enriched for highly and moderately repetitive sequences), prepared according to Zwick et al. (1997). Hence, the final probe cocktail for each slide was composed of 500 ng of male-derived gDNA of karyomorph B, 500 ng of male-derived gDNA of one of the comparative karyomorphs, 15 µg of female-derived C0t-1 DNA of karyomorph B and 15 µg of female-derived C0t-1 DNA from the respective comparative karyomorph. The probe was ethanol-precipitated and the dry pellets were resuspended in hybridization buffer containing 50% formamide, 2 × SSC, 10% SDS, 10% dextran sulfate and Denhardt's buffer, pH 7.0. In the second set of experiments (**Figure 1B**), we focused on intra-karyomorph comparisons, with special emphasis on the molecular composition of putative, nascent or well-differentiated sex chromosomes. In this case, male-derived gDNAs of all karyomorphs were labeled with biotin-16-dUTP and female gDNAs with digoxigenin-11-dUTP by means of nick translation as described above. The final hybridization mixture for each slide (20 µl) was composed of male- and female-derived gDNAs (500 ng each), 25 µg of female-derived C0t-1 DNA from the respective karyomorph and the hybridization buffer described above. The chosen ratio of probe to C0t-1 DNA was based on the experiments performed in our previous studies in fishes including erythrinids (Symonová et al., 2013a,b, 2015; Carvalho et al., 2017; Freitas et al., 2017; Moraes et al., 2017; Yano et al., 2017; Oliveira et al., 2018) and was consistent with the ratio used in other related fish studies (e.g., Valente et al., 2009). In our experience, this ratio provides high stringency of repetitive DNA blocking while avoiding the risk of incomplete probe dissolution in the hybridization buffer, which would otherwise cause artifacts.
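The probe-to-blocker proportions described above reduce to simple arithmetic, sketched below. The function name `blocking_ratio` is hypothetical. The DNA amounts are those stated in the text, and expressing them as a mass ratio of C0t-1 DNA to total labeled probe is an illustrative choice, not a quantity reported by the authors.

```python
def blocking_ratio(probe_ng, cot1_ug):
    """Mass ratio of unlabeled C0t-1 DNA to total labeled probe DNA."""
    total_probe_ug = sum(probe_ng) / 1000.0  # convert ng to µg
    total_cot1_ug = sum(cot1_ug)
    return total_cot1_ug / total_probe_ug


# Inter-karyomorph design: two 500 ng probes, 15 µg C0t-1 DNA from each karyomorph.
inter = blocking_ratio(probe_ng=[500, 500], cot1_ug=[15, 15])
# Intra-karyomorph design: male and female probes (500 ng each), 25 µg C0t-1 DNA.
intra = blocking_ratio(probe_ng=[500, 500], cot1_ug=[25])
print(f"inter-karyomorph {inter:.0f}:1, intra-karyomorph {intra:.0f}:1")
```

With the stated amounts, the blocker exceeds the labeled probe by 30-fold in the inter-karyomorph design and by 25-fold in the intra-karyomorph design.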

#### FISH Used for CGH

The CGH experiments followed the methodology described in Symonová et al. (2015), with modifications. Briefly, prior to hybridization, slides were aged at 37°C for 2 h, followed by RNase A (90 min, 37°C) and then pepsin (50 µg/ml in 10 mM HCl, 3 min, 37°C) treatments. Chromosomes were subsequently denatured in 75% formamide (pH 7.0) in 2 × SSC (74°C, 3 min), and then immediately cooled and dehydrated through a 70% (cold), 85%, and 100% (RT) ethanol series. The hybridization mixture was denatured at 86°C for 6 min, cooled at 4°C (10 min) and then applied on the slides. The hybridization was performed at 37°C for 72 h. Post-hybridization washes were carried out once in 50% formamide in 2 × SSC (pH 7.0) (44°C, 10 min each) and three times in 1 × SSC (44°C, 7 min each). Prior to probe detection, the slides were incubated with 3% non-fat dried milk (NFDM) in order to avoid non-specific binding of antibodies. The hybridization signal was detected using Anti-Digoxigenin-Rhodamine (Roche) and Avidin-FITC (Sigma). Chromosomes were counterstained and mounted in antifade containing 1.5 µg/ml DAPI (Vector, Burlingame, CA, USA).

#### Microscopic Analyses and Image Processing

At least 30 metaphase spreads per individual were analyzed to confirm the 2n, karyotype structure and CGH results. Images were captured using an Olympus BX50 microscope (Olympus Corporation, Ishikawa, Japan) with a CoolSNAP camera, and the images were processed using Image Pro Plus 4.1 software (Media Cybernetics, Silver Spring, MD, USA). Chromosome morphology was classified according to Levan et al. (1964).
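The Levan et al. (1964) nomenclature assigns chromosome morphology from the arm ratio (long arm divided by short arm). A minimal sketch of that rule follows. The function name and the treatment of values falling exactly on a class boundary are assumptions made here; the class limits (1.7, 3.0, and 7.0) are the commonly cited ones for this system.

```python
def classify_chromosome(long_arm, short_arm):
    """Classify a chromosome by arm ratio, following Levan et al. (1964)."""
    if short_arm <= 0 or long_arm < short_arm:
        raise ValueError("expected long_arm >= short_arm > 0")
    arm_ratio = long_arm / short_arm
    if arm_ratio <= 1.7:
        return "m"   # metacentric
    if arm_ratio <= 3.0:
        return "sm"  # submetacentric
    if arm_ratio <= 7.0:
        return "st"  # subtelocentric
    return "t"       # telocentric (often reported as acrocentric in fish)


# Hypothetical arm lengths in µm, for illustration only.
for arms in [(2.0, 1.8), (2.4, 1.0), (3.0, 0.6), (3.2, 0.3)]:
    print(arms, classify_chromosome(*arms))
```

In practice, arm lengths would be measured on several metaphase spreads and averaged before classification.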

### RESULTS

#### Inter-Karyomorph Genomic Relationships

In each experiment, both genome-derived probes showed rather equal binding to all chromosomes, with preferential localization in the centromeric and pericentromeric regions of most chromosomes and in the terminal parts of some of them (yellow signals, i.e., a combination of green and red), indicating shared repetitive DNA content in such regions. The hybridization pattern in karyomorph A displayed stronger binding of the A-derived probe to the centromeric or telomeric regions of several chromosomes, while the B-derived probe, which co-hybridized to these regions, produced signals of lower intensity. Moreover, several exclusive A-specific signals appeared, mostly in distal chromosomal regions (**Figures 2A–D**). A similar situation was also observed in karyomorph C (**Figures 2E–H**), where the majority of the accumulated blocks was shared by both probes, including those in the pericentromeric regions of the XY chromosomes, but several signals, especially those located in the terminal region of the long arms of the largest m pair, were labeled by the C-specific probe only. In karyomorph D, stronger binding of the D-derived probe was highlighted in many centromeres, in addition to some telomeric segments (**Figures 2I–L**). Remarkably, both genomic probes equally stained the heterochromatic block displayed by the neo-Y chromosome. In karyomorph F, besides the shared binding pattern to the majority of heterochromatic blocks, the F-derived probe yielded specific signals on the chromosomal pair bearing NOR-like regions (**Figures 2M–P**). In addition, the male-specific region on the nascent Y chromosome of this karyomorph displayed some affinity to the B-derived male probe, despite being preferentially labeled with the F-male specific one (**Figures 2N,O**). In comparison, the male-specific region of karyomorph G, covering the entire short arms of the Y<sub>1</sub> chromosome, was stained almost exclusively by the G-derived probe, while the B-derived probe produced only faint and dispersed signals in this region (**Figures 2Q–T**). In a similar way, the G-specific probe showed predominant binding to terminally located repetitive blocks on a few other chromosomes.
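The colour logic behind these observations can be sketched as a simple rule. In the inter-karyomorph design, the biotin-labeled B probe is detected in green (Avidin-FITC) and the digoxigenin-labeled comparative probe in red (Anti-Digoxigenin-Rhodamine), so balanced binding appears yellow. The source does not describe a quantitative scoring scheme. The function name, the normalized 0–1 intensity scale, and both cut-off values below are hypothetical and serve only to illustrate the reasoning.

```python
import math


def classify_signal(reference_green, comparative_red,
                    log2_cutoff=1.0, background=0.05):
    """Label a chromosomal region from two normalized probe intensities (0-1)."""
    if max(reference_green, comparative_red) < background:
        return "no signal"
    # A small constant avoids division by zero for an absent probe.
    log2_ratio = math.log2((comparative_red + 1e-9) / (reference_green + 1e-9))
    if log2_ratio > log2_cutoff:
        return "karyomorph-specific (red)"
    if log2_ratio < -log2_cutoff:
        return "reference-biased (green)"
    return "shared (yellow)"


# Hypothetical intensities: a shared pericentromeric block and a
# distal block labeled almost exclusively by the comparative probe.
print(classify_signal(0.80, 0.70))
print(classify_signal(0.10, 0.90))
```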

#### Intra-Karyomorph Genomic Relationships: Detecting Male-Specific Regions

Experiments performed on female chromosome spreads of karyomorphs B, C, D, F, and G showed the absence of identifiable sex-specific segments.

Regarding karyomorph A, no exclusive male-specific regions were identified in the male chromosome complement (**Figures 3A–D**). In male chromosome spreads of karyomorph B, CGH revealed a male-specific region located terminally on the long arms of the Y chromosome (**Figures 3E–H**). In karyomorph C, unlike the biased accumulation of repetitive DNAs in the X pericentromeric region (Cioffi and Bertollo, 2010), a slight binding preference of the male-derived probe for the pericentromeric region of the Y chromosome was evidenced (**Figures 3I–L**). The female-derived probe produced only a faint hybridization signal in this region, while both probes matched equally the large heterochromatic segment located in the pericentromeric part of the X chromosome. In contrast, the CGH procedure failed to detect any sex-specific region on male chromosomes of karyomorph D (**Figures 3M–P**). In karyomorph F, a prominent interstitial band on the metacentric Y chromosome was also enriched with male-specific sequences, although a concurrent faint hybridization signal produced by the female-derived probe was also apparent (**Figures 3Q–T**). CGH on male preparations from karyomorph G unmasked a clear male-specific region covering the short arms of the Y<sub>1</sub> chromosome (**Figures 3U–X**). A summary of the observed intra-karyomorph CGH patterns is given in **Figure 4**.

### DISCUSSION

#### Genomic Diversity Among Karyomorphs

Sex chromosome systems in H. malabaricus display only male heterogamety, and therefore inter-karyomorph genomic comparisons between males were expected to be informative about their interrelationships. Hence, the male-derived gDNA of the reference karyomorph B was probed on chromosomes of karyomorphs A, C, D, F, and G. Even though the B-derived probe showed lower affinity to chromosomes of the other karyomorphs, the overall pattern of these experiments was relatively similar. In all experiments, both genome-derived probes accumulated preferentially in chromosome regions previously identified as C-bands and C0t-1 DNA hybridization sites (Born and Bertollo, 2000; Cioffi and Bertollo, 2010; Cioffi et al., 2010, 2011a), documenting their repetitive DNA content. However, although less intense, hybridization signals were also apparent along the rest of the chromosomal material. Our findings are in line with the general patterns observed in previous GISH/CGH-based reports (e.g., Traut and Winking, 2001; Valente et al., 2009; Koubová et al., 2014; Altmanová et al., 2016) in the sense of biased hybridization in heterochromatic regions, and they indicate that even a high amount of C0t-1 DNA is insufficient to outcompete highly repetitive (heterochromatic) regions. Given that the resolution of the CGH procedure predominantly relies on the presence of species-specific (or sex-specific) repetitive sequences, together with the evolutionary distance of the compared genomes (Kato et al., 2005; Chester et al., 2010), our overall results indicate that the karyomorphs of H. malabaricus are closely related, but with divergences among their genomes, yielding a range of non-overlapping karyomorph-specific signals. Remarkably, the B-derived probe displayed the lowest degree of hybridization correspondence with karyomorphs D and F, suggesting ongoing sequence divergence. These findings are indicative of ongoing evolutionary processes driving the divergence and possibly also speciation within H. malabaricus populations, facilitated by the sedentary lifestyle of these fishes (as discussed in detail further in the text).

hybridized together for each karyomorph. First column (A,E,I,M,Q,U): DAPI images (blue); Second column (B,F,J,N,R,V): hybridization pattern of the female-derived probe (red) of each analyzed karyomorph; Third column (C,G,K,O,S,W): hybridization pattern of the male-derived probe (green) of the respective karyomorph. Fourth column (D,H,L,P,T,X): merged images of both genomic probes and DAPI staining. The common genomic regions for male and female are depicted in yellow. Bar = 10 µm.

An array of molecular and cytogenetic methods, including DNA barcoding and phylogeographic approaches, has already led to the hypothesis that H. malabaricus likely represents a "species complex" with several undescribed species (Bertollo et al., 2000; Santos et al., 2009; Cioffi et al., 2012; Marques et al., 2013). Furthermore, different karyomorphs with a sympatric or syntopic occurrence were found to lack any hybrid forms, as shown by cytogenetic and RAPD (random amplified polymorphic DNA) analyses (Dergam et al., 1998, 2002; Bertollo et al., 2000), yet a sporadic case of hybridization followed by an elevation of the ploidy level has been reported (Utsunomia et al., 2014).

When compared to other members of the Erythrinidae family, a similar degree of cytotaxonomic diversity can be found in the Erythrinus erythrinus and Hoplerythrinus unitaeniatus groups (Bertollo, 2007; Cioffi et al., 2012; Rosa et al., 2012; Martinez et al., 2015, 2016), while, in stark contrast, the Hoplias lacerdae species complex exhibits a highly conserved karyotype structure (Blanco et al., 2011; de Oliveira et al., 2015). It is likely that the chromosomal diversity inside some Erythrinidae species is associated with their species-specific lifestyles. In this sense, because H. malabaricus, E. erythrinus, and H. unitaeniatus appear to constitute small and restricted populations with low vagility (Blanco et al., 2011), they experience a higher rate of stochastic fixation of chromosome rearrangements and, consequently, an elevated evolutionary genome dynamism that might contribute to speciation and/or local adaptation processes (see King, 1993; Faria and Navarro, 2010 for an exhaustive discussion). It is therefore remarkable that the three H. malabaricus karyomorphs with the most restricted geographic distribution, i.e., B, D and G, possess morphologically recognizable sex chromosomes (Bertollo et al., 2000; Cioffi et al., 2012).

#### Intra-Karyomorph Genomic Diversity and Male-Specific Sequences

Based on data available from the last comprehensive fish karyotype overview (Arai, 2011), so far only 5% of karyologically analyzed actinopterygian fish species possess heteromorphic sex chromosomes. However, this is very likely an underestimate, with many other well-differentiated or even nascent sex chromosome systems still awaiting discovery, especially when taking into account that homomorphic (i.e., cytologically indistinguishable or hardly detectable) sex chromosomes are thought to be frequent in fishes (Mank and Avise, 2009; Schartl et al., 2016). Within the Erythrinidae family, three different simple or multiple sex chromosome systems in advanced and/or nascent evolutionary stages have been reported among both E. erythrinus and H. malabaricus karyomorphs (reviewed in Cioffi et al., 2012), in which males are always the heterogametic sex.

Within the H. malabaricus group, karyomorph A is characterized by 2n = 42 for both males and females, without an apparent sex chromosome system, while karyomorph B, though also with 2n = 42 for both sexes, exhibits a well-differentiated ♀XX/♂XY sex chromosome system. In this case, the large subtelocentric X chromosome is clearly distinguished from the small-sized metacentric Y, in addition to the presence of a conspicuous heterochromatic block distally located on its long arms (Born and Bertollo, 2000; Cioffi et al., 2010). Previous repetitive DNA mapping and WCP data indicate that this sex chromosome system is likely derived from a proto-sex chromosome (the 21st pair of karyomorph A) due to enrichment in several types of DNA repeats confined to only one of the homologs, namely the X chromosome in karyomorph B (Cioffi et al., 2010, 2011a,c, 2013). However, CGH procedures did not reveal any sex-specific region in karyomorph A, probably due to a low level of sex-specific repetitive DNA divergence or due to the small size of the sex-determining region, remaining below the detection limit of the CGH method (which ranges approximately between 2 and 3 Mbp; Schoumans et al., 2004). Theoretically, alternative mechanisms of sex-determining region creation such as, e.g., epimutation-coupled recombination suppression (as eloquently discussed in Ezaz and Deakin, 2014) cannot be entirely ruled out, and these, again, may have gone undetected through CGH. Finally, the possibility that the sex-determining region in this karyomorph is completely absent seems equally likely. However, the frequent changes of master sex-determining genes repeatedly observed in fishes, especially in XX/XY systems (Heule et al., 2014; Martínez et al., 2014), favor the former view, as a new non-recombining region needs to be established again. In support of this view, in several other fish species the sex-determining regions might be very small (reviewed in Schartl et al., 2016), with the extreme case of the fugu, Takifugu rubripes, where the Y-specific sex-determining gene differs from the homologous region on the X chromosome by a single non-synonymous substitution (Kamiya et al., 2012). Moreover, even in the platyfish, Xiphophorus maculatus, with genetically defined sex chromosomes, no visible differences between X and Y were evidenced after CGH (Traut and Winking, 2001), similarly to what has been occasionally observed in some other animals (Koubová et al., 2014; Altmanová et al., 2016; Green et al., 2016; Gazoni et al., 2018). In yet another case, however, CGH proved able to resolve even sex chromosomes of a very young age (Montiel et al., 2017). Finally, we cannot entirely exclude the possibility that bright fluorescence from major rDNA loci located on several chromosomes of the karyomorph A complement, most probably including pair no. 21, could prevent the detection of a hypothetical sex-determining region in its vicinity. A finer-scale approach, such as BAC FISH mapping of specific candidate genes identified using recent genome sequencing approaches and corresponding bioinformatic tools, together with other related "state-of-the-art" technologies, may shed more light on this issue (for examples in fishes, see: Reichwald et al., 2015; Portela-Bens et al., 2017; Sutherland et al., 2017; Liu et al., 2018).

A male-specific region confined to a distal part of the long arms of Y chromosome was identified in the karyomorph B corresponding to the location of a constitutive heterochromatin block previously described (Born and Bertollo, 2000). The position of this region is noteworthy since a large NOR is found in the corresponding homologous region on the X chromosome, leading to the considerable size difference between both sex chromosomes (Born and Bertollo, 2000). Therefore, we are dealing here with an unusual situation, resembling the findings in weakly electric fish Eigenmannia virescens (de Almeida-Toledo and Foresti, 2001; Henning et al., 2008) or in the snake eel Ophisurus serpens (Salvadori et al., 2018), where the accumulation of rDNA and other repetitive DNAs occurs also on the X instead of Y chromosome. In our specific case, the sex-specific region is present on the corresponding C-positive but NOR-negative (Born and Bertollo, 2000; Cioffi et al., 2010) region on the Y chromosome. It is likely that the differential accumulation of repetitive DNA sequences might have decreased the recombination rate between the sex pair due to their delayed pairing during meiosis (Griffin et al., 2002). Alternatively, the co-amplification of the NOR region with other repetitive DNA sequences on the X chromosome can be viewed as a consequence of the whole differentiation process of the sex pair, helping to buffer the absence of functional rDNA copies on the Y chromosome. In fact, it is noteworthy that the NOR on the X chromosome is always genetically active (Born and Bertollo, 2000). The growing number of reports pointing to sex chromosome-specific NORs (see Kawai et al., 2007; Badenhorst et al., 2013; Yano et al., 2017 for references) possibly indicates that such regions might have played a more relevant role in nascent sex chromosome evolution than currently known.

In karyomorph C (2n = 40, for both sexes), the nascent and morphologically undifferentiated XY sex chromosomes were formerly evidenced by a small accumulation of repetitive DNAs occurring exclusively on the X chromosome (Cioffi and Bertollo, 2010). Here, it is likely that these newly emerging sex-related elements have not had the necessary evolutionary time to evolve and hence have accumulated only a low proportion of tentatively Y-specific sequences. Despite that, we cannot rule out that some differentiation in the hybridization pattern of both genomic probes in the pericentromeric region of the nascent Y chromosome is caused by a copy number variation of interspersed repetitive sequences between the sex chromosomes. Finally, it is worth mentioning that the B-derived gDNA probe displayed strong binding to this region (on both X and Y) in the inter-karyomorph experiment (see **Figures 2F–H**), suggesting a certain degree of shared sequences, but the extent of overlap with the C-derived probe was not absolute.

Karyomorph D (2n = 40 in females/39 in males) is characterized by an X1X2Y multiple sex chromosome system, where the neo-Y originated via a tandem fusion between the nascent Y chromosome and one autosomal homolog corresponding to pair No. 20 of karyomorph C (Bertollo et al., 1997; Cioffi and Bertollo, 2010). Indeed, such an origin was also confirmed by additional data from inter-karyomorph chromosome painting and mapping of several repetitive DNA classes (Cioffi et al., 2009, 2011b,c). Previous studies on male meiosis showed stabilized pachytene sex trivalents, as well as asynapsis in the region of presumed sequence divergences (Bertollo and Mestriner, 1998), thus pointing to a putative sex-specific region. In favor of this scenario, Rosa et al. (2009) reported noticeable alterations in the location of constitutive heterochromatin and 18S rDNA sites on the neo-Y chromosome, indicating that pericentric inversions have probably also taken place in the early process of sex-specific chromosome differentiation. However, although in karyomorph C a slight binding preference of the male-derived probe for the pericentromeric region of the Y chromosome was observed, our CGH data did not reveal any conspicuous Y-specific region in the neo-sex chromosome system of karyomorph D. In this sense, while in karyomorph D the recombination arrest and the establishment of the stable multiple sex chromosomes were most likely achieved by chromosomal rearrangements, in karyomorph C the accumulation of repetitive DNA sequences seems to have a central role in triggering the differentiation of the nascent XY sex system (Bertollo et al., 1997; Cioffi and Bertollo, 2010).

Karyomorphs E, F, and G were supposed to be closely related (Bertollo et al., 2000). Although karyomorph E (2n = 42) was not sampled in this study, our results confirmed previous findings in karyomorphs F (Freitas et al., 2017) and G (Oliveira et al., 2018). More specifically, karyomorph F (2n = 40, for both sexes) was found to exhibit a nascent XY sex chromosome system, where the male-specific content was highlighted as a prominent interstitial heterochromatic block on the large metacentric Y chromosome, coincident with several microsatellite motifs and retrotransposons (RTEs) (Freitas et al., 2017, present study). Importantly, as the faint hybridization signal produced by the female-specific probe was also located within this region, the situation is somewhat similar to that found in karyomorph C. If we accept that the slightly preferential accumulation of male-exclusive sequences in the pericentromeric region of the Y chromosome in males of karyomorph C might be related to the early stage of sex-determining region formation, the observed pattern in karyomorph F may reflect a later phase of a similar process. At this stage, the accumulation of repetitive DNA in the Y-specific region in karyomorph F probably also involves a portion of sequences that are common to both sexes.

In contrast to karyomorph F, the sex chromosome system found in karyomorph G (2n = 40 in females/41 in males) is characterized by the presence of XY1Y2 chromosomes, where the unusual acrocentric Y1 element carries the male-specific region, enriched with several different types of repetitive DNAs including 5S rDNA (Oliveira et al., 2018), hence strengthening the view discussed above. As initially proposed by Bertollo et al. (2000) and confirmed by recent findings (Oliveira et al., 2018), the emergence of sex chromosomes in karyomorph G proceeded through a tandem fusion involving chromosomes from two different pairs that might be tentatively assigned to specific pairs in karyomorph E, a hypothetical ancestral karyotype to both F and G karyomorphs. Importantly, while the tandem fusion was fixed in heterozygous condition in karyomorph G as only one homolog from each pair underwent this rearrangement (hence resulting in an unpaired large-sized metacentric X chromosome complemented with the remaining unfused Y1 and Y2 chromosomes in males), in karyomorph F both homologs from the mentioned pairs gave rise to two large-sized metacentric chromosomes, the X and Y ones (Freitas et al., 2017; Oliveira et al., 2018, this study). Notably, the XY1Y2 system of karyomorph G differs from the X1X2Y neo-sex chromosomes of karyomorph D in that a sex-determining region is clearly detectable by CGH only in the former case, indicating a different evolutionary stage between such sex systems. All these findings in karyomorphs F and G, i.e., (i) shared homology between their sex chromosomes, pointing to a common origin through tandem fusion and (ii) lack of homology between multiple sex chromosomes found in karyomorphs D and G, are also supported by recent Zoo-FISH experiments (Oliveira et al., 2018).

In summary, our findings support trends in teleost fishes concerning the independent and repeated evolution of sex chromosomes regardless of their phylogenetic relationships (Devlin and Nagahama, 2002; Woram et al., 2003; Schartl, 2004; Mank et al., 2006; Mank and Avise, 2009; Cioffi et al., 2013). It has been shown that the sex chromosome systems and/or the stage of their differentiation may differ markedly not only among closely related species, but also among different populations of the same species (Takehana et al., 2007; Ross et al., 2009; Zhou et al., 2010; Cioffi et al., 2012; Cnaani, 2013). Such exceptional sex chromosome variability could possibly be associated with the high plasticity and dynamics of teleost genomes (Ravi and Venkatesh, 2008), a feature usually assigned to a specific whole-genome duplication (TSGD) that occurred at the base of teleostean radiation (Hurley et al., 2007). As a consequence, duplicated redundant copies of different genes might have evolved into master sex-determining genes (Matsuda et al., 2002; Nanda et al., 2002), thus leading to the emergence of distinct sex chromosomes in different evolutionary lineages (Schartl, 2004; Mank and Avise, 2009). The outstanding pace of sex chromosome turnover is, however, commonly observable also in other cold-blooded vertebrates such as amphibians and reptiles, hence a number of alternative hypotheses about the evolutionary forces standing behind this phenomenon have already been proposed (for recent reviews and in-depth discussion, see Mank and Avise, 2009; Kitano and Peichel, 2012; Kikuchi and Hamaguchi, 2013; Bachtrog et al., 2014; Brykov, 2014; Pokorná et al., 2014; Pennell et al., 2015; Schartl et al., 2016). In a broader context, a handful of studies have provided direct evidence that the emergence of sex chromosomes, or even the sex chromosome turnover itself, might play a major role in reproductive isolation promoting evolutionary divergences and eventually speciation (e.g., Kitano et al., 2009; Nguyen et al., 2013), which is evidently the case for H. malabaricus.

# CONCLUSION

Our data provided an additional layer of evidence about the status of the taxon H. malabaricus and corroborated previous studies in the conclusion that it includes taxonomically distinct species. The CGH procedures proved to be very useful in detecting the hidden biodiversity in this fish group, as they have opened novel views and widened our understanding of the ongoing processes of inter-karyomorph genome differentiation, as well as the remarkable variety of sex chromosome systems inside this fish group. Besides, our approach not only uncovered the male-specific regions on the sex chromosomes, but also confirmed different trajectories of sex chromosome evolution. In future studies, high-throughput sequencing will be applied to microdissected sex chromosomes to further our understanding of sex determination in this species complex and its possible link with the speciation process.

#### AUTHOR CONTRIBUTIONS

AS: designed the study, performed the experiments, and drafted the manuscript. CY, TH, and EdO: performed the experiments and drafted the manuscript. PR and LB: drafted and revised the manuscript. MC: designed the study, and drafted and revised the manuscript. All authors read and approved the final version of the manuscript.

#### ACKNOWLEDGMENTS

This study was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq; Proc. nos. 304992/2015-1, 401575/2016-0 and 152105/2016-6), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP; Proc. no. 2016/21411-7) and the project EXCELLENCE CZ.02.1.01/0.0/0.0/15\_003/0000460 OP RDE and RVO: 67985904.

#### REFERENCES

de Almeida-Toledo, L. F., and Foresti, F. (2001). Morphologically differentiated sex chromosomes in neotropical freshwater fish. Genetica 111, 91–100. doi: 10.1023/A:1013768104422

Arai, R. (2011). Fish Karyotypes: A Check List, 1st Edn. Tokyo: Springer.

Altmanová, M., Rovatsos, M., Kratochvíl, L., and Johnson Pokorná, M. (2016). Minute Y chromosomes and karyotype evolution in Madagascan iguanas (Squamata: Iguania: Opluridae). Biol. J. Linn. Soc. 118, 618–633. doi: 10.1111/bij.12751


(Characiformes, Erythrinidae). Unusual accumulation of repetitive sequences on the X chromosome. Sex. Dev. 4, 176–185. doi: 10.1159/000309726


brazilian hydrographic basins. Cytogenet. Genome Res. 149, 191–200. doi: 10.1159/000448153


mediating extensive ribosomal DNA multiplications. BMC Evol. Biol. 13:42. doi: 10.1186/1471-2148-13-42


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sember, Bertollo, Ráb, Yano, Hatanaka, de Oliveira and Cioffi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# High-Throughput Sequencing Strategy for Microsatellite Genotyping Using Neotropical Fish as a Model

Juliana S. M. Pimentel<sup>1</sup> , Anderson O. Carmo<sup>1</sup> , Izinara C. Rosse<sup>1</sup> , Ana P. V. Martins<sup>1</sup> , Sandra Ludwig<sup>2</sup> , Susanne Facchin<sup>1</sup> , Adriana H. Pereira<sup>1</sup> , Pedro F. P. Brandão-Dias<sup>1</sup> , Nazaré L. Abreu<sup>1</sup> and Evanguedes Kalapothakis<sup>1</sup> \*

<sup>1</sup> Laboratory of Biotechnology and Molecular Markers, Department of General Biology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil, <sup>2</sup> Department of Zoology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Alexandre Wagner Silva Hilsdorf, University of Mogi das Cruzes, Brazil Masafumi Nozawa, Tokyo Metropolitan University, Japan

> \*Correspondence: Evanguedes Kalapothakis kalapothakis@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 14 September 2017 Accepted: 19 February 2018 Published: 09 March 2018

#### Citation:

Pimentel JSM, Carmo AO, Rosse IC, Martins APV, Ludwig S, Facchin S, Pereira AH, Brandão-Dias PFP, Abreu NL and Kalapothakis E (2018) High-Throughput Sequencing Strategy for Microsatellite Genotyping Using Neotropical Fish as a Model. Front. Genet. 9:73. doi: 10.3389/fgene.2018.00073

Genetic diversity and population studies are essential for conservation and wildlife management programs. However, monitoring requires the analysis of multiple loci from many samples. These processes can be laborious and expensive. The choice of microsatellites and PCR calibration for genotyping are particularly daunting. Here we optimized a low-cost genotyping method using multiple microsatellite loci for simultaneous genotyping of up to 384 samples using next-generation sequencing (NGS). We designed primers with adapters to the combinatorial barcoding amplicon library and sequenced samples by MiSeq. Next, we adapted a bioinformatics pipeline for genotyping microsatellites based on read length and sequence content. Using primer pairs for eight microsatellite loci from the fish Prochilodus costatus, we amplified, sequenced, and analyzed the DNA of 96, 288, or 384 individuals for allele detection. The most cost-effective methodology was a pseudo-multiplex reaction using a low-throughput kit of 1 M reads (Nano) for 384 DNA samples. We observed an average of 325 reads per individual per locus when genotyping eight loci. Assuming a minimum requirement of 10 reads per locus, two to four times more loci could be tested in each run, depending on the quality of the PCR reaction of each locus. In conclusion, we present a novel method for microsatellite genotyping using Illumina combinatorial barcoding that dispenses with exhaustive PCR calibrations, since non-specific amplicons can be eliminated by bioinformatics analyses. This methodology rapidly provides genotyping data and is therefore a promising development for large-scale conservation-genetics studies.

Keywords: microsatellite, fish, genotyping, next-generation sequencing, conservation genetics

# INTRODUCTION

Innovative technological applications in the field of conservation genetics can contribute to wildlife monitoring and management programs. Technological advances in genomics have greatly expanded the use of genetic markers in biological research, resulting in more extensive and efficient generation and analysis of population genetics data (Putman and Carbone, 2014).

Microsatellites are DNA units composed of repeating motifs in tandem, also known as simple sequence repeats (SSRs) or short tandem repeats. Due to their high degree of polymorphism, microsatellites are used as molecular markers in genetic structure, kinship identification, genetic mapping, and other population genetics studies (Buschiazzo and Gemmell, 2006; Chistiakov et al., 2006; Yazbeck and Kalapothakis, 2007; Bhargava and Fuentes, 2010). The high statistical power per locus obtained through microsatellite genotyping makes this a powerful tool in population studies (Guichoux et al., 2011; Putman and Carbone, 2014). Moreover, microsatellites are preferred in forensic and kinship analyses due to their high mutation rates and multiallelic nature (Clayton et al., 1998), and are the markers most frequently used in human paternity tests (Guichoux et al., 2011).

The early difficulties in isolating microsatellite sequences were circumvented with the use of next-generation sequencing (NGS). Indeed, thousands of microsatellite loci can now be identified from a single NGS run (Tang et al., 2008; Boomer and Stow, 2010; Castoe et al., 2010; Rosazlina et al., 2015). However, the current techniques present difficulties regarding PCR calibration and the choice of informative microsatellites with high specificity. Thus, a balanced reaction that maintains high sensitivity and specificity to target DNA remains a challenge. In addition, studies involving microsatellites frequently use electrophoresis, which makes PCR calibration and fragment size identification laborious, and compromises the analysis of homoplasy cases (Delmotte et al., 2001; Pasqualotto et al., 2007).

NGS has enabled high-throughput genotyping, with a range of protocols for partial sequencing of genomes such as those based on restriction enzymes, e.g., RAD (Baird et al., 2008), ddRAD (Peterson et al., 2012), and 2bRAD (Wang et al., 2012). These are collectively known as 'genotyping-by-sequencing' (GBS) (Narum et al., 2013). Sequencing of a target region (PCR amplicon) can also be used as a GBS method (Vartia et al., 2015). GBS enables analyses of multiple target sequences in several different samples simultaneously, thereby saving time and resources. Moreover, target region sequencing can be applied to DNA samples with a certain degree of degradation such as forensic and cancer biopsy samples (Kerick et al., 2011; Van Neste et al., 2012; König et al., 2015). New approaches to microsatellite genotyping using individual combinatorial barcoding have been used in conservation genetics (Scheible et al., 2011; Vartia et al., 2015).

Here, we optimized a high-throughput genotyping methodology for genetic studies in conservation programs. We genotyped microsatellite loci employing NGS technology using Prochilodus costatus, a migratory fish species found in Brazil, as a model. Microsatellite markers with a maximum size of 200 base pairs (bp) and tri- or tetra-nucleotide motifs were used to facilitate data analysis (de Valk et al., 2005). We tested distinct NGS reagent kits and varying numbers of individuals per analysis. We believe that the use of high-throughput technology in conservation studies will advance the field, allowing large amounts of data to be generated quickly and efficiently.

# MATERIALS AND METHODS

# Ethics Statement

We collected P. costatus samples in the region under the influence of the Três Marias dam (located in the state of Minas Gerais, Southeastern Brazil). To collect the samples needed for the study we obtained a Permanent Field Permit for Collecting from the Instituto Chico Mendes de Conservação da Biodiversidade (protocol number 57204-1) and also from the Instituto Estadual de Florestas (protocol number 014.007 2017).

### DNA Extraction

Using a sterile metallic mold, we removed 12.5-mm<sup>2</sup> fragments of fish fin from samples previously collected and stored in 70% ethanol (v/v). Fragments were washed with ultrapure water and individually placed in 96-well plates. We added 50 µL of NaOH (50 mM) to the wells, sealed the plates, vortexed for 10 s, incubated at 95°C for 10 min, and vortexed again for 10 s. Then, 7.5 µL of Tris-HCl (0.5 M, pH 8.0) were added to each well and the plates vortexed for 15 s. Supernatants were transferred to new 96-well plates. We purified DNA from the supernatants using Agencourt AMPure XP magnetic beads (Beckman Coulter, Brea, CA, United States), according to the manufacturer's protocol. DNA was quantified using Qubit 2.0 (Invitrogen, Carlsbad, CA, United States) and its purity was evaluated using NanoDrop 2000 (ThermoScientific, Waltham, MA, United States).

# Analysis and Identification of Microsatellites

Contigs previously generated by shotgun genome sequencing of P. costatus and deposited in the database of the Laboratory of Biotechnology and Molecular Markers were used for in silico analysis. The software msatcommander (Faircloth, 2008) was used to identify microsatellite regions. Over 100,000 contigs containing microsatellite regions were identified, and about 1,000 primer pairs were suggested. We screened the primer pairs for the generated fragment size (amplicons of 130–200 bp), and selected microsatellites containing at least seven repeats of tri- or tetra-nucleotide motifs. de Valk et al. (2005) reported that microsatellites with tri- and tetra-nucleotide motifs can be easily discerned, avoiding issues such as stutter patterns.
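The selection criterion above (at least seven tandem repeats of a tri- or tetra-nucleotide motif) can be illustrated with a minimal sketch. This is not the msatcommander algorithm; the function name `find_microsatellites`, its parameters, and the example contig are hypothetical, and only perfect (uninterrupted) repeats are considered.

```python
import re

def find_microsatellites(seq, motif_sizes=(3, 4), min_repeats=7):
    """Return (start, end, motif, n_repeats) for perfect tandem repeats.

    Illustrative only: scans for tri-/tetra-nucleotide motifs repeated
    at least `min_repeats` times, skipping homopolymer runs.
    """
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # ([ACGT]{k}) captures one motif; \1{n,} requires n further copies in tandem
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # e.g. AAA repeated is a homopolymer, not an SSR of interest
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return hits

# Hypothetical contig: eight ATT repeats flanked by unrelated sequence
contig = "GCGC" + "ATT" * 8 + "GCCA"
print(find_microsatellites(contig))
```

A dedicated tool additionally handles imperfect repeats and primer design, which this sketch deliberately leaves out.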

We selected 50 primer pairs that matched the requirements and tested them in P. costatus samples. PCR optimization was carried out by testing buffers with distinct concentrations of Mg<sup>2+</sup> and KCl (Phoneutria Biotecnologia e Serviços LTDA, Brazil). PCR was performed using different annealing temperatures (50–65°C), dimethyl sulfoxide concentrations (0–7%), and numbers of amplification cycles (20–35). Amplicons were analyzed in 8% (w/v) polyacrylamide gel stained with silver nitrate. Twenty-four sets of primers were selected based on the presence of a single band (homozygous) or two adjacent bands (heterozygous). For genotyping, we used Illumina's metagenomics protocol for amplicon resequencing. An 'overhang adapter' complementary to the 'index' in Illumina's Nextera kits was added to the primers. The 'overhang adapter' sequences were as follows: Forward: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG + locus-specific forward primer sequence-3′, and Reverse: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG + locus-specific reverse primer sequence-3′. The primers with the overhang adapters were incorporated into the target DNA through 30 PCR cycles. A specific primer (index) containing the MiSeq adapter (which individualizes samples in the NGS procedures) was attached to amplicons through 10 PCR amplification cycles (**Figure 1A**). A preliminary genotyping test (NGS sequencing) was performed using P. costatus DNA isolated from five individuals. From the 24 sets of primers, eight were selected for further analyses (**Table 1**). These were selected based on performance and PCR robustness (i.e., satisfactory amplification in non-ideal conditions, such as little or degraded DNA, the presence of inhibitors, etc.) and the presence of homozygous or heterozygous alleles with low amounts of unspecific DNA. GenBank accession numbers: Proc10 MG456705; Proc18 MG456707; Proc22 MG456708; Proc36 MG456709; Proc37 MG456710; Proc44 MG456712; Proc48 MG456715; Proc49 MG456716.

We tested several NGS genotyping strategies: (a) multiplex, in which all primer pairs were used to amplify a single DNA sample in a single reaction (30 cycles) followed by the incorporation of the index (10 cycles) and NGS sequencing; (b) pseudo-multiplex reaction, in which each primer pair was individually used to amplify a DNA target (30 cycles), followed by pooling the amplicons of one individual and a second PCR (10 cycles) for index incorporation and MiSeq sequencing; (c) monoplex, in which DNA target amplification (30 cycles) and index incorporation (10 cycles) were performed individually for each DNA target and followed by MiSeq sequencing. **Table 2** indicates the strategies, the types of cartridges, and the number of target DNA samples used (minimum of five and maximum of 384).

To optimize the NGS genotyping procedures, we performed multiplex PCR tests in an initial MiSeq run (run 1) using a low-throughput kit (1 M reads, MiSeq Reagent Kit Nano, 300 cycles; Illumina). We mixed eight, four, and two primer pairs for the first, second, and third tests, respectively. Additionally, a monoplex test was carried out for each of the eight primer pairs selected. We used DNA from five individuals in all tests (**Table 2**) and different index sequences for each test.

The second NGS run (run 2) was carried out using a multiplex reaction with the eight selected primer pairs, 192 individuals, and a high-throughput kit (15 M cartridge, MiSeq Reagent Kit Standard, 300 cycles; Illumina). Pseudo-multiplex reactions with eight primer pairs and 1 M cartridges were performed in runs 3, 4, 5, and 6 with 192, 96, 288, and 384 target DNAs, respectively (**Table 2**).

We used Nextera XT Index Kit v2 Sets A, B, C, and D (Illumina), which allow the sequencing of up to 384 individuals in a single MiSeq run and, therefore, accelerate the process and increase the benefit-cost ratio of the analysis. We further increased the number of individuals genotyped per run using a random nucleotide sequence (such as AAA or TTT) between the adapter and the locus-specific sequences. We used the following primers: Forward: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG + **AAA** or **TTT** + locus-specific forward primer sequence-3′, and Reverse: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG + **AAA** or **TTT** + locus-specific reverse primer sequence-3′ (see **Table 1** for primer sequences). In these runs, only one set of adapters was used (Nextera XT Index Kit v2 Set A) and 288 individuals were genotyped. Notably, other sequences can be used, including CCC or sequences with four or more nucleotides. However, shorter sequences (one or two nucleotides) may hinder bioinformatics analysis, whereas longer sequences may reduce the efficiency of amplification. Additional tests should be performed to evaluate each case.
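As a minimal sketch of how the tailed primers described above are assembled, the following concatenates the overhang adapter, an optional short tag, and the locus-specific primer in the 5′ to 3′ direction. The overhang sequences are those given in the text; the helper `build_tailed_primer` and the locus-specific primer used in the example are hypothetical placeholders, not sequences from **Table 1**.

```python
# Overhang adapter sequences as stated in the text (5'->3')
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

def build_tailed_primer(overhang, locus_primer, tag=""):
    """Return overhang + optional tag + locus-specific primer (5'->3')."""
    primer = (overhang + tag + locus_primer).upper()
    if set(primer) - set("ACGT"):
        raise ValueError("primer contains non-ACGT characters")
    return primer

# Placeholder locus-specific primer, for illustration only
example_locus_primer = "ACGTACGTACGTACGTACGT"

for tag in ("", "AAA", "TTT"):
    print(repr(tag), build_tailed_primer(FWD_OVERHANG, example_locus_primer, tag))
```

During analysis, reads from individuals sharing the same index pair could then be separated by the tag found immediately downstream of the overhang, under the assumption that the tag survives amplification and sequencing intact.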

Amplicons of the eight loci were pooled for each individual, quantified using Qubit, and diluted to 10 ng/µL. Then, a new pool (the combinatorial barcoding amplicon library) was prepared with all the quantified material. This library was then quantified by qPCR using the KAPA Library Quant Illumina/Universal kit (KAPA Biosciences) following the manufacturer's instructions. The library was used as input in a MiSeq run with a final concentration of 15 pM. A ready-to-use control library from Illumina (PhiX Control v3) was used in each sequencing run.

#### Genotyping of Microsatellites

We developed a bioinformatics pipeline for microsatellite genotyping (described below and in **Figure 1B**). To improve reliability, we trimmed and filtered the obtained reads using the PRINSEQ tool (Schmieder and Edwards, 2011). Bases with Phred scores lower than 30 and/or read lengths shorter than 75 bp were removed. Filtered reads were aligned against a FASTA file containing reference sequences for the eight microsatellite loci using the software Bowtie 2 (Langmead and Salzberg, 2012) with the high sensitivity option.
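The read-filtering thresholds above (Phred score of at least 30, read length of at least 75 bp) can be sketched as follows. This is not the PRINSEQ implementation: the function `trim_and_filter` is hypothetical, it assumes Phred+33 quality encoding, and it assumes that low-quality bases are trimmed from the 3′ end only, which the text does not specify.

```python
def trim_and_filter(seq, qual, min_q=30, min_len=75, offset=33):
    """Trim low-quality bases from the 3' end, then apply a length filter.

    Returns (trimmed_seq, trimmed_qual), or None if the read is discarded.
    Assumes Phred+33 encoding (an assumption, not stated in the text).
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    if end < min_len:
        return None  # too short after trimming
    return seq[:end], qual[:end]

# 'I' encodes Q40 and '#' encodes Q2 under Phred+33
kept = trim_and_filter("A" * 80, "I" * 78 + "##")
dropped = trim_and_filter("A" * 80, "I" * 70 + "#" * 10)
print(len(kept[0]), dropped)
```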

TABLE 1 | Primers used for amplification of the microsatellite markers selected for genotyping of Prochilodus costatus.

SSR, simple sequence repeat; bp, base pairs; AT, annealing temperature; DMSO, dimethyl sulfoxide. <sup>∗</sup>Buffer classification according to Phoneutria Biotecnologia e Serviços LTDA, Brazil.

<sup>∗</sup>Not applicable due to the low number of individuals. ∗∗Total number of alleles obtained for eight loci and variable numbers of individuals. #Reactions used for amplification. Multiplex, in which all primer pairs were used to amplify a single DNA sample in a single reaction followed by the incorporation of the index and NGS sequencing; Pseudo-multiplex, in which each primer pair was individually used to amplify a DNA target, followed by pooling of the amplicons of one individual (eight loci per individual) and a second PCR for index incorporation and MiSeq sequencing; Monoplex, in which DNA target amplification and index incorporation were performed individually for each DNA target and followed by MiSeq sequencing. <sup>U</sup>Type of cartridge used. 1 M is a MiSeq Reagent Nano Kit v2 (1 M reads) and 15 M is a MiSeq Reagent Kit v2 (15 M reads).

Alignment against a reference region that contains insertions or deletions of nucleotides, such as the microsatellite variants, requires careful curation because variations at the edge of the repeat can lead to alignment errors and consequent misidentification of the alleles. One way to increase the confidence and reduce error is to realign the reads taking into account the nucleotide sequence of the edges of the repeat and possible variants within the region. As no information on the variants of the microsatellite regions studied here was available in public databases, we identified the possible variants at the edges of the repeat of each of the eight loci. We used the SAMtools package (Li et al., 2009) to detect possible variants in the mapping file (BAM format) of each of the 384 sequenced individuals. As a result, we obtained a Variant Call Format (VCF) file containing all the variants in the regions around the microsatellite repeat motifs. Next, we realigned the reads that mapped to the reference repeat regions using the tools RealignTargetCreator and IndelRealigner from the GATK package (McKenna et al., 2010). These tools perform a local realignment to the regions of the repeat motifs taking into account only high-quality reads that completely cover the repeat region and the variants described for the region (VCF file). Reads that did not match these criteria were removed. A realignment file (BAM format) containing only reads that realigned to each locus (tested twice) was obtained for each individual, thus increasing the confidence in the identification of the alleles.

We used the RepeatSeq tool (Highnam et al., 2012), with parameter −M 2 (minimum sequencing quality required value) to identify and quantify the alleles from the realignment files. This tool requires a file containing the chromosome coordinates and the repeat region motif sequence. Since P. costatus genomic information was unavailable, we used the information obtained through the microsatellites amplicon sequencing as an independent chromosome. We created an input file containing the name, the starting and ending positions of the repeat sequence in the amplicon, and the sequence of repeat motif for each locus. The information of the chromosomal regions was replaced by the information of each of the amplicons. There is no limit regarding the size or number of amplicons. However, it is important to enter the correct location and base sequence of the repeat region. The RepeatSeq tool uses the coordinate file to search for repeat regions in the realignment files and calculates the repeat length, which determines the alleles. A repeat ATTATTATTATT, for example, would be defined as allele 12. After identification of the repeat motif, the reads that aligned to that region are selected and quantified, according to their number of repeats. The resulting file contains the full read annotation of the reference microsatellite, including the total number of alleles detected, total number of reads, total number of reads per allele, and mapping quality score.
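The allele naming convention described above, in which the allele is the length in base pairs of the repeat tract, can be illustrated with a minimal sketch. This is not how RepeatSeq computes alleles internally; the function `repeat_tract_length` and the example read are hypothetical, and only uninterrupted copies of the motif are counted.

```python
def repeat_tract_length(read, start, motif):
    """Length in bp of the uninterrupted run of `motif` beginning at read[start]."""
    k = len(motif)
    pos = start
    # Advance one motif at a time while the next k bases match the motif exactly
    while read[pos:pos + k] == motif:
        pos += k
    return pos - start

# Example from the text: four ATT units correspond to allele 12
print(repeat_tract_length("ATTATTATTATT", 0, "ATT"))

# Hypothetical read with 4-bp flanks on either side of the repeat
read = "GGCA" + "ATT" * 4 + "CGTA"
print(repeat_tract_length(read, 4, "ATT"))
```

Both calls return 12, so the two reads would be counted toward the same allele.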

To avoid false negatives and to convert the results into the input format required by the software commonly used in population genetics, we developed a Perl script, named GenotypeMicrosat.pl. This script performs a detailed analysis of the RepeatSeq output file. We determined the individual's genotype for each locus using the following filter criteria: (1) maximum of two alleles per individual per locus; (2) at least 10 reads per locus in the entire repeat sequence, including eight bases in the 5′ and eight bases in the 3′ flanking regions; and (3) at least 20% of reads corresponding to a second allele for an individual to be considered heterozygous in a given locus. Individuals with a second allele coverage of less than 20% were considered homozygous. Following application of these filter criteria, we generated a spreadsheet containing the genotypes of each individual for each of the eight loci. Loci that did not attain the filter requirements were identified as 'NA.' The generated spreadsheet can be easily adapted for other population genetics analysis programs.
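The three filter criteria can be expressed as a short decision rule. The sketch below is in Python rather than Perl and is not the GenotypeMicrosat.pl script; the function `call_genotype` and the read counts in the example are hypothetical, and it assumes that the 20% threshold is computed against the total number of reads at the locus, which the text does not state explicitly.

```python
def call_genotype(allele_reads, min_reads=10, min_second_fraction=0.20):
    """Call a diploid genotype from {allele_length: read_count}.

    Returns a sorted (allele, allele) tuple, or None for 'NA'.
    Assumption: the second-allele fraction is relative to all reads at the locus.
    """
    total = sum(allele_reads.values())
    if total < min_reads:
        return None  # criterion 2: insufficient coverage, reported as NA
    # Rank alleles by read support; only the top two are considered (criterion 1)
    ranked = sorted(allele_reads.items(), key=lambda kv: kv[1], reverse=True)
    first = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] / total >= min_second_fraction:
        second = ranked[1][0]
        return tuple(sorted((first, second)))  # criterion 3: heterozygous
    return (first, first)  # homozygous

# Hypothetical read counts per allele for one individual at one locus
print(call_genotype({12: 200, 16: 120, 8: 5}))  # second allele well above 20%
print(call_genotype({12: 300, 16: 25}))         # second allele below 20%
print(call_genotype({12: 6, 16: 3}))            # fewer than 10 reads
```

Low-frequency third alleles, such as the 8-bp allele in the first example, are ignored here; a production script might instead flag such loci for manual review.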

### RESULTS

In comparison with monoplex, multiplex reactions generated 10 to 100-times fewer reads (run 1) that could be used in genotyping (**Figure 2**). For some loci, no reads were detected in the multiplex tests, which suggests intense primer competition. On the other hand, consistent results were observed with monoplex-amplified samples (**Figure 2**).

We compared the genotyping results of 192 individuals by multiplex (run 2) or pseudo-multiplex (run 3) reactions. The 15 M cartridge tested in run 2 yielded about 4.7 times more reads than the 1 M cartridge (Nano) kit tested in run 3. Nevertheless, the number of reads generated with the Nano kit (run 3) allowed genotyping of all individuals. In run 2 (multiplex, **Table 3**), reads were obtained for five loci only. Amplification efficiency, determined as the fraction of individuals successfully genotyped for a given locus, was superior in the pseudo-multiplex reaction (run 3) for all loci, except proC10 and proC37 (**Table 3**).

As expected, the standard curve generated from runs 4, 5, and 6 revealed a strong negative correlation between the number of individuals tested and the number of reads generated per individual (**Figure 3**). The distribution of reads per locus in these three runs (**Table 4**) was uneven. When analyzing 384 individuals (run 6), we obtained an average of 325 reads per individual per locus across all loci. The yield differed among loci and, for the same locus, among runs. For example, marker proC36 yielded an average of 1,018 reads per individual in run 5 and 198 reads per individual in run 6. On the other hand, proC37 yielded only 65 and 42 reads in runs 5 and 6, respectively. This variation did not compromise the genotyping analysis because of the bioinformatics parameters used.

FIGURE 2 | Average (mean) read yield in run 1 (n = 5 individuals). Multiplex reactions were performed in A with eight loci (proC10, proC18, proC22, proC36, proC37, proC44, proC48, proC49), in B with four loci (proC18, proC36, proC37, proC44), and in C with two loci (proC48 and proC49). Monoplex reactions were performed in D (proC10), E (proC18), F (proC36), G (proC48), and H (proC49). Error bars are standard deviation. MiSeq Reagent Kit v2 1 M reads was used.

Values correspond to the genotyping efficiency (%) for 192 individuals (Prochilodus costatus). x, no data obtained.

From the 3,771,786 reads yielded in run 6, 1,998,885 (53%) passed quality filtering. RepeatSeq identified 991,337 high-quality reads (**Table 2**), which were subsequently used for allele detection. Of the 3,072 microsatellites genotyped in this run (eight loci from 384 individuals), 1,179 (39%) remained with undetermined genotype due to the low number of reads per locus (<10). The total number of alleles obtained in run 6 (384 individuals and 1 M cartridge) was 53, with the locus proC49 showing the highest number of alleles (10) while proC10 showed the lowest (4). Genotyping using the primers marked with base triads AAA and TTT generated results similar to those obtained in run 6.

#### DISCUSSION

In the present paper we showed the potential of microsatellite genotyping by NGS as a fast and cost-effective methodology to be implemented in large-scale population genetic studies. We used a combination of commercially available indexes and genotyped 384 individuals per run. The efficiency observed with the Nano (1 M reads) kit represents a substantial cost reduction over the NGS runs with the 15 M reads kit.

The flowchart presented herein was developed to ensure high accuracy in microsatellite genotyping. We excluded low-quality reads from the analysis and aligned the reads against the reference using Bowtie 2, the best-suited software for INDEL-rich loci (Highnam et al., 2012). This realignment step increases the confidence in allele detection, since our analysis considers only reads that cover the whole repeat region, including eight bases in the 5′ and eight bases in the 3′ flanking regions. Additionally, the pipeline proposed verifies all neighboring variations (5′ and 3′ regions flanking the microsatellite motif), allowing the identification of homoplastic motif repeat numbers and fragment length. The microsatellite genotyping tool RepeatSeq uses a Bayesian approach, which considers characteristics of the read and the sequence under analysis (Highnam et al., 2012). The developed GenotypeMicrosat.pl script further increases the confidence of the genotyping by establishing a minimum of 10 reads to confirm an allele.

FIGURE 3 | Average number of reads generated per individual in three distinct runs using MiSeq Reagent Kit v2 1 M reads: run 4 (96 individuals), run 5 (288 individuals), and run 6 (384 individuals). Filled circles (•) represent the mean number of reads for each number of samples tested. The dashed line represents the linear regression. For more details, see Table 2.

TABLE 4 | Percentage<sup>∗</sup> of reads generated for each locus tested in three distinct sequencing runs (runs 4, 5, and 6).

<sup>∗</sup>Percentage was calculated from 646,223 reads generated in run 4 with 96 individuals, 1,051,375 reads generated in run 5 with 288 individuals, and 991,337 reads generated in run 6 with 384 individuals. One MiSeq Reagent Nano Kit v2 (1 M reads) was used for each run.

Read realignment with the GATK package requires a file containing all the variants described for the analyzed species. However, information about variations in our model species is scarce in public databases. As an alternative, we used single nucleotide polymorphisms (SNPs) and INDELs detected through sequenced amplicons as input for GATK. To circumvent the lack of data on the P. costatus microsatellite genomic localization required by RepeatSeq, we generated a file containing the genomic coordinates and repeat sequences to use as input. Our successful attempts to overcome the lack of genomic information for our species of interest highlight the potential application of the pipeline proposed for microsatellite genotyping of species for which genomic data are not available, and further support its use in genetic monitoring programs.

The maximum number of individuals genotyped per run is limited by the number of commercially available indexes (currently 384). However, the amplification success of primers containing adenine or thymine trios shows the potential of this tool to increase the number of individuals genotyped in a single run. We genotyped eight loci per individual, with an average coverage of 325 reads per locus per individual in the runs with the Nano kit using 384 samples. As the genotyping pipeline considers a minimum of 10 reads per allele per individual, none of the runs reached the maximum capacity of the cartridge. In theory, the number of individuals or the number of loci could be increased up to four times per run (32 loci or 1,536 individuals), based on the regression analysis. However, this possibility must be weighed carefully since the number of alleles varies considerably among the loci. For instance, proC49 showed an average of 700 reads, while proC37 showed 20 reads per individual in run 6. The number of loci may, therefore, be increased or decreased depending on the quality of the loci tested. Previous knowledge of the quality of a given locus also allows for the use of greater amounts of amplicon for loci with low yield. These findings open the prospect of using loci for which PCR reactions are not 100% efficient and represent an advantage in the genetic analysis of understudied species with limited availability of microsatellites.

Traditional methods employing microsatellite molecular markers have disadvantages such as the long optimization process and the elevated costs, especially in the development of multiplex systems. Additionally, automation limitations and data management requirements can prevent technology transfer among different laboratories (Guichoux et al., 2011). Previous studies have shown that allele sizes generated by capillary electrophoresis may vary depending on the equipment and running conditions (Delmotte et al., 2001; Foulet et al., 2005; Pasqualotto et al., 2007), and the number of loci that can be multiplexed with this technique is limited by the number of commercially available fluorophores. On the other hand, many loci can be simultaneously genotyped in a single NGS run (Scheible et al., 2011). The direct sequencing of loci is a more reliable approach, as it allows for the analysis of all the variations in the fragment. Furthermore, technology transfer and detection of technical errors are facilitated by NGS. Here we tested eight loci per individual. Nevertheless, our pipeline has the potential to provide analysis of a significantly larger number of loci; recent publications on Neotropical migratory fish reported a minimum of seven and a maximum of 13 loci for up to 30 individuals (Rueda et al., 2011; Berdugo and Narvaez Barandica, 2014; Coimbra et al., 2017).

Despite the poor results of direct multiplex reactions, we successfully optimized a 'pseudo-multiplex strategy,' in which previous monoplex reactions were performed for each sample and the amplicons mixed in the indexing reaction. This strategy reduced the cost and duration of the analysis and may be used in the genotyping of other markers, such as SNPs, and in metagenomics studies.

#### CONCLUSION

We present a novel method for microsatellite genotyping based on the Illumina combinatorial barcoding using a Nano kit. This approach is faster and more efficient than those currently available and offers large amounts of high-quality data for conservation genetics and population studies.

### AUTHOR CONTRIBUTIONS

JP and EK designed and coordinated the work. SL, SF, AP, PB-D, and NA acquired the data. AC, IR, and AM developed the pipeline. JP, AC, IR, and SL analyzed the data. All authors contributed to data interpretation and provided substantial contributions to manuscript writing. All authors approved the final version prior to submission.

#### FUNDING

This work was funded by Companhia Energética de Minas Gerais (CEMIG) (project P&D GT 455), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) (Edital Ciências Forenses no. 25/2014), and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).


resolution fingerprinting of Aspergillus fumigatus isolates. J. Clin. Microbiol. 43, 4112–4120. doi: 10.1128/JCM.43.8.4112-4120.2005


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pimentel, Carmo, Rosse, Martins, Ludwig, Facchin, Pereira, Brandão-Dias, Abreu and Kalapothakis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Evaluation of Reference Genes to Analyze Gene Expression in Silverside Odontesthes humensis Under Different Environmental Conditions

Tony L. R. Silveira<sup>1</sup> , William B. Domingues <sup>1</sup> , Mariana H. Remião<sup>1</sup> , Lucas Santos <sup>1</sup> , Bruna Barreto<sup>1</sup> , Ingrid M. Lessa<sup>1</sup> , Antonio Sergio Varela Junior <sup>2</sup> , Diego Martins Pires <sup>3</sup> , Carine Corcini <sup>3</sup> , Tiago Collares <sup>4</sup> , Fabiana K. Seixas <sup>4</sup> , Ricardo B. Robaldo<sup>5</sup> and Vinicius F. Campos <sup>1</sup> \*

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Bruno Cavalheiro Araújo, University of Mogi das Cruzes, Brazil Diogo Teruo Hashimoto, Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil

#### \*Correspondence:

Vinicius F. Campos fariascampos@gmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 23 October 2017 Accepted: 19 February 2018 Published: 14 March 2018

#### Citation:

Silveira TLR, Domingues WB, Remião MH, Santos L, Barreto B, Lessa IM, Varela Junior AS, Martins Pires D, Corcini C, Collares T, Seixas FK, Robaldo RB and Campos VF (2018) Evaluation of Reference Genes to Analyze Gene Expression in Silverside Odontesthes humensis Under Different Environmental Conditions. Front. Genet. 9:75. doi: 10.3389/fgene.2018.00075

<sup>1</sup> Laboratory of Structural Genomics, Biotechnology Graduate Program, Federal University of Pelotas, Pelotas, Brazil, <sup>2</sup> Institute of Biological Sciences, Federal University of Rio Grande, Rio Grande, Brazil, <sup>3</sup> Veterinary Faculty, Federal University of Pelotas, Pelotas, Brazil, <sup>4</sup> Laboratory of Cancer Biotechnology, Biotechnology Graduate Program, Federal University of Pelotas, Pelotas, Brazil, <sup>5</sup> Laboratory of Physiology, Institute of Biology, Federal University of Pelotas, Pelotas, Brazil

Some mammalian reference genes, which are widely used to normalize qRT-PCR, cannot be used for this purpose because of their high expression variation. Normalization with inappropriate reference genes leads to misinterpretation of results. The silversides (Odontesthes spp.) have been used as models for evolutionary, osmoregulatory, and environmental pollution studies but, up to now, there have been no studies on reference genes in any Odontesthes species. Furthermore, many studies on silversides have used reference genes without previous validation. Thus, the present study aimed to clone and sequence potential reference genes, thereby identifying the best ones in Odontesthes humensis considering different tissues, ages, and conditions. For this purpose, animals belonging to three ages (adults, juveniles, and immature) were exposed to control, Roundup®, and seawater treatments for 24 h. Blood samples were subjected to flow-cytometry and other collected tissues to RNA extraction, cDNA synthesis, molecular cloning, DNA sequencing, and qRT-PCR. The candidate genes tested included 18s, actb, ef1a, eif3g, gapdh, h3a, atp1a, and tuba. Gene expression results were analyzed using five algorithms that ranked the candidate genes. The flow-cytometry data showed that the environmental challenges could trigger a systemic response in the treated fish. Even during this systemic physiological disorder, the consensus analysis of gene expression revealed h3a to be the most stably expressed gene when only the treatments were considered. On the other hand, tuba was the least stable gene in the control and gapdh was the least stable in both the Roundup® and seawater groups. In conclusion, the consensus analyses of different tissues, ages, and treatment groups revealed that h3a is the most stable gene, whereas gapdh and tuba are the least stable genes, even though both are considered constitutive genes.

Keywords: algorithm, expression, fish, gene, normalization, real time PCR, sequencing, validation

#### INTRODUCTION

Northern blotting, ribonuclease protection assay, microarray, RNA-Seq, semi-quantitative RT-PCR, and quantitative RT-PCR (qRT-PCR) are among the techniques most widely used at present to evaluate gene expression levels. The latter stands out for fast readout, high throughput, and high automation potential (Tang et al., 2012). In absolute quantification by qRT-PCR, the increasing level of fluorescent signal emitted by the amplified products is compared to a standard curve. Relative quantification describes the change in the expression pattern of genes of interest when subjected to certain controlled situations, compared to the constant expression of reference genes (RGs), also known as housekeeping genes, under the same conditions. Therefore, it is extremely important that RGs have a stable and constant expression pattern regardless of gender, age, organ or tissue, and even after environmental changes. Thus, it is necessary to validate diverse RGs for various species and their stages of development in different tissues, as well as for each type of controlled environmental condition (Livak and Schmittgen, 2001; Zheng and Sun, 2011).
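To make the dependence on a stable reference gene concrete, the following Python sketch implements the widely used 2<sup>−ΔΔCt</sup> calculation for relative quantification. It is an illustration only: the function name and Ct values are hypothetical, amplification efficiency is assumed to be close to 100%, and the text does not state that this exact formula was applied in the present study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) calculation.

    Assumes roughly 100% amplification efficiency for both genes.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Stable reference gene (Ct 18 in both groups): 4-fold induction.
print(relative_expression(22, 18, 24, 18))
# Same target data, but the reference shifts by one cycle under treatment:
# the apparent fold change is now 2-fold although the target Ct values
# themselves did not change relative to the first call.
print(relative_expression(22, 17, 24, 18))
```

A one-cycle shift in the reference gene halves or doubles the apparent fold change, which is why an RG whose expression responds to the treatment can distort the result.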

However, most studies related to the quantification of gene expression in various species of teleosts are not based on RGs displaying unchanged expression. Most studies use RGs properly confirmed in mammalian species to normalize the relative quantification using qRT-PCR of non-mammalian species, such as fishes. The main RGs from mammals that have been empirically used to normalize experiments in teleost species are the GAPDH (Lin et al., 2004); ACTB (Choi and An, 2008; Nawata et al., 2010); EF1α (Scott et al., 2004; Nilsen et al., 2007; Patterson et al., 2012; Sinha et al., 2015); RPL (Hsu et al., 2014) and 18S (Sinha et al., 2015). Studies also report the distribution of RGs in the tissues; however, these genes are not objectively evaluated in relation to the maintenance of the constant expression during development stages of individuals, in both genders or in different controlled environmental conditions (Tomy et al., 2009; Hsu et al., 2014).

Numerous studies have reported that some mammalian RGs, which are widely used to normalize the qRT-PCR reactions of different species (i.e., ACTB and GAPDH), could not be used owing to their high variation in expression levels (Bustin, 2002; Dheda et al., 2004; Silver et al., 2006). This could lead to misinterpretation of results. It is impossible to find suitable RGs that exhibit constant expression pattern for all species and under all experimental conditions (Zheng and Sun, 2011; Tang et al., 2012). Due to these reasons, the identification of RGs for each species and each experimental condition is justified.

The silversides (Odontesthes spp.) comprise a genus of fish that are naturally endemic to South American waters (Bemvenuti, 2006). This genus contains the largest number of species among the South American atherinopsids (Dyer, 2006). In nature, the silversides are usually found in southern Brazil, Uruguay, and Argentina (Bemvenuti, 2006). In these regions, some species have economic importance for production, flesh commercialization, and sport fishing (Menone et al., 2000; Somoza et al., 2008). Furthermore, some silversides have been used as experimental models for evolutionary, osmoregulatory, and environmental pollution studies because of their radiation after invading continental waters from the ocean, their euryhaline biology, and their occurrence in environments contaminated by glyphosate from rice and soybean monocultures in fields near their habitat (Menone et al., 2000; Carriquiriborde and Ronco, 2006; Tsuzuki et al., 2007; Piedras et al., 2009; Bloom et al., 2013; Hughes et al., 2017; Zebral et al., 2017). Moreover, the silversides have already been utilized to address various aspects of gene expression (Strobl-Mazzulla et al., 2005; Karube et al., 2007; Fernandino et al., 2008, 2011; Majhi et al., 2009; Miranda et al., 2009; Blasco et al., 2010; Gómez-Requeni et al., 2012; Pérez et al., 2012; González et al., 2015). The expression pattern of a gene can be used to assess physiological responses of fishes to environmental changes, such as those to which some silversides are exposed in their habitat (salinity variations and contamination by glyphosate) (Wang et al., 2014; Velasques et al., 2016).

Similarly, Odontesthes humensis de Buen, 1953, a species native to coastal lagoons of southern Brazil, Uruguay, and Argentina, has potential as a biological model. However, basic knowledge of the RGs is necessary before modern techniques such as qRT-PCR can be used to study the expression of genes of interest, regardless of the biological model. Up to now, no studies have focused on the sequence determination and validation of RGs in any species of Odontesthes. Therefore, the present study aimed to clone and sequence eight potential RGs, thereby identifying and validating the best RGs in O. humensis using different tissues, stages of development, and both a natural and an anthropized environmental condition, to standardize the qRT-PCR technique.

#### MATERIALS AND METHODS

#### Animals and Sample Collection

The silversides Odontesthes humensis used in this study were bred from eggs previously collected in nature (Arroio Grande, Brazil: 32°14′15″S/53°05′13″W) and hatched in tanks. The silversides were kept within an experimental room in nine 1,000 L circular plastic tanks with water at a temperature of 13.3 ± 1.8°C, pH 7.0 ± 0.7, salinity 3.2 ± 0.67 ppt, dissolved oxygen 9.5 ± 0.5 mg/L, ammonia levels lower than 0.6 mg/L, and a natural winter photoperiod. The tank sides were opaque to reduce visual stress, and once a week two-thirds of the water in the tanks was renewed. Three tanks were occupied by 3-year-old fish (5 animals/tank), classified as adults considering the complete development of gonads. Another three tanks were occupied by 1.5-year-old fish (5 animals/tank), classified as juveniles considering the partial development of gonads. In addition, three tanks were occupied by 6-month-old silversides (5 animals/tank), classified as immature due to the undifferentiated gonads. The animals were fed thrice a day on a commercial diet (Supra, 38% crude protein) and zooplankton ad libitum. The acclimation period was 4 weeks under these conditions.

**Abbreviations:** RG, reference gene.

The experimental design consisted of three treatment groups: the "control group," exposed to water in previously mentioned conditions; the "Roundup® group," exposed to Roundup® Transorb (a glyphosate-based herbicide) diluted in water at a concentration of 10 mg/L (acid equivalent, a.e.) of glyphosate; and the "seawater group," exposed to seawater at 30 ppt (dissolution of non-iodized sea salt in the medium). Three tanks were prepared for each treatment: the first received the adults, the second received the juveniles, and the third received the immature individuals. After the acclimation period, the animals were exposed to treatments for 24 h. The water quality parameters did not differ statistically from those observed in the acclimation period, except for the salinity of the group exposed to seawater. During the experimental period, the fish continued to be fed as during the acclimation period.

Each animal was hooked from the tank and anesthetized by submersion in benzocaine at 50 ppm. Three fish from each tank were collected, leading to a total of 27 animals. The animals were weighed, measured, and euthanized by cranial spinal section and excision of the brain. Adults, juveniles, and immature silversides had mean lengths of 20.2 ± 2.3 cm, 15.2 ± 1.2 cm, and 11.7 ± 0.7 cm and mean weights of 55.2 ± 11.7 g, 24.6 ± 6.5 g, and 9.6 ± 1.9 g, respectively. Brain, hepatopancreas, gills, and kidney (the main organs affected by glyphosate and/or hyperosmotic environments in fishes; Tipsmark et al., 2004; Carriquiriborde and Ronco, 2006; Hiroi and McCormick, 2012; Harayashiki et al., 2013; Velasques et al., 2016) from adults, juveniles, and immature silversides in the three treated groups were collected and preserved in liquid nitrogen (N<sub>2</sub>). Furthermore, a total of 20 µL of peripheral blood was collected by puncturing the caudal fin. Each blood sample was added to 1 mL of fetal bovine serum (FBS), stored at 4°C, and protected from light until used for flow-cytometry analysis. The animal use and all handling practices were approved by the Ethics Committee on Animal Experimentation of the Federal University of Pelotas (process no. 23110.007018/2015-85). The Roundup® residue from the experimental tanks was sent to the university chemical waste facility to be treated prior to disposal.

#### Flow-Cytometric Analysis

The flow-cytometry analysis was performed using the Attune® Acoustic Focusing Flow Cytometer (Applied Biosystems, USA) to evaluate the efficacy of treatments to affect the physiology of silversides. Each blood sample was exposed to 2 mM of Hoechst 33342 (H33342; Sigma-Aldrich, USA) for 5 min before each read, except in the DNA damage analysis. Events were detected by fluorochrome with violet laser (405 nm) and photomultiplier (PMT) VL1 (filter 450/40 nm). The green, orange, and red fluorescences were analyzed by blue laser (488 nm) and PMTs BL1 (filter 530/30 nm), BL2 (filter 575/24 nm), and BL3 (filter > 640 nm), respectively. Cytometry fluorescence stability was tested daily using standard beads (Invitrogen, USA). The acquisition rate was 200 events/s, totaling 20,000 events per sample. All assays were performed in triplicate. The results were analyzed using the Attune Cytometric Software v2.1. The non-cellular events were eliminated from the analysis by scatter plots of forward scatter (FSC) × side scatter (SSC) (Petrunkina et al., 2005) and negative fluorescence of H33342 events (debris).

For evaluation of reactive oxygen species (ROS) produced by the erythrocytes, 10 µL of blood sample previously collected and stored was added to 20 µL of saline solution with 2 µM of 2′,7′-dichlorofluorescein diacetate (DCFH-DA) and 5 µM of propidium iodide (PI) fluorescent probes (Sigma-Aldrich Co., USA). The samples were analyzed in triplicate after incubation for 60 min at 22°C in the dark. The ROS production was measured by the median of the intensity of the green fluorescence. Only intact cells (PI negative) were evaluated.

To evaluate DNA damage in the erythrocytes, 10 µL of blood sample previously collected and stored was added to 5 µL of TNE (0.01 M Tris-HCl, 0.15 M NaCl, 0.001 M EDTA, pH 7.2) and 30 s later to 10 µL of Triton 1X (Triton X-100, 1%, v/v). Then, 50 µL of acridine orange dye (2 mg/mL, #A6014, Sigma-Aldrich, USA) was added to the sample, followed by incubation from 30 s up to 2 min before each reading. The samples were analyzed in triplicate. The DNA of the erythrocytes was classified as intact (green fluorescence emission) or damaged (orange/red fluorescence emission). The DNA fragmentation index (DFI, %) was calculated as [median red fluorescence intensity/(median red + median green fluorescence intensities)] × 100.
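The DFI calculation described above can be written as a short Python sketch. The function name and the toy intensity values are hypothetical; in practice the per-event intensities would be exported from the cytometer software.

```python
from statistics import median


def dna_fragmentation_index(red_intensities, green_intensities):
    """DFI (%) = median red / (median red + median green) x 100."""
    red_median = median(red_intensities)
    green_median = median(green_intensities)
    return red_median / (red_median + green_median) * 100.0


# Toy per-event intensities (arbitrary units), for illustration only.
red = [10, 20, 30]
green = [60, 80, 100]
print(round(dna_fragmentation_index(red, green), 1))
```

With these toy values the medians are 20 (red) and 80 (green), so the sketch returns a DFI of 20%.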

#### RNA Extraction and cDNA Synthesis

The total RNA was extracted from tissue samples using the commercial RNeasy® Mini Kit (Qiagen, USA) following the manufacturer's instructions. The RNA was treated with the DNA-free® Kit (Ambion, USA) to remove genomic DNA contamination. Subsequently, the RNA concentration and purity were measured using a NanoVue™ Plus spectrophotometer (GE Healthcare Life Science, USA), and only the samples presenting high purity (OD<sub>260/280</sub> ≥ 2.0) were used. In addition, RNA quality was analyzed on the 4200 TapeStation system (Agilent Technologies, USA) using the TapeStation analysis application. The mean RIN for each experimental group is listed in Supplementary Table 1. First-strand cDNA synthesis was performed with 2 µg of RNA using the commercial High Capacity cDNA Reverse Transcription kit (Applied Biosystems, USA) according to the manufacturer's recommendations. The reactions were run in a SimpliAmp™ thermal cycler (Applied Biosystems, USA). Finally, the cDNA was stored at −20°C until use.

#### Amplification and Cloning of Candidate RGs

The candidate RGs tested were: 18S ribosomal RNA (18s), β-actin (actb), elongation factor 1 α (ef1a), eukaryotic translation initiation factor 3g (eif3g), glyceraldehyde-3-phosphate dehydrogenase (gapdh), histone h3a (h3a), Na+/K+-ATPase α (atp1a), and tubulin-α (tuba) (**Table 1**). The atp1a gene served as the control, as it is a constitutive gene reported to be affected by glyphosate-based herbicides and salt treatments (Shiogiri et al., 2012; Armesto et al., 2014).

TABLE 1 | Summary of candidate reference genes evaluated in the present study.


The gene fragments were amplified by PCR using primers designed in the PriFi online tool (https://services.birc.au.dk/prifi/) after alignments (Supplementary Table 2) of known sequences deposited in GenBank (**Table 2**). The PCR was performed using the mix buffer GoTaq® G2 Flexi DNA Polymerase (Promega, USA) following the manufacturer's instructions. The reactions were run in the SimpliAmp™ thermal cycler (Applied Biosystems, USA). The PCR parameters were: an initial denaturation step for 1 min at 94°C, followed by 35 cycles of 94°C for 30 s, 59.9 to 65.5°C (depending on the primer sequence, **Table 2**) for 30 s, and 72°C for 1 min, with a final extension of 5 min at 72°C. To confirm the amplification of the fragments, the reactions were analyzed using 1.5% agarose gel electrophoresis. The PCR products were inserted into the cloning vector pCR™4-TOPO® TA (Invitrogen, USA). The recombinant vector was used to transform electrocompetent Escherichia coli DH5α.

#### Purification and Sequencing of the Fragments of Interest

The transformed E. coli colonies were selected and cultured in 3 mL of Luria broth (LB) medium containing kanamycin. The resistant bacteria were used to isolate the vector, and the fragment was purified using the commercial kit Illustra plasmidPrep Mini Spin (GE Healthcare, USA) following the manufacturer's instructions. The purified samples of ten cloned fragments were submitted to sequencing by BigDye® Terminator v3.1 Cycle Sequencing (Applied Biosystems, USA) following the manufacturer's instructions. The reaction mixtures were incubated in a SimpliAmp™ thermal cycler (Applied Biosystems, USA) under the following parameters: 96°C for 1 min, 35 cycles of 96°C for 1 min, 50°C for 5 s, 60°C for 4 min.

One more purification was carried out using the BigDye® XTerminator™ Purification kit (Applied Biosystems, USA) according to the manufacturer's instructions. Following the purification, the samples were submitted to sequencing using an Applied Biosystems 3500 Genetic Analyzer® automatic sequencer (Life Technologies, USA). The sequences were analyzed using Vector NTI and deposited in GenBank®.

#### Gene Expression Analysis by qRT-PCR

The primers used for qRT-PCR were designed from the sequences of the cloned fragments using the Primer3 online tool (www.bioinfo.ut.ee/primer3-0.4.0/). The primers used in this study are presented in **Table 2**. The qRT-PCR was run in a Stratagene® Mx3005P™ Real-Time PCR System (Agilent Technologies, USA) using SYBR® Green PCR Master Mix (Applied Biosystems, USA). The amplification conditions were 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 60 s, and then the conditions needed to calculate the melting curve. All the reactions were performed in triplicate.

#### Data Analysis

The normality of the flow-cytometry data was evaluated by the Shapiro-Wilk test. None of the evaluated parameters presented a normal distribution; thus, the data were submitted to the Kruskal-Wallis non-parametric test followed by Dunn's all-pairwise multiple comparisons, with a significance level of 5%. The results of ROS production and DFI were expressed as the mean fluorescent intensity and mean percentage, respectively, ± standard error of the mean (SEM).

The cycle threshold (Ct) values were exported from the detection software MxPro™ v.4.1 (Stratagene, Waldbronn, Germany) to Excel files to facilitate the descriptive analysis and the value transformations required by some software. The specific analyses of stability of gene expression were performed using the comparative delta-Ct (dCt) method (Silver et al., 2006) and the geNorm (Vandesompele et al., 2002), NormFinder (Andersen et al., 2004), BestKeeper (Pfaffl et al., 2004), and RefFinder (http://150.216.56.64/referencegene.php?type=reference) statistical approaches as previously reported (Yang et al., 2013; Taki et al., 2014). The results were expressed as the average of standard deviation (SD) for the dCt method; the average expression stability value for geNorm and NormFinder; Pearson's correlation coefficient ([r]) for BestKeeper; and the geometric mean of ranking values for RefFinder.
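As an illustration of the comparative dCt approach, the Python sketch below computes, for each candidate gene, the SD of its Ct difference against every other gene across samples and then averages those SDs. This is a simplified reading of the method as commonly described, not the exact implementation used in this study; the gene labels and Ct values are hypothetical.

```python
from statistics import mean, stdev


def comparative_dct(ct_values):
    """Return {gene: average SD of pairwise dCt} from {gene: [Ct per sample]}.

    All genes must list their Ct values in the same sample order.
    A lower score indicates more stable expression.
    """
    genes = list(ct_values)
    scores = {}
    for gene in genes:
        pairwise_sds = []
        for other in genes:
            if other == gene:
                continue
            d_ct = [a - b for a, b in zip(ct_values[gene], ct_values[other])]
            pairwise_sds.append(stdev(d_ct))
        scores[gene] = mean(pairwise_sds)
    return scores


# Hypothetical Ct values for three genes in three samples.
toy_ct = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [25.0, 26.0, 27.0],   # co-varies perfectly with geneA
    "geneC": [30.0, 28.0, 33.0],   # varies independently
}
scores = comparative_dct(toy_ct)
for gene in sorted(scores, key=scores.get):
    print(gene, round(scores[gene], 2))
```

In this toy data set geneA and geneB tie as the most stable genes, and geneC receives the highest average SD, that is, the lowest stability.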

# RESULTS

#### Cloning and Sequencing of RGs

Except for the atp1a gene, which was previously known (Silveira et al., 2018), all the other seven silverside O. humensis candidate RGs analyzed in this study were cloned and partially sequenced for the first time. The new sequences were deposited in GenBank® (**Table 1**). The sequence length of 18s was 640 base pairs (bp). The cloned fragments of actb, ef1a, eif3g, gapdh, h3a, and tuba from O. humensis were 350, 638, 435, 325, 222, and 527 bp long and encoded 116, 212, 144, 108, 73, and 176 amino acids, respectively. The cloned fragment of tuba belongs to open reading frame (ORF) +1, the fragments of actb and eif3g are part of ORF +2, and the fragments of ef1a, gapdh, and h3a are in ORF +3.



E, efficiency; bp, base pair; R<sup>2</sup>, Pearson coefficient of determination; Tm, melting temperature.

#### Flow-Cytometry Analysis

The exposure of silversides for 24 h to both Roundup® and seawater caused an increase in ROS production in erythrocytes (P < 0.05) when compared to the control group (**Figure 1A**). Similarly, the Roundup® and seawater treatments increased (P < 0.05) the DFI of the erythrocytes in the exposed silversides (**Figure 1B**). The increase in ROS production and DNA damage in the treated fish confirmed that both Roundup® and seawater treatments negatively affected the physiology of silversides, thereby generating a systemic effect on fish.

### The Comparative dCt Method Analysis

The comparative dCt method was used to select the most stable RG. A low average SD value represents a low expression variance, or high stability. The analysis revealed the following order of stability of the evaluated genes, from highest to lowest, in water with controlled quality parameters (control group): h3a (2.57) > ef1a (2.82) = actb (2.82) > eif3g (3.00) > 18s (3.21) > gapdh (3.66) > tuba (3.77) > atp1a (4.98) (**Figure 2A**). The ranking order of stability of the evaluated genes upon exposure of silversides to the Roundup® treatment was h3a (2.74) > actb (2.79) > ef1a (2.95) > eif3g (3.03) > 18s (3.51) > tuba (3.82) > gapdh (3.95) > atp1a (6.18) (**Figure 2B**). In the saline condition, the seawater-treated group presented the following stability order: h3a (2.62) > eif3g (2.91) > ef1a (3.00) > actb (3.19) > 18s (3.72) > tuba (3.84) > gapdh (3.90) > atp1a (5.05) (**Figure 2C**).
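The idea behind the comparative dCt ranking can be illustrated with a short sketch. It assumes one Ct value per gene per sample, listed in the same sample order for every gene; the function name, gene labels, and Ct values are hypothetical and do not come from this study.

```python
from itertools import combinations
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Map each gene to the mean SD of its pairwise delta-Ct series.

    ct maps gene name -> list of Ct values (same sample order for all genes).
    A lower value indicates more stable expression.
    """
    pair_sds = {gene: [] for gene in ct}
    for gene_a, gene_b in combinations(ct, 2):
        # Delta-Ct of the pair in every sample, then its spread across samples.
        deltas = [a - b for a, b in zip(ct[gene_a], ct[gene_b])]
        sd = stdev(deltas)
        pair_sds[gene_a].append(sd)
        pair_sds[gene_b].append(sd)
    return {gene: mean(sds) for gene, sds in pair_sds.items()}


# Hypothetical Ct values for three candidate genes in four samples.
example_ct = {
    "gene_a": [20.0, 21.0, 22.0, 23.0],
    "gene_b": [25.0, 26.0, 27.0, 28.0],
    "gene_c": [30.0, 28.0, 33.0, 29.0],
}
stability = delta_ct_stability(example_ct)
ranking = sorted(stability, key=stability.get)
print(stability, ranking)
```

In this toy data set, gene_a and gene_b vary together across samples, so their delta-Ct is constant and both score lower (more stable) than gene_c.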

#### GeNorm Analysis

The geNorm analysis was carried out after transforming the Ct values into relative quantities through the formula 2<sup>(minimum Ct value in a sample set − Ct value of a sample)</sup> and comparing the pairwise variation (SD values) for each gene pair. Then, the geometric mean of SD values was used to calculate the M-value. A low average expression stability value represents a low variance. The geNorm analysis revealed the following ranking order of stability of genes in the control group: h3a/actb (1.14) > eif3g (1.62) > ef1a (1.85) > 18s (2.19) > tuba (2.53) > gapdh (2.81) > atp1a (3.35) (**Figure 3A**). In the Roundup® group, the sequence of stability of the evaluated genes from the highest to the lowest was h3a/actb (1.19) > eif3g (1.43) > ef1a (1.61) > tuba (2.09) > 18s (2.44) > gapdh (2.77) > atp1a (3.62) (**Figure 3B**). When exposed to seawater, the silverside genes exhibited the following stability ranking order: h3a/eif3g (1.48) > actb (1.68) > ef1a (1.85) > tuba (2.36) > 18s (2.74) > gapdh (3.02) > atp1a (3.53) (**Figure 3C**).
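A rough sketch of the relative-quantity transformation and of a geNorm-style stability value is given here for orientation. It is an illustration under stated assumptions rather than the geNorm software itself: M is taken as the arithmetic mean of the pairwise SDs of log2 expression ratios, the stepwise exclusion of the least stable gene is omitted, and all names and Ct values are hypothetical.

```python
import math
from statistics import mean, stdev


def relative_quantities(ct_values):
    """Apply 2 ** (minimum Ct - Ct); the sample with the lowest Ct gets 1.0."""
    lowest = min(ct_values)
    return [2.0 ** (lowest - ct) for ct in ct_values]


def genorm_like_m(ct):
    """Map each gene to a geNorm-style M-value (lower = more stable).

    Assumption: M is the arithmetic mean of the SDs of the log2 ratios
    between the gene and every other candidate gene.
    """
    quantities = {gene: relative_quantities(v) for gene, v in ct.items()}
    m_values = {}
    for gene, q_gene in quantities.items():
        pair_sds = []
        for other, q_other in quantities.items():
            if other == gene:
                continue
            log_ratios = [math.log2(x / y) for x, y in zip(q_gene, q_other)]
            pair_sds.append(stdev(log_ratios))
        m_values[gene] = mean(pair_sds)
    return m_values


# Hypothetical Ct values for three candidate genes in four samples.
example_ct = {
    "gene_a": [20.0, 21.0, 22.0, 23.0],
    "gene_b": [25.0, 26.0, 27.0, 28.0],
    "gene_c": [30.0, 28.0, 33.0, 29.0],
}
m_values = genorm_like_m(example_ct)
print(relative_quantities([20.0, 21.0, 22.0]), m_values)
```

Because geNorm removes the least stable gene step by step until two genes remain, the two best genes share a single M-value in its output, which is why the rankings above begin with a gene pair.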

#### NormFinder Analysis

NormFinder was applied to identify the most stable of the evaluated genes. A low average expression stability value represents a low variance. The NormFinder analysis revealed the ranking order of stability in the control group to be h3a (0.79) > ef1a (1.26) > actb (1.63) > eif3g (1.95) > 18s (2.06) > gapdh (2.79) > tuba (3.05) > atp1a (4.55) (**Figure 4A**). For the Roundup® group, the ranking of expression stability was h3a (0.89) > actb (1.23) > ef1a (1.43) > eif3g (1.79) > 18s (2.15) > tuba (2.89) > gapdh (3.01) > atp1a (5.83) (**Figure 4B**). When exposed to seawater, the ranking order of RGs was h3a (0.74) > ef1a (1.53) > eif3g (1.61) > actb (2.19) > 18s (2.70) > tuba (2.97) > gapdh (3.04) > atp1a (4.54) (**Figure 4C**). In addition, the best combination of genes for the experimental conditions was evaluated. A comparison of the gene expression in silversides of the control and the Roundup® groups, and of the control and the seawater groups, revealed the best combination of genes to be h3a and ef1a, with combined stability values of 0.28 and 0.29, respectively. The most suitable combination of two genes for all three tested conditions was also h3a and ef1a, with a combined stability value of 0.26.

#### BestKeeper Analysis

The BestKeeper analysis provided two ways to rank gene stability: one based on the SD of the Ct values across samples and the other based on Pearson's correlation of expression among the genes. Thus, the genes with low SDs and high correlation with the BestKeeper index (indicating high similarity among the expression levels of the RGs) are ranked as the most stable genes. To construct the consensus comprehensive ranking, the SD values were used, which is a more conservative approach. However, to construct the rankings of the most stable genes with BestKeeper, the [r] and P-values of Pearson's correlation were employed. This approach is more sophisticated and statistically robust, as it results in rankings more similar to those obtained using the other algorithms.
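The correlation-based ranking can be sketched as follows, assuming that the BestKeeper index of a sample is the geometric mean of the Ct values of the candidate genes in that sample. For simplicity, this sketch builds the index from all candidates and does not compute P-values; the function names and Ct values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def bestkeeper_like_correlations(ct):
    """Correlate each gene's Ct series with the per-sample index.

    ct maps gene name -> list of Ct values (same sample order for all genes).
    A higher r indicates expression that tracks the index more closely.
    """
    genes = list(ct)
    n_samples = len(ct[genes[0]])
    index = [geometric_mean([ct[g][i] for g in genes]) for i in range(n_samples)]
    return {g: pearson_r(ct[g], index) for g in genes}


# Hypothetical Ct values for three candidate genes in four samples.
example_ct = {
    "gene_a": [20.0, 21.0, 22.0, 23.0],
    "gene_b": [25.0, 26.0, 27.0, 28.0],
    "gene_c": [30.0, 28.0, 33.0, 29.0],
}
correlations = bestkeeper_like_correlations(example_ct)
print(correlations)
```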

The analyses based on Pearson's correlation revealed the following ranking order of stability in the control group: h3a (0.93) > actb (0.90) > ef1a (0.80) > eif3g (0.79) > tuba (0.76) > 18s (0.62) > gapdh (0.46) > atp1a (0.24) (**Figure 5A**). The genes presented a positive correlation with the BestKeeper index with P = 0.001, except atp1a with P = 0.05. In the Roundup® group, the ranking of the best RG was h3a (0.83) = actb (0.83) > ef1a (0.79) > eif3g (0.78) > tuba (0.70) > gapdh (0.49) > 18s (0.41) > atp1a (0.26) (**Figure 5B**). All correlation analyses presented P = 0.001, except atp1a with P = 0.08. In the seawater group, the ranking order from the most to the least stable genes was h3a (0.91) > ef1a (0.89) > actb (0.87) > eif3g (0.83) > tuba (0.76) > gapdh (0.49) > 18s (0.23) > atp1a (0.22) (**Figure 5C**). The genes atp1a and 18s exhibited P = 0.1 and P = 0.08, respectively. All the other genes displayed positive correlations with P = 0.001.

FIGURE 5 | Stability analysis of the candidate reference genes in Odontesthes humensis calculated by the BestKeeper algorithm. Gene expression stability in silversides exposed to water with controlled quality parameters (A); to Roundup® Transorb [10 mg·L<sup>−1</sup> (acid equivalent) of glyphosate] (B); and to seawater (30 ppt) (C) across the different treatments for 24 h. The most stable genes are displayed on the left, and the least stable genes are displayed on the right of the x-axis.

#### RefFinder Analysis

The RefFinder is an online tool used to construct a consensus comprehensive ranking of stability of RGs among all the other methods using the calculation of geometric mean for the ranks calculated by each of other algorithms. Candidate genes with the lowest geometric mean are most stable.
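The consensus step amounts to taking, for each gene, the geometric mean of the ranks assigned by the individual methods. A minimal sketch is given here; the function name and the ranks are hypothetical and are not taken from the tables of this study.

```python
import math


def consensus_ranking(ranks_by_method):
    """Return (gene, geometric mean of ranks) pairs, most stable first.

    ranks_by_method maps method name -> {gene name: rank}, with rank 1
    meaning the most stable gene according to that method.
    """
    genes = list(next(iter(ranks_by_method.values())))
    scores = {}
    for gene in genes:
        ranks = [method_ranks[gene] for method_ranks in ranks_by_method.values()]
        scores[gene] = math.exp(sum(math.log(r) for r in ranks) / len(ranks))
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical ranks from four methods for three candidate genes.
example_ranks = {
    "dCt": {"gene_x": 1, "gene_y": 2, "gene_z": 3},
    "geNorm": {"gene_x": 1, "gene_y": 3, "gene_z": 2},
    "NormFinder": {"gene_x": 1, "gene_y": 2, "gene_z": 3},
    "BestKeeper": {"gene_x": 2, "gene_y": 1, "gene_z": 3},
}
consensus = consensus_ranking(example_ranks)
print(consensus)
```

In this example, a gene ranked first by three methods and second by the fourth scores 2<sup>1/4</sup> ≈ 1.19, and a gene ranked last by every one of n methods scores exactly its rank.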

In addition to the graphical rankings, the RefFinder consensus results are presented by treatment group (control, Roundup®, and seawater), by developmental stage (adults, juveniles, and immature), and by tissue type (brain, gills, hepatopancreas, and kidney).

In the control group, the most stable gene across the different ages and tissues was h3a. The only exception was the brain samples, where the most stable gene was tuba. The least stable gene across the developmental stages and tissues was atp1a (**Table 3**).

In the Roundup® group, the most stable gene across different ages was h3a in adults and immature fish and ef1a in juveniles. Among the tissues, the most stable gene was also h3a in gills and hepatopancreas, but actb was the most stable in brain and kidney. As expected, the gene with the lowest stability was atp1a across different ages and tissues, except in hepatopancreas, where tuba exhibited the lowest stability (**Table 4**).

The h3a gene displayed the highest stability in different developmental stages in the seawater group. It was also the most stably expressed gene in the gills and hepatopancreas, while actb was the most stable in brain and kidney. The gene with the lowest stability across the ages and tissues was atp1a, except in hepatopancreas, where tuba exhibited the lowest stability in expression (**Table 5**).

TABLE 3 | Consensus stability ranking by RefFinder of the candidate reference genes in Odontesthes humensis under normal conditions.

TABLE 5 | Consensus stability ranking by RefFinder of the candidate reference genes in Odontesthes humensis exposed to seawater.

In the control group, the consensus comprehensive ranking constructed by RefFinder was h3a (1.32) > ef1a (2.38) > actb (2.59) > 18s (3.34) > eif3g (4.12) > gapdh (5.63) > tuba (6.96) > atp1a (7.74) (**Figure 6A**). In the Roundup® group, the ranking order of stability was h3a (1.19) > actb (1.86) > 18s (3.50) > ef1a (3.66) > eif3g (3.72) > tuba (5.96) > gapdh (6.74) > atp1a (8.00) (**Figure 6B**). The consensus ranking of stability constructed based on the expression data of the seawater group was h3a (1.19) > eif3g (2.06) > ef1a (3.13) > 18s (3.50) > actb (4.12) > tuba (5.96) > gapdh (6.44) > atp1a (8.00) (**Figure 6C**). The final consensus stability analysis, considering all methods and grouping the different tissues, ages, and treatment groups, revealed the following ranking order: h3a (1.19) > ef1a (2.63) > actb (2.99) > eif3g (3.22) > 18s (3.34) > gapdh (6.48) > tuba (6.48) > atp1a (8.00) (**Figure 6D**).

#### DISCUSSION

In the present study, individual methods (comparative dCt, geNorm, NormFinder, and BestKeeper) were used to obtain independent rankings of gene expression stability of candidate RGs in O. humensis, together with an algorithm that deduces a consensus ranking among all methods (RefFinder), as in recent studies on RGs (Taki et al., 2014; Xu et al., 2014; Huang et al., 2015). Different algorithms can give different results depending on the set of analyzed genes and experimental variables (Yang et al., 2013; Zhang et al., 2013; Liu et al., 2014; Purohit et al., 2016), and no single statistical approach can cover all variables (Taki et al., 2014). For these reasons, the use of more than one algorithm is recommended. The different results presented by each algorithm are natural and expected, given their distinct statistical approaches to constructing the rankings (Liu et al., 2014). For example, the dCt method compares the relative expression of gene pairs through their SD values to find the most stable gene. GeNorm goes further: it calculates the M-value from the SD values and the geometric mean and compares the results of all gene pairs. BestKeeper also starts from the SD values of gene expression, and a geometric mean of the genes with the most stable Ct values is used to calculate the BestKeeper index. Then, Pearson's correlation coefficient ([r]) and the P-value are calculated to determine the similarity between the expression of the candidate RGs. NormFinder is based on analysis of variance (ANOVA) and is the only one of the algorithms used that evaluates the intragroup variation (i.e., replicates of the same treatment group) in addition to the intergroup variation (i.e., control vs. Roundup® or seawater groups), which is also used by the pairwise comparison approaches, and combines both variations into a stability value. In this study, some differences were obtained in the ranking positions occupied by the genes in the results of each algorithm, mainly from the second to the penultimate positions (**Figures 2**–**5**). For this reason, the RefFinder analysis, which calculates the geometric mean of the previously obtained ranks, was important for unifying the discrepant results into a consensus.

In the current study, the most stable gene across the treatment groups, independent of the method of analysis, including the final consensus RefFinder analysis, was h3a. This gene was also the most stable across different ages, except in the juveniles from the Roundup® group. However, the consensus analysis of the tissue samples revealed that h3a was not as stable as in the age analysis. For example, the results of the brain analysis demonstrated that this gene was not the most stable in any treatment. Moreover, h3a did not exhibit the highest stability in the kidney of the Roundup® group. The h3a gene has already been evaluated in RG validation studies, showing high stability (Taylor et al., 2007; Koenigstein et al., 2013), and can be used to normalize qRT-PCR reactions (Ramachandra et al., 2008). However, up to now, no study has reported this gene as a candidate RG in a fish species. In the final consensus ranking, the second most stable gene was ef1a. All the analyses performed revealed this gene to have high stability, occupying at least the third position of the constructed rankings, except in the consensus ranking of the Roundup® group constructed by RefFinder, where it occupied the fourth position. A deviation in the ranking order of stability is natural owing to the use of different algorithms. Studies on the stability of candidate RGs in different experimental protocols have revealed that ef1a is usually the most stable gene in fishes (Jorgensen et al., 2006; Tang et al., 2007, 2012; Infante et al., 2008; Liman et al., 2013; Hu et al., 2014).

The studies that evaluated gene expression in South American silversides have used actb as the internal control for qRT-PCR (Strobl-Mazzulla et al., 2005; Karube et al., 2007; Fernandino et al., 2008, 2011; Majhi et al., 2009; Miranda et al., 2009; Blasco et al., 2010; Gómez-Requeni et al., 2012; Pérez et al., 2012; González et al., 2015). In the present study, the RefFinder analysis focused on the individual tissues revealed actb to be the most stable gene in brain and kidney from the Roundup® group and in the brain from the seawater group. Only the geNorm analysis of stability, based on the control and Roundup® groups, revealed actb as the most stable gene. Overall, however, our results showed that actb was not the most stable gene: the RefFinder final consensus comprehensive ranking placed this gene as the third most stable in O. humensis. There is still much debate on the use of actb to normalize qRT-PCR reactions in fishes. Some authors have reported that this gene is the least stable in some fish species (Jorgensen et al., 2006; Filby and Tyler, 2007; Yang et al., 2013; Hu et al., 2014; Chapman and Waldenström, 2015). However, the studies that present actb as a good RG have also reported that its high stability in fishes is tissue dependent within a species (Zheng and Sun, 2011; Zhang et al., 2013; Sun and Hu, 2015), as in O. humensis. This does not mean that the studies using actb as the RG should be disregarded. However, future studies should pay more attention to the choice of RG.

The use of the 18s gene as an RG in fish models has been a subject of discussion. This gene has been shown to be the most stable, or among the most stable, candidates in some fish species (Jorgensen et al., 2006; Filby and Tyler, 2007; Tang et al., 2007; Small et al., 2008; Yang et al., 2013; Liu et al., 2014). However, in other species, this gene is considered unsuitable for use as an internal control for qRT-PCR (Infante et al., 2008; Hu et al., 2014; Purohit et al., 2016). In O. humensis from the control, Roundup®, and seawater groups, and across different ages and tissues, 18s generally held an intermediate position in the rankings, attracting little attention for its use in normalization methods. The gapdh gene, like actb and 18s, has presented discrepant results in the literature on its expression stability. The gapdh gene has already been reported as one of the most stable genes in half-smooth tongue sole (Liu et al., 2014). Furthermore, its expression stability usually depends on the evaluated tissue, and it can occupy both extremes of the stability rankings (Yang et al., 2013; Zhang et al., 2013). Thus, gapdh is generally rejected as a suitable gene for efficient normalization (Jorgensen et al., 2006; Filby and Tyler, 2007; Tang et al., 2007, 2012; Infante et al., 2008; Small et al., 2008; Ahi et al., 2013; Liman et al., 2013; Hu et al., 2014; Chapman and Waldenström, 2015). In the present study, gapdh appeared to be the second most unstable gene, after only the atp1a gene.

The tuba gene usually presents high stability in studies validating candidate RGs. This gene was the second most stable after analysis by dCt and NormFinder in Nile tilapia under normal conditions (Yang et al., 2013). This gene also exhibited high stability in intestine and liver from Japanese flounder at 24 and 72 h after viral infection, respectively, as well as in muscle of turbot at 72 h postinfection (Zhang et al., 2013). Although the tuba gene presented higher stability in the brain of silversides from the control group, it did not maintain this expression pattern. In the brain of silversides from the other treatment groups, tuba had low stability levels. Across other tissues and different developmental stages, the tuba gene frequently occupied the last places of the final ranking, even behind atp1a, as well as across treatment groups and the different algorithms used. The final consensus comprehensive ranking constructed by RefFinder presented tuba as the most unstable gene in O. humensis, disregarding atp1a. Finally, the control gene atp1a, despite being a constitutive gene, was the most unstable, as expected. It occupied the last position in the rankings of stability constructed by all algorithms used to compare the three treatment groups. This gene also exhibited lower stability across different ages and tissues from silversides from all treatments, except in the hepatopancreas of the Roundup® and seawater groups.

In the present study, a set of candidate RGs was analyzed for the first time in O. humensis. The genes atp1a, tuba, gapdh, and 18s were the least stable and highly unsuitable to be used for qRT-PCR normalization in studies in silverside O. humensis. The three most stable genes were h3a, ef1a, and actb. Given the discordance observed among the results of analyses of the candidate RGs in different fish species, tissues, developmental stages, and experimental conditions, the importance of identifying efficient RGs for each experimental design, instead of using traditional genes indiscriminately, is evident. Furthermore, the gene h3a exhibited a greater potential to be used as an RG in O. humensis in comparison with other traditional genes, such as actb and ef1a. The h3a gene may have the same potential to be the best RG in other fish species, since the stability of this gene has never been evaluated in other fishes. Therefore, the evaluation of h3a as a candidate RG in fish species other than O. humensis or silversides is warranted in future studies.

### AUTHOR CONTRIBUTIONS

TS and VC were responsible for experimental design, data analysis, and manuscript writing. TS, MR, and RR were responsible for fish acclimation and maintenance. TS, BB, IL, LS, MR, RR, and WD were responsible for the biological collections. TS, BB, IL, LS, WD, and VC were responsible for the molecular biology, from RNA extraction to sequencing and qRT-PCR analysis. TC and FS were also responsible for qRT-PCR analysis. AV, CC, and DM were responsible for the flow-cytometry analysis.

#### FUNDING

This study was supported by the Ministério da Ciência, Tecnologia e Inovação/Conselho Nacional de Desenvolvimento Científico e Tecnológico (Edital Universal #472210/2013-0 and #422292/2016-8) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (AUXPE 2900/2014). TS, WD, MR, and DM are individually supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. AV, CC, TC, FS, RR, and VC are also individually supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico.

#### ACKNOWLEDGMENTS

We are very grateful to Mr. Nilton Link and Mr. Vítor Colvara for their help in the maintenance of the animals.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00075/full#supplementary-material

#### REFERENCES

variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 64, 5245–5250. doi: 10.1158/0008-5472.CAN-04-0496

Armesto, P., Campinho, M. A., Rodríguez-Rúa, A., Cousin, X., Power, D. M., Manchado, M., et al. (2014). Molecular characterization and transcriptional regulation of the Na<sup>+</sup>/K<sup>+</sup> ATPase α subunit isoforms during development and salinity challenge in a teleost fish, the Senegalese sole (Solea senegalensis). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 175, 23–38. doi: 10.1016/j.cbpb.2014.06.004


a test on multiple genes and tissues in a model ascidian Ciona savignyi. Gene 576, 79–87. doi: 10.1016/j.gene.2015.09.066


exposure to pejerrey, Odontesthes bonariensis, a South American teleost fish. Environ. Toxicol. Chem. 31, 941–946. doi: 10.1002/etc.1789


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Silveira, Domingues, Remião, Santos, Barreto, Lessa, Varela Junior, Martins Pires, Corcini, Collares, Seixas, Robaldo and Campos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Unrevealing Parasitic Trophic Interactions—A Molecular Approach for Fluid-Feeding Fishes

Karine O. Bonato, Priscilla C. Silva and Luiz R. Malabarba\*

Laboratório de Ictiologia, Programa de Pós-Graduação em Biologia Animal, Departamento de Zoologia, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

Fish diets have traditionally been studied through the direct visual identification of food items found in their stomachs. Stomach contents of the parasitic catfishes of the Vandeliinae and Stegophilinae (family Trichomycteridae), however, cannot be identified by the usual optical methods because of their mucophagic, lepidophagic, or hematophagic diets, so that the trophic interactions and the dynamics of food webs in aquatic systems involving these catfishes are mostly unknown. Knowledge about trophic interactions, including relationships between parasites and hosts that are difficult to study, is crucial to understanding how food webs work as a whole. In this context, molecular markers can be useful to determine the true hosts of these catfishes, demonstrating a preference in their feeding behavior for specific organisms rather than a generalist habit. Sequences of cytochrome oxidase subunit 1 (COI) were successfully extracted and amplified from mucus or scales found in the stomach contents of two species of stegophilines, Homodiaetus anisitsi and Pseudostegophilus maculatus, to identify the host species. The two species were found to be obligatory mucus-feeders and occasionally lepidophagic. Selection of host species is associated with host behavior, with hosts being mainly substrate-sifting benthivores. Characiformes are the preferred hosts, but host choice depends on which characiform species are available in the environment, usually corresponding to the most abundant species. This is the first time that the host species of parasitic fishes with mucophagous habits have been identified, and it demonstrates the effectiveness of extracting and amplifying mitochondrial DNA from ingested mucus in gut contents. The molecular markers effectively allowed parasite preferences to be determined and help in better understanding the food webs and trophic interactions in which fish species are involved. Moreover, the methodology applied here can be used for a wide variety of organisms, improving ecological trophic studies.

Keywords: annealing blocking primer, DNA barcode, food webs, parasite-host interaction, stegophilinae, vandeliinae

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Gonzalo Gajardo, University of Los Lagos, Chile; Igor De Paiva Affonso, Universidade Tecnológica Federal do Paraná, Brazil

> \*Correspondence: Luiz R. Malabarba malabarb@ufrgs.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Ecology and Evolution

> Received: 19 August 2017 Accepted: 26 February 2018 Published: 15 March 2018

#### Citation:

Bonato KO, Silva PC and Malabarba LR (2018) Unrevealing Parasitic Trophic Interactions—A Molecular Approach for Fluid-Feeding Fishes. Front. Ecol. Evol. 6:22. doi: 10.3389/fevo.2018.00022

# INTRODUCTION

The role of parasites in food webs has been largely disregarded (Sukhdeo, 2012). As an example, Winemiller and Polis (1996), the most important contribution to food web studies, did not address the parasite-host interaction as part of trophic webs. Since Elton's insights (Elton et al., 1931), it has been known that parasites are very important links in food webs and are capable of exerting major effects on ecological interactions. Today, there is no longer the need to argue that parasites must be included in all models of ecosystem function (Sukhdeo, 2012). However, many questions remain, such as how parasites might fit in food webs, whether they should be included in or excluded from food webs, what the role of parasites is in host population regulation, and what the evolutionary and ecological implications of parasite mediation in trophic interactions are (Sukhdeo and Hernandez, 2005). Nevertheless, small alterations in the position of parasites and hosts can change the food chain length, connectance, and the establishment of food patterns (Huxham et al., 1995; Leaper and Huxham, 2002).

Indeed, only dietary studies allow the comprehension of the trophic interactions and of the dynamics of food webs specific for fishes in aquatic systems (Winemiller and Polis, 1996; Carreon-Martinez and Heath, 2010; Murray et al., 2011). For these purposes, food items are traditionally identified through visual analysis of stomach, gut, or fecal contents under microscope (Hyslop, 1980; Taguchi et al., 2014) and through stable isotope analysis (DeNiro and Epstein, 1978, 1981; Post, 2002; Mercado-Silva et al., 2015). None of these methods, however, allow the identification of items that are easily digested or of specific parasitic-host (or predator-prey) interactions, which may cause a bias in data interpretation (Sheppard and Harwood, 2005; Paquin et al., 2014). This may be especially problematic when dealing with small species (Sheppard and Harwood, 2005; Jo et al., 2014; Paquin et al., 2014).

Since the 1940s, many methods of diet analysis have been developed, with molecular tools coming into use more recently (Symondson, 2002; King et al., 2008; Hardy et al., 2010; Pompanon et al., 2012; Taguchi et al., 2014). Molecular techniques are efficient for the identification of small prey or digested items that cannot be identified through traditional methodologies. They are also presumed to allow a higher taxonomic resolution (Carreon-Martinez et al., 2011) through the use of sequences of specific regions of the mitochondrial genome to identify any species (DNA barcode). In most animal groups, the cytochrome oxidase subunit 1 (COI) is the reference for the DNA barcoding system (Hebert et al., 2003). COI sequence libraries are available through online systems (such as GenBank and BOLD), enabling their use in the identification of host or prey species (Valentini et al., 2009; Corse et al., 2010; Leray et al., 2013a; Jo et al., 2014). DNA barcode techniques have been used to determine host-parasitoid webs in arthropods (Hrček and Godfray, 2014), as in Lepidoptera (Janzen et al., 2009) and Hemiptera (Gordon and Weirauch, 2015). However, in fish species, establishing parasitic interactions has been reported as difficult (Sazima, 1983; Lima et al., 2012). According to Paine (1980), "The central significance of webs is derived from the fact that the links between species are often easily identified and the resultant trophic scaffolding provides a tempting descriptor of community structure." Thus, a lack of understanding of the parasite-host interaction makes it impossible to understand food webs and trophic ecology in ecosystems.

The Neotropical catfish family Trichomycteridae is composed of eight subfamilies, two of which consist exclusively of fishes referred to as parasites, the Stegophilinae and Vandeliinae (de Pinna, 1998, 2016; Datovo and Bockmann, 2010; Ferrer and Malabarba, 2013). The vandeliines feed exclusively on blood from the gills of other fishes (Kelley and Atz, 1964; Machado and Sazima, 1983), while the stegophilines have been reported as scale- and mucus-feeders (Eigenmann and Allen, 1942; Roberts, 1972; Baskin et al., 1980; Machado and Sazima, 1983; Winemiller and Yan, 1989; Neto and de Pinna, 2016). The feeding habits of the representatives of a third subfamily, the Tridentinae, remain uncertain, but there are circumstantial records of semiparasitism (scale-eating) or predation on small invertebrates among its species. Certainly, their biology is key to understanding the evolution of the parasitic feeding behavior of stegophilines and vandeliines (de Pinna, 2016).

The species of the stegophiline genus Homodiaetus are small (maximum 42.0 mm of standard length) and translucent when alive, except for the head and abdominal region (Koch, 2002). Homodiaetus anisitsi Eigenmann and Ward, 1907 (**Figure 1**) is found in lakes and rivers in the lower Paraná-Paraguay System and coastal river drainages of the Rio Grande do Sul State, Brazil (Koch, 2002). The stegophiline Pseudostegophilus maculatus (Steindachner, 1879; **Figure 1**) reaches a maximum size of 60.0 mm of standard length and is found in the lower Paraná and Uruguay river basins, South America (de Pinna and Wosiacki, 2003). Specimens of both species inhabit environments with fine-grained sandy bottoms. In aquarium observations, specimens of H. anisitsi remain buried in the sandy bed except for the eyes. They quickly move out from the substrate and attach to the side of fishes passing by to ingest mucus and, occasionally, scales (LRM pers. obs.).

Host identification of mucophagous, lepidophagous, and hematophagous trichomycterids cannot be made by conventional analysis of the stomach contents due to the nature of the ingested items. This is notably exemplified by Winemiller and Yan (1989), who examined 245 specimens of the stegophiline Ochmacanthus alternus Myers, 1927. They were able to identify the presence of ingested mucus in 95% of the stomachs, but were not able to identify a single host species among the 88 fish species found syntopic with this catfish. Likewise, there is no information available about the host species exploited by H. anisitsi, P. maculatus, or other stegophilines in natural environments. The same is true for most vandeliines, except for a few visual records of specimens observed attached to gill arches of some large fishes (Zuanon and Sazima, 2005) or experiments in aquaria or observations on large fishes tied to river banks (Machado and Sazima, 1983; Zuanon and Sazima, 2004).

FIGURE 1 | Studied species of Stegophilinae. (a) Homodiaetus anisitsi, 32 mm standard length, specimen from Lagoa dos Quadros, rio Tramandaí basin, Rio Grande do Sul State, Brazil. (b) Pseudostegophilus maculatus, 38 mm standard length, specimen from rio Ibicuí, rio Uruguay basin, Rio Grande do Sul State, Brazil. Arrows indicate the digestive tract, visible by transparency as a relatively straight tube from the mouth to the anus.

Considering mucophagous or hematophagous catfishes, molecular techniques provide an alternative to determine the origin of the food items ingested by mucus-, scale- and blood-feeders, and are considered a powerful tool for studying feeding ecology (King et al., 2008). However, this tool has not yet been used to identify the species to which the items found in stegophiline stomachs belong. Therefore, our hypothesis is that molecular markers can be useful to determine the true hosts of these catfishes, demonstrating a preference in their feeding behavior for specific organisms rather than a generalist habit. Accordingly, the aims of this study are: (1) to verify the viability of DNA extraction and amplification of the cytochrome oxidase subunit 1 gene (COI) from the mucus and occasionally scales ingested by H. anisitsi and P. maculatus; (2) to identify their hosts; and (3) to establish specific parasite-host interactions.

### METHODS AND MATERIALS

#### Ethics Statement

This study was approved by the Animal Ethics Committee of the Universidade Federal do Rio Grande do Sul (Permit Number: 24434) and was conducted in accordance with protocols in their ethical and methodological aspects for the use of fish.

#### Analysis

Sixty-three specimens of H. anisitsi and 18 specimens of P. maculatus (**Tables 1**, **2** and Supplementary Table 1) were analyzed, and vouchers were cataloged in the fish collection of the Departamento de Zoologia, Universidade Federal do Rio Grande do Sul (UFRGS; CGEN Process # 02000.002101/2007-25) (Supplementary Table 2).

The digestive tract of H. anisitsi and P. maculatus is a relatively straight tube from the mouth to the anus (**Figure 1**) with a very thin wall. The entire digestive tract was opened and all the content was examined under a stereomicroscope to identify the food items. Immediately afterwards, the ingested items and a tissue sample of each dissected specimen were transferred to separate, empty, sterilized 0.2 µl tubes for DNA extraction. DNA was extracted from all stomach contents using the Phire Animal Tissue Direct PCR Kit (Thermo Scientific), following the manufacturer's instructions. Extracted DNA samples were then submitted to two different protocols for gene amplification.

The PCR reactions were carried out in a reaction volume of 20 µL [12.9 µL of H2O, 2 µL of 10× reaction buffer (Platinum® Taq), 0.6 µL of MgCl2 (50 mM), 2 µL of dNTPs (2 mM), 0.2 µL of each primer (10 µM), 2.0 µL of the blocking primer, 0.1 µL (5 U) of Platinum® Taq (Invitrogen), and 100 ng of template DNA]. A versatile PCR primer pair (mlCOIintF/jgHCO2198; Geller et al., 2013; Leray et al., 2013b) was used to amplify a 313 bp fragment of the mitochondrial COI gene. In H. anisitsi samples, this primer pair was combined with a parasite-specific annealing blocking primer (blocking primer sequence: 5′ CGAARAATCARAAYARRTGTTG3SpC3 3′). Because co-amplification of predator DNA is known to inhibit prey DNA detection in stomach contents (Vestheim and Jarman, 2008), the blocking primer was included at ten times the concentration of the versatile primers (Leray et al., 2013b). The PCR cocktail and touchdown temperature profile used in this study follow Leray et al. (2013b).
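The reagent volumes listed above can be checked with a short calculation. The sketch below sums the listed volumes and applies the dilution rule C1·V1 = C2·V2 to obtain final concentrations. It assumes that the volume of the template DNA is not counted separately and that the blocking primer stock has the same concentration as the versatile primers (10 µM); neither is stated in the text, and the function and variable names are illustrative only.

```python
# Reagent volumes (µL) for one reaction, as listed in the text.
reagents_ul = {
    "H2O": 12.9,
    "10x reaction buffer": 2.0,
    "MgCl2 (50 mM)": 0.6,
    "dNTPs (2 mM)": 2.0,
    "forward primer (10 µM)": 0.2,
    "reverse primer (10 µM)": 0.2,
    "blocking primer": 2.0,
    "Platinum Taq": 0.1,
}


def final_concentration(stock, volume_ul, total_ul):
    """Dilution rule C1*V1 = C2*V2, solved for C2 (same unit as the stock)."""
    return stock * volume_ul / total_ul


# Sum of the listed volumes, rounded to the pipetting precision (0.1 µL).
total_ul = round(sum(reagents_ul.values()), 1)

# ASSUMPTION: the blocking primer stock is 10 µM, like the versatile primers.
primer_final_um = final_concentration(10.0, 0.2, total_ul)
blocker_final_um = final_concentration(10.0, 2.0, total_ul)
mgcl2_final_mm = final_concentration(50.0, 0.6, total_ul)

# Ratio of blocking primer to each versatile primer in the final mix.
blocker_ratio = blocker_final_um / primer_final_um

print(f"total volume: {total_ul} µL")
print(f"each versatile primer: {primer_final_um:.2f} µM")
print(f"blocking primer: {blocker_final_um:.2f} µM ({blocker_ratio:.0f}x)")
print(f"MgCl2: {mgcl2_final_mm:.2f} mM")
```

Under these assumptions the listed volumes add up to the stated 20 µL, and the 2.0 µL versus 0.2 µL volumes reproduce the tenfold excess of blocking primer.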

The amplified gene was quantified on agarose gel using a Low Mass Ladder 100 bp (Ludwig Biokee) as a reference. The PCR products were purified enzymatically (ExoSap) and sequenced at Actgene (Porto Alegre, Brazil) and at the Laboratory of Analytical Biology, National Museum of Natural History, Smithsonian Institution (Washington, DC). The chromatogram quality of the generated sequences was checked in MEGA 6.0 (Tamura et al., 2013), and the sequences were aligned using Clustal W (Higgins et al., 1994).

The sequences obtained from stomach contents were compared with those in the GenBank online platform and with local Barcode inventories (sequences deposited in GenBank; accession numbers in Supplementary Table 2). The genetic distances between the stomach-content sequences and the local Barcode inventories were estimated as p-distances in MEGA 6.0 (Tamura et al., 2013), using the Kimura 2-parameter model of molecular evolution.
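Because the text names both the p-distance and the Kimura 2-parameter model, it may help to see how the two quantities differ. The following is a minimal sketch of the standard textbook formulas for two aligned sequences; it is not the MEGA implementation, and the toy sequences and function names are invented for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (pairwise deletion)
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance: -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy aligned sequences (hypothetical, not data from this study).
query = "ACGTACGTAC"
reference = "ACGTACGTAT"
print(p_distance(query, reference), k2p_distance(query, reference))
```

For closely related sequences, such as a stomach-content sequence and the barcode of its host species, the two measures are nearly identical; the Kimura correction becomes noticeable only as divergence grows.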

#### RESULTS

Stomachs of three specimens of H. anisitsi contained sand only, and 15 stomachs of H. anisitsi and eight of P. maculatus were empty; these were not submitted to DNA extraction. The stomachs of 42 specimens of H. anisitsi and 10 specimens of P. maculatus contained ingested items identified as mucus and occasionally scales, sometimes associated with sand (**Tables 1**, **2** and Supplementary Table 1). DNA was successfully extracted from gut contents of all these specimens, but amplification gave positive results only for 10 specimens of H. anisitsi and 7 specimens of P. maculatus.

The BLAST search (GenBank) based on the sequences obtained from stomach contents generated identifications with identity values ranging from as low as 80% up to 100%. BLAST identity values lower than 98% were not considered (Supplementary Table 1), since they generate spurious identifications, including fish species absent from the hydrographic regions sampled (e.g., Astyanax altiparanae) or even marine species of Balistidae and Apogonidae. Further comparison with local barcode data allowed a second and independent identification, confirming the identity of the species exploited by these stegophilines or identifying species whose sequences are not available in GenBank (**Tables 1**, **2**).
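The 98% identity cutoff described above amounts to a simple filtering step. The sketch below shows one way such a filter could be written; the record layout, sample labels, species placeholders, and identity values are all hypothetical and do not come from the study's BLAST output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    sample: str      # stomach-content sample label (hypothetical)
    species: str     # species assigned by the database match
    identity: float  # percent identity reported for the match


MIN_IDENTITY = 98.0  # matches below this value are not considered


def best_accepted_hits(hits, min_identity=MIN_IDENTITY):
    """Keep, per sample, the highest-identity hit at or above the cutoff."""
    best = {}
    for hit in hits:
        if hit.identity < min_identity:
            continue  # discard low-identity matches as potentially spurious
        current = best.get(hit.sample)
        if current is None or hit.identity > current.identity:
            best[hit.sample] = hit
    return best


# Invented records, for illustration only.
hits = [
    Hit("stomach_01", "species A", 99.7),
    Hit("stomach_01", "species B", 92.0),
    Hit("stomach_02", "species C", 80.0),
    Hit("stomach_03", "species D", 98.0),
]

accepted = best_accepted_hits(hits)
for sample, hit in sorted(accepted.items()):
    print(sample, hit.species, hit.identity)
```

In this toy input, the second sample has no match at or above the cutoff and is left unidentified, which is the situation in which the comparison with local barcode data becomes the only route to an identification.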

The COI sequences amplified from stomach contents always resulted in the identification of a single species per stomach. Six host species were identified in the contents of eight stomachs in H. anisitsi (**Table 1**), five freshwater species belonging to

TABLE 1 | Results of the identification of the collected samples from stomach contents of Homodiaetus anisitsi.


Sequences were compared to GenBank (BLAST identity), to local Barcode data (p distance) and to the parasitic catfish COI sequence. UFRGS, catalog number of vouchers in the Fish Collection, Universidade Federal do Rio Grande do Sul; SL, standard length in mm.

TABLE 2 | Results of the identification of the collected samples from stomach contents of Pseudostegophilus maculatus.


Sequences were compared to GenBank (BLAST identity), to local Barcode data (p distance) and to the parasitic catfish COI sequence. UFRGS, catalog number of vouchers in the Fish Collection, Universidade Federal do Rio Grande do Sul; SL, standard length in mm.

the order Characiformes (six stomachs) and one species of the diadromous mugilid Mugil liza (two stomachs). Two host species were identified in the contents of eight stomachs in P. maculatus (**Table 2**), both belonging to the freshwater order Characiformes (four stomachs).

Both species were mucophagous and lepidophagous, and parasitized organisms that live close to the bottom or that feed on the bottom.

#### DISCUSSION

The analysis of the gut contents of H. anisitsi and P. maculatus suggests that they are obligatory mucus feeders, as also observed in the stegophiline O. alternus by Winemiller and Yan (1989). This diet is made possible by the morphological specializations shared by the members of the subfamily Stegophilinae, which possess a wide, ventral mouth with numerous teeth in both jaws, arranged in several rows (Baskin et al., 1980). This allows them to scrape mucus from their hosts, and the mouth may function as a sucker (Winemiller and Yan, 1989).

A scale-feeding behavior has also been described for the stegophilines (Eigenmann and Allen, 1942; Roberts, 1972; Baskin et al., 1980; Machado and Sazima, 1983; Winemiller and Yan, 1989), but we found lepidophagy in the two studied species to be non-obligatory and related to the body size, and consequently the scale size, of the host species (Supplementary Table 1). Even though we do not have information on the body size of the hosts whose mucus was found in the stomachs (adults or juveniles), none of the stomachs containing mucus of the largest host species (M. liza and Prochilodus lineatus) showed the presence of scales (**Figures 2**, **3**). Instead, scales were found associated with DNA of Bryconamericus and Cyphocharax, both with smaller body sizes (**Figures 2**, **3**).

The preference for characiforms as hosts in the two stegophiline species is remarkable. In a recent inventory of the freshwater fish species from the rio Uruguay, rio Tramandaí and laguna dos Patos drainages, which encompass the areas where the samples studied herein were collected, Siluriformes was found to be the dominant group, corresponding to 42% of the species found in these drainages (Bertaco et al., 2016), but none corresponded to the stomach contents. Instead, except for two stomachs containing one species of mullet (Mugilidae), all stomachs contained species of Characiformes, which correspond to nearly 1/4 (28%) of the freshwater species found in these drainages (Bertaco et al., 2016). In addition, and further supporting the preference for characiforms, several species of Siluriformes and Cichlidae (Cichliformes) were collected along with the specimens of H. anisitsi or P. maculatus analyzed here (Supplementary Table 3), but were not found in their stomachs.

Host species of H. anisitsi varied according to the locality of collection, and host choice seems to be related to hosts that live close to or feed on the bottom, and to the species available. M. liza is very abundant in coastal lagoons of southern Brazil (Reis and D'Incao, 2000), including the locality sampled in the rio Tramandaí drainage, and was found in two stomachs of H. anisitsi. The second species found in stomach contents in this drainage is Cyphocharax voga, one of the five most abundant species in these coastal lagoons (Schifino et al., 2004). Both host species are detritivorous and edentulous (Dualiby, 1988; Oliveira and Soares, 1996; Corrêa and Piedras, 2008). In the laguna dos Patos drainage, the two identified stomach contents included scales of Bryconamericus iheringii, a characid species with an inferior mouth that feeds on the substrate (Orciolli and Bennemann, 2006) and is very abundant in creeks of this drainage (Corrêa et al., 2015). In the rio Uruguay drainage, three host species were found (Astyanax jacuhiensis, Astyanax fasciatus, and Acestrorhynchus pantaneiro), but in this case none has feeding habits associated with bottom feeding. Instead, A. lacustris and A. fasciatus normally inhabit the water column and surface and can inspect the bottom for feeding (Casatti et al., 2001).

A similar result was obtained for P. maculatus, with host species related to host behavior and to species availability. P. lineatus, the most common host, found in three stomachs, feeds on algae and detritus grasping the substrate (Fugi et al., 2001). Piabarchus stramineus, found in one stomach, inhabits the water column and can inspect the bottom for feeding (Casatti et al., 2003; Brandão-Gonçalves et al., 2009). Like most hosts of H. anisitsi, P. lineatus is very abundant in the Rio de la Plata basin,

including the rio Uruguay, and dominates the biomass; it is the target of the principal freshwater fishery and the main prey item for large predatory fish (Speranza et al., 2013).

When studying trophic ecology, one of the most difficult tasks is the classification of organisms as specialists or generalists. Parasites are usually highly host-specific (Sukhdeo and Hernandez, 2005), but the problem is why some species appear not to specialize as consistently on a single host species (Thompson, 1982). In the best review of this subject, Sukhdeo and Hernandez (2005) indicated that phylogenetic history, as well as the morphological characters and feeding behaviors of both players, can promote parasite-host specificity. The authors argue that basing the definition of specificity on the number of higher taxa on which a parasite feeds allows inferences about the extent to which a parasite is tracking its host taxon phylogenetically. Our results suggest that these two species of stegophilines show host specificity, because they both feed mainly on species belonging to the same order. The study of the feeding behavior of other stegophilines may test whether the evolution of mucus- and scale-feeding behavior in these catfishes has been associated with a coevolutionary interaction with the order Characiformes.

This is the first time the mucus found in the stomachs of mucophagous species has been identified through DNA extraction and sequencing, and this approach may be applied to any parasite/host interaction in any group of organisms. It can also test hypotheses based on empirical observations that lack supporting tests. For example, Sazima (1983) reports the characid fish Probolodus heterostomus as lepidophagous and associates the scales found in its stomach with the shoal that this species mimics. Since then, no study has determined the origin of the scales commonly found in the stomachs of P. heterostomus: are they from its own shoal, or can they originate from other species, preferentially not belonging to the shoal?

In another example, Lima et al. (2012) report a behavior of "mutilating predation" for Odontostilbe pequira, preferentially attacking Leporinus friderici, but do not provide support such as molecular identification of food items. Based only on field observations of the putative attack, O. pequira is currently classified as an omnivore. Another problem is related to the determination of species in piscivorous fish diets. In most cases it is possible to determine only the genus or family of prey from the bone remains of semi-digested prey in the stomachs, due to the similarity of structures or the stage of digestion of the fragments (Hansel et al., 1988). In such cases, the DNA identification of hosts or prey is the best alternative to solve these questions, and can be applied to any parasite/host or predator/prey interaction.

Thus, in our case, regardless of the relatively small number of samples from which stomach contents were positively identified through DNA extraction, some patterns are clearly discernible in the two stegophiline species: (1) they are true parasites, since they are obligatory mucophagous and lepidophagous; (2) Characiformes are preferred hosts even though they are not the most abundant order in their environments; (3) host choice is related to the habits of hosts, favoring those that feed and live on or near the bottom. The unanswered question now is: why the preferential choice for Characiformes?

Finally, the accurate analysis of parasite/host interactions of the stegophilines allowed us to better understand their function in the freshwater ecosystems they inhabit. From this, we conclude that molecular tools help to reveal any trophic interaction, including the identification of food items that cannot be identified by usual methods, permitting a comprehensive understanding of the interactions within food webs and of the trophic ecology of the system.

# AUTHOR CONTRIBUTIONS

KB, PS, and LM: Conceived and designed the experiments; KB and PS: Performed the experiments; KB, PS, and LM: Analyzed the data; PS and LM: Contributed reagents, materials, analysis tools; PS: Designed the blocking primer; KB, PS, and LM: Wrote the paper; KB, PS, and LM: Reviewed the manuscript.

#### ACKNOWLEDGMENTS

The authors are grateful to MC Malabarba and two reviewers for critically reading the manuscript and providing helpful suggestions; to the UFRGS laboratory team that collected the analyzed samples; to Lee Weigt and Jeff Hunt for all support to PS at LAB, NMNH, Smithsonian Institution; and to Mathieu Leray, who helped design the blocking primer. This research is supported by CNPq (process # 307890/2016-3 and 401204/2016-2 to LM; 205678/2014-9 and 150956/2017-7 to PS).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fevo.2018.00022/full#supplementary-material

#### REFERENCES

past progress on trichomycterid phylogenetics. Neotrop. Ichthyol. 14:e150127. doi: 10.1590/1982-0224-20150127

inventory of complex tropical biodiversity. Mol. Ecol. Resour. 9, 1–26. doi: 10.1111/j.1755-0998.2009.02628.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Bonato, Silva and Malabarba. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Fishing Into the MicroRNA Transcriptome

Marcos E. Herkenhoff <sup>1</sup> , Arthur C. Oliveira<sup>1</sup> , Pedro G. Nachtigall <sup>1</sup> , Juliana M. Costa<sup>1</sup> , Vinicius F. Campos <sup>2</sup> , Alexandre W. S. Hilsdorf <sup>3</sup> and Danillo Pinhal <sup>1</sup> \*

<sup>1</sup> Laboratory of Genomics and Molecular Evolution, Department of Genetics, Institute of Biosciences of Botucatu, Sao Paulo State University, Botucatu, Brazil, <sup>2</sup> Laboratory of Structural Genomics (GenEstrut), Graduate Program of Biotechnology, Technology Developmental Center, Federal University of Pelotas, Pelotas, Brazil, <sup>3</sup> Unit of Biotechnology, University of Mogi das Cruzes, Mogi das Cruzes, Brazil

#### Edited by:

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### Reviewed by:

Diogo Teruo Hashimoto, Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil Adalberto Luis Val, Brazilian National Institute for Research in the Amazon, Brazil

> \*Correspondence: Danillo Pinhal dlpinhal@ibb.unesp.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 18 October 2017 Accepted: 02 March 2018 Published: 19 March 2018

#### Citation:

Herkenhoff ME, Oliveira AC, Nachtigall PG, Costa JM, Campos VF, Hilsdorf AWS and Pinhal D (2018) Fishing Into the MicroRNA Transcriptome. Front. Genet. 9:88. doi: 10.3389/fgene.2018.00088

In the last decade, several studies have been focused on revealing the microRNA (miRNA) repertoire and determining their functions in farm animals such as poultry, pigs, cattle, and fish. These small non-protein coding RNA molecules (18–25 nucleotides) are capable of controlling gene expression by binding to messenger RNA (mRNA) targets, thus interfering in the final protein output. MiRNAs have been recognized as the main regulators of biological features of economic interest, including body growth, muscle development, fat deposition, and immunology, among other highly valuable traits, in aquatic livestock. Currently, the miRNA repertoire of some farmed fish species has been identified and characterized, bringing insights about miRNA functions, and novel perspectives for improving health and productivity. In this review, we summarize the current advances in miRNA research by examining available data on Neotropical and other key species exploited by fisheries and in aquaculture worldwide and discuss how future studies on Neotropical fish could benefit from this knowledge. We also make a horizontal comparison of major results and discuss forefront strategies for miRNA manipulation in aquaculture focusing on forward-looking ideas for forthcoming research.

#### Keywords: aquaculture, farm animals, teleost fish, gene expression, microRNAs

# INTRODUCTION

MicroRNAs (miRNAs) are a class of small (17–22 nucleotides), non-coding RNAs that inhibit gene expression post-transcriptionally by pairing with complementary sequences in their target mRNA (**Box 1**). These small regulatory molecules are present in the genome of animals, plants, and even some viruses (Lee et al., 1993; Bartel, 2004; Kim et al., 2009; Xia et al., 2011). The first miRNAs, lin-4 and let-7, were both discovered in Caenorhabditis elegans (Lee et al., 1993; Wightman et al., 1993; Reinhart et al., 2000) and have subsequently been found to correspond to a novel and extensive class of small non-coding RNAs (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001).

#### BOX 1 | miRNA biogenesis and action.

The miRNA genes are first transcribed in the nucleus by RNA polymerase II to generate primary miRNAs (pri-miRNAs). In the canonical pathway, the pri-miRNAs fold into hairpins and are processed by the RNase III Drosha to form precursor miRNAs (pre-miRNAs) of around 70 nt, characterized by a stem-loop structure (Lee et al., 2003; Kim, 2005). The pre-miRNA is transported from the nucleus to the cytoplasm by Exportin-5 and is recognized and cleaved by the RNase III Dicer to form a double-stranded RNA of 22 nt (Hutvágner et al., 2001; Lund et al., 2004). The two strands separate and one of them enters the RNA-induced silencing complex (RISC), becoming a mature miRNA that can exercise its gene silencing function, whereas the other strand is generally degraded or, in some cases, can also be loaded into a RISC (Khvorova et al., 2003).

In addition to the canonical pathway, there are alternative pathways that can promote miRNA biogenesis. One of them is the "Drosha/DGCR8 independent pathway" (Ha and Kim, 2014). In this process, the pre-miRNA (also called a mirtron) comprises the whole intronic region of a gene. Thus, when the spliceosome cleaves the transcript, the pre-miRNA is released, and its biogenesis then follows the standard pathway. The "TUTase-dependent pathway" (Ha and Kim, 2014) is another type of alternative processing of pri-miRNAs. In this pathway, TUT family proteins bind to pre-miRNAs with only a one-nucleotide overhang in the 3′ region and add a uridine to the 3′ end, producing the two-nucleotide overhang that is important for correct miRNA maturation. Another type of non-canonical pathway is the "Dicer-independent pathway" (Ha and Kim, 2014). During this process, AGO2 proteins bind directly to the pre-miRNA and, along with PARN ribonucleases, promote the cleavage and final maturation of the miRNA.

All the aforementioned pathways lead to miRNA maturation, and subsequent miRNA activity occurs through binding of the miRNA-RISC complex mainly to the 3′ UTR region of the target mRNA, inhibiting its expression or degrading the mRNA transcript (Lee and Dutta, 2009). Besides the 3′ UTR region, non-canonical miRNA-to-target interactions have been described at binding sites located inside exons (Reczko et al., 2012; Hausser et al., 2013) or in the 5′ UTR region (Devlin et al., 2010; Zhou and Rigoutsos, 2014). It has been estimated that each miRNA may modulate hundreds of messenger RNAs and that a single messenger RNA may have its stability or translation controlled by several miRNAs (Doench and Sharp, 2004; Brenneck et al., 2005; Lim et al., 2005).

Since then, numerous miRNAs have been identified and quantified in several organisms, primarily as a result of numerous enhancements in high throughput sequencing technologies, bioinformatics computer programs and experimental methods (Guerra-Assumpção and Enright, 2012). Together, these approaches have enabled large-scale analyses, thereby facilitating the discovery of species-specific and novel miRNAs, even those with reduced expression, in many organisms belonging to diverse taxa (Berezikov, 2011).

Owing to the key involvement of miRNAs in the regulation of growth, metabolism, and homeostasis, among hundreds of other functions, diverse studies over the last decade have sought to identify genuine miRNA-to-target interactions in farm animals, such as poultry, pigs, cattle (reviewed by Wang et al., 2013), and fish (Rasal et al., 2016), to maximize production (**Box 2**). In addition, several patents have been approved in various countries for the commercial exploitation of miRNAs, e.g., a US patent (US 2006/0246491 A1) using miRNAs to regulate muscle cell growth.

Here, we review the available data on miRNA identification and functional characterization in the most important globally farmed fish species, with an emphasis on Neotropical fish, and

#### BOX 2 | Finding miRNA putative targets.

Searching for a true miRNA-mRNA interaction is a difficult task. Despite the availability of several computational target prediction tools, their results are generally inconsistent and often return variable sets of possible miRNA targets. This happens mainly because the rules governing miRNA target recognition are not completely understood and may vary for each miRNA-target interaction (Ritchie and Rasko, 2014). Moreover, several popular tools such as DIANA-microT (Maragkakis et al., 2009), miRanda-miRSVR (Betel et al., 2010), and miRWalk (Dweep et al., 2011) were designed exclusively for predicting miRNA targets in mammals. Other tools, such as the TargetScan (Garcia et al., 2011), miRanda (Enright et al., 2003), PITA (Kertesz et al., 2007), and RNA22 (Miranda et al., 2006) algorithms, are frequently used as target prediction tools for animal miRNAs (Witkos et al., 2011; Reyes-Herrera and Ficarra, 2012; Dweep et al., 2013; Peterson et al., 2014).

Since several target prediction tools are available in the literature, researchers often ask which tool or combination of tools yields the best-quality results. Oliveira et al. (2017) showed that the union of the results provided by TargetScan, miRanda and RNA22 performs better (in terms of the balance between sensitivity and specificity) than any of these tools alone or any other combination of them. However, it is important to keep in mind that any target prediction tool, or combination of tools, may still return false-positive results.
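To make the terms concrete, the sketch below shows how predictions from several tools can be combined by union and how sensitivity and specificity are then computed against a set of validated targets. The gene labels, the three prediction sets, and the function names are invented for illustration; they are not output from TargetScan, miRanda, or RNA22, and the toy numbers are not meant to reproduce the result of Oliveira et al. (2017).

```python
def combine_by_union(*prediction_sets):
    """A gene counts as predicted if at least one tool predicts it."""
    return set().union(*prediction_sets)


def sensitivity_specificity(predicted, true_targets, candidates):
    """Return (sensitivity, specificity) over a fixed set of candidate genes."""
    negatives = candidates - true_targets
    tp = len(predicted & true_targets)   # validated targets that were predicted
    fn = len(true_targets - predicted)   # validated targets that were missed
    fp = len(predicted & negatives)      # non-targets wrongly predicted
    tn = len(negatives - predicted)      # non-targets correctly left out
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical candidate genes and validated targets of one miRNA.
candidates = {f"g{i}" for i in range(1, 9)}
true_targets = {"g1", "g2", "g3", "g4"}

# Hypothetical predictions from three tools.
tool_a = {"g1", "g2", "g5"}
tool_b = {"g2", "g3"}
tool_c = {"g3", "g6"}

union = combine_by_union(tool_a, tool_b, tool_c)
print("tool A alone:", sensitivity_specificity(tool_a, true_targets, candidates))
print("union:", sensitivity_specificity(union, true_targets, candidates))
```

The union can only add predictions, so it never lowers sensitivity but may lower specificity; whether the balance improves depends on how much the tools' false positives overlap, which is why the combination has to be evaluated empirically.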

Since target prediction tools still lack consistency, once putative targets have been selected, validation experiments must be performed to confirm whether the interaction is genuine. The majority of strategies for miRNA-to-target validation rely on the overexpression and/or knockdown of the miRNA of interest and the corresponding monitoring of the effect on the mRNA/protein levels of predicted target genes. To evaluate the mRNA and/or protein expression profiles associated with the manipulation of a specific miRNA, several functional analyses have been widely applied (Table 1).

discuss the current challenges and future directions for the use of miRNAs in aquaculture.

#### MiRNAs IN ANIMAL BREEDING

Since miRNAs are key elements in gene regulation, they have been the focus of studies on improving the health and productivity of farm animal species. Diverse studies have been conducted in cattle (Townley-Tilson et al., 2010; Kozomara and Griffiths-Jones, 2011; Li H. et al., 2011; Miretti et al., 2011), pigs (Cho et al., 2010; Cirera et al., 2010; Xie et al., 2010; Li G. et al., 2011; Chen et al., 2012), and birds (Hicks et al., 2008, 2009; Li T. et al., 2011; Yao et al., 2011; Wang et al., 2012b). These studies have reported that miRNAs play an important role in a wide range of biological pathways, such as growth, metabolism, immunology, muscle development, and fat deposition (**Figure 1**). Therefore, these small RNAs are of great value in animal breeding, steering research toward the development of new solutions for current and emerging threats to the health and welfare of farm animal species.

Reports aimed at the identification and functional analysis of miRNAs in fish have increased recently, allowing for the characterization of the miRNA repertoire in Neotropical fish and in other prominent fish species in the aquaculture industry worldwide (**Table 1**).

# A REVIEW OF MiRNAs IN FARMED FISHES

#### Neotropical Fish

Neotropical fish encompass 50% of all freshwater species that are currently identified, which includes ∼13,000 characterized species (Reis et al., 2003; Bertollo et al., 2017). These data highlight the great biodiversity of this group of fishes and, particularly, of those of the Neotropical region. However, despite the growing interest in the study of Neotropical fishes, resources describing miRNA diversity and function are still very limited. Below, we provide a picture of the current knowledge of miRNAs of Neotropical species.

#### Tambaqui (Colossoma macropomum, Characidae)

The black pacu or tambaqui (Colossoma macropomum) can reach up to 1 m in length and ∼30 kg in weight and is a main resource of aquaculture and fisheries along the Amazon (Goulding and Carvalho, 1982; IBAMA, 2000). Gomes et al. (2017) characterized the miRNA expression profiles of liver and skin tissues of the Amazon tambaqui. They identified 279 conserved miRNAs in this species, with 257 from the liver and 272 from the skin and with several miRNAs expressed in both tissues. A functional enrichment analysis was performed for the targets of the 10 most highly expressed miRNAs of each tissue, revealing enriched biological pathways associated with metabolism, cell proliferation, calcium/ion transport and carbohydrate kinase activity in the liver as well as regulation of transcription and metabolic processes in the skin.
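The counts reported above imply how many miRNAs are shared between the two tissues. The short calculation below applies the inclusion-exclusion principle, assuming that the 279 conserved miRNAs are the union of the liver and skin sets; that reading is an interpretation of the text and is not stated explicitly by the source.

```python
# Counts reported in the text for tambaqui (Gomes et al., 2017).
total_mirnas = 279  # ASSUMPTION: union of the liver and skin sets
liver = 257
skin = 272

# Inclusion-exclusion: |L ∩ S| = |L| + |S| - |L ∪ S|
shared = liver + skin - total_mirnas
liver_only = liver - shared
skin_only = skin - shared

print(f"shared: {shared}, liver only: {liver_only}, skin only: {skin_only}")
```

Under this assumption, most of the conserved miRNAs would be detected in both tissues, with only a small tissue-restricted fraction in each.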

miR-122, which is highly expressed in the liver of tambaqui (Gomes et al., 2017), is conserved among vertebrate species and is known to be involved in the regulation of cholesterol metabolism. Moreover, studies in rainbow trout corroborate the influence of miR-122 on liver metabolism (Mennigen et al., 2012, 2013). Thus, miR-122 is an interesting regulatory gene that requires further study regarding its impact on fish metabolism and, consequently, on aquaculture production.

#### Pacu (Piaractus mesopotamicus, Characidae)

The pacu (Piaractus mesopotamicus), which is characterized by hardiness, fast growth, adaptation to artificial feeding and



flavorsome meat (Castagnolli and Cyrino, 1986), is a valuable resource in Brazilian aquaculture (Urbinati and Gonçalves, 2005). Duran et al. (2015) and Paula et al. (2017) studied the role of skeletal muscle miRNAs in this Neotropical fish during development and under food restriction, respectively. Duran et al. (2015) analyzed the impact of the miRNA-target interactions of miR-1/hdac4, miR-133-a/b/srf, miR-206/pax7, and miR-499/sox6 in fast- and slow-twitch skeletal muscles during growth. They found that miR-1 and miR-206 may promote myoblast differentiation in fast- and slow-twitch muscles in adult individuals, while miR-133a/b acts earlier, promoting myoblast proliferation in juveniles. Finally, miR-499 may promote differentiation of slow-twitch muscle fibers, which corroborates previous findings in Nile tilapia (Nachtigall et al., 2015). Paula et al. (2017) showed that a short period of food restriction significantly increased the expression of miR-1, miR-206, miR-199, and miR-23a in fast muscle and significantly decreased the expression of miR-1 and miR-206 in slow muscle, while their targets (IGF-1 for miR-1, miR-206, and miR-199; mTOR for miR-199; and MFbx and PGC1a for miR-23a) exhibited negatively correlated expression profiles. Altogether, these data suggest that miRNAs may have a large impact on muscle plasticity and recovery after periods of food restriction.

MiR-1, miR-133a, miR-133b, miR-206, and miR-499 have been shown to be involved in the control of genes related to myoblast proliferation and differentiation. The same miRNAs were also found in the skeletal muscle of other farmed animals and exhibited a highly conserved genomic context (i.e., structural features) among fish species (Nachtigall et al., 2014). These findings underpin the putative applicability of these muscle miRNAs as valuable biomarkers in Neotropical fish breeding. Additional experiments focusing on the characterization of SNPs in miRNAs and target genes from distinct populations exhibiting variable phenotypic traits, such as increased muscle growth and body mass, are required.

#### Midas cichlids (Amphilophus spp., Cichlidae)

The Midas cichlids are Neotropical fishes from Nicaraguan crater lakes. These species are considered to be an excellent model for studying speciation due to their fast parallel sympatric speciation. In contrast to other research that focused on miRNA expression and function, Franchini et al. (2016) studied the role of miRNAs in the diversification of five species of the Midas cichlid lineage (Amphilophus spp.).

Interestingly, by examining the putative effects of mutations of 3'UTR miRNA binding sites on the phenotypic diversification and evolution of these cichlids, the authors found that the species from Lake Apoyo (A. astorquii and A. zaliosus) have undergone less natural selection compared to species from Lake Nicaragua and Lake Managua (A. citrinellus) and Lake Xiloá (A. amarillo and A. sagittae). The impact of variable natural selection in the genome allowed novel mutations in binding sites to accumulate and the regulatory networks to evolve faster in the latter species, thus contributing to novel miRNA-target interactions associated with the phenotypic diversification of Midas cichlids (Franchini et al., 2016). These findings corroborate previous data regarding the "evolutionary race" between miRNAs and targets that have been proposed to influence the evolution of African cichlids (Loh et al., 2011; Brawand et al., 2014), thus indicating the pervasiveness of mutual molecular mechanisms of miRNA functional diversification in fish.

TABLE 2 | Summary of miRNA associated functions described in each species reviewed in the manuscript.

From an aquaculture perspective, such miRNA-target evolutionary mechanisms have selected adaptive traits for a specific environment, which can be tracked to identify economically valuable phenotypes. Moreover, this study reinforces the importance of examining population polymorphisms of both miRNAs and targets while performing fish genetic breeding programs.
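To illustrate how a polymorphism in a 3'UTR binding site might be screened, the following minimal sketch scans a UTR for perfect matches to the miRNA seed (nucleotides 2-8) and compares a reference allele with a variant allele. The sequences and the function names `seed_site` and `find_sites` are assumptions made for illustration; this is not the pipeline used in the cited studies, and dedicated target-prediction tools consider many more features than seed complementarity.

```python
# Illustrative sketch only: sequences and function names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def seed_site(mirna):
    """Return the target motif (DNA alphabet) that pairs with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]
    # The target site is the reverse complement of the seed
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_sites(utr, mirna):
    """Return 0-based start positions of perfect seed matches in a 3'UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]

toy_mirna = "UGGAAUGUAAAGAAGUAUGUAU"   # toy mature miRNA sequence
reference_utr = "GCTAACATTCCTTGA"      # toy 3'UTR fragment, reference allele
variant_utr = "GCTAACAGTCCTTGA"        # same fragment carrying one substitution

print(find_sites(reference_utr, toy_mirna))
print(find_sites(variant_utr, toy_mirna))
```

In this toy case the reference fragment carries one seed match and the single substitution removes it, which is the kind of gain or loss of a binding site that population-level screens of miRNAs and targets aim to detect.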

#### Other Key Fish Species

Studies of miRNAs in Neotropical fishes are very recent and still provide only limited data regarding the impact of these molecules on Neotropical fish biology. Thus, in this section, we describe advances based on investigations of miRNA repertoires from four other fish species of relevance in aquaculture and fisheries worldwide and discuss the putative transferability of the acquired knowledge to Neotropical fishes.

#### Nile tilapia (Oreochromis niloticus, Cichlidae)

Nile tilapia is one of the most commonly farmed species in freshwater aquaculture worldwide. A fast growth rate and adaptation to a wide range of culture conditions are the attributes responsible for successful fish farming (Yue et al., 2016).

Several studies aimed at miRNA characterization have been carried out in this model species. Using bioinformatics, Loh et al. (2011) analyzed the evolution of miRNAs and their target genes in cichlid fishes from Lake Malawi (East Africa) and detected 100 cichlid miRNAs, including those from tilapia, that are highly conserved in metazoan genomes. Brawand et al. (2014), who also studied cichlid miRNAs, described diverse loci in tilapia comprising 7 novel miRNAs. Yan et al. (2012a), using small RNA cloning and Sanger sequencing, discovered 25 conserved miRNAs in tilapia skeletal muscle. Huang et al. (2012) employed next-generation sequencing (NGS) in muscle tissue to explore the complete miRNA transcriptome and identified the expression of 184 known mature miRNA sequences.

The functional roles of miRNAs have also been evaluated under variable contexts. Several studies have investigated muscle development because muscle constitutes the major edible part of fish and is therefore an economically important trait. Many miRNAs have been proven to participate in the regulation of muscle growth by controlling the genes involved in hyperplasia and hypertrophy (Lagos-Quintana et al., 2002). Studies that focused on the initial embryonic development have also clarified miRNA targets involved in the tightly regulated initial steps of body formation (Giusti et al., 2016).

In tilapia, Yan et al. (2012a) and Nachtigall et al. (2015) showed that miR-1, miR-133a, and miR-206 have similar expression patterns in adult males and females and may assist each other to accurately control the development of skeletal muscles, although they perform distinct biological functions. MiR-1 is responsible for repressing the expression of histone deacetylase 4 (HDAC4), which is a negative regulator of cellular differentiation and thus promotes myocyte differentiation. MiR-1 also blocks a repressor of the MEF2 (myocyte enhancer factor-2) transcription factor. MiR-133a promotes, in part, myocyte proliferation by repressing serum response factor (SRF) (Chen et al., 2006), whereas miR-206 plays an important role in regulating the differentiation of C2C12 myoblasts in vitro (Kim et al., 2006). In addition, miR-206 loss of function in vivo was shown to significantly improve tilapia growth performance by targeting insulin-like growth factor-1 (IGF-1) (Yan et al., 2013b). IGF-1 is known to play a central role in a complex system that regulates growth, differentiation, and reproduction by selectively promoting mitogenesis and cell differentiation and inhibiting apoptosis (Jones and Clemmons, 1995; Reinecke and Collet, 1998). In many fish species, IGF-1 blood or tissue levels positively correlate with dietary protein levels and body growth rate (Beckman et al., 2004; Carnevali et al., 2006). Therefore, miR-206 could affect tilapia growth by modulating IGF-1 gene expression levels (Yan et al., 2014). Similarly, miR-203b has been shown to promote myogenesis (hyperplastic growth) by targeting MyoD (Yan et al., 2013a), a key protein that initiates the cascade of regulatory events during muscle differentiation. These authors showed that blocking miR-203b results in a significant increase in MyoD expression. In comparison to other approaches, however, using metabolic alterations, Wang D. et al. (2017) showed that miR-223-3p promotes upregulation of pou1f1, a key transcription factor related to somatic growth in Nile tilapia. Pou1f1 binds to the promoter region and transactivates the expression of growth hormone (GH), prolactin (PRL), and somatolactin (SL) in teleost fish. Furthermore, Qiang et al. (2017c) showed that miR-29a antagomir treatment in vivo resulted in stearoyl-CoA desaturase (SCD) upregulation, which plays a role in hepatic lipid metabolism regulation (Dobrzyn and Ntambi, 2004; Ntambi and Miyazaki, 2004).

Other studies have revealed miRNAs regulating metabolic pathways that could indirectly enhance production. For example, there is growing interest in the genetic improvement of salt tolerance in Nile tilapia, which may provide advantages, such as adaptation to certain environmental conditions and higher oxidation. Tilapia are euryhaline fish, and most species can live in a wide range of salinities from freshwater to seawater and therefore are a suitable model organism for studies on ionic and osmotic acclimation in euryhaline teleosts (Deane and Woo, 2004; Wang et al., 2009). For example, miR-30c, a kidney-enriched miRNA, was shown to regulate salt tolerance because its loss of function caused fish to be unable to respond to osmotic stress (Yan et al., 2012b, 2013b). Moreover, osmotic stress transcription factor 1 (OSTF1) was shown to be potentially regulated by miR-429 (Yan et al., 2012c). Recently, Zhao et al. (2016) showed that miR-21 is abundantly expressed and that its action modulates alkalinity stress by upregulating, in vivo and in vitro, the expression of VEGFB and VEGFC, which are responsible for regulating alkalinity tolerance. Therefore, miR-21, miR-30c, and miR-429 may be important markers for tilapia and also for other commercial species. In addition, Qiang et al. (2017d) concluded that miR-122 plays an important role in regulating the stress response in the liver of a Nile tilapia lineage exposed to cadmium.

Another relevant topic for Nile tilapia production involves sex determination and differentiation, because males grow faster and larger than females. Therefore, several studies have sought to uncover the molecular mechanisms of sex determination regulated by miRNAs (Eshel et al., 2014; Xiao et al., 2014).

In the study by Xiao et al. (2014), gonads (testes and ovaries) were screened using high-throughput sequencing, and the data showed distinct miRNA expression signatures. In addition, Nile tilapia testes and ovaries displayed miR-181a, miR-181a-5p, miR-143, and miR-143-3p as the most abundant miRNAs. By contrast, the miR-29 and miR-129 families showed significantly increased expression in ovaries compared to testes (Xiao et al., 2014). In humans, the high expression of miR-129 in ovaries is associated with the control of cell growth and differentiation in the final process of ovary maturation through downregulation of its target mRNAs, whereas miR-29 expression levels, which progressively increase throughout oogenesis, may also be important (Sirotkin et al., 2009). In tilapia testes, the most abundantly expressed miRNA families are miR-33a, miR-132, miR-135b, and miR-212 (Xiao et al., 2014). In mammals, miR-212/132 expression is necessary for the development and function of neurons. Furthermore, both miRNAs are associated with mammary gland development by downregulating the matrix metalloproteinase 9 (MMP-9), which is an activator of TGFβ and is involved in cell proliferation, differentiation, and apoptosis (Kubiczkova et al., 2012). miR-33a controls fatty acid regulation in mammals by repressing insulin receptor substrate 2 (Dávalos et al., 2011), and its high expression in tilapia testes may contribute to testes maturation via regulation of the insulin signaling pathway (Xiao et al., 2014).

Eshel et al. (2014) compared miRNA expression in tilapia gonads and found nine sexually dimorphic expressed miRNAs; they found a single upregulated miRNA in male embryos (miR-4585) with a perfect inverse correlation in expression pattern with its six target genes, cr/20β-hsd, psmb8, rtn4ip1, casp8, atp5g3, and a non-annotated gene (downregulated in males). cr/20β-hsd is known to be part of the oxidoreductase pathway for oocyte maturation preceding the enzymatic activity of cyp19 (cytochrome P450 aromatase) (Senthilkumaran et al., 2004), and cyp19a1a is proposed to be the major gene for female determination in zebrafish (Danio rerio) (Rodríguez-Marí et al., 2010). MiR-4585 is upregulated in male embryos at 2 and 5 days post-fertilization (dpf) and decreases at 9 dpf, which indicates the significance of this miRNA in males soon after fertilization (Eshel et al., 2014). Therefore, miR-4585 could possibly be manipulated for sex reversal in tilapia. Similarly, control of cyp19a1a expression may be relevant for sex reversal of females into males, given the aforementioned differential growth between sexes. Recently, Wang et al. (2016), using ovaries and testes of young Nile tilapia, showed that miR-17-5p and miR-20a were highly expressed in the ovaries and negatively regulated DMRT1 expression, suggesting that these miRNAs could induce estrogen production by inhibiting DMRT1 expression and promoting cyp19a1a expression in Nile tilapia. They also found that miR-138, miR-338, and miR-200a negatively regulated cyp17a2 (Wang et al., 2016), which is involved in 17α,20β-dihydroxy-4-pregnen-3-one biosynthesis and thus might be essential for spermatogonial cell proliferation and spermatogenesis (Eshel et al., 2014). Finally, Wang et al. (2016) showed that miR-456 and miR-138 negatively regulate AMH and that lower expression of these miRNAs promotes testis differentiation by allowing AMH to be expressed in testis.

Another aspect to be investigated is immunity. Infected animals cause economic losses and risks to human health. Streptococcus iniae causes high mortality and huge economic losses in tilapia cultures in China. This outbreak has seriously hindered the development of the tilapia industry (Qiang et al., 2017a). Qiang et al. (2017a) found that miR-92d regulates the expression of complement C3, which is the central component of the immune system. Furthermore, Qiang et al. (2017b) found three miRNAs (miR-310, miR-92, and miR-127) that were upregulated and four (miR-92d, miR-375, miR-146, and miR-694) that were downregulated by comparing a control group of Nile tilapia with a group infected with Streptococcus iniae. In another study of streptococcal infection, this time with Streptococcus agalactiae, Wang et al. (2016) applied high-throughput sequencing followed by functional enrichment and found 1121 differentially expressed miRNAs that target 41961 genes during infection. These analyses provide data for future studies of host-pathogen interactions between Nile tilapia and S. agalactiae. In addition, Wang B. et al. (2017) identified 1981 miRNAs involved in the immune response against meningoencephalitis caused by Streptococcus agalactiae in Nile tilapia. Although these data are important, the studies of Wang et al. (2016) and Wang B. et al. (2017) are deficient because of their failure to find miRNA biomarkers that are more effective or that can be manipulated.

#### Atlantic salmon (Salmo salar, Salmonidae)

Atlantic salmon is a domesticated fish of notable economic interest for wild fisheries and aquaculture production. Since 1970, Atlantic salmon have been intensively selected for genetic traits to improve growth performance, considerably benefiting aquaculture (Bentsen and Thodesen, 2005). The increased growth of this species is estimated at ∼14% per generation (Gjedrem, 2010). Genome knowledge of salmonids is advanced in comparison to other farmed fish species (Bekaert et al., 2013); however, there are limited studies describing salmonid miRNAs.

To provide better tools for future analysis of miRNAs, Zavala et al. (2017) identified and validated appropriate endogenous reference miRNA genes. Zavala et al. (2017) showed that the ssa-miR-99-5p gene was the most stable overall and that ssa-miR-99-5p and ssa-miR-23a-5p were the best combination.

The first screen for salmonid miRNAs was performed by Barozai (2012) using an in silico approach to predict miRNAs in salmon genomes. They detected let-7a-3p as a regulator of zonadhesin-like and growth hormone 2 gene, miR-142-5p as a regulator of heparin-binding growth factor 1, and miR-144 as a regulator of the growth factor receptor-bound protein 2. They also found that miR-430 regulates the transforming growth factor-beta-induced protein ig-h3, miR-451 blocks the antidorsalizing morphogenic protein, and miR-1594 activates both titin-cap (telethonin)-like mRNA and growth hormone receptor isoform 2. All of these miRNAs and targets are associated with growth and developmental processes.

Andreassen et al. (2013) characterized miRNA genes in Atlantic salmon by performing deep sequencing analysis of small RNA libraries from nine different tissues and revealed a total of 180 evolutionarily conserved mature miRNAs and 13 distinct novel mature miRNAs. Later, Bekaert et al. (2013) deep sequenced miRNA libraries of young juveniles (4 months old) and identified 547 miRNA transcripts that mapped to 88 distinct miRNA genes.

Johansen and Andreassen (2014) validated miR-25-3p and miR-455-5p as the best-performing two-reference gene combination suitable for quantitative expression analysis in Atlantic salmon. Such findings are relevant for aquaculture because these two miRNA reference genes are now useful for appropriate diagnostic tests, such as the detection of infection by viral RNA through expression. In addition, Kure et al. (2013) used RNA-Seq to identify miRNAs that were differentially expressed in fish subjected to an acid environment. They found 4 down- and 14 upregulated miRNAs between exposed groups and the control, suggesting alterations in a number of physiological responses that ultimately may interfere with animal growth performance.

The modulatory effect of miRNAs on collagen formation in salmon muscle has also been investigated (Mitchie, 2001). Muscle firmness is appreciated by consumers, and inadequate processing of salmon meat reduces firmness in selected adult salmon (Moreno et al., 2016). miR-29a, which is highly conserved between Salmo salar and Danio rerio (Andreassen et al., 2013), is the major factor in collagen formation and showed low expression in human fibroblasts with systemic sclerosis (Maurer et al., 2010). Thus, miR-29a may be a future target in expression studies to improve filet quality with a high potential for applications in aquaculture.

Furthermore, in salmon aquaculture, precocious puberty creates welfare problems and consequently inflicts economic damage. Skaftnesmo et al. (2017) explored which miRNAs regulate mRNAs during initiation of puberty, and several regulated miRNAs in the pubertal stage had earlier been associated (miR-20a, miR-25, miR-181a, miR-202, let7c/d/a, miR-125b, miR-222a/b, miR-190a) or have now been found connected (miR-2188, miR-144, miR-731, miR-8157) to the initiation of puberty.

In addition, there have been some studies on miRNAs associated with immunity. Andreassen et al. (2017) identified miRNAs responding to salmonid alphavirus (SAV) at different time points post-infection. They identified 20 differentially expressed miRNAs that may be important in viral-host interactions in Atlantic salmon. Following the same field of interest, Valenzuela-Miranda et al. (2017) studied the expression of miRNAs during infection with Piscirickettsia salmonis in the head, kidney and spleen. Piscirickettsia salmonis, a facultative intracellular bacterium, causes salmonid rickettsial septicemia (SRS), which is associated with major mortality in salmonid aquaculture (Mauel and Miller, 2002). The miRNA families miR-181, miR-143, and miR-21 were the most abundant in control groups, while miR-21, miR-181, and miR-30 were the most abundant in animals infected with P. salmonis (Valenzuela-Miranda et al., 2017). Infestation of Atlantic salmon by the sea louse Caligus rogercresseyi, which affects Chilean aquaculture, has also been studied; the most abundant miRNA families were mir-10, mir-21, mir-30, mir-181, and let-7 in skin, head and kidney (Valenzuela-Muñoz et al., 2017).

Overall, the collection of experimentally identified Atlantic salmon miRNAs provides an important resource for functional genome research in Neotropical fish and, in particular, helps to determine the actual contribution of miRNAs to phenotypic variation in economically and biologically fundamental characteristics.

#### Rainbow trout (Oncorhynchus mykiss, Salmonidae)

Rainbow trout, the most widely cultivated cold freshwater fish in the world, has been improved in terms of its growth and development by animal breeding programs globally (Sae-Lim et al., 2013). This species is also a model organism for genome-related research on comparative immunology, carcinogenesis, toxicology, disease ecology, physiology, evolutionary genetics, and nutrition (Thorgaard et al., 2002).

Juanchich et al. (2016) provided a repertoire of 2946 miRNA loci in the rainbow trout genome, including 445 already known, in 38 different samples corresponding to 16 different tissues. These data represent the first characterization from a wide variety of tissues and provide a novel resource for research. Subsequently, Mennigen and Zhang (2016) created and reported the microtrout, a comprehensive database, which was developed to implement an algorithm to predict relationships among miRNA-mRNA targets.

Ramachandra et al. (2008) discovered 14 miRNAs in early embryos (5 dpf); among these, miR-21, miR-30d, miR-92a, miR-200, and miR-26 are associated with differentiation and development. Salem et al. (2010) showed that miR-133, which is known to be enriched in mammalian muscle (Shingara et al., 2005; Ason et al., 2006; Chen et al., 2006), is also highly expressed in trout skeletal muscle and therefore is of interest in animal breeding.

Other miRNAs involved in growth and nutritional support metabolic pathways have also been reported. A pioneer study by Mennigen et al. (2012) describes the post-prandial regulation of lipid and glucose metabolism by miRNAs. Among these miRNAs, they predicted conserved targets in fish and humans for miR-103, miR-107, and miR-143 homologs. Research has also shown miR-33 and miR-122b were upregulated, whereas miR-122a was downregulated. MiR-33 and miR-122b act on the hepatic insulin pathway to stimulate lipogenesis and inhibit lipolysis (Fernández-Hernando et al., 2011). MiR-33 and miR-122b may promote lipogenesis and simultaneously inhibit lipolysis. The inhibition of miR-33 expression in mice (Mus musculus) resulted in a low level of VLDL triglyceride (Fernández-Hernando et al., 2011). Similarly, inhibition of miR-122 resulted in increased fatty acid oxidation and decreased fatty acid synthesis rates (Krützfeldt et al., 2005; Esau et al., 2006; Elmén et al., 2008). MiR-122b was inhibited alongside an increase in plasma triglycerides, a final product of the lipogenic pathway (Li et al., 2009; Mennigen et al., 2012). Mennigen et al. (2013) analyzed the expression of miRNAs by quantitative real-time RT-PCR in rainbow trout fingerlings while switching from endogenous to exogenous feeding and showed a decrease in miRNA-33 and miRNA-122a/b isomiRs.

Paneru et al. (2017) aimed to investigate the association of miRNA expression with muscle growth and the effects on growth of single nucleotide polymorphisms (SNPs) in miRNA binding sites. They found 90 miRNAs that showed differential expression and were strongly associated with phenotypic variations. Among the 204 SNPs with 3′ UTRs targeted by miRNAs, 78 SNPs may be associated with changes in 5 muscle traits. The expression pattern of 12 miRNAs, including mir-1, mir-133 and mir-206, was validated by real time PCR.

On the basis of these studies, it is possible that downregulation of the above miRNAs could contribute to weight gain in rainbow trout. Therefore, miRNAs could possibly be experimentally manipulated in live animals to improve the production indices in rainbow trout.

#### Grass carp (Ctenopharyngodon idella, Cyprinidae)

Grass carp is extensively cultivated in eastern Asia and is one of the most important freshwater fish globally (Liu et al., 2009). Grass carp was introduced into 115 countries worldwide, and in at least 58 of these (∼50%) it appears to have self-sustaining populations (Fishbase, 2015) and high performance.

Zhu et al. (2014) reported that several miRNAs are likely involved in fast-twitch skeletal muscle growth in grass carp based on their analysis of the response to rapid refeeding following fasting. They recorded changes in the expression levels of eight miRNAs (miR-1a, miR-181a, miR-133a, miR-214, miR-133b, miR-206, miR-146, and miR-26a) shown to be involved in a strong resumption of myogenesis (Zhu et al., 2014).

Xu et al. (2014) studied miRNA expression stability in embryos at distinct developmental stages as well as in several tissues from adults in an effort to establish the best reference genes for quantitative expression analysis. Seven miRNAs (miR-126-3p, miR-101a, miR-451, miR-22a, miR-146, miR-142a-5p, and miR-192) were found to have optimal stability and should be individually prioritized according to the stage and tissue of interest.
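As a rough illustration of how expression stability can be compared among candidate reference miRNAs, the sketch below ranks candidates by the coefficient of variation (CV) of their quantification cycle (Cq) values across samples. This is a deliberately simple stand-in, not necessarily the procedure used by the cited authors; dedicated algorithms such as geNorm or NormFinder are commonly used for this purpose. The candidate names and Cq values are hypothetical.

```python
from statistics import mean, stdev

def rank_by_cv(cq_values):
    """Rank candidate reference genes from most to least stable (lowest CV first)."""
    cv = {name: stdev(values) / mean(values) for name, values in cq_values.items()}
    return sorted(cv, key=cv.get)

# Hypothetical Cq values measured in three tissues or stages
cq_values = {
    "candidate-A": [20.0, 20.2, 19.8],
    "candidate-B": [22.0, 25.0, 19.0],
    "candidate-C": [18.0, 18.5, 19.0],
}

print(rank_by_cv(cq_values))
```

Running the ranking separately for each stage or tissue of interest would mirror the recommendation that reference miRNAs be prioritized according to the experimental context.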

Analyzing two grass carp lineages, Aeromonas hydrophila-susceptible (SGC) and -resistant (RGC), Xu et al. (2015) found miRNAs related to the immune system. MiR-118 and let-7i are differentially expressed and target both tlr4 and nfil3-6 genes. Let-7i is predominantly expressed in the spleen during bacterial infection and displays a noticeable difference in expression between SGC and RGC (Xu et al., 2015). The let-7 family is known to regulate multiple genes related to the cell cycle and proliferation (Yang et al., 2008), and let-7i was shown to influence innate immunity (Chen et al., 2007). Recently, Xu et al. (2016) studied these two carp lineages using two kidney miRNA transcriptomes from SGC or RGC infected with the highly pathogenic A. hydrophila and identified nine miRNAs differentially expressed between these groups. Furthermore, the spatial and temporal expression of a novel miRNA (cid-miRn-115) and miR-142a-3p suggests that they are potential regulators of anti-bacterial activity (Xu et al., 2016). Overexpression of these miRNAs resulted in a visible change in the immune effector activity in C. idella kidney cells, and bioinformatics analysis shows that they directly regulate tlr5 expression (Xu et al., 2016), which is associated with the innate immune response (Yoon et al., 2012).

Therefore, all these miRNAs interfere with grass carp health and should be further investigated for improving fish resistance to diseases that account for great economic losses in aquaculture.

#### DISCUSSION

### Biological Implications of miRNA-Mediated Regulation

In fish, numerous miRNAs have been continuously identified, although only a small fraction have known functions (**Figure 1**; **Table 2**). Regarding Neotropical fish, these numbers are even smaller since very few species have been studied regarding miRNA functions (all species covered in this review). Considering that miRNAs have an important function in shaping both morphological and physiological phenotypes, and that their function is highly conserved within vertebrates, analysis of miRNA profiles from other species might be useful for understanding miRNA biology in Neotropical fish and for developing applications in their breeding.

Although limited functional studies on farmed fish miRNAs have been conducted, they have provided sufficient evidence for linking miRNA activity to biological aspects of economic interest, such as the control of muscle development and hypertrophy, regulation of oocyte and testis maturation, sex differentiation and physiological resilience to diseases and adaptation to environmental changes. Additionally, deep exploration of classic genetic aspects of miRNA function, such as miRNA mapping, control of expression and genomic context, have contributed to understanding the genetic basis of miRNA expression and have led to new insights into the influence of miRNA on an assortment of complex traits.

Since interactions between miRNAs and their targets can be strongly conserved among species, several interactions verified in previous studies could be extrapolated to Neotropical fish species, thereby remarkably improving fish aquaculture. For instance, the role of miR-1, miR-206, and miR-133 during myoblast proliferation and differentiation is recognized to interfere in the hypertrophic growth of skeletal muscle. These interactions have been shown in Nile tilapia (Nachtigall et al., 2015) and may potentially be a widely conserved miRNA-target interaction modulating growth of several Neotropical fishes, as already reported for pacu (Duran et al., 2015; Paula et al., 2017). In the skeletal muscle, the main edible part of the fish, miR-1, miR-133a, and miR-206 have conserved expression patterns in all farmed fish species, which makes them interesting molecules for modulating muscle development and growth. Particularly, in Neotropical fishes, the study of myomiRs (i.e., miRNAs specifically expressed or enriched in this tissue) has yielded important data that can help bring their production to its full potential. Moreover, the fact that these miRNAs were found to play the same functional roles in the skeletal muscle of several other animal species corroborates hypotheses about their pivotal importance in gene regulatory networks.

For the large number of non-model fishes that do not have their complete genome sequenced, or for which little is known about the genome and its interactions, computational tools are becoming precise enough to draw reliable inferences from the data and genomes of model fish species that are evolutionarily close to them. Consequently, the gap between the genomic information available for non-model and model fishes is narrowing. Some muscle miRNAs have well-defined target genes: miR-206 regulates IGF-1; miR-1, miR-122, and miR-462 control IGF-2a; the let-7 family regulates MSTN; miR-103 and miR-107 modulate GHR and FSHR; and miR-138 and miR-211 control LHR. All these genes targeted by muscle miRNAs can be experimentally overexpressed or underexpressed through molecular techniques aiming to obtain fishes carrying desired phenotypes. Furthermore, some fish miRNAs have well-defined conserved functions that also occur in humans and mice, and could be controlled to enhance production. For example, miR-29a acts in collagen formation and has conserved targets among fish, mouse and human, implying that this miRNA is a good candidate to be modulated in tambaqui with the goal of improving filet quality.

Other aspects where miRNAs stand out pertain to the control of fish disease responses that specifically affect confined shoals, owing to the appearance of various stressors, such as poor water quality, high density, and inadequate diets. The megalocytivirus, from the Iridoviridae family, for example, has been a focus of research because it usually causes significant mortality to a wide range of hosts in aquaculture, leading to pronounced economic losses (Subramaniam et al., 2012). For instance, Nile tilapia (Subramaniam et al., 2016), Atlantic salmon (Crane and Hyatt, 2011), rainbow trout (Ariel and Jensen, 2009; Crane and Hyatt, 2011) and grass carp (Subramaniam et al., 2012) are confirmed hosts of megalocytivirus. Furthermore, grass carp is a potential carrier of megalocytivirus, but infection does not necessarily cause mortality or clinical changes in this species (Subramaniam et al., 2012). Thus, studies could examine whether differentially expressed miRNAs play a role in maintaining grass carp resilience to megalocytivirus infection and symptoms, and subsequently test for a prospective application to Neotropical fish.

Another key aspect in fish biology pertains to the control of sexual differentiation and maturation. Experiments on Nile tilapia have shown that miR-456 and miR-138 inhibit testis differentiation by targeting the Amh gene (Wang et al., 2016), and the high expression of miR-4585 in males interferes with the sexual reversal mechanism. Together, these miRNAs can be good biomarkers for investigating both the mechanisms influenced by sex and mechanisms that influence sexual development, many of which are important parameters for efficient production. For example, some of these findings could be further investigated to produce monosex populations of species in which females grow larger than males, such as tambaqui. Overall, this area is still in its infancy, and additional studies focusing on miRNA pathways related to sexual development in fishes will be of great value for improving fish breeding.

Environmental temperature plays a key role in maintaining the life cycle of any fish species. Understanding the molecular mechanisms involved in acclimatization at different temperatures is fundamental in the current context because of global warming. In addition, these molecular mechanisms can be used to genetically select fish in aquaculture programs. For example, the cultivation of tilapias and black pacu is restricted to tropical and subtropical areas, since they prefer temperatures between 27 and 32°C. Handling and transport at low temperatures (<22°C), especially after winter, leads to severe reduction of appetite and increased risk of disease. Furthermore, the culture of these fish at a temperature below 14°C is generally lethal. As a result, tilapia and black pacu breeding companies cannot expand their activities to low-temperature locations. Therefore, miRNAs have been proposed as molecular markers to select cold-tolerant fish. In zebrafish, 25 differentially expressed miRNAs were identified in individuals cultured for 10 days at a temperature of 10°C, which is related to a cellular response of adaptability to the environment (Yang et al., 2011). Thus, miRNAs could be studied as epigenetic markers for the selection of cold tolerant Neotropical fish, allowing the production of several species in subtropical and temperate regions.

Regarding general metabolism, miRNAs that regulate lipid and glucose metabolism are highly conserved between humans and fishes. For instance, miR-103, miR-107, miR-122, and miR-143 have constrained metabolic functions in vertebrates and can be targeted in future genetic therapies. Similarly, salt tolerance is an interesting trait and Nile tilapia has strong tolerance associated with the expression of miR-30c and miR-429, which can be broadly investigated and modulated in less tolerant Neotropical species to increase their productivity.

One of the major challenges, if not the greatest, for the use of miRNAs to increase the production of farm animals is the control of miRNA expression and consequently the control of genes and their products. Such control would allow researchers and farmers to have absolute control over a range of characteristics that would make the animals more productive or exhibit better quality. Efforts have been made to this end, and in the future, it is likely that miRNAs can be fully manipulated. However, in a genomic context, miRNAs can already be very well-exploited with the purpose of improving animal production.

### Biological Implications of miRNA Genomic Context and Regulation of Expression of Complex Traits

Characterization of miRNA signatures of expression for quantitative trait loci (named "miR-eQTL") can produce insights into regulatory mechanisms of miRNA transcription and assist in clarifying the role of miRNAs as orchestrators of complex biological traits.

Experiments detecting quantitative trait loci (QTLs) related to important breeding traits have uncovered diverse molecular markers useful for fish production (Sonesson, 2007; Canario et al., 2008). Recently, the relationship between miRNAs and QTLs has been found to influence a myriad of processes, such as sexual differentiation and determination in Nile tilapia (Eshel et al., 2014) and immune defense against viral infection in Atlantic salmon (Lowe et al., 2014). Additionally, combined assessment of miRNA signatures and SNPs within QTLs has contributed to the recognition of several miRNA regulatory loci of interest. Such miR-eQTLs can be used to determine phenotypic information through the integration of QTL polymorphisms with transcriptome data, including genomic loci of miRNA that contribute to the variation of mRNA expression levels. An analysis in European seabass, another economically relevant species, identified 20,779 SNPs over 1,469 gene loci and intergenic spacers (Kuhl et al., 2011), showing that they can be extensively used as genetic markers in population and molecular studies.

The integration of QTL and miRNA allowed the identification of key regulatory pathways involved in human liver diseases (Gamazon et al., 2013). This approach could be used in fish to assess susceptibility to diseases and other biological aspects, such as muscle development. Indeed, miR-eQTLs are promising molecular tools for enhancing aquaculture productivity and/or species conservation. Similarly, global identification and characterization of miRNAs and QTLs could be an important source for further understanding how coding and non-coding DNA work together to generate attractive phenotypic traits for selection programs (Cerdà and Manchado, 2013).

Although miRNAs have the potential to be used as molecular markers in fish genetic breeding programs, some limitations persist in their broad application in aquaculture. For instance, the assessment of miRNA expression levels is usually performed in specific tissues, requiring euthanasia of the fish, which would make it impossible to use the animals for subsequent selection programs. Another drawback comes from the presence of miRNAs in biofluids, such as blood and urine (Harrill et al., 2016) that were not directly expressed in these fluids, but rather resulted from leakage owing to cell injury, cell death, or active secretion during manipulation. As an alternative to euthanasia for tissue collection, evaluation of circulating miRNA levels in body fluids can be performed to monitor fish metabolism, allowing animals to be further used for genetic selection. In this way, miRNAs could be evaluated in fish blood, since circulating miRNAs are already being proposed as stable epigenetic biomarkers in several mammals, including humans (Harrill et al., 2016; Kasimanickam and Kastelic, 2016).

Another aspect inherent in miRNA modulation pertains to the characteristics of multigenic regulation, that is, a single miRNA can regulate several genes in different biological contexts. Thus, when miRNA expression is altered artificially, the expression of several target genes can be altered. This is one of the challenges of manipulating miRNAs that makes it difficult to obtain the desired final phenotype. This challenge can be overcome or minimized by controlling the modulation of miRNAs specifically or highly expressed in certain tissues during temporal windows of development. For this approach, it is necessary to validate the real interactions between target miRNAs (**Box 2**) in tissues and various cell types. It will then be possible to reduce or eliminate off-target effects and obtain the desired benefits on the stock. The problem of small size and low abundance of some miRNAs has been circumvented through the development and implementation of latest-generation sequencing techniques and advanced bioinformatics tools.

In addition, to evaluate whether miRNAs with described functions in one species can change the phenotype of other organisms, reverse genetics techniques for gene knock-in or knockout can be performed. This approach works by changing gene expression to rigorously test its function, helping to determine the potentially beneficial aspects of down- and upregulation in farmed fish species. Additionally, when these effects are confirmed as molecularly conserved in several species, genome-editing analysis can be broadly performed to modulate expression levels of one or a subset of miRNAs in related and non-related fish species. Cutting-edge genome editing techniques have been recently improved with the CRISPR-Cas system and could positively be implemented for aquaculture enhancement. Several papers have already suggested that CRISPR-Cas studies could be widely applied in fish, as first proven by studies using zebrafish models (Bassett et al., 2014).

Based on the aforementioned discussion, we envision that future research on farmed Neotropical fishes will greatly benefit from the ongoing comparative analysis of transcriptomes, which will provide the roadmap for the development of practical applications to expand animal breeding programs. Certainly, studies on farmed fishes that focus on the identification of interactions between SNPs and miRNAs are another important direction. SNPs in miRNA binding sites or in the miRNA precursor sequences may largely impact the desired phenotype and provide the best use of genomics and bioinformatics in animal breeding and aquaculture. Degradome sequencing technology and miRNA-seq can be applied to populations rather than to a few individuals to assess SNPs associated with specific desired phenotypes. This feature could provide an immediate tool for selecting both larval and adult fishes carrying superior traits of economic interest and improve profits.
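As a rough illustration of how a SNP in a miRNA binding site might be screened, the sketch below checks whether a canonical seed match (the mRNA motif complementary to miRNA nucleotides 2 to 8) is present in a reference and a variant 3'UTR. The miRNA and UTR sequences are hypothetical and invented for this example, and the exact-match rule is a simplifying assumption that ignores G:U wobble pairing and site context.

```python
# Minimal sketch: does a SNP remove a canonical miRNA seed match?
# All sequences below are hypothetical, made up for illustration only.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """mRNA motif that pairs with miRNA nucleotides 2-8 (the seed)."""
    seed = mirna[1:8]  # positions 2..8, 1-based
    return reverse_complement(seed)


def has_seed_match(utr: str, mirna: str) -> bool:
    """True if the UTR contains an exact match to the seed site."""
    return seed_site(mirna) in utr


MIRNA = "UCAGGCAUUGACCGUAAGCUAA"   # hypothetical mature miRNA
UTR_REF = "GGAUAUGCCUGAACU"        # hypothetical reference allele
UTR_ALT = "GGAUAUGCUUGAACU"        # same UTR with a C>U SNP in the site

site_kept_in_ref = has_seed_match(UTR_REF, MIRNA)
site_kept_in_alt = has_seed_match(UTR_ALT, MIRNA)
```

In this toy case the reference allele keeps the seed site and the variant allele loses it, which is the kind of difference a population-level screen would flag for follow-up.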

Finally, gene expression regulation in its various forms is considered one of the fastest and most effective mechanisms underlying adaptive evolution. Thus, several years ago, gene regulation was recognized as a powerful force for diversification and speciation. As was well-discussed in this paper, miRNAs are key molecules involved in the regulation of genes in all complex organisms, including phenotypic characteristics that confer better production or selective advantages in nature. In addition, Franchini et al. (2016) have shown that several species of Neotropical fishes, given their evolutionary history that forces them to form small populations due to physical environmental changes, are and will still serve as good biological models in studies involving molecular mechanisms associated with speciation caused by changes in gene expression. Consequently, miRNAs play a fundamental role in this process. Thus, these species may be at the forefront of studies involving miRNAs and gene regulation as speciation mechanisms.

# CONCLUSIONS AND PERSPECTIVES

In conclusion, this review on fish miRNAs shows that these small molecules are great targets for understanding Neotropical fish biology and that miRNAs possess very attractive features for their immediate implementation as biomarkers that can be used to select adaptive traits for a specific environment in worldwide aquaculture. The existence of highly conserved miRNAs among vertebrates is relevant and may lead to the broad application of knowledge acquired from one fish species to another or even allow for genome editing technology transferability within distinct vertebrate groups. Many miRNA-mediated biological characteristics are available for study and implementation into Neotropical fish farming in distinct captivity environments. Genome editing is the best approach to increase or reduce miRNA expression and could contribute to the improvement of important characteristics that can enhance global aquaculture production.

# AUTHOR CONTRIBUTIONS

Literature survey: MH, AO, PN, JC. Data discussion: MH, AO, PN, JC, VC, AH, DP. Writing of the first draft of the manuscript: MH, AO, PN, JC. Critical revision and final manuscript writing: MH, VC, AH, DP. All authors read and approved the final manuscript.

# ACKNOWLEDGMENTS

This work received financial support from Fapesp (Sao Paulo Research Foundation—grants #2014/03062-0, #2012/15589-7, and #2013/06864-7), and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico grants #460608/2014-2, #312046/2014-6, #158284/2013-5, #131265/2015-6, and #312312/2016-4).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer DTH declared a shared affiliation, with no collaboration, with several of the authors (MH, AO, PN, JC, and DP) and to the handling Editor.

Copyright © 2018 Herkenhoff, Oliveira, Nachtigall, Costa, Campos, Hilsdorf and Pinhal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Little Divergence Among Mitochondrial Lineages of** *Prochilodus* **(Teleostei, Characiformes)**

#### *Bruno F. Melo1,2\*, Beatriz F. Dorini 1, Fausto Foresti <sup>1</sup> and Claudio Oliveira1*

<sup>1</sup> Departamento de Morfologia, Instituto de Biociências, Universidade Estadual Paulista, Botucatu, Brazil, <sup>2</sup> Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States

Evidence that migration prevents population structure among Neotropical characiform fishes has been reported recently, but the effects upon species diversification remain unclear. Migratory species of Prochilodus have complex species boundaries and intricate taxonomy, representing a good model to address such questions. Here, we analyzed 147 specimens through barcode sequences covering all species of Prochilodus across a broad geographic area of South America. Species delimitation and population genetic methods revealed very little genetic divergence among mitochondrial lineages, suggesting that extensive gene flow, likely resulting from the highly migratory behavior, natural hybridization, or recent radiation, prevents the accumulation of genetic disparity among lineages. Our results clearly delimit eight genetic lineages, of which four contain a single species and four contain more than one morphologically problematic taxon, including a trans-Andean species pair and species of the P. nigricans group. Information about the biogeographic distribution of haplotypes presented here might contribute to further research on the population genetics and taxonomy of Prochilodus.

*Edited by:*

Roberto Ferreira Artoni, Ponta Grossa State University, Brazil

#### *Reviewed by:*

Jorge Abdala Dergam, Universidade Federal de Viçosa, Brazil Evanguedes Kalapothakis, Universidade Federal de Minas Gerais (UFMG), Brazil

*\*Correspondence:*

Bruno F. Melo melo@ibb.unesp.br

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> *Received:* 18 October 2017 *Accepted:* 19 March 2018 *Published:* 04 April 2018

#### *Citation:*

Melo BF, Dorini BF, Foresti F and Oliveira C (2018) Little Divergence Among Mitochondrial Lineages of Prochilodus (Teleostei, Characiformes). Front. Genet. 9:107. doi: 10.3389/fgene.2018.00107

**Keywords: DNA barcoding, freshwater fishes, gene flow, Neotropics, Prochilodontidae, South America, taxonomy**

# **INTRODUCTION**

Fishes of the characiform family Prochilodontidae are widely distributed across Neotropical freshwaters and represent important fishery resources in South America (Ribeiro and Petrere, 1990; Garcia et al., 2009). Their migratory behavior allows them to travel many hundreds of river kilometers to spawn during rainy seasons (Godinho and Kynard, 2006) and, consequently, permits extensive gene flow among distant populations (Sivasundar et al., 2001; Melo et al., 2013). There is substantial evidence that migration of prochilodontids results in high levels of genetic variability and low levels of population structure (Sivasundar et al., 2001; Rueda et al., 2013; Ferreira et al., 2017; Machado et al., 2017). However, no empirical study has aimed to detect whether long-distance migrations affect genetic diversification at the species level in Neotropical freshwater fishes.

Prochilodontids represent a good model to address such questions because much research on population genetics and phylogeography has provided valuable intraspecific genetic information (Sivasundar et al., 2001; Turner et al., 2004; Hatanaka et al., 2006; Carvalho-Costa et al., 2008; Melo et al., 2013; Rueda et al., 2013; Ferreira et al., 2017; Machado et al., 2017; Sales et al., 2018). Furthermore, recent barcoding studies in focal regions (i.e., using endemic species) have generated a robust mitochondrial database for *Prochilodus* (e.g., Carvalho et al., 2011; Rosso et al., 2012; Pereira et al., 2013; Chagas et al., 2015; Díaz et al., 2016) that, if combined, might be useful for species-level comparisons.

Prochilodontidae is represented by three genera (*Ichthyoelephas*, *Prochilodus,* and *Semaprochilodus*) spanning 21 species (Castro and Vari, 2004). While *Ichthyoelephas* and *Semaprochilodus* have well-established taxonomy, except for questions on species boundaries between *S. kneri* and *S. insignis* (Melo et al., 2016a), the taxonomy of *Prochilodus* remains complex. The genus has 13 morphologically similar species. Two are endemic to the trans-Andean basins of Río Magdalena (*P. magdalenae*) and Lago Maracaibo (*P. reticulatus*), and three occur in the Amazon basin: the widely distributed *P. nigricans*, occupying major tributaries of the western Amazon in Colombia, Peru, and Bolivia, and Brazilian eastern rivers flowing northward such as the Madeira, Tapajós, and Tocantins; *P. rubrotaeniatus*, allopatrically distributed through portions of Rio Negro (i.e., Rio Marauiá) and adjacent Guianese rivers such as the Essequibo, Corantijn, and Marowijne river basins; and the less abundant and endangered *P. britskii* from the Rio Apiacás, a tributary of the upper Rio Tapajós. The remaining species are generally endemic to specific drainages: *P. mariae* (Río Orinoco), *P. lineatus* (La Plata and Rio Paraíba do Sul), *P. argenteus* and *P. costatus* (São Francisco), *P. harttii* and *P. vimboides* (Eastern Brazilian drainages from Rio Pardo to Rio Paraíba do Sul), *P. brevis* (coastal rivers of northeastern Brazil), and *P. lacustris* (Rio Parnaíba, Northeastern Brazil). Moreover, the species distribution of *Prochilodus* has suffered significant alterations due to anthropogenic introductions in several rivers of eastern and northeastern Brazil (Castro and Vari, 2004).

Some species groups have very subtle morphological differentiation, with species being discriminated by ranges and modal meristic values, and by the biogeographic drainage where they are generally endemic (Castro and Vari, 2004). These are *Prochilodus magdalenae*/*P. reticulatus* from Magdalena-Maracaibo, *P. nigricans*/*P. rubrotaeniatus* from Amazon-Guianas-Orinoco, *P. brevis*/*P. lacustris* from northeastern Brazil, and *P. costatus*/*P. lineatus* from São Francisco-La Plata. Furthermore, a recent molecular phylogeny based on six genes revealed non-monophyly of some species, including *P. magdalenae*, *P. costatus, P. nigricans*, and *P. rubrotaeniatus* (Melo et al., 2016a). This study also revealed a problematic species complex, the *P. nigricans* group, which encompasses several specimens of *P. rubrotaeniatus*, *P. brevis,* and *P. lacustris* interspersed within *P. nigricans sensu lato*. Although Melo et al. (2016a) used various specimens of *P. nigricans* from distinct biogeographic zones across the Amazon basin, a sampling that remains incomplete, they did not use an extensive sampling for those other problematic species.

Haplotypic variation has been applied to study the diversity of Neotropical characiform fishes as well as used to address systematic questions through the expansion of DNA barcoding projects (e.g., Pereira et al., 2011; Bellafronte et al., 2013; Castro Paz et al., 2014; Benzaquem et al., 2015; Melo et al., 2016b; Ramirez et al., 2017). The majority of barcoding studies have demonstrated high levels of interspecific variation (Melo et al., 2016b; Silva et al., 2016a) while others present a more reduced variation pattern (Pereira et al., 2011; Rossini et al., 2016). Despite a substantial number of population genetic studies applied to species of *Prochilodus* and the natural abundance of those fishes in South American rivers, no genetic study aimed to address species diversity within the genus currently exists.

In this context, barcode sequences from a larger number of specimens from distant regions, in association with modern species delimitation methods and haplotype variation analysis, can better determine species delineation within problematic taxa (e.g., Castro Paz et al., 2014; Costa-Silva et al., 2015; Melo et al., 2016b), as in the case of *Prochilodus.* Here, we aim to detect the effects of migration on species diversification, to delimit species of *Prochilodus* using a high taxon sampling, and to advance the resolution of the problematic species boundaries within the genus.

# **MATERIALS AND METHODS**

# **Taxon Sampling and DNA Sequencing**

Specimens were collected under a permanent permission number 13843-1 from MMA/IBAMA/SISBIO and subsequently preserved in 95% ethanol. We included 146 specimens spanning all 13 species of *Prochilodus* collected across all South America plus *Semaprochilodus taeniurus* to root the trees (total 147 taxa). We sequenced barcodes for 19 specimens and supplemented the matrix with 127 additional barcodes of *Prochilodus* available at the public genetic databases GenBank (www.ncbi.nlm.nih.gov/) and Barcode of Life Database (BOLD; www.boldsystems.org/). **Supplementary Table S1** contains voucher and locality information and accession numbers for databases.

Genomic DNA was extracted from muscle tissues preserved in 95% ethanol with a DNeasy Tissue kit (Qiagen Inc.; http://www.qiagen.com) according to the manufacturer's instructions. We obtained partial sequences of the mitochondrial gene *cytochrome c oxidase subunit I* by amplifying via polymerase chain reaction (PCR) using the primers described in the literature (Melo et al., 2011) and modifying reaction steps as follows: 12.5 μl as a total volume with 9.075 μl of double-distilled water, 1.25 μl 5x buffer, 0.375 μl MgCl2 (50 mM), 0.25 μl dNTP mix, 0.25 μl of each primer at 10 μM, 0.05 μl Platinum Taq DNA polymerase enzyme (5 units/μl, Invitrogen; www.invitrogen.com) and 1.0 μl genomic DNA (10–50 ng). The PCR consisted of an initial denaturation (4 min at 95°C) followed by 28–30 cycles of chain denaturation (30 s at 95°C), primer hybridization (30–60 s at 52–54°C), and nucleotide extension (30–60 s at 72°C). After the visualization of the fragments using 1% agarose gel, we performed the sequencing reaction using dye terminators (BigDye™ Terminator v 3.1 Cycle Sequencing Ready Reaction Kit, Applied Biosystems; http://www.appliedbiosystems.com) purified again through ethanol precipitation. We then loaded the samples onto an automatic sequencer ABI 3130-Genetic Analyzer (Applied Biosystems) at the São Paulo State University, Brazil.
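The reagent volumes listed above can be checked and scaled with a few lines of code. The following is a minimal sketch that confirms the per-reaction volumes add up to 12.5 μl and computes a master mix for several reactions. It assumes two primers (forward and reverse) at 0.25 μl each; the 10% pipetting overage and the function name are illustrative assumptions, not part of the published protocol.

```python
# Per-reaction volumes (microliters) as listed in the protocol text.
# Assumption: "each primer" means one forward and one reverse primer.
REACTION_UL = {
    "double-distilled water": 9.075,
    "5x buffer": 1.25,
    "MgCl2 (50 mM)": 0.375,
    "dNTP mix": 0.25,
    "forward primer (10 uM)": 0.25,
    "reverse primer (10 uM)": 0.25,
    "Platinum Taq (5 U/ul)": 0.05,
    "genomic DNA": 1.0,
}

total_volume_ul = sum(REACTION_UL.values())


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale every reagent except template DNA to n reactions.

    `overage` is a hypothetical allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 3)
        for name, volume in REACTION_UL.items()
        if name != "genomic DNA"  # template is added to each tube separately
    }


mix_for_10 = master_mix(10, overage=0.0)
```

Template DNA is left out of the master mix because it differs per specimen and is added to each tube individually.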

# **Species Delimitation and Population Genetic Analyses**

We assembled and edited the newly generated consensus sequences in Geneious 7.1.9 (Kearse et al., 2012) and aligned the whole matrix with Muscle (Edgar, 2004). This matrix contains 147 taxa (146 *Prochilodus* plus one *Semaprochilodus*) and 648 bp. To evaluate the occurrence of substitution saturation, the index of substitution saturation in asymmetrical (Iss.cAsym) and symmetrical (Iss.cSym) topologies were estimated in Dambe 5.3.38 (Xia, 2013). We used PartitionFinder 1.1.0 (Lanfear et al., 2012) to select the best-fit model of nucleotide evolution for our dataset.

Species were previously identified following the most recent and complete taxonomic revision (Castro and Vari, 2004), and lineages were proposed based on subsequent topologies. Most available sequences are from vouchers already identified by the first author (e.g., Melo et al., 2016b) or from previous studies with endemic species (Carvalho et al., 2011; Rosso et al., 2012; Pereira et al., 2013; Díaz et al., 2016). We then generated overall and pairwise values of genetic distance based on the Kimura 2-parameter (K2P)+Gamma model using Mega 7.0 (Tamura et al., 2013) and a neighbor-joining (NJ) tree with 1,000 bootstrap replicates using Geneious 7.1.9. We also performed a maximum likelihood (ML) analysis under RAxML HPC-PTHREADS-SSE3 (Stamatakis, 2006) using five random parsimony trees with the GTRGAMMA model (Stamatakis et al., 2008) without rooting and with other parameters at default. We used the autoMRE function to generate pseudoreplicates through MRE-based stopping criteria (Pattengale et al., 2009), which ran a total of 650 replicates. Stopping criteria determine when enough replicates have been generated so that robust bootstraps under ML analysis become computationally practical (Pattengale et al., 2009).
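For readers unfamiliar with the K2P distance, the sketch below computes it for a pair of aligned sequences from the proportions of transitions (P) and transversions (Q), using d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. This is the plain K2P formula without the gamma correction applied in the study, and it is not a reimplementation of Mega; sites with gaps or ambiguous bases are simply skipped, which is an assumption of this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition in ten sites.
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

With a single transition across ten sites the distance is slightly above the raw proportion of differences (0.1), because the model corrects for unobserved multiple substitutions.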

An ultrametric gene tree was generated by Bayesian inference with Beast 1.8.0 (Drummond et al., 2012) using two independent runs of 50 million generations, sampling trees every 5,000th generation. Convergence was assessed in Tracer v1.5 (Rambaut et al., 2014), with estimated sample sizes (ESS) greater than 200. The first 10% of trees from each run were discarded as burn-in, and the remaining MCMC samples were summarized as the maximum clade credibility (MCC) topology in TreeAnnotator v1.4.7 (Drummond et al., 2012) and visualized in FigTree v1.4.3.
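The sampling scheme implies a specific number of trees per run, which the short calculation below makes explicit. It is simple arithmetic on the settings stated in the text and assumes that the initial state is not counted among the sampled trees; actual log files may contain one additional tree per run.

```python
GENERATIONS = 50_000_000   # chain length per run
SAMPLE_EVERY = 5_000       # sampling interval
RUNS = 2                   # independent runs
BURNIN_FRACTION = 0.10     # first 10% of trees discarded


def tree_counts(generations: int, sample_every: int, burnin_fraction: float):
    """Return (sampled, discarded, retained) trees for one run."""
    sampled = generations // sample_every
    discarded = round(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


sampled, discarded, retained = tree_counts(GENERATIONS, SAMPLE_EVERY, BURNIN_FRACTION)
combined_retained = retained * RUNS
```

Under these assumptions each run yields 10,000 sampled trees, of which 9,000 are kept after burn-in.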

The general mixed Yule coalescent (GMYC) method (Pons et al., 2006; Fujisawa and Barraclough, 2013) was performed using the ultrametric gene tree estimated with the exponential growth coalescent model (Griffiths and Tavaré, 1994) and the lognormal relaxed clock model (Drummond et al., 2006), which assumes that the rates of molecular evolution are uncorrelated but log-normally distributed among lineages. Species delimitation through the GMYC model was conducted using standard parameters [interval = c(0, 10)] and a single threshold that specifies the transition time from between-species to within-species branching. This analysis was conducted with the package *splits* (Species Limits by Threshold Statistics; http://r-forge.r-project.org/projects/splits) in R v.3.0.0 (R Development Core Team, 2013). GMYC appears to be useful for single-locus analysis (Fujisawa and Barraclough, 2013) but depends on the availability of additional data/analyses from independent characters (Esselstyn et al., 2012). Additionally, we used the Bayesian Poisson Tree Processes model (bPTP) (Zhang et al., 2013) in the bPTP webserver (http://species.h-its.org/ptp/) under default parameters. bPTP does not require an ultrametric gene tree and uses, instead, a nexus tree as input file with branch lengths representing the number of nucleotide substitutions (Zhang et al., 2013). We used a nexus MCC tree generated in Beast 1.8.0 (Drummond et al., 2012) as input file and ran 500,000 generations (thinning = 500). We also used a clustering species delimitation analysis through the Automatic Barcode Gap Discovery (ABGD; Puillandre et al., 2012), which automatically assigns sequences to hypothetical candidate species based on confidence limits for intraspecific divergence. We used a pairwise distance matrix generated in Mega 7.0 (Kumar et al., 2016) through the K2P+G model and 1,000 pseudoreplicates as input file into the ABGD webserver (wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html) with other parameters left at default.
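To give a feel for distance-based grouping, the sketch below clusters sequences whose pairwise distance falls at or below a fixed threshold (single-linkage clustering). This is a deliberately simplified illustration and not the ABGD algorithm, which infers the barcode gap from the data and partitions recursively. The sequence names, distances, and threshold are hypothetical.

```python
def cluster_by_threshold(names, distances, threshold):
    """Group sequences linked by any pairwise distance <= threshold.

    `distances` maps (name_a, name_b) tuples to a distance, with name_a
    listed before name_b in `names`.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distances[(a, b)] <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy matrix: two tight pairs separated by a wide gap.
NAMES = ["s1", "s2", "s3", "s4"]
DISTANCES = {
    ("s1", "s2"): 0.002, ("s1", "s3"): 0.050, ("s1", "s4"): 0.060,
    ("s2", "s3"): 0.055, ("s2", "s4"): 0.060, ("s3", "s4"): 0.004,
}

candidate_groups = cluster_by_threshold(NAMES, DISTANCES, threshold=0.01)
```

Changing the threshold changes the number of groups, which mirrors how ABGD reports several partitions across a range of prior intraspecific divergence values.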

Population genetic analyses were conducted in order to detect levels of genetic variance among haplotypes. We excluded four taxa and excised flanking regions with elevated missing data to properly run those analyses. This reduced matrix contained 143 taxa and 465 bp. Each mitochondrial lineage previously determined by distance and likelihood analyses was treated as a distinct population. We used DnaSP v.5.10.01 (Librado and Rozas, 2009) to obtain the number of polymorphic sites, haplotype number, and nucleotide/haplotype diversity. In Arlequin 3.5.1 (Excoffier and Lischer, 2010), each mitochondrial lineage was set as a single population with the following hypothetical group structuring (group 1 = outgroup; group 2 = lineage 1; group 3 = lineage 2; group 4 = lineages 3, 4, and 5; group 5 = lineages 6, 7, and 8) based on the arrangement from ML and Bayesian trees. We ran an analysis of molecular variance (AMOVA; Excoffier et al., 1992) with 1,000 permutations using conventional F-statistics and generated the haplotype network using the median joining analysis (Bandelt et al., 1999) incorporated in PopART 1.7 (Leigh and Bryant, 2015).
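Haplotype diversity, one of the summary statistics mentioned above, is commonly computed with Nei's estimator, Hd = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The sketch below applies that formula to a list of haplotype labels; it is an illustration of the statistic under that assumption, not a reproduction of DnaSP, and the labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes) -> float:
    """Nei's haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical samples: labels stand for distinct haplotypes.
two_equal_haplotypes = haplotype_diversity(["h1", "h1", "h2", "h2"])
all_unique = haplotype_diversity(["h1", "h2", "h3", "h4"])
all_identical = haplotype_diversity(["h1", "h1", "h1", "h1"])
```

The statistic ranges from 0 (every sequence shares one haplotype) to 1 (every sequence is a distinct haplotype).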

# **RESULTS**

The final matrix contained 147 taxa, 648 bp, and 154 variable sites (23.8%). Nucleotide frequencies were 21.1% adenine, 25.0% cytosine, 16.1% guanine, and 26.4% thymine. The newly generated sequences of *Prochilodus* are deposited at GenBank with accession numbers MH068824–MH068842 (**Supplementary Table S1**). The Iss indexes indicated no saturation in either transitions or transversions in both asymmetrical (Iss.cAsym) and symmetrical (Iss.cSym) topologies. The overall mean of K2P genetic distances without outgroup was 0.025 ± 0.004. Intraspecific genetic variation ranged from zero within the lineage of *P. magdalenae* and *P. reticulatus* to 0.003 within the lineage of *P. costatus* and *P. lineatus*. The lowest pairwise K2P distance was 0.012 ± 0.004 between *P. harttii* and *P. argenteus*. The highest pairwise K2P distance was 0.103 ± 0.016 between *P. vimboides* and *P. mariae*. Fourteen out of 28 pairwise comparisons received values below 0.03. **Table 1** shows intraspecific and interspecific genetic distances of each lineage.

Species delimitation analysis by GMYC evidenced the presence of eight genetic lineages (interval 3–22) that encompass the 13 valid species of *Prochilodus* (**Figure 1**). The threshold time was −0.004 and indicates the time before which all nodes reflect speciation events and after which all nodes reflect coalescent events. Maximum likelihood for the null model was 1452.484 and maximum likelihood for the GMYC model was 1457.71. The bPTP species delimitation analysis through both ML and Bayesian approaches returned a slightly distinct result with a total of 10 lineages of *Prochilodus* plus outgroup. The two additional clusters refer to splits within *P. vimboides* (Lineage 1; low support = 0.569) and within *P. lineatus* (Lineage 8; low support = 0.427). ABGD resulted in eight partitions that ranged from 61 (*P* = 0.001) to one candidate species (*P* = 0.03), with one partition with eight candidate species plus outgroup (*P* = 0.002) that matches those obtained in GMYC. The evidence of eight species of *Prochilodus* plus outgroup agrees with NJ and ML topologies showing well-defined branches and reciprocal monophyly. **Supplementary Figure S1** represents the NJ tree and **Supplementary Figure S2** represents the best maximum likelihood tree (sum of branch lengths = 0.266).
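The two likelihood values reported for the null and GMYC models can be compared with a likelihood ratio statistic, LR = 2 × (lnL_GMYC - lnL_null). The sketch below computes it and, as an assumption of this example, evaluates it against a chi-square distribution with two degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2). The choice of degrees of freedom is not stated in the text and should be treated as hypothetical.

```python
import math

LNL_NULL = 1452.484   # likelihood of the null (single coalescent) model
LNL_GMYC = 1457.71    # likelihood of the GMYC model


def likelihood_ratio(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic for nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_upper_tail_df2(x: float) -> float:
    """P(X >= x) for a chi-square variable with 2 degrees of freedom.

    Assumption: df = 2; for this case the survival function is exp(-x/2).
    """
    return math.exp(-x / 2.0)


lr_statistic = likelihood_ratio(LNL_NULL, LNL_GMYC)   # 10.452
p_value = chi2_upper_tail_df2(lr_statistic)
```

Under this assumption the statistic of 10.452 corresponds to a p-value of roughly 0.005, so the GMYC model would fit the tree better than the null model of a single coalescent population.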

The reduced matrix for population genetic analyses included 143 sequences (see section Materials and Methods) with 465 bp (364 invariable and 101 polymorphic sites) and a total of 47 haplotypes (HD = 0.951). We found high *F*ST values among lineages/populations (*F*ST = 0.331, *P* < 0.001) ranging from 0.000 (*Prochilodus mariae* vs. *Semaprochilodus*, *P. harttii* vs. *Semaprochilodus*, *P. harttii* vs. *P. mariae*) to 1.000 (*P. magdalenae/P. reticulatus* vs. *Semaprochilodus*) but without significant values (**Supplementary Table S2**). High *F*ST values are expected due to the fact that we are treating lineages/species as populations. AMOVA results indicated that there is more variation within populations (66.9%) than among populations within groups (21.7%) or among groups (11.4%) (**Supplementary Table S3**). The haplotype network shows the distribution and interrelationships among haplotypes (**Supplementary Figure S3**).
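The AMOVA percentages and the overall *F*ST are linked through the standard hierarchical definitions of the fixation indices (Excoffier et al., 1992): FCT is the among-group share of the total variance, FSC is the among-population share of the within-group variance, and FST is the share of the total variance not found within populations. The sketch below applies these definitions to the percentages reported above; the function name is illustrative.

```python
def fixation_indices(among_groups: float,
                     among_pops_within_groups: float,
                     within_pops: float) -> dict:
    """Fixation indices from hierarchical AMOVA variance components.

    Inputs may be raw variance components or percentages of variation.
    """
    total = among_groups + among_pops_within_groups + within_pops
    return {
        "F_CT": among_groups / total,
        "F_SC": among_pops_within_groups
                / (among_pops_within_groups + within_pops),
        "F_ST": (among_groups + among_pops_within_groups) / total,
    }


# Percentages of variation reported in the AMOVA.
indices = fixation_indices(among_groups=11.4,
                           among_pops_within_groups=21.7,
                           within_pops=66.9)
```

With these inputs the among-group and among-population components sum to 33.1% of the total variation, which corresponds to the reported overall *F*ST of 0.331.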

All clusters present strong support for hypothesized lineages in the NJ (bootstrap >74%), ML (bootstrap >76%) and BI (posterior probabilities = 1) analyses. Lineage one includes *Prochilodus vimboides* from eastern Brazil including the Rio Doce, Rio Itaúnas, and Rio Mucuri. Lineage two includes *P. magdalenae* (Río Magdalena in Colombia) and *P. reticulatus* (Lago Maracaibo in Venezuela), the trans-Andean species of *Prochilodus*, as a single genetic unit. The third lineage contains three specimens of *P. mariae* from Río Orinoco and lineage four has two specimens of *P. harttii* from Rio Pardo in Eastern Brazil. *Prochilodus argenteus* is represented by lineage five with 14 specimens from the upper, middle and lower Rio São Francisco plus two specimens introduced into Rio Doce and Rio Jequitinhonha. Subsequent lineages (six, seven, and eight) contain more than one species of the *P. nigricans* group (*sensu* Melo et al., 2016a) plus *P. lineatus* and *P. costatus*. Lineage six incorporates the haplotypic group composed of *P. nigricans* from uplands of the Eastern Amazon (Rio Araguaia, upper and middle Rio Tapajós), *P. britskii* from the upper Rio Tapajós (Rio Apiacás), *P. brevis* from northeastern Brazil (states of Ceará and Rio Grande do Norte), *P. lacustris* from Rio Parnaíba, and *P. rubrotaeniatus* from the upper Río Orinoco in Venezuela and the upper Essequibo river basin in Guyana. Lineage seven contains three specimens of *P. rubrotaeniatus* from Corantijn, Coppename, and Marowijne river basins in Suriname plus 26 specimens of *P. nigricans* from lowlands of the Western Amazon, including mainstream Rio Amazonas in Manaus (Brazil), the Río Itaya at the Iquitos region (Peru), Rio Madeira, and Rio Purus. Finally, the eighth lineage contains the species pair composed of *P. costatus* from distinct regions of the Rio São Francisco together with 42 specimens of *P. lineatus* from Rio Paraíba do Sul, upper Rio Paraná, upper Rio Paraguai (all in Brazil) and the lower Rio Paraná (Argentina). Analyses of NJ, ML, and BI returned similar results overall, despite some differences in the arrangement of some lineages (**Supplementary Figures S1**, **S2**).


Intraspecific genetic distance within lineages in bold. Groups were defined based on the GMYC analysis. N, number of lineage matching those in *Figure 1*; EA, Eastern Amazon; WA, Western Amazon; EG, Eastern Guianas; WG, Western Guianas.


# **DISCUSSION**

#### **Species Delimitation in** *Prochilodus*

Results from the species delimitation analysis revealed the presence of eight genetic lineages covering 13 valid species of *Prochilodus*, in which four lineages (1, 3, 4, and 5) are structured by only one species and the other four lineages (2, 6, 7, and 8) include more than one species. Topologies (**Figure 1**, **Supplementary Figures S1**, **S2**) are quite similar to the molecular phylogeny of Prochilodontidae (Melo et al., 2016a), likely due to the locus selection. *Prochilodus vimboides* (lineage 1), for example, splits from the most recent common ancestor of all other *Prochilodus*, although the biogeographic implications of this result still require a more detailed, time-calibrated analysis of the Prochilodontidae. Another example is the structuring of *P. harttii* (lineage 4) and *P. argenteus* (lineage 5), which form distinct genetic lineages despite recent evidence of hybridization (Sales et al., 2018). In contrast with the molecular phylogeny, results indicate that *P. mariae* is an exclusive cluster, suggesting inconsistencies in the phylogenetic placement of the species (Castro and Vari, 2004; Melo et al., 2016a). A phylogeographic study found that *P. mariae* diverged from *P.* cf. *rubrotaeniatus* in a very recent cladogenesis (Turner et al., 2004), which does not match the Orinoco-Amazon vicariant event that resulted from the rise of the Vaupes Arch during the Late Miocene (Lujan and Armbruster, 2011).

Our findings suggest the recognition of only one trans-Andean species, in which *Prochilodus magdalenae* from Río Magdalena remains nested within *P. reticulatus* from Lago Maracaibo (lineage 2) as proposed by the molecular phylogeny (Melo et al., 2016a). Species limits among them involve subtle differences in the range and modal values of number of lateral line scales, number of predorsal scales, and number of vertebrae (Castro and Vari, 2004). A further phylogeographic study involving samples from Atrato, Cauca-Magdalena, and Maracaibo might help to elucidate the allopatric distribution of this mitochondrial lineage.

The Amazon basin harbors two distinct mitochondrial lineages of *Prochilodus nigricans* (lineages 6 and 7). A recent study detected population structure in western populations of *P. nigricans* (Madeira and Purus) compared to those from mainstream Rio Amazonas (Machado et al., 2017) despite the lack of samples from eastern tributaries. Interestingly, there is ecological evidence of two distinct migration patterns of *P. nigricans* in the Amazon basin (Araújo-Lima and Ruffino, 2003) that might explain our results. The first involves lateral migrations from floodplain lakes to the mainstream Rio Amazonas with subsequent upstream migration for breeding and spawning (Fernandes, 1997), and the second involves only upstream migrations to the upper Rio Tocantins or Araguaia for spawning and downstream migrations for feeding (Carvalho and Mérona, 1986), the latter similar to the well-known pattern observed for *P. argenteus* (Godinho and Kynard, 2006) and *P. lineatus* (Agostinho et al., 2004).

Our results do not support the presence of multiple species within lineage six. *Prochilodus britskii*, a morphologically distinct species, appears for the first time embedded within the lineage, unlike its position as sister to *P. mariae* (Melo et al., 2016a). Both *P. brevis* and *P. lacustris* from northeastern Brazil are distinguished from the species pair *P. nigricans* and *P. rubrotaeniatus* by radial subdivision patterns on body scales (Castro and Vari, 2004). Morphological features diagnosing each of them include overlapping counts of lateral line scales, number of horizontal scale rows below the lateral line, and number of circumpeduncular scale rows (Castro and Vari, 2004).

Castro and Vari (2004) redescribed *P. nigricans* by examining almost one thousand Amazonian specimens including type specimens. They designated a neotype from Lago Janauacá at the right margin of Rio Solimões near Manaus in Brazil. Twenty-one individuals of *P. nigricans* from Manaus appear within lineage seven along with specimens from the western Amazon. Therefore, this cluster likely constitutes the genetic lineage of the neotype. The position of *P. rubrotaeniatus* still represents a lacuna in our knowledge due to the presence of the species in both lineages six and seven. Based on a previous phylogeographic study (Turner et al., 2004), Albert et al. (2011) suggest that *P. rubrotaeniatus* represents an example of a paraspecies that gave rise to the endemic *P. mariae*. Paraspecies are paraphyletic, geographically widespread species that give rise to another peripherally isolated species without becoming extinct (Ackery and Vane-Wright, 1984; Albert et al., 2011). The phylogenetic evidence (Melo et al., 2016a) and the results obtained herein refute this hypothesis and instead indicate that the present concept of *P. rubrotaeniatus* constitutes more than one genetic lineage.

In the Brazilian Shield, *Prochilodus costatus* shares the same mitochondrial cluster with *P. lineatus* (lineage 8), which again corroborates the molecular phylogeny (Melo et al., 2016a) and a mitogenome analysis (Chagas et al., 2015). Analyzed specimens of *P. costatus* from Rio Pandeiros/São Francisco are remarkably distant (∼2,500 linear km) from analyzed specimens of *P. lineatus* from Rosario in Argentina. These results agree with previous population genetic and phylogeographic studies that show high genetic diversity and low population divergence (Sivasundar et al., 2001; Carvalho-Costa et al., 2008; Melo et al., 2013; Ferreira et al., 2017). A phylogeographic study of *P. lineatus*, for example, found strong similarity among mitochondrial control regions between samples from the lower Rio Paraná in Argentina and the upper Rio Paraná in Brazil (Sivasundar et al., 2001). Overlapping counts of lateral line scales, number of vertebrae, and allopatry only slightly discriminate the two species (Castro and Vari, 2004), a distinction that is clearly not supported herein. In addition, our evidence indicates that future population genetic studies of either species should include members of both nominal species.

#### **Little Divergence Among Lineages of** *Prochilodus*

Results indicate very little mitochondrial divergence among lineages of *Prochilodus* and provide evidence that distantly sampled specimens, in various instances, correspond to a single mitochondrial lineage. The most plausible hypothesis that might explain such a result is that migration affects species diversification. Indeed, migration has been used to explain high levels of gene flow and low population structure in *Prochilodus* (e.g., Sivasundar et al., 2001; Carvalho-Costa et al., 2008; Melo et al., 2013; Ferreira et al., 2017). This is supported by ecological data from fish tagging that found migratory routes of >120 km for *P. argenteus* along the Rio São Francisco (Godinho and Kynard, 2006) and 250 km for *Prochilodus* sp. in that same basin (Paiva and Bastos, 1981).

Migration and gene flow directly influence morphological stasis (Stanley, 1979). This raises some questions about how migration patterns have influenced population diversification without morphological change in *Prochilodus*. Would distinct environmental settings be responsible for distinct movement behaviors along their evolutionary history? López-Fernández and Albert (2011) suggest that massive prochilodontid migrations evolved during the Oligocene, before the separation of the paleo-Amazon-Orinoco river basin, and that subsequent vicariant events allowed their successful colonization throughout major Neotropical basins. The two allopatric lineages of *P. nigricans* and their two migration patterns (Carvalho and Mérona, 1986; Fernandes, 1997) support the conclusion of López-Fernández and Albert (2011) and reinforce the idea that a lineage (or population), once fragmented, tends to adapt ecologically to distinct environmental conditions. Colonization of Neotropical habitats in upland rivers requires adaptation to strong selective pressure through specific morphological innovations (Silva et al., 2016b) or behavioral specializations, which appears to be the case in *Prochilodus*.

It is noteworthy, however, that migration is not the only factor disfavoring species diversification in *Prochilodus*. There are at least two more plausible explanations for the observed low genetic variation. Hybridization between the native *P. harttii* and the introduced *P. argenteus* has recently been genetically identified in the Rio Jequitinhonha (Sales et al., 2018). Although hybridization between native species has not been documented yet, the process might be included as another hypothesis to explain our results, at least for sympatric species. Another plausible hypothesis would be recent episodes of species diversification that did not allow accumulation of haplotype variation. Although a molecular phylogeny is available (Melo et al., 2016a), the lack of a time-calibrated tree does not allow us to support this hypothesis with greater confidence. Testing these three most plausible hypotheses to explain the little divergence among lineages of *Prochilodus* is thus a matter for further research.

#### **AUTHOR CONTRIBUTIONS**

BM, FF, and CO designed the project; BM and BD generated, analyzed, and compiled the data; BM wrote most of the text; BM, BD, FF, and CO revised and approved the final version of the manuscript.

### **ACKNOWLEDGMENTS**

The authors thank F. Y. Ashikaga for helping with the population genetic analyses. Research received financial support from FAPESP grants 2011/08374-1, 2013/16436-2, 2016/11313-8 (BM), PIBIC-CNPq (BD), FAPESP grant 2014/26508-3, and CNPq grant 306054/2006-0 (CO).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00107/full#supplementary-material

**Supplementary Table S1 |** Lineage, taxon, voucher, locality information, and Genbank accession numbers of the analyzed specimens of Prochilodus. Lines in bold indicate sequences generated in the present study and asterisks represent BOLD accession numbers.

**Supplementary Table S2 |** Pairwise FST values among mitochondrial lineages of Prochilodus. ∗P < 0.05.

**Supplementary Table S3 |** Analysis of molecular variance (AMOVA) among lineages of Prochilodus. Groups were ordered on the basis of previous ML and Bayesian analyses (see section Material and Methods).

**Supplementary Figure S1 |** Neighbor-joining tree of the species of Prochilodus based on partial sequences of the cytochrome c oxidase subunit I. Numbers near nodes represent bootstrap support.

**Supplementary Figure S2 |** Maximum likelihood tree of the Prochilodus species based on partial sequences of the cytochrome c oxidase subunit I. Numbers near nodes represent bootstrap support. Colors match those in **Figure 1**.

**Supplementary Figure S3 |** Haplotype network of the eight mitochondrial lineages of Prochilodus. Each circle represents a unique haplotype and its size is proportional to haplotype frequency. Colors match those in **Figure 1**.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Melo, Dorini, Foresti and Oliveira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Trends in Karyotype Evolution in Astyanax (Teleostei, Characiformes, Characidae): Insights From Molecular Data

Rubens Pazza<sup>1</sup> \*, Jorge A. Dergam<sup>2</sup> and Karine F. Kavalco<sup>1</sup>

<sup>1</sup> Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa, Rio Paranaíba, Brazil, <sup>2</sup> Laboratory of Molecular Systematics "Beagle", Department of Animal Biology, Federal University of Viçosa, Viçosa, Brazil

The study of patterns and evolutionary processes in neotropical fish is not always an easy task due to the wide distribution of major fish groups in large and extensive river basins. Thus, it is not always possible to detect or correlate possible effects of chromosome rearrangements in the evolution of biodiversity. In the Astyanax genus, chromosome data obtained since the 1970s have shown evidence of cryptic species, karyotypic plasticity, supernumerary chromosomes, triploidies, and minor chromosomal rearrangements. In the present work, we map and discuss the main chromosomal events compatible with the molecular evolution of the genus Astyanax (Characiformes, Characidae) using mitochondrial DNA sequence data, in the search for major chromosome evolutionary trends within this taxon.

#### Edited by:

Roberto Ferreira Artoni, State University of Ponta Grossa, Brazil

#### Reviewed by:

Wenyan Du, Alforex Seeds LLC, United States

Filipe Augusto A. G. Gonçalves De Melo, Universidade Estadual do Piauí, Brazil

> \*Correspondence: Rubens Pazza rpazza@ufv.br

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 November 2017 Accepted: 03 April 2018 Published: 16 April 2018

#### Citation:

Pazza R, Dergam JA and Kavalco KF (2018) Trends in Karyotype Evolution in Astyanax (Teleostei, Characiformes, Characidae): Insights From Molecular Data. Front. Genet. 9:131. doi: 10.3389/fgene.2018.00131

Keywords: cytotaxonomy, molecular evolution, chromosomal rearrangements, mtDNA, chromosomal symplesiomorphy, chromosomal synapomorphy, chromosomal autapomorphy

#### INTRODUCTION

The role of chromosomal rearrangements in the evolution of organisms has been a matter of debate for many years. The initial observation that closely related species differ in their karyotypes was later supported by evidence that unbalanced rearrangements can interfere with gametogenesis, decrease gene flow, and reinforce reproductive isolation leading to speciation (Rieseberg, 2001; Navarro and Barton, 2003). On the other hand, some organisms tolerate a certain amount of chromosomal rearrangement, which is often referred to as karyotypic plasticity (Jónsson et al., 2014; Havelka et al., 2016). This complicates the task of explaining the possible role of rearrangements in the evolution of organisms, since the variation may result in speciation or persist as polymorphism within populations.

Since Moreira-Filho and Bertollo (1991), who proposed Astyanax scabripinnis as a species complex based on cytogenetic and morphometric data, chromosome variation has been regarded as part of speciation processes in Astyanax. Before that, however, it was already acknowledged that some populations could differ in their karyotype formulae and diploid numbers (Morelli et al., 1983). Thus, it is now known that populations currently assigned to A. scabripinnis and to the other nominal species are characterized by some degree of inter- or intra-population chromosome variation, with diploid numbers ranging from 46 to 50 chromosomes; such is the case of Astyanax fasciatus (Pazza et al., 2006). Molecular cytogenetics (i.e., satellite DNA and ribosomal gene localization) has also provided further taxonomically informative data (for a review, see Pazza and Kavalco, 2007).

Although little is known about the cause and effect of the rearrangements during the evolutionary process, the correlation between independent data sets such as molecular and chromosomal data suggests that some karyotypic signatures are associated with organismic evolution. Based on Cytochrome B (CytB) sequences, Mello et al. (2015) found three genetically distinct clades out of 17 Astyanax nominal species from the Iguaçu and adjacent river basins. Likewise, using Cytochrome Oxidase I (COI) sequences, Rossini et al. (2016) found five clades out of 64 nominal species plus 12 provisionally identified taxa. Based on DNA barcoding criteria, these authors identified only 21 morphological species. These studies also suggest the possibility of horizontal transfer that affects some species within otherwise cogent taxa. Indeed, hybridization may play a fundamental role in speciation within the genus, especially when these events involve chromosomal characteristics. The matter is particularly difficult since claims of possible natural hybrids in the specialized literature are rare (Artoni et al., 2006; Pazza et al., 2006; Sassi et al., 2018).

Thus, we analyzed mitochondrial sequences from individuals with known chromosome characteristics within the five clades proposed by Rossini et al. (2016) to describe the relations between DNA sequences and chromosomal rearrangements. We obtained a phylogenetic tree that depicts chromosomal characteristics that are strongly associated with the proposed clades.

#### MATERIALS AND METHODS

In the present study we sequenced samples of 195 individuals from 16 nominal species of the genus Astyanax, deposited in the tissue collection of the Laboratory of Ecological and Evolutionary Genetics (LaGEEvo), of the Federal University of Viçosa, Rio Paranaíba campus. The species were chosen based on their chromosomal data as analyzed in previous works and because they were included in the clades of Rossini et al. (2016). Geographic coordinates, vouchers, and GenBank accession numbers are summarized in Supplementary Table S1. This study was carried out in accordance with the recommendations of the Guide for the Care and Use of Laboratory Animals by the Conselho Nacional de Controle de Experimentação Animal (CONCEA). Only tissue samples of fish of the genus Astyanax already deposited in the LaGEEvo tissue bank were used, and all the specimens had their chromosomal data analyzed in previous works. No additional animals were sacrificed for this study.

Our hypothesis on chromosome evolution was based on the following chromosome characters: diploid number, fundamental number (FN), location of 5S ribosomal DNA sites, presence/absence of a 5S rDNA site in a specific submetacentric chromosome, referred to as the "marker" chromosome by Almeida-Toledo et al. (2002) and Kavalco et al. (2005), and finally, the amount and distribution of the repetitive DNA As-51 probe (Mestriner et al., 2000).
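To make the first two characters concrete, the sketch below derives the diploid number and FN from a karyotype formula. It assumes one common convention, in which metacentric (m), submetacentric (sm), and subtelocentric (st) chromosomes count as two-armed and acrocentric (a) chromosomes as one-armed; some authors score subtelocentrics differently. The karyotype formula used is hypothetical and not taken from any specimen in this study.

```python
# Arms contributed by each morphological class (assumed convention).
ARMS = {"m": 2, "sm": 2, "st": 2, "a": 1}


def diploid_and_fn(karyotype):
    """Return (2n, FN) for a karyotype given as {class: chromosome count}."""
    diploid = sum(karyotype.values())
    fn = sum(ARMS[cls] * count for cls, count in karyotype.items())
    return diploid, fn


# Hypothetical karyotype formula: 6m + 22sm + 10st + 12a.
example = {"m": 6, "sm": 22, "st": 10, "a": 12}
two_n, fn = diploid_and_fn(example)
print(f"2n = {two_n}, FN = {fn}")
```

Under this convention, a karyotype rich in acrocentric chromosomes yields a low FN for the same diploid number, which is the sense in which "low FN" and "high FN" are used in the clade descriptions.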

Total DNA extraction from muscle or liver samples was carried out using commercial kits (PureLink Genomic DNA minikit, Invitrogen™), according to the manufacturer's instructions. Amplification of mitochondrial (mtDNA) subunits 6 and 8 of the ATP synthase enzyme gene (ATPase 6/8) was accomplished using the primers ATP8.2-L8331 (5′-AAAGCRTTRGCCTTTTAAAGC-3′) and CO3.2-H9236 (5′-GTTAGTGGTCAGGGCTTGGRTC-3′) (Sivasundar et al., 2001). PCR was performed in a final volume of 25 µL, with 2.5 µL of 10× Taq buffer, 1 µL of MgCl2, 1 µL of each primer, 0.2 µL Taq DNA polymerase, 12.8 µL ultrapure water, 1.5 µL of dNTP, and 5 µL of DNA. The amplification reactions were performed in a thermocycler at 95°C for initial denaturation (2 min) followed by 30 cycles of 94°C (30 s), 58°C (30 s), and 72°C (1 min). The PCR product was visualized on a 1% agarose gel; purification and sequencing were performed by a third-party company (Macrogen, Korea).
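The reaction volumes listed above can be checked and scaled with a short script. This is a minimal sketch: the per-reaction volumes are those given in the text, whereas the 10% pipetting overage and the choice to add template DNA separately to each tube are assumptions made for illustration.

```python
# Per-reaction volumes in microlitres, as listed in the text.
REACTION_UL = {
    "10x Taq buffer": 2.5,
    "MgCl2": 1.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Taq DNA polymerase": 0.2,
    "ultrapure water": 12.8,
    "dNTP": 1.5,
    "template DNA": 5.0,
}


def master_mix(n_reactions, overage=0.10):
    """Scale every component except template DNA for n reactions.

    `overage` is an assumed pipetting allowance, not a value from the text.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if name != "template DNA"
    }


total = sum(REACTION_UL.values())
print(f"volume per reaction: {total:.1f} uL")
mix = master_mix(10)
for name, volume in mix.items():
    print(f"{name}: {volume} uL")
```

The components sum to the stated final volume of 25 µL, which confirms that the recipe as written is internally consistent.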

Sequence editing was performed using Chromas Lite v2.01 and sequence identity was checked with BLASTn<sup>1</sup>. Sequence alignment was carried out with ClustalW v1.6 (Thompson et al., 1994) as implemented in MEGA v7 (Kumar et al., 2016). A Maximum Likelihood tree was obtained using the best-fit model in MEGA v7 (Kumar et al., 2016), and chromosome characters were plotted on this ML phylogram. Node support was estimated using bootstrap (Felsenstein, 1985).
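The bootstrap procedure cited above rests on resampling alignment columns with replacement. The sketch below shows only that resampling step on a hypothetical toy alignment; it is not how MEGA is invoked, and the tree inference that would be repeated on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling columns with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns)
        for name in names
    }


# Hypothetical toy alignment (three sequences, ten sites).
toy = {
    "seq_A": "ACGTACGTAC",
    "seq_B": "ACGTTCGTAC",
    "seq_C": "ACGAACGTCC",
}
rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis, a tree would be inferred from each replicate and the support for a node reported as the percentage of replicate trees that contain it.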

### RESULTS

A total of 195 mitochondrial DNA sequences were obtained from individuals with known chromosomal characteristics, corresponding to 16 nominal species of Astyanax from the Neotropical region. The ATPase 8 gene yielded a 530 bp partial sequence without insertions, deletions, or stop codons, and the substitution model was Tn93 + G. The maximum likelihood phylogram indicated four main clades with strong bootstrap support. The main events of chromosome differentiation were plotted on a simplified phylogenetic tree (**Figure 1**) according to the trends observed in the analyzed specimens and data from the literature. At its root, and according to the chromosomal characteristics of closely related species, we propose a most recent common ancestor with a karyotype of 2n = 50 chromosomes, low FN, one or two pairs of chromosomes bearing 5S rDNA sites at the terminal region, and multiple 18S rDNA sites.
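A check for stop codons in a mitochondrial coding sequence can be sketched as follows, assuming the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA, and AGG are stops and TGA encodes tryptophan. The text does not describe how the check was performed, and the sequence and reading frames below are hypothetical.

```python
# Stop codons under the vertebrate mitochondrial genetic code.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codons(seq, frame=0):
    """Return (position, codon) for every stop codon in the given frame."""
    seq = seq.upper()
    hits = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in MITO_STOPS:
            hits.append((i, codon))
    return hits


# Hypothetical fragment: TGA is not a stop here, but the final TAA is.
toy_seq = "ATGACCTGAGCATAA"
print(stop_codons(toy_seq, frame=0))
print(stop_codons(toy_seq, frame=1))
```

The reading frame is left as a parameter because a partial sequence does not necessarily begin at the first position of a codon.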

### Clade 1

The first clade, composed of 36 individuals, corresponds to exclusively coastal species, represented here by A. ribeirae, A. intermedius, A. giton, and A. hastatus (**Figure 2**), which have been recently proposed as members of the Probolodini (Silva, 2017). In this group, all individuals have 2n = 50 chromosomes; low FN; absence of repetitive DNA As51 and the 5S rDNA marker chromosome; little constitutive heterochromatin, distributed mainly in the pericentromeric region; a greater number of 5S rDNA sites (ranging from 6 to 10 sites) distributed in terminal regions; and a variable number of Ag-NORs and 18S rDNA sites.

<sup>1</sup>http://www.ncbi.nlm.nih.gov

### Clade 2

This clade is composed of five individuals of Astyanax mexicanus (**Figure 3**). This species has a few repetitive As51 DNA sites and 5S rDNA is distributed on six sites, including the submetacentric, marker chromosome pair.

### Clade 3

This clade comprises 71 individuals with oval humeral spots distributed in coastal, Upper Paraná, Paraguay, and São Francisco river basins, belonging to the Astyanax bimaculatus species complex, such as A. altiparanae, A. aff. bimaculatus, A. lacustris, A. asuncionensis, and A. abramis (**Figure 4**). They all have 50 chromosomes, with a predominance of submetacentric chromosomes, and consequently high FN. The presence of As51 repetitive DNA in a few sites and the restriction of 5S rDNA to the submetacentric marker chromosome are remarkable. The distribution of heterochromatin and 18S rDNA/Ag-NORs is variable.

### Clade 4

This clade encompasses species widely distributed in the Upper Paraná, São Francisco, and coastal river basins. Eighty-four individuals of A. paranae, A. rivularis, A. bifasciatus, A. fasciatus, and A. bockmanni are represented in this clade (**Figure 5**). Species of this clade present high intra- and interspecific chromosome variability, with 2n = 46 to 2n = 50 chromosomes. The number of As51 satellite DNA sites varies from none to 14. Most species have four 5S rDNA sites, which may include the chromosomal marker pair. As in Clade 3, they have high FN, relatively few acrocentric chromosomes, variable 18S rDNA sites and Ag-NORs, and highly variable constitutive heterochromatin distribution.

# DISCUSSION

The phylogram obtained in the present work using the mitochondrial DNA sequence (ATPase subunit 6) is mostly congruent with that obtained by Mello et al. (2015) using the mitochondrial DNA sequence of the cytochrome b gene, as well as that obtained by Rossini et al. (2016) using the mitochondrial COI sequence. Karyotypic data provide further support for at least some monophyletic groupings within the current genus Astyanax, corresponding to the major clades, as was proposed by Travenzoli et al. (2015) for Bryconidae. Despite this cohesion, some species seem to have haplotypes distributed in different clades, as demonstrated by Rossini et al. (2016). According to the authors, the COI sequence is not an appropriate tool to recover phylogenies, but rather to identify species (Hebert and Gregory, 2005). For the Astyanax genus, this sequence also seems to be unsuitable for species identification, since of the more than 70 nominal species analyzed by Rossini et al. (2016), only 21 were unequivocally identified by barcoding. The low genetic distances observed in the mitochondrial analyses among species of the Astyanax genus are often explained by the rapid divergence between them (Ornelas-García et al., 2008; Carvalho et al., 2011; Rossini et al., 2016). Therefore, multidisciplinary approaches may be more effective for reconstructing phylogenies within this genus. At lower systematic levels, molecular data associated with chromosomal features were particularly effective for understanding evolutionary patterns in A. aff. bimaculatus (Kavalco et al., 2011) and A. fasciatus (Pansonato-Alves et al., 2013; Kavalco et al., 2016).

Overlaying cytogenetic data onto molecular phylograms offers insight into large-scale chromosome evolution in Astyanax. This analysis points to the distribution patterns of 5S rDNA and As51 repetitive DNA as the main chromosomal markers to be explored, as they show signatures within the genus Astyanax. On the other hand, other characters such as the location of the Nucleolar Organizing Regions and C-banding patterns seem more informative at the population level, as demonstrated by Jacobina et al. (2011) in Hoplias malabaricus (Erythrinidae). Although other markers have been used sporadically in chromosome studies of the genus Astyanax (Barbosa et al., 2015, 2017; Silva et al., 2015; Piscor and Parisi-Maltempi, 2016), assessing their informative value will require intensive sampling in an array of species.

The genus Astyanax is currently incertae sedis in the Characidae family, and the evolutionary relations within the genus, as well as its phylogenetic relation to other genera of the family, are quite challenging both from the morphological point of view (Mirande, 2009, 2010) and from that of molecular systematics (Oliveira et al., 2011; Rossini et al., 2016). Despite the incomplete database, other genera related to Astyanax usually show more conservative cytogenetic characters, such as steady patterns of 50 chromosomes, multiple NORs and 5S ribosomal genes located in one or more pairs of chromosomes at the terminal region. This suite of characteristics has been reported in the genera Oligosarcus (Kavalco et al., 2005; Hattori et al., 2007; Barros et al., 2015), Deuterodon (Mendes et al., 2011; Coutinho-Sanches and Dergam, 2015), Hollandichthys and Ctenobrycon (Carvalho et al., 2002), plus the absence of homology with the As51 repetitive DNA (Kavalco et al., 2005), suggesting that these characters are close to the evolutionary origins of Astyanax, being symplesiomorphies in clade 1.

#### Clade 1

Clade 1 obtained by our molecular analyses corresponds to clade 5 of Rossini et al. (2016). In the specimens analyzed we can observe the plesiomorphic chromosome characteristics (such as a karyotype with 2n = 50 chromosomes and a low FN), with good structure in A. ribeirae (Kavalco et al., 2010), A. intermedius, and A. giton (Kavalco and Moreira-Filho, 2003; Kavalco et al., 2004). In A. hastatus, different cytotypes with 2n = 50 chromosomes have been observed, constituting yet another species complex within the genus Astyanax (Kavalco et al., 2009), as corroborated by the present study (**Figure 2**). Also, a combination of molecular and chromosomal data suggests that the current A. hastatus may encompass more than one OTU (operational taxonomic unit) and so, more than one ESU (evolutionarily significant unit). The Astyanax species distributed in the coastal basins were added to the Probolodini group along with species from the genera Probolodus, Deuterodon, and Myxiops and Hyphessobrycon luetkenii (Silva, 2017). These other species are less studied by cytogenetic methods, but so far the chromosome characteristics seem to be shared (Mendes et al., 2011; Coutinho-Sanches and Dergam, 2015), bringing new evidence supporting the group. Unfortunately, As51 repetitive DNA in these coastal species has not been confirmed so far (Kavalco et al., 2007, 2009).

Smaller variations in the number of 5S rDNA sites can be observed among these species, although they are always located in the proximal or distal region of acrocentric chromosomes, reaching up to 10 markings in A. intermedius and A. giton, which differ by a pericentric inversion (Kavalco et al., 2004). The hypothesis that these characteristics are a symplesiomorphy can be corroborated by independent chromosome data, such as those obtained for A. taeniatus (Cunha et al., 2016), which is also included in clade 5 by Rossini et al. (2016). The evolutionary dynamics of this gene are related not only to variations in non-transcribed spacers, but also to synteny with long and short interspersed nuclear elements, non-long terminal repeat retrotransposons, U-snRNA families, and microsatellite polymorphisms (Rebordinos et al., 2013). According to these authors, polymorphisms in non-transcribed regions are observed in fish. Polymorphisms in transcribed regions do not appear to interfere with the cellular activity of 5S rDNA, and the molecular diversity of the 5S rDNA gene families is greater than the chromosome diversity (Rebordinos et al., 2013).

#### Clade 2

At the root of clades 2, 3, and 4, it is possible to hypothesize two main chromosomal events: the occurrence of one of the 5S rDNA sites in the proximal position of a specific submetacentric chromosome pair, which has been considered a marker (Almeida-Toledo et al., 2002), and the presence of As51 repetitive DNA in the Astyanax genome.

species and their related group as A. fasciatus, the A. scabripinnis group, A. bockmanni, and A. bifasciatus. This group is the most chromosome diverse in the genus and shows little mitochondrial divergence.

The 5S rDNA in clade 2 appears on a pair of two-arm chromosomes in the proximal region of the centromere (Kavalco and Almeida-Toledo, 2007). This marker may have arisen by pericentric inversion from an acrocentric chromosome bearing the 5S rDNA site, which should represent the most basal character state.

In addition to this site, A. mexicanus also presents four more 5S rDNA sites on acrocentric chromosomes, with distal markings on one pair and proximal markings on another (Kavalco and Almeida-Toledo, 2007). The isolation of the group represented by A. mexicanus is congruent with some molecular data that relate ichthyofauna invasions in Central America with the genesis of the Panamá Isthmus (around 3.3 Mya; Ornelas-García et al., 2008), as well as with clustering by DNA barcoding (Rossini et al., 2016). Unfortunately, cytogenetic data for the species of this group are scarce, but it would not be surprising to find independent autapomorphies in the karyotype evolution of the cis-Andean species of Astyanax.

As51 satellite DNA is partly repetitive tandem DNA that holds similarities with transposable elements and was isolated from a population of A. scabripinnis carrying a B chromosome (Mestriner et al., 2000). In this population, this DNA was located mainly in the B chromosome, besides two to four sites in the distal region of acrocentric chromosomes. Considering the possible origin from transposable elements, it is appropriate to assume that its distribution was initially restricted to a few sites and that it subsequently spread in the genome by intrinsic mechanisms of repetitive sequence amplification. In fact, the individuals of A. mexicanus that form clade 2 in the present work have few sites carrying this satellite DNA, and the signal is more diffuse than in other species, which suggests a smaller number of copies or even only partial homology (Kavalco and Almeida-Toledo, 2007). This divergence might be related to independent evolution of A. mexicanus relative to the A. scabripinnis strain that donated this probe.

#### Clade 3

The increase of As51 satellite DNA sites characterizes the common ancestral root of clades 3 and 4, in contrast to the loss of 5S rDNA sites on acrocentric chromosomes observed in members of Clade 3. Clade 3 is composed of the species complex "A. bimaculatus," represented herein by A. altiparanae, A. aff. bimaculatus, A. lacustris, A. asuncionensis, and A. abramis. Specimens from this clade present 2n = 50 chromosomes, with a higher FN, multiple Nucleolar Organizing Regions, and consequently multiple sites of 18S rDNA, according to the samples analyzed (Kavalco et al., 2011) and other independent studies (Almeida-Toledo et al., 2002; Fernandes and Martins-Santos, 2004, 2006; Peres et al., 2008; Hashimoto et al., 2011; Giongo et al., 2013).

Clade 3 is characterized by a synapomorphy: only one pair of chromosomes carrying a 5S rDNA site in its pericentromeric region, as reported for the samples analyzed (Kavalco et al., 2011) and in the literature (Almeida-Toledo et al., 2002; Fernandes and Martins-Santos, 2006; Peres et al., 2008, 2012; Hashimoto et al., 2011; Paiz et al., 2015). One exception is A. abramis, which presents, in addition to the site on the chromosomal marker pair, an extra pair of acrocentric chromosomes with proximal marking (Paiz et al., 2015; Piscor et al., 2015). This differentiated pattern may be an autapomorphy of A. abramis, whose molecular distinctiveness within the group was also evident in our analysis (**Figure 5**). Additionally, the pattern of a single 5S rDNA carrier pair can be seen in other species related to the A. bimaculatus group, as in A. janeiroensis (Vicari et al., 2008), A. goyacensis (Santos et al., 2013), and A. elachylepis (Santos et al., 2016).

In turn, there are few data on the distribution of As51 satellite DNA in this group, besides previous reports (Kavalco et al., 2011). In these species, small sites are observed, suggesting a relatively lower number of tandem copies and a more restricted distribution through the karyotype, except for A. bimaculatus from coastal basins, in which As51 satDNA is absent (Kavalco et al., 2011). On the other hand, in Astyanax janeiroensis, a considerable number of very conspicuous sites are observed (Vicari et al., 2008; Kantek et al., 2009b). Although the Catalog of Fishes (Eschmeyer et al., 2017) characterizes the distribution of A. janeiroensis as being "in Brazil", Melo (2001) suggests that its distribution is restricted to the basins of the Paraíba do Sul and other coastal drainages. These discrepant and independent occurrences in A. janeiroensis and A. bimaculatus may be a result of the historical biogeography of coastal drainages, affected by stream capture from continental basins and complex dispersal due to marine regressions and transgressions (Pereira et al., 2013). Fish faunas in these regions are characterized by a high degree of endemism and low species richness (Albert and Reis, 2011).

#### Clade 4

Clade 4 comprises a group of species with the highest chromosomal variability in the genus, with 2n ranging from 46 to 50 chromosomes and including the species complexes A. scabripinnis (encompassing A. rivularis and A. paranae) and A. fasciatus, as well as the species A. bockmanni and A. bifasciatus (formerly referred to as Astyanax sp. B). Equally variable among the species of this clade are the number of NOR/18S rDNA sites and the constitutive heterochromatin distribution patterns. The molecular divergence among these species is apparently quite recent, with low genetic distance indices as observed in clade 1 and by Rossini et al. (2016) using the COI gene. Despite some level of structuring, low bootstrap indexes preclude further hypotheses on the genetic diversification of the group. There is no detailed chromosome information on the other species analyzed by Rossini et al. (2016) belonging to this clade, except for A. parahybae (Kavalco and Moreira-Filho, 2003; Kavalco et al., 2004) presenting 2n = 48 chromosomes, and A. schubarti (Morelli et al., 1983) presenting 2n = 36 chromosomes. Unfortunately, we were unable to obtain the mitochondrial DNA sequence from A. parahybae in the present work.

Among the specimens analyzed, this clade presents some species with a conserved chromosome number, always with 2n = 50 chromosomes, such as A. bockmanni and A. bifasciatus (Fazoli et al., 2003; Kavalco et al., 2009; Hashimoto and Porto-Foresti, 2010). On the other hand, the other species present high levels of numerical chromosome variation.

This diploid number (2n = 50) was also observed in the A. paranae specimens from the Paranaíba and some A. rivularis specimens from the São Francisco river basin, although other specimens of the Paranaíba river basin also had 2n = 46 chromosomes (data not shown, in preparation). Both species belong to the historical A. scabripinnis species complex, characterized by broad sympatric and allopatric karyotype variation (Moreira-Filho and Bertollo, 1991, among others).

Finally, a diploid number of either 2n = 46 or 2n = 48 was observed among the analyzed specimens of A. fasciatus (Pazza et al., 2006; Kavalco et al., 2016), as was also observed in other populations (Artoni et al., 2006; Medrado et al., 2008; Pansonato-Alves et al., 2013). This variation is considered common in the species, although 2n = 50 chromosomes have already been reported (Artoni et al., 2006).

In relation to 5S rDNA, most of the species/populations already analyzed share the following pattern: one site on the marker chromosome plus a pair of acrocentric chromosomes with a 5S rDNA site in the proximal region; this was also observed in the specimens analyzed in the present work (Kavalco et al., 2004, 2007, 2009, 2016; Pazza et al., 2006) and in the literature available for A. bockmanni, A. fasciatus, and A. parahybae (Almeida-Toledo et al., 2002; Hashimoto et al., 2011; Silva et al., 2013; Daniel et al., 2015). Despite this relatively conserved pattern, there are two autapomorphies in the 5S rDNA phenotype in A. fasciatus. Medrado et al. (2015) reported a small variation in the distribution of 5S rDNA sites in a population of Astyanax aff. fasciatus where the marker site of the metacentric chromosome is absent, an evident populational autapomorphy. We also considered the additional occurrence of two 5S rDNA sites in A. aff. fasciatus from the Paraíba do Sul river basin an autapomorphy. In total, six 5S rDNA-bearing chromosomes are detected in this sample, always represented by the marker chromosome pair plus two st/a chromosome pairs (Kavalco et al., 2016).

In relation to the A. scabripinnis group, the most studied from the cytogenetic point of view (Pazza and Kavalco, 2007), several populations of the Coastal, São Francisco, and Upper Paraná rivers show the same mentioned pattern (Mantovani et al., 2005; Fernandes and Martins-Santos, 2006; Peres et al., 2008; Vicari et al., 2008). However, other studies have already demonstrated, in addition to these standard sites, more distal or proximal sites in other pairs of acrocentric chromosomes in populations of coastal rivers and the Upper Paraná basin (Kavalco et al., 2004).

Although absent in our analyses, A. schubarti is a species that has available chromosome data and is found in clade 1 of Rossini et al. (2016), the clade analogous to clade 4 of this work. Its low chromosomal number (2n = 36) and high FN suggest the occurrence of Robertsonian events of chromosome fusion at its origin (Morelli et al., 1983). In fact, this species shows the 5S rDNA sites located in the proximal region of two pairs of metacentric chromosomes (Almeida-Toledo et al., 2002). Interestingly, A. currentinus, a species recently described (Mirande et al., 2015), presents 2n = 36 chromosomes and the same distribution pattern of A. schubarti 5S rDNA, with an additional odd site (Paiz et al., 2015). The authors suggest that these species may belong to the same morphological group and are phylogenetically related. Unfortunately, there are no available phylogenetic data to support this hypothesis.

Among the specimens used in the present study, populations of the A. fasciatus and A. scabripinnis species complexes presented sites with homology with the As51 satellite DNA (Kavalco et al., 2007, 2013; Pazza et al., 2008). On the other hand, this probe showed no homology in the chromosomes of A. bockmanni (Kavalco et al., 2009). Among the species included in this group and in the clade 1 of Rossini et al. (2016), in only one population of A. rivularis (cited as A. scabripinnis) this repetitive sequence is absent (Abel et al., 2006). On the other hand, other species belonging to clade 1 of Rossini et al. (2016) have already presented homology with As51 satellite DNA, such as A. parahybae (Kavalco et al., 2007) and A. serratus (quoted as Astyanax sp.; Kantek et al., 2009a). Among these species, the available data are mainly concentrated in the group A. scabripinnis/paranae/rivularis (Abel et al., 2006; Kantek et al., 2009b; Barbosa et al., 2015, 2017), and in A. fasciatus (Abel et al., 2006; Kantek et al., 2009b; Medrado et al., 2015). The distribution of this satellite DNA in A. fasciatus seems to follow a biogeographic pattern, with an increase in the number of sites in drainage populations in the interior of the continent, and a decrease in coastal populations (Kavalco et al., 2013; Medrado et al., 2015). This pattern does not appear to be the same in the A. scabripinnis group and its cryptic species.

The absence of a clear biogeographic pattern of satellite DNA distribution in the A. scabripinnis populations should be related to the fact that these fish inhabit isolated headwaters. In addition to facilitating the occurrence of vicariance processes, this ecology keeps gene flow among populations low, with a consequent reduction of population sizes and a possibly strong effect of genetic drift, resulting in random fixation or loss of As51 satDNA. This could easily generate the biogeographic gaps that occur in the genomic distribution and the existence of populations in which this repetitive DNA is absent, even though it is a basal characteristic of the group. In fact, it is not yet proven whether there is any adaptive role for this DNA, even though we have evidence that it is present in a larger amount in the species with the greatest chromosome diversity, with extensive polymorphisms such as A. fasciatus. The occurrence of the As51 satDNA predominantly in supernumerary chromosomes (Mestriner et al., 2000) may imply exogenous origin.

The literature data compiled in the present study still cannot definitively answer the questions about the origin and evolution of these sequences within the lineages of Astyanax; they can only indicate trends within specific clades. Can As51 satDNA promote the chromosomal rearrangements that generate the diversity in diploid numbers and karyotype formulas observed in Astyanax, or can the genome of species with chromosomal plasticity, such as those of the genus, facilitate the dispersion of repetitive DNAs? To answer this question, we need more information about the role of this DNA in the karyotype and organismic evolution of the group. Fortunately, the genus Astyanax, due to the great concentration of studies, is an excellent model to answer these questions of comparative genomics.

#### AUTHOR CONTRIBUTIONS

RP and KK designed the research and performed the analyses. All the authors collected data and contributed to the writing and review of the manuscript.

#### FUNDING

This study was financed by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG; APQ-01247-08, APQ-00756-09, and APQ-01510-11).

#### ACKNOWLEDGMENTS

The authors thank the Manejo e Conservação de Ecossistemas Naturais e Agrários (MCENA) post-graduate program, Federal University of Viçosa, Campus Florestal.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00131/full#supplementary-material

#### REFERENCES

Brazil). Neotrop. Ichthyol. 4, 197–202. doi: 10.1590/S1679-62252006000200005

among sympatric samples of Astyanax fasciatus (Characiformes, Characidae). Cytogenet. Genome Res. 141, 133–142. doi: 10.1159/000354885



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pazza, Dergam and Kavalco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Molecular Identification of Shark Meat From Local Markets in Southern Brazil Based on DNA Barcoding: Evidence for Mislabeling and Trade of Endangered Species

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Henrik R. Nilsson, University of Gothenburg, Sweden Reinaldo A. De Brito, Universidade Federal de São Carlos, Brazil

#### \*Correspondence:

Victor H. Valiati valiati@unisinos.br Nelson J. R. Fagundes nelson.fagundes@ufrgs.br

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

Received: 01 November 2017; Accepted: 03 April 2018; Published: 27 April 2018

#### Citation:

Almerón-Souza F, Sperb C, Castilho CL, Figueiredo PICC, Gonçalves LT, Machado R, Oliveira LR, Valiati VH and Fagundes NJR (2018) Molecular Identification of Shark Meat From Local Markets in Southern Brazil Based on DNA Barcoding: Evidence for Mislabeling and Trade of Endangered Species. Front. Genet. 9:138. doi: 10.3389/fgene.2018.00138

Fernanda Almerón-Souza<sup>1†</sup>, Christian Sperb<sup>2†</sup>, Carolina L. Castilho<sup>2</sup>, Pedro I. C. C. Figueiredo<sup>1</sup>, Leonardo T. Gonçalves<sup>1</sup>, Rodrigo Machado<sup>2</sup>, Larissa R. Oliveira<sup>3</sup>, Victor H. Valiati<sup>2\*†</sup> and Nelson J. R. Fagundes<sup>1\*†</sup>

<sup>1</sup> Laboratório de Genética Médica e Evolução, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, <sup>2</sup> Laboratório de Biologia Molecular, Centro de Ciências da Saúde, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil, <sup>3</sup> Laboratório de Ecologia de Mamíferos, Centro de Ciências da Saúde, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil

Elasmobranchs, the group of cartilaginous fishes that includes sharks and rays, are especially vulnerable to overfishing due to low fecundity and late sexual maturation. A significant number of elasmobranch species are currently overexploited or threatened by fisheries activities. Additionally, several recent reports have indicated that there has been a reduction in regional elasmobranch population sizes. Brazil is an important player in elasmobranch fisheries and one of the largest importers of shark meat. However, carcasses entering the shark meat market have usually had their fins and head removed, which poses a challenge to reliable species identification based on the morphology of captured individuals. This is further complicated by the fact that the internal Brazilian market trades several different elasmobranch species under a common popular name: "cação." The use of such imprecise nomenclature, even among governmental agencies, is problematic for both controlling the negative effects of shark consumption and informing the consumer about the origins of the product. In this study, we used DNA barcoding (mtDNA, COI gene) to identify, at the species level, "cação" samples available in local markets from Southern Brazil. We collected 63 samples traded as "cação," which we found to correspond to 20 different species. These included two teleost species: Xiphias gladius (n = 1) and Genidens barbus (n = 6), and 18 species from seven elasmobranch orders (Carcharhiniformes, n = 42; Squaliformes, n = 3; Squatiniformes, n = 2; Rhinopristiformes, n = 4; Myliobatiformes, n = 3; Rajiformes, n = 1; and Torpediniformes, n = 1). The most common species in our sample were Prionace glauca (n = 15) and Sphyrna lewini (n = 14), while all other species were represented by four samples or fewer. Considering IUCN criteria, 47% of the elasmobranch species found are threatened at the global level, while 53% are threatened and 47% are critically endangered in Brazil.
These results underline that labeling the meat of any shark species as "cação" is problematic for monitoring catch allocations from the fishing industry and discourages consumer engagement in conservationist practices through informed decision-making.

Keywords: cação, elasmobranch, cytochrome oxidase-1, shark fisheries, wildlife DNA forensics

#### INTRODUCTION

Elasmobranchs (subclass Elasmobranchii) are a group of cartilaginous fishes that includes sharks (superorder Selachii) and rays (superorder Batoidea). Even though elasmobranchs comprise less than 1% of the world fisheries catch (Food and Agriculture Organization of United Nations, 2014, 2016), these species have biological characteristics that make them particularly vulnerable to overfishing, such as low fecundity and late sexual maturation (Bornatowski et al., 2014b). Indeed, several recent reports have indicated that there has been a reduction of elasmobranch populations, resulting in demographic collapse at a regional scale (Baum et al., 2003; Barausse et al., 2014). The overfishing of sharks is especially problematic because these top predators play a key role in marine ecosystems, and, therefore, their population dynamics may affect all local marine diversity (van der Elst, 1979; Heithaus et al., 2008; Gallagher et al., 2012; Pauly et al., 2013; Worm et al., 2013; Bornatowski et al., 2014a). In 1999, FAO (Food and Agriculture Organization) launched an international plan for the conservation and management of sharks and rays, recognizing the high vulnerability of these organisms (Vannuccini, 1999). However, despite this initiative, a significant number of elasmobranch species have remained overexploited or threatened by fisheries activities (Camhi et al., 2009; Cosandey-Godin and Morgan, 2011), which is illustrated by the 42% global increase in the shark meat trade from 2000 to 2011 (Food and Agriculture Organization of United Nations, 2015).

While shark fins are considered to be one of the most valuable products in the ocean (Gallagher and Hammerschlag, 2011), shark meat often attains only 20–60% of the price of tuna and mackerel meat (Bonfil, 1994). As a result, captured individuals usually have their fins removed for the shark fin market, the head is discarded, and the remaining central body part ("cigar") is then sold for the shark meat market with no special care (Kotas et al., 2008; Ward-Paige et al., 2012). From a taxonomic point of view, the removal of the head and fins represents a challenge to reliable species identification based on morphological features, allowing shark carcasses to be traded fraudulently (Holmes et al., 2009).

Brazil is among the six countries that have the highest capture rate for elasmobranchs (Lack and Sant, 2006), even though a thorough assessment of the impact of industrial fishing is made difficult by inaccurate records (Barreto et al., 2017). Southern Brazil is a region of high elasmobranch diversity (Lucifora et al., 2011), and has a large extractive marine fishing industry, with approximately 160 thousand metric tons of fish caught annually (MPA. Boletim estatístico da pesca e aquicultura, 2011). The two southernmost states, Santa Catarina (SC) and Rio Grande do Sul (RS), are responsible for 98% of the catches (MPA. Boletim estatístico da pesca e aquicultura, 2011). In addition, Brazil is a major player in the meat trade market, acting as the world's largest importer of shark meat in 2011 (Food and Agriculture Organization of United Nations, 2015). Internally, the Brazilian shark meat market trades several different elasmobranch species under the popular name "cação" (or other related popular terms such as "caçonete" and "anjo"), which is used to label several species (Figure S1). For example, Neto (2013) found 21 different species traded under the common name "cação", including hammerhead sharks (Sphyrna spp.), the blue shark (Prionace glauca), the tiger shark (Galeocerdo cuvier), the bull shark (Carcharhinus leucas), the Galapagos shark (C. galapagensis), and the blacktip shark (C. limbatus). Consumers value "cação" meat for its low cost and for being a "thornless fish" (Bornatowski et al., 2007). However, most consumers are not aware that "cação" is a synonym for sharks (or rays), and others believe that "cação" represents "a specific race of sharks" or even "a race of small sharks" (Bornatowski et al., 2015). Supermarkets, fisheries, and restaurants often omit any other information when selling "cação" meat. 
Indeed, the use of this term is so widespread that even Brazilian regulatory agencies categorize all elasmobranch species as "cação" without any species-specific information (MPA. Boletim estatístico da pesca e aquicultura, 2011).

The imprecise nomenclature of elasmobranchs makes it difficult to mitigate the negative effects of human shark consumption, as it becomes more difficult to inform the consumer if the product comes from a threatened species or from an illegal species trade. Since shark carcasses are sliced before being sold, it is virtually impossible to obtain accurate species diagnosis based on morphological traits for marketed elasmobranchs (Bornatowski et al., 2015). Therefore, there is an increasing need for fast, reliable, and cheap testing for determining the taxonomic identity of commercialized fishes (Rasmussen and Morrissey, 2008). A precise identification of marketed species also assures that the correct information is presented to the consumer, motivating him or her to take part in honest and regulated trade (Moretti et al., 2003; Martinez et al., 2005).

DNA barcoding uses a small fragment from a DNA sequence located within a standardized region of the genome to allow precise species identification (Hebert et al., 2003). In animals, the standard DNA barcode comes from a stretch of 650 base pairs (bp) from the 5′ end of the mitochondrial gene Cytochrome Oxidase Subunit I (COI or Cox 1) (Meyer and Paulay, 2005; Hajibabaei et al., 2007). This technique has been widely used in a range of studies of species identification (e.g., Meyer and Paulay, 2005; Lowenstein et al., 2010; Carvalho et al., 2011; Rodrigues-Filho et al., 2012; Galimberti et al., 2013). Whilst DNA barcoding is a valuable tool for species identification, especially when the entire organism cannot be accessed for morphology, there are important limitations concerning its accuracy, which depend on the reference database available and on the degree of genetic difference among species (see Frézal and Leblois, 2008 for a review on the pros and cons of DNA barcoding). The aim of this study is to use DNA barcodes to identify, at the species level, samples of "cação" (or similarly labeled) meat available in local markets in Southern Brazil. Finally, we discuss the implications of these findings in the context of elasmobranch conservation in Brazil.

#### MATERIALS AND METHODS

#### Sample Collection

We studied samples sold under general names such as "cação," "caçonete," and "filé anjo," which usually refer to elasmobranch species. Between 2008 and 2013, and in 2016, we acquired filet samples from local fish markets and supermarkets in different cities from the RS and SC states in Southern Brazil (**Figure 1**, **Table 1**). We also included in the analysis samples from Sphyrna lewini (n = 4), Pseudobatos horkelii (n = 2), Rhizoprionodon lalandii (n = 1), Narcine brasiliensis (n = 1), Zapteryx brevirostris (n = 2), and Gymnura altavela (n = 1), collected from fishing vessels and morphologically identified according to Figueiredo (1977), to serve as controls for the DNA barcode identification. These samples are identified as E\_\_ in **Table 1**. All samples were stored in 95% ethanol at −20°C.

#### Laboratory Procedures

DNA extraction started from a small portion (∼100 mg) of the tissue. For most samples we used the Wizard® Genomic DNA Purification Kit (Promega) modified to include an initial digestion step with 200 µg proteinase K (Aljanabi and Martinez, 1997). For the remaining samples, we used a protocol based on the CTAB method (Doyle, 1987). We used the COI primers FishF2 (5′ TCG ACT AAT CAT AAA GAT ATC GGC AC 3′) and FishR2 (5′ ACT TCA GGG TGA CCG AAG AAT CAG AA 3′) (Ward et al., 2005). Amplification reactions were prepared with 0.4 µM of each dNTP, 1.5 mM MgCl<sub>2</sub>, 0.5 µM of each primer, 1 U Taq Polymerase, and ∼40 ng of genomic DNA. Cycling conditions included an initial denaturing step of 94°C for 5 min, followed by 10 cycles of 94°C for 1 min, 55°C (−0.5°C/cycle) for 1 min, and 72°C for 1 min 30 s, and 30 additional cycles of 94°C for 1 min, 50°C for 1 min, and 72°C for 1 min 30 s, with a final extension step of 72°C for 5 min. The amplification products were visualized on a 1% agarose gel stained with GelRed™ (Biotium). PCR products were purified enzymatically using 0.33 U SAP (Shrimp Alkaline Phosphatase) and 3.33 U ExoI (Exonuclease I). PCR products were sequenced by the Sanger method at Macrogen Inc. (Seoul, South Korea) and Ludwig Biotec (Porto Alegre, Brazil). DNA sequencing was performed on both strands using the primers mentioned above.
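The touchdown cycling program described above can be laid out explicitly. The following Python sketch is only an illustration, not the authors' thermocycler configuration: it assumes that the first touchdown cycle anneals at 55°C and that each subsequent touchdown cycle is 0.5°C cooler, and it ignores ramping time between steps. The function and constant names are hypothetical.

```python
# Illustrative layout of the touchdown PCR program described in the text.
# Assumption: touchdown starts at 55 °C and drops 0.5 °C in each later cycle.
INITIAL_DENATURATION_MIN = 5.0
FINAL_EXTENSION_MIN = 5.0
# Hold times per cycle in minutes: 1 min, 1 min, and 1 min 30 s.
STEP_MIN = {"denature": 1.0, "anneal": 1.0, "extend": 1.5}


def annealing_temperatures(start_c=55.0, step_c=0.5, touchdown_cycles=10,
                           plateau_c=50.0, plateau_cycles=30):
    """Return the annealing temperature (°C) for every cycle, in order."""
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [plateau_c] * plateau_cycles


def total_programmed_minutes(n_cycles):
    """Sum of programmed hold times only (ramping between steps is ignored)."""
    per_cycle = sum(STEP_MIN.values())
    return INITIAL_DENATURATION_MIN + n_cycles * per_cycle + FINAL_EXTENSION_MIN


temps = annealing_temperatures()
print(temps[:10])   # annealing temperatures of the 10 touchdown cycles
print(len(temps), total_programmed_minutes(len(temps)))
```

Under these assumptions the program has 40 cycles, the last touchdown cycle anneals at 50.5°C, and the programmed hold times add up to 150 min.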

#### Data Analysis

The consensus sequence for each sample was assembled and trimmed in Geneious 9.1 (www.geneious.com). The reliability of each consensus sequence was assessed by a thorough visual inspection of the chromatograms used in the assemblies to check for sequencing errors and artifacts. Low quality regions in the chromatograms, identified as a stretch of five or more contiguous bases having high background noise and uneven spacing, were trimmed and removed before sequence assembly. Because the assembly algorithm gives more weight to better quality reads, cases of sequence heterogeneity between strands were resolved in favor of the best quality read or, if both reads had similar quality for that position, by marking it as an ambiguous base (N, R, Y, etc.). The consensus sequence was then used as a query for comparison with the NCBI database (http://www.ncbi.nlm.nih.gov/) using the nucleotide Basic Local Alignment Search Tool (BLASTn). In all cases, BLAST matched COI sequences from elasmobranchs (or, in some cases, from teleosts) with good coverage and identity (see Results), suggesting that we generated authentic COI sequences from our samples. We recorded the species representing the top BLAST hit for each query. Following this, we built a dataset of 2,877 COI sequences deposited in GenBank including all species of all genera represented in the list of top BLAST hits. For example, if the top BLAST hit for a given sample was S. lewini, we included all sequences from all Sphyrna species (including eventual "Sphyrna sp." entries) in the dataset. We then picked at random 2–8 sequences for each species, which were aligned with the consensus sequences from the samples generated in this study using MAFFT 7.0 (Katoh and Standley, 2013), leading to a final dataset of 323 COI sequences for 147 species (including undescribed or unknown species). As a final quality control step, we checked the dataset for nonsense mutations and alignment gaps, as both could indicate the presence of nuclear mitochondrial translocations (Numts) (Triant and DeWoody, 2007). The final alignment file can be downloaded as Supplementary Material (File S1). The best substitution model (HKY+G+I) for this final dataset was estimated in jModelTest 2 based on the corrected Akaike Information Criterion (AICc) (Darriba et al., 2012). Pairwise genetic distances were estimated in PAUP\* 4.0 (Swofford, 2002) based on the most likely substitution model and its associated parameters [Lset base = (0.3624 0.2434 0.0914) nst = 2 tratio = 6.1561 rates = gamma shape = 0.8490 ncat = 4 pinvar = 0.4860]. For this final dataset, we inferred the maximum likelihood (ML) tree in RAxML 8.2.10 (Stamatakis, 2014). Node credibility was assessed based on 1,000 bootstrap replicates.

FIGURE 1 | Sampling locations in Southern Brazil. 1, Rio Grande; 2, Porto Alegre; 3, Tramandaí + Imbé; 4, Arroio do Sal; 5, Torres; 6, Passo de Torres; 7, Araranguá; 8, Laguna; 9, Imbituba; 10, Florianópolis; 11, Itajaí.

TABLE 1 | Sample information, species identification, average genetic distance, and results from the BLAST search.

E54\* MG703574 232b Passo de Torres, SC fresh Zapteryx brevirostris 0.000 100 100

<sup>a</sup>Average genetic distance against all sequences from the same species in the final dataset.

<sup>b</sup>%Coverage and %Identity values considering the top-BLAST hit for the candidate species.

\*All samples identified as E\_\_ were obtained directly from fishing vessels, and were not purchased.

NC, not computed.
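The check for nonsense mutations mentioned above can be illustrated with a small script. This is a minimal sketch rather than the procedure the authors used: it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2, in which TAA, TAG, AGA, and AGG are stop codons), and the function names and toy sequences are hypothetical.

```python
# Stop codons in the vertebrate mitochondrial genetic code (NCBI table 2).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codons_by_frame(seq):
    """Map each forward reading frame (0, 1, 2) to its in-frame stop codons.

    Each hit is a (position, codon) pair, with positions counted from 0 on
    the ungapped sequence.
    """
    seq = seq.upper().replace("-", "")  # drop alignment gaps before scanning
    hits = {}
    for frame in range(3):
        found = []
        for start in range(frame, len(seq) - 2, 3):
            codon = seq[start:start + 3]
            if codon in VERT_MITO_STOPS:
                found.append((start, codon))
        hits[frame] = found
    return hits


def has_open_frame(seq):
    """True if at least one forward frame is free of stop codons."""
    return any(not found for found in stop_codons_by_frame(seq).values())


toy = "CTTTACCTAGTATTTGGT"  # hypothetical fragment, not a real COI barcode
print(stop_codons_by_frame(toy))
print(has_open_frame(toy))
```

A genuine mitochondrial COI fragment is expected to have one forward frame without stop codons; a sequence with stops in every frame would be a candidate Numt or sequencing artifact and would deserve a closer look.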

#### RESULTS

In total, 63 samples were collected, amplified, sequenced, and compared to GenBank sequences (**Table 1**). High quality sequences ranged between 204 and 650 bases. There was no sequence heterogeneity between strands involving high quality bases from two or more reads. Overall, our analysis suggests the presence of 20 different species among the samples. Seven samples were identified as belonging to two Actinopterygii (ray-finned fishes) species: Xiphias gladius (Perciformes, swordfish; n = 1), and Genidens barbus (Siluriformes, white sea catfish; n = 6). The remaining samples may represent 18 elasmobranch species from three shark orders (Carcharhiniformes, Squaliformes, and Squatiniformes; n = 42, 3, 2, respectively) and four ray orders (Rhinopristiformes, Myliobatiformes, Rajiformes, and Torpediniformes; n = 4, 3, 1, 1, respectively). Three ray species (P. horkelii, Z. brevirostris, and N. brasiliensis) were only found in samples from fishing vessels (i.e., they were not purchased in the market).
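As a quick consistency check, the per-order counts reported above can be summed and compared with the total number of samples. The short Python sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Sample counts per group, as reported in the text.
teleost_samples = 7
shark_orders = {"Carcharhiniformes": 42, "Squaliformes": 3, "Squatiniformes": 2}
ray_orders = {"Rhinopristiformes": 4, "Myliobatiformes": 3,
              "Rajiformes": 1, "Torpediniformes": 1}

# Sharks and rays together make up the elasmobranch samples.
elasmobranch_samples = sum(shark_orders.values()) + sum(ray_orders.values())
total_samples = teleost_samples + elasmobranch_samples

print(elasmobranch_samples, total_samples)  # 56 63
```

The 56 elasmobranch samples plus the seven teleost samples account for all 63 samples analyzed.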

Based on COI sequences, all but one of the elasmobranch samples were identified at the species level, representing 17 formally described species. One sample was associated with an undescribed or unsequenced species (which occurs in GenBank as "Rajiformes sp. BOLD: AABB1882"). The most common species found among market samples were Prionace glauca (blue shark, n = 15) and S. lewini (scalloped hammerhead shark, n = 14). All other species were far less common, including R. lalandii (Brazilian sharpnose shark, n = 4), Carcharhinus brachyurus (copper shark, n = 3), Carcharhinus falciformis (silky shark, n = 2), Sphyrna zygaena (smooth hammerhead shark, n = 2), Squalus mitsukurii (shortspine spurdog, n = 2), Galeorhinus galeus (school shark, n = 1), Rhizoprionodon porosus (Caribbean sharpnose shark, n = 1), Squalus cubensis (Cuban dogfish, n = 1), Squatina occulta (hidden angel shark, n = 1), and Squatina guggenheim (spiny angel shark, n = 1). All ray species identified in the study occurred once or twice among the samples: G. altavela (spiny butterfly ray, n = 2), P. horkelii (Brazilian guitarfish, n = 2), Z. brevirostris (shortnose guitarfish, n = 2), Myliobatis goodei (southern eagle ray, n = 1), and N. brasiliensis (Brazilian electric ray, n = 1).

The average genetic distance between each sample and representatives of its most likely candidate species (determined by its clustering in the ML tree) was always lower than 3.50%, and usually lower than 1% (**Table 1**). The ML tree showed cohesive clusters of conspecific sequences (**Figure 2**). The few exceptions, which had bootstrap support values lower than 90, included S. mitsukurii, C. brachyurus, S. guggenheim, and S. occulta (**Figure 2**). In all cases, however, the estimated genetic distance between our samples and reference sequences was used to indicate the most likely candidate species (shown in **Table 1**). An interesting case is sample MP16, whose top hit in BLAST was Squalus montalbani, but which clustered with S. mitsukurii in the ML tree (**Figure 2**). However, both MP16 and MP18 showed a much smaller distance from S. mitsukurii (0.0009) than from any other closely related species (0.0027 vs. S. cf. megalops; 0.0043 vs. S. montalbani; 0.0058 vs. S. chloroculus; and 0.0088 vs. S. cf. mitsukurii). Similarly, IIL04, IIL05, and IIL14 were much closer to C. brachyurus (0.0014) than to C. brevipinna (0.0150), MG08 was closer to S. occulta (0.0000) than to S. guggenheim (0.0071), while FA16 was closer to S. guggenheim (0.0008) than to S. occulta (0.0064). The complete distance matrix can be downloaded as Supplementary Material (File S2).
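The selection rule described above (pick the candidate species with the smallest average distance to the sample) can be illustrated with a minimal Python sketch. The distances are the values quoted in the text for MP16 and MP18; the function name `most_likely_candidate` is hypothetical and is not part of the authors' pipeline.

```python
# Mean genetic distances quoted in the text for samples MP16 and MP18
# against each candidate Squalus species.
mean_distance = {
    "S. mitsukurii": 0.0009,
    "S. cf. megalops": 0.0027,
    "S. montalbani": 0.0043,
    "S. chloroculus": 0.0058,
    "S. cf. mitsukurii": 0.0088,
}


def most_likely_candidate(distances):
    """Return the (species, distance) pair with the smallest mean distance.

    Hypothetical helper for illustration only.
    """
    return min(distances.items(), key=lambda item: item[1])


species, distance = most_likely_candidate(mean_distance)
print(f"Most likely candidate: {species} (distance = {distance:.4f})")
```

The sketch only ranks candidates; it does not replace the tree-based clustering, which the text uses as the first line of evidence.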

#### DISCUSSION

We found 18 Elasmobranchii and two Actinopterygii species among the samples acquired in Southern Brazilian fish markets as "cação," "caçonete," or "filé anjo." This represents 17% of all elasmobranch species registered for Southern Brazil and 13% of the species described for Brazil (Bornatowski et al., 2009). Other studies, based on other molecular markers, that aimed at species identification of shark filets from Northern Brazil have also shown the great number of species being traded without any taxonomic control (Rodrigues-Filho et al., 2009; Palmeira et al., 2013). Unfortunately, our DNA data do not allow us to conclude that these samples represent individuals captured in Southern Brazil. For example, most individuals included in the final dataset did not have location information. Additionally, even if this information were available, it is unclear whether COI would have enough resolution to allow unambiguous recognition of regional stocks for these species (Antoniou and Magoulas, 2014). However, the fact that the vast majority of samples collected in this study were purchased fresh is a strong indication that these specimens may have been captured off Southern Brazil or in nearby areas.

The use of the COI DNA barcode allowed us to identify all samples at the species level, even though some cases deserve further discussion. The best match for IIL26 was an undescribed or unsequenced species, Rajiformes sp. BOLD:AAB1882 (Coverage = 96%, Identity = 100%). The sample MG08 resulted in a short DNA sequence, whose top result in BLAST was against S. occulta (Coverage = 95%, Identity = 100%), but which showed an inconclusive clustering with any Squatina species in the ML tree (**Figure 2**). Nevertheless, as occurred for other samples (FA16, IIL04, IIL05, IIL14, MP16, MP18), comparing the genetic distance among alternative candidate species allowed the identification of the most likely candidate for each sample. In the case of the samples associated with Squatina, species identification was corroborated by the fact that both S. occulta and S. guggenheim occur off Southern Brazil (Vaz and De Carvalho, 2013) and that both samples were acquired as fresh filets, likely indicating a local catch. With a single exception, the species associated with the top BLAST result also resulted in the lowest average genetic distance. The exception was MP16, whose top BLAST result was Squalus montalbani (Coverage = 100%, Identity = 99%), but whose lowest average genetic distance was against S. mitsukurii, which also represented the second and third top BLAST results (Coverage = 96%, Identity = 100%). The low genetic distance among Squalus species and the lack of a clear structure in the ML tree (**Figure 2**) may indicate that DNA barcoding for this genus may be more complicated than for other genera, and may require other genetic markers. From the taxonomic point of view, it is difficult to discriminate among Squalus species (Haddad and Gadig, 2005), which may be due to a shallow diversification time that is reflected in the low genetic distances among several species. The inherently difficult taxonomy of the genus may favor misnomers in reference databases.
In this regard, S. cubensis presents a likely example of database confusion. There are two COI sequences for this species in GenBank. However, while the entry FJ519595 is close to S. mitsukurii (∼0.2% genetic distance) the other, FN431670, is distantly related to it (∼7.2% genetic distance) and associated with sample MP15 (**Figure 2**). These issues reinforce the importance of database curation and maintenance, with rigorous taxonomic criteria for the deposition of reference sequences (Ekrem et al., 2007; Teletchea, 2009; Dudgeon et al., 2012). It also highlights that in some cases it may be important to analyze additional genetic markers for a more accurate species identification (Mendonça et al., 2009; Moftah et al., 2011; Pérez-Jiménez et al., 2013).

The most abundant shark species in our samples were Prionace glauca and S. lewini (23.8 and 22.2%, respectively).

FIGURE 2B | ML tree based on HKY+G+I distance. The miniature on the upper left side shows major groups, displayed in more detail in individual panels. The number of shark and ray symbols represent the number of different species identified in the study for each group. Please note that this is an unrooted tree. Most entries were collapsed and the names were omitted for clarity. Samples from the present study are labeled according to Table 1. The most likely candidate species, together with other closely related species are shown in red. The numbers above the branches represent bootstrap percentage based on 1,000 replicates. Bootstrap values <70 were omitted. Please note the different scale among panels. The full ML tree is available as Supplementary Material (File S3).


\*Species included in the National List of Species Threatened with Extinction (available at Portaria MMA n° 445 of 2014).

<sup>a</sup>IUCN Red List of Threatened Species (IUCN, 2017).

<sup>b</sup>Global conservation status according to IUCN (2017) criteria, followed by the year of assessment: (CR) critically endangered; (EN) endangered; (VU) vulnerable; (NT) near threatened; (DD) data deficient; (NE) not evaluated.

<sup>c</sup>National conservation status according to the Brazilian Red Book of Threatened Faunal Species (Instituto Chico Mendes de Preservação da Biodiversidade, 2016).

<sup>d</sup>Regional conservation status according to the List of Threatened Fauna of the Rio Grande do Sul State (Fundação Zoobotânica e Secretaria do Ambiente Desenvolvimento Sustentável, Decreto n° 51.797).

<sup>e</sup>Regional conservation status according to the List of Threatened Fauna of the Santa Catarina State (Fundação de Meio Ambiente – FATMA).

P. glauca is distributed globally and its capture volume has been estimated at approximately 20 million individuals per year (Mendonça et al., 2012). Despite its endangered status (**Table 2**), P. glauca is the most fished shark in the world, representing 56% of the total catch of pelagic sharks, especially by industrial fisheries in which the target species are tuna or swordfish (Rose, 1996; Dulvy et al., 2003; Camhi et al., 2009). After the increasing market demand for shark fins and the high prices paid for them, these animals began to be targeted for the removal of these parts, with the carcasses being sold worldwide (Domingues, 2011). Indeed, we identified P. glauca in all samples acquired as frozen filets, which may reflect that these individuals were captured in other parts of the world, such as Taiwan and subsequently imported to Brazil (Figure S1). However, we also found P. glauca among fresh samples, which more likely indicates local capture. On the other hand, S. lewini was the most abundant species among fresh samples, which may indicate a higher local impact on this species. Several authors have raised concerns of predatory fishing for this species off Brazil due to the high commercial value of its fins (Amorim et al., 2011). This results in fishing pressures occurring over all phases and life cycles of these animals, including neonates (Mader et al., 2007) both on the continental shelf and in oceanic waters (Kotas, 2004; Kotas et al., 2005; Vooren and Klippel, 2005).

Regarding their conservation status, IUCN estimates that 47% of the elasmobranch species found in this study are considered threatened at the global level, 53% are threatened at the national level, and 47% are critically endangered at the national level (**Table 2**). It is difficult, however, to present a more regional picture, given that the red lists for Rio Grande do Sul and Santa Catarina states include only 59 and 23.5%, respectively, of the species identified in this study, even though there are records for most of these species off these Brazilian states (Gadig, 2001). The conservation status for R. lalandii, S. mitsukurii, S. cubensis, M. goodei, and N. brasiliensis is unknown due to data deficiency (DD). In the worst-case scenario, ∼50% of the species identified in this study would be threatened to some extent.

Our sampling was restricted to the south of Brazil due to a limited budget, but it would be important to perform similar studies in other Brazilian regions to provide a better picture of the shark fishing and trade in the country. It should be noted, however, that the Southern coast of Brazil is a hotspot for shark diversity, with high species richness, high endemism, and functional richness (Lucifora et al., 2011). Another future direction would be investigating how much of the shark meat market involves individuals fished locally.

Finally, an important issue in the conservation of these species is how local human populations will engage in more sustainable consumption practices. In this sense, labeling the meat of any shark species as "cação" may impose major barriers to conservation measures for this group, allowing the inadvertent consumption of protected species (Jacquet and Pauly, 2008). Indeed, Bornatowski et al. (2015), who interviewed fish meat consumers in Southern Brazil, reported that 61% of respondents claimed that they had never tried shark meat, even though they ate "cação." In addition, 69% of respondents said they did not know that at least 25% of all elasmobranchs are threatened. Given these answers, it is evident that a significant portion of the population buying these products is not aware of the impact of their consumption habits, or of the current conservation status of elasmobranch species. Another issue for consumers is mislabeling of shark products, a common outcome of DNA barcode assessments of seafood products (Barbuto et al., 2010; Filonzi et al., 2010). This is illustrated by the presence of the two teleost species detected among our samples (**Table 1**). Therefore, it becomes essential, and an ethical responsibility for the industry, to label products correctly and allow informed decision-making by consumers. We suggest that all meat being sold as "cação" should be accompanied by the species common name, followed by its scientific name, and, whenever possible, the species threat categories according to the IUCN Red List. While fishing legislation may also have a positive impact on natural populations by suspending the capture and marketing of endangered elasmobranchs, environmental education measures focusing on the fishing community and on consumers will be fundamental for the effective protection of these species.

# AUTHOR CONTRIBUTIONS

FA-S, CS, LO, VV, and NF designed the study; FA-S, CS, CC, PF, and RM executed experimental procedures; FA-S, CS, PF, LG, LO, VV, and NF performed data analysis and interpretation; FA-S, CS, VV, and NF wrote the paper.

#### FUNDING

This research was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação Grupo Boticário de Proteção à Natureza (Projeto 0921\_20112), Programa de Pós-Graduação em Biologia Animal (PPGBAN-UFRGS), Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM-UFRGS), and Programa de Pós-Graduação em Biologia (UNISINOS).

#### ACKNOWLEDGMENTS

The authors would like to thank Walter de Nisa e Castro Neto (Pró-Squalus), Maria João Veloso da Costa Ramos Pereira (UFRGS), Clarissa Fleck (McGill University), Andrea Thomaz (University of Michigan), Daniel Lyons (University of Michigan), Bonnie Armour (University of Adelaide), and two reviewers for contributions in earlier versions of this manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00138/full#supplementary-material

#### REFERENCES

Bonfil, R. (1994). Overview of World Elasmobranch Fisheries. Rome: FAO.

Bornatowski, H., Abilhoa, V., and Charvet-Almeida, P. (2009). Elasmobranchs of the Paraná Coast, southern Brazil, south-western Atlantic. Mar. Biodivers. Rec. 2:e158. doi: 10.1017/S1755267209990868

Para a Gestão de Pesca do Estado de São Paulo, Brasil. Master's thesis, Instituto de Pesca, São Paulo.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Almerón-Souza, Sperb, Castilho, Figueiredo, Gonçalves, Machado, Oliveira, Valiati and Fagundes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Hidden Diversity Hampers Conservation Efforts in a Highly Impacted Neotropical River System

Naiara G. Sales<sup>1</sup> \*, Stefano Mariani<sup>1</sup> , Gilberto N. Salvador<sup>2</sup> , Tiago C. Pessali<sup>3</sup> and Daniel C. Carvalho<sup>4</sup>

<sup>1</sup> Ecosystems and Environment Research Centre, School of Environment & Life Sciences, University of Salford, Salford, United Kingdom, <sup>2</sup> Laboratório de Ecologia e Conservação, Universidade Federal do Pará, Belém, Brazil, <sup>3</sup> Museu de Ciências Naturais PUC Minas, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil, <sup>4</sup> Programa de Pós-graduação em Biologia de Vertebrados, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Fernanda Simões De Almeida, Universidade Estadual de Londrina, Brazil Carolina Machado, Universidade Federal de São Carlos, Brazil

> \*Correspondence: Naiara G. Sales n.g.sales@edu.salford.ac.uk; naiarasl@hotmail.com

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 November 2017 Accepted: 03 July 2018 Published: 24 July 2018

#### Citation:

Sales NG, Mariani S, Salvador GN, Pessali TC and Carvalho DC (2018) Hidden Diversity Hampers Conservation Efforts in a Highly Impacted Neotropical River System. Front. Genet. 9:271. doi: 10.3389/fgene.2018.00271

Neotropical rivers host a highly diverse ichthyofauna, but taxonomic uncertainty prevents appropriate conservation measures. The Doce River Basin (DRB), lying within two Brazilian threatened hotspots (Atlantic Forest and Brazilian Savanna) in southeast Brazil, faced the worst ever environmental accident reported for South American catchments, due to a dam collapse that spread toxic mining tailings along the course of its main river. Its ichthyofauna was known to comprise 71 native freshwater fish species, of which 13 are endemic. Here, we build a DNA barcode library for the DRB ichthyofauna, using samples obtained before the 2015 mining disaster, in order to provide a more robust biodiversity record for this basin, as a baseline for future management actions. Throughout the whole DRB, we obtained a total of 306 barcodes, assigned to 69 putative species (with a mean of 4.54 barcodes per species), belonging to 45 genera, 18 families, and 5 orders. Average genetic distances within species, genera, and families were 2.59, 11.4, and 20.5%, respectively. The 69 species identified represent over 76% of the known DRB ichthyofauna, comprising 43 native (five endemic, of which three are threatened by extinction), 13 already known introduced species, and 13 unknown species (such as Characidium sp., Neoplecostomus sp., and specimens identified only at the sub-family level Neoplecostominae, according to morphological identification provided by the museum collections). Over one fifth of all analyzed species (N = 16) had a mean intraspecific genetic divergence higher than 2%. An integrative approach, combining NND (nearest neighbor distance), BIN (barcode index number), ABGD (automatic barcode gap discovery), and bPTP (Bayesian Poisson Tree Processes model) analyses, suggested the occurrence of potential cryptic species, species complexes, or historical errors in morphological identification. The evidence presented calls for a more robust, DNA-assisted cataloging of biodiversity-rich ecosystems, in order to enable effective monitoring and informed actions to preserve and restore these delicate habitats.

Keywords: barcode, biodiversity, cryptic diversity, Doce River, ichthyofauna, molecular identification

**Abbreviations:** BIN, barcode index number; BOLD, Barcode of Life Data System; DRB, Doce River Basin; NND, nearest neighbor distance.

# INTRODUCTION


Neotropical rivers host an extremely diverse ichthyofauna, but anthropogenic impacts, together with the occurrence of many still undescribed or unknown species, may hamper conservation efforts (Reis et al., 2016; Ely et al., 2017). Due to increasing, rapid anthropogenic environmental impacts (e.g., pollution, siltation, mining, damming), biodiversity in Neotropical rivers may be lost before scientists can fully describe and comprehend it (Agostinho et al., 2005).

Effective biodiversity conservation relies on unequivocal and precise species identification, especially in the case of ecosystems that underwent degradation and require restoration. However, the high biodiversity of regions such as the Neotropics, combined with increasingly reduced budgets for basic taxonomic research, has led to the so-called "taxonomic impediment" or "poor taxonomy", in which the shortage of funding and trained taxonomists, and the gaps in taxonomic knowledge, have delayed advances in the assessment and description of biodiversity, or even contributed to overestimating or underestimating species richness due to species misidentification or taxonomic confusion (Taylor, 1983; Ely et al., 2017).

The DNA barcoding initiative offers a powerful and cost-effective tool to assist with the detection of cryptic species and flag potentially problematic taxa, with the standard universal COI marker having proven particularly successful in invertebrates (Hebert et al., 2004a), birds (Hebert et al., 2004b), and fish (Ward et al., 2005; Hubert et al., 2008; Valdez-Moreno et al., 2009; Carvalho et al., 2011; Rosso et al., 2012). For effective DNA barcode performance, intraspecific variability must be lower than variability among congeneric species, the so-called 'Barcode Gap' (Meyer and Paulay, 2005). While sequence variability within species tends to be below 1–2% in most fish, there are exceptions (Hurst and Jiggins, 2005), especially in the case of recently diverged species (Vinas and Tudela, 2009; Shum et al., 2017). Moreover, the unambiguous identification of species from early larval stage to adulthood can aid a variety of conservation management actions. Accurate molecular identification may contribute to improving the management and sustainability of long-term fisheries (Metcalf et al., 2007), tracking invasive species (Corin et al., 2007; Carvalho et al., 2009), offering insights into community ecology (Pfenninger et al., 2007), and genetically certifying species used in restocking programs (Metcalf et al., 2007), as well as improving fundamental knowledge on cryptic and putatively new species (Pereira et al., 2011). Furthermore, molecular identification of eggs and larvae can provide data regarding spawning and recruitment areas, supporting a definition of priority areas for conservation (Becker et al., 2015; Frantine-Silva et al., 2015).
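As an illustration of the barcode gap criterion, the following Python sketch compares, for each species, the maximum intraspecific distance with the minimum distance to any other species. The specimen labels, species names, and distances are hypothetical toy values, and `barcode_gap` is an illustrative helper rather than a function of any barcoding package.

```python
def barcode_gap(distances, species_of):
    """Summarize the barcode gap per species.

    distances: dict mapping (specimen_a, specimen_b) -> pairwise distance
    species_of: dict mapping specimen -> species name
    A species shows a gap when its nearest-neighbor distance exceeds its
    maximum intraspecific distance. Species with a single specimen get a
    maximum intraspecific distance of 0.0 (an assumption of this sketch).
    """
    max_intra = {}
    min_inter = {}
    for (a, b), d in distances.items():
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            max_intra[sa] = max(max_intra.get(sa, 0.0), d)
        else:
            for s in (sa, sb):
                min_inter[s] = min(min_inter.get(s, float("inf")), d)
    report = {}
    for s in set(species_of.values()):
        intra = max_intra.get(s, 0.0)
        nnd = min_inter.get(s, float("inf"))
        report[s] = {"max_intra": intra, "nnd": nnd, "gap": nnd > intra}
    return report


# Hypothetical toy data: two species with two specimens each.
toy_species = {"s1": "sp. A", "s2": "sp. A", "s3": "sp. B", "s4": "sp. B"}
toy_distances = {
    ("s1", "s2"): 0.004,
    ("s3", "s4"): 0.031,
    ("s1", "s3"): 0.020,
    ("s1", "s4"): 0.025,
    ("s2", "s3"): 0.022,
    ("s2", "s4"): 0.027,
}

report = barcode_gap(toy_distances, toy_species)
for name in sorted(report):
    print(name, report[name])
```

In this toy case "sp. A" shows a gap, whereas "sp. B" does not, because its intraspecific divergence exceeds its distance to the nearest neighbor; this is the kind of pattern that would prompt a closer look for cryptic lineages or misidentifications.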

DNA barcode libraries have been developed for several Neotropical river systems as a biodiversity identification tool, and have contributed to revealing the existence of putatively cryptic/new fish species (Carvalho et al., 2011; Pereira et al., 2011; Gomes et al., 2015; Pugedo et al., 2016; Nascimento et al., 2016). However, the biodiversity complexity remains unknown in many already impacted catchments in Brazil. One emblematic case is that of the Doce River Basin (DRB), which faced the worst environmental accident reported for any South American catchment, in the form of the largest tailings dam burst in modern history; as a result, a toxic mud (i.e., with an extremely high concentration of iron) spread along its main river course, affecting wild communities, as well as the local human populations (Fernandes et al., 2016; Neves et al., 2016). As the local riverine human communities rely on fisheries for their livelihood (e.g., source of income and subsistence, resource for ecotourism), understanding the impacts of this disaster on the ichthyofauna is crucial for effective management actions (Ecoplan-Lume, 2010; GFT, 2015; Neves et al., 2016). Moreover, the recovery of fish populations in the DRB, after the ecological disaster, relies on the recolonization of the main course of this river and on the diversity, size, and conservation status of the remnant fish populations in the tributaries (Olds et al., 2012).

The DRB runs through two Brazilian biodiversity hotspots (Atlantic Forest and Brazilian Savanna) located in south-east Brazil (Myers et al., 2000). The river is 853 km long and the catchment covers a total drainage area of 83,400 km<sup>2</sup>, between the states of Minas Gerais (86%) and Espírito Santo (14%), an area inhabited by three million people. The DRB harbors a rich ichthyofauna, including several undescribed species, with the number of presently recognized native species summing up to 71 (Vieira, 2009). The Santo Antônio River, the second largest tributary of the Doce, was selected as a conservation priority area, since it hosts a great number of species considered endemic and threatened by extinction (Vieira et al., 2000; Vieira and Alves, 2001; Rosa and Lima, 2005). Historically, the DRB has been affected by human impacts in many ways. Native forest covers only 27% of the DRB area (ANA, 2016), and the remaining area is used for cattle, forestry, agriculture, and mining (Vieira, 2009), resulting in a high rate of siltation (da Silva et al., 2011). Habitat fragmentation caused by hydroelectric construction also affects the DRB, where 40 hydroelectric plants have been built along the main channel of the Doce River and its principal tributaries (ANEEL, 2010). However, without accurate biodiversity knowledge, species conservation may be hindered in this river system, and it had already been suggested that the environmental disaster involving the mining collapse could have led to the depletion/extinction of many still unknown endemic species (Fernandes et al., 2016). Here, we develop a DNA barcode library for the DRB ichthyofauna, using data obtained prior to the dam burst environmental disaster, contributing to an improved biodiversity baseline record for this recently impacted ecosystem.

#### MATERIALS AND METHODS

#### Sampling

We obtained fish tissue samples from 306 specimens collected between 2011 and 2015 along the main river channel and tributaries (**Figure 1**), identified and deposited by taxonomists in four Brazilian ichthyological collections: PUC Minas Natural History Museum (MCNIP), Museu de Biologia Professor Mello Leitão (MBML), Museu de Zoologia da Universidade Estadual de Campinas (ZUEC), and Núcleo de Pesquisas em Limnologia, Ictiologia e Aquicultura (NUPELIA). All analyzed specimens were photographed, geo-referenced, and identified to the lowest taxonomic level from identification keys or previously published works (Vari, 1992; Albert et al., 1999; Castro and Vari, 2004; Zanata and Camelier, 2009).

#### Ethics Statement

All fish analyzed in this study were collected in accordance with Brazilian legislation (Collection license 6421-1, number 5498740) or obtained from ichthyological collections. Fish were collected and euthanized; fin samples were clipped from each individual and stored in absolute ethanol for subsequent molecular analysis. Specimens were fixed in 10% formaldehyde and then stored in 70% ethanol.

#### DNA Extraction, Amplification, and Sequencing

Genetic analyses were conducted, whenever possible, on a minimum of five specimens from different sample sites per species. DNA extraction followed the salting out protocol (adapted from Aljanabi and Martinez, 1997). The cytochrome c oxidase I (COI) gene (∼650 bp) was amplified by polymerase chain reaction (PCR) using the primers FishF1/FishR1 described by Ward et al. (2005) and the Cocktail COI-3/C\_FishF1t1-C\_FishR1t1 described by Ivanova et al. (2007), following the PCR protocol described in Gomes et al. (2015). The PCR products were visualized on 1% agarose gel, alongside negative controls and a size ladder, and positive amplifications were selected for DNA sequencing. DNA sequencing was conducted in both directions in an automated DNA analyzer ABI 3500 (Life Technologies).

#### Data Analysis

Barcode sequences were edited using the software DNA Baser® v.3.5.4 (DNA Sequence Assembler v4 (2013), Heracle BioSoft<sup>1</sup>) and SeqScape v.2.1.1 (Applied Biosystems, Foster City, CA, United States) (Díaz et al., 2016). DNA alignment was conducted using the CLUSTAL W alignment tool (Thompson et al., 1997). The neighbor-joining (NJ) trees (Saitou and Nei, 1987) and genetic distance estimates, based on the K2P (Kimura-2-parameter) nucleotide evolution model (Kimura, 1980), were generated using MEGA 7 software (Kumar et al., 2016).
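For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) √(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below implements this formula directly. It is an illustration only and does not reproduce MEGA 7; the function name `k2p_distance`, the toy sequences, and the choice to skip gapped or ambiguous sites are assumptions of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical toy alignment: one transition (A -> G) in ten sites.
toy_a = "ACGTACGTAC"
toy_b = "GCGTACGTAC"
print(f"K2P distance: {k2p_distance(toy_a, toy_b):.4f}")
```

Because the correction accounts for multiple substitutions at the same site, the K2P distance is slightly larger than the raw proportion of differing sites (0.1 in the toy alignment).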

Intra- and inter-specific genetic distances, nearest neighbor distance (NND), and the barcode gap were calculated in the on-line Barcode of Life Data System (BOLD) Workbench<sup>2</sup> (Ratnasingham and Hebert, 2007). The NND was used to estimate the minimum genetic distance between pairs of species. Different approaches were used to delimit the Molecular Operational Taxonomic Units (MOTUs): two clustering algorithms [barcode index number (BIN) and Automatic Barcode Gap Discovery (ABGD)] and one phylogenetic-coalescent method, the Bayesian Poisson Tree Processes model (bPTP). The BIN (Ratnasingham and Hebert, 2013) was estimated automatically in BOLD Workbench and allowed comparing DNA barcodes obtained here with other river basins that have a comprehensive DNA Barcode library, such as the São Francisco, the Mucuri, the Jequitinhonha, the Paraná, and the Paranaíba River Basins (Carvalho et al., 2011; Pereira et al., 2011; Gomes et al., 2015; Díaz et al., 2016; Pugedo et al., 2016). Using this approach, it is possible to identify endemic lineages and shared ichthyofauna. ABGD analyses (Puillandre et al., 2012) were performed using the web interface (<sup>3</sup>web version 'May 31 2017') with a relative gap width value of X = 1.0 and two available distance metrics [JC69 (Jukes and Cantor, 1969) and K2P (Kimura, 1980)], while the other parameter values employed default settings. The bPTP was conducted using both ML (maximum likelihood) and Bayesian approaches (Zhang et al., 2013). The PTP input file consisted of a nexus tree generated in MrBayes (Ronquist and Huelsenbeck, 2003) using six random parsimony trees, with the GTRGAMMA substitution model (obtained by MEGA 7 under BIC criteria), without rooting and applying the parameters of 20 million MCMC generations and a burn-in of 10%. Analysis was conducted applying default values through the bPTP server (500,000 generations, thinning = 100, burn-in = 10%).

<sup>1</sup>www.DnaBaser.com

<sup>2</sup>http://www.boldsystems.org

All data, including fish photos, GPS coordinates of each sample site, voucher numbers, detailed taxonomic identifications, and the corresponding sequence data and trace files, were submitted to the BOLD (<sup>2</sup> see Ratnasingham and Hebert, 2007) within the project file 'DNA Barcoding of DRB'.

### Species Delimitation and Hidden Biodiversity

Species delimitation based on integrative approaches that combine a diverse range of statistical methods has been extensively used to identify hidden biodiversity (e.g., Padial et al., 2010; Costa-Silva et al., 2015; Gomes et al., 2015; Rossini et al., 2016; Ramirez et al., 2017). Here, species with >2% intraspecific genetic divergence, as well as those still undescribed or unknown and identified only at the genus or family level, were investigated individually to detect the occurrence of new molecular operational taxonomic units (MOTUs) according to the congruence among BIN, ABGD, and bPTP outputs.

Undescribed species or those only identified at the genus or family level were checked using the BIN and NND analyses in order to verify their occurrence in clusters composed of other nominal species, and their genetic divergence from the nearest neighbor (including species from the DRB and/or distinct Brazilian basins). Lineages were considered new MOTUs when intraspecific genetic divergence was higher than 2% for described species and distinct clusters were identified by BIN, ABGD, and bPTP outputs.
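The decision rule for described species can be summarized in a short Python sketch: a species is flagged when its intraspecific divergence exceeds 2% and all three delimitation methods split its specimens into more than one cluster. The record structure, species names, cluster labels, and divergence values below are hypothetical and serve only to illustrate the logic; reading "congruence" as "every method finds more than one cluster" is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SpeciesRecord:
    name: str
    max_intraspecific: float  # K2P distance as a proportion (0.02 == 2%)
    bin_clusters: list        # one cluster label per specimen, from BIN
    abgd_clusters: list       # one cluster label per specimen, from ABGD
    bptp_clusters: list       # one cluster label per specimen, from bPTP


def is_candidate_new_motu(record, threshold=0.02):
    """Flag a species when divergence exceeds the threshold and BIN, ABGD,
    and bPTP each split its specimens into more than one cluster."""
    methods = (record.bin_clusters, record.abgd_clusters, record.bptp_clusters)
    all_split = all(len(set(labels)) > 1 for labels in methods)
    return record.max_intraspecific > threshold and all_split


# Hypothetical records for illustration.
records = [
    SpeciesRecord("species X", 0.150, ["b1", "b1", "b2"], ["a1", "a1", "a2"], ["p1", "p1", "p2"]),
    SpeciesRecord("species Y", 0.030, ["b3", "b3", "b4"], ["a3", "a3", "a3"], ["p3", "p3", "p4"]),
    SpeciesRecord("species Z", 0.004, ["b5", "b5", "b5"], ["a5", "a5", "a5"], ["p5", "p5", "p5"]),
]

flagged = [r.name for r in records if is_candidate_new_motu(r)]
print("Candidate new MOTUs:", flagged)
```

Requiring agreement among all three methods is conservative: "species Y" exceeds the 2% threshold but is not flagged because ABGD keeps its specimens in a single group.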

#### RESULTS

Morphological identification of the 306 specimens yielded 69 species (see Supplementary Table S1), of which 43 are native species (five endemic, three threatened by extinction and one endemic and threatened), 13 non-native species and 13 unknown species to the DRB (see Supplementary Table S2), representing over 76% of its known freshwater ichthyofauna (Vieira, 2009). We then obtained 306 partial sequences of the COI gene, consisting of 665 bp on average, and no insertions, deletions, or stop codons were detected, indicating that there was no case of NUMTs (nuclear mitochondrial DNA sequences) (Song et al., 2008; Hazkani-Covo et al., 2010).

TABLE 1 | Distance summary reports for sequence divergence between species, genus, and family level, including minimum, mean, and maximum genetic distances (K2P).

A mean of 4.54 individuals per species were sequenced, comprising 45 genera, 18 families, and 5 orders [Characiformes (41.9%), Siluriformes (40.6%), Perciformes (9.4%), Gymnotiformes (4.7%), and Cyprinodontiformes (3.4%)]. Species represented by one or two specimens (N = 19) were not included in the estimation of intraspecific divergences (Callichthys callichthys, Cichla kelberi, Clarias gariepinus, Hoplosternum littorale, Hyphessobrycon bifasciatus, H. eques, Hypostomus sp., Lophiosilurus alexandri, Metynnis maculatus, Parotocinclus maculicauda, Pimelodus maculatus, Poecilia vivipara, Prochilodus vimboides, Pygocentrus nattereri, Salminus brasiliensis, Steindachneridion doceanum, Trichomycterus aff. auroguttatus, T. cf. brasiliensis, and T. longibarbatus). The NJ tree identified species-specific clades for 80.9% of all species. The mean genetic distances found within species, genera, and families were: 2.59, 11.4, and 20.5% (**Table 1**), respectively.

Over 65% of the analyzed species showed genetic distances lower than 1% and for 70% of the species the divergence value was below 2% (**Figure 2A**). When considering intra-generic distance, 19% of the species had a divergence higher than 20% (**Figure 2B**), suggesting the possibility of taxonomic errors or cryptic species.

#### Intra- and Inter-Specific Divergence

Intraspecific distance varied from 0 to 21.82%. Particularly high genetic distances (>10%) were recovered among specimens of Astyanax fasciatus (20.69%), Astyanax scabripinnis (21.82%), Astyanax sp. (20.5%), Characidium sp. (10.17%), Crenicichla lacustris (21.36%), Harttia sp. (12.2%), Poecilia reticulata (14.34%), and Trichomycterus aff. alternatus (18.49%), flagging possible new MOTUs (i.e., hidden diversity) or problems related to morphological identification.

The NJ tree encompassing all species showed the occurrence of monophyletic clades and absence of shared haplotypes for 44 of the 69 analyzed species. The interspecific genetic distance showed that 63.2% of the analyzed species had a K2P divergence higher than 2% to their closest neighbor, with the exception of: Astyanax spp., Deuterodon pedri, H. eques, Characidium sp. and Characidium gr. timbuiense, Gymnotus spp., Oligosarcus argenteus and O. acutirostris, P. reticulata and P. vivipara, and T. aff. alternatus and T. longibarbatus (Supplementary Figure S1).

<sup>3</sup>http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html

Incongruences between morphological and barcode identifications (BIN, ABGD, bPTP) (i.e., one BIN/ABGD/bPTP cluster containing more than one morphological species, morphological species represented by more than one BIN/ABGD/bPTP cluster, and/or >2% of intraspecific genetic distance and <1% of interspecific divergence) were observed within species of the genera Astyanax, Characidium, Crenicichla, Deuterodon, Gymnotus, Harttia, Hoplias, Hyphessobrycon, Hypostomus, Knodus, Neoplecostomus, Oligosarcus, Pareiorhaphis, Poecilia, Prochilodus, Rhamdia, and Trichomycterus (Supplementary Table S2).
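The first two kinds of incongruence described above (a cluster holding more than one morphological species, and a morphological species spread over more than one cluster) can be detected mechanically from a specimen table. The sketch below is only an illustration of that bookkeeping, under the assumption that each specimen is reduced to a (morphological species, cluster ID) pair; the function name and the placeholder cluster labels `BIN-X` and `BIN-Y` are hypothetical, while BIN AAC0279 is the one reported in the text as shared by P. reticulata and P. vivipara.

```python
from collections import defaultdict


def find_incongruences(records):
    """Return (split_species, merged_clusters) for (species, cluster) pairs.

    split_species: species assigned to more than one cluster.
    merged_clusters: clusters containing more than one species.
    """
    species_to_clusters = defaultdict(set)
    cluster_to_species = defaultdict(set)
    for species, cluster in records:
        species_to_clusters[species].add(cluster)
        cluster_to_species[cluster].add(species)

    split_species = sorted(
        s for s, clusters in species_to_clusters.items() if len(clusters) > 1
    )
    merged_clusters = sorted(
        c for c, species in cluster_to_species.items() if len(species) > 1
    )
    return split_species, merged_clusters


if __name__ == "__main__":
    # Illustrative specimen table; only AAC0279 is a BIN named in the text.
    specimens = [
        ("Poecilia reticulata", "AAC0279"),
        ("Poecilia vivipara", "AAC0279"),
        ("Poecilia reticulata", "BIN-X"),
        ("Hisonotus sp.", "BIN-Y"),
    ]
    split, merged = find_incongruences(specimens)
    print("Species in more than one cluster:", split)
    print("Clusters with more than one species:", merged)
```

The same function can be run once per delimitation method (BIN, ABGD, bPTP) and the outputs compared to see where the methods agree.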

### Identification of Molecular Operational Taxonomic Units (MOTUs)

The BIN analysis identified 81 clusters, including 48 taxonomically concordant, 17 discordant, and 16 singletons. The ABGD analysis detected 54–133 MOTUs when varying the prior maximal distance from P = 0.001 to P = 0.1 (applying both the K2P and JC69 nucleotide substitution models). The partition that recovered 81 groups (intraspecific distance P = 0.0077) was chosen due to its consistency with our BIN analysis. The bPTP analyses (Bayesian and ML approaches) resulted in the same number of clusters obtained by BIN, except for Harttia sp. (three BIN and ABGD clusters and one bPTP cluster) and Prochilodus costatus (two BIN clusters, and one ABGD and one bPTP cluster). ABGD species delineation agreed with all the BIN clusters, with the following exceptions, in which the BIN and ABGD partitions differed for a morpho-species: A. scabripinnis (BIN: AAC5910, ABGD: 36 and 81), Knodus moenkhausii (BIN: AAM1485, ABGD: 46 and 49), P. costatus (BIN: ADC2568 and ADC2571, ABGD: 10), Trichomycterus sp./T. aff. alternatus/T. aff. auroguttatus/T. longibarbatus (BIN: ACJ1164 and ACJ1161, ABGD: 64), and Trichomycterus sp./T. cf. brasiliensis (BIN: ACK5393 and ACT6325, ABGD: 65) (Supplementary Table S2).

#### Identification of Hidden Biodiversity

Sequences from fifteen species that are undescribed or identified only at genus or family level were compared to other species available in the BOLD database through NND and BIN analyses (**Table 2**). Within undescribed or unknown species, we recovered new MOTUs from the following genera: Astyanax, Characidium, Gymnotus, Harttia, Hisonotus, Neoplecostomus, Pareiorhaphis, Phalloceros, and Trichomycterus. The other six species were not considered new MOTUs (Brycon sp., Hasemania sp., Hypostomus sp., Imparfinis sp., Neoplecostominae, and Pimelodella sp.) since they were included in BINs containing another nominal species and showed interspecific divergence <2% from the nearest neighbor.

Among species with deep intraspecific divergence (>2%), we additionally recovered at least three putative cryptic species, based on the congruence among BIN, ABGD, bPTP, and genetic distance methods, for C. lacustris, Hoplias malabaricus, and Rhamdia cf. quelen (**Table 3**). A. fasciatus and A. scabripinnis, despite showing congruence between BIN and ABGD analyses, were included in clusters comprising another species of the genus. K. moenkhausii had a maximum intraspecific divergence of 3.07% and two distinct ABGD groups; however, only one clade and one BIN were recovered for this species. Astyanax lacustris, A. taeniatus, P. reticulata, P. costatus, T. aff. alternatus, and T. aff. immaculatus, despite showing high intraspecific genetic distances, were included in BINs containing another nominal species and thus were not considered putative cryptic species.
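The reasoning in this paragraph can be read as a simple screening rule for described species: deep intraspecific divergence (>2%), more than one cluster under every delimitation method, and no BIN shared with another nominal species. The sketch below encodes that reading as a hedged simplification, not the authors' actual procedure; the `Taxon` class and `flag_putative_cryptic` function are hypothetical names, and the example values are those given in the text for C. lacustris and P. costatus.

```python
from dataclasses import dataclass

DIVERGENCE_THRESHOLD_PCT = 2.0  # intraspecific K2P cutoff stated in the text


@dataclass
class Taxon:
    name: str
    max_intraspecific_pct: float
    n_bin: int
    n_abgd: int
    n_bptp: int
    shares_bin_with_nominal_species: bool


def flag_putative_cryptic(taxon: Taxon) -> bool:
    """Apply the simplified screening rule described in the lead-in."""
    deep_divergence = taxon.max_intraspecific_pct > DIVERGENCE_THRESHOLD_PCT
    # Congruence: every method must split the taxon into more than one cluster.
    all_methods_split = min(taxon.n_bin, taxon.n_abgd, taxon.n_bptp) > 1
    return (
        deep_divergence
        and all_methods_split
        and not taxon.shares_bin_with_nominal_species
    )


if __name__ == "__main__":
    taxa = [
        Taxon("Crenicichla lacustris", 21.36, 2, 2, 2, False),
        Taxon("Prochilodus costatus", 2.6, 2, 1, 1, True),
    ]
    for t in taxa:
        print(t.name, "->", flag_putative_cryptic(t))
```

A rule like this only flags candidates; as the Discussion stresses, each flagged taxon still needs morphological re-examination by a taxonomist.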

# DISCUSSION

# DNA Barcoding Effectiveness

We analyzed 306 fish specimens obtained before the dam burst in 2015 and provided genetic data for the ichthyofauna of the DRB, highlighting the occurrence of cryptic and previously unrecognized biodiversity. We thereby significantly extend the knowledge of this river system, for which previous surveys mostly focused on the middle course of the river and on lakes located inside the Doce State Park and its surroundings (Sunaga and Verani, 1987; Vieira, 1994; Vono and Barbosa, 2001; Latini and Petrere, 2004). This baseline offers a more robust platform for any future attempt to restore biodiversity and ecosystem functions to a level comparable to pre-disaster conditions.

Using DNA barcoding, we observed an intraspecific genetic distance considerably higher than previously reported for freshwater fish species from other Brazilian basins. On the other hand, intrageneric divergences were found to be similar to previous studies (Carvalho et al., 2011; Pereira et al., 2011; Pugedo et al., 2016). These results suggest a higher occurrence of hidden biodiversity in DRB when compared to other studied Brazilian basins (**Table 4**).

#### Hidden Biodiversity

DNA barcoding has already been used to reveal hidden biodiversity, such as cryptic species and new candidate fish species in the São Francisco (Carvalho et al., 2011), Mucuri (one species; Gomes et al., 2015), and Jequitinhonha (15 species; Pugedo et al., 2016) River catchments. In the DRB, from 69 morphologically identified species, the barcode analyses recovered 12 putative cryptic species within Astyanax sp., Characidium sp., C. gr. timbuiense, C. lacustris, Gymnotus sp., Harttia sp. (two putative cryptic species), H. malabaricus, Neoplecostomus sp., R. cf. quelen, and Trichomycterus sp. (two putative cryptic species). The high intraspecific genetic distance estimated for the DRB fish was related to well-known species complexes, e.g., Astyanax spp. (maximum intraspecific distance reaching 21.82% in A. scabripinnis), Gymnotus sp. (6.32%), H. malabaricus (6.7%), and R. cf. quelen (3.48%), and also to the deep intraspecific barcode divergence found for putative overlooked cryptic MOTUs, e.g., C. lacustris (21.36%).

TABLE 2 | List of undescribed species, including the nearest neighbor, BIN, and genetic similarity (%).

TABLE 3 | List of described species with high intraspecific divergence (>2%), showing the maximum and mean intraspecific genetic distance, clades, and number of BIN, ABGD, and bPTP clusters.

<sup>∗</sup>Occurrence of cryptic species.

TABLE 4 | Comparison among DNA barcoding studies conducted in Brazilian basins, including the number of sequences and species analyzed, and intraspecific and intrageneric distances (minimum and maximum, with the mean in parentheses).

<sup>∗</sup>Only Nannostomus spp.

DNA barcoding allows for the identification of cryptic variation among morphologically similar species, indicating the occurrence of more than one species and reinforcing the need for an integrative approach combining molecular and morphological characters (Nascimento et al., 2016). By combining distinct species delimitation methods, we were able to identify new MOTUs from nine undescribed species (Astyanax sp., Characidium sp., Gymnotus sp., Harttia sp., Hisonotus sp., Neoplecostomus sp., Pareiorhaphis sp., Phalloceros sp., and Trichomycterus sp.). Other species showed a high similarity with already described species from other river basins (e.g., specimens of Brycon sp. were assigned to B. ferox from the Mucuri River basin) and, as shown by the BIN analysis, were not considered possible new MOTUs (**Table 2**).

Among the undescribed species, we were able to highlight new MOTUs within five morpho-species due to their high intraspecific genetic divergence and based on BIN, ABGD, and NND analyses. For instance, Harttia sp. showed a mean divergence of 4.67% and three clades, which were congruent between the BIN and ABGD clustering methods, suggesting the occurrence of three new MOTUs in this genus. Specimens of Hisonotus sp. were included in the same BIN/ABGD/bPTP cluster and had an exclusive BIN containing only specimens from the DRB, suggesting a new MOTU exclusive to this catchment. Neoplecostomus doceensis is the only loricariid from this genus described for the DRB; however, we found two possible cryptic MOTUs within this taxon, as the DNA barcodes from Neoplecostomus sp. did not cluster with barcodes available for this species and had two additional distinct BIN and ABGD clusters. Furthermore, exclusive BIN/ABGD clusters were recovered for Pareiorhaphis sp. and Phalloceros sp., suggesting at least one new MOTU endemic to the DRB for each genus.

Notwithstanding the high intraspecific genetic distance and species delimitation methods detecting more than one MOTU, we did not consider new MOTUs for species showing a high similarity with another nominal species (e.g., species included in the same BIN cluster as another nominal species). Species with high intraspecific divergence were recovered within Astyanax spp. (A. fasciatus, A. lacustris, A. scabripinnis, and A. taeniatus). Despite showing a deep intraspecific divergence and congruence of BIN/ABGD clusters, these species were not considered as comprising new MOTUs due to their high genetic similarity with other nominal species (e.g., Astyanax parahybae, A. vermilion, Hyphessobrycon spp., Deuterodon sp.) observed in the BIN and NND analyses, and also because this highly diverse group is a well-known species complex in need of more systematic studies (Garutti, 1995; Froese and Paulay, 2010; Eschmeyer, 2015).

High intraspecific divergence was also found for T. aff. alternatus and T. aff. immaculatus. These species, despite showing a high intraspecific distance (18.49 and 5.84%, respectively), were included in BINs containing another nominal species (e.g., T. longibarbatus), indicating that this may be a case of morphological misidentification rather than the occurrence of new MOTUs. This genus has an extensive geographical range and its morphological identification is complex due to the lack of consistent synapomorphies (Barbosa and Costa, 2003). Therefore, further studies using an integrative approach focused on these species are required in order to investigate the occurrence of putative cryptic species.

Prochilodus costatus showed a high intraspecific divergence (2.6%) and the occurrence of two clusters (NJ and BIN analyses). However, this non-native species was not considered a putative cryptic species since it was included in BINs comprising other non-native species (e.g., Prochilodus argenteus, P. hartii). As suggested in previous studies, the incongruence between morphological and molecular identification of P. costatus may indicate the occurrence of Prochilodus hybrids rather than new MOTUs (Gomes et al., 2015; Sales et al., 2018).

Poecilia reticulata is a species introduced worldwide, occurring in more than 69 countries outside of its native range (Deacon et al., 2011). A high intraspecific divergence (14.34%) was found for this species in the DRB. However, two specimens of P. reticulata were assigned to a BIN comprising specimens of P. vivipara (BIN AAC0279), and the high intraspecific divergence was due to the incongruence between morphological and molecular identification and not to the occurrence of new MOTUs. Hybridization between congeneric species of Poecilia (P. velifera or P. petenensis and P. mexicana or P. orri) and between different populations of P. reticulata has already been reported (Kittell et al., 2005; Lampert and Schartl, 2008; Sievers et al., 2012), and the incongruence detected in this study might be a case of hybridization between P. reticulata and P. vivipara, or of misidentification during deposit in the museum collection, rather than the occurrence of cryptic species.

Hidden biodiversity was found within the genera Characidium, Crenicichla, Gymnotus, Hoplias, and Rhamdia due to high intraspecific genetic divergence and congruence among the clustering methods BIN, ABGD, and bPTP (**Table 3**). Species of the genera Rhamdia, Characidium, Pareiorhaphis, and Gymnotus were also flagged as cryptic and/or candidate species in other Brazilian basins (Carvalho et al., 2011; Gomes et al., 2015; Pugedo et al., 2016).

For instance, within the genus Characidium we detected a mean intraspecific divergence of 5.82% and the occurrence of four clades: two mixed clades comprising specimens identified as C. gr. timbuiense (n = 3 and n = 4) and Characidium sp. (n = 1), one clade exclusive to C. gr. timbuiense (n = 1), and one clade exclusive to Characidium sp. (n = 4). C. lacustris showed an intraspecific divergence of 10.76% and the presence of two different clades and BIN/ABGD/bPTP clusters (one for samples collected in the Manhuaçu River and one for samples collected below the Baguari Dam). The electric knifefishes Gymnotus spp. had an intraspecific divergence above 2% and three different clades corroborated by three BIN, ABGD, and bPTP clusters. All Gymnotus specimens were initially morphologically identified as Gymnotus sp. and Gymnotus cf. carapo. However, similarly to the findings obtained for this genus in the Mucuri River Basin, these clusters may represent two different known species (G. carapo and the overlooked species Gymnotus sylvius) and a new MOTU yet to be analyzed and properly described (Gymnotus sp.). Two congruent BIN, ABGD, and bPTP clusters were identified for both H. malabaricus and Rhamdia cf. quelen (mean intraspecific divergence of 6.7 and 3.48%, respectively), suggesting the occurrence of cryptic species for each of these taxa. The divergence found in H. malabaricus may be due to allopatric speciation resulting from geographical barriers enhanced by its sedentary habit, since one cluster comprised exclusively specimens from the Jose Pedro River and the other was exclusive to specimens from the Corrente Grande River. High genetic diversity has already been reported for this species in other studied systems (Paraná and Tibagi Rivers), suggesting distinct evolutionary lineages, population structuring, or the occurrence of cryptic species (Dergam et al., 1998; Blanco et al., 2011; Oliveira et al., 2015).

The increase of available barcodes in BOLD database, including adjacent basins, may contribute to expose endemic cryptic species and reduce the risk of synonymies (Gomes et al., 2015). However, Pugedo et al. (2016) highlighted the concern of using solely DNA barcodes in defining species (e.g., using NND, BIN, ABGD, and bPTP analyses) due to the fact that Neotropical DNA barcode libraries are not yet complete. Furthermore, specimens included in BINs composed by different nominal species should be re-evaluated by a taxonomist to verify the data and check for potential misidentifications (Díaz et al., 2016).

Thus, a thorough analysis should be done for each flagged species to verify the correspondence of new MOTUs with putative new candidate species based on accurate morphological taxonomy analysis and to evaluate the divergence causes and the correlation of speciation process to natural or anthropogenic causes (e.g., presence of dams).

#### Importance of DNA Barcoding Library for the Doce River Ichthyofauna

This newly developed DNA barcode reference library for the DRB fish detected the occurrence of new MOTUs and suggested the existence of hidden biodiversity. This baseline information will provide a platform for several applications and management efforts, such as ichthyoplankton identification for the detection of fish recruitment areas, unambiguous choice of species to be used in restocking programs, and environmental DNA research. This data may contribute as a baseline for restoration programs in this catchment, by pointing out new MOTUs and suggesting the occurrence of overlooked and cryptic species among the DRB ichthyofauna, highlighting the complexity of Neotropical biodiversity.

The evidence presented here calls for a more robust, DNA-assisted cataloging of biodiversity-rich ecosystems, in order to enable effective monitoring and informed actions to preserve and restore delicate habitats, such as the DRB. Furthermore, studies should verify the extent to which fish biodiversity has been affected by the Doce dam collapse disaster, and which hotspots of diversity within the catchment can be identified as potential sources of replenishment. At the same time, the approaches used here, and additional high-throughput methodologies (e.g., metabarcoding of water and sediment samples), should be increasingly employed to monitor biodiversity at a pace that can cater for the management needs of these increasingly impacted biodiverse habitats.

# AUTHOR CONTRIBUTIONS

NS performed the molecular genetic analyses and drafted the manuscript. GS and TP collected the samples, conducted the morphological analyses and contributed to the correction of the text. DC designed and coordinated the study. DC and SM conceived the study, participated in its elaboration and helped to draft the manuscript. All authors read and approved the final manuscript.

# FUNDING

This work was funded by the Brazilian Barcode of Life Network BrBol/CNPq (564953/2010-5), Coordenação de Aperfeiçoamento de Pessoal do Ensino Superior (CAPES-PRÓ EQUIPAMENTOS Grant Number 783380/2013), FIP PUC Minas Grant Number 010/2014, FAPEMIG, and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (Grant Number 204620/2014-7).

#### ACKNOWLEDGMENTS

We thank Sergio Alexandre dos Santos, PUC Minas Natural History Museum (MCNIP), Museu de Biologia Professor Mello Leitão (MBML), Museu de Zoologia da Universidade Estadual de Campinas (ZUEC), and Núcleo de Pesquisas em Limnologia, Ictiologia e Aquicultura (NUPELIA) for the samples provided and fish identification. NS is grateful to Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for her scholarship (204620/2014-7) and GS is supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00271/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sales, Mariani, Salvador, Pessali and Carvalho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Remarkable Geographic Structuring of Rheophilic Fishes of the Lower Araguaia River

Tomas Hrbek<sup>1</sup>\*, Natasha V. Meliciano<sup>1,2</sup>, Jansen Zuanon<sup>3</sup> and Izeni P. Farias<sup>1</sup>

<sup>1</sup> Laboratório de Evolução e Genética Animal, Departmento de Genética, Instituto de Ciências Biológicas, Universidade Federal do Amazonas, Manaus, Brazil, <sup>2</sup> Instituto de Saúde e Biotecnologia, Universidade Federal do Amazonas, Coari, Brazil, <sup>3</sup> Coordenação de Biodiversidade, Instituto Nacional de Pesquisas da Amazônia, Manaus, Brazil

Rapids and waterfalls, and their associated fauna and flora, are in peril. With the construction of each new hydroelectric dam, more rapids and waterfalls are destroyed, leading to the disappearance of associated fauna and flora. Areas of rapids harbor distinct, highly endemic rheophilic fauna and flora adapted to an extreme environment. Rheophilic habitats also have a disjunct distribution both within and across rivers. Rheophilic habitats thus represent islands of suitable habitat separated by stretches of unsuitable habitat. In this study, we investigated to what extent, if any, species of cichlid and anostomid fishes associated with rheophilic habitats were structured among the rapids of the Araguaia River in the Brazilian Amazon. We tested both for population structuring and for non-random distribution of lineages among rapids. Eight of the nine species had multiple lineages, five of these nine species were structured, and three of the eight species with multiple lineages showed non-random distribution of lineages among rapids. These results demonstrate that, in addition to high levels of endemism of rheophilic fishes, different rapids even within the same river are occupied by different lineages. Rheophilic species and communities occupying different rapids are, therefore, not interchangeable, and this realization must be taken into account when proposing mitigatory/compensatory measures in hydroelectric projects, and in conservation planning.

Keywords: Cichlidae, Anostomidae, Tocantins river basin, mitochondrial control region, micro endemism, diversity loss, rapids

#### Edited by:

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### Reviewed by:

Henrique Batalha-Filho, Universidade Federal da Bahia, Brazil

Rubens Pazza, Federal University of Viçosa, Campus Rio Paranaíba, Brazil

> \*Correspondence: Tomas Hrbek hrbek@evoamazon.net

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 01 November 2017 Accepted: 13 July 2018 Published: 14 August 2018

#### Citation:

Hrbek T, Meliciano NV, Zuanon J and Farias IP (2018) Remarkable Geographic Structuring of Rheophilic Fishes of the Lower Araguaia River. Front. Genet. 9:295. doi: 10.3389/fgene.2018.00295

# INTRODUCTION

Brazil, like many other South American nations, has experienced a period of rapid economic growth accompanied by expanding energy needs. To meet these needs, Brazil has invested heavily in hydroelectric power generation, such that ≈ 80% of Brazil's electricity production is currently met by hydropower (International Energy Agency, 2013a). At the same time, it is estimated that up to 55% of the hydroelectric potential could still be exploited (International Energy Agency, 2013b). While the hydroelectric potential of the Paraná and São Francisco river basins in southern, central and northeastern Brazil has largely been realized, the Amazon basin is the next and last hydroelectric frontier (Lees et al., 2016; Latrubesse et al., 2017). However, several large dams, such as the Tucuruí, Balbina, Santo Antônio, Jirau and the recently completed Belo Monte, have already been implemented in the Brazilian Amazon. An additional 200+ dams, such as the Tapajós hydroelectric complex, have been proposed by South American governments (Finer and Jenkins, 2012; Castello et al., 2013; Lees et al., 2016). Construction of the Tapajós hydroelectric complex has not been initiated, mostly because of the economic crisis Brazil has been experiencing since 2014 (Fearnside, 2015). If these plans were to be enacted, only three Amazon tributaries would remain unimpounded (Castello and Macedo, 2016). Although hydroelectric projects have been lauded as cheap and clean energy alternatives, dam construction and operation result in substantial environmental stresses, including the destruction of rheophilic habitats by permanently submerging rapids and waterfalls (Clausen and York, 2008; Castello and Macedo, 2016; Pelicice et al., 2017).

Rheophilic habitats are a unique geomorphological feature of Amazonian river tributaries that descend the Brazilian and Guiana shields, cutting through the rocky surface to create a series of waterfalls and rapids, and a complex matrix of rocky habitats. Rapids are characterized as river sections of supercritical flow, where surface tension breaks at the water/air interface (Hawkins et al., 1993); the presence of rocky substratum is a fundamental component of this type of environment. These river sections are characterized not only by high velocity but also by highly heterogeneous water flow, oxygen-rich waters, and a very complex substrate matrix consisting of rocky slabs, caves, cracks, and crevices with lodged tree trunks, harboring a highly diverse array of niches. It is in these high-energy sections of the rivers that hydroelectric projects are developed. Damming the river at these sites to create a reservoir (most Amazonian hydroelectric projects), or diverting the river from these sites into a holding reservoir (as done in the Belo Monte hydroelectric project), permanently changes the hydrology of the river. Rheophilic habitats are generally permanently submerged underneath the reservoir, turning a high-energy, oxygen-rich lotic habitat into a hypoxic lentic habitat. In the case of the Belo Monte hydroelectric project, much of the rheophilic habitat in the Volta Grande has become permanently dewatered (emerged), and those sections that have not are no longer subject to the seasonal water level fluctuations necessary for the flowering, and thus the reproduction, of the unique Podostemaceae flora and for the fish fauna that depend on this resource and use seasonal water level fluctuations as reproductive cues (Lowe-McConnell, 1987). Under both scenarios, the specialized rheophilic fauna and flora suffer local or potentially even global extinctions (Winemiller et al., 2016).

From an ichthyofaunistic perspective, rapids have historically been poorly sampled and are poorly known environments (Böhlke et al., 1978; Menezes, 1996). On the other hand, it is known that rapids harbor their own specialized and commonly endemic ichthyofaunas, adapted to life in turbulent water environments (e.g., Hora, 1930; Roberts and Stewart, 1976; Kullander, 1988; Isbrücker and Nijssen, 1991; Jégu, 1992, 2004; Zuanon, 1999; Flausino Junior et al., 2016; Collins et al., 2018; Fitzgerald et al., 2018; Machado et al., 2018). Studies of natural history and ecology focusing on associations of fish species in rapids and waterfalls are rare (Balon, 1974; Balon and Stewart, 1983; Casatti and Castro, 1998; Fitzgerald et al., 2018). Mainly due to sampling difficulties, little is known about the way of life and structure of fish communities in these environments. Most of the information on the fish assemblages of rapids in the Amazon region is concentrated in unpublished technical reports, especially those related to studies of environmental impacts of proposed hydroelectric projects. In addition, existing information is usually restricted to occurrence records based on poorly resolved taxonomy, with data obtained without standardized sampling or experimental design among studies, making data interpretation and quantitative comparisons difficult. In Brazil, publications on rheophilic communities are restricted to a natural history study of the rheophilic fish fauna of a stream in the headwaters of the São Francisco River (Casatti and Castro, 1998), a study (Fitzgerald et al., 2018) and an unpublished doctoral thesis (Zuanon, 1999) focusing on the Volta Grande region of the Xingu River, and a study of the fishes associated with podostemacean mats in the Aripuanã River (Flausino Junior et al., 2016). Despite the scarcity of studies, it is known that rheophilic environments harbor species-rich and highly diverse assemblages, composed of many endemics.

The dearth of studies of rheophilic taxa is in part a consequence of the still poor general knowledge of the distribution of freshwater biodiversity (Vörösmarty et al., 2010). In the Neotropics, this impediment is the result of severe undersampling of freshwater taxa in general (Lundberg et al., 2000; Lévêque et al., 2007; Alofs et al., 2014), leading to many taxa not being evaluated by the IUCN or national conservation agencies, or, when evaluated, being listed as data deficient (Collen et al., 2014). This, in turn, hampers prioritizing freshwater habitats for conservation (Abell, 2002; Darwall et al., 2011; Frederico et al., 2016, 2018).

Perhaps the most emblematic group of fishes occupying rheophilic habitats in the Neotropics is the suckermouth armored catfishes (Loricariidae). The loricariids have acquired spectacular adaptations in their mouth, labial and dental morphologies that allow them to occupy rheophilic habitats and diversify within them (Lujan et al., 2012, 2017; Roxo et al., 2017; Collins et al., 2018). They are probably the group of fishes most strictly associated with rheophilic habitats (Reis et al., 2003). However, the loricariids are not the only group to have colonized and diversified within rheophilic habitats. Several clades of the plant-eating pacus (Serrasalmidae) are also rheophilic specialists (e.g., Jégu, 2004; Andrade et al., 2016, 2017; Machado et al., 2018), as are some cichlid (Cichlidae) (e.g., Kullander, 1988) and anostomid (Anostomidae) (e.g., Santos and Jégu, 1987) species.

Among-river discontinuity of rheophilic habitats causes a high degree of endemism, with few species occupying more than a single river basin (Reis et al., 2003), and those that occur in more than one river generally represent divergent, independently evolving lineages (Collins et al., 2018). By the same token, discontinuous rheophilic habitats within the same river system could lead to structuring of rheophilic fish species. Rheophilic habitats are islands of suitable habitat within a waterscape of unsuitable habitat, analogous to the rock islands of the great African lakes (Salzburger et al., 2014). We therefore set out to study to what degree, if any, the rheophilic fish fauna occupying distinct and unconnected rheophilic habitats is genetically structured along the Araguaia River. Population structuring of rheophilic fishes, or the occupation of different rapids by different lineages, would make rheophilic communities occupying different rapids even more distinct and non-substitutable when planning environmental mitigatory measures.

For this purpose we chose to sample and analyze the less well-studied cichlid and anostomid species rather than the obvious candidate loricariid species. If the rheophilic habitats really are analogous to the rocky islands and outcrops in the great African lakes, then we should observe a significant amount of population structuring among rheophilic cichlids and anostomids occupying different rapids along the river.

#### MATERIALS AND METHODS

#### Field Sampling

Over an eight-day period, we sampled six stretches of rapids in the lower Araguaia River (**Table 1**; **Figure 1**). The sampled rapids were Cachoeira do Zé dos Gatos, São Miguel, Remanso dos Botos, Santa Isabel, São Bento, and Rebojo (**Figure 2**). The distance between the Cachoeira do Zé dos Gatos and Rebojo rapids is 150 km, while the São Bento and Rebojo rapids are separated by less than 1 km.

After assessing the prevailing conditions in the field, we divided the team into a group of two people collecting fish with gillnets (24–60 mm mesh between opposing nodes) and castnets (10 mm mesh between opposing nodes) and one person catching fish with hook and line. This strategy aimed to optimize capture effort and to sample the available diversity of habitats occupied by anostomids and cichlids. Sampling time at each site was approximately 8 h, with the exception of the Cachoeira do Zé dos Gatos and Santa Isabel rapids, where we spent approximately 16 h each.

We focused on sampling two groups: cichlids, which are generally sedentary and include some rheophilic specialists, and anostomids, which also include rheophilic specialists but as a group may be considered vagile. The final choice of species included in this study was conditioned by our ability to obtain a sufficient number of specimens from at least four of the six rapids.

Preliminary identification of specimens was carried out in the field. Final identification of the collected specimens was made based on comparisons with samples deposited in the INPA Fish Collection (Manaus, AM), consultation of original descriptions and use of published dichotomous keys (when available).

All field collections were authorized by IBAMA/SISBIO 11325-1, and access to genetic resources was authorized by permit No. 034/2005/IBAMA.

#### DNA Extraction, Molecular Markers and Data Pre-processing

DNA extraction was performed using a standard phenol/chloroform protocol (Sambrook and Russell, 2001). For the molecular analyses we chose to amplify the control region (CR) of mitochondrial DNA (mtDNA), since it is the fastest evolving section of the mitochondrial genome, and thus is most likely to register fine-scale population structuring (Meyer, 1993). We obtained approximately 650–750 base pairs for each individual. We analyzed the same region of mtDNA for all individuals and thus eliminated any confounding effects of using different mtDNA gene regions with potentially different rates of molecular evolution in different species.

A common forward primer was used for all species (ProF: 5′-AACYCCCRCCCCTAACYCCCAAAG-3′). For anostomids we used the reverse primer DLOstariR.1 (5′-gtaaaacgacggccagTCCTGGTTTHGGGGTTTRAC6AG-3′), while for cichlids we used the reverse primer DLPercoR.1 (5′-gtaaaacgacggccagTCCTGTTTCCGGGGGGTTTACAG-3′). Both reverse primers incorporated an M13(-21) tail on their 5′ end. The 15 µL PCR mix included 1.2 µL of 10 mM dNTPs (2.5 mM each dNTP), 1.5 µL 10× buffer (75 mM Tris-HCl, 50 mM KCl, 20 mM (NH4)2SO4), 1.2 µL 25 mM MgCl2, 1.5 µL of primer cocktails (2 pmol each), 0.5 µL of Taq DNA polymerase, 1 µL of template DNA and 6.6 µL ddH2O. PCR conditions were: 94°C (30 s); 35 cycles of 94°C (30 s), 50°C (35 s), 68°C (90 s); followed by 68°C (5 min).

FIGURE 1 | A close up of an island within the Santa Isabel rapid complex showing habitat structure and flowering Podostemaceae.

Isabel rapid; (5) São Bento rapid; (6) Rebojo rapid.

PCR products were evaluated on a 1% agarose gel, and then purified using Exo-SAP (Exonuclease/Shrimp Alkaline Phosphatase) following the manufacturer's suggested protocol (Werle et al., 1994). The control region products were sequenced using the M13(-21) primer. Sequencing reactions were carried out according to the manufacturer's recommendation for the ABI BigDye Terminator cycle sequencing mix, using an annealing temperature of 50°C (Platt et al., 2007). Sequencing reactions were precipitated using a standard EDTA/EtOH protocol, and were resolved on the ABI 3130xl (Life Technologies) automatic sequencer. Sequence products were edited, concatenated and aligned using the Clustal W algorithm (Thompson et al., 1994) followed by manual adjustments as implemented in the program GENEIOUS v8.1.7 (Kearse et al., 2012). Sequences are deposited in GenBank under accession numbers MH514035–MH514287.

#### Analyses

Using the mtDNA control region we tested whether the populations of the species found at different rapids are differentiated from one another. We tested this hypothesis using the Analysis of Molecular Variance (AMOVA) (Excoffier et al., 1992) as implemented in the program ARLEQUIN v3.5 (Excoffier and Lischer, 2010). Significant differentiation was interpreted as lack of gene exchange between populations of individuals of the same species found in different rapids.
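The variance partitioning behind φST can be illustrated with a minimal sketch of a one-level AMOVA (individuals grouped only by rapid). This is not the ARLEQUIN implementation: the function name `phi_st`, the input layout, and the use of a plain matrix of squared distances (for haplotype data, commonly the number of pairwise differences) are assumptions made for illustration, and the permutation test that yields the p-value is omitted.

```python
from itertools import combinations


def phi_st(dist, pops):
    """One-level AMOVA phi_ST (hypothetical helper, not ARLEQUIN's code).

    dist[i][j] is the squared distance between individuals i and j;
    pops[i] is the population (here, the rapid) of individual i.
    """
    n_total = len(pops)
    groups = {}
    for idx, pop in enumerate(pops):
        groups.setdefault(pop, []).append(idx)
    n_pops = len(groups)
    if n_pops < 2 or n_total <= n_pops:
        raise ValueError("need at least two populations and N > P")

    # Sums of squared deviations: total and within populations.
    ssd_total = sum(dist[i][j] for i, j in combinations(range(n_total), 2)) / n_total
    ssd_within = sum(
        sum(dist[i][j] for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ssd_among = ssd_total - ssd_within

    # Mean squares and variance components.
    var_within = ssd_within / (n_total - n_pops)
    msd_among = ssd_among / (n_pops - 1)
    sum_sq_sizes = sum(len(members) ** 2 for members in groups.values())
    n0 = (n_total - sum_sq_sizes / n_total) / (n_pops - 1)
    var_among = (msd_among - var_within) / n0

    total_var = var_among + var_within
    if total_var == 0:
        raise ValueError("no variation in the distance matrix")
    return var_among / total_var


# Two rapids, two individuals each; haplotypes differ only between rapids.
example = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(phi_st(example, ["A", "A", "B", "B"]))
```

Because the among-population variance component is estimated by subtraction, the estimator can return slightly negative values when populations are not differentiated, as seen for several anostomids in the Results.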

We also delimited lineages using the Bayesian approach implemented in the program BAPS v6.0 (Corander et al., 2008). Individual level mixture analysis was performed for different maximum numbers of lineages (k = 1–5), with 10 independent runs for each value of k. The k with the highest posterior probability was selected as representing the correct data partition. We subsequently tested for a non-random distribution of lineages across the different rapids from which these lineages were sampled using Fisher's exact test as implemented in the R statistical language (R Development Core Team, 2011).
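The logic of testing for a non-random association between lineages and rapids can be sketched as follows. This is a Monte Carlo approximation to Fisher's exact test for a lineage × rapid table, not the exact enumeration and not the R routine itself; the function names, the number of permutations, and the seed are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def table_stat(lineages, rapids):
    """Sum of log-factorials of the cell counts of the lineage x rapid table.

    With the row and column totals fixed, the probability of a table is
    proportional to 1 / prod(cell!), so a larger value means a less
    probable table.
    """
    cells = Counter(zip(lineages, rapids))
    return sum(math.lgamma(count + 1) for count in cells.values())


def association_p_value(lineages, rapids, n_perm=1999, seed=1):
    """Monte Carlo p-value for non-random association (hypothetical helper)."""
    rng = random.Random(seed)
    observed = table_stat(lineages, rapids)
    shuffled = list(rapids)
    hits = 0
    for _ in range(n_perm):
        # Shuffling rapid labels keeps both margins of the table fixed.
        rng.shuffle(shuffled)
        if table_stat(lineages, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: each lineage is confined to a single rapid.
toy_lineages = ["L1"] * 8 + ["L2"] * 8
toy_rapids = ["R1"] * 8 + ["R2"] * 8
print(association_p_value(toy_lineages, toy_rapids))
```

A small p-value indicates that tables at least as improbable as the observed one rarely arise when individuals are assigned to rapids at random.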

Haplotype networks and the geographic occurrence of haplotypes were estimated in the program HAPVIEW (Salzburger et al., 2011) using a phylogeny of haplotypes generated in RAXML v8.2.0 (Stamatakis, 2014).

Pairwise uncorrected p–distances were calculated between all individuals of a given species using GENEIOUS v8.1.7 (Kearse et al., 2012).
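For reference, the uncorrected p–distance is simply the proportion of compared alignment positions at which two sequences differ. The sketch below is a minimal illustration; the function name and the decision to skip positions with gaps or missing data are assumptions, and GENEIOUS may treat such positions differently.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Uncorrected p-distance between two aligned sequences.

    Positions where either sequence has a gap or missing-data symbol
    are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


# One difference over eight compared sites.
print(p_distance("ACGTACGT", "ACGTACGA"))
```

Multiplying the result by 100 gives the percentage divergence reported in the species accounts.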

Finally, we calculated Tajima's D (Tajima, 1989) for each species.
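Tajima's D contrasts two estimators of genetic diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. The following sketch follows the formulas of Tajima (1989); the function name is hypothetical, the alignment is assumed to be gap-free, and no significance test is included.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free sequences (sketch)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    ) / n_pairs

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Two divergent haplotypes at equal frequency give a positive D.
print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]))
```

Positive values arise when divergent haplotypes occur at intermediate frequencies, as expected when a sample mixes distinct lineages; negative values arise when rare variants are in excess.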

All analyses were carried out separately for each species, assuming α = 0.05.

#### RESULTS

The aligned DNA data matrix comprised nine taxa, 253 individuals by 640–778 bp. Number of individuals per species varied from 16 to 43 (**Table 2**). Eight of the nine taxa had more than one lineage, with up to five lineages being observed in Crenicichla cametana. In total we observed 175 unique haplotypes (**Figures 3**, **4**) grouped into 29 lineages (**Figure 5**). The existence of multiple lineages within rheophilic taxa and Leporinus desmotes is further reinforced by significantly positive Tajima's D metrics. Significant population structuring was observed in five of the nine species analyzed, and non-random association of lineages and rapids was observed in three of the eight species with multiple lineages. Results of tests of among-rapids population differentiation, Tajima's D and Fu's Fs, and non-random distribution of lineages among rapids are summarized in **Tables 3**, **4**, respectively.

#### Species Accounts

#### Retroculus lapidifer (Cichlidae) (Figure 6A)

This species is a sand-sifter and a rheophilic specialist. The collections of this species occurred in the localities of Cachoeira do Zé dos Gatos (N = 2), Santa Isabel (N = 9), São Miguel (N = 2) and Rebojo and São Bento (N = 7). Within this species we observed at least four genetic lineages with up to 2.24% p–distance sequence divergence. Analysis of Molecular Variance (AMOVA) indicated high levels of geographic structuring (φST = 0.35119; p = 0.00040 ± 0.00019). Tajima's D was positive (0.53176). The hypothesis of random associations of genetic lineages with rapids was rejected (Fisher's exact test, p = 0.00100). These lineages were largely restricted to specific rapids.

#### Geophagus altifrons (Cichlidae) (Figure 6B)

This species is a habitat generalist but is commonly found in shallow sandy areas of stretches of rapids. The collections of this species occurred in the localities of Cachoeira do Zé dos Gatos (N = 1), Santa Isabel (N = 12), Remanso dos Botos (N = 1) and Rebojo and São Bento (N = 29). We observed only one genetic lineage within this species. AMOVA indicated geographic structuring (φST = 0.11732; p = 0.00802 ± 0.00086). Tajima's D was negative and significant (−1.79152).

#### Cichla piquiti (Cichlidae) (Figure 6C)

This species is usually found in structured habitats, including rapids, where it lurks and waits for its prey. The collections of this species occurred in the localities of Cachoeira do Zé dos Gatos (N = 1), Santa Isabel (N = 19), São Miguel (N = 1), Remanso dos Botos (N = 1) and Rebojo and São Bento (N = 6). We observed three genetic lineages with up to 2.58% p–distance sequence divergence. AMOVA indicated high levels of geographic structuring (φST = 0.29184; p = 0.01050 ± 0.00096). Tajima's D was positive and significant (2.35861). The hypothesis of random associations of genetic lineages with rapids was not rejected (Fisher's exact test, p = 0.08546).

#### Crenicichla cametana (Cichlidae) (Figure 6D)

This species is a relatively small ambush predator, and a specialist of rapids. The collections of this species occurred in the localities of São Miguel (N = 9), Remanso dos Botos (N = 5) and Rebojo and São Bento (N = 8). We observed five divergent genetic lineages with up to 1.57% p–distance sequence divergence. AMOVA indicated high levels of geographic structuring (φST = 0.86498; p < 0.00001). Tajima's D was positive and significant (1.57836). The hypothesis of random association of genetic lineages with rapids was rejected (Fisher's exact test, p = 0.00050). These lineages were largely restricted to specific rapids.

#### Hypomasticus cf. pachycheilus (Anostomidae) (Figure 7A)

This species is a rheophilic specialist, an algae scraper, found in fast flowing waters of rapids. The collections of this species occurred in the localities of Cachoeira do Zé dos Gatos (N = 27), Santa Isabel (N = 1), São Miguel (N = 2), Remanso dos Botos (N = 4) and Rebojo and São Bento (N = 9). We observed four genetic lineages with up to 1.04% p–distance sequence divergence. AMOVA indicated geographic structuring (φST = 0.15674; p = 0.00406 ± 0.00066). Tajima's D was negative (−0.80127). The hypothesis of random association of genetic lineages with rapids was rejected (Fisher's exact test, p = 0.00700).

FIGURE 3 | Haplotype network of four species of Cichlidae indicating geographic occurrence of the haplotypes. Networks were generated in HAPVIEW, and haplotype size is proportional among networks and to the number of individuals with that haplotype. Localities are: (1) Zé dos Gatos rapid; (2) São Miguel rapid; (3) Remanso dos Botos rapid; (4) Santa Isabel rapid; (5) São Bento rapid and Rebojo rapid.

TABLE 3 | Analysis of population differentiation among rapids of the Araguaia River, and demography of each species.

\*, significantly negative; †, positive; ‡, significantly positive.

#### Leporinus maculatus (Anostomidae) (Figure 7B)

This species is found in relatively slow and fast water regions of rapids; although it is not a rapids specialist, it shows a strong association with rocky substrates. The collections of this species occurred in the localities of Cachoeira do Zé dos Gatos (N = 4), Santa Isabel (N = 9), Remanso dos Botos (N = 9) and Rebojo and São Bento (N = 6). We observed four genetic lineages with up to 0.89% p–distance sequence divergence. AMOVA indicated no genetic structuring in the species (φST = −0.04735; p = 0.87851 ± 0.00363). Tajima's D was negative (−1.04746). The hypothesis of random association of lineages with the rapids was not rejected (Fisher's exact test, p = 0.94250).

TABLE 4 | Analysis of non-random distribution of lineages among rapids of the Araguaia River.

#### Leporinus affinis (Anostomidae) (Figure 7C)

This species is a habitat generalist that is found in both slow and fast water regions of rapids, and also in a wide variety of habitats and water flows within the Araguaia-Tocantins river system. The collections of this species occurred in the localities of Santa Isabel (N = 3), Remanso dos Botos (N = 2) and Rebojo and São Bento (N = 29). We observed three divergent genetic lineages with up to 1.50% p–distance sequence divergence. AMOVA indicated absence of geographic structuring (φST = 0.00075; p = 0.43525 ± 0.00483). Tajima's D was negative (−1.09098). The hypothesis of random association of lineages with the rapids was not rejected (Fisher's exact test, p = 0.16490).

#### Leporinus desmotes (Anostomidae) (Figure 7D)

This species is found in both slow and fast water regions of rapids; although it is not a rapids specialist, it shows a strong association with rocky substrates. The collections of this species occurred in the localities of Cachoeira do Zé dos Gatos (N = 2), Santa Isabel (N = 2), São Miguel (N = 2), Remanso dos Botos (N = 4) and Rebojo and São Bento (N = 6). We observed two divergent genetic lineages with up to 7.33% p–distance sequence divergence. AMOVA indicated lack of genetic structuring (φST = −0.19958; p = 0.95950 ± 0.00191). Tajima's D was positive (0.96519). The hypothesis of random association of lineages with the rapids was not rejected (Fisher's exact test, p = 1.00).

#### Leporinus unitaeniatus (Anostomidae)

This species is found in both slow and fast water regions of rapids, and although it is not a rapids specialist, it is a rocky substrate dweller. The collections of this species occurred at the localities of Cachoeira do Zé dos Gatos (N = 4), Santa Isabel (N = 4), Remanso dos Botos (N = 0) and Rebojo and São Bento (N = 11). We observed two divergent genetic lineages with up to 1.54% p–distance sequence divergence. AMOVA indicated lack of genetic structuring (φST = 0.02545; p = 0.29604 ± 0.00478). Tajima's D was negative (−0.19675). The hypothesis of random association of lineages with the rapids was not rejected (Fisher's exact test, p = 0.71860).

#### DISCUSSION

In this study we have shown an unprecedented level of fine-scale population structuring among fish species that occupy rapids stretches of a large Amazonian river. Five of the nine studied species showed either non–random distribution of genetic diversity among rapids, non–random distribution of lineages among rapids or both. Among these, three rheophilic specialists (the cichlids Retroculus lapidifer and Crenicichla cametana and the anostomid Hypomasticus cf. pachycheilus) show strong population level differentiation between rapids and non–random distribution of lineages among rapids. The remaining two species of cichlids (Geophagus altifrons and Cichla piquiti) are not rapids specialists but both are philopatric and sedentary species, which favors the occurrence of genetic structuring in habitat patches along the river. Geneflow among populations of these species occupying distinct rapids is low; in all five instances Nm<2 and in three instances Nm<1. In addition to the nine analyzed species, two additional species of rheophilic anostomids, Leporellus vittatus and Leporinus bistriatus, probably exhibit a pattern of genetic differentiation similar to that of Hypomasticus cf. pachycheilus. However, this could not be tested because of the low sample size resulting from the difficulty in collecting samples. Finally, Leporinus desmotes comprised two deeply divergent lineages (7.33% p–distance) that may represent two cryptic species. These results parallel those of the only other fine–scaled population study, by Markert et al. (2010), of two rheophilic cichlid fish species collected at five rapids–associated sites of the lower Congo River separated by no more than 100 km. Therefore, significant levels of within river system geographic structuring are likely to be the norm for rheophilic fish fauna inhabiting rapids of not just the Araguaia, but all Amazonian rivers.
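The text does not state how Nm was derived from the differentiation estimates. As a hedged illustration, the sketch below assumes Wright's island-model relation FST ≈ 1/(4Nm + 1) and applies it to the φST values reported in the species accounts; the function name is hypothetical. Under this assumption, all five estimates fall below 2 and three fall below 1, which agrees with the statement above. For a haploid, maternally inherited marker the factor 4 is often replaced by 2, which would give larger values.

```python
def nm_from_phi_st(phi_st, ploidy_factor=4.0):
    """Nm under Wright's island model: Nm = (1/phi_ST - 1) / factor.

    The factor 4 is an assumption of this sketch, not a value stated
    in the text.
    """
    if not 0.0 < phi_st < 1.0:
        raise ValueError("phi_ST must lie strictly between 0 and 1")
    return (1.0 / phi_st - 1.0) / ploidy_factor


# phi_ST values as reported in the species accounts.
PHI_ST = {
    "Retroculus lapidifer": 0.35119,
    "Geophagus altifrons": 0.11732,
    "Cichla piquiti": 0.29184,
    "Crenicichla cametana": 0.86498,
    "Hypomasticus cf. pachycheilus": 0.15674,
}

NM = {species: nm_from_phi_st(value) for species, value in PHI_ST.items()}

for species, nm in NM.items():
    print(f"{species}: Nm = {nm:.2f}")
```

These are equilibrium approximations; they indicate the order of magnitude of gene flow rather than a measured number of migrants.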

In summary, rheophilic and sedentary species such as cichlids showed high levels of population differentiation among rapids, and consequently low levels of geneflow, in some cases sufficiently low to allow these populations to diverge by genetic drift. These species also comprised distinct lineages non–randomly distributed among rapids. When non–random distribution of lineages was observed, in many cases these lineages were allopatric, i.e., there was no overlap in the distribution of these lineages. This signifies that there are some lineages that only occur in specific rapids and/or groups of rapids, i.e., they are micro-endemic. However, none of these lineages are readily identifiable using traditional external morphological traits.

The occurrence of such remarkable population structuring of rheophilic fishes in the rapids along the Araguaia River points to a possible high level of extinction risk for a considerable portion of the Amazon fish fauna, at an unexpected spatial scale. The Amazon River, like other major rivers, is rapidly being developed and its hydroelectric potential is starting to be realized (Lees et al., 2016; Winemiller et al., 2016). However, more so than any other major river, the Amazon and its affluents have been largely unimpacted by large–scale anthropogenic actions and flow through predominantly pristine landscapes (Winemiller et al., 2016). Most of the basin's hydroelectric potential has also been largely untapped, but there is strong pressure to exploit it both in Brazil and in other Amazonian countries (Finer and Jenkins, 2012; Fearnside, 2016). At the moment hydroelectric projects are concentrated in regions with very large energy generating potential (e.g., Tucuruí or Belo Monte, supplying electricity to energy intensive industries) or in those supplying local needs (e.g., Balbina supplying electricity to Manaus and Curuá-Una supplying electricity to Santarém, both in the Brazilian Amazon).

Dam construction is subject to environmental impact studies which generally consist of faunal and floral surveys. More complex biological processes acting beyond the seasonal scale are generally not addressed because of the time constraints of the licensing process. Part of the licensing process requires a plan for the mitigation of environmental disturbances caused by the construction and functioning of the hydroelectric complex, although it is questionable to what extent, if any, this goal can be accomplished (Fearnside, 2012). Rarely are molecular tools employed, and when they are, they do not necessarily address relevant conservation–driven questions. If molecular tools are used, they are predominantly employed to test to what extent, if any, areas of rapids function as barriers to geneflow in commercially important species with distributions upstream and downstream of the rapids. The presumed motivation is to determine if fish ladders should be constructed or not. The effectiveness of fish ladders in the Amazonian context is questionable, however (Pelicice and Agostinho, 2008). In the only case in the Brazilian Amazon where fish ladders were constructed and their effectiveness evaluated (the Santo Antônio hydroelectric complex on the upper Madeira River), many of the migratory species, including the goliath catfishes (Brachyplatystoma spp.), that used to migrate upstream of the Teotônio rapids for reproduction scarcely do so any more, while species previously isolated by the rapids have used the ladders to invade upstream (Cella-Ribeiro et al., 2017).

Molecular tools are not, however, used to evaluate the effects on rheophilic fauna and flora, and consequently to what extent, if any, the destruction of this fauna and flora can be compensated with mitigatory measures. To our knowledge this study is the first to attempt to specifically address this question.

#### Implication for Conservation

Rapids–dwelling fishes have ecological specializations uniquely linked to the occupation of these habitats. One of these specializations is the use of podostemaceous plants as a place of shelter, growth, foraging and direct consumption by herbivorous species (Flausino Junior et al., 2016). As an example, strongly rheophilic pacus species (Characiformes: Serrasalmidae), such as those of the genera Mylesinus and Tometes, present cutting teeth used in leaf pruning of these plants (e.g., Jégu and Santos, 1988; Santos et al., 1997; Vitorino Júnior et al., 2016; Andrade et al., 2017). The strong dependence on podostemaceous mats as food makes these pacus particularly vulnerable to the environmental impacts resulting from dam construction. The reduction of densities and numbers of the Podostemaceae in rivers with artificially regulated flow seems to have caused a significant population reduction of the pacu Mylesinus paraschomburgi, a specialized consumer of these plants (Santos et al., 1997). Similar responses to environmental perturbations have been observed in other rheophilic fish species, such as the anostomids (Santos et al., 1997), loricariids (Zuanon, 1999) and cichlids (Kullander, 1988, 1991; Zuanon, 1999).

In addition to the Podostemaceae, many rapids–dwelling fish species use epilithon, a layer of organic debris, algae and small invertebrates that covers the rocky substratum in the rapids, as food. There is evidence that rheophilic fishes are strongly dependent on this type of food (Casatti and Castro, 1998; Zuanon, 1999), and environmental impacts that alter the quality and quantity of this resource negatively affect the local ichthyofauna, with loss of species and trophic relationships.

In addition to the close and direct ecological relationship between fishes and rapids, the spatial distribution of rheophilic habitats along the course of a river also contributes to the fragility of the system. In many rivers, rapids are environments distributed in a mosaic, isolated from each other by other habitat types unsuitable for rheophilic fauna and flora. Not only are rheophilic habitats characterized by high levels of endemism and taxonomic diversity, our study suggests that rheophilic habitats are also characterized by high levels of population structuring and phylogenetic diversity. Thus while fish assemblages and communities may be similar among the different rapids of the same river, our study indicates that they are composed of different lineages, and in some cases even ecologically equivalent but morphologically cryptic species.

Areas of rapids of several Amazonian rivers have already been radically altered by the construction of hydroelectric dams. In all these cases, the rapids and much of the associated ichthyofauna disappeared (Agostinho et al., 2008), and based on our study, this fauna is not substitutable by fauna occurring outside the area of direct impact of the hydroelectric dam. The construction of reservoirs causes a permanent change from lotic to lentic habitats, resulting in the local extinction of the specialized rheophilic fauna and flora whose habitat became submerged under the hydroelectric reservoir. The elevated levels of phylogenetic diversity and the non-random distribution of this diversity among the rapids, with lineages often being restricted to a specific rapid or group of rapids, imply that the disappearance of the rheophilic ichthyofauna from areas of hydroelectric projects probably leads to the extinction of these lineages, even if other lineages survive elsewhere.

#### We May Ask: Why Does This Matter?

Although ecologically equivalent lineages may persist in other areas, we are losing a unique evolutionary heritage. Once extinct, these lineages can never become "unextinct." Even more importantly, these lineages are an inevitable consequence of evolution, and thus are intricately linked to the underlying evolutionary processes that gave rise to them. By destroying these lineages, we inevitably impair or even destroy our potential to understand these processes and the origins of this amazing biodiversity.

An argument may also be made that we should not focus on lineages, but rather on species, since species are the fundamental units of biodiversity (Soulé and Wilcox, 1980). However, species are lineages (Simpson, 1961; de Queiroz, 2007), and need to be understood in this light. If we treat species as classes, or do not recognize that species may be morphologically cryptic, often relying on incomplete taxonomic knowledge, we limit ourselves in what we can understand of the evolutionary history and the evolutionary potential of the studied groups.

Ultimately we must recognize the lineage-like nature of species and focus our studies on them if we hope to understand the evolution and maintenance of biodiversity (Willis, 2017), and if we hope to minimize our contribution to the sixth mass biological extinction (Ceballos et al., 2015).

# AUTHOR CONTRIBUTIONS

TH, IF, and JZ conceived the experiment and obtained funding. TH, NM, and JZ conducted fieldwork and collected specimens. JZ morphotyped and revised specimens. NM and TH collected molecular data. TH analyzed the results. TH wrote the manuscript with substantial contributions from JZ. All authors contributed to and reviewed the manuscript.

#### FUNDING

Funding for this study was provided by the GESAI consortium, CNPq 483155/2010-1, and 482662/2013-1 to TH.

#### REFERENCES


#### ACKNOWLEDGMENTS

We thank Marla Soares Carvalho and Kiara Formiga for helping to collect and process fish in the field, and Oliver Lucanius, Mark Sabaj, Andry Rapson, and Oldřich Říčan for representative photos of the studied species.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Hrbek, Meliciano, Zuanon and Farias. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Multilocus Approach to Understanding Historical and Contemporary Demography of the Keystone Floodplain Species *Colossoma macropomum* (Teleostei: Characiformes)

#### *Edited by:*

Rodrigo A. Torres, Universidade Federal de Pernambuco, Brazil

#### *Reviewed by:*

Yessica Rico, Instituto de Ecología (INECOL), Mexico Fernanda Dotti do Prado, Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil Fabio Porto-Foresti, Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Brazil

> *\*Correspondence:* Izeni P. Farias izeni@evoamazon.net

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> *Received:* 08 November 2017 *Accepted:* 28 June 2018 *Published:* 14 August 2018

#### *Citation:*

Santos MCF, Hrbek T and Farias IP (2018) A Multilocus Approach to Understanding Historical and Contemporary Demography of the Keystone Floodplain Species Colossoma macropomum (Teleostei: Characiformes). Front. Genet. 9:263. doi: 10.3389/fgene.2018.00263

Maria da Conceição Freitas Santos<sup>1</sup>, Tomas Hrbek<sup>2</sup> and Izeni P. Farias<sup>2</sup>\*

<sup>1</sup> Departamento de Biologia, Universidade do Estado do Amazonas, Manaus, Brazil, <sup>2</sup> Laboratório de Evolução e Genética Animal, Departamento de Genética, Universidade Federal do Amazonas, Manaus, Brazil

We studied the natural populations of a flagship fish species of the Amazon, Colossoma macropomum, which in recent years has been suffering from severe exploitation. Our aim was to investigate whether genetic differentiation exists across the wide area of its distribution and to investigate changes in its effective population size throughout its evolutionary history. We sampled individuals from 21 locations distributed throughout the Amazon basin. We analyzed 539 individuals for mitochondrial genes (control region and ATPase gene 6/8), generating 1,561 base pairs, and genotyped 604 individuals for 13 microsatellite loci obtaining, on average, 21.4 alleles per locus. Mean H<sub>E</sub> was 0.78, suggesting moderate levels of genetic variability. AMOVA and other tests used to detect population structure based on both markers indicate that C. macropomum comprises a single, large panmictic population in the main channel of the Solimões-Amazonas River basin; on the other hand, localities in the headwaters of the tributaries Juruá, Purus, Madeira, and Tapajós, and black-water localities, showed genetic structure. The greatest genetic differentiation was observed between the Brazilian Amazon basin and the Bolivian sub-basin, with restricted gene flow between the two basins. Demographic analyses of mitochondrial genes indicated population expansion in the Brazilian and Bolivian Amazon basins during the Pleistocene, and microsatellite data indicated a population reduction during the Holocene. This shows that the historical demography of C. macropomum is highly dynamic. Conservation and management strategies should be designed to respect the existing population structure and minimize the effects of overfishing by limiting the fishing of C. macropomum populations.

Keywords: tambaqui, microsatellites, mitochondrial DNA, genetic variability, gene flow, genetic structure, Amazon basin

# INTRODUCTION

The Amazon basin holds the largest diversity of fishes in the world. It is estimated that approximately 2,411 fish species occur there (Reis et al., 2016), with 1,089 species being endemic. Aquatic biodiversity of the Amazon basin is thought to be the consequence of diversification of modern fauna that occurred mainly during the Miocene (Lovejoy et al., 2010), driven, to a large extent, by the establishment of the current hydroscape. Amazonian rivers also drain three principal geological formations, the Andes and the Guyana and Brazilian Shields, with consequences for the physicochemical properties of the waters draining these geological formations. In addition, some of these rivers present physical barriers which limit geneflow between different sections of the river, further acting as agents of divergence (Hoorn et al., 2010). Naturally, all of these forces interact, producing an amazingly diverse ichthyofauna. Part of this ichthyofauna is also exploited as a fisheries resource that represents the production base of an economic sector that contributes more than US\$ 200 million per year to the economy of the Brazilian Amazon basin (Barthem and Fabré, 2003). Colossoma macropomum (tambaqui) is at the top of the list of the most important commercial species. This species also has an important ecological role, as it is an important disperser of seeds of trees and shrubs of the Amazonian floodplain (Araújo-Lima and Goulding, 1998). Because of its commercial importance, this species has suffered over-exploitation of its natural stocks in recent years and today, juveniles account for most of the catch (Barthem and Goulding, 2007). 
The average size of the fish landed and sold in the main markets in the Amazon suggests that many individuals are captured before reaching sexual maturation, which occurs in females between 50 and 55 cm in length, at an estimated mean age of 3 years (Goulding and Carvalho, 1982; Isaac et al., 1996) based on the length/age relationship estimated from Bertalanffy's model by Isaac and Ruffino (1996).

Colossoma macropomum is found throughout almost the entire length of the Amazon River and most of its affluents, as well as in the Orinoco basin (Araújo-Lima and Goulding, 1998). Thus, the species is found in the three main Amazonian water types (white, clear, and black), as well as upstream and downstream of geographic barriers such as rapids (Goulding et al., 2003). Colossoma macropomum is thus an ideal candidate for studying and understanding population structuring patterns in the Amazon basin, which are important for the implementation of science-driven conservation measures.

Previous studies have found a high degree of genetic variability in populations of C. macropomum (Santos et al., 2007; Farias et al., 2010), indicating that overfishing has not yet affected the genetic diversity of wild populations, nor were signals of population reduction detectable. The authors suggested that the absence of a genetic signal of population reduction was likely related to the large effective population size of the species. Moreover, C. macropomum has migratory behavior and moves through the rivers of the Amazon seasonally for the purposes of feeding and breeding (Araújo-Lima and Ruffino, 2004). The behavior of C. macropomum and the hydrological dynamics of the floodplain habitat it predominantly occupies may partially explain the panmixia reported for this species. This apparent lack of population structuring is found throughout the mainstream of the Amazon basin, with the exception of individuals found upstream of the series of rapids delimiting the Bolivian sub-basin from the Amazon basin (Farias et al., 2010). The authors also suggested that populations of C. macropomum from the Bolivian sub-basin were largely demographically stable, while the Brazilian Amazon basin populations showed evidence of historical population growth from the Pleistocene onward.

Knowledge of changes in the effective population size of C. macropomum is important for understanding the demography of the species. In addition, robust estimates of population differentiation are important for implementing conservation and management strategies. Therefore, the aim of the present study was to test the two hypotheses raised in previous studies of C. macropomum using samples from the entire area of distribution of the species in the Amazon basin, and using both nuclear-encoded microsatellites and mtDNA gene sequences. Under the first hypothesis, we tested whether C. macropomum populations are differentiated, considering: (i) samples from the mainstream of the Amazon River, as well as eight of its main tributaries; (ii) samples from all three major water types of the Amazon (white, clear, and black water) based on the classifications of Sioli (1984) and Venticinque et al. (2016); (iii) samples from upstream and downstream of rapids in the Madeira and Tapajós rivers. Under the second hypothesis, historical and contemporary demographic approaches were used to test whether C. macropomum underwent changes in effective population size throughout its evolutionary history in the Amazon basin.

# MATERIALS AND METHODS

#### Samples and Data Collection

A total of 637 samples of Colossoma macropomum were collected directly from artisanal fishers at 21 localities within the Brazilian Amazon basin and one locality in the Bolivian sub-basin (**Figure 1**). An adipose fin clip or a fragment of muscle tissue was removed from between 20 (localities within the Brazilian Amazon basin) and 69 (within the Bolivian sub-basin) individuals and preserved in 100% ethanol for subsequent laboratory analyses.

Total genomic DNA was extracted using a proteinase K/phenol-chloroform/isoamyl alcohol protocol and precipitated with 70% ethanol (Sambrook et al., 1989). Approximately 50 to 100 ng of genomic DNA was used as a template for PCR reactions. We amplified the mitochondrial DNA control region (mtDNA control region) and the ATPase subunits 6 and 8, using the primers Chara\_LDloop and Chara\_RDloop; CMF2 and CMR2 (control region) and ATP 8.2\_L8331 and CO3.2\_H9236 (ATPase genes) listed in **Supplementary Table S1**. The PCR reactions for the two regions were performed in a final volume of 15 µL containing 1.5 µL of the forward and reverse primer (2 mM), 1.5 µL of buffer (Tris-KCl 200 mM, pH 8.5), 1.5 µL of 25 mM MgCl<sub>2</sub>, 1.5 µL of 25 mM dNTP, 0.3 µL of 5 U/µL Taq polymerase and 6.2 µL ddH<sub>2</sub>O. PCR conditions (for control region and ATPase gene) were as follows: denaturation at 94°C for 60 s, primer annealing at 50°C for 30 s, and primer extension at 68°C for 90 s, followed by a final extension at 68°C for 5 min. The first three steps were repeated 35 times.

FIGURE 1 | Map of the Amazon basin showing sampled localities. Circles represent localities in the mainstream of the Amazon River (yellow), tributaries of the Amazon River (blue), and the locality of the Bolivian sub-basin (red).

PCR products were purified using ExoSAP (exonuclease I and shrimp alkaline phosphatase). The samples were sequenced using the BigDye terminator v3 kit (ThermoFisher), following the manufacturer's protocol. Due to the size of the control region of C. macropomum (approximately 1,100 bp), each sample was sequenced in two steps, using the CMF2 (forward) and CMR2 (reverse) internal primers (**Supplementary Table S1**). For the ATPase gene, only the primer ATP 8.2 (forward) was used. The precipitated product was resolved on an ABI 3130xl DNA Analysis System sequencer (ThermoFisher), according to the manufacturer's standard protocol.

Microsatellite genotypes were generated using a multiplex design (**Supplementary Table S2**) with 13 pairs of primers developed by Santos et al. (2009) for C. macropomum. The amplification conditions depended on the number of primer pairs in each multiplex. For three pairs of primers: 1.5 µL MgCl<sub>2</sub> (25 mM), 1.5 µL dNTPs (10 mM), 1.5 µL 10x buffer (100 mM Tris-HCl, 500 mM KCl), 1.0 µL of each forward primer containing one of the two M13 tails (2 µM), 1.5 µL of each reverse primer, 1.5 µL of fluorescence-labeled M13f primer (FAM), 0.7 µL of fluorescence-labeled M13r primer (HEX), 0.8 µL Taq DNA polymerase (5 U/µL), and 1 µL DNA (50–100 ng), with a final volume of 14.5 µL. For two pairs of primers: 3.0 µL ultrapure water, 1.5 µL MgCl<sub>2</sub> (25 mM), 1.5 µL dNTPs (10 mM), 1.5 µL 10x buffer (100 mM Tris-HCl, 500 mM KCl), 1.0 µL of each forward primer containing one of the M13 tails (2 µM), 1.5 µL of each reverse primer, 1.5 µL of fluorescence-labeled M13f primer (FAM) or 0.7 µL of fluorescence-labeled M13r primer (HEX), 0.6 µL Taq DNA polymerase (5 U/µL), and 1 µL DNA (50–100 ng), with a final volume of 14 µL.

PCR conditions were as follows: 30 cycles of denaturation at 94°C for 20 s, primer annealing at 60–65°C (depending on the primer combination) for 20 s, and extension at 68°C for 30 s, followed by 20 cycles for annealing of the M13 primers (denaturation at 94°C for 20 s, annealing of the fluorescence-labeled M13 primer at 50°C for 20 s, and extension at 68°C for 30 s), with a final extension of 30 min at 68°C.

For the genotyping reaction, the PCR products were diluted to 10–50 µL with ultrapure water, depending on the intensity of the PCR products on an agarose gel. For each 1 µL of diluted product, 8.0 µL of Hi-Di formamide (ThermoFisher, Inc.) and 1.0 µL of the 6-carboxy-X-rhodamine (ROX) size standard of DeWoody et al. (2004) were added. The samples were genotyped on an ABI 3130xl automatic sequencer (ThermoFisher, Inc.) and allele sizes (in base pairs) were estimated in GeneMapper™ software version 4.0 (ThermoFisher, Inc.). The genotype matrix is available at https://github.com/legalLab/datasets.

The sequences of the control region and subunits 6 and 8 of the ATPase gene were verified, edited and aligned in the program BIOEDIT v7.0.5 (Hall, 1999). The ATPase genes were translated into hypothetical amino acids in the program MEGA 6.0 (Tamura et al., 2013) to verify the absence of unexpected stop codons. Sequences were deposited in GenBank under accession numbers MH514288–MH514827 for the control region and MH520124–MH520663 for the ATPase genes.

#### Mitochondrial DNA Analyses

The existence of population structure was tested using the Analysis of Molecular Variance (AMOVA) implemented in the program ARLEQUIN v3.5 (Excoffier and Lischer, 2010). We analyzed three datasets: (1) all 21 locations were analyzed as a single hierarchical level; (2) the Guajará-Mirim (Bolivian basin) locality was removed from the data matrix; (3) tributaries vs. locations of the main channel of the Amazon River. Pairwise ΦST values were also estimated, and statistical significance was corrected for multiple comparisons (Rice, 1989). We also tested for population structuring using the Spatial Analysis of Molecular Variance (SAMOVA) (Dupanloup et al., 2002) and the Bayesian Analysis of Genetic Population Structure (BAPS) (Corander and Tang, 2007). Spatial structuring was tested using the Mantel test (Mantel, 1967) as implemented in ARLEQUIN v3.51 (Excoffier and Lischer, 2010).
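The multiple-comparison correction cited above (Rice, 1989) is commonly applied as a sequential, step-down Bonferroni procedure. The following is a minimal sketch of that procedure, not the ARLEQUIN implementation; the function name and the example P-values are hypothetical.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Sequential (step-down) Bonferroni test.

    Returns a list of booleans, in the input order, that is True where a
    test remains significant after correction.
    """
    m = len(p_values)
    # Indices of the tests, ordered from the smallest to the largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest P-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # every larger P-value is non-significant as well
    return significant


# Hypothetical P-values from four pairwise comparisons.
example_p = [0.001, 0.04, 0.012, 0.30]
print(sequential_bonferroni(example_p))
```

With these illustrative values, the first and third comparisons remain significant, whereas a nominal P of 0.04 does not survive the correction.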

In order to investigate patterns of change in the historical effective population size of C. macropomum, we carried out a Bayesian skyline plot analysis in the program BEAST v.1.8.4 (Drummond and Rambaut, 2007). We ran 50,000,000 Markov chain Monte Carlo (MCMC) steps, discarded the first 5,000,000 steps as burn-in, and subsequently sampled every 1,000th step, retaining 45,000 topologies. The HKY85 (Hasegawa et al., 1985) model of molecular evolution was selected as the best-fitting model in the program Modeltest. We estimated a genetic network of mtDNA haplotypes from all samples in Network (http://www.fluxus-engineering.com/) using the median-joining algorithm.

To convert the results of the coalescent analyses into years and effective number of individuals, we assumed a three-year generation time (Goulding and Carvalho, 1982; Isaac and Ruffino, 1996), and a rate of molecular evolution of 2.0 × 10<sup>−8</sup> mutations per site per year (Farias et al., 2010).
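The conversion described above can be sketched as follows. The generation time and rate are taken from the text; the function names and input values are hypothetical, and the sketch assumes that the skyline output is expressed as the product of effective size and generation time (in years), which is the usual convention when the clock rate is given per year.

```python
# Values taken from the text.
GENERATION_TIME_YEARS = 3.0        # three-year generation time
RATE_PER_SITE_PER_YEAR = 2.0e-8    # mutations per site per year


def divergence_to_years(subs_per_site, rate=RATE_PER_SITE_PER_YEAR):
    """Convert a node height in substitutions per site into years."""
    return subs_per_site / rate


def skyline_to_ne(ne_tau, generation_time=GENERATION_TIME_YEARS):
    """Convert a skyline value (Ne x generation time, in years) into Ne."""
    return ne_tau / generation_time


# Hypothetical inputs, for illustration only (not results of the study).
print(divergence_to_years(0.01))   # height of 0.01 substitutions per site, in years
print(skyline_to_ne(3.0e6))        # skyline value of 3 million, in individuals
```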

Tajima's D (Tajima, 1989) and Fu's Fs (Fu, 1997) tests were used to examine whether populations are at mutation-drift equilibrium, assuming no selective differences among haplotypes. Both tests were performed using the program ARLEQUIN v3.5 (Excoffier and Lischer, 2010). Demographic history may also be inferred from the frequency distribution of pairwise haplotype differences. In populations that are at demographic equilibrium, the distribution of differences is generally multimodal, while populations that have undergone recent expansion or reduction typically have a unimodal distribution (Slatkin and Hudson, 1991). In order to distinguish population reduction from expansion, we used the results of two tests. The first test evaluates the sum of squared differences (SSD) between the mismatch distribution observed for each locality and the distribution expected under a null expansion model, where significant values for SSD indicate deviations from the population expansion model (Schneider and Excoffier, 1999). The other test is based on the Harpending raggedness index (Hri = r) (Harpending, 1994), which quantifies the variance of the mismatch distribution, assuming that the mismatch distribution is unimodal. These analyses were performed in the programs DNASP v5.0 (Librado and Rozas, 2009) and ARLEQUIN v3.5 (Excoffier and Lischer, 2010); significance was tested via 10,000 permutations with a P = 0.05 cut-off.
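As an illustration of the mismatch-distribution approach, the sketch below computes the observed distribution of pairwise differences and Harpending's raggedness index, r = Σ (x<sub>i</sub> − x<sub>i−1</sub>)², from a set of aligned sequences. It is not the DNASP or ARLEQUIN implementation; the function names and the toy alignment are hypothetical, and the sequences are assumed to be aligned, of equal length, and free of gaps.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequencies x_0 .. x_d of the number of pairwise differences."""
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = sum(counts.values())
    max_diff = max(counts)
    return [counts.get(i, 0) / n_pairs for i in range(max_diff + 1)]


def raggedness(freqs):
    """Harpending's r: sum over i = 1 .. d+1 of (x_i - x_{i-1})^2, with x_{d+1} = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Toy alignment, for illustration only.
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
dist = mismatch_distribution(toy)
print(dist)
print(raggedness(dist))
```

Smooth, unimodal distributions yield small values of r, whereas ragged, multimodal distributions yield larger values; significance is assessed separately, by simulation under the expansion model.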

#### Microsatellite DNA Analyses

The data matrix with allele sizes was checked for null alleles, allelic stutter, and large allele dropout in the program MICRO-CHECKER (Van Oosterhout et al., 2004). The number of alleles (A), observed (HO) and expected (HE) heterozygosities, gene diversity (h), nucleotide diversity (π), linkage disequilibrium (LD) between pairs of loci and the Hardy-Weinberg equilibrium (HWE) were calculated. All these parameters were estimated using ARLEQUIN v3.5 (Excoffier and Lischer, 2010). Considering that some of these estimates are influenced by sample size (Leberg, 2002), we implemented a rarefaction analysis and calculated allelic richness (AR) and private allelic richness (PAR) in the program HP-Rare (Kalinowski, 2005), so that the number of alleles and allelic richness estimates could be compared between localities. Additionally, we estimated the inbreeding coefficient (FIS) for each sample. The effective population size (Ne) for each population was estimated using the LD method (Waples and Do, 2008) as implemented in NeEstimator 2.0 (Do et al., 2014). The Ne estimates are equivalent to the effective number of breeders that produced offspring during a certain period of time, assuming that sample sizes are not representative of the entire generation (Palstra and Fraser, 2012). In all instances, significance levels for tests involving multiple comparisons were adjusted using the sequential Bonferroni correction (Rice, 1989).
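For readers unfamiliar with these summary statistics, the sketch below shows how HO, HE and FIS can be obtained for a single locus. It assumes Nei's unbiased estimator of expected heterozygosity and the simple ratio FIS = 1 − HO/HE, which may differ in detail from the estimators used in ARLEQUIN; the function name and genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (HO, HE, FIS) for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies from the 2n gene copies.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in allele_counts.values())
    # Expected heterozygosity with the small-sample correction 2n / (2n - 1).
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    fis = 1 - ho / he
    return ho, he, fis


# Hypothetical genotypes (allele sizes in base pairs).
toy_genotypes = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(locus_stats(toy_genotypes))
```

Positive FIS values indicate a deficit of heterozygotes relative to Hardy-Weinberg expectations.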

The overall genetic structuring was estimated using the analysis of molecular variance (AMOVA; Excoffier et al., 1992) performed in the program ARLEQUIN v3.5 (Excoffier and Lischer, 2010). We analyzed the same three datasets: (1) all 21 locations were analyzed as a single hierarchical level; (2) the Guajará-Mirim (Bolivian basin) locality was removed from the data matrix; (3) tributaries vs. locations of the main channel of the Amazon River. Genetic differentiation between pairs of populations was estimated using FST. Additionally, pairwise genetic differentiation between populations was estimated using Hedrick's GST (Hedrick, 2005) based on the empirical Bayes (EB) GST estimator (Kitada et al., 2007), which is suitable for high gene flow species (Kitada et al., 2017), using the FinePop 1.3.0 package (Kitada et al., 2017) implemented in the R statistical language (R Development Core Team, 2011).
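The standardization proposed by Hedrick (2005) divides GST by the maximum value it can attain given the within-population heterozygosity. The sketch below illustrates this for a single locus using plain sample allele frequencies; it does not reproduce the empirical Bayes estimator of FinePop, and the function name and frequencies are hypothetical.

```python
def gst_and_hedrick(pop_allele_freqs):
    """Return (GST, Hedrick's standardized G'ST) for one locus.

    pop_allele_freqs: list of dicts mapping allele -> frequency, one dict
    per population (frequencies within each population sum to 1).
    """
    k = len(pop_allele_freqs)
    alleles = set().union(*pop_allele_freqs)
    # Mean within-population expected heterozygosity (HS).
    hs = sum(1 - sum(p ** 2 for p in f.values()) for f in pop_allele_freqs) / k
    # Total expected heterozygosity (HT) from the mean allele frequencies.
    mean_freq = {a: sum(f.get(a, 0.0) for f in pop_allele_freqs) / k for a in alleles}
    ht = 1 - sum(p ** 2 for p in mean_freq.values())
    gst = (ht - hs) / ht
    # Maximum GST attainable for this HS and k populations.
    gst_max = (k - 1) * (1 - hs) / (k - 1 + hs)
    return gst, gst / gst_max


# Hypothetical allele frequencies in two populations.
toy_freqs = [{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}]
print(gst_and_hedrick(toy_freqs))
```

Because microsatellites are highly variable, HS is high and the unstandardized GST is constrained to small values, which is the motivation for reporting the standardized measure.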

We used SAMOVA (Dupanloup et al., 2002) to infer spatial population structure and STRUCTURE v2.3.4 (Pritchard et al., 2000) to identify biological populations. For the STRUCTURE analyses we used the admixture and correlated allele frequencies models with and without location prior, and we tested one to 20 groups (K = 1–20). The analysis was run for 1,000,000 MCMC steps, discarding the first 100,000 steps as burn-in. Isolation by Distance (IBD) was tested via the correlation between genetic and geographical distances using the Mantel test implemented in ARLEQUIN (Excoffier and Lischer, 2010). In addition, we used a multivariate approach, the Discriminant Analysis of Principal Components (DAPC; Jombart et al., 2010), to cluster genotypes using the R package Adegenet (Jombart, 2008).
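The logic of the Mantel test can be sketched as follows: the correlation between the two distance matrices is compared with the correlations obtained after randomly permuting the rows and columns of one matrix. This is a minimal standard-library sketch rather than the ARLEQUIN implementation; the function names, the number of permutations and the toy matrices are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabelling sites by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=9999, seed=1):
    """Return (r, one-sided P) for two symmetric distance matrices."""
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the site labels of the genetic matrix
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: four sites along a line, genetic distance proportional to distance.
positions = [0, 1, 2, 3]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
print(mantel_test(gen, geo, permutations=999))
```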

Coalescent analyses implemented in the program IMa2 (Hey and Nielsen, 2007) were used to partition allele sharing between populations into ongoing gene flow and ancestral haplotype sharing. We estimated the parameters t (splitting time), m (migration rates), and theta (θ), where θ = 2 Neµ. We sampled 20,000,000 Markov chain Monte Carlo (MCMC) generations after discarding the first 1,000,000 generations as burn-in. Two independent runs were carried out with different starting points in order to verify convergence. The two runs converged and were therefore combined to estimate the parameters θ, m, and t. These were then converted into demographic parameters: contemporary effective population size, number of migrants per generation, and time of divergence of the populations in generations. The analyses using IMa2 were performed with pairs of sampling sites located at the geographic extremes along the main channel of the Amazon River and between upstream and downstream localities of the principal tributaries.

We also estimated gene flow using the program MIGRATE 3.1.6 (Beerli and Felsenstein, 2001), where, for diploid data, θ = 4 Neµ and M = m/µ is the ratio of the migration rate to the mutation rate. For the Bayesian analysis, we ran ten short chains, sampling each chain 10,000 times. Then we ran six long chains of 2,000,000 steps, sampling each chain 200,000 times and discarding the first 2,000 samples. The runs were replicated, and the convergence between the chains was evaluated using the Gelman-Rubin statistic implemented in the program. We estimated the historical migration rates (M) between the localities and the relative number of migrants per generation, Nm = Mθ/2. To convert the results into biological information, we assumed a 3-year generation time for C. macropomum (previously justified with mtDNA data) and a mutation rate µ = 5 × 10<sup>−4</sup> (mean rate of evolution of microsatellites, Di Rienzo et al., 1994).
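The conversion of the MIGRATE parameters into demographic quantities can be sketched as follows, using the expressions and values given in the text (θ = 4 Neµ, Nm = Mθ/2, µ = 5 × 10<sup>−4</sup>, three-year generation time). The mutation rate is assumed here to be per locus per generation; the function names and input values are hypothetical.

```python
MU = 5e-4                      # microsatellite mutation rate (assumed per generation)
GENERATION_TIME_YEARS = 3.0    # generation time used in the text


def ne_from_theta(theta, mu=MU):
    """Effective population size from theta = 4 * Ne * mu (diploid data)."""
    return theta / (4.0 * mu)


def migrants_per_generation(m_scaled, theta):
    """Number of migrants per generation, Nm = M * theta / 2, as given in the text."""
    return m_scaled * theta / 2.0


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time expressed in generations into years."""
    return generations * generation_time


# Hypothetical parameter estimates, for illustration only.
print(ne_from_theta(2.0))                  # theta = 2.0
print(migrants_per_generation(10.0, 2.0))  # M = 10, theta = 2.0
print(generations_to_years(1000))          # 1,000 generations
```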

To detect, quantify, and date the historical and contemporary demographic changes in C. macropomum populations, we used the coalescent sampler implemented in the program MSVar v1.3 (Beaumont, 1999; Storz et al., 2002). We ran 11 independent parallel chains, sampling every 1,000th proposal to collect 20,000 proposals in the MCMC chain of each parallel run. Priors for the means and variances of current and historical population sizes were equal, and the variances encompassed three orders of magnitude. The prior for the mean time of population size change was set at 1,000, with the variance encompassing a time range from 1,000,000 to 0 generations ago. The runs were evaluated for convergence and were pooled to provide an estimate of current and historical effective population size. Convergence was assessed using the Gelman–Rubin criterion (Gelman and Rubin, 1992), and the alternative hypotheses (population decline vs. stable population size), as suggested by Beaumont (1999), were compared using Bayes factors. Calculations and plots were performed in the R statistical programming language (R Development Core Team, 2011) using the packages CODA and ggplot2.
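The Gelman–Rubin criterion compares the variance between parallel chains with the variance within chains. The sketch below computes the basic potential scale reduction factor for one parameter; it omits the degrees-of-freedom correction applied by some implementations, so values may differ slightly from those reported by CODA. The function name and the toy chains are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains.

    chains: list of lists, each holding the sampled values of one
    parameter from one independent MCMC run.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


# Toy chains: two runs sampling the same region, then two runs that disagree.
print(gelman_rubin([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))
print(gelman_rubin([[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]))
```

Values close to 1 indicate that the chains sample the same distribution, whereas values well above 1 indicate a lack of convergence.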

In order to test for a reduction in effective population size (Ne) or a bottleneck effect, we tested for heterozygosity excess in the program BOTTLENECK (Piry et al., 1999) using three different mutation models: the stepwise mutation model, SMM (Ohta and Kimura, 1973); the two-phase model, TPM (Di Rienzo et al., 1994); and the infinite alleles model, IAM (Estoup et al., 1995). Genetic bottlenecks can also leave a signature in the ratio of the number of alleles to the allele size range (the M-ratio), because a bottleneck depletes the number of alleles faster than it reduces the allelic size range of the microsatellite (Garza and Williamson, 2001). We calculated the M-ratio using ARLEQUIN, and considered a reduction in the number of alleles to have occurred when M < 0.68, as suggested by Garza and Williamson (2001).
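The M-ratio for a single locus can be sketched as the number of alleles divided by the allele size range expressed in repeat units plus one. This is a minimal illustration rather than the ARLEQUIN implementation; the function name, repeat length and allele sizes are hypothetical, and alleles are assumed to differ by whole repeat units.

```python
M_THRESHOLD = 0.68  # critical value suggested by Garza and Williamson (2001)


def m_ratio(allele_sizes_bp, repeat_length_bp):
    """M = k / r, with k alleles and r = (max - min) / repeat length + 1."""
    alleles = sorted(set(allele_sizes_bp))
    k = len(alleles)
    size_range = (alleles[-1] - alleles[0]) / repeat_length_bp + 1
    return k / size_range


# Hypothetical dinucleotide locus: four alleles spanning six possible states.
toy_alleles = [100, 102, 104, 110, 102, 100]
m = m_ratio(toy_alleles, repeat_length_bp=2)
print(m, m < M_THRESHOLD)
```

In practice the ratio is averaged over loci before being compared with the threshold.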

# RESULTS

#### Genetic Diversity of *C. macropomum*

Eight hundred and thirty-nine base pairs from the control region and 732 base pairs from the ATPase6/8 gene were obtained from 539 individuals. Approximately 5% of the samples were resequenced to confirm the sequences obtained. The sequences of the mitochondrial gene fragments were concatenated, resulting in a total of 1,561 base pairs. A total of 444 haplotypes were found, 400 of which were unique. The haplotype network showed numerous reticulations between haplotypes (**Figure 2**). There was very little clustering among the haplotypes found at most localities, implying a high degree of gene exchange. High and relatively homogeneous values of haplotype diversity were found, ranging from h = 0.895 in Porto Velho to h = 1.000 at nine of the 21 sampled localities (**Table 1**).

A total of 604 individuals were genotyped for 13 microsatellite loci. The data revealed no evidence of allelic stutter or large allele dropout (genotyping errors), nor of linkage disequilibrium (LD). However, deviation from Hardy-Weinberg equilibrium (HWE) was observed at the Cm1E3 locus in 17 of the 21 sampled localities, and the locus was therefore removed from the population analyses. Genetic variability parameters were quite homogeneous among the individuals from different localities (see **Table 1**). The expected heterozygosity (HE) ranged from 0.714 in Guajará-Mirim to 0.797 in Eirunepé (Juruá River). Mean heterozygosity was 0.777 ± 0.395 for all loci and all locations (**Supplementary Table S3**). Allelic richness varied from 5.32 alleles in Guajará-Mirim (Guaporé River) to 6.53 in Carauari (Juruá River). The inbreeding coefficient (FIS) ranged from 0.023 to 0.144 and was significant for all localities.

#### Population Structure

AMOVA of the mtDNA data demonstrated that more than 90% of genetic variance was within sampling sites. When AMOVA was performed without Guajará-Mirim (Bolivian sub-basin), ΦST was 0.032, which is lower than the ΦST = 0.062 found in the analysis including all sampling sites. Considering the tributaries vs. locations of the main channel, ΦST was 0.052. Nonetheless, AMOVA was significant for all three datasets analyzed. The result of the Mantel test was non-significant (r = 0.1587, P = 0.157), demonstrating no correlation between the genetic distances of the sampling sites and their respective geographic distances.

Global AMOVA of the microsatellite data partitioned more than 98% of the variance within sites (FST = 0.0111, P = 0.00124). Based on this result, additional AMOVA tests were implemented assuming two main groups: Amazon basin vs. Bolivian basin and, within the Amazon basin, tributaries vs. main channel. The AMOVA results were significant for both analyses (FST = 0.0192, P = 0.0478; FST = 0.0086, P = 0.9277; FST = 0.0026, P = 0.0004, respectively).

The matrices of pairwise GST (**Figure 3**, **Supplementary Table S4**) and FST values (**Supplementary Table S5**) were congruent and indicated population structuring. Significant values were observed for almost all comparisons involving the Bolivian sub-basin (Guaporé River), and the Madeira (Porto Velho, Humaitá, Borba), upper Juruá (Eirunepé), upper Purus (Boca do Acre), and upper Tapajós (Itaituba, Jacareacanga) rivers. The Mantel test was significant (r = 0.34260, P < 0.05) only when the Bolivian sub-basin was included in the analysis.

SAMOVA analyses indicated the existence of two geographic groups, one comprising Guajará-Mirim and the other comprising all remaining localities. At K = 2 (Group 1: Guajará-Mirim; Group 2: other sampling sites in the Brazilian Amazon basin), FCT was maximized for both the mtDNA and microsatellite datasets, but with significant support only for the mtDNA data. Bayesian analyses implemented in STRUCTURE v2.3.4 identified three biological groups. The highest posterior probability was LnP (K = 3) = −31766.0000. The three populations comprised individuals from the Bolivian sub-basin and the Brazilian Amazon basin. Individuals from the three localities in the Madeira River showed a linear gradient of admixture between these two populations, and a contribution of an additional biological group principally within the Humaitá locality (**Figure 4**). Results based on the DAPC analysis displayed a general pattern of low genetic differentiation (**Figure 5**); however, as observed in the previous results, some individuals from the upper Madeira River and the Guaporé drainage are partially differentiated from the other localities.

N, number of individuals; H, number of haplotypes; S, singletons; h, gene diversity; π, nucleotide diversity; A, number of alleles; ANA, average number of alleles per locus; AR, allelic richness; PAR, private allelic richness; AGD, average genetic diversity; HO, observed heterozygosity; HE, expected heterozygosity; FIS, inbreeding index; Ne [CI], effective population size [95% confidence intervals]; p\_SMM, stepwise mutation model from the BOTTLENECK program; Mvalue, M-ratio calculated in ARLEQUIN (Garza and Williamson index). Values in bold are significant at 0.05. \*Classification of water types based on Sioli (1984) and Venticinque et al. (2016).

#### Gene Flow

Results of the isolation-with-migration analyses using microsatellite data are given in **Supplementary Table S6**. For these analyses, Mexiana and Tabatinga (on the Amazon River) were paired, and a group designated the main channel was formed by randomly sampling 30 individuals from among the sampling sites of the main Amazon River channel. This group was then analyzed against the uppermost tributary localities: Jacareacanga (Tapajós River), Guajará-Mirim (Bolivian sub-basin), Boca do Acre (Purus River), and Eirunepé (Juruá River) (**Table 2**). The results indicated bidirectional gene flow between all localities. In all cases, migration from upstream areas of tributaries to the central Amazon basin predominated, except in the case of the Jacareacanga locality.

The results of the MIGRATE analyses supported substantial levels of gene flow between sites in the main stream of the Amazon, but reduced levels of gene flow between localities at tributary headwaters and in the Madeira River (**Table 3**). The genetic parameters estimated for C. macropomum in MIGRATE version 3.1.3 from microsatellite data are reported in **Supplementary Table S7**.

#### Population Demography

Based on the genetic parameters from the IMa2 analyses (**Table 2**), the coalescent effective population size did not differ substantially among almost all pairs of localities examined. As a whole, effective population sizes were in the thousands of individuals, with the exception of Guajará-Mirim.

The Bayesian skyline plot for C. macropomum from the Brazilian Amazon showed a strong signal of population expansion, which began slowly approximately 3,000,000 years ago. Demographic growth accelerated considerably approximately 450,000 years ago, with a weak signature of a recent population decline. From the beginning of the initial growth phase, the population size of this species has increased by two orders of magnitude, from a little more than 750 thousand to 75 million individuals in the coalescent history of the populations sampled (**Figure 6**). The population from the Bolivian sub-basin shows demographic growth beginning 500,000 years ago, with a signature of recent population stability.

Population expansion was also supported by Harpending's raggedness index, which was small and non-significant (r = 0.0046, P = 1.0000) considering all samples as well as when considering the two basins separately (Amazon basin: r = 0.00060, P = 1.0000; Bolivian basin: r = 0.0018, P = 0.9990) (**Table 4**). These indexes statistically support the inference of population expansion based on the shape of the mismatch distribution (Harpending, 1994) for all samples of C. macropomum. However, when the mismatch distribution was investigated for the basins separately, a unimodal distribution was found only within the Amazon basin, whereas a multimodal distribution was found for the Bolivian basin (results not shown), which fits the pattern expected under stable population size, although this was not supported by Harpending's raggedness index. The sum of squared deviations (Schneider and Excoffier, 1999) was non-significant for the overall sample (SSD = 0.0020, P = 0.6260) as well as for the two basins analyzed separately (Amazon basin: SSD = 0.0021, P = 0.6430; Bolivian basin: SSD = 0.0047, P = 0.8420). Thus, these values neither support nor reject the null hypothesis of a demographic population expansion for C. macropomum (**Table 4**). Tajima's D was non-significant for all sampling localities. The same was found for Fu's Fs. When considering the basins separately, Fu's Fs was significantly negative for the Bolivian basin, suggesting a population expansion.
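For reference, Tajima's D contrasts the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a harmonic number; negative values are expected after population expansion. The sketch below follows the standard formula of Tajima (1989) and is not the ARLEQUIN implementation; the function name and toy alignments are hypothetical, and sequences are assumed to be aligned and free of gaps.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    pairs = list(combinations(seqs, 2))
    # pi: mean number of pairwise differences.
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment in which every variant is a singleton (excess of rare variants).
singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d(singletons))
```

An excess of rare variants, as in the toy alignment, yields a negative D; significance is evaluated separately, for example by coalescent simulation.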

Results based on the MSVar analyses show that historically C. macropomum has undergone a pronounced population decline in both the Amazon and Bolivian basins (**Figure 7**). The mean estimated ancestral effective population size was approximately 100,000 individuals, declining recently to approximately 5,000 individuals, a decrease of approximately 1.5 orders of magnitude. Demographic decrease was strongly supported (BF = 1206) and occurred with 0.9992 probability. Population decline started approximately 10,000 (Amazon) and 2,500 (Bolivia) years ago. Signs of population reduction in the BOTTLENECK program were significant for 18 locations under the SMM model; however, the M-ratio showed no signal of population reduction, with the exception of individuals from Tefé (**Table 1**). Effective population sizes (Ne) were low for the majority of the localities. However, the confidence intervals were also "infinite" for all but the Madeira River, Parintins, Eirunepé and Tabatinga localities.

FIGURE 5 | Results of the Discriminant Analysis of Principal Components (DAPC) showing the scatterplot of the first two principal components based on 13 microsatellite loci of 604 individuals of Colossoma macropomum from 21 sampling locations. Discriminant function 2 on the x axis and discriminant function 1 on the y axis. In the DAPC graph circles represent different individuals, and colors different sampling localities. Locality codes are: Mexiana (Mex), Almeirim (Alm), Santarém (San), Itaituba (Ita), Jacareacanga (Jac), Oriximiná (Ori), Nhamundá (Nha), Parintins (Pin), Borba (Bor), Humaitá (Hum), Porto Velho (Pve), Guajará-Mirim (Gua), Manaus (Mao), Tapauá (Tap), Boca do Acre (Bda), Coari (Coa), Tefé (Tef), Carauari (Car), Eirunepé (Eir), Fonte Boa (Fbo), Tabatinga (Tab).

TABLE 2 | Demographic parameters estimated in IMa2 program for microsatellite data.


Effective population size Ne = 4Neµ/4µ × 3 (Ne<sub>1</sub> of locality 1, Ne<sub>2</sub> of locality 2 and Ne<sub>A</sub> of the ancestor of the two localities); current time in years T = (t × µ) × 3; number of migrants per generation Nm = Mθ/2. Values in parentheses = minimum and maximum of confidence limits. A three-year generation time is assumed.

# DISCUSSION

#### The Role of Rapids in *C. macropomum* Gene Flow

The use of more variable genetic markers, such as microsatellites, has confirmed some of our earlier findings. Colossoma macropomum is not panmictic throughout its distribution area. Considering the entire sample, the AMOVA, SAMOVA, STRUCTURE, ΦST (mtDNA), and FST/GST (microsatellites) analyses suggested genetic differentiation between the Bolivian sub-basin and the Brazilian Amazon basin localities.

Within a given river system, freshwater fishes can either form a large panmictic population or be divided into genetically differentiated groups with sufficient gene flow between groups to maintain the integrity of the meta-population. Gene flow measured indirectly by the number of effective migrants per generation (Nm) for mtDNA and microsatellite data showed restricted gene flow between the two basins, but enough to maintain the exchange of genes, thereby minimizing the effects of genetic drift. The microsatellite data demonstrate that migration between the two basins is bidirectional. The results of both the IMa2 and MIGRATE analyses show that gene flow is greater from Bolivia to the Brazilian Amazon, which is in agreement with the results described by Farias et al. (2010). The Brazilian part of the Amazon basin receives more migrants, probably through the passive downstream transport of larvae and juveniles.

Number of migrants per generation Nm = Mθ/2. Row localities are sending individuals, while column localities are receiving individuals. Locality codes are: Mexiana (Mex), Almeirim (Alm), Santarém (San), Itaituba (Ita), Jacareacanga (Jac), Oriximiná (Ori), Nhamundá (Nha), Parintins (Pin), Borba (Bor), Humaitá (Hum), Porto Velho (Pve), Guajará-Mirim (Gua), Manaus (Mao), Tapauá (Tap), Boca do Acre (Bda), Coari (Coa), Tefé (Tef), Carauari (Car), Eirunepé (Eir), Fonte Boa (Fbo), Tabatinga (Tab).

The genetic differentiation observed between the two basins (Brazilian and Bolivian) is associated with the upper Madeira River rapids, which serve as a natural barrier that restricts, but does not prevent, gene flow between the populations of C. macropomum of the two basins. The origin of the Bolivian sub-basin is related to the elevation of the Fitzcarrald arch at the beginning of the middle Pliocene (4 to 3 Ma), which gradually isolated the Bolivian basin and resulted in considerable changes in the drainage pattern, in which the main rivers in the north of Bolivia drain into the Amazon River through the Madeira River (Hoorn et al., 1995; Campbell et al., 2001). The Madeira is the largest tributary of the southern margin of the Solimões-Amazonas basin (Lundberg et al., 1998). The Bolivian sub-basin includes the main Beni and Mamoré rivers, as well as approximately 60% of the entire drainage area of the Madeira River (upper part of the Madeira), and is separated from the Amazon basin by a set of 18 rapids and cataracts located between Guajará-Mirim and Porto Velho (Cella-Ribeiro et al., 2013). The largest of these cataracts, the Teotônio cataract, constituted the greatest barrier to navigation on this river, as well as to the movement of many species of fish (Goulding et al., 2003). The rapids of the upper Madeira River play an important role in the structuring of populations of other Amazonian aquatic species, such as the river turtle Podocnemis expansa (Pearse et al., 2006), river dolphins (Gravena et al., 2014, 2015), the black Amazonian flannelmouth characin Prochilodus nigricans (Machado et al., 2017), and the catfish Brachyplatystoma rousseauxii (Batista, 2010; Carvajal-Vallejos et al., 2014). The Teotônio cataract, as well as the Jirau cataract, the second largest of the Madeira River, have been submerged by hydroelectric reservoirs.

#### Population Genetic Structure in the Amazonas River

The lack of genetic differentiation of C. macropomum in the main channel of the Amazonas River in the Brazilian Amazon basin was supported by all population structure analyses based on mitochondrial genes and microsatellite loci. These findings confirm the pattern reported by Santos et al. (2007) and Farias et al. (2010), who used only mtDNA (supported by Φ<sub>ST</sub>) and worked at much smaller geographic scales. However, Φ<sub>ST</sub> comparisons involving the tributaries were significant even after Bonferroni correction. Fisher's exact test and Hedrick's G<sub>ST</sub> analysis with microsatellite markers show weak population differentiation between localities, and stronger differentiation in comparisons involving tributaries and black water sites. Gene flow between the main stream localities and tributary headwaters and black water sites is generally smaller than one effective migrant per generation, which also confirms the migration patterns observed in the IMa and MIGRATE analyses. Contrary to this pattern, STRUCTURE analysis shows population differentiation only for the upper Madeira (localities below the rapids). The STRUCTURE program uses a Bayesian approach to investigate the number of biological groups in the dataset. The discordance between these analyses could be due to the fact that G<sub>ST</sub> analyses are based on variance in allelic frequencies between or among groups, and Fisher's exact test uses contingency tables to test the null hypothesis that the alleles are drawn from the same distribution in all populations. These two analyses are more sensitive to smaller and finer levels of genetic differentiation, while STRUCTURE tests for a shared system of mating among individuals of the same group. The STRUCTURE algorithm will not necessarily detect weak structure (Evanno et al., 2005). The population structure observed in C. macropomum falls within the category of weak to moderate population differentiation (according to Wright, 1965), which could be limiting the sensitivity of the analysis to find more refined substructuring.
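To make the contrast between frequency-based statistics and clustering approaches more concrete, the following minimal sketch computes Nei's G<sub>ST</sub> and Hedrick's standardized G'<sub>ST</sub> for a single locus from allele frequencies. It is an illustration only and not the procedure used in this study: the function names and the toy frequencies are hypothetical, populations are weighted equally, and no sample-size correction is applied.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus in one population."""
    return 1.0 - sum(p * p for p in freqs)


def gst_statistics(pop_freqs):
    """Return (H_S, H_T, G_ST, Hedrick's G'_ST) for one locus.

    pop_freqs: list of per-population allele-frequency lists (same allele order).
    Populations are weighted equally; no sample-size correction is applied.
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # Mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # Expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[i] for f in pop_freqs) / k for i in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    g_st = (h_t - h_s) / h_t
    # Hedrick's standardization: G_ST divided by its maximum given H_S and k.
    g_st_prime = g_st * (k - 1 + h_s) / ((k - 1) * (1.0 - h_s))
    return h_s, h_t, g_st, g_st_prime


# Hypothetical toy data: two populations that share no alleles.
toy = [[0.5, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.5]]
h_s, h_t, g_st, g_st_prime = gst_statistics(toy)
print(f"H_S={h_s:.3f} H_T={h_t:.3f} G_ST={g_st:.3f} G'_ST={g_st_prime:.3f}")
```

In this toy case the two populations are completely differentiated, yet unstandardized G<sub>ST</sub> stays well below one because within-population heterozygosity is high; this is the situation that Hedrick's correction is intended to address for highly variable markers such as microsatellites.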

Putman and Carbone (2014) emphasize that the analyses used to infer population differentiation are limited in their ability to detect population structure. Therefore, taking a conservative approach to the management and conservation of this species, we consider the populations of the tributaries (Juruá, Purus, Madeira, and Tapajós rivers, and blackwater localities) to be different management units until proven otherwise. In this case, fisheries management measures, including seasonal fishing closures during the reproductive period, must be effectively respected to preserve the evolutionary potential needed for the species' sustainability. Encouraging aquaculture of the species, which is in fact already occurring, could also minimize the impact of harvesting natural stocks.

On the other hand, in the great corridor of the main channel of the Amazon River, from Mexiana (1) to Tabatinga (20), there seems to be not even a signal of isolation-by-distance between the two localities, which are separated by approximately 2,500 km. The observed lack of genetic structuring of C. macropomum in the main channel of the Amazon River basin is probably the result of living in a floodplain environment. The life cycle of this species is tied directly to the seasonal flood cycle of the Amazon; during the flood C. macropomum disperses to reproduce and feed in the floodplains and in the flooded forest, while in the dry period fishes become concentrated in lakes and rivers (Araújo-Lima and Goulding, 1998). During the reproduction season, the eggs and larvae are passively transported by the millions to the floodplains, as this species is highly fecund (Araújo-Lima and Goulding, 1998; Araújo-Lima and Ruffino, 2004). This dynamic, together with the interlinking of river channels during flood pulses, is thus the primary factor in homogenizing differences among populations of C. macropomum (Junk, 1997), and this is observed in the molecular data as well. The lack of population structuring is not a unique feature of C. macropomum; numerous other species occupying the Amazonian floodplain [e.g., Prochilodus nigricans (Machado et al., 2017), Brycon amazonicus (Oliveira et al., 2018), and Paratrygon aiereba (Frederico et al., 2012)] show this pattern as well. Thus, the pattern found in the present study likely stems from events acting at a macro time scale that affected the region, which, together with current water cycles and the migratory movements of C. macropomum, may maintain intra-population homogeneity over generations.

#### Genetic Diversity: Large or Small?

Considering that H<sub>O</sub> suffers from sampling effects, we used H<sub>E</sub> as a minimally biased estimate of diversity (Frankham et al., 2002). The H<sub>E</sub> values for C. macropomum for the microsatellite data were moderate and uniform across the sampling localities, ranging from 0.71 to 0.79 with an average of 0.78. For all microsatellite data, variability measures, such as expected heterozygosity and allelic diversity, were similar to those observed in other exploited migratory Amazonian fishes (Carvajal-Vallejos et al., 2014; Ochoa et al., 2015; Oliveira et al., 2018). A compilation of average H<sub>E</sub> values for microsatellite loci obtained from the literature for exploited Amazonian migratory fishes shows values ranging from 0.50 for Brachyplatystoma platynemum (Ochoa et al., 2015) to 0.87 for Semaprochilodus insignis (Passos et al., 2010). Between this minimum and maximum, one can observe H<sub>E</sub> = 0.61 for Brachyplatystoma rousseauxii (Batista, 2010), H<sub>E</sub> = 0.75 for Brachyplatystoma vaillantii (Rodrigues et al., 2009), and H<sub>E</sub> = 0.83 for Brycon amazonicus (Oliveira et al., 2018). An average of these values (mean H<sub>E</sub> = 0.72) is not in agreement with DeWoody and Avise (2000), who report an average H<sub>E</sub> = 0.54 for freshwater fishes. The H<sub>E</sub> = 0.78 of C. macropomum and most of the H<sub>E</sub> values reported for exploited Amazonian migratory fishes are more similar to the H<sub>E</sub> of marine fishes reported by DeWoody and Avise (2000), with a mean H<sub>E</sub> = 0.77. In their review, DeWoody and Avise (2000) summarized microsatellite data from North American and European freshwater species, which in comparison to those of the Amazon basin are geographically very restricted. At 5.5 million km<sup>2</sup>, the Amazon basin is by far the largest hydrographic basin on the planet, and the size of the area occupied by many fish species is comparable to that for marine fishes. The Amazon is in a sense a "sea" that provides an expansive and effectively continuous environment for migratory freshwater fishes such as C. macropomum and the other cited species. Migratory fishes, whether freshwater or marine, are usually r strategists, that is, they have high dispersal capacity, produce many offspring, and in general have a large effective population size, which in turn is reflected in high heterozygosity levels.

#### TABLE 4 | Demographic parameters estimated for Colossoma macropomum, inferred from mitochondrial DNA data.

N, number of individuals; Hri, Harpending raggedness index; SSD, sum of square deviations.

Significant values \*P < 0.05, •P > 0.95.

In this respect, when compared to freshwater fishes of Europe and North America, the heterozygosities of Amazonian fishes may appear to be high, but this is an illusion. The expected heterozygosities are on par with those expected for fishes occupying large areas and having large census numbers. Within the Amazonian species analyzed, the large predatory catfishes of the genus Brachyplatystoma have lower H<sub>E</sub>, as would be expected from their smaller census sizes. At the opposite end of the spectrum are the relatively small, detritivorous and frugivorous migratory characids (Semaprochilodus, Prochilodus, and Brycon), which have higher H<sub>E</sub>, as would be expected from their much larger census sizes. Colossoma macropomum has an intermediate H<sub>E</sub>, again a reflection of its frugivorous lifestyle combined with large body size, and thus a smaller census size than the other migratory characids but a larger census size than the predatory catfishes.
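The comparison of average H<sub>E</sub> values made in this section can be reproduced from the literature values quoted above. The sketch below is only an arithmetic check: the text does not state exactly which values enter the quoted mean of 0.72, so two variants are computed, and pooling the C. macropomum value with the literature values is an assumption of this example.

```python
from statistics import mean

# Average H_E values quoted in the text for exploited Amazonian migratory fishes.
literature_he = {
    "Brachyplatystoma platynemum": 0.50,
    "Brachyplatystoma rousseauxii": 0.61,
    "Brachyplatystoma vaillantii": 0.75,
    "Brycon amazonicus": 0.83,
    "Semaprochilodus insignis": 0.87,
}
he_colossoma = 0.78  # C. macropomum, value reported in the text

# Variant 1: literature values only.
mean_literature = mean(literature_he.values())
# Variant 2 (assumption): C. macropomum pooled with the literature values.
mean_with_colossoma = mean(list(literature_he.values()) + [he_colossoma])

# Reference means quoted from DeWoody and Avise (2000).
freshwater_mean, marine_mean = 0.54, 0.77

print(f"literature only:          {mean_literature:.3f}")
print(f"including C. macropomum:  {mean_with_colossoma:.3f}")
print(f"closer to marine mean:    "
      f"{abs(mean_with_colossoma - marine_mean) < abs(mean_with_colossoma - freshwater_mean)}")
```

Under either variant the Amazonian average lies much closer to the marine mean than to the freshwater mean, which is the point the comparison is meant to convey.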

#### Population Genetic Demography

An analysis of the historical demography of C. macropomum suggested population expansion in the Amazon basin, which is the same scenario suggested by Farias et al. (2010). However, the results of the current study suggest a population size reduction in the Holocene (**Figure 5**). This result is confirmed by the very recent population decline observed in the Skyline plot analyses (**Figure 4**). A similar pattern is observed for the Bolivian basin; however, a very recent population decline is not evident in the Skyline plot.

Most studies conducted in the Amazon involving fish species such as Prochilodus nigricans (Machado et al., 2017), Brachyplatystoma rousseauxii (Batista and Alves-Gomes, 2006; Carvajal-Vallejos et al., 2014), and Brachyplatystoma platynemum (Ochoa et al., 2015) indicate recent population expansion, at least as indicated by Fu's Fs test. The difference in the prior effective population size of C. macropomum estimated for the Bolivian and Amazon basins broadly corresponded to the relative proportion of potential habitat in the Bolivian basin. This basin accounts for approximately 20% of the total Amazon basin, suggesting that the Bolivian basin had a C. macropomum population approximately 20% smaller than the rest of the Amazon basin during the Holocene.

Glacial and inter-glacial periods of the Pleistocene exerted considerable impact on the climate, which consequently affected the vegetation in South America (Ledru et al., 1996) and also had an impact on aquatic and terrestrial fauna. It was during the Late Pleistocene that C. macropomum in the Amazon basin began expanding (**Figure 4**), in association with the expansion of the várzea-like habitat (Irion and Kalliola, 2010). Therefore, the population expansion of C. macropomum in the Amazon basin likely occurred due to the increase in the availability of habitat for this species starting in the latter half of the Pleistocene; however, population growth is no longer observed in the Holocene.

Corroborating the observation for the Holocene, analyses of microsatellite data in MSVar indicate that C. macropomum has undergone a pronounced population decline in both drainages during the Holocene (10,000 years ago in the Amazon and 2,500 years ago in Bolivia), probably due to climate change related to the Last Glacial Maximum and during the mid-Holocene epoch (Wang et al., 2017). The demographic decrease was strongly supported (BF = 1206) and occurred with a probability of 0.9992.
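The relationship between the Bayes factor and the probability reported above can be illustrated with a short calculation. The sketch assumes equal prior odds for the decline and no-decline scenarios; the text does not state how the probability was derived, so this is one plausible reading rather than the authors' procedure, and the function name is hypothetical.

```python
def posterior_probability(bayes_factor, prior_odds=1.0):
    """Posterior probability of a hypothesis from its Bayes factor.

    posterior odds = Bayes factor * prior odds; probability = odds / (1 + odds).
    prior_odds=1.0 assumes both scenarios are equally likely a priori.
    """
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factor for population decline reported in the text.
p_decline = posterior_probability(1206)
print(round(p_decline, 4))  # 0.9992
```

With equal prior odds, a Bayes factor of 1206 corresponds to a posterior probability of 1206/1207, which rounds to the 0.9992 given in the text.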

The demographic decrease was from approximately 100,000 effective individuals to approximately 5,000 individuals, an approximate 1.5 orders of magnitude decrease. Similar values for current effective population size were inferred using IMa2 (**Table 2**). By any measure, the effective population size is small, and much smaller than in the last several thousand years.
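The size of a demographic decrease expressed in orders of magnitude is the base-10 logarithm of the ratio of past to current effective size. The sketch below applies this to the rounded figures quoted above; the unrounded MSVar estimates are not given in this section, so the result need not coincide exactly with the value stated in the text.

```python
import math

# Rounded effective population sizes quoted in the text.
ne_past = 100_000
ne_current = 5_000

fold_decrease = ne_past / ne_current          # how many times smaller
orders_of_magnitude = math.log10(fold_decrease)

print(f"{fold_decrease:.0f}-fold decrease, "
      f"{orders_of_magnitude:.2f} orders of magnitude")
```

With these rounded inputs the decrease is 20-fold, or about 1.3 orders of magnitude.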

DeWoody and Avise (2000) estimate that at equilibrium H<sub>E</sub> = 0.79 represents 25,000 effective individuals, assuming a substitution rate of 10<sup>−4</sup>. Our estimate of the substitution rate from MSVar was 10<sup>−3.81</sup>, so about 17,000 effective individuals are expected at equilibrium. In this sense, the Ne values of C. macropomum are below the equilibrium expectation, which also suggests a current reduction of Ne. In fact, the results of the Bottleneck program indicate a decrease in population size in the majority of sampling localities, which is also corroborated by the Ne values (**Table 1**). The only major events that may have contributed to the population decrease of C. macropomum in recent times, within a time window of decades, are the overexploitation of the species and the destruction of its floodplain habitat. Natural stocks of C. macropomum suffer from overfishing, and juveniles currently account for the largest part of the catch (Barthem and Goulding, 2007). Although aquaculture of C. macropomum has grown in recent years, there is strong evidence that the natural populations of this species are still depressed because of overfishing. This is evidenced by the continuous reduction of the tonnage landed in the port of Manaus and other major Amazonian ports. Araújo-Lima (2002) reported that in 1976 landings of C. macropomum in the port of Manaus reached 16,000 tons/year, while data from the late 1990s indicated less than 4,000 tons. During this time, the population of Manaus more than doubled. Furthermore, another worrying factor is the mean size of the fish landed, with juveniles representing the majority of the catch (Barthem and Goulding, 2007). The average size of the fish landed in the main markets in the Amazon suggests that most individuals are fished before reaching sexual maturation, which in the case of females occurs between 50 and 55 cm (Isaac and Ruffino, 2000).
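The dependence of the equilibrium effective size on the assumed mutation rate can be sketched with the classical stepwise mutation model expectation, H = 1 − 1/√(1 + 8Neμ). The choice of this model is an assumption of the example; the text does not specify which model underlies the quoted figures, so the numbers produced here are comparable to, but not identical with, those given above. The function name is hypothetical.

```python
def ne_at_equilibrium_smm(h_e, mu):
    """Effective size implied by heterozygosity h_e at mutation-drift equilibrium.

    Assumes the stepwise mutation model: H = 1 - 1 / sqrt(1 + 8 * Ne * mu),
    rearranged to Ne = (1 / (1 - H)^2 - 1) / (8 * mu).
    """
    ne_times_mu = (1.0 / (1.0 - h_e) ** 2 - 1.0) / 8.0
    return ne_times_mu / mu


h_e = 0.79
ne_rate_1e4 = ne_at_equilibrium_smm(h_e, 1e-4)          # rate of 10^-4
ne_rate_msvar = ne_at_equilibrium_smm(h_e, 10 ** -3.81)  # rate estimated by MSVar

print(f"mu = 10^-4:    Ne ~ {ne_rate_1e4:,.0f}")
print(f"mu = 10^-3.81: Ne ~ {ne_rate_msvar:,.0f}")
```

Under these assumptions the expected equilibrium sizes are roughly 27,000 and 17,500 effective individuals, respectively. Because Ne scales inversely with μ for a fixed heterozygosity, the higher rate estimated by MSVar lowers the equilibrium expectation by a factor of about 1.5.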

In conclusion, naturally exogamous species with large census sizes have considerable genetic diversity and large effective population sizes (Frankham et al., 2002). In this context, C. macropomum has levels of genetic diversity that are on par with expectations for species of similar lifestyle and body size. However, with the historical decline of C. macropomum populations, it is evident that part of the genetic diversity that existed in the past has been lost. Still, the remaining diversity is representative of this species' historical genetic diversity, and it is this genetic diversity that can secure the recovery and long-term persistence of natural populations of C. macropomum in the Amazon basin.

#### ETHICS STATEMENT

All field collections were authorized by IBAMA/SISBIO 11325-1, and access to genetic resources was authorized by permit No. 034/2005/IBAMA. Field collection permits are conditional on the collection of organisms being undertaken in accordance with the ethical recommendations of the Conselho Federal de Biologia (CFBio; Federal Council of Biologists), Resolution 301 (December 8, 2012).

#### AUTHOR CONTRIBUTIONS

IF and TH conceived the experiment and obtained funding. MS, IF, and TH conducted fieldwork, collected specimens, analyzed the results, and wrote the manuscript. MS collected molecular data. All authors contributed to and reviewed the manuscript.

#### FUNDING

This research was supported by the MCT/CNPq/PPG7 557090/2005-9, CNPq/CT-Amazonia 554057/2006-9, CNPq/CT-Amazonia 575603/2008-9, and FINEP/DARPA (Convênio No. 01.09.0472.00) to IF. Brazilian permits for field collection and molecular analyses were given by IBAMA/CGEN 11325-1, and IBAMA/MMA-N° 086/2006 de 08/09/2006. TH and IF were supported by a Bolsa de Pesquisa scholarship from CNPq during the study and MS by a FAPEAM fellowship.

#### ACKNOWLEDGMENTS

We thank Mário Nunes, Pedro Bittencourt and Rommel Rojas for technical support. This study is part of MS's Ph.D. thesis in the Biotechnology graduate program of UFAM.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00263/full#supplementary-material

#### REFERENCES

loci in human populations. Proc. Natl. Acad. Sci. U.S.A. 91, 3166–3170. doi: 10.1073/pnas.91.8.3166


of catfish Brachyplatystoma platynemum (Siluriformes: Pimelodidae) in the Amazon basin with implications for its conservation. Ecol. Evol. 5, 2005–2020. doi: 10.1002/ece3.1486


Rice, W. R. (1989). Analyzing tables of statistical tests. Evolution 43, 223–225.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Santos, Hrbek and Farias. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
