Genome-Wide Discovery of Single-Nucleotide Polymorphisms and Their Application in Population Genetic Studies in the Endangered Japanese Eel (Anguilla japonica)

The Japanese eel (Anguilla japonica) is a commercially important aquatic species in East Asia. The number of the Japanese eels has been dramatically declining over the last four decades, and it is now listed as an endangered species (International Union for Conservation of Nature [IUCN] 2014). To manage and conserve this endangered species, it is necessary to assess population genetic diversity, genetic structure, and identify regions of the genomes that are under selection. Here, we generated a catalog of novel single-nucleotide polymorphism (SNP) markers for the Japanese eel using restriction site-associated DNA (RAD) sequencing of 24 individuals from two geographic locations. The 73,557 identified SNPs were widely distributed across the draft genome of the Japanese eel. No genetic differentiation between the two populations was detected based on all loci or neutral loci. However, highly significant genetic differentiation was detected based on loci that appeared to be under selection. BLAST2GO annotations of the outlier SNPs yielded hits for 61 (72%) of 85 significant BLASTX matches. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified some of the putative targets of local selection, including genes in several important pathways such as calcium signaling pathway and intestinal immune network for IgA production. This SNP catalog will provide a valuable resource for future population genetic and genomic studies and allows for targeting specific genes and genomic regions under selection in the Japanese eel genome.


INTRODUCTION
Recent advances in next-generation sequencing (NGS) technologies have greatly improved the speed of genome sequencing, facilitating the application of genomic approaches in many research areas, including ecology, conservation, fisheries genetics, and so on (Hudson, 2008;Harismendy et al., 2009;Allendorf et al., 2010;Ekblom and Galindo, 2011;Dudgeon et al., 2012). The highthroughput sequencing methods in NGS can result in extremely large collections of data, which enables the identification of numerous sensitive markers for a high-resolution genetic analysis, such as single-nucleotide polymorphisms (SNPs) (Ekblom and Galindo, 2011). However, because of the large and complex genomes of research organisms, researchers have typically used complexity-reduction technology for SNP discovery (Slate et al., 2009). One promising approach is restriction site-associated DNA (RAD) sequencing, which reduces genome complexity by sequencing the same loci across the genomes of numerous individuals, enabling comparisons between individuals, and reducing sequencing cost. This approach consists of employing specific restriction enzymes to cleave double-stranded genomic DNA into random fragments and then amplifying and sequencing the regions flanking the restriction enzyme cut sites through specific adapters and NGS platforms, such as Illumina. The paired-end reads from each RAD tag can then be assembled into long contiguous sequences or aligned to a known reference to identify and score 1000s of genetic markers (Miller et al., 2007;Baird et al., 2008;Hohenlohe et al., 2010Hohenlohe et al., , 2011. To date, RAD sequencing has been successfully applied to SNP discovery in many species, such as threespined stickleback (Gasterosteus aculeatus) (Baird et al., 2008;Hohenlohe et al., 2010), European eel (Anguilla anguilla) (Pujolar et al., 2013), sea mullet (Mugil cephalus) (Kruck et al., 2013), and rainbow trout (Oncorhynchus mykiss) (Palti et al., 2014). As sequencing cost continues to drop, these methods will have numerous applications in a genetic analysis of complex genomes.
The Japanese eel, Anguilla japonica, is an important economical marine fishery species distributed in East Asian countries, which is characterized as a temperate catadromous fish with a complex life cycle and an extensive migratory loop. The spawning area of this species is located in the western Mariana Islands, near 14 • -16 • N, 142 • -143 • E (Tsukamoto, 1992(Tsukamoto, , 2006. After being spawned between April and November (Tsukamoto, 1992), leptocephalus larvae disperse via the North Equatorial Current (NEC) and the Kuroshio Current (KC). Four to six months later, they reach the coasts of East Asia and then develop into juvenile Japanese eels (which are typically referred to as "glass eels"). Glass eels then move into continental (freshwater, brackish, or coastal) growth habitats and become yellow eels. After more than four 4 years of feeding, they develop into mature silver eels and then migrate approximately 2,000 to 3,500 km back to the western Mariana Islands, spawn once, and die (Han et al., 2010b). The abundance of Japanese eels has sharply decreased by more than 90% since the 1970s, owing to factors such as commercial overfishing, habitat destruction, pollution, and other environmental changes (Han et al., 2010a). In March 2014, the Japanese eel was listed as an endangered species in the International Union for Conservation of Nature (IUCN) Red List of Threatened Species 1 . Hence, an assessment of population genetic structure, genetic diversity, and regions of the genome that are under selection for Japanese eel is urgently needed in order to better manage and conserve this endangered species.
Despite occupying a broad range of habitats spanning almost the entire coastal area of East Asia, the Japanese eel has been regarded as a panmictic species. The concept of panmictic 1 http://www.iucnredlist.org/details/166184/0 populations for Japanese eel was originally established by genetic studies focusing on allozyme and mitochondrial DNA (mtDNA) markers (Sang et al., 1994;Ishikawa et al., 2001). In addition, Han et al. (2010a) used eight polymorphic microsatellite DNA loci to investigate the spatial population genetic structures of Japanese eels collected from nine locations in East Asia, resulting in no identified genetic differentiation between different geographical locations. Gong et al. (2014) used mtDNA and microsatellite DNA loci to investigate the population structure of Japanese glass eels collected from 10 locations in China, which provided further evidence for panmixia in Japanese eel. However, significant geographical clines have been detected in this species. For example, Chan et al. (1997) found that A. japonica in the western Pacific fringe exhibited clear geographic clines in two allozyme loci. Moreover, Tseng et al. (2006) conducted an analysis of eight microsatellite loci and found that Japanese eel populations could be divided into two major groups.
Owing to the inconclusive and often conflicting results of earlier research, a better understanding of crucial aspects of the population genetic structure in the Japanese eel is still needed. Here, we developed a genome-wide catalog of SNPs for the Japanese eel by employing Illumina HiSeq 2500 pairedend sequencing of RAD tags (Baird et al., 2008). Previous studies in this area either sequenced single-end reads (Hohenlohe et al., 2011;Palti et al., 2014;Guo et al., 2015) or sequenced paired-end reads but only used the single reads (Baird et al., 2008;Wang et al., 2013;Pujolar et al., 2014), which resulted in less accurate alignments. In the present study, we compared the number of SNPs generated from single-end and pairedend data, and we found that paired-end sequencing resulted in more high-quality SNPs. The RAD tags in this study were generated from a total of 24 glass eels from two separate sampling locations, enabling the discovery of novel candidate SNP markers for this widespread, highly fecund marine fish species. The SNPs were then used in a population genetic analysis of two Japanese eel populations, which improved the current understanding of population structure and identified genomic regions that are subject to selection. This improved our understanding of Japanese eel populations, which have significant implications for the sustainable management and utilization of these important resources.

Ethics Statement
All methods were performed in strict accordance with the relevant guidelines and regulations in China.

Sample Preparation and Restriction Site-Associated DNA Tag Sequencing
Samples of glass eels were collected from two geographically distant locations: one is in Dandong, Liaoning Province (39 • 58 N; 124 • 12 E), and the other is in Xiapu, Fujian Province (26 • 55 N; 120 • 13 E). Tissues were kept in 95% ethanol and stored at −80 • C. Genomic DNA was extracted from a total of 24 individuals (12 from each location) using standard phenolchloroform extraction. The quality of the DNA samples was checked on 1% agarose gels and measured using a Nanodrop 2000 spectrophotometer and Qubit fluorometric quantitation, in order to ensure that the concentration was more than 25 ng/µl and the total DNA weight was no less than 1 µg. RAD libraries were prepared as described by Zhang et al. (2016).

Restriction Site-Associated DNA Data Analysis and Single-Nucleotide Polymorphism Identification
Sequencing reads from the Illumina runs were filtered to generate clean reads and then were aligned to the Japanese eel genome draft 2 using BWA version 0.7.12 (Li and Durbin, 2010). The BWA-MEM algorithm was used for mapping with default parameters except for adjusting the seeding length to 32. SNP calling was performed by the SAMTOOLS 0.1.19 software package (Li et al., 2009) with a maximum read depth of 1,000. This package consists of two key programs, SAMTOOLS and BCFTOOLS, which incorporate genotype likelihood and call variants using Bayesian inference. In order to remove false positives, putative SNPs were required to fit the following criteria: (i) only bi-allelic SNPs were retained; (ii) SNPs with an average phred-scaled quality score > 30 (Q30) were retained; (iii) a minimum read depth of 10 for each individual was required; (iv) the minimum individual coverage for each population was set to 80%; v) a global minor allele frequency (MAF) of 0.05 was applied; vi) SNPs with F IS values < -0.3 or > 0.3 and observed heterozygosity (H O ) values > 0.5 were discarded to exclude false SNPs, which might have resulted from the paralog clusters (Hohenlohe et al., 2011). All subsequent datasets used were reformatted with PGDSPIDER version 2.0.5.2 (Lischer and Excoffier, 2012).

Outlier Detection
F ST -based outlier tests implemented in ARLEQUIN v3.5.2.2 (Excoffier and Lischer, 2010) were applied to test for selection between the two samples. This approach obtains the distribution of F ST across loci as a function of heterozygosity between populations by performing coalescent simulations under a symmetrical two island model assuming near-random mating in order to reduce the number of false-positive loci (Beaumont and Nichols, 1996). For each outlier locus, F ST values located above the 99.5th quantile of the simulated distribution were selected.

Population Genetic Analyses
The package VCFTOOLS (Danecek et al., 2011) (Excoffier and Lischer, 2010) and significance was determined using 10,000 permutations. Furthermore, ADMIXTURE v1.23 (Alexander et al., 2009) was used for maximum likelihood estimation of individual ancestries 2 www.eelgenome.com from multi-locus SNP genotype datasets. The analysis was performed with values of K ranging from 1 to 3, and the K with the lowest cross-validation error was selected. NETVIEW P 0.7.1 (Steinig et al., 2016), which is an implementation of NETVIEW (Neuditschko et al., 2012) in Python, was used to infer population structure including individual-and family-level relationships of populations on the basis of putative neutral SNPs and outlier datasets. A K-nearest neighbor (KNN) value of 10 was selected for visualization in CYTOSCAPE 3.2.1 (Shannon et al., 2003) as suggested by the author. The analyses of population pairwise F ST , ADMIXTURE, and NETVIEW were conducted based on all SNPs, neutral SNPs, and outlier datasets.

BLAST Analyses and Gene Ontology Annotation
Contigs containing outlier SNPs were used as queries in nucleotide searches using local BLASTX against the Genome Reference Consortium Zebrafish Build 10 (GRCz10) proteins with an E-value threshold of 1E-5. Next, functional annotations of these genes were obtained using the BLAST2GO suite (Götz et al., 2008), with parameters set according to previous recommendations (Pujolar et al., 2013). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis implemented in the Database for Annotation, Visualization and Integrated Discovery (DAVID) web-server v6.7 3 was used for a more systematic functional interpretation of the set of candidate genes. A gene functional analysis in DAVID was conducted with default settings.

Comparison of the Number of Single-Nucleotide Polymorphisms Generated From Paired and Unpaired Reads
In order to compare the number of SNPs generated from pairedend reads with that of SNPs generated from single-end reads, SNP calling was performed on each single-end read group separately in addition to SNP calling on reads mapped as pairs. Singleend BAM files were extracted from paired-end BAM files by using SAMTOOLS. SNPs were called by using SAMTOOLS and BCFTOOLS, followed by additional filtering based on the following criteria: (i) retained only bi-allelic SNPs; (ii) retained SNPs with a minimum quality score of 30; and (iii) retained SNPs with a minimum read depth of 10. In order to increase read depth, individuals were pooled together before filtering.

Restriction Site-Associated DNA Tag Sequencing and Data Filtering
Sequencing of the RAD libraries generated a total of 42.10 Gb of raw data with an average of 1.75 Gb per individual, ranging from 1.02 to 7.84 Gb ( Table 1). A total of 40.00 Gb of clean data was retained after quality filtering, with an average effective rate

Single-Nucleotide Polymorphism Discovery and Population Genetic Structure
Of the retained sequences, an average of 10.51 million (97.35%) reads per individual were aligned to the Japanese eel draft genome contigs, and then 9,905,533 SNPs were called from 24 individuals using SAMTOOLS. After quality filtering, a total of 73,557 SNPs were retained. The final retained SNPs were widely distributed across the genome and were found in a total of 28,407 contigs. The expected and observed heterozygosity (H E and H O ) per individual ranged from 0.194 to 0.198 and from 0.157 to 0.312, respectively ( Table 2). The F ST values between the two populations ranged from −0.0667 to 0.4354 for each SNP. The final dataset had a transition:transversion ratio of 1.83:1 (Figure 1). ARLEQUIN 3.5 obtained an F ST value of 0.00271 (P > 0.05) between the two populations using all the retained SNPs. The population pairwise F ST was 0.00112 (P > 0.05) based on the neutral SNPs and 0.28473 (P < 0.001) based on the outlier SNPs. Results of ADMIXTURE and NETVIEW showed that all individuals from Dandong and Xiapu were grouped into a single cluster when using the neutral or all SNPs. However, when the analysis was based on the outlier SNPs, the two populations were clearly separated (Figure 2).

BLAST Analyses of the Divergent Loci
A total of 250 outlier SNPs representing 238 unique contigs in the draft genome were detected, with all showing high F ST values (0.209-0.435). BLASTX similarity searches demonstrated that 85 contigs (36%) significantly matched to zebrafish proteins, with 61 of these contigs corresponding to zebrafish proteins Gene Ontology (GO) functional annotations (Figure 3). These annotations are represented by GO terms including signaling, response to stimulus, metabolic process, biological regulation, catalytic activity, development process, and others (Figure 4). Classification of biological and molecular functionalities for these hits is listed in Supplementary Table S1. A total of 228 zebrafish genes, which were homologous to the 85 contigs with significant BLASTX hits, were loaded into the DAVID database and were found to correspond to 224 DAVID IDs. Enriched KEGG pathways including 52 genes classified into nine pathways were summarized in Table 3. The pathway with the highest number of genes was endocytosis (10 genes), including several genes such as those encoding chemokine (C-X-C motif) receptor 4, fibroblast growth factor receptor, and par-6 partitioning defective 6 homolog. Other pathways of particular interest were neuroactive ligand-receptor interaction (nine genes), calcium signaling pathway (eight genes), intestinal immune network for IgA production (three genes), and mitogen-activated protein kinase (MAPK) signaling pathways (eight genes).

The Number of Single-Nucleotide Polymorphisms Generated Based on Single Reads or Paired Reads
A total of 4,227,450 SNPs were called using the first reads before filtering, whereas 7,741,237 SNPs were called when using the second reads. A total of 2,449,359 for the first reads, 3,724,441 for the second reads, and 5,515,225 for both of the paired reads were retained after applying filtration. However, much fewer SNPs were retained for the second reads while applying the filter on each individual, suggesting low depth of the second reads. This is a common phenomenon in RAD, as the randomly sheared fragments are expected to be sequenced at a much lower depth than the other ends. However, the longer fragment coverage of the second reads increases the chance for targeting loci under local adaptation and is also suitable for annotations.

DISCUSSION
Compared with microsatellites, SNPs are the most widespread type of sequence variation in genomes. Despite microsatellites presenting higher diversity per locus (with a mutation rate of 10 −4 per generation), the large number of unlinked SNPs may overcome their relatively low mutation rates (10 −8 -10 −9 per generation), and panels of several 1000 SNPs are likely to be more informative than the 10-20 microsatellite loci used in  standard population genetic studies (Helyar et al., 2011;Seeb et al., 2011). Here, we reported the discovery of a large number of genome-wide SNPs in the Japanese eel using RAD sequencing. After quality filtering, 73,557 high-quality SNPs were retained, which further improved the amount of genomic resources available for this endangered species. Most previous population genetic studies of A. japonica were conducted using less than 20 microsatellite loci, mitochondrial sequence, or isozyme markers, resulting in inconsistent findings (Sang et al., 1994;Chan et al., 1997;Tseng et al., 2003Tseng et al., , 2006Tseng et al., , 2009Han et al., 2010a). Compared  with other studies of eels on the basis of RAD (Pujolar et al., 2013(Pujolar et al., , 2014, our study made full use of the paired-end reads, which resulted in significantly more markers and enabled the identification of genomic regions under selection. The transition:transversion ratio of the SNPs in our dataset was 1.83:1, which is in keeping with similar ratios that have been reported in the study of European eel (Anguilla anguilla) (1.6:1) (Pujolar et al., 2013) and great tit (Parus major) (1.7:1) (Bers et al., 2010). The overall non-significant pairwise F ST (0.00271, P > 0.05) indicated that most of the genome is homogenized by gene flow, supporting the theory of panmixia in the Japanese eel. However, the F ST based on the outlier SNPs was 0.285 and significant (P < 0.001). In addition, both ADMIXTURE and NETVIEW analyses based on the outlier

KEGG pathway Count Genes
Endocytosis 10 Chemokine (C-X-C motif) receptor 4a Chemokine (C-X-C motif), receptor 4b Fibroblast growth factor receptor 2 Fibroblast growth factor receptor 3 Fibroblast growth factor receptor 4 Par-6 partitioning defective 6 homolog beta (Caenorhabditis elegans) Par-6 partitioning defective 6 homolog gamma A (C. elegans); similar to par-6 partitioning defective 6-like protein gamma Par-6 partitioning defective 6 homolog gamma B (C. elegans) Similar to CXC chemokine receptor-2 zgc:162756; par-6 partitioning defective 6 homolog alpha (C. elegans) SNPs demonstrated strong genetic differentiation between the two populations. Taken together, these results indicated that some genomic regions of the Japanese eel have undergone divergent natural selection. Furthermore, RAD sequencing-based population genomic studies can also detect signatures of selection and local adaptation. Candidate genes and genomic regions were identified using an F ST outlier approach by detecting loci showing increased or decreased differentiation across populations compared with neutral expectations, which are suggestive of directional or purifying natural selection. For a panmictic species like Japanese eel, SNP-based genome scans can be used for test selection within a single generation, which results from geographically varying environmental conditions encountered by glass eels across different regions. Such outlier detection methods have been illustrated by several recent population genomic studies in marine fishes displaying extensive gene flow (Russello et al., 2012;Milano et al., 2014). For instance, Hess et al. (2013) detected fine-scale population structure in Pacific lamprey (Entosphenus tridentatus) by using 162 putatively adaptive SNPs identified by RAD sequencing. Larson et al. (2014) found three potential regions of adaptive divergence in Chinook salmon (Oncorhynchus tshawytscha) using RAD sequencing, showing that RAD sequencing is a promising approach for marine species with large population sizes and shallow structures.
The two populations of A. japonica analyzed were collected from two environmentally heterogeneous locations, and different environmental factors may lead to local adaptation. BLASTX and BLAST2GO revealed that 61 out of 238 highly divergent contigs were located in functional genes or genomic regions, which might indicate divergent natural selection (Foll and Gaggiotti, 2008). The low frequency of BLASTX-annotated outliers suggested that substantial outlier polymorphisms detected might be in the intergenic or intronic regions of the genome. The KEGG pathway analysis showed that some of the putative targets under local selection were associated with genes involved in several important pathways, such as calcium signaling pathway, intestinal immune network for IgA production, cytokinecytokine receptor interaction, and MAPK signaling pathways. Differences in these pathways might indicate local adaptation of A. japonica to different geographical environments, which vary in salinity and temperature. For instance, the calcium signaling pathway plays an important role in fertilization, development, learning and memory, ion exchange, and osmoregulation (Berridge et al., 2000;Daverat et al., 2006). The intestine can generate large amounts of non-inflammatory immunoglobulin A (IgA) antibodies that serve as the first line of defense against microorganisms (Reinking et al., 1999) and is also crucial for adaptive inflammatory host defenses, cell growth, differentiation, cell death, and angiogenesis (Lackie, 2010). Similar results were also reported in other studies. For instance, Guo et al. (2015) investigated genome-wide patterns of genetic variability of Atlantic herring using the RAD sequencing approach, and outlier analyses uncovered 100s of directionally selected loci that were associated with salinity and temperature. Pujolar et al. (2014) found a small set of SNPs showing high genetic differentiation, which is consistent with single-generation signatures of spatially varying selection acting on European eel (glass eels) using RAD sequencing. After 50,354 SNPs were screened, a total of 754 potentially locally selected SNPs were identified. Candidate genes for local selection include those involved in calcium signaling, neuroactive ligand-receptor interaction, circadian rhythm, and others.
It remains an open question whether the adaptive loci inferred in this study were directly related to the local adaptation of A. japonica populations to specific environments or were linked to genes underlying specific traits. Further analyses are necessary to confirm divergent selection on outliers through a correlation analysis between allele frequencies and environmental factors among populations (Seeb et al., 2011;Limborg et al., 2012;DeFaveri et al., 2013;Teacher et al., 2013).

DATA AVAILABILITY STATEMENT
All data generated or analyzed during this study are included in this published article and its Supplementary Information Files.

ETHICS STATEMENT
The animal study was reviewed and approved by the Institutional Ethics Committee of Zhejiang Ocean University.

AUTHOR CONTRIBUTIONS
J-XL and B-JL conceived and designed the research. B-JL, Y-LL, and B-DZ conducted the experiments, analyzed the data, and wrote the manuscript. All authors critically reviewed and approved the manuscript.