No Population Genetic Structure of Skipjack Tuna (Katsuwonus pelamis) in the Tropical Western and Central Pacific Assessed Using Single Nucleotide Polymorphisms

Anderson, Giulia; Lal, Monal; Stockwell, Brian; Hampton, John; Smith, Neville; Nicol, Simon; Rico, Ciro

doi:10.3389/fmars.2020.570760

ORIGINAL RESEARCH article

Front. Mar. Sci., 18 December 2020

Sec. Marine Fisheries, Aquaculture and Living Resources

Volume 7 - 2020 | https://doi.org/10.3389/fmars.2020.570760

No Population Genetic Structure of Skipjack Tuna (Katsuwonus pelamis) in the Tropical Western and Central Pacific Assessed Using Single Nucleotide Polymorphisms

Giulia Anderson^1,2*

Monal Lal^1,3

Brian Stockwell¹

John Hampton²

Neville Smith²

Simon Nicol²

Ciro Rico^1,4

¹Molecular Analytics Laboratory (MOANA-LAB), School of Marine Studies, Faculty of Science, Technology and Environment, The University of the South Pacific, Suva, Fiji
²Oceanic Fisheries Programme (OFP), Pacific Community, Nouméa, New Caledonia
³Australian Centre for Pacific Islands Research, School of Science and Engineering, University of the Sunshine Coast, Maroochydore, QLD, Australia
⁴Instituto de Ciencias Marinas de Andalucía (ICMAN), Consejo Superior de Investigaciones Científicas, Puerto Real, Spain

Skipjack tuna (Katsuwonus pelamis) has historically been considered panmictic at the ocean basin scale because of the species’ life history and substantial absolute population size. However, recent advances in population genomics have enabled scientists to demonstrate population structure in other species that were likewise traditionally assumed to be panmictic. Accordingly, we used DArTseq, a proprietary platform of reduced-representation genome sequencing developed by Diversity Arrays Technology (DArTech) to genotype 222 skipjack tuna from 9 sample groups collected across 6 exclusive economic zones (EEZs) and over 6 years in the tropical Western and Central Pacific Ocean. This is one of the most fine-scale population genetics assessments of skipjack tuna to date. We found several statistically significant pairwise F_ST comparisons using purportedly neutral loci, but very little indication of structure based on observed and expected heterozygosity, effective population size, a Mantel test, AMOVA, numerous genetic clustering and population assignment tests, and exploration of correlation with oceanographic features. Significant pairwise F_ST values were often shallow and resulted in uncertainty about their biological significance. Furthermore, a test of the hypothesis that the geographic and temporal distribution of the potentially adaptive genetic diversity may depict cryptic fine scale structure using two different outlier detection methods, could not reject panmixia based on estimates of fixation indices and two clustering software. Our results support the presence of a single population of skipjack tuna in the tropical Western and Central Pacific Ocean, but highly recommend the expansion of the study area, in latitude, longitude, and seasonality, before drawing any global conclusions.

Introduction

As cosmopolitan fishes with large absolute population sizes and a remarkable propensity for trans-oceanic movement, tunas fit the life history of species that are not expected to show population structuring (Funk et al., 2012). A few species are widely recognized to function outside this model, particularly the three Bluefin tunas (Thunnus thynnus, T. orientalis, and T. maccoyii). Structure in Bluefin species is partly explained by a life history that includes structured breeding migrations (Richardson et al., 2016), which can produce cryptic reproductive subunits within an ostensibly mixed group. Bluefin tunas also suffer from previous or current overfishing (ISSF, 2017), which can cause population differentiation through fragmentation. However, the panmictic nature of other species is now also under increasing scrutiny. Growing evidence suggests that yellowfin tuna (Thunnus albacares) has the potential for genetic population structuring (Dammannagoda et al., 2008; Kunal et al., 2013; Grewe et al., 2015; Pecoraro et al., 2016), given their capacity for precise homing using a geomagnetic sensory organ (Walker et al., 1984), stronger than expected regional fidelity (Schaefer et al., 2007), and long-term coordinated movements of small groups (Klimley and Holloway, 1999) that may represent co-dispersal of kin (Anderson et al., 2019). There is even discussion of temporally variable population structure in albacore (Thunnus alalunga) in the Atlantic, given the regimented distribution of adults and juveniles, and separation of juveniles based on migratory routes (Duncan et al., 2018). In contrast, there is only limited discussion about the potential for population structure in skipjack tuna (Katsuwonus pelamis), and little evidence to merit its expectation.

Skipjack demonstrate many of the life history traits associated with panmixia. Skipjack in the Western Pacific grow to 35 cm in 6 months (Tanabe et al., 2003), can reach sexual maturity upon growing to a length of 40 cm (Ashida et al., 2008), produce 80,000–2,000,000 eggs (FAO, 2004) on a nightly basis (Hunter and Macewicz, 1986; Ashida et al., 2008), live only 6–8 years (Collette et al., 2011), and maintain a large absolute population size, estimated at roughly 4 MT (Vincent et al., 2019). It is logical to assume that high gene flow and lack of influence from genetic drift associated with the skipjack lifestyle would negate the possibility of intra-ocean population differentiation (Funk et al., 2012).

However, other evidence contradicts the simple assumption of panmixia in skipjack. Tagging and catch per unit effort studies confirm that populations in both the Western and Eastern Pacific migrate westward and to higher latitudes in the summer to feed (Fujino, 1996). Likewise, skipjack experience environmental stresses that drive coordinated movements within size-based subgroups that reflect age stratification (FAO, 2004). Because they lack the size and rete mirabile of other principle market tunas (Fudge and Stevens, 1996), skipjack are sensitive to temperature changes (Lehodey et al., 1997) and are spatially limited by the 15°C isotherm, with smaller fish preferring still warmer waters (FAO, 2004). Skipjack are likewise restricted by their high metabolism to oxygen-rich waters (Wild and Hampton, 1993; Essington, 2003), with young fish vertically separated from older generations in captivity in coordination with temperature and oxygen profiles (Barkley et al., 1978). Age-specific thermal and minimum oxygen demands are even incorporated in increasingly accurate predictive models of skipjack distribution (Senina et al., 2020). Size-based distribution within a larger population can serve as the basis for cryptic population structure.

Precedent also exists for acknowledging population structure among skipjack from a genetic perspective. Blood specimen analyses demonstrate an east-west cline among Pacific skipjack based on carboxylesterase and guanine deaminase allozymes (Richardson, 1983). As many as three semi-independent groups have been identified using esterase (Fujino, 1996), dividing the ocean basin into central-western, central-northeastern, and central-southeastern partitions. Mitochondrial studies also identified separation of skipjack populations between the Indian Ocean and the northwestern Pacific (Menezes et al., 2006), highlighting the capacity for more significant population structuring in the presence of semi-permeable oceanographic barriers like the Indonesian Throughflow (ITF). The ITF is an ocean current that serves as the upper branch of the global heat conveyor belt and it is known to limit dispersal of larvae from numerous pelagic species (Jackson et al., 2014). Finally, a size-specific discrepancy in esterase allozymes within New Zealand waters (Richardson and Habib, 1987) further supports the potential for cryptic population structure within any location.

Exploration of population genetics structure in all tuna species has benefited from the increased statistical power and reduced costs of analysis associated with next generation sequencing (NGS). Differentiation of yellowfin population within the Pacific has been revised from a tentative possibility supported by select allozymes (Ward et al., 1997), microsatellites (Diaz-Jaimes and Uribe-Alcocer, 2006), and mitochondrial regions (Li et al., 2015) that at best identified two populations, to a consistently demonstrated and increasingly fine scale phenomenon. Grewe et al. (2015) used 215 purportedly adaptive loci at Single Nucleotide Polymorphisms (SNP) and argued not only for separation of Eastern and Western Pacific yellowfin groups, but also for more shallow differentiation between Western and Central Pacific sub-populations. Genotyping by sequencing for the study was conducted by Diversity Arrays Technology (DArTech) using the NGS protocol DArTseq.

Given the success of next generation sequencing (NGS) studies to reveal novel genetic structure in similar tuna species, it is possible that skipjack tuna also experience currently unrecognized genetic structure. In fact, a recent and thorough review of genetic and non-genetic studies of skipjack structure in the Pacific explicitly highlights the lack of a comprehensive assessment of skipjack using NGS protocols as a knowledge gap (Moore et al., 2020b). It is especially important to explore the possibility of unrecognized structure in skipjack because the species supported a 3.0 MT fishery globally in 2018, 1.8 MT of which originated just from the western and central Pacific Ocean (WCPO) (WCPFC, 2019). Therefore, we analyzed 222 skipjack tuna from six EEZs and across 6 years in the tropical WCPO using the DArTseq protocol. We designed sample collections to uniquely control for spatial and temporal variation in structure. Specifically, three sample groups in three different countries were collected within 8 weeks of each other; likewise, four sample groups were collected within the Papua New Guinea EEZ in the same season but in different years. We calculated standard measures of population structure and SNP-optimized population assignment analyses using neutral loci (NL), and compared differences against both geographic and oceanographic features that might explain the observed patterns of subtle differentiation. Furthermore, we explored the hypothesis that the spatial and temporal distribution of potentially adaptive genetic diversity depicts fine scale structure by conducting a parallel analysis using loci under potential selection (LUPS).

Materials and Methods

Sample Collection

Fish specimens were selected for analysis from tissue samples archived in a tissue bank collection managed by the Pacific Community (SPC) under the auspices of the Western and Central Pacific Fisheries Commission (WCPFC), in Noumea, New Caledonia. Tissue samples consisted of muscle plug biopsies taken by scientists during research cruises. Tissue samples within the archive reflect an opportunistic sampling distribution, from which specimens for this study were selected based on catch location and date, such that all samples in a grouping were collected within 6 weeks of each other, and within an area of 3,400 km² (radius of 330 km). When possible, samples were selected from archived tissue collected in the same EEZ and season, but from different years, for a uniquely controlled temporal comparison. Likewise, multiple sample groups from different countries were collected within the same 2-month period in order to produce a concisely controlled spatial comparison. Samples were accompanied by metadata including catch location, date, and catch event number, fish size, and a description of objects associated with the school of interest (Supplementary Table 1). Two hundred and fifty nine individuals from nine geographic locations were collected, representing five countries and spanning 6 years (Table 1 and Figure 1). Sample groups were collected in waters of the EEZ of the Federated States of Micronesia in 2008 (FM08), the Gilbert Islands of Kiribati in 2009 (GL09), Indonesia in 2009 (ID09), the Marshall Islands in 2009 (MH09), Papua New Guinea in 2007 (PG07), 2009 (PG09), 2011 (PG11), and 2012 (PG12), respectively, and Palau in 2008 (PW08) and ranged in size from 23 to 31 specimens (Table 1).

TABLE 1

Table 1. Initial population parameters and extraction of individuals during quality filtering.

FIGURE 1

Figure 1. Map of sample site locations. Size of circle represents the number of fish caught.

DNA Extraction and Quality Filtering

Genomic DNA was extracted from muscle samples and processed for reduced representation library construction, sequenced, and genotyped by Diversity Arrays Technology (DArTech) using its patented protocol, DArTseq. DArTseq^TM data has previously been used to demonstrate population genetic structure in yellowfin tuna (Thunnus albacares) (Grewe et al., 2015). Some of the procedures are proprietary knowledge, but reasonably detailed descriptions are available (Sansaloni et al., 2011; Kilian et al., 2012; Cruz et al., 2013; Ren et al., 2015). Briefly, DNA was extracted using a basic CTAB protocol in the DArTech laboratory. Purified DNA underwent genome complexity reduction with a double restriction digest using a PstI and SphI methylation-sensitive restriction enzyme combination. Highly methylated portions of the genome tend to be highly repetitive and uninformative in genomic structure analyses (Kilian et al., 2003). Adaptors that include variable barcode sequences and Illumina flowcell attachment sequences were ligated to fragments. PCR amplified only mixed fragments in a sequence of initial denaturation at 94°C for 1 min, followed by 30 cycles of 94°C for 20 s, 58°C for 30 s, and 72°C for 45 s. A final extension step took 7 min at 72°C. Libraries were sequenced on an Illumina HiSeq 2500 platform, and raw reads obtained following sequencing were processed using Illumina CASAVA v.1.8.2 software for initial assessment of read quality and sequence representation. The DArT PL proprietary software pipeline, DArTtoolbox, performed further filtering and variant calling, and generated final genotypes according to Kilian et al. (2012). Raw datasets are publicly available from the Open Science Framework under Digital Object Identifier DOI: 10.17605/OSF.IO/MPE5U

The species identification of all samples was confirmed using the COI barcoding gene. Samples were shipped to the Australian Genome Research Facility Ltd. (AGRF) wherein DNA was extracted and the COI gene was sequenced using the Fish COI primer set supplied by Carolina Biological Supplies. Sequences were then compared against the GenBank data set¹ for species identification. One specimen from the MH09 sample group proved more compatible with the yellowfin database than skipjack, and it was removed from further analyses. Sequences are available on GenBank using accession numbers MT645970-MT645989.

Following preliminary filtering from DArTech, the genomic dataset consisted of 38,507 single nucleotide polymorphisms (SNPs). The dataset underwent three further rounds of quality filtering, first to delete individuals of suspect quality, then to filter SNPs using the refined sample groups and finally to isolate LUPS. The same thresholds were applied in all filtering efforts. SNP loci located on the same contig, which is metadata provided in the DArTech dataset, were removed except the locus with the highest call rate. Further locus quality filtering used thresholds of 99% for call rate, 7x for read depth, and 5% for minor allele frequency calculated across all samples. F_ST outliers were initially identified using Bayescan v. 2.01 (Foll and Gaggiotti, 2008) with an FDR of 10%. All F_ST outliers were extracted and the remaining neutral loci (NL) was further processed in PLINK v. 2 (Chang et al., 2015) for linkage disequilibrium greater than 70%, and Hardy Weinberg Equilibrium with a cut-off value of p < 0.0001 across all individually analyzed populations. Although a stricter p-value should have been used in filtering for HWE to accommodate for the high number of loci, we selected the most stringent available value.

At the end of the first round of filtering, individuals with outlier proportions (defined as three standard deviations above the average) of observed heterozygosity, missing data, or the ratio of heterozygous loci to homozygous loci, were deleted. All remaining individuals were further tested for genetic relatedness. The presence of kin groups in population samples can skew population structure results (Selwyn et al., 2016), but can also indicate contamination or duplication of DNA samples (Flickinger et al., 2015). COANCESTRY v 1.0.1.7 (Wang, 2011) established relatedness based on identity by descent using the TrioML algorithm assuming no inbreeding. COANCESTRY also generated 200 half-sibling dyads using the same allele frequencies as the final dataset. The lower 95% confidence interval of the coancestry coefficients from the simulated dyads was used as the cut-off value for related individuals in the empirical dataset. Twenty-eight dyads consisting of 53 individuals were identified as half- or full-siblings, the vast majority of which included individuals from neighboring wells on the sequencing plate. Further investigation is needed to explain the presence of purportedly related individuals. For analyzing structure, one individual from each dyad was removed from the final dataset.

After all suspect individuals were removed, the dataset was filtered a second time using the same filtering steps but without potential bias introduced by potentially cross-contaminated individuals. At the end of the second round of filtering, loci with observed heterozygosity greater than 0.5 were also extracted, as these loci have an elevated probability of including paralogs (O’Leary et al., 2018).

The data produced above is the primary dataset. A series of other data subsets were also produced to explore potential patterns that might be overwhelmed by noise in the full dataset. The majority of specimens taken in Palau in 2008 were small enough that they verge on juvenile status. In order to confirm that the presence of an anomalous group of juveniles did not mask other forms of population structure, a “non-juvenile dataset” extracted the PW08 sample group before secondary filtering of loci. Likewise, a “PG only” subset maximized loci selection for the four sample groups from Papua New Guinea in order to assess temporal structure, and a “PG-joined” dataset avoided potential skewing of analyses by selecting thirty individuals at random from across the four PG groups to represent the EEZ as a single group. Finally, to explore if the geographic and temporal distribution of the potentially adaptive genetic diversity may uncover cryptic fine-scale structure, we used two different software programs to identify candidate loci under selection or LUPS. First, BayeScan v.2.1 (Foll and Gaggiotti, 2008; Foll, 2012) was run again but with FDR threshold of 20% to increase the chances of detecting LUPS. Secondly, we used HacDivSel (Carvajal-Rodríguez, 2017). As these software packages employ different analytical approaches, their joint use increased the statistical confidence of F_ST outlier detection (White et al., 2010; Kovach et al., 2012; Pujolar et al., 2014).

Genetic Analyses

To validate the statistical power of the datasets to identify population structure, the number of loci, individuals, populations, and allele frequencies per locus from the primary dataset were replicated in POWSIM (Ryman and Palm, 2006) using recommended settings and run for 6, 15, and 30 generations. The resulting probability of successfully identifying structure was assessed for true differentiation values of 0.001, 0.0025, and 0.005.

Once adequate statistical power was confirmed, the primary dataset was used to calculate basic measures of genetic diversity. Sample size adjusted expected and observed heterozygosity (H_n.b. and H_o) were calculated using GENETIX v 4.05 (Belkir et al., 2004). Genetix also provides H_n.b. and H_o information per locus per sample group. T- and Mann Whitney tests for difference between H_n.b. and H_o at all loci within a sample group determined the statistical significance of any observed heterozygosity deficit or excess. The inbreeding coefficient F_IS was calculated using commands “basic.stats” and “boot.ppfis” (1,000 permutations) in the R package hierfstat (Goudet, 2005). N_e was calculated using both a linkage disequilibrium and heterozygosity excess algorithm in NeEstimator v 2.01 (Do et al., 2014), assuming random mating. Values are reported at an α value of 0.01. The analysis was also re-run on all samples grouped into western and central Pacific regions, and as a single population. Pairwise F_ST (Weir and Cockerham, 1984) was analyzed in Arlequin v 3.5.2.1 (Excoffier and Lischer, 2010) using 10,000 permutations and adjusted using the Bonferroni correction. A mantel test using the command “mantel.rtest” from the R package ade4 (Dray and Dufour, 2007) checked for patterns of isolation by distance using Euclidean distances, which were measured between the average latitude and longitude of each sample group using the command “spDists” from R package “sp” (Pebesma and Bivand, 2005). As an alternative to a mantel test, a decomposed pairwise regression was attempted in R using the command “lm()” as described in Koizumi et al. (2006). A hierarchical AMOVA in Genodive v. 3.3 (Meirmans and van Tienderen, 2004) organized specimens first by catch events and then sample groups in order to check for school-level structure.

The dataset was also submitted to a four population clustering software programs with different methodologies. First, a discriminant analysis of principle components (DAPC, available in the R package adegenet; Jombart and Ahmed, 2011) uses a multivariate method to maximize variation between previously allocated sample groups given a user-determined number of principle components. The optimal number of principle components is determined by using the R command “a.score.optim” to identify the highest α-score produced by the lowest number of principle components. Second, network analyses carried out using the Netview R package (Neuditschko et al., 2012; Steinig et al., 2016) generate population networks based on a shared allele 1-identity-by-state (IBS) distance matrix. The IBS matrices and corresponding networks for the current dataset were constructed at various thresholds of the mk-NN value between 1 and 50, after which the optimal network for representation was selected based on cluster stability (Steinig et al., 2016). Individual networks are visualized and edited in the Cytoscape v.2.8.3 network construction package (Smoot et al., 2011).

Third, fastSTRUCTURE v. 1.0 (Raj et al., 2014) uses a variational Bayesian inference algorithm that builds on the STRUCTURE program to coerce individuals to a user-determined number of groups. The optimum number of distinct genetic clusters (k) into which individuals is arranged is determined by marginal likelihood. We ran samples using the full model and a logistic prior at each value of k between 1 and 10. ADMIXTURE v. 1.3.0 (Alexander et al., 2009) uses a similar approach to fastSTRUCTURE but draws conclusions based on maximum likelihood and compares the resulting amount of cross-validation error (CV) to recommend k. We ran ADMIXTURE for k between 1 and 10 and selected the result with the lowest associated CV.

To explore the impact of oceanographic as well as geographic mechanisms on skipjack structuring, Pearson correlations were also explored between pairwise F_ST values and either the difference in average monthly sea surface temperature (SST), or monthly average chlorophyll-a concentrations (chlor-a). SST and chlor-a were chosen based on previous acknowledgment of each measurement as effective predictors of skipjack distribution (Monllor-Hurtado et al., 2017; Zainuddin et al., 2017), using the logic that they might also prompt size- or competitiveness-specific movements that separate fish based on age. Remote sensing oceanographic data for this analysis was downloaded from the NASA Ocean Color website². Data files originated from the SeaWiFS depository, and included all available daytime measurements of SST and chlor-a collected between 1st March 2007 and 14th March 2012 that included any part of the region between geographic coordinates 7°N and 11°S, and 131°E and 173°E. Measurements were taken at 9 km resolution. SST and chlor-a data from the location of each sample group were collected for 28 days prior to the sampling events, and averaged. The differences between average values for both oceanographic features were calculated between sample locations, as a foil for pairwise F_ST values. The two sets of values were measured for Pearson’s product-moment correlation using R command “cor.test ().”

Auxiliary datasets were assessed using appropriate subsets of analyses and the same methods as the primary dataset. The non-juvenile and PG-joined datasets were used to calculate heterozygosity, inbreeding, effective population size, AMOVA, pairwise F_ST, a Mantel test, and a DAPC. The PG-only dataset was submitted to an AMOVA, pairwise F_ST, a Mantel test, and a DAPC. The purportedly selectively neutral primary dataset generated following MAF filtering was used to identify additional outlier loci as described above. Screening for F_ST outlier loci identified markers potentially affected by selection, genetic drift, as well as hitch-hiking loci. BayeScan computations employed default settings (burn-in = 50,000, sample size = 5,000, total iterations = 100,000, pilot runs = 20 × 5,000) with FDRs = 0.001, 0.005, 0.01, 0.05, 0.1 and 0.2, and verification of loci detected at each FDR using QQ plots (data not shown). HacDivSel analyses employed the Extreme Outlier Set EOS test using a two-step heuristic algorithm (see Carvajal-Rodríguez, 2017 for further details). As HacDivSel is only able to make pairwise population comparisons, 36 pairwise runs were carried out. Genetic structure was estimated by calculating fixation indices as described above and visualized using a DAPC and fastSTRUCTURE with datasets containing all loci, neutral loci only and putative outliers only.

Finally, the filtered dataset (1,479 SNPs) generated earlier was used to create a population dendrogram to further explore if any genetic structure is present within the sample groups. Using the propShared function of the adegenet package in R (Lal et al., 2016; Kjeldsen et al., 2018), the 1-proportion of shared alleles distances (1-PSA) were computed and the resulting matrix was used to generate a neighbor-joining tree in MEGA6 (Tamura et al., 2013; Supplementary Figure S2).

Results

DNA Extraction and Quality Filtering

Quality filtering of individuals resulted in the extraction of 38 individuals and retained 220 in the primary dataset (Table 1). Sample group MH09 was reduced to 17 individuals, while all other sample groups have 23 or more members. The reduced sample size of MH09 requires extra precaution when drawing inferences about that group. Subsequent quality filtering for loci identified 1,373 neutral loci using the primary dataset, which consists of nine sample groups taken in the Federated States of Micronesia in 2008 (FM08), the Gilbert Islands of Kiribati in 2009 (GL09), Indonesia in 2009 (ID09), the Marshall Islands in 2009 (MH09), in Papua New Guinea in 2007, 2009, 2011, and 2012 (PG07, PG09, PG11, and PG12, respectively), and in Palau in 2008 (PW08).

A statistical power analysis using POWSIM confirmed that a dataset incorporating the number of samples, sample groups, loci, and allele distributions as the primary dataset would have a 100% probability of producing statistical significance at differentiation levels as low as F_ST = 0.0025, and a 96.8% chance of identifying structure at F_ST = 0.001.

Subdivision and re-filtering of the primary dataset produced four other data subsets of neutral loci, and one LUPS dataset. Number of neutral loci remaining after each quality-filtering step for primary dataset is listed in Table 2, while results for auxiliary datasets (i.e., no-juveniles, PG-only, PG-joined and LUPS) are presented in Supplementary Table S2. Comparisons of results from the same analyses run on the primary dataset and various data subsets helps guide confidence in the robustness of primary analyses. However, the result trends, if not absolute values, produced by alternative datasets including the analyses conducted using 453 LUPS are redundant to patterns in the primary dataset. These results are clearly illustrated by the F_ST values obtained with the neutral primary and LUPS datasets (Tables 3A,B) and the visualization of clustering patterns using DAPC (Figures 2A,B). Therefore, a comparison of results for duplicated analyses including figures and tables is available in Supplementary Tables S2–S7 and Supplementary Figure S1. Only informative observations will be highlighted directly in-text.

TABLE 2

Table 2. Number of neutral loci remaining after each quality-filtering step for primary dataset.

TABLE 3

Table 3. Pairwise F_ST values (below diagonal) and associated p values (above diagonal) between population samples.

FIGURE 2

Figure 2. (A) DAPC using neutral loci from the primary dataset, 47 principle components, and 8 degrees of freedom. (B) DAPC using LUPS dataset, 34 principle components, and 8 degrees of freedom. Placement of individuals relative to each other reflects their genetic similarity and extent of genetic admixture irrespective of the purported neutrality of potential selectivity.

Genetic Analyses

Analyses using the primary dataset included standard measures of genetic diversity and differentiation. Sample size adjusted expected heterozygosity (H_n_b) of the nine sample groups ranged from 0.233 (PG11) to 0.255 (ID09). Observed heterozygosity (H_O) ranged from 0.211 (PG11) to 0.270 (ID09). The largest discrepancy between expected and observed heterozygosity was in PG11, with a significant difference in the form of a heterozygous deficit of 0.022 (p < 0.05) (Table 4). Significant heterozygosity deficits were found in all Papua New Guinea samples, while an excess was present in the fish collected in Indonesian waters (p < 0.05). Calculations of the inbreeding coefficient F_IS were small or negative, ranging from −0.058 (ID09) to 0.095 (PG11) (Table 4). Effective population sizes (N_e) were estimated to range from 249 (95% confidence interval of 219–287; ID09) to 1,115 (confidence interval of 771–2003; PG11) based on linkage disequilibrium (Table 4). N_e values calculated using a heterozygosity-based algorithm are predictably less reliable than those assessed by linkage disequilibrium, and ranged from 14 (ID09) to infinite (multiple instances). When samples were amalgamated into Central and Western Pacific groups, N_e values calculated via the LD method jumped to 1,282 (range 1,012–1,750) in the Central Pacific and 1,585 (range 1,464–1,728) in the Western Pacific. A global analysis of N_e increased further to 1,751 (range 1,633–1,889). Finally, using the neutral primary dataset, 14 of 36 pairwise F_ST values, which represent the fixation indices and summarize the extent of allele frequency difference between two groups, were significant at a p-value of 0.05 after Bonferroni correction. Significant F_ST values ranged from 0.004 (between ID09 and PG09) to 0.008 (between ID09 and PG07) (Table 3A), with a mean value of 0.005. Similar trends in the results using the LUPS dataset were obtained (Table 3B). Only nine of 36 pairwise comparisons were significantly different after Bonferroni correction. Of these, only four correspond to the same sample pairs from the neutral primary dataset.

TABLE 4

Table 4. Inbreeding coefficient (F_IS), expected and observed heterozygosity (H_nb and H_o, respectively), p-values for significant differences between H_nb and H_o, p values for difference measured by T-test and Mann-Whitney Test (significant p-values in bold), and effective population size (N_e) calculated using the primary dataset.

A Mantel test comparing pairwise F_ST values against Euclidean distances showed a moderate and statistically significant association between distance and genetic differentiation (Supplementary Table S7), with an r² value of 0.643 and associated p-value of 0.001. It should be noted, the PG-joined dataset was not statistically significant (r² = 0.470, p = 0.123) in the same analysis. Alternatively, a decomposed pairwise regression found no indicator of outlier populations that might overshadow geographic patterns during analyses, which lends validity to the mantel test results. An AMOVA placed 0.04% of variation among catch events within sample groups, 0.5% among sample groups, and 99.2% within individuals, providing no evidence for hierarchical genetic sub-structuring.

Mantel and AMOVA results from data subsets provide helpful insight (Supplementary Tables S6, S7). No dataset allocated more than 1% of genetic variation to between sample groups. PG-joined was the most clearly structured at only 0.8%. However, all analyses produced significant p-values. The mantel test using the No Juveniles dataset improved the correlation between genetic and geographic distance compared to the primary dataset (r² = 0.713, p = 0.001), while PG-joined lost statistical significance (r² = 0.134, p = 0.301). Heterozygosity, F_IS, N_e, and F_ST values all increased slightly in absolute values over the primary dataset, but maintained their relative trends.

Four software programs that have been engineered to describe sample clustering and population assignment using SNPs all produced concordant results using the neutral primary dataset. For the LUPS dataset, only DAPC and fastSTRUCTURE were used to explore the potential number of divergent samples and produced concordant results with the above analyses. A Discriminant Analysis of Principle Components (DAPC, Figure 2) uses multivariate analyses in order to maximize differentiation between user-defined sample groups and visually represents them through scatterplots. Alpha-optimization recommended the use of 47 out of 73 possible principle components for the neutral primary dataset, resulting in scatterplots that very marginally separated FM08 from other sample groups, and otherwise strongly overlapped all groups. Alpha-optimization recommended the use of 34 out of 73 possible principle components for the LUPS dataset, resulting also in scatterplots that strongly overlapped all sample groups and thus, confirmed broadscale admixture with an even stronger signal than with neutral loci.

Netview R uses cluster analysis to represent population structure, but produces population networks based on genetic similarity among individuals, without prior knowledge of sampling groups. Netview R displayed high levels of admixture among individuals in a single, large, diffuse cluster, suggesting a weak level of fine-scale genetic structure (Figure 3). Network patterns stabilized when using maximum number of nearest neighbor (mk-NN) values > 15, however, population networks constructed at a range of mk-NN values are presented in Supplementary Figure S2.

FIGURE 3

Figure 3. Fine-scale population genetic structure network analyzed using Netview R at mk-NN = 24. Node sizes displayed are mapped to the relatedness of individual fish, where larger node diameters denote higher levels of relatedness.

The software fastSTRUCTURE predicts k, the true number of distinct genetic clusters within a sample set, based on marginal likelihood. Both, the neutral primary and LUPS datasets functioned best at k = 1, indicating the presence of a single genetic cluster (Table 5). ADMIXTURE software uses a similar model to predict k, but selects the best number of clusters by attempting to minimize the cross-validation error. The inclusive dataset also performed best at k = 1 (Table 5).

TABLE 5

Table 5. fastSTRUCTURE and ADMIXTURE results associated with each possible k of 1–10.

Finally, while a phylogenetic analysis is outside the scope of the current assessment, as we are not exploring the longer-term evolutionary history of the observed population samples, we produced a dendrogram based on a 1-proportion of shared alleles distance matrix. The star-like topography obtained produces very similar distances among individuals providing no novel insights. It has been included in Supplementary Figure S3.

Pearson correlations between pairwise F_ST and the month-average daytime sea surface temperature (SST) were not significant (p = 0.250). The correlation with chlorophyll-a values was likewise uninformative (p = 0.953) That said, the average difference in SST was only 0.9°C (maximum 2.2°C), and the average difference in chlorophyll-a concentration was only 0.14 mg/m³ (maximum 0.35 mg/m³) (Table 6), which may be insufficient changes to prompt genetic differentiation or other stratification.

TABLE 6

Table 6. Differences in SST (below diagonal, in °C) and chlorophyll-a concentrations (above diagonal, in mg/m³) between sample group locations.

BayeScan 2.1 computations identified eight loci at FRD 20% (i.e., three more than in the primary dataset), while HacDivSel runs identified 447 common SNPs among the 36 pairwise population comparisons. Two loci identified by BayeScan 2.1 were also detected by HacDivSel. Individual SNP F_ST values for HacDivSel-identified loci ranged from 0.0007 to 0.9902, with a median of 0.5117. The majority of putative outlier SNPs had an F_ST > 0.7 (n = 291). Using both approaches, 453 putative F_ST outlier loci were detected, and removal of these from the primary dataset generated a selectively neutral subset containing 1,026 SNPs. Significant population pairwise F_ST values for the new neutral dataset ranged from 0.002 (between ID09 and PW08) to 0.0063 (between PG12 and MH09) with a mean of 0.0035. Non-significant population pairwise comparisons were found between GL09 vs. MH09; PG07 vs. PG09; PG07 vs. PG11; PG09 vs. PG11; PG11 vs. PG12; and PG12 vs. PW08 Supplementary Table S8). Significant population pairwise F_ST values for the LUPS dataset ranged from 0.002 (between ID09 and PW08) to 0.0063 (between PG12 and MH09) with a mean of 0.0035. Non-significant population pairwise comparisons were found between GL09 vs. MH09; PG07 vs. PG09; PG07 vs. PG11; PG09 vs. PG11; PG11 vs. PG12; and PG12 vs. PW08 (Table 3B).

Discussion

Current analyses suggest that geographical distribution of the neutral genetic diversity among skipjack tuna in the WCPO is homogenous, and accordingly there is little need to revise extant general assumptions of the population’s health. Comparisons of results between the primary and No Juvenile dataset that excludes very small fish from PW08 show no change in overall trends, minimizing concern for overt age- or size-specific population structure. Likewise, the consistency between PG07, PG09, and PG12 in both the primary and PG-only datasets support a general rule of temporal stability in genetic structure. Furthermore, the distribution of the potentially adaptive divergence shows no change in the trends supporting the null hypothesis of panmixia (Figure 2B). There was also a small, but significant indication of heterozygote deficits or inbreeding in the primary dataset for all samples collected in Papua New Guinea, which may serve as an indicator of undetected sub-structuring at the local level through the Wahlund effect (Garnier-Gere and Chikhi, 2013) or other context that reduces skipjack’s ability to reproduce with genetically dissimilar fish. Collectively, heterozygosity and inbreeding coefficients give little indication of anomalies that could be linked to population structure for both neutral and potentially adaptive loci.

Two exceptions exist that challenge inferences of panmixia from diversity statistics. First, ID09 demonstrates a slight but significant heterozygosity excess that is retained across all datasets, in conjunction to a negative F_IS value. Of all sample locations, Indonesia is the most westerly and positioned at the edge of the ITF, which is a known semi-permeable barrier to multiple marine species (Jackson et al., 2014). It is entirely feasible that the Indonesian skipjack experience slight but significant interbreeding with fish from a distinct, Indian Ocean stock. Second, PG samples carry inbreeding coefficients that are significantly different from zero compared to other sample groups, especially PG11. It is possible that the presence of increased land masses around sample locations in the Bismarck Sea and Solomon Sea, and the consequential high propensities for eddies (Steinberg et al., 2006), insulate the observed region. Significant heterozygous deficits can be due to genotyping artifacts, selection, non-random mating, inbreeding, small effective population sizes, migration and admixture. Fisheries overexploitation has also been associated with heterozygous deficits. For example, Hoarau et al. (2005) found that populations of plaice (Pleuronectes platessa) examined between 1924 and 1960 were in HWE, whereas populations examined after a fisheries peak in the 1970’s were not. The authors were able to attribute the significant heterozygote deficiencies found from 1970 onward to inbreeding and overexploitation. Unfortunately, we do not have access to archived samples to verify if the analyzed skipjack populations were in HWE before the major expansion of tuna fisheries after WWII. Genotyping artifacts are unlikely to be a major cause because the quality filtering steps used in this study are quite stringent and only five loci deviated from HWE in the primary dataset (e.g., Kilian et al., 2012; Lal et al., 2017). Furthermore, specimens from PG sample groups proved fairly robust during quality filtering. Two or three specimens were deleted, compared to seven or eight individuals from FM08 and GL09, respectively. Diversifying selection is unlikely to explain this result because in spite of having detected up to 453 putative Fst outlier loci as potentially adaptive, no evidence of population genetic structure was evident. Finally, the uneven extraction of highly heterozygous and potentially related individuals from sample groups might impact the remaining patterns of diversity. However, without a means to explicitly reject the possibility of sample cross contamination, retaining the suspect specimens would risk introducing severe artifacts into results (Selwyn et al., 2016). The potential presence of kin deserves further exploration.

Interpretations of effective population size are more debatable. Of the available algorithms for calculating N_e, the LD approach is popular because it is precise and does not require temporal resampling to be accurate (Wang et al., 2016b). However, the presence of non-random mating or population structure can skew results. Excluding individuals with excessively close relatives (ECR), like those filtered out prior to locus filtering in the current study, is known to artificially inflate N_e calculated using the LD approach (Waples and Anderson, 2017). However, there is ongoing debates about whether it is more appropriate to refrain from manipulating datasets based on ECR and risk underestimating N_e (Waples and Anderson, 2017), or extract suspect individuals and possibly lose biologically significant population structure while simultaneously overestimating N_e (Wang, 2018). To sidestep the issue in present analyses, samples were also assessed for N_e using the heterozygosity excess method, which is characteristically less accurate but carries different assumptions and vulnerabilities (Wang et al., 2016b).

The current range of N_e values, from roughly 250 to 1100 using the linkage disequilibrium algorithm, could either function as red flags for conservation in Indonesia and the Federated States of Micronesia, or further support robustness of the skipjack population within the WCPO. The basic 50/500 rule (that a population with an N_e under 50 is in immediate danger of diversity loss through inbreeding, and an N_e under 500 is susceptible to long-term loss of evolutionary adaptability and extinction; Franklin, 1980) is a simplified but functional guideline for studies of terrestrial species. If applied to the current results, such parameters would merit management concerns about fishery pressure in at least two EEZs. However, mating dynamics in the ocean are very different to those in terrestrial environments. While a majority of genetically assessed terrestrial species utilize internal fertilization (Baur, 1994), skipjack and other pelagic species rely on broadcast spawning (Hutchings et al., 2002). Among other implications for marine species’ life history, broadcast spawning can produce a distinctive pattern of variable reproductive success described as sweepstakes reproduction (Waples, 2016). In this mechanism, small sub-units of fish broadcast spawn in tandem and in close proximity, producing thousands to millions of larvae that are either half- or full-siblings (Hedgecock and Pudovkin, 2011). Meanwhile, fish that are not effectively positioned or timed to partake in such an event produce no offspring. Consequently, the larger population is made up of many semi-independent cohorts, each of which is derived from only a few parents and has a drastically low N_e. In this context, the N_e values produced by skipjack tuna are far from exceptional. The rationality of a reproductive sweepstakes interpretation is further supported by discrepancy between our N_e estimates and independent calculations of the WCPO’s available spawning stock, which most recently was gauged at roughly 2 MT (Vincent et al., 2019).

In fact, current N_e values further emphasize the limited substructure evident among the skipjack sample groups. Long-term co-dispersal of cohorts produced under reproductive sweepstake conditions creates chaotic genetic patchiness, a very subtle and temporally unstable form of population structure that retains the signature of small N_e per cohort (Broquet et al., 2013), with values as low as 40 in the Pacific oyster, Crasssostrea gigas, in Washington state, United States (Hedgecock, 1994). The more cohorts produced by different parents mix, the larger the local N_e values will become. In the current study, N_e values generated for PG11 epitomize the trend; the sample group with the largest variance in fish length (340–690 mm) produced the largest N_e value regardless of a high inbreeding coefficient. More generally, all sample groups were assigned values at least a magnitude higher than C. gigas values, despite behaviors that would be sufficient to produce a similar pattern of chaotic genetic patchiness. Namely, skipjack express prolific schooling habits (Hilborn, 1991), and mating behaviors that have been observed to include extensive mating displays prior to synchronized gamete release between as few as two fish (Matsumoto et al., 1984). Although it is reasonable to argue that skipjack experience reproductive sweepstakes, this does not translate to a lifelong maintenance of chaotic genetic patchiness.

The results of an AMOVA, which placed effectively 100% of variation between individuals, rather than catch events or sample groups, further substantiate skipjack admixture. Therefore, even a shallow pattern of structure like chaotic genetic patchiness does not effectively describe skipjack tuna’s neutral population genetic dynamics. Mantel tests using two of four purportedly neutral datasets produced significant results, which supports an isolation by distance pattern of structure. However, the strength of the correlation is artificially bolstered by the five statistically not significant pairwise F_ST values among Papua New Guinea sample groups. The PG-joined dataset overcomes the skew in sample distribution, and fails to support isolation by distance.

Other analyses do little to change the trend. Pairwise F_ST analyses, on both neutral and potentially adaptive loci, to indicate potential population differentiation show statistical significance but low absolute values. The relationship between statistical and biological relevance of F_ST values is increasingly disconnected as studies continue to use more loci to explore population differentiation, with yellowfin studies now using up to 11,000 loci (Mullins et al., 2018). The improved genomic sample size lends statistical significance to lower F_ST values, independent of whether they carry implications for population structure. Failure to find significant structure through an AMOVA and population clustering and assignment algorithms (DAPC, Netview R, fastSTRUCTURE, and ADMIXTURE) supports the possibility of pairwise F_ST irrelevance. Alternatively, there is also precedent for low pairwise F_ST values proving relevant. Knutsen et al. (2011) identified similar neutral F_ST values, as low as 0.003, in Atlantic cod. In that species, statistical differentiation aligns with population structure inferred from tag-recapture trends. Strong evidence of adaptive variation through chromosome inversion along the same geographic delineations further supports Knutsen et al.’s neutral observations (Kirubakaran et al., 2016). However, coastal cod are well recognized for their site fidelity (Knutsen et al., 2011), whereas skipjack are notoriously mobile and must swim more than 60 km per day just to facilitate respiration (Sharp, 1978; Wild and Hampton, 1993). It is not clear if the same, already tentative significance can be applied to a species with a different life history.

Incomplete migration offers one possible explanation for the evident but potentially uninformative genetic differentiation found in skipjack. Large-scale movements in Pacific skipjack are neither well documented nor ubiquitous and predictable (Wang et al., 2016a). Based on otolith analyses, some Pacific fish remain residents of the tropics their entire lives, while others make repeated trips into subtropical water (Arai et al., 2005). No consensus exists on what determines which fish leave the tropics, but the large-scale movement can produce stochastic population structure through asynchronous spawning. Since skipjack breed year-round in the tropics (FAO, 2004), when migrating fish are and are not present, cohorts spawned during the different seasons may carry different genetic signatures. Simultaneously, fish that migrate as far north as Japan enter more productive hunting waters and, accordingly, experience a 6.5-fold increase in energy consumption in conjunction to reduced energy output over tropical residents, which creates energy excess to invest in growth and reproduction (Aoki et al., 2017). The resulting advantage likely carries some implication for the distribution of parental input in tropical cohorts produced during and between migrations, regardless of whether it represents biologically relevant population structure.

There is a long tradition of failing to find neutral population structure in skipjack tuna. Ely et al. (2005) failed to find structure on a global scale using mitochondrial haplotypes and RFLPs, and Menezes et al. (2008) concluded that the Indian Ocean was panmictic using microsatellites. While both of these studies use very few loci and acknowledge the possibility that population structure exists in unsampled genomic regions, the current study provides significantly better genomic resolution without challenging the same conclusion. However, it also assesses a much smaller geographical region than either of the mentioned studies. More specific to the region of interest, models of skipjack individual movements continue to emphasize high mobility within Western and Central Pacific regions (Scutt Phillips et al., 2018). In contrast, Menezes et al. (2006) argues for a division in populations between the Indian and Western Pacific Oceans using RFLPs in the mitochondrial D-loop region, and Richardson and Habib (1987) even recommends differentiating between New Zealand and Western Pacific stocks using purportedly neutral allozymes. Likewise, Fujino (1996) specifically recommended the Marshall and Gilbert Islands archipelagos as a semi-permeable barrier to skipjack that move from the Eastern to the Central Pacific during their own summer feeding migration. Again, the current study does not incorporate sample locations that would test the proposed stock boundaries; instead, it is focused strictly in the tropical Western and Central Pacific, which is a smaller sample range than almost any other skipjack genetic study. Although this study represents one of the most thorough population genetic assessments of skipjack to date using a battery of genome wide SNP markers, the high resolution collection strategy was more appropriate to capture fine-scale meta population structure than oceanic level genomic divisions. Moore et al. (2020a) provides an in-depth discussion of considerations when designing sample strategies for genetic assessments of tuna. In short, the current lack of neutral structure determined thorough exploration of the skipjack genome should be extended to a larger study area prior to deriving any regional or global interpretation.

Conclusion

The current study represents the most thorough population genomics assessment of skipjack tuna in their most productive fishing grounds, the tropical Western and Central Pacific Ocean, to date, and fails to demonstrate overt neutral genetic structure. Although we do observe subtle cues of structure, the biological significance is debatable. We recommend broader sampling that applies high-resolution genomic assessment to geographic regions that have previously been highlighted as potentially genetically distinct. Primary candidate locations include the tropical eastern Pacific and the Indian Ocean side of the Indonesian Throughflow. In short, while we show functionally insignificant population structure using high-resolution genomic coverage, we need further studies with larger geographical range before we completely dispel the concept of biologically relevant population sub-structuring of skipjack tuna within the Pacific Ocean.

Data Availability Statement

Raw datasets are publicly available from the Open Science Framework under Digital Object Identifier DOI: 10.17605/OSF.IO/MPE5U. mtDNA sequences are available from GenBank under accession numbers MT645970-MT645989.

Ethics Statement

Ethical review and approval was not required for the animal study because samples were taken from a pre-existing tissue bank and did not involve contact with live animals.

Author Contributions

CR wrote the project proposal and obtained funding from The University of the South Pacific. CR and GA designed the study. GA performed the research, analyzed the data, and wrote the manuscript. BS independently confirmed species identification and analyzed data. ML reviewed data quality and analyzed the data. JH, SN, and NS, as agents of the Secretariat of the Pacific Community (SPC), critically assessed the results from a fisheries perspective. SPC was also responsible for sample collection, facilitated sample access, and prepared Figure 1. All authors contributed in the preparation of the final manuscript.

Funding

The genomic analyses in this work were funded by The University of the South Pacific Strategic Research Themes intramural program project number F1006-R1001-71502-622. Funding for the tuna tissue bank is currently provided by the WCPFC as part of Project 35b.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We gratefully acknowledge all the fisheries observers and research staff who collected tissue bank samples, and Simon Nicol and Valerie Allain for establishing the tissue bank; Sylvain Caillot, as an agent of SPC, for preparing Figure 1; Maria Ines Rico and Francois Roupsard for technical assistance with sample preparation; Anthony Beeching and Feleti P. Teo from the Western and Central Pacific Fisheries Commission (WCPFC) for facilitating and granting access to the Tissue Bank.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2020.570760/full#supplementary-material

Supplementary Figure 1 | DAPCs using neutral loci from the primary (47 principle components, 8 degrees of freedom) and auxiliary datasets after excluding PW08 (no Juveniles; 42 principle components, 7 degrees of freedom), after disregarding all samples except those from Papua New Guinea (PG-only; 25 principle components, 3 degrees of freedom), and after replacing the four PG sample groups with a single subset picked from across all four PG sample groups (PG-joined; 36 principle components, 5 degrees of freedom). Placement of individuals relative to each other reflects their genetic similarity and extent of genetic structure resulting in scatterplots that strongly overlapped all sample groups and thus, confirmed broadscale admixture irrespective of sample selection.

Supplementary Figure 2 | Population genetic structure network analyzed using Netview R at mk-NN = 6–30. Node sizes displayed are mapped to the relatedness of individual fish, where larger node diameters denote higher levels of relatedness.

Supplementary Figure 3 | Population dendrogram based on 1-psa genetic distance matrix and NJ approach.

Supplementary Table 1 | Catch data for sampled individuals, available as spreadsheet “Supplement Sample metadata.xlsx.”

Supplementary Table 2 | Number of neutral and potentially adaptive loci remaining after each quality-filtering step for auxiliary datasets (i.e., no-juveniles, PG-only, PG-joined and LUPS). MAF, Minor Allele Frequency; FDR, False Discovery Rate; HWE, Hardy Weinberg Equilibrium; LD, Linkage Disequilibrium.

Supplementary Tables 3–7 | Comparison of basic genetic analyses between the primary dataset and neutral subsets that were filtered after excluding PW08 (no Juveniles), after disregarding all samples except those from Papua New Guinea (PG-only), and after replacing the four PG sample groups with a single subset picked from across all four PG sample groups (PG-joined).

Supplementary Table 8 | F_ST from auxiliary neutral dataset.

Footnotes

References

Alexander, D. H., Novembre, J., and Lange, K. (2009). Fast Model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. doi: 10.1101/gr.094052.109

PubMed Abstract | CrossRef Full Text | Google Scholar

Anderson, G., Lal, M., Hampton, J., Smith, N., and Rico, C. (2019). Close kin proximity in yellowfin tuna (Thunnus albacares) as a driver of population genetic structure in the tropical western and central pacific ocean. Front. Mar. Sci. 6:341. doi: 10.3389/fmars.2019.00341