Sample contamination explains evidence of close kin proximity in yellowfin tuna (Thunnus albacares) in the Western and Central Pacific Ocean

Anderson, Giulia; Macdonald, Jed I.; Lal, Monal; Hampton, John; Smith, Neville; Rico, Ciro

doi:10.3389/fmars.2023.1204884

BRIEF RESEARCH REPORT article

Front. Mar. Sci., 11 December 2023
Sec. Marine Molecular Biology and Ecology
Volume 10 - 2023 | https://doi.org/10.3389/fmars.2023.1204884

Sample contamination explains evidence of close kin proximity in yellowfin tuna (Thunnus albacares) in the Western and Central Pacific Ocean

Giulia Anderson^1*

Jed I. Macdonald¹

Monal Lal^2,3

John Hampton¹

Neville Smith¹

Ciro Rico⁴

¹Oceanic Fisheries Programme, Fisheries Aquaculture and Marine Ecosystems Division, Pacific Community (SPC), Noumea, New Caledonia
²Molecular Analytics Laboratory (MOANA-LAB), School of Marine Studies, Faculty of Science, Technology and Environment, The University of the South Pacific, Suva, Fiji
³Australian Centre for Pacific Islands Research, School of Science and Engineering, University of the Sunshine Coast, Maroochydore, QLD, Australia
⁴Instituto de Ciencias Marinas de Andalucía (ICMAN), Consejo Superior de Investigaciones Científicas, (CSIC), Puerto Real, Cádiz, Spain

Anderson et al.’s 2019 paper ‘Close Kin Proximity in Yellowfin Tuna (Thunnus albacares) as a Driver of Population Genetic Structure in the Tropical Western and Central Pacific’ provided observations of genetically related individuals among sampled yellowfin tuna. Resampling of some individuals from the original study produced very different sequencing results when compared with the original dataset, one possible explanation of which is cross-contamination among the original samples. Re-analyses produced no indication of strong population structure, including a very slightly and consistent heterozygosity deficit with only one statistically significant pairwise F_ST value of 0.002. We identified only one pair of genetically similar individuals that could still be an artefact of lingering contamination. These new results therefore support a highly connected population of yellowfin tuna in the study area, although we do not rule out the potential for local adaptation driven by non-observed loci or genetic sub-structure operating at more contemporary scales.

1 Introduction

Fisheries science and management has the potential to benefit significantly from incorporating genetic research in its toolbox. With the advent of next generation sequencing (NGS) in the 2000’s, which made it feasible to apply direct sequencing techniques on non-model species (including most commercially important fishes), the number and power of fisheries genetic and genomic applications has expanded rapidly. Today, it is possible to test an individual fish for species identification, provenance and age (Cusa et al., 2021; Hallerman, 2021; Mayne et al., 2021), to explore stock structure (Grewe et al., 2015; Brophy et al., 2020), estimate abundance (Bravington et al., 2016; Ruzzante et al., 2019), and to monitor ecosystems biodiversity through DNA metabarcoding (e.g. Oka et al., 2020). The variety of questions that can be addressed with confidence through the analysis of genome wide markers has contributed to an increasing recognition of the value of incorporating genetics into fisheries management in recent years (Bernatchez et al., 2017; Moore et al., 2020).

However, genetic assessments come with some challenges and limitations. Perhaps the most common challenge is ensuring DNA integrity during sampling and laboratory handling. The presence of cross-contamination in population genetic research can have a significant impact on the accuracy of results by artificially increasing the observed heterozygosity of individuals (Anderson et al., 2023). Should such data be incorporated into stock assessment without reservation, it could severely misguide management decisions. One example is Anderson et al.‘s now-retracted 2019 paper ‘Close Kin Proximity in Yellowfin Tuna (Thunnus albacares) as a Driver of Population Genetic Structure in the Tropical Western and Central Pacific’, which reported the observation of individuals within sample groups that were genetically so similar that they were possibly siblings. These results reopened discussion about the life history of the species in the Western and Central Pacific Ocean (WCPO). However, a re-analysis of the Anderson et al. (2019) dataset indicates that the original samples was cross-contaminated, and that the sequencing results were consequently compromised.

We present here a revised interpretation of the Anderson et al. (2019) study that incorporates new and higher-confidence sequencing data based on amended laboratory handling protocols and improved quality filtering protocols. In revisiting this study, we have two main goals: 1) to correct the record regarding the retracted study and to re-address the same question: is there fine- and/or meso-scale population structure among yellowfin tuna in the WCPO? And 2) to demonstrate the importance of vigilance against cross-contamination in NGS datasets.

2 Methods

Research was conducted in two stages. First, in 2016, a selection of yellowfin tuna samples was accessed from the Pacific Marine Specimen Bank (PMSB), which is managed by the Pacific Community (SPC) under the auspices of the Western and Central Pacific Fisheries Commissions (WCPFC). Samples were selected based on three criteria: 1) that individuals in each group of 30+ specimens were collected within six weeks and a 330 km radius from each other, 2) that groups from three locations had been collected in the same season, and 3) that three groups came from the same EEZ and season, but in different years. The 279 selected specimens, ranging in size between 28 and 140 cm fork length, were thus organized into seven sample groups across six years and four countries (Figure S1). These groups were from the Federated States of Micronesia in 2009 (FM09) and 2011 (FM11), the Gilbert Islands of Kiribati in 2009 (GL09), New Caledonia in 2014 (NC14), and Papua New Guinea in 2009, 2011, and 2013 (PG09, PG11 and PG13, respectively).

Frozen muscle tissue samples from the PMSB were subsampled for genetic analysis by extracting a 2mm³ volume from the interior of each tissue plug using a scalpel that was sterilized with ethanol and flame exposure after any cut that contacted the sample surface. The subsamples were then sent to Diversity Arrays Technology (DArT PL) in Canberra, Australia, for DNA extraction and de novo genetic sequencing using the DArTseq protocol. The sequencing process includes some proprietary steps but was described by DArT PL as follows (see also Sansaloni et al., 2011; Kilian et al., 2012; Ren et al., 2015).

Following automated DNA extraction, samples were digested using PstI and SphI restriction enzymes. Methylation-sensitive enzymes were chosen to avoid highly repetitive, methylated genomic regions that are minimally informative and which tend to carry an elevated risk of misinterpreting paralogs as a single locus during marker calling. Specialized adaptors were ligated to digested DNA. Both PstI and SphI adapters included a PCR primer sequence and Illumina flowcell attachment sequence, and the PstI adaptor also included a unique, varying length barcode sequence for sample recognition within pooled libraries. PCR selectively amplified fragments capped with both adaptors, using the following protocol: 1 min denaturation at 94°C, 30 cycles of 20 s at 94°C, 30 s and 58°C and 45 s at 72°C, and a final extension step of 7 min at 72°C. Libraries were then further amplified using bridge PCR on the Illumina HiSeq 2500 platform and sequenced. The resulting data was submitted to an in-house software, DArTsoft, which interprets sequences from images of fluorescence taken during Illumina sequencing and produces FASTQ files. Files were quality controlled for sequences with 90% confidence at 50% of bases, and split by barcode into individual specimens. Sequences were aligned de novo. A separate algorithm, DArTsoft14, called SNPs and further quality filtered for singletons and other suspected sequencing errors. The final output produced by DArT PL was a genotype report of all identified SNPs, their global call rate, polymorphic information content, and their co-dominant status in each sequenced specimen.

The resulting data was published in the now retracted paper by Anderson et al. (2019).

The second stage of research began in 2021. As part of a larger exploration of genetic cross-contamination in the context of fisheries science, 94 samples from the Anderson et al. (2019) dataset were re-sequenced. The selected samples included all individuals that were identified as part of half- or full-sib pairs in the Anderson et al. (2019) paper, plus a random selection of other individuals to capture the impact of resampling on specimens that were not involved in the kin-versus-contamination discussion. This time, subcutaneous subsampling was conducted by a staff member at SPC who sterilized cutting tools with 10% bleach followed by a Milli-Q water wash between incisions, rather than detergent wash followed by ethanol and flame exposure. This procedure has been validated as a method to recover contaminated samples using experimentally generated contaminated samples (Anderson et al., 2023). Samples were otherwise handled the same way as per Anderson et al. (2019) and processed using the same methods, except that DArT PL has since upgraded to an Illumina Novaseq platform. The new samples were co-analysed with the 2016 dataset during SNP calling. Data as returned by DArT PL is available at DOI 10.17605/OSF.IO/YRFM3 (https://osf.io/yrfm3/).

Returned data was further processed to replicate the Anderson et al. (2019) methods, but subjected to additional tests for cross-contamination. First, the re-sequenced data was compared against the original data from the same specimens and the version with lower observed heterozygosity selected for further use, since elevated observed heterozygosity is a common indicator of cross-contamination (e.g. in Petrou et al., 2019). Subsequently, the dataset was filtered for low quality loci using the same thresholds as Anderson et al., 2019. All but the locus with the highest call rate per contig was discarded, plus loci with call rate below 99%, heterozygosity above 50%, read depth less than 7x, or minor allele frequency less than 5%. Further, loci were tested for compliance with Hardy Weinberg Expectations (HWE) at p > 0.0001, linkage disequilibrium at < 60%, and departure from neutrality with respect to local selective pressures at a false discovery rate of 0.1. Tests for HWE and linkage disequilibrium (LD) were conducted with PLINK 1.9 (Chang et al., 2015) using the commands ‘—hardy’ and ‘–r2’, respectively. Loci under potential selection were identified using both OutFLANK v 0.2 (Whitlock and Lotterhos, 2015) and BayeScan v 2.1 (Foll and Gaggiotti, 2008) with a burn in of 50000 steps and prior odds of 10, followed by 20 pilot runs of 5000 iterations.

Alternative filtering regimes were also explored with heterozygosity cutoffs reduced to 90%, read depth increased to 10x, and a minor allele frequency reduced to 1.8% (equivalent to an allele being present 5 times in the dataset), but these datasets were less successful at flagging remaining contaminated individuals and did not significantly change downstream analyses. Basic results using these thresholds during final filtering are available in the Supplementary Materials.

It should be noted that methods exist to filter contaminated loci and individuals at this stage. The program ART-DeCo (Fiévet et al., 2019) interprets an unbalanced allelic ratio per heterozygote loci as contamination, and O’Leary et al. (2018) encourages use of the same filter to address a number of potential data quality issues. The present dataset does not include the allelic balance raw data necessary for such analysis. We therefore proceeded with filters more in line with those described in Guo et al. (2014), which include heterozygosity rate and genetic similarity between individuals. Namely, after the filtering described in the previous paragraph, all specimens were again assessed for missing data and individual heterozygosity, the latter calculated as the mean heterozygosity for that individual across all the loci surveyed. First, specimens with more than 20% missing data were removed from further analysis. This threshold was selected because of a clear break in data quality, with samples either demonstrating less than 20% missing data or more than 40%. Second, we elected to set a threshold for removal due to high heterozygosity as the median individual heterozygosity value (calculated across all specimens) plus the difference between median and minimum values. This calculation was preferred over z-score assessments (recommended by Anderson et al., 2023), after a Shapiro-Wilk test in the ‘stats’ package in R (R Core Team, 2023) indicated that the distribution of individual heterozygosity values was not normal and therefore parametric assessments were not appropriate. Our approach was not subject to parametric assumptions about distributional spread, and produced similar, though slightly more conservative thresholds for filtering (i.e. 0.2649) compared with approaches using z-scores (0.2884, stabilized after five iterations) or using the mean individual heterozygosity in the calculation instead of the median (0.2869). After extraction, the remaining 247 individuals were filtered again using the same quality metrics and thresholds as during preliminary filtering. These steps produced a ‘final’ genotype dataset for further analysis.

We again applied the Shapiro-Wilk test to test for normality of the heterozygosity distribution in this final dataset.

Downstream population genetic analyses replicated those conducted by Anderson et al. (2019). The inbreeding coefficient (F_IS) and adjusted expected and observed heterozygosity (H_nb and H_o) values were obtained for each sample group using GENETIX v. 4.05 (Belkhir et al., 2004), while pairwise F_ST (Weir and Cockerham, 1984) was calculated using Arlequin v 3.5.2.2. ADMIXTURE v. 1.3.0 (Alexander et al., 2015) was used to recommend the number of independent genetic clusters (k) among all sampled individuals. Analyses were run with hypothetical k values ranging from 1 to 10, and the optimal k value was selected based on a low coefficient of variance. Discriminant Analyses of Principal Components (DAPC) in the ‘adegenet’ R package v 2.1.10 (Jombart and Ahmed, 2011) was first applied to infer cryptic structure within the dataset using k-means clustering to determine the number of underlying groups. DAPC was also applied using the sampling units as a priori groups to visualize the heterogeneity among geographic populations and sampling years. In both analyses, cross validation was used to determine the number of principal components retained.

Additionally, in keeping with the original study, we explored relatedness using three software programs, RelateAdmix (Moltke and Albrechtsen, 2014), KING v 2.3.1 (Manichaikul et al., 2010) and COANCESTRY v. 1.0.1.7 (Wang, 2011). The algorithm employed in RelateAdmix maximizes its accuracy when working with admixed populations, which is one plausible population model for tuna. Meanwhile KING employs an algorithm that is robust to unknown population structure. COANCESTRY allows for the consideration of inbreeding, which was a relevant feature for some sample groups in the original study. RelateAdmix requires a priori information about general genetic clustering, which was recycled from ADMIXTURE. The output.P and.Q files from the selected ADMIXTURE analysis (k=1) were submitted to RelateAdmix, and output values used to calculate θ = (k₁/4) + (k₂/2), where k₁ and k₂ are respectively the probabilities of sharing one or two alleles from a shared ancestor, and θ is the expected fraction of two genomes that is identical by descent. KING was run via command “related” and produces the kinship coefficient φ, plus a synthesis of dyads that are related in the first, second and third degree. COANCESTRY was run using the TrioML algorithm, which was empirically demonstrated to best describe the relatedness coefficient (r) among the seven algorithms available in that software package after conducting preliminary simulations as described in the COANCESTRY user manual.

RelateAdmix and COANCESTRY both produce relatedness coefficients, but no recommendation on how to differentiate full sibling pairs from half siblings and more distantly related individuals. To determine cut-off values for categorizing relatedness coefficients, we again used COANCESTRY to simulate 200 dyads each of full sibling, half siblings, first cousins, and unrelated individuals based on the same allele frequencies as the original population samples. The simulated genotypes were then submitted to both COANCESTRY and RelateAdmix, and results used to calculate the lower 95% confidence interval of the mean of each group, assuming a normal distribution. These values became the cut-offs for delineating full- and half-siblings in empirical datasets in the respective software programs.

3 Results

Of the 94 specimens that were resampled in 2021, 73 were selected to replace existing data in the original dataset. The updated dataset including all 279 non-duplicated individuals was filtered and retained 1,417 of 63,351 loci (Table 1). This dataset was used to assess individuals for missing data and high heterozygosity. The Shapiro-Wilk test conducted on the H_o distribution at this stage produced W = 0.519 and p < 2.2e-16, and the associated skewness coefficient was 4.77, indicating marked divergence from normality (Figure S2). Four individuals were then removed due to missing data, and another 29 due to high heterozygosity. Subsequently, filtering again with the same thresholds but only 247 high quality individuals, 3,781 loci passed all quality checks (Table 1).

TABLE 1

Table 1 Number of loci retained through quality filtering process for initial data exploration (‘Initial’) and the final dataset (‘Final’).

The Shapiro-Wilk test on the H_o distribution for these remaining 247 individuals produced W = 0.940 and p = 1.72e-8, with skewness coefficient 0.50, (Figure S2). Although the continued non-conformity to a normal distribution suggests that our final genotypic dataset is not entirely free of all contamination, based on the relationship between contamination and inflated heterozygosity demonstrated in Petrou et al. (2019) and Anderson et al. (2023), the substantial reduction in the amount of skewness suggests that cross-contamination is likewise much lessened. We therefore proceeded with a reassessment of the yellowfin tuna population structure results published in Anderson et al. (2019).

Observed and adjusted expected heterozygosity values (H_o and H_nb) and inbreeding coefficients (F_IS) for each sample group are summarized in Table 2. All sample groups feature H_nb values of 0.26-0.27 and a slight heterozygosity deficit of 0.01-0.02 (on a scale of 0 to 1). Likewise, there was a small, yet consistent indication of inbreeding (0.05-0.07 on a scale of -1 to 1). The fixation index F_ST is presented in Table 3 and showed only values of 0.001-0.002 (on a scale of 0-1). Furthermore, only the comparison of FM11 and PG13 was statistically significant at p < 0.05 after Benjamini and Yekutieli (2001) FDR correction, and the corresponding F_ST estimate was only 0.002. K-means clustering recommended a k of 1, indicating strong evidence for the existence of only one underlying population. Similarly, the DAPC results with a priori groups showed minimal differentiation among sample groups (Figure 1).

TABLE 2

Table 2 Number of samples in the final dataset (n, followed by the number of samples extracted during filtering in parentheses), adjusted expected heterozygosity (H_nb), observed heterozygosity (H_o) and inbreeding coefficient (F_IS) produced by the final dataset.

TABLE 3

Table 3 Fixation index (F_ST) values below the diagonal and related p-values above the diagonal.

FIGURE 1

Figure 1 Discriminant analysis of principal components using 60 principal components, 6 discriminant functions, and a priori groupings. A synthesized distribution of specimens reflecting their genetic similarity and coloured according to group.

A full, side-by-side comparisons of these results to data from the original 2019 paper is available in the Supplementary Materials (Tables S1-S5, Figure S3) to demonstrate the changes brought about by resequencing and the updated tissue extraction procedures and data filtering steps. For example, heterozygosity excess of the FM09 and FM11 sample groups dropped from 0.07 to -0.015, and F_IS increased from -0.255 to 0.065.

Finally, there was strong agreement between relatedness analyses on the (minimal) presence of kin among specimens. COANCESTRY, RelateAdmix and KING all identified the same, lone pair of individuals showing a relatedness coefficient within the range of simulated half sibs in the entire dataset, along with one or two cousin-level pairs. COANCESTRY and RelatedAdmix also reported several thousand dyads with relatedness coefficients lower than the threshold for cousin pairs, but more than would be expected from unrelated individuals (specifically, 8399 dyads using COANCESTRY and 5727 using RelatedAdmix), while KING identified 86 such dyads. Direct comparisons of these results with the results of Anderson et al. (2019) are available in Table S5 in the Supplementary Materials.

4 Discussion

The results presented here, in contrast to the conclusions of the retracted paper (Anderson et al., 2019), leave little evidence of strong genetic population structure for yellowfin tuna across our study region in the WCPO. Previous studies that generated levels of genetic divergence comparable to those inferred in this work have been hesitant to infer biologically significant structure patterns (e.g. Gonzalez et al., 2008). The observed heterogeneity and deficit in heterozygotes can most likely be attributed to transient ecological assemblages (Nikolic et al., 2020), which our sampling featuring spatially and temporally restricted collections was well designed to highlight. The findings of this study are consistent with expectations for a tropical tuna species like yellowfin tuna, which is highly mobile and widely distributed. Previous research has shown that adult yellowfin tuna populations exhibit genetic structuring at broad spatial scales (e.g. [Appleyard et al., 2001; Grewe et al., 2015; Pecoraro et al., 2018; Evans et al., 2019; Grewe et al., 2019; Proctor et al., 2019]; and reviewed in [Moore et al., 2020; Hamer et al., 2023]). The structure patterns described in these studies may be influenced by factors such as incomplete migration and the presence of semi-permeable oceanographic barriers (Pecoraro et al., 2018).

The lack of genetic variation we observed in adults and sub-adults from across the study region is also consistent with reproductive studies and historical larval collections that suggest that yellowfin tuna can spawn year-round across a large, continuous area in tropical waters of the WCPO, leading to high potential for mixing during early life stages (Nishikawa et al., 1985; Schaefer, 2001; Reglero et al., 2014; Muhling et al., 2017; Ijima and Jusup, 2023). Small juvenile yellowfin tuna (i.e. ~20-40 cm fork length) are regularly caught in purse seine, pole and line and ringnet fisheries throughout the equatorial and tropical WCPO, from Indonesia to Hawaii and from the Solomon Islands to south of Japan (Hamer et al., 2023; Macdonald et al., 2023). Taken together, these findings indicate that juvenile yellowfin tuna are widespread across our study region, consistent with the widespread larval distribution. No discrete (in space or time) spawning or nursery areas have been identified to date in this region. That said, genetic assessments of larval or early juvenile stages may reveal more structure due to limited mobility and/or mixing opportunities during those life phases. Indeed, an excess of sibship was found among 5-11 day old larvae of the related Atlantic bluefin tuna samples by McDowell et al. (2022). Further representative sampling of larvae and juveniles has long been identified as a research need to improve our understanding of tropical tuna population structure, and plans are afoot to obtain larval samples during upcoming collaborative research cruises.

While the steps taken in this work to address cross-contaminations have improved the reliability of the corrected dataset, caution should still be exerted during interpretation of the results. The present study implemented enhanced protocols for sample handling and post-sequencing checks, resulting in a significant reduction in sample cross-contamination compared with the Anderson et al. (2019) dataset. However, it is probable that some traces of contamination persist in the data. This hypothesis is supported by the non-normal distribution of H_o (depicted in Figure S2) and the correlation between high heterozygosity with elevated genetic similarity in relatedness assessments, in line with indicators of cross-contamination proposed by Guo et al. (2014). In all three evaluations of relatedness, twelve out of the thirteen dyads with the greatest relatedness coefficients consist of at least one individual whose heterozygosity is above the 90^th percentile. The remaining dyad consists of two individuals with heterozygosity above the 75^th percentile. The dyad exhibiting the strongest relatedness coefficient in which both individuals that fall below the 75^th percentile is related at the 4^th degree (e.g. half-first cousins), a degree of similarity more likely to arise naturally and not through contamination effects. The sole statistically significant pairwise FST value could potentially be compromised considering that the two groups involved in the comparison (FM11, PG13) contained a substantial proportion of individuals that were removed from the dataset during filtration based on our criteria to infer contamination.

Bioinformatic pipelines such as those developed by Simion et al. (2018) and Fiévet et al. (2019) are available to identify and filter out compromised loci and individuals from next generation sequencing datasets. These methods require sequencing read processing and SNP calling to be done in house. This was not possible in the present study, which sought to remain compatible with the Anderson et al. (2019) dataset by utilising DArT PL services to process raw data through SNP calling and to undertake preliminary quality filtering. Regardless, given the strength of evidence against the presence of strong population structure as shown by our new data, we believe that additional filtering steps are unlikely to alter this overall conclusion.

Some discussion is warranted on the likely source of contamination events in the original dataset presented by Anderson et al. (2019). Handling of samples always introduces the risk of contamination, even by experienced field and laboratory staff. Even professional sequencing labs can be a potential source of contamination (Ballenghien et al., 2017). The use of ethanol and flame treatment during the sample preparation phase was also possibly a weak point, since ethanol does not denature DNA and heat is likewise not recognized as a highly effective treatment (Wang et al., 2014). However, if this was the sole source of contamination, then samples would likely have been compromised at the same rate across groups. Instead, the concentration of contamination among Micronesian samples from FM09 and FM11 suggests that local deviations in onboard or onshore sampling protocols may have increased the rate and intensity of contamination in those groups. Situations like this directly highlight the need for the development of standard operating procedures for genetic sampling (Anderson et al., 2023) and effective communication of these SOPs among fisheries observers and other biological samplers. Notably, the success of resequencing in dramatically reducing contamination rates in the present study further validates the subsampling protocol proposed by Anderson et al. (2023) as an effective means of recovering compromised tissue samples. The approach is applicable to samples obtained either at sea or in port, in cases where standard operating procedures for genetic sampling were not adhered to, where the history of sample handling is unknown, or when tissue collection was primarily intended for non-genetic purposes (e.g. muscle stable isotope analysis – Lorrain et al., 2020).

Finally, it must be acknowledged that this study is still only assessing a subset of de novo SNP loci from the yellowfin tuna genome, and only assesses adults and sub-adults. There remains the potential for different conclusions using different loci, including alternative neutral loci and adaptive loci (especially given ongoing efforts to sequence full reference genomes for the various tuna species), different life stages, and different data types. Evidence of population genetic structure among yellowfin tuna at a Pacific-wide scale has been provided (e,g. Grewe et al., 2015; Pecoraro et al., 2018). However, we note no additional peer reviewed nuclear genome research has been published on yellowfin tuna since 2019. We encourage continued exploration of the population dynamics and genetic structure of yellowfin tuna both at the local and ocean-basin scale, particularly through collaborative assessments, ideally combining inference from multiple data sources (e.g. genetics, otolith chemistry and shape analysis, tagging, and modelling– e.g. Sakamoto et al., 2019; Taillebois et al., 2021). We also encourage ongoing efforts to publish reference genomes for yellowfin tuna and the pursuit of further larval and young juvenile samples.

5 Conclusion

We revisited data published in 2019 that indicated population genetic structure among yellowfin tuna in the WCPO. By resequencing a suspect part of the dataset, we found that the results were not reproducible. Instead, our new results indicate a lack of evidence for genetic population substructuring across the study area spanning Papua New Guinea to the Gilbert Islands, Kiribati, over six years of sample collection. The most likely explanation is that the original dataset was compromised by cross-contamination, which was reduced through the amended laboratory handling and data filtering protocols used in the present study.

It is possible that population structure exists but was not observed in this study, in particular local adaptation driven by loci under selection not captured by our genotyping assay. Structure may also be revealed when sampling alternative life stages as shown in other tunas. Finally, population structure can occur at time scales that are important to management processes but not measurable by genetic processes. We therefore encourage continued exploration of yellowfin tuna population structure using a diverse array of methods. We also highlight the need for the refinement and roll-out of sampling standard operating procedures for fisheries genetics and genomics applications that reduce the potential for sample cross-contamination from the point of collection to sequencing.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://osf.io/yrfm3/.

Ethics statement

Ethical approval was not required for the study involving animals in accordance with the local legislation and institutional requirements because samples are preexisting from a tissue bank and were not collected for this project.

Author contributions

Conceptualization: GA, JM, ML, CR. design and methodology: GA, JM, ML, CR. data collection: GA, CR. analysis of the findings: GA, JM, JH, NS. drafting of the manuscript: GA, JM. All authors contributed to the article and approved the submitted version.

Funding

Funding was provided by the Western and Central Pacific Fisheries Commission (Project 35b) and the European Union “Pacific-European-Union-Marine-Partnership” Programme (agreement FED/2018/397-941). This publication was produced with the financial support of the European Union and Sweden. Its contents are the sole responsibility of the authors and do not necessarily reflect the views of the European Union and Sweden.

Acknowledgments

We thank the Pacific Marine Specimen Bank for repeated access to samples; Ines Rico and staff from both the Pacific Community and the University of the South Pacific for supporting sample and data handling in both 2016 and 2023; and Diversity Arrays Technology for exceptionally providing a comparative dataset that incorporated DNA sequences from 2016 and 2023.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2023.1204884/full#supplementary-material

References

Alexander D., et al. (2015) Admixture 1.3 Software Manual. Available at: http://www.genetics.ucla.edu/software/admixture/admixture-manual.pdf.

Google Scholar

Anderson G., Lal M., Hampton J., Smith N., Rico C. (2019). Close kin proximity in yellowfin tuna ( Thunnus albacares ) as a driver of population genetic structure in the tropical Western and Central Pacific Ocean. Front. Mar. Sci. 6. doi: 10.3389/fmars.2019.00341

BRIEF RESEARCH REPORT article

Sample contamination explains evidence of close kin proximity in yellowfin tuna (Thunnus albacares) in the Western and Central Pacific Ocean

1 Introduction

2 Methods

3 Results

4 Discussion

5 Conclusion

Data availability statement

Ethics statement

Author contributions

Funding

Acknowledgments

Conflict of interest

Publisher’s note

Supplementary material

References

People also looked at