Genetic structure and local adaptation of Neptunea cumingii crosse populations in China based on GBS technology

To identify the genetic characteristics and local adaptation mechanism of the snail Neptunea cumingii in different sea areas of China, specimens from six coastal areas of the Yellow Sea and Bohai Sea were collected. Simplified genome technology was used to study the population genetic structure and genetic diversity level of N. cumingii and to infer the genetic variation pattern of environmental adaptation of this species. In total, 1992 discrete loci with high quality were obtained used for population genomics analysis. The observed heterozygosity was 0.1551–0.1612, and the expected heterozygosity was 0.1064–0.1117. Nucleotide diversity ranged from 0.1120 to 0.1241, and fixation index values ranged from −0.04683 to −0.02041. A total of 330 discrete loci were screened based on two fixation index values and a method associated with environmental factors. Functional annotation showed that the genes of discrete loci were involved in the three major functions of cell composition, biological process, and molecular function, including growth and development and cell metabolism and catalytic activity. These results suggested that different populations of N. cumingii had loci that may be related to local adaptation. The results of this study helped to clarify the level of genetic diversity and the germplasm genetic background of N. cumingii. They also provided information about the genetic mechanism of environmental adaptation of N. cumingii that can be applied to the restoration and management of N. cumingii resources.


Introduction
The marine snail Neptunea cumingii (Gastropoda, Probranchia, Neptunea, Mothidae) is a large cold water species that is mainly distributed in the Yellow Sea and Bohai Sea in China (Cai, 2001;Guo et al., 2015). It is a high value mariculture species due to its delicious taste and because it is rich in nutrients, including a variety of trace elements, unsaturated fatty acids, and various amino acids (Hao et al., 2016).
Previous studies of N. cumingii mainly focused on fishery resources (Guo et al., 2015), reproduction and development (Yu et al., 2019), and its breeding physiology and ecology (Ge et al., 2022;Zhang Q. H. et al., 2022;, and little is known about its genetic diversity. Sui (2008) used microsatellite labeling technology to study the population structure of N. cumingii in the waters of Liaoning and Shandong in China, the results showed that the observed heterozygosity ranged from 0.173-0.865, the genetic similarity coefficient ranged from 0.858 to 0.992, the genetic distance ranged from 0.008 to 0.164, the similarity between the populations was great with the F ST value 0.006-0.188. Which indicated that there was differentiation among different populations, but the differentiation was small. Wang (2018)studied the genetic diversity of the mitochondrial Cytochrome b (Cytb) gene in six populations of N. cumingii from the Yellow Sea and Bohai Sea. He found that intra-population differentiation was dominant and genetic differentiation was not obvious. Azuma et al. (2011)used microsatellite technology to study specimens from six geographic populations from Hokkaido, Japan, who suggested that the limited gene flow might be caused by the species' poor dispersability, low genetic diversity and genetic differentiation between populations. Ma P. et al. (2022) and Ma C. L. et al. (2022) constructed microsatellite markers of snail populations in Dalian, Lushun and Weihai, and identified their genetic diversity by Illumina high-throughput sequencing and capillary electrophoresis. In recent years, few studies focused on the genetic diversity and genetic differentiation of N. cumingii population, and the existing studies mainly used microsatellite technology, which could not be used to conduct in-depth and accurate genetic analysis of N. cumingii populations.
The rapid development of high-throughput sequencing technology in recent years has resulted in unprecedented innovation in population genomics research. Simplified genome sequencing can identify a large number of mutation sites. Its relatively low cost is suitable for large-scale population studies, and it is often used for genetic mapping, genome-wide association analysis, and population genetic analysis (Emerson et al., 2010;Hohenlohe et al., 2010Hohenlohe et al., , 2011. Genotyping-by-sequencing (GBS) is a simplified genome analysis technique based on high-throughput second-generation sequencing. Restriction enzymes are used to segment and sequence the genome to obtain high-density single nucleotide polymorphisms (SNPs) throughout the genome, which can be used for genetic characterization analysis (Nimmakayala et al., 2014). The type II enzymes selected by GBS technology can well avoid the methylated part and determine which tags can be sequenced, especially for the species without reference genome at present, or species with highly repetitive DNA sequences and low polymorphism level in the genome have more advantages. Nowadays, using sequencing genotyping (GBS) technology to study the genetic structure of fish is becoming very mature. Létourneau et al. (2018) predicted the genetic impact of stocking in Brook Charr (Salvelinus fontinalis) by combining RAD sequencing and modeling of explanatory variables. Rougemont et al. (2019) used GBS to assess the effects on genetic structure and allelic diversity of indigenous species in a mass stocking of more than 1 million Muskellunge (Esox masquinongy) in the lower Lawrence River. Yu et al. analyzed the genetic structure and genetic diversity of two populations of Crassostrea gigas from the Bohai Sea and the Yellow Sea using GBS simplified genome sequencing technology. Which showed that although the genetic diversity of the Bohai Sea population was rich, both populations were in the state of heterozygote deletion. The population was moderately genetically differentiated. However, there were few reports on the genetic structure of shellfish using GBS technique.
In this study, GBS technology was used to study the genetic structure characteristics of six natural populations of N. cumingii in the Yellow Sea and Bohai Sea. From the perspective of the whole genome, the genetic diversity of the populations was systematically measured, the genetic structure of the population in the Bohai Sea and Yellow Sea was identified, and the genetic mechanism responsible for adaptation of different geographic populations to the local environment was evaluated. The results of this study provided an important reference for the conservation and management of N. cumingii genetic resources.  Table 1). They were placed in six 500 L tanks according to their sampling location for temporary culture for 7 days, and 50% of the water volume was changed every day.

Genomic DNA extraction
The abdominal muscle tissue (≥100 mg) from 90 specimen was removed, frozen in liquid nitrogen, and placed in a centrifuge tube for DNA extraction. Genomic DNA was extracted using a salting out protocol (Sunnucks and Hales, 1996). For nuclear SNP marker development, we performed DNA quality assurance, including a restriction enzyme digest using EcoRI (New England Biolabs, Ipswich, MA, United States; Online Resource 1). Finally, we measured DNA concentrations spectrophotometrically (NanoDrop 1000, Thermo Scientific, Waltham, MA, United States) and verified DNA integrity electrophoretically using agarose gel electrophoresis (0.8% agarose in 1× TBE buffer).

GBS library
Sequencing libraries were prepared according to the GBS protocol following (Elshire et al., 2011), except the selective primers described below were used. Single-end sequencing was performed on a single lane of an Illumina Genome Analyzer II (at the McGill University-Génome Québec Innovation Center in Montreal, Canada) for the initial step of the method using a 48-plex GBS library, of which eight barcodes were devoted to soybean Set A. Subsequent work was performed on an Illumina HiSeq2000 system (San Diego, CA, United States).

SNP detection and screening
SAMTOOLS (Li and Durbin, 2009) software was used to detect SNP in the filtered raw data. The filtering criteria were as follows: individual depth = 4, deletion rate miss = 0.5, and minimum allele frequency min/max autocorrelation factors = 0.01. Bayesian model was used to detect polymorphic loci in populations. Finally, SNP sites were screened, and the criteria were: SNP quality value ≥30 (Q30); SNP reads support number ≥ 20; Individual coverage rate ≥ 66.7%; Frontiers in Ecology and Evolution 03 frontiersin.org Minimum allele frequency (MAF) ≥5%; This SNP is retained when the allele frequency of a population is >0.2.

Genetic analysis
The evolutionary tree was constructed using the neighbor-joining method. After SNP detection, the individual SNPs obtained were used to calculate the distance between populations. Treebest-1.9.2 software was used to calculate the distance matrix. GCTA software was used for principal component analysis (min/max autocorrelation factors ≥0.05). Observed heterozygosity (H O ) and expected heterozygosity (H E ) per sampling location were calculated using the R package "hierfstat. " An analysis of molecular variance (AMOVA) was performed in Arlequin v.3.5 using 10,000 permutations to estimate F-statistics in order to detect population genetic partitioning between sampling locations and to estimate nucleotide diversity (Pi). Pairwise differences between localities were calculated with Nei's estimator of fixation index (F ST ) in the "hierfstat" R package; 999 permutations were used to calculate the respective p-values with B-Y false discovery rate (FDR) correction. The Bayesian information criterion was used to detect the optimal number of genetic clusters. PLINK was used to analyze population structureFirstly, PLINK's input file, Ped file, is created, and then admixture (Hu et al., 2019) software is used to construct population genetic structure and population lineage information. PLINK. 1 1 http://pngu.mgh.harvard.edu/~purcell/plink/  Lositan, BayeScan, and BayeNV were used to select the discrete sites from the filtered high-quality SNPs. Lositan and BayeScan software were used to screen the discrete sites based on F ST . Lositan analyses were performed with 1,000,000 simulations and a subsample of 50 in an infinite allele model using a "neutral" mean F ST (which means that the software performs a preliminary simulation to remove potentially selected loci before computing the mean neutral F ST ). Benjamini-Yekutieli FDR corrections were applied to the p-values of Lositan with an overall alpha level of 0.01. BayeScan search was implemented with 150,000 iterations, with a burn-in period of 100,000 and 40 pilot runs of 10,000 iterations each. Outlier loci in BayeScan were identified using a q-value of 0.01. BayeNV, which is associated with SNPs and environmental factors, allowed discrete sites associated with local adaptation to be further screened. In this process, the Markov Chain Monte Carle estimates the covariance matrix and then screens for discrete sites associated with environmental variables using a Bayesian factor, and Bayesian factor screening for discrete sites associated with environmental variables and an F ST discrete SNP marker was the adaptive locus.
The sequences of the candidate loci for local adaptation. The scaffold accession number, nucleotide position, name of the gene where the sequence is located, and position were recorded for matches with E-values <1 × 10 −4 . Those loci then subjected to BLAST2GO functional annotation analysis using the Gene Ontology (GO) database.
In this study, four geographical environmental factors were analyzed: latitude and longitude for each sampling point and the average sea surface water temperature for each sampling point 30 days and 90 days before sample collection. The water temperature data were downloaded from the national weather service satellite information database. 2 3. Results

Sequencing quality and SNP detection
Genotyping GBS sequencing results showed that 15 individuals in the HS group, 14 in the LS group, 15 in the PL group, 15 in the WH group, 14 in the YT group, and 8 in the ZZ group were sequenced successfully and produced 38602677472.00 bp of raw reads, with an average of 476576265.09 bp per individual (Table 2). After quality filtering, 36720237888.00 bp of valid data were obtained, with an average of 453336270.22 clean reads per individual. The GC content of clean reads was between 38.91% and 41.16%. The Q20 quality scores ranged from 91.41% to 96.09% and those of Q30 ranged from 81.28% to 90.81%, indicating that GBS sequencing results had high quality and could be used for subsequent analysis.
Among the clean reads, 115,615 SNPs were detected by SAMTOOLS software. The filtering criteria were as follows: individual depth = 4, deletion rate miss = 0.5, and minimum allele frequency min/  Table 2). H E was lower than H O . Among the six populations, the heterozygosity of the HS, YT, and ZZ populations was higher, and the Pi of the ZZ population was highest.

Population genetic differentiation
In the analysis of the population genetic structure of the six populations, the results of principal component analysis showed that WH and ZZ groups are relatively independent and the other four samples were clustered together (Figure 2). The results are consistent with those of phylogenetic tree analysis (Figure 3). When K = 2, the CVerror value was the smallest, thus it was the optimal K value. This indicated that the N. cumingii belonged in two clusters (Figure 4), which suggested that there was no significant genetic differentiation among different geographical populations. The F ST between populations ranged from −0.04683 (YT) to −0.02041 (HS) ( Table 3). The F ST values were low, indicating that there was low genetic differentiation among the populations. Analysis of molecular variance (AMOVA) showed that 0.27% of genetic differences came from among populations and 100.80% from within populations (Table 4).

Local adaptation of populations 3.4.1. Screening of discrete sites
Fifty-nine discrete loci subjected to directional selection were selected based on the Lositan and BayeScan methods, and these loci had relatively F ST values (−0.04683 to -0.02041). A total of 287 discrete loci were screened based on BayeNV association analysis of allele frequency and four environmental factors. Among these loci, 143 were associated with longitude and 83 were associated with latitude, and only 6 were jointly associated with these two factors. The results for discrete sites screened based on the two water temperatures differed significantly. The average water temperature factor 30 days before sampling was associated with 37 sites, and the average water temperature factor 90 days before sampling was associated with 116 sites. Only nine loci were jointly associated with these two factors.

Functional annotation and enrichment of discrete sites
Discrete loci were screened online using BlastX for gene function annotation. The results showed that 67 sequences had comparison results and 263 sequences had no comparison results. Blast2GO was used to carry out functional annotation for the 67 sequences with alignment results, and only 15 of them were mapped to the GO database. Based on the GO database analysis, these 15 sequences were involved in Biological Process (BP), Cellular Component (CC), and Molecular Function of the population, and their major functions included stress response metabolic process, biological regulation, catalytic activity, and other functional processes.

Genetic diversity of Neptunea cumingii populations
Genetic diversity is the sum of genetic variation between different populations or different individuals in the same population, and it is the basis for organisms to adapt to changes in environmental conditions and maintain long-term survival and evolution (Jia et al., 2020). Richer genetic diversity results in stronger breeding and genetic improvement abilities of the species and better adaptability to different    (Xue, 2015). Heterozygosity is an important index for evaluating the genetic diversity of a population. Populations with higher heterozygosity have richer population genetic diversity (Nei, 1978). Yu et al. (2017) used GBS technology nanalyzed the genetic diversity of the Pacific oyster (Crassostrea gigas) along the coast of Liaoning in the Yellow Sea and Bohai Sea and found that the average H O of the two populations was 0.2277 and 0.1723, respectively, and the H E was 0.2705 and 0.2375, respectively. These values indicated that the two populations were in a state of heterozygote loss. Previous studies have shown that the genetic diversity of shellfish was relatively rich, and the average heterozygosity was usually above 0.15 (Zhang et al., 2004). In this study, the H O values of the six populations in the Yellow Sea and Bohai Sea were all higher than the He values. Sui (2008)studied the genetic diversity of the N. cumingii population and found that the heterozygosity was 0.594, and Azuma et al. (2011) used microsatellite markers found that the heterozygosity was 0.580-0.811 in the study of important resources of N. cumingii in northern Japan, but in the study, the H O of the six populations of N. cumingii in the Yellow Sea and Bohai Sea ranged from 0.1551 to 0.1612, and the He was 0.1064 to 0.1117. The low level of heterozygosity was likely due to the methodology used in this study, which was very different from methods based on traditional molecular markers. Traditional methods only select a few marker loci as representative of the diversity of the whole genome. In contrast, the method used herein screens gene loci for the whole population based on population genomics. In addition, the genetic diversity of the six natural N. cumingii populations could be influence by differences in invalid alleles or the presence of non-random mating within the population.

Genetic differentiation of Neptunea cumingii populations
Genetic differentiation is the differentiation of a population caused by gene variation, and the F ST value is an important index of the degree of population genetic differentiation (Fu et al., 2013).
In the present study, the F ST of the six populations ranged from −0.04683 (YT) to −0.02041 (HS). The small F ST values indicated that there was low genetic differentiation among the six populations. Consistent with the results of principal component analysis, the WH and ZZ populations were separated from each other and the samples from the other geographical groups were clustered into one place without any obvious population genetic differentiation, in which the HS, LS, PL, and YT snails live is similar. Guess it might be because, the geographical distance between the WH and ZZ groups and their Genetic structure diagram of N. cumingii from the six populations. The genetic differentiation of the six N. cumingii populations in this study was lower than that reported in previous studies. Although a certain amount of differentiation was detected, no significant population geographic isolation was found, Wang (2013) used 16SrRNA to study the genetic diversity of the snails Thais luteostoma and Reishia clavigera and found that the genetic differentiation of both species differed greatly within the population, and no obvious genetic differentiation between the populations was detected. In this study, AMOVA indicated that only 0.27% of the genetic differences came from between populations and 100.80% came from within populations. The frequency of gene communication between populations may decrease with increasing geographical distance, so genetic differentiation At sites that are farther away from eacho ther is more likely to occur. However, several researchers found that there may be no significant correlation between genetic distance and geographical distance between populations due to the effect of different environments (Zhu et al., 2007;Liu et al., 2016). In the current study, salinity, water temperature, sediment type, feed, and other factors differed among sea areas of the Yellow Sea and Bohai Sea. Because different environments affected the degree of genetic differentiation, the degree of genetic differentiation among populations may not fully conform to the characteristics of geographical distribution.

Local adaptation of Neptunea cumingii populations
Local adaptation refers to the evolutionary process in which some groups of species are better adapted to the environment than others. The essence of local adaptation is frequent gene exchange between groups, resulting in population differentiation and trait differences, which can be inherited. In different environments, the genetic basis changes to adapt to environmental changes (Gienapp et al., 2008;Valladares et al., 2014), and differences in genetic components of different geographic populations are caused by the effects of natural selection on different populations (Thompson, 1998).
Use of the simplified genome sequencing echnique and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway analysis revealed that some metabolic pathways, such as calcium ion signaling and circadian rhythm, may be involved in local environmental adaptation of eel populations in different geographical regions (Gagnaire et al., 2012;Ulrik et al., 2014). In the current study, the annotation results of discrete sites showed that multiple discrete sites were related to energy metabolism, indicating that when the temperature was low, snails would use the stored energy in the body to adapt to environmental changes. Genes related to ion and membrane transport were also involved in the regulation of osmotic pressure, which played an important role in the adaptation to environmental changes (temperature). Previous reports showed that under heterogeneous conditions such as changing climate, water level, and disturbance in the environment, disproportionate selection led to the development of anisotropic traits in the population, thus generating local adaptation (Ravenscroft et al., 2014).
Adaptation is the result of interaction among various evolutionary forces, such as selection, genetic drift, mutation, and migration, and local adaptation of N. cumingii may be affected by many factors, including water quality, heavy metals, water temperature, salinity, parasites, predators, and physiological effects. The mechanism responsible for local adaptation may be very complex, as various factors interact to promote local adaptation. In summary, the signs of adaptive differentiation between populations based on SNP markers detected in this study were evidence of local adaptation on a micro-time scale, and the annotation results for discrete sites indicated the existence of local adaptation in different populations of N. cumingii.

Conservation of Neptunea cumingii resources
The results of genetic diversity and genetic difference level in population genetics can provide some theoretical basis for resource conservation. Therefore, in this study, the genetic characteristics of the N. cumingii population in China were preliminarily analyzed, and the results provide a theoretical basis for the formulation of relevant regulations and measures. The following methods can be adopted to protect and utilize the wild N. cumingii resources: (1) establish a key protection system for this species; (2) stipulate a strict fishing moratorium system; (3) strictly demarcate no-fishing zones; and (4) strengthen the protection of snails during breeding. In addition, appropriate artificial seedling cultivation or semi-artificial seedling collection can be conducted according to the geographical location of the snail population to implement proliferation and release, which can quickly restore the snail resources. However, more genetic information about the population genetic structure and characteristics of the N. cumingii population in China is needed.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://www.ncbi.nlm.nih.gov/, PRJNA929455.

Ethics statement
The animal study was reviewed and approved by University Animal Care and Use Committee of Dalian Ocean University.

Author contributions
ZH and YC designed the study. DZ, YT, and JM performed the study. LW, XW, and BT analyzed the data. BT wrote the manuscript. All authors contributed to the article and approved the submitted version.