Genome-Wide SNP Discovery and Population Genetic Analysis of Mesocentrotus nudus in China Seas

Mesocentrotus nudus is an important commercially aquatic species because of its high edible and medicinal values. However, wild stocks have dramatically decreased in recent decades. Understanding the population structure and genetic diversity can provide vital information for genetic conservation and improvement. In the present study, the genotyping-by-sequencing (GBS) approach was adopted to identify the genome-wide single-nucleotide polymorphisms (SNPs) from a collection of 80 individuals consisting of five geographical populations (16 individuals from each population), covering the natural habitats of M. nudus in China seas. An average of 0.96-Gb clean reads per sample were sequenced, and a total of 51,738 biallelic SNPs were identified. Based on these SNPs, diversity index analysis showed that all populations have a similar pattern with positive Fis (0.136) and low Ne (724.3). Low genetic differentiation and high genetic connectivity among five geographical populations were detected by pairwise Fst, principal component analysis (PCA), admixture, and phylogenetic analysis. Besides, two YWL individuals originating from an isolated ancestor may imply that there is a genetically differentiated population in the adjacent sea. Overall, the results showed that GBS is an effective method to detect genome-wide SNPs for M. nudus and suggested that the protective measures and the investigation with larger spatial scale and sample size for M. nudus should be carried out in the future.


INTRODUCTION
Sea urchin is an ancient group of marine invertebrates that belongs to the class of Echinoidea under phylum of Echinodermata. To date, about 850 species of sea urchins have been found worldwide (Harris and Eddy, 2015). Mesocentrotus nudus, a member of the family Strongylocentrotidae, inhabits the intertidal and subtidal rocky sea bottoms along the coast of northwestern Pacific (Agatsuma, 2020). M. nudus is a well-known edible sea urchin for its good taste and redundant nutrition of vitamins and unsaturated fatty acids in the gonad (Mi et al., 2014). Furthermore, sea urchin extracts have extensive biological effects, such as anticancer, antioxidant, antileukemia, anti-fatigue, and anti-inflammatory effects (Liu et al., 2007).
The high edible and medicinal values of M. nudus have greatly stimulated the consumption demand, which has led to serious overfishing of wild sea urchins (Andrew et al., 2002;Li and Qi, 2008). In addition, the natural habitat of M. nudus, mostly adjacent to the densely populated areas (Adachi et al., 2020), has also been heavily disturbed by human activities including land-sourced pollution, aquaculture pollution, and sea reclamation (Lukyanova et al., 2017;Song and Duan, 2019). Under the influence of overfishing and natural habitat loss, the wild population resources of M. nudus are declining rapidly in recent years (Andrew et al., 2002). Especially in China, M. nudus has been considered as an endangered species. Although the breeding of M. nudus has been implemented to solve the shortage problem of natural resource in China since the 1980s (Ding et al., 2007), the parents of the seedlings were mainly from the wild populations. Therefore, understanding the genetic attributes of wild populations of M. nudus not only provide basic information for resources assessment and management and but also can lay a foundation for germplasm improvement.
Molecular markers have become the preferred tools for population genetic analysis. However, previously, quite limited genetic information on molecular markers (e.g., AFLP, mtDNA, and SSR) for M. nudus were available; and the level and pattern of population genetic diversity and structure are still poorly understood (Zhang et al., 2012). Recently, advances in next-generation sequencing (NGS) have greatly reduced the cost of nucleotide sequencing and made the genetic marker discovery more convenient. Genotyping-by-sequencing (GBS) as a genomic approach can explore the data covering the whole genome range with reduced cost and also provide more high-resolution genetic information for species lacking genomic information (Elshire et al., 2011). So far, GBS has been widely applied to assess genetic diversity on marine organisms, e.g., Ostrea lurida (Silliman, 2019), Sillago japonica (Yang et al., 2020), and Haliotis discus (Wells and Dale, 2018). However, no report on the population genetic analysis of M. nudus by using GBS was found up to now.
In the present study, GBS was used to genotype a total of 80 individuals consisting of five populations, covering the distribution range of M. nudus in China seas. The objectives were to detect and genotype single-nucleotide polymorphisms (SNPs) at a genome-wide level and to characterize the genetic diversity and population structure of M. nudus. The results will provide useful information for the utility of GBS on the genetic analysis of sea urchin species and contribute to the population conservation and genetic improvement of M. nudus.

Sampling
In 2020, a total of 80 individuals were collected from five localities by SCUBA diving (16 individuals from each locality, Figure 1), covering the natural distribution range of M. nudus in China. Gonad tissue was sampled and fixed by ethanol on site. Genomic DNA of each sample was isolated from gonad tissue by using Plant Genomic DNA Kit (TIANGEN, Beijing, China) following the manual instruction. The purity and integrity of extracted DNA were determined by using a NanoDrop 1000 Spectrophotometer (NanoDrop, Wilmington, DE, United States) and electrophoresis on 1% agarose gel. Qualified genomic DNA was stored at −20 • C.

Sequencing and Genotyping
All samples were sequenced by using the GBS (Qi et al., 2018) method, which was performed by OE Biotech Co., Ltd. (Shanghai, China). Briefly, extracted DNA from each individual was digested with restriction enzymes PstI-HF and MspI, and the barcoded adapters and common adapter were respectively ligated on the PstI cut site and the MspI cut site of all samples by T4 DNA ligase. Then, the fragments of 300-700 bp were retrieved using recovery system of improved magnetic bead and amplified by PCR using high-fidelity enzymes. Finally, amplification fragments were sequenced on an Illumina Nova, PE150 (Illumina, Inc., San Diego, CA, United States). The generated raw reads were filtered to remove the potential adaptor sequences by using STACKS software (Catchen et al., 2013). After quality control, there was no reference genome available for aligning the clean reads for SNP discovery. Therefore, ustacks (-M 3), cstacks (-n 3), sstacks, tsv2bam, and gstacks programs using STACKS pipeline were used for de novo SNP discovery. The initial SNPs were filtered according to the following parameters: (1) minimum sequence depth was above 5; (2) the allelic number was equal to 2; (3) minor allele frequency (MAF) was greater than 0.05, and (4) missing rate was lower than 10%. The quality control of SNP data was performed using vcftools (Danecek et al., 2011) and R software (R Core Team, 2018).

Genetic Diversity Analysis
Population genetic diversity within overall populations and within each of the five populations was assessed. The polymorphic information content (PIC) was calculated using R package snpReady (Granato et al., 2018). The observed heterozygosity (Ho), expected heterozygosity (He), and inbreeding coefficient (F is ) were calculated at each SNP locus, and also across all loci, using hierfstat package in R software (Goudet, 2005). Besides, multilocus heterozygosity (MLH) and standardized MLH (sMLH) were calculated for all individuals using inbreedR package (Stoffel et al., 2016). The effective population size (Ne) using the linkage disequilibrium (LD) method was estimated by using NeEstimator V2.01 (Do et al., 2014). Considering the small size of each population, the Ne was only analyzed for overall populations.

Population Structure Analysis
Initial analysis of population structure was carried out by principal component analysis (PCA) using R package SNPRelate (Zheng et al., 2012) and plotted using Scatterplot3d package (Ligges and Mächler, 2002). In addition, the levels of ancestry and admixture proportions of individuals were assessed by using R package LEA (Frichot and François, 2015). The number of ancestral populations (k) ranging between 1 and 10 were tested, and 10 replicate analyses were run for each value of k. The optimal k was chosen based on cross-entropy and cross-validation errors  (Frichot et al., 2014). To better understand fine-scale patterns of genetic structure between populations, pairwise F st values were calculated with the function pairwise.neifst and significance was tested using 1,000 bootstrap replicates with the function boot.ppfst in the R package hierfstat (Goudet, 2005).

Phylogenetic Analysis
The individual-level phylogenetic tree was constructed using SNPhylo (Lee et al., 2014) with an automated bash shell script snphylo.sh. For this analysis, an LD threshold (r 2 = 0.8) was used to reduce SNP redundancy, and 1,000 bootstrap analyses were fulfilled. The individual phylogenetic tree was visualized by using R package ggtree (Yu et al., 2017). Besides, the populationlevel phylogeny using the maximum likelihood approach was also inferred by using TreeMix (Pickrell and Pritchard, 2012). Considering that the missing data can affect the accuracy of TreeMix inference, the SNPs with missing values were removed. The population-level phylogeny was drawn by using plotting_funcs.R script 1 .

Sequencing and Genotyping
All samples were sequenced by using GBS at efficient sequencing depths of approximately 34.11-fold. In total, 76.76 Gb of raw data was produced, with an average of 0.96 Gb per individual. After filtering, a total of 72.39 Gb of clean data was retained with an average effective rate (clean reads compared with raw reads) of 94.30%. Statistics on clean reads further showed the quality value 20 (Q20) ≥ 96.33% and quality value 30 (Q30) ≥ 90.41%, and the guanine-cytosine (GC) contents ranged from 40.43 to 42.07%. The reference sequence constructed using STACKS pipeline contained 439,316 tags with an average length of 277 bp, the longest one being 1,036 bp and the shortest one being 71 bp. Initially, 992,728 putative SNPs were identified. After quality control based on successive filtering steps for sequence depth, allelic number, MAF, and missing rate, a total of 51,738 biallelic SNP markers located on 15,562 tags were retained.

Genetic Diversity
Several diversity indices are shown in Table 1. The PIC ranged from 0.213 (LD) to 0.220 (DZD), with an average of 0.217. The

Population Genetic Structure
The PCA revealed that the majority of samples were clustered together, and no geographical patterns were observed (Figure 2). Similar results can be achieved from the admixture analysis (Figure 3). With the use of LEA, the minimum value of the cross-validation errors was 0.710 when k = 2, and the values continuously increased with k from 3 to 10 ( Figure 3A). All individuals were classified into two groups at the level of k = 2, but one group contained only two individuals from YWL ( Figure 3B). To further investigate the population structure, the analysis at k = 3 was also performed, and similar results were observed ( Figure 3C). The pairwise F st values ranged from 0.0001 to 0.0039 with a mean value of 0.0019 ( Table 2). The highest level of differentiation was observed between LD and YWL, whereas DZD and NHD differentiated the lowest.

Phylogenetic Tree
The individual-level phylogenetic tree was constructed with 33,178 biallelic SNPs (Figure 4), in which all individuals belong to the same branch except YWL8 and YWL10. The population-level phylogeny constructed with 5,460 biallelic SNPs showed that the five populations appeared to originate from the same group, and three migration events were modeled among different branches (Figure 5). Overall, the phylogenetic tree analysis coincided with the PCA and admixture analysis, which corroborated a high genetic connectivity with even biological mixing among all locations.

DISCUSSION
Reduced-representation sequencing could provide an opportunity for those species without prior genetic resources to understand their basic properties of the genome (Silliman, 2019;Yang et al., 2020). In this study, a large number of genome fragments were obtained for M. nudus by using GBS. By analyzing the DNA base composition, the percentage of GC  content was up to 41%, which was higher than that of the reported sea urchin, including Strongylocentrotus purpuratus (36.9%) (Sodergren et al., 2006), and Lytechinus variegatus (36.1%) (Davidson et al., 2020). Considering that genomic GC content is closely associated with genome size and significantly affects the genome functioning and species ecology (Šmarda et al., 2014), this result suggested that M. nudus may have a more complex genome than S. purpuratus and L. variegatus. This study is the first report to identify genome-wide SNPs for M. nudus and to determine the genetic diversity and population structure of this species in China seas. Several indices concerning heterozygosity were firstly analyzed across five geographical populations. In general, the DZD population displayed the highest genetic diversity. This result may benefit from the good ecological status in the surrounding waters of Danzi island especially after removal of the marine raft culture and minimizing the amount of waste water discharged into the coastal waters since 2010 (Li et al., 2013). In contrast, the Lidao Bay as an important fishery and aquaculture water was seriously affected by anthropogenic disturbance (Jiang et al., 2013;Wu et al., 2016), which may result in the lowest PIC, Ho, and He for LD population. Besides, compared with previous studies on the population genetics of aquatic animals by using genomewide SNPs, a moderate level of heterozygosity was observed for M. nudus in the current study. For example, the average Ho in M. nudus was lower than that in Oncorhynchus keta (Waples et al., 2017) and Paracentrotus lividus (Carreras et al., 2020), but higher than in several species such as Oreochromis urolepis (Nyinondi et al., 2020), O. lurida (Silliman, 2019), and Isocladus armatus (Wells and Dale, 2018). However, positive F is values have been observed in M. nudus. F is values could reflect the deviation from the Hardy-Weinberg equilibrium (HWE) genotype frequencies and indirectly reflect relative population heterozygosity. Generally, positive F is values may be caused by mating among relatives, life history trait with free-spawned planktonic sperm, and selection acting on the genetic markers (Zouros and Foltz, 1984;David et al., 1997;Whitaker, 2004;Addison and Hart, 2005). In the current study, the estimated Ne value for M. nudus was significantly lower than that for other marine species such as Chinook salmon (Larson et al., 2014), Salmo salar (Johnston et al., 2014), and Panulirus homarus (Al-Breiki et al., 2018). According to the proposal by Frankham et al. (2014), this value is too low to retain evolutionary potential for M. nudus and thus may lead to inbreeding. Besides, the life history of M. nudus can also contribute to the positive F is because the species like M. nudus with free-spawned planktonic sperm tend to have higher F is than those species that copulate or have some form of direct sperm transfer to females or benthic egg masses (Addison and Hart, 2005). In addition, the population were sampled along coastal zone of the Yellow Sea and Bohai Sea, where they have been heavily disturbed by anthropogenic activities including landsource pollution, aquaculture pollution, and reclamation. The phenomena of eutrophication and hypoxia are happening with an increasing frequency in recent years, resulting in drastic changes in the physical and chemical parameters of seawater such as nutrient content, dissolved oxygen (DO), and pH especially in a small spatial scale. These drastic and persistent environmental disturbances can not only lead to the reduction of population size but also impose strong selective pressure on M. nudus, which is relatively weak in activity and thus may contribute to the high F is value. Overall, the positive F is and low Ne suggested that the genetic diversity of M. nudus should be constantly monitored, and the protective measures should be carried out in China.
In order to assess the genetic differences among populations, pairwise F st values were estimated. The results showed that all F st values were far less than 0.05, which implies that there were low genetic differentiation and high genetic connectivity among five geographical populations. This result can be confirmed by PCA, admixture analysis, and phylogenetic analysis. This phenomenon may be mainly related to the lack of dispersal barriers in the Yellow Sea and Bohai Sea, and also the fact that M. nudus has a pelagic phase lasting for 1 month in the water column (Dautov et al., 2020). However, it is worth noting that two individuals from YWL deviated significantly from the main cluster and have little genetic association with other individuals. This phenomenon can also be found in similar studies. For example, individual ancestry analysis for five populations of Pacific cod showed that two individuals from coastal area had no similar ancestries as coastal fish (Drinan et al., 2018). Interestingly, these individuals displayed the same ancestry as the fish from the Salish Sea. Similarly, some seahorses from New Zealand were descendants of Australian ancestors (Ashe and Wilson, 2019). Therefore, in the current study, two differentiated individuals from YWL may imply that there is a genetically differentiated population in the adjacent sea. Especially, it is worth exploring whether the population structure of M. nudus from the Japan Sea will be different from that of the Yellow Sea and Bohai Sea, considering the two sea areas are separated by the Korean Peninsula, which might form a dispersal barrier. Besides, the small sampling size in the current study may be a potential factor leading to the cryptic genetic structure of M. nudus that may not be fully revealed. Therefore, the investigation with larger spatial scale and sample size of M. nudus needs to be implemented to fully assess the population structure and genetic diversity of M. nudus in the future.

CONCLUSION
In the present study, a total of 51,738 high-quality SNPs were detected, indicating that the GBS technology is a powerful tool for the genome-wide SNP discovery of M. nudus. Diversity index analysis showed that all populations have a similar pattern with positive F is and low Ne, suggesting that it is necessary to carry out the conservation of M. nudus in China. Low genetic differentiation and high genetic connectivity among five geographical populations were detected by pairwise F st , PCA, admixture analysis, and phylogenetic analysis. Besides, two individuals originating from an isolated ancestor were identified, which may be crucial for the conservation of genetic diversity and germplasm improvement. Thus, the investigation with larger spatial scale and sample size needs to be implemented to assess the population structure and genetic diversity of M. nudus in the future.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: GenBank (Accession number: PRJNA738544).

ETHICS STATEMENT
Each of the procedures that were used to handle and treat the Mesocentrotus nudus during this study was approved by the Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, prior to the initiation of the study and the Animal Management Regulations, revised on March 1, 2017, No. 676.

AUTHOR CONTRIBUTIONS
QW conducted the experiment and data processing. BL conceived and supervised the project. QW and YL collected the experimental animals. LY and LC contributed to preparing the genomic DNA for SNP genotyping. QW and BL wrote the manuscript. All authors have read and approved the manuscript.