Population Genetic Structure and Gene Expression Plasticity of the Deep-Sea Vent and Seep Squat Lobster Shinkaia crosnieri

Shinkaia crosnieri (Decapoda: Munidopsidae) is a squat lobster that dominates both deep-sea hydrothermal vent and methane seep communities in the Western Pacific. Previous studies comparing S. crosnieri living in these two types of habitats have suffered from methodological and/or sample size limits. Here, using transcriptome-wide single nucleotide polymorphisms (SNPs) markers from 44 individuals of S. crosnieri, we reveal the extent of genetic connectivity between a methane seep population in the South China Sea and a hydrothermal vent population in the Okinawa Trough, as well as their signatures of local adaptation. Analysis of differentially expressed genes (DEGs) between these two populations and population-specific genes (PSGs) revealed that a large number of unigenes, such as cytochrome P450 (CYP), glutathione S-transferase (GST) and peroxiredoxin 6 (Prdx6) related to oxidoreductase, and sulfur dioxygenase (ETHE1) and chondroitin 4-sulfotransferase 11 (CHST11) related to sulfur metabolism, showed opposite expression patterns in these two populations. Data subsampling in this study revealed that at least five individuals of S. crosnieri per site are required to generate reliable results from the differential gene expression analysis. Population genetic analyses based on 32,452 SNPs revealed clear genetic differentiation between these two populations with an FST value of 0.07 (p < 0.0005), and physical oceanographic modeling of the ocean currents in middle and deep layers also suggests a weak connection between these two sites. Analysis of outlier SNPs revealed 345 unigenes potentially under positive selection, such as sarcosine oxidase/L-pipecolate oxidase (PIPOX), alanine-glyoxylate transaminase/serine-glyoxylate transaminase/serine-pyruvate transaminase (AGXT), and Cu-Zn superoxide dismutase (SOD1). Among the differentially expressed genes and genes with amino acid substitutions between the two sites are those related to oxidation resistance and xenobiotic detoxification, indicating local adaptation to the specific environmental conditions of each site. Overall, exploring the population structure of S. crosnieri using transcriptome-wide SNP markers resulted in an improved understanding of its molecular adaptation and expression plasticity in vent and seep ecosystems.

Shinkaia crosnieri (Decapoda: Munidopsidae) is a squat lobster that dominates both deep-sea hydrothermal vent and methane seep communities in the Western Pacific. Previous studies comparing S. crosnieri living in these two types of habitats have suffered from methodological and/or sample size limits. Here, using transcriptome-wide single nucleotide polymorphisms (SNPs) markers from 44 individuals of S. crosnieri, we reveal the extent of genetic connectivity between a methane seep population in the South China Sea and a hydrothermal vent population in the Okinawa Trough, as well as their signatures of local adaptation. Analysis of differentially expressed genes (DEGs) between these two populations and population-specific genes (PSGs) revealed that a large number of unigenes, such as cytochrome P450 (CYP), glutathione S-transferase (GST) and peroxiredoxin 6 (Prdx6) related to oxidoreductase, and sulfur dioxygenase (ETHE1) and chondroitin 4-sulfotransferase 11 (CHST11) related to sulfur metabolism, showed opposite expression patterns in these two populations. Data subsampling in this study revealed that at least five individuals of S. crosnieri per site are required to generate reliable results from the differential gene expression analysis. Population genetic analyses based on 32,452 SNPs revealed clear genetic differentiation between these two populations with an F ST value of 0.07 (p < 0.0005), and physical oceanographic modeling of the ocean currents in middle and deep layers also suggests a weak connection between these two sites. Analysis of outlier SNPs revealed 345 unigenes potentially under positive selection, such as sarcosine oxidase/L-pipecolate oxidase (PIPOX), alanineglyoxylate transaminase/serine-glyoxylate transaminase/serine-pyruvate transaminase (AGXT), and Cu-Zn superoxide dismutase (SOD1). Among the differentially expressed genes and genes with amino acid substitutions between the two sites are those related to oxidation resistance and xenobiotic detoxification, indicating local adaptation to the specific environmental conditions of each site. Overall, exploring the population

INTRODUCTION
Deep-sea hydrothermal vents, often distributed along active midocean ridges and back-arc spreading centers, are well known for discharging sulfur-rich geofluids into the water column (Corliss et al., 1979;German et al., 2000). Methane seeps, usually found along the continental margins and in trenches, typically release methane-rich geofluids from the seabed more slowly (Van Dover et al., 2002). These two types of ecosystems share some similar features, such as lack of light to support photosynthesis, high pressure, and high concentration of chemically reduced compounds as well as heavy metals (Levin, 2005;German et al., 2011). Nevertheless, a great number of macrobenthos flourish in these 'extreme' environments, forming high-biomass hotspots powered by chemosynthesis in the deep ocean. Over 700 species of macrobenthos have been recorded in the global hydrothermal vents and more than 600 in the methane seeps (German et al., 2011). However, only a small fraction of them inhabit both hydrothermal vents and methane seeps (Watanabe et al., 2010;Vrijenhoek, 2013), indicating that thriving in both types of environments requires specific adaptation and gene expression (Watanabe et al., 2010).
The Western Pacific is an ideal region for a comparative study of vent and seep populations due to their close proximity and the presence of 20% of all the recorded macrobenthos in both types of habitats (Watanabe et al., 2010). Kiel (2016) reported the Western Pacific as having the highest number of active, sedimentary back-arc vents in the world, which might provide a vital biogeographic link between vent and seep animals. Previous population genetic studies of deep-sea macrobenthos in the Western Pacific have mainly focused on macrobenthos with a planktotrophic larval stage, such as bathymodioline mussels (Kyuno et al., 2009;Miyazaki et al., 2010;Xu et al., 2017Xu et al., , 2018 and the deep-sea limpet Shinkailepas myojinensis (Yahagi et al., 2017). In this study, we aim to understand the population connectivity and local environmental adaptation of the squat lobster Shinkaia crosnieri (Decapoda: Munidopsidae), whose larvae are lecithotrophic and likely have a limited dispersal ability (Nakajima et al., 2018). Shinkaia crosnieri was initially discovered on the Edison Seamount vent field in the Bismarck Archipelago (Baba and Williams, 1998), and was later found in methane seeps in the South China Sea (SCS) and hydrothermal vents in the Okinawa Trough (OT), with a known bathymetric range between 700 and 2,200 m (Chan et al., 2000;Watanabe et al., 2010;Miyazaki et al., 2017a). The distribution pattern of S. crosnieri in terms of both habitat types and life-history trait makes it a suitable model for studies of population divergence and local adaptation under different environmental conditions.
Previous studies of S. crosnieri using the mitochondrial cytochrome c oxidase submit I (COI) gene  or three mitochondrial genes (COI, cytochrome b gene (Cytb), and 16S rRNA) (Shen et al., 2016) have revealed clear genetic differentiation between a methane seep population in the SCS and hydrothermal vent populations in the OT. Nevertheless, due to limited genomic coverage, a single to a few gene markers can hardly reflect the population divergence at the genome level and do not allow for the effective discovery of signatures of natural selection. Genome-wide single-nucleotide polymorphism (SNP) markers have the potential to resolve these issues (Xu et al., 2012). Recently, Cheng et al. (2020) obtained 12,963 genome-wide SNPs using restriction site associated DNA sequencing (RAD-Seq). By using these markers, they also revealed clear genetic divergence between a seep population of S. crosnieri in the SCS and a vent population in the OT. Nevertheless, RAD-Seq can only capture a reduced representation of the genome, and numerous SNP markers are located in non-coding regions (Pegadaraju et al., 2013;Houston et al., 2014). Indeed, Cheng et al. (2020) identified 54 outlier SNPs potentially under positive selection, but only five were in the protein-coding regions. More recently, Cheng et al. (2019) compared the transcriptomes of S. crosnieri collected from the same seep and vent populations, but with only three individuals from each population. They detected 545 differentially expressed genes (DEGs) and 82 protein-coding genes (PCGs) that have potentially undergone positive selection. However, the small sample size may have limited the statistical power for the detection of DEGs and PCGs.
In the present study, we sequenced the transcriptomes of a total of 44 S. crosnieri individuals, including 20 from a methane seep in the SCS and 24 from a hydrothermal vent in the OT. We identified DEGs between the two populations, populationspecific genes (PSGs), transcriptome-wide SNP markers, and outlier SNP markers. We also conducted subsampling analyses to determine the number of individuals required to generate representative data for a meaningful population comparison. Our results not only provide new insights into the local adaptation and population genetics of S. crosnieri inhabiting both types of habitats, but also demonstrate how different techniques and sample sizes may affect these results.

Sample Collection
A total of 44 S. crosnieri individuals were used in this study. Among them, 10 were collected by the manned submersible Jiaolong from Jiaolong Ridge (also known as the F site), a methane seep ( (Nakamura et al., 2015) in the OT during the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) R/V Kairei cruise KR15-17 in November 2015. Samples were frozen at −80 • C upon recovery or fixed with RNAlater, and then shipped to the laboratory and stored at −80 • C until use.

RNA Extraction and Sequencing
The gill, hepatopancreas, ovary, and abdominal muscle from one vent individual (OT_1), testis, and abdominal muscle from another vent individual (OT_2) and the abdominal muscles of the other 42 individuals (from SCS_1 to 20 and OT_3 to 24) were dissected for total RNA extraction using the TRIzol kit (Thermo Fisher Scientific, United States) according to the manufacturer's protocol. The quantity and quality of RNA were examined using 1% agarose gel electrophoresis and NanoDrop 2000 (Thermo Fisher Scientific, United States), respectively. RNA libraries were constructed individually and sequenced on an Illumina HiSeq 2500 platform (PE150) by Novogene Bioinformatics Technology Co., Ltd., Beijing, China 1 .

Transcriptome Assembly, Completeness Assessment, and Functional Annotation
The quality of raw sequencing data was controlled using FASTQC 2 and filtered using TRIMMOMATIC v.0.36 (Bolger et al., 2014). The parameters were leading: 10; trailing: 10; sliding window: 4:15; minlen: 25, and adapters: ILLUMINACLIP: TruSeq3-PE.fa:2:30:10. The resultant clean reads of the four tissues from OT_1, testis from OT_2 and abdominal muscle from SCS_3 (with maximum sequence data size of SCS individuals) were de novo assembled using Trinity v.2.8.3 (Grabherr et al., 2011) under default settings except the min_contig_length parameter was set to 300 and the min_kmer_cov was set to 2. The longest isoform of each gene cluster was selected using a custom Python script as a unigene. CD-HIT-EST (Fu et al., 2012) was used to remove redundant unigenes based on a similarity threshold of 95%. TransDecoder v.5.5.0 (Haas et al., 2013) was then applied to predict candidate open reading frames (ORFs) with the single_best_only option. BUSCO v.3.0 (Simão et al., 2015) was utilized to evaluate the completeness of the final assembly based on the metazoa_odb9 database, and the Perl script assemblathon_stats.pl (Bradnam et al., 2013) was run to evaluate the final assembly. The transcriptome from each individual was also assembled separately using the same Trinity parameters and CD-HIT was applied thereafter.
All obtained unigenes were searched against the NCBI non-redundant (NR) protein database using BLASTp v.2.7.1 (Altschul et al., 1990) with an E-value of 10 −5 , a word size of 3, a minimum alignment of 20, and max hsps of 20, and the resultant .xml file was fed into OmicsBox (Götz et al., 2008) to search for the Gene Ontology (GO) function. The UniProt database was scanned using BLASTp with an E-value of 1 www.novogene.com 2 https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ 10 −5 and the online annotation KAAS-KEGG server 3 applying the single-directional best hit method was used to search for the Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation.

Differentially Expressed Genes (DEGs) and Population-Specific Genes (PSGs) Analyses
Kallisto (Bray et al., 2016) was used to quantify abundances of the assembled unigenes with a bootstrap value of 100. The gene expression level in transcripts per kilobase million (TPM) was further normalized with the TMM method in edgeR (Robinson et al., 2010). The correlation between individuals and normalized TPM was determined by principal component analysis (PCA) implemented in the R package DESeq2 (Love et al., 2014). Genes without expression were removed, and the average TPM of each contig was calculated individually within each sampling site. The Differential expression analysis was conducted via the web portal RNA-seq 2G  using the DESeq2 method with a minimal read count of 10. Unigenes with an absolute value of fold change greater than 2 and a false discovery rate (FDR) less than 0.05 were considered as DEGs. Partial DEGs were visualized by Morpheus 4 .
Quantitative real-time reverse transcription-PCR (qRT-PCR) was performed to validate the expression levels of selected genes. Primers for qRT-PCR were designed utilizing the online NCBI Primer-BLAST tool 5 with sequences given in Supplementary  Table 1. Total RNA of two individuals (SCS_23 and OT_3) were extracted with the TRIzol kit (Thermo Fisher Scientific, United States) and reverse transcription from RNA to singlestranded cDNA was conducted using the High Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, United States). The SYBR Green PCR Master Mix (Thermo Fisher Scientific, United States) was utilized with synthesized cDNA to conduct qPCR reaction on a LightCycler 480 Instrument II (Roche, Switzerland). The relative fold change in the expression of selected genes was calculated by the 2 − Ct method using actin as an internal standard gene. The correlation coefficient (r 2 ) between expression of qRT-PCR and that of RNA-Seq was analyzed in Excel.
Gene Ontology enrichment analysis of DEGs was performed with Fisher's Exact Test in OmicsBox v1.3.11 6 . KEGG pathway enrichment analysis was performed via Enrichr (Chen et al., 2013). In both enrichment analyses the significant enrichment level was set to 0.05.
The transcriptome data for individuals collected from the two sampling sites were compared, adopting the same Kalisto and RNA-seq 2G parameters as mentioned above. The groups of orthologous genes of each individual transcriptome were detected using Proteinortho v.6.0b (Lechner et al., 2011). PSGs were defined as the unigenes shared among at least 90% of the individuals from one site but among less than 10% of the individuals from the other site (Parra et al., 1998;Belcaid et al., 2019). PSGs were annotated according to the NCBI NR and KEGG databases. The KEGG pathway of PSGs was reconstructed using KEGG Mapper 7 .
Statistical power of different sample sizes used for the identification of DEGs was tested by a subsampling approach. In brief, a total of 3, 5, 10, and 15 individuals were randomly selected from each habitat to identify DEGs using the methods detailed in this section. Afterward, comparison was conducted to determine an appropriate sample size for the reliable DEGs. In addition, DEGs obtained in our study were also compared with those reported in Cheng et al. (2019).

Identification of Transcriptome-Wide SNP Markers
Raw reads of each individual were trimmed using Trimmomatic v.0.36 (see section "Transcriptome Assembly, Completeness Assessment, and Functional Annotation" for the settings). Filtered reads of each individual were mapped to the assembled transcriptome using Bowtie2 (Langmead and Salzberg, 2012) with a maximum fragment length of 1,200 (-maxins 1200), a seed of 20 (−D 20), re-seed reads twice (−R 2), a mismatch number of 1 (−N 1), a seed length of 18 (−L 18), and an interval between seeds of S,1,0.50. The generated .sam files were converted to the .bam format and sorted by SAMtools v.1.8 (Li et al., 2009). PCR duplicates were removed using Picard MarkDuplicates 8 . Then, the mpileup function implemented in SAMtools was applied to generate .bcf files with the following settings: 10 for minimum mapping quality for an alignment (−q 10) and 20 for minimum base quality (−Q 20). SNPs were identified using the call function implemented in BCFtools v.1.9 (Li, 2011) with the default settings along with the options −mv and −Ov. The obtained .vcf file for each individual was then sorted using VCFtools v.0.1.13 (Danecek et al., 2011) and the .vcf files of all the individuals were combined into a single file using the merge function in BCFtools v.1.9.
Statistical power of different sample sizes selected to identify SNPs was tested via a subsampling approach as well. In brief, a total of 3, 5, and 10 individuals were randomly selected from each habitat to identify SNPs using the methods described above. Afterward, the SNP numbers were compared to evaluate the impact of sample sizes on SNP detection.

Detection and Annotation of Outlier SNPs
Two methods were adopted to detect outlier SNPs. First, a coalescent method implemented in Arlequin v.3.5 (Excoffier and Lischer, 2010) was used for outlier detection by running 20,000 simulations with 100 simulated demes per group, and SNPs with a p-value < 0.05 were considered as the Arlequin outliers. Second, a multivariate analysis was implemented in the R package pcadapt (Luu et al., 2017) for outlier identification, and SNPs with an adjusted p-value < 0.1 from the Benjamini-Hochberg procedure were considered as the pcadapt outliers. Outliers detected by both methods were retained and used for GO enrichment and KEGG enrichment analyses using the same approaches described in the Section "Differentially Expressed Genes and Population-Specific Genes".

Population Genetic Analyses Based on the Entire and the Outlier SNP Dataset and Physical Oceanographic Modeling
Genetic differentiation was evaluated using the pairwise F st statistic via Arlequin v.3.5.2.2 with the default settings based on both the entire and the outlier SNP datasets. Population structure was examined using the maximum-likelihood estimation method implemented in ADMIXTURE v1.3 (Alexander et al., 2009) and PCA implemented in the R package SNPRelate (Zheng et al., 2012) using single SNP per locus of both the entire and the outlier SNP datasets. The .ped, .map, and .bed format files (Alexander et al., 2015) required by the ADMIXTURE analysis were transformed based on the combined .vcf file of all the individuals using PLINK v.2.00a2LM (Purcell et al., 2007) with the -allowextra-chr option. The number of ancestral populations (i.e., the K value) in ADMIXTURE was predefined from 1 to 3. When K was set to 2, the divergence was between sites. Population structure was visualized using the function bar plot in R based on the best K value. The relative migration pattern of all samples was estimated via divMigrate in the R package diveRsity (Keenan et al., 2013) using the D Jost statistic as a measure of genetic distance and a bootstrap value of 1,000. To validate population genetics results against physical oceanographic model of larval drift, we inferred the annual-mean lateral ocean currents from the HYCOM + NCODA Global 1/12 • Reanalysis (experiment sequence: 53.X) data 9 , which has assimilated multiple sources of available observational records of the ocean.

Transcriptome Assembly, Completeness, and Annotation
A total of 3.93 Gb of paired-end clean reads of abdominal muscle from one SCS individual (SCS_3) and 10.96 Gb of pairedend clean reads of five tissues (gill, ovary, abdominal muscle, hepatopancreas, and testis) from two OT individuals (OT_1 and OT_2) (Supplementary Table 2) were obtained after quality control and were used for transcriptome reference assembly. The detailed sequencing statistics of the abdominal muscle from these individuals are summarized in Supplementary Table 3. De novo assembly and data filtering resulted in 29,273 unigenes. The assembled transcriptome had a contig N50 of 2,690 bp and a mean contig size of 1,615 bp. The contig length ranged from 301 to 29,765 bp. BUSCO analysis revealed that the transcriptome contained 89.7% complete plus 3.1% fragmented conserved metazoan genes.
Results of PCA based on the unigene expression matric revealed that all the SCS seep individuals were separated from all the OT vent individuals along the first eigenvector ( Figure 1A). Further analyses showed that a total of 4,854 (16.6%) unigenes were differentially expressed between S. crosnieri from the SCS and the OT. Among them, 2,597 (8.87%) unigenes showed higher expression levels in the SCS than in the OT, whereas 2,257 (7.71%) had higher expression levels in the OT than in the SCS. Relationship between the FDR and fold change for all DEGs is illustrated in Figure 1B. Expression heatmap and hierarchical clustering analysis of 29 DEGs involved in oxidative activity, metabolism of xenobiotics by cytochrome P450, sulfur metabolism, and methane metabolism are presented in Figure 2 and their mean expression levels in the SCS and OT as well as functional annotations are given in Supplementary Table 6. Furthermore, qRT-PCR and RNA-Seq data are highly correlated (R 2 = 0.97) (Supplementary Figure 2), indicating the robustness of our DEG analyses.

SNP Identification and Outlier SNP Characterization
After genotyping and stringent data filtering (Table 2), a total of 32,452 transcriptome-wide SNPs located in 8,667 unigenes were obtained and subjected to downstream analyses. Outlier screening tests resulted in 1,065 Arlequin outliers and 1,307 pcadapt outliers, with 504 outliers identified by both approaches (Figure 3A). These 504 outlier SNPs ( Figure 3B) were located in 345 unigenes and GO enrichment analysis of these unigenes resulted in the discovery of 27 enriched categories (Supplementary Figure 3B). Among them, the top category for all, as well as molecular function, was ' ATP binding' (GO: 0070403), and the second and third categories of MF were 'calcium ion binding' (GO: 0005509) and 'zinc ion binding' (GO: 0008270), respectively. The category of 'calcium ion binding' includes myosin light chain 6 and 12 (MYL6 and MYL12), chloride intracellular channel protein 2 (CLIC2), Plastin-2 (PLSL) and serine/threonine-protein phosphatase 2A regulatory subunit B" subunit alpha (PPP2R3A); the category of 'zinc ion binding' includes NAD-dependent protein deacylase 2 and 5 (SIRT2 and SIRT5), glycine hydroxymethyltransferase (GlyA) and chitin deacetylase 1 (CDA1). The annotations and partial functions of the nine metal ion binding related genes using the published literature are presented in Table 3.
KEGG enrichment analysis revealed that 180 of the 345 outlier-containing unigenes were significantly enriched in 15 pathways (Table 4). Specifically, a number of pathways may be related to the adaptation of S. crosnieri to local environments, although it is not possible to pinpoint the specific environmental condition shaping the adaptation. For example, three unigenes were enriched in the 'peroxisome' (ko04146) pathway: sarcosine oxidase/L-pipecolate oxidase (PIPOX), alanine-glyoxylate transaminase/serine-glyoxylate transaminase/serine-pyruvate transaminase (AGXT), and Cu-Zn superoxide dismutase (SOD1); and five unigenes were enriched in the 'thermogenesis' (ko04714) pathway: cyclic AMP-responsive element-binding protein 1 (CREB1), succinate dehydrogenase cytochrome b560 subunit (SDHC), succinate dehydrogenase (SDHD), NADdependent protein deacetylase Sirt6 (SIRT6), and succinate dehydrogenase [ubiquinone] iron-sulfur subunit (SDHB). Genes from environment-related pathways such as sulfur dioxygenase (ETHE1) in the 'sulfur metabolism' (ko00920) pathway and dihydrodiol dehydrogenase (DDH) in the 'metabolism of xenobiotics by cytochrome P450' (ko00980) pathway were identified in the outlier SNP dataset and the results of amino acid substitutions of 27 outlier SNPs with annotation are given in Table 5. Overall, 74 unigenes were found in both the outlier SNP dataset and the DEG dataset (42 in the SCS and 32 in the OT) (Supplementary Table 7), such as ETHE1 and DDH in the SCS population, suggesting a link between mutation and expressional change.

Population Structure Based on the Entire and the Outlier SNP Dataset
Pairwise F st calculated based on the entire SNP dataset was 0.07 (p < 0.0005), and pairwise F st value estimated based on the outlier SNP dataset was 0.43 (p < 0.0005), much higher than that based on the entire SNP dataset as expected. ADMIXTURE analyses and PCA based on both all the (Figures 4A,B) and just the outlier SNPs (Figures 4C,D) clearly revealed two genetic groups of S. crosnieri, with one formed by all the individuals from the SCS and the other formed by all the individuals from the OT. Nevertheless, eight individuals in the OT showed a signature of admixture of two genetic groups (Figure 4A), indicating a stronger gene flow from the SCS to the OT. A similar result was also obtained via the divMigrate analysis where a higher migration rate was detected in the SCS to OT direction (1.00 for SCS to OT vs. 0.21 for OT to SCS). Figure 5A shows the lateral velocity vectors at 500 m depth laid upon the bathymetry in our study region. The Kuroshio Current is visible at this level  and turns anticyclonically at the Luzon strait, allowing for water exchange between the Pacific Ocean and the SCS. At 1,000 m depth, however, much less connectivity is revealed between the Pacific and the South China Sea SCS ( Figure 5B).

Subsampling Analyses and Comparison With Previous Studies
Although the samples we used were different from those used in Cheng et al. (2019), some DEGs identified in our study showed the same expression patterns as in Cheng et al. (2019), including cystathionine gamma-lyase (CSE) and heat shock protein 22 (HSP) ( Table 6), indicating they might reflect true differences between the two populations. Nevertheless, subsampling of our data clearly revealed a requirement of a larger sample size to capture the DEGs of the two populations ( Figure 6A). From our data, the total number of DEGs increased sharply from 2,142 when using data from three individuals to 4,353 when using data from five individuals. But the number of DEGs did not increase further when data from even more individuals were included in the analysis. The SNP markers detected using the subsampling strategy showed a basically consistent trend when 3, 5, and 10 individuals were chosen from each habitat (Figure 6B), and all three subsampling groups under transcriptome sequencing obtained more SNPs than the number of loci obtained under RAD sequencing. Moreover, more outlier SNP sequences were detected from transcriptome-wide SNP dataset compared with the results from RAD-Seq. Another advantage is that SNPs detected from transcriptome are all located in protein coding sequences while markers from RAD-Seq were sometimes out of proteincoding regions which is degermed by restriction enzymes (Cheng et al., 2020).

DISCUSSION
By using the RNA-Seq technique, a total of 112 PSGs, 4,854 DEGs, and 32,452 SNPs were detected in this study, which ALT, alternate codon and marked with boldness; AA, amino acid; "-", synonymous substitution.
provided the basis for investigating the population differentiation and environment-specific local adaptation of S. crosnieri from a methane seep and a hydrothermal vent in the Western Pacific.

Population Genetic Structure of Shinkaia crosnieri Between SCS and OT
Previous population genetic studies (Shen et al., 2016;Yang et al., 2016;Cheng et al., 2020) using different molecular markers have consistently shown that the seep population of S. crosnieri in the SCS and the vent population in the OT form two distinct genetic groups. By using 32,452 transcriptome-wide SNPs markers, our study also uncovered the same pattern of population structure. One of the abiotic factors that may have contributed to such population subdivision is the physical barrier formed by the Luzon Strait. The Luzon Strait is the only deep channel that connects the semi-enclosed SCS with the deeper Pacific Ocean for water transportation via a sandwiched vertical structure (Tian et al., 2009). Seawater flows into the SCS from the Pacific Ocean in both the upper (<700 m) and the deep (>1,500 m) layers and exits the SCS through the Luzon Strait in the intermediate layer (700-1,500 m) (Tian et al., 2006;Yuan et al., 2009). The eastward spread is stronger in both winter and spring (You et al., 2005). Previously, by using genomewide SNP markers, Xu et al. (2018) revealed a limited gene flow of the deep-sea mussel Gigantidas platifrons (previously known as Bathymodiolus platifrons) between the SCS and the OT. However, G. platifrons and S. crosnieri have different lifehistory traits. Bathymodioline mussels produce planktotrophic larvae that take almost a year to develop, and their dispersal  ability is largely influenced by the upper or surface currents (Arellano and Young, 2009). Consequently, the limited larval exchange of G. platifrons between the two sides of the Luzon Strait is suggested to be achieved by the looping path of the Kuroshio Current, which flows into the SCS via the middle part and exits via the northern part of the Luzon Strait (Nan et al., 2011), as well as by the North Pacific Intermediate Water (NPIW; Xu et al., 2018). In contrast, S. crosnieri produces large (2 mm) oil-rich eggs and equally oil-rich lecithotrophic larvae (Miyake et al., 2010;Nakajima et al., 2018), and its dispersal is expected to mainly occur within the middle and deep layers. Its larval dispersal between the two sides of the Luzon Strait could be greatly restricted by the effect of the NPIW widely distributed between a depth range of 300-800 m (Watanabe and Wakatsuchi, 1998), approximately 300 m above the SCS methane seep and approximately 700 m above the OT hydrothermal vent. The physical oceanographic modeling results also revealed limited water connection between these two sites at the middle (500 m) and deep (1,000 m) layers.

Local Adaptation of Shinkaia crosnieri
Results of DEGs, PSGs, and outlier SNP detection revealed that several functional groups such as oxidoreductase activity, sulfur metabolism and metal ion binding may be involved in the local adaptation and expression plasticity of S. crosnieri under the varying environmental conditions of methane seeps and hydrothermal vents.
Mitochondria are a target of pollutant-induced toxicity, as reflected by the expressional changes of mitochondria-related proteins as well as oxidative enzymes in response to exposure to toxicants (Gobe and Crane, 2010). Iron plays a role in many cellular processes as well as the generation of harmful reactive oxygen species (ROS). Extracellular iron is taken up by cells and transported to mitochondria which form a major center of iron utilization and accumulation. In this process, metalloreductase (STEAP3), an endosomal membrane enzyme, can convert iron from an insoluble ferric (Fe 3+ ) to a soluble ferrous (Fe 2+ ) form (Martinez-Finley et al., 2012). Then, Fe 2+ is transported directly from the endosome to mitochondria (Paul et al., 2017). STEAP3 was unigene upregulated in the SCS seep population, which may be a strategy against comparatively higher level of dissolved Fe concentration in the SCS than in the OT (Hu et al., 2015;Miyazaki et al., 2017a,b).
Hydrogen sulfide (H 2 S) is an important energy source driving carbon fixation and other critical metabolic processes in deep-sea thiotrophs, which then supply energy and nutrients to many macrobenthos living in hydrothermal vents and methane seeps. However, this gas is also a potent toxin to cytochrome c oxidase. Therefore, animals living in these chemosynthesis-based ecosystems are expected to be able to detoxify sulfide (Julian et al., 2005). Organisms inhabiting H 2 Srich environments have evolved many strategies for coping with continuous exposure to H 2 S (Tobler et al., 2016). Eukaryotes usually possess a mitochondrial sulfur detoxification pathway, which converts sulfide into sulfite. Two key enzymes in the sulfide detoxification system, sulfide:quinone oxidoreductase (SQR) and sulfur dioxygenase (SDO), have been identified in this study. SQR is a mitochondrial membrane-bound enzyme. It converts sulfide to persulfides and transfers electrons to the pool of ubiquinone, which is then oxidized by SDO to generate sulfite with the consumption of molecular oxygen and persulfide molecules (Hildebrandt and Grieshaber, 2010). A previous study of the mussel G. platifrons from the SCS methane seep discovered the expression of the same sulfide detoxification pathway in the symbiont-hosting gill (Wong et al., 2015;Sun et al., 2017). Succinate dehydrogenase [ubiquinone] iron-sulfur subunit (SDHB), a subunit of SQR, was identified in our outlier SNP dataset, and sulfur dioxygenase (ETHE1), a homodimeric Fe-containing sulfur dioxygenase (SDO), was a DEG in the seep population as well as an outlier SNP with synonymous substitution. Natural gas hydrate on the seafloor of the Formosa Ridge (also known as the Jiaolong Ridge) in the SCS contained significant amounts of H 2 S . These results indicate activation of mitochondrial sulfur detoxification in the seep population to cope with the local physicochemical condition. One PSG in the vent population, chondroitin 4-sulfotransferase 11 (CHST11), belongs to the family of sulfotransferases that catalyze the conversion of sulfate to chondroitin 4 -sulfate, a sulfated polysaccharide usually found in the matrix of animal cells (Suzuki and Strominger, 1960;Helbert, 2017).
Apart from cellular organelles, the mixed-function oxygenase (MFO) system also contributes to xenobiotic detoxification in all organisms (Ramos and Garcia, 2007). The first phase of MFO is mainly controlled by biotransformation enzymes like CYP and glutathione S-transferase (GST) that introduce a functional group to transform and metabolize toxicants. Superoxide dismutase (SOD), catalase (CAT), and glutathione peroxidase (GPx) catalyze subsequent reactions (second phase) and generate more polar products which are more easily excreted by the organisms (Lee, 1981). CYP and GST were upregulated unigenes in the seep population, while the alternative hydroperoxide of GPx, the peroxiredoxin 6 (PRDX 6) that also uses glutathione (GSH) as an electron donor (Deponte, 2013), was upregulated in the seep population. Cu-Zn superoxide dismutase (SOD1), a major antioxidant defense enzyme in charge of the detoxification of superoxide radicals and the generation of H 2 O 2 , was identified as an outlier SNP. The different expression levels of MFO systemrelated enzymes may suggest varying toxicant accumulation patterns between the two studied populations. Besides SOD1, sarcosine oxidase/L-pipecolate oxidase (PIPOX) and alanineglyoxylate transaminase/serine-glyoxylate transaminase/serinepyruvate transaminase (AGXT) were two other unigenes enriched under the 'peroxisome' pathway from the outlier SNP dataset. PIPOX catalyzes the oxidation of sarcosine and l-pipecolate with the production of H 2 O 2 and it was also identified as a PSG in the vent population. AGXT takes part in glyoxylate detoxification and prevents the generation of toxic oxalate from glyoxylate. All three unigenes underwent amino substitutions. Differential gene expression and allele frequency changes of all these oxidoreductase-related genes may suggest different extents of local adaptation to the extreme environmental conditions.
Plastin is involved in the formation of cross-linked actin filaments, which are also responsible for the structural solidity of certain cells (Shinomiya, 2012). Plastin is a protein known for binding metals including Ca and could therefore be a candidate for binding uranium (U) with a low pH of 5.2 (Bucher et al., 2016). If the alteration in plastin occurs in the presence of U, it may contribute to the structural damage previously observed in gill tissues of zebrafish (Barillet et al., 2010). Hu et al. (2015) reported that cold seeps of the northern SCS contained enriched authigenic U in sediments, which may explain its influence on the genetic variation of plastin. Chitin deacetylases may convert chitin into chitosan by enzymatic deacetylation of the amino group of chitin (Raval et al., 2017). Chitin and chitosan both play a role in metal ions absorption (Kurita, 2006). Previous study found metal ions such as Sr 2+ , Mg 2+ , Na + , Ca 2+ , and K + simulate the activity rate of CDA and further inhibited by Co 2+ , Ba 2+ (Chai et al., 2020). Trace metals such as Sr 2+ , Mg 2+ , and Ca 2+ have been detected in the seawater of both the SCS seep and OT vent sites, with Sr 2+ being more abundant in the SCS site than in the OT site (Hu et al., 2015;Miyazaki et al., 2017a,b). Plastin-2 (PLSL) and chitin deacetylase 1 (CDA1) were enrichened under GO term from the outlier SNP dataset.

Sample Size Advantages
Transcriptome sequencing data are able to reveal gene expression patterns in different sample tissues or in samples under different conditions. Nevertheless, an appropriate sample size should be applied for transcriptome-based studies as it will affect the results obtained (Aach and Church, 2001). For example, in a study of gene expression profiling to prognose lymphoma, breast cancer, head cancer, and neck cancer, all data sets showed that predictive inaccuracy was reduced by increasing the sample size (Dunkler et al., 2007). Our empirical experience showed that, based on ∼3 Gb of 150-bp data per sample, at least five samples would be required to detect sufficient DEGs for comparing transcriptomes from two different populations. When only three replicates were used in a simulation with our data, only about half of the DEGs were detected. Moreover, when less data per sample were available (2.3 Gb of 100-bp data) (Cheng et al., 2019), even fewer DEGs (545) could be detected.
Our results are consistent with the results of another study that indicated at least six replicates should be used in order to obtain sufficient DEGs for a study of the treatment effects (Schurch et al., 2016). Meanwhile, our subsampling indicated the need to choose up-to-date methodology in order to detect transcriptomewide SNPs. Our empirical results will thus be useful when planning population genetic studies of not only the squat lobster S. crosnieri, but also other invertebrates.

CONCLUSION
Using transcriptome sequencing, we identified DEGs and PSGs between two Shinkaia crosnieri populations, one from a methane seep in the SCS and another from a hydrothermal vent field in the OT. These genes were found to be involved and enriched in metabolic pathways including oxidoreductase activity and sulfur metabolism that may be important in adaptation to different environmental conditions. Using transcriptome-wide SNP markers, we confirmed clear genetic subdivision between S. crosnieri from the SCS and the OT, and suggested the Luzon Strait as an important physical barrier to the larval dispersal between the two populations. Moreover, our outlier analysis revealed the involvement of genes related to metal ion binding and energy metabolism in shaping the genetic divergence of the two populations. Using a subsampling strategy, we found that at least five replicates were required to capture the full spectrum of DEGs between the two populations. Altogether, our study provide new insights into the genetic structure of S. crosnieri and molecular mechanisms that have enabled it to thrive in both vents and seeps. A broader sampling involving more localities with contrasting environmental conditions including both different temperatures and depths is desired in the future to further improve our understanding of the population dynamics and local adaptation of this widespread species in the Western Pacific.

DATA AVAILABILITY STATEMENT
The datasets generated from this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/, PRJNA612951.

ETHICS STATEMENT
The animal study was reviewed and approved by The Ethics Committee of the Hong Kong University of Science and Technology. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
P-YQ and J-WQ conceived and designed the project. J-WQ, JS, and CC collected the samples. TX and YX extracted the RNA. YX, YHK, and WCW performed the real-time PCR experiments. YW conducted ocean modeling analysis. YX, TX, and JS performed the bioinformatics analyses and drafted the manuscript. All authors contributed to the manuscript and approved it for submission and publication.

ACKNOWLEDGMENTS
We thank the captains and crews of Xiangyanghong 9, R/V Kairei, and R/V Tan Ka Kee, as well as the operation team of the deep-submergence vehicle Jiaolong, ROV KAIKO, and ROV ROPOS for helping us with sample collection during the relevant cruises. Hiroyuki Yamamoto (JAMSTEC) is thanked for serving as the chief scientist of the research cruise KR15-17. The data for physical oceanographic modeling in this study can be found on the http://www.hycom.org data server under the "HYCOM + NCODA Global 1/12 • Reanalysis" link.