Population genetics of zig-zag eel (Mastacembelus armatus) uncover gene flow between an isolated island and the mainland China

Introduction Mastacembelus armatus is a commercially valuable fish, normally distributed in southern China and Southeast Asia. The natural population size of M. armatus is shrinking in recent years because of overfishing and habitat loss. In order to clarify the genetic diversity and differentiation of M. armatus populations, we collected 114 samples from eight populations in southern China and Vietnam and analyzed their population structure using nuclear ribosomal DNA sequences, the concatenated 18S and ITS2 regions. Methods Genomic DNA from the fin clip was extracted and sequenced on an Illumina novaseq 6000 (Illumina, USA) high-throughput sequencing platform in accordance with the manufacturer’s instructions. After assembly and annotation, haplotype diversity, TCS network analysis, AMOVA analysis, population pairwise genetic distances, and UPGMA tree construction were conducted based on the concatenated sequences of 18S and ITS2. Results and discussion In total, eleven nrDNA haplotypes were detected based on the concatenated sequences of 18S and ITS2. Amongst, three haplotypes were the main haplotypes, as representatives of three corresponding Clusters. There were two major Clusters in China, however, the Cluster in Vietnam was significantly divergent from the other two in China, likely due to the lack of river connection between China and Vietnam. Interestingly, based on low FST value, we found that gene flow occurred between the isolated island, Hainan Province, and the mainland China of Guangxi Province, probably as a result of exposed continental shelf connected them during glacial periods. In general, combing our data and literature data, genetic diversity and differentiation of M. armatus populations are relatively high regardless of spatial scale, although their natural population size is declining. This suggests that it is not too late to adopt measures to protect M. armatus, which benefits not only species itself but also the whole ecosystem.


Introduction
Mastacembelus armatus, the common name is zig-zag eel or tire-track spiny eel, is an economically important fish, belonging to the Order Symbranchiformes (Family: Mastacembelidae; Genus: Mastacembelus). Among the four species of the genus, M. armatus is the largest (Serajuddin and Pathak, 2012). It is widely distributed in southern China, mainly in Yangtze River and Pearl River (Xue et al., 2020), and Southeast Asia, such as India, Thailand, Nepal, Vietnam, Sri Lanka, and Pakistan (Hossain et al., 2015;Gupta and Banerjee, 2016;Han et al., 2019). Usually, it inhabits rivers, streams, ponds, beels and inundated fields (Hossain et al., 2015;Gupta and Banerjee, 2016). M. armatus is a carnivorous fish, it prefers to feed on crustacean and insect larvae when young while the adults devour small fish and tadpoles (Hossain et al., 2015). M. armatus is in high demand on the market as it attracts consumers with its delicious taste, no intermuscular spines and high nutritional value (Gupta and Banerjee, 2016;Li et al., 2016;Xue et al., 2020). Besides, the appealing color pattern of M. armatus makes it a popular aquarium fish as well (Gupta and Banerjee, 2016).
However, due to overfishing and habitat loss, the wild population size of M. armatus gradually declines year by year (Hossain et al., 2012;Rahman et al., 2016;Xue, 2018). M. armatus is designated as an endangered species in Bangladesh (IUCN Bangladesh, 2000) and has been classified as least concern by the International Union for Conservation of Nature (IUCN) (IUCN, 2019). In addition, largescale artificial breeding has not been achieved for M. armatus (Jiang, 2018). Therefore, it is urgently needed to clarify the present condition of natural populations of M. armatus, particularly their genetic diversity and structure, thus providing basis for their biological conservation. As a native species in China, M. armatus is assigned as a key protected wild aquatic animal by Fujian, Guangdong, and Hunan provinces, and moreover, a national germplasm resource reserve of M. armatus has been established in Fujian Province (Jiang, 2018). Furthermore, aquaculture of M. armatus has intensified in several provinces of China, greatly facilitating its artificial breeding (Han et al., 2017;Han et al., 2019).

Materials and methods
Sampling M. armatus from seven regions in southern China and one region in Vietnam were sampled in 2021. Sample sets collected from a single region were considered a population. In China, we collected M. armatus from four Provinces. Two populations were sampled from Guangdong Province, three populations from Guangxi Province, one population from Jiangxi Province and one population from Hainan Province (Table 1, Figure 1). What should be noted is that Hainan Province is located in an independent island, spatially separated with the mainland China by Qiongzhou Strait. Additionally, Guangxi Province is geographically adjacent with Vietnam. The Vietnam samples were collected from Guangzhou Lanhai Marine Technology Co., Ltd, Guangzhou city, Guangdong Province, China (latitude: 23.21°N , longitude: 113.47°E). More specifically, we collected 12 and 16 samples in GDHY and GDQY regions from Guangdong Province, respectively; 18, 18, 15 samples in GXBS, GXLZ and GXYL regions from Guangxi Province, respectively; 7 samples in HNHK region from Hainan Province, 13 samples in JXGZ region from Jiangxi Province, 15 samples in YN region from Vietnam. A total of 114 samples from eight populations were collected in this study (Table 1).

DNA extraction and sequencing
A 30-40 mg fin clip was collected and preserved in 95% ethanol at -20°C for later genomic sequencing. Both DNA sequencing and assembly were performed by Science Corporation of Gene (SCGene) Co., Ltd, Guangzhou city, Guangdong Province, China. Total genomic DNA was extracted with a Tissue DNA Kit (OMEGA E.Z.N.A) following the manufacturer's protocol. The quality and quantity of genomic DNAs were determined by 0.8% agarose gel electrophoresis and NanoDrop 2000 spectrometer (Thermo Scientific, Waltham, MA, USA). High-quality genomic DNAs were used to construct a paired-end sequencing library with an insert size of 450 bp. The library was then sequenced on an Illumina novaseq 6000 (Illumina, USA) high-throughput sequencing platform in accordance with the manufacturer's instructions.

Sequence assembly
Adaptors and low-quality reads were filtered using Trimmomatic v0.39 (Bolger et al., 2014), resulting the raw reads, which number were between 3,876,630 and 51,352,359. Paired-end reads of 2 × 150 bp were generated, and the quality threshold was set to Q20. Qualified reads were then compared by BWA (Li and Durbin, 2009) employing setting of 0 match and 0 gap. Afterwards, the obtained reads were assembled using SOAPdenovo (Luo et al., 2012). To verify the correctness of the assembly, assembled whole nrDNA sequences were amplified and sequenced by Sanger sequencing. The annotation of assembled nrDNAs was performed using blastn in NCBI with closely related and well-annotated sequences, manually verified afterwards. Finally, respective region sequences were generated, including 18S, ITS1, 5.8S, ITS2 and 28S.

Data analysis
Standard diversity indices, including number of haplotypes (N h ), haplotype diversity (hd) and nucleotide diversity (p), were calculated using DnaSP v 5.10 (Librado and Rozas, 2009) constructed with PopArt (Clement et al., 2002;Leigh and Bryant, 2015) to investigate genealogical relationships among populations inferred from the concatenated sequences of 18S and ITS2. Hierarchical analysis of molecular variance (AMOVA) was performed to detect genetic variations within and among different regions using Arlequin v 3.5 (AMOVA; Excoffier and Lischer, 2010), with statistical significance determined by 1,000 permutations. To quantify the genetic dissimilarity between populations, population pairwise genetic distances (F ST ) were also calculated by Arlequin v 3.5 (Excoffier and Lischer, 2010). The genetic distance among different populations was used to construct the UPGMA tree in MEGA X (Kumar et al., 2018).

Characteristics of nrDNA
Variations of the length of either whole nrDNA sequences or respective region sequences were slight, see details in Table 2. To be specific, the length of 18S and 5.8S of all individuals were all the same, with 1,840 bp and 154 bp, respectively. Especially, all sequences of 5.8S were completely identical. In addition, there was also very little variation in the length of 28S, with only a 2 bp difference in length. In the 28S alignment of all individuals, 30 variable sites were found, accounting for 0.85%, compared with 3 in 18S (0.16%), 20 in ITS2 (3.37%) and 66 in ITS1 (5.77%). It is worth to note that although the proportion of variable sites in the 28S alignment was low, the majority of variable sites, 21 out of 30, were singletons variable sites. Clearly, ITS1 and ITS2 were regions with greater variability. Furthermore, we found that many indels occurred in the alignment of ITS1, for example, the longest indel was 14 bp in length, located at 1020 nt -1043 nt. The GC content of whole nrDNA in all populations was similar, between 62.5% and 62.7%, showing a high GC content.
More than 5% indels and 5.77% variable sites occurred in the alignment of ITS1, directly affecting the accuracy of subsequent analyses. Additionally, in the alignment of 28S, too many singletons variable sites were detected, which is also thought to negatively impact the molecular analyses, such as phylogenetic analysis and population genetics (Dress et al., 2008;Steenwyk et al., 2020). Besides, the 5.8S sequences of all individuals were the same, we thus decided to use the concatenated sequence of 18S and ITS2 for all later analyses.

Genetic diversity
Eleven nrDNA haplotypes were identified from the concatenated sequences of 18S and ITS2, consistently inferred from DnaSP results and TCS network (Table 1, Figure 2). The overall genetic diversity was relatively high (0.561 ± 0.047) while nucleotide diversity was low (0.00199 ± 0.00028) ( Table 1). The highest haplotype diversity was found in GDHY region in China, followed by HNHK, YN, and GDQY region. Most regions harbored more than one haplotype, except GXLZ and JXGZ region. In a word, we found eight haplotypes in China and three haplotypes in TCS network based on the concatenated sequences of 18S and ITS2 of Mastacembelus armatus. Each tick represents a mutational step. nrDNA haplotypes are named as in Table 1. Circle size is proportional to the haplotype frequency. Vietnam (Table 1, Figure 2), and H1, H2 and H3 were the main haplotypes, making three different Clusters ( Figure 2). Amongst, H1 and H2 were the main haplotypes found in China and H3 in Vietnam. Moreover, in China, the genetic diversity of M. armatus in Guangxi province was the highest (0.462 ± 0.060), consisting of two main haplotypes H1 and H2, while only H1 was detected in Jiangxi Province, predominant H1 in Guangdong Province, and predominant H2 in Hainan Province.
Overall, more males were found in our study. In Vietnam, the number of females exceeded males, compared to Chinese populations with dominant males.

Population structure
In general, AMOVA analyses presented that there was high genetic differentiation in overall populations of M. armatus (FST = 0.882, p < 0.001), and the largest level of genetic differentiation was found among populations (88.19%) ( Table 3). The similar pattern was shown at the province level (FST = 0.796, p < 0.001) and at the country level (FST = 0.886, p < 0.001) as well (Table 3).
Pairwise F ST comparisons revealed that most populations were significantly differentiated (Table 4). Particularly, pronounced differences among different provinces were found in a range of F ST between 0.210 and 0.970, all being statistically significant (Table 5), in agreement with AMOVA result that most variations were from among provinces (79.59%) at the province level (Table 3). In addition, the UPGMA tree demonstrated that all populations were divided into three groups (Figure 3). Cluster I consisted of all populations from Jiangxi and Guangdong Province and two of three populations of Guangxi Province (GXLZ and GXBS). While the other population of Guangxi Province (GXYL) was clustered with the population from Hainan Province, forming Cluster II, consistent with the low F ST value between them, only 0.004. The Vietnam population was a single group, Cluster III. Overall, Cluster I and Cluster II were grouped together, making the Chinese Cluster. Furthermore, the genetic differentiation of Clusters between China and Vietnam was also distinctive, characterized with different nrDNA haplotypes (Table 1, Figure 2) and distinct phylogenetic Clusters (Figure 3), as well as shown in the AMOVA analysis, with 88.59% variances from among countries (Table 3).

Discussion Genetic diversity and population structure
Previous population studies (Wang et al., 2012;Zou, 2013;Chen, 2014;Yang et al., 2016;Lin, 2017;Jiang, 2018;Gao et al., 2022) all showed high genetic diversity of M. armatus populations in China, but the haplotype diversity in our result (hd=0.434) is lower than that in other population studies, for example, Wang et al. (2012), Jiang (Jiang, 2018) and Gao et al. (2022) reported the haplotype diversity in China was 0.965, 0.768 and 0.895, respectively. This may be due to different markers employed and the small sample size in our study compared to other studies. Additionally, the overall genetic diversity of M. armatus populations in this study was over 0.5, showing a relative high diversity (Grant and Bowen, 1998). In general, combing our data and literature data, the genetic diversity of M. armatus maintains at a relatively high level, although the size of natural population is declining as reported (Hossain et al., 2012;Rahman et al., 2016;Xue, 2018). Furthermore, genetic differentiation between most populations is pronounced, regardless of spatial scale. Guangxi Province harbored the highest genetic diversity. Amongst, GXLZ population and GXBS population were grouped together, in agreement with low F ST value between them. GXYL population, nevertheless, was in a distinct cluster, either shown in TCS network or UPGMA tree. This suggests not only high genetic diversity but also high genetic differentiation among Guangxi Province populations. Surprisingly, GXYL population was clustered together with HNHK population. Further, the low F ST value between Hainan Province and the mainland means that gene flow occurred between them, despite the fact that Hainan Province is an isolated island and separated from the mainland by sea waters. In contrast, most Chinese population studies revealed that the population from Hainan Province was genetically partitioned with the mainland populations (Yang et al., 2016;Lin, 2017;Jiang, 2018). Gene flow between the population of Hainan Province and the mainland, revealed by nuclear makers in our study, is congruent with the latest research of Gao et al. (2022) discovered by mitochondrial markers. This gene flow is probably as a consequence of geological changes during glacial periods, when the exposed continental shelf connected Hainan and the mainland China (Sun et al., 2000;Voris, 2000). However, Guangxi Province and Vietnam possess completely different nrDNA haplotypes, despite their proximity. This is mainly because that there are no rivers connecting Guangxi Province and Vietnam.

Sex ratio
It is well known that the population size is closely related with sex ratio, and an unbalanced sex ratio may significantly reduce the effective size of populations (Dubreuil et al., 2010). We checked the sex of each individual after sampling, and we found that all sampling sites in China are male dominated, except GDQY. However, Vietnam is female dominated. In fact, the sex ratio of M. armatus in the nature is on debate. One research reported female dominance trend (Panikkar et al., 2013), while the other one showed an equal proportion of male and female (Serajuddin and Pathak, 2012). We need more natural population data to clarify the sex ratio of M. armatus in the future, in order to provide more references for the further biological conservation. In addition, we discovered that the sex ratio of M. armatus is very unbalanced during the artificial breeding process, characterized with significant female dominance. For example, the proportion of females can reach 86.33% in the report of Xue et al. (2021b). This suggests that there are differences and difficulties in sexual differentiation of M. armatus, which may also occur in nature. At present, although some  (Xue et al., 2020;Xue et al., 2021b) and Y chromosome differential (Xue et al., 2021a). The underlying sex determination mechanisms are still unclear, which also make challenges to investigate the sex ratio of M. armatus.

Biological conservation
The biological conservation of species heavily depends on their genetic diversity. Understanding intraspecific genetic diversity and differentiation can help us take scientific and effective measures to protect threatened or endangered animals. Combing the results of our study with other previous studies, we found that the present genetic diversity of M. armatus populations is not low, indicating that it is not too late to take action to protect them, so that their genetic diversity can remain high. For example, precisely constructing more reserves for M. armatus based on the distribution of different genotypes/haplotypes, strengthening the investigation and protection of M. armatus in the isolated island, Hainan Province, which may provide crucial clues for their expansion, and exploring artificial breeding techniques to better conserve the germplasm resource.

Conclusion
In conclusion, we researched the genetic diversity and population structure of M. armatus in China and Vietnam, based on nuclear ribosomal DNA markers. The genetic diversity and differentiation of M. armatus populations were at a relatively high level according to the data from this study and previous studies. Three Clusters were classified according to the genetic distance between each population, characterized with two Clusters in China and a distinct Cluster in Vietnam. In particular, we found that gene flow occurred between an isolated island and the mainland China based on the low F ST value between them.

Data availability statement
The original contributions presented in the study are publicly available. This data can be found here: NCBI, OP847241 -OP847354.

Ethics statement
The study was approved by the Laboratory Animal Ethics Committee of Pearl River Fisheries Research Institute, CAFS (number: LAEC-PRFRI-20201219).