Intraspecies Genomic Divergence of a Fig Wasp Species Is Due to Geographical Barrier and Adaptation

Understanding how intraspecies divergence results in speciation has great importance for our knowledge of evolutionary biology. Here we applied population genomics approaches to a fig wasp species (Valisia javana complex sp 1) to reveal its intraspecies differentiation and the underlying evolutionary dynamics. With re-sequencing data, we prove the Hainan Island population (DA) of sp1 genetically differ from the continental ones, then reveal the differed divergence pattern. DA has reduced SNP diversity but a higher proportion of population-specific structural variations (SVs), implying a restricted gene exchange. Based on SNPs, 32 differentiated islands containing 204 genes were detected, along with 1,532 population-specific SVs of DA overlapping 4,141 genes. The gene ontology (GO) enrichment analysis performed on differentiated islands linked to three significant GO terms on a basic metabolism process, with most of the genes failing to enrich. In contrast, population-specific SVs contributed more to the adaptation than the SNPs by linking to 59 terms that are crucial for wasp speciation, such as host reorganization and development regulation. In addition, the generalized dissimilarity modeling confirms the importance of environment difference on the genetic divergence within sp1. Hence, we assume the genetic divergence between DA and the continent due to not only the strait as a geographic barrier, but also adaptation. We reconstruct the demographic history within sp1. DA shares a similar population history with the nearby continental population, suggesting an incomplete divergence. Summarily, our results reveal how geographic barriers and adaptation both influence the genetic divergence at population-level, thereby increasing our knowledge on the potential speciation of non-model organisms.


INTRODUCTION
The origination of new species usually starts from intraspecies diversity and population subdivision. The division of one population into several descendances generally occurs. Several hypotheses were proposed to explain the evolutionary dynamics driving the descendant populations into new species such as adaptation (Martin et al., 2013), chromosomal structure variant (Korunes et al., 2021), and absence of gene flow (Hill et al., 2021). However, the intraspecies divergence, which is the very beginning of speciation, has received much less attention than fully separated species.
Fig trees (Ficus, Moraceae) and fig wasp (Hymenoptera: Chalcidoidea: Agaonidae) form a specialized co-evolution system, containing the most complex speciation processes (Herre et al., 2008;Bronstein, 2009;Cruaud et al., 2012). The host tree, on the other hand, provides their ovules as food for the larvae and urn-shaped inflorescences (figs) as a mating environment for the adults. When the new wasp generation finish mating, they are ready to disperse. The stamens mature simultaneously and stain the pollen on the wasps when they leave. Thus, the wasps transfer pollen to other trees when entering figs. For successful oviposition, wasps must specifically recognize the volatile organic compounds (VOCs) of the host, and match the narrow entrance (i.e., ostiole) of enclosed fig (Berg, 1989;Bronstein and McKey, 1989;Borges, 2015;Chen Y. et al., 2016). Hence, morphologically distinguished figs are usually pollinated by distinct genera (Ramirez, 1970;Cruaud et al., 2012). For the fig tree, successful pollination depends on the strictly consistent life cycle with the wasp (Bronstein, 1988;Anstett et al., 1997;Cook and Rasplus, 2003). Therefore,  (Weiblen, 2002;Ronsted et al., 2005). However, recently studies have proven that the speciation of wasps seems out of step with the host. Multiple pollinator wasps were found co-existing in a single host species (Darwell et al., 2014;Yang et al., 2015;Darwell and Cook, 2017;Yu et al., 2019). Those wasps are genetically and morphologically distinct. The phylogeny among those species is complex, including populations under different phases of speciation and fully separated new species (Liu et al., 2013;Tian et al., 2015;Souto-Vilaros et al., 2019). One explanation is that the insects have much shorter longevity than trees (Petit and Hampe, 2006;Thomas et al., 2010). For instance, when a reduce of gene flow has led to divergence in the wasp after many generations, the host tree population is still probably in one generation without any genetic differentiation. The adaptation to different niches is an alternative hypothesis for the rapid speciation (Darwell and Cook, 2017), accompanied by others such as geographical isolation (Liu et al., 2013), hybridization (Renoult et al., 2009), host switching (Cruaud et al., 2012;Wang et al., 2021), and pollinator sharing (Bernard et al., 2020).
Ficus hirta Vahl. is a wide-spread species with the most varied pollinator wasps, Valisia javana Mayr (Wiebes, 1993;Yu et al., 2019). Recently, analysis based on neutral gene markers and microsatellites proved V. javana to be a complex including nine cryptic species (referred as sp1-sp9) . The sp1 is also in the process of divergence: the population on Hainan Island, China presents genetic differences from those of southeast Asia (Tian et al., 2015;Yu et al., 2019). However, whole-genome level analysis on fig wasps, especially on V. javana complex, is still rare. The previous studies are based on relatively few genetic markers, insufficient for revealing genome-wide pattern and detecting the evolutionary dynamics within sp1.
Here, we take sp1 of the V. javana complex as an example, and combine both SNPs and SVs to detect its intraspecies genetic divergence. To be specific, we ask: (1) do the Hainan and continental populations of sp1, differ genetically at the wholegenome level and (2) whether the variation patterns differ among populations? Hainan island is divided from the continent by Qiongzhou Strait and is under a tropical maritime climate, while the continental habitats of sp1 are mainly under a subtropical continental monsoon climate. Hence, we have interest in the effect of the physical barrier caused by Qiongzhou Strait and local adaptation caused by environmental difference. Finally, we reconstruct the population-level demographic history within sp1 to realize the divergent process.  Figure 1A is the map of sampling locations and Table 1 provides details. Three locations were included. One is from Hainan Island (DA) and two are from the continent (VH and SCBG). The distances between each two sampling locations are 822 km (SCBG-VH), 498 km (SCBG-DA), and 532 km (VH-DA). In each location, figs were collected from trees 3-5 m away from each other. We chose 10-30 figs from each site and sealed them individually in fine-mesh bags under ambient temperature until the mature wasps emerge. The sp1 wasps were identified and collected, then preserved in 95% ethanol under −20 • C until DNA extraction. The DNA pool of each sample were created for all mature female wasps in a single fig by mixing extraction of genomic DNA. We finally obtained 26 samples for DNA extraction. Genomic DNA was extracted following the method outlined in a previous study (Tian et al., 2015). The genomic DNA was fragmented with the S220 Ultrasonic Processor (Covaris, United States). After end repairing and phosphorylating, common adapters were ligated to the fragments for denaturing and amplifying. The prepared libraries were sequenced with the Illumina HiSeq2500 (Illumina,

Calling and Filter of Single-Nucleotide Polymorphisms
Reads were evaluated with FastQC 1.0.0 (Andrews, 2010). Adapter sequences in reads were trimmed. Reads containing > 10% N bases or Q no more than 5 were removed. High-quality reads were mapped with BWA-MEM (Li and Durbin, 2011) to the sp1 genome under accession number GWHBDGE00000000, National Genomics Data Center. After being sorted and indexed by SAMtools (Li et al., 2009) (Danecek et al., 2011) with parameters as follows: 10 < read depths < 100, max missing < 0.5, and minor allele frequency > 0.03. All SNPs located within 3 bp of an InDel were removed. Finally, SNPs of each population were merged for the following analysis.

Genetic Structure Within sp1
To detect the intraspecies structure within sp1, we performed principal component analysis (PCA) and maximum likelihood (ML) tree based on SNPs. The PCA was conducted by R package SNPRelate (Zheng et al., 2012). PCA result was visualized with R package ggplot2 (Wickham, 2016). The phylogenetic tree was constructed with 1,000 bootstraps by iQtree (Nguyen et al., 2015). Command "-m MFP" was set for automatic detecting of the optimal model of nucleotide substitution. The bootstrap value was adjusted by command "-bnni." The final tree was visualized by iTOL version 5 (Letunic and Bork, 2021). Only SNPs with minor allele frequencies more than 0.05 and with max missing less than 0.5 were used for PCA. Only SNPs at fourfold degenerated sites (4DTv) were used for ML tree. The 4DTv were then filtered by minor allele frequencies more than 0.02 and with max missing less than 0.5. To test the effects of filtering threshold setting, we ran multiple filtering with alternative parameters as follows: max missing < 0.5 or 0.7; without Hardy-Weinberg equilibrium (HWE) setting or HWE with p-value > 0.0001, > 0.001, > 0.1, and > 0.5; minor allele frequencies >0.02, 0.03, and 0.05; and minimum distance between any two sites >1000 and >5000. All filtering was performed with VCFtools 0.1.16 (Danecek et al., 2011).
We also estimated the linkage disequilibrium (LD) decay with PopLDdecay (Zhang et al., 2019) in the three populations. Only SNPs were included. To compare the diversity pattern among populations, we firstly performed a Shapiro-Wilk test (Shapiro and Wilk, 1965) on the data for normality in frequentist statistics. Based on the results, the unpaired two-samples Wilcoxon ranksum test (Wilcoxon, 1946) was applicated among π and Tajima's D of each population, and among the Fst of each combination.
Wilcoxon rank-sum test is a method to inspect whether the samples are from two populations with different medians when their distributions are not normal. The tests were performed respectively with "shapiro.test()" and "wilcox.test()" in R 3.6.3 (R Core Team, 2020).

Identification and Gene Ontology Enrichment of Differentiated Islands
To identify the differentiated islands between Hainan (DA) and continental populations (VH and SCBG), the continental ones were merged as a single mainland population. Then we calculated the Fst (Weir and Cockerham, 1984) between DA and mainland in 100-kb non-overlap window with VCFtools (Danecek et al., 2011). The results were Z-transformed with R 3.6.3 (R Core Team, 2020). In the present study, highly differentiated islands were defined as windows with Fst three standard deviations from the mean (i.e., Z-Fst > 3). To assess the signatures of selection, we compared π and Tajima's D between the islands and background with the same methods in 2.4. Genes of every identified island were assigned gene ontology (GO) enrichment analyses by Ontologize 1.0 (Bauer et al., 2008). The GO terms were inferred from the annotation of sp1, the same reference genome as reads mapping. GO terms with p < 0.05 were considered significant.

Demography
MSMC2 (Schiffels and Durbin, 2014;Malaspinas et al., 2016) was employed to reconstruct the historical change of effective population size (Ne). MSMC2 is capable of estimating the continuous change of Ne over time by a Markovian approximation of the coalescent-with-recombination process. The input SNPs were sorted and aligned by TASSEL (Bradbury et al., 2007). SNPs with minimum frequency below 0.05 were filtered. Then Beagle 5.2 (Browning et al., 2018) was used for genotype phasing. Nasonia (Hymenoptera: Chalcidoidea: Pteromalidae) is a well-studied genus of Chalcidoidea and phylogenetically close to Valisia (King, 1961;Wylie, 1965;Werren et al., 2010). The substitution rate at silent sites of Nasonia species was about 1.4 × 10 −8 per site per year (Oliveira et al., 2008). Since mutation includes not only substitutions but more types, the mutation rate is at least twice higher than the substitution rate (Press et al., 2019). Thus, in the present study, the mutation rate for sp1 was estimated as 2.8 × 10 −8 per site per year. The generation time was 0.25 per year according to our field observation. The longest seventh contigs were used. MSMC2 estimation was carried out under the default parameters. Four samples from each population were chosen randomly per run. Ten independent runs were performed separately for each population.

Calling, Statistic and Annotation of Structural Variations
Since translocation (TRA) may involve more than one contig, short-read data is not suitable for translocation calling. Hence in the present study, SVs only contained deletions ( (Rausch et al., 2012). To reveal the divergence pattern of SVs among populations, we extracted the population-specific SVs by Bedtools 2.25.0 (Quinlan and Hall, 2010), then annotated their distribution regions with snpEff 4.3 (Cingolani et al., 2012). GO enrichment analysis was performed on genes overlapping with every population-specific SV by the same methods as described above.

Contributions of Environment vs. Geographic Distance
For estimating the effects of environment and geographic distance on the genetic distance, we applied the generalized dissimilarity modeling (GDM) (Ferrier et al., 2007) with R package GDM (Manion et al., 2016). The sp1 was recently identified as a separated species within V. javana complex . Hence, the habitat records of V. javana is too vague for predicting the habitats of sp1. To expand the sampling area, we inducted 13 reliable sampling sites from previous studies (Tian et al., 2015;Yu et al., 2019;Supplementary Table 1). Wasps of the 13 sites were identified as sp1 by COI gene markers. Because of the low nucleotide diversity within sp1, the pairwise distance is barely distinguished among sites. Hence, we constructed the genetic matrix with Nei's D (Nei, 1972), which is a standardized genetic distance ranging from 0 to 1. Furthermore, for reducing the effect of the similarity of sequences from the same population, one COI sequence was randomly chosen from each site. Three random samplings of COI sequences were conducted. We then separately calculated Nei's D based on each random sampling with "nei.dist()" in R package poppr (Kamvar et al., 2014). Then the 19 Bioclim variables at 2.5 m resolution (Hijmans et al., 2005;Fick and Hijmans, 2017) of the 13 sites were obtained. According to Myers et al. (2019), nine of the 19 variables have a low correlation between variables (<0.7). Those are BIO1 (Annual Mean Temperature), BIO2 (Mean Diurnal Range), BIO3 (Isothermality), BIO4 (Temperature Seasonality), BIO8 (Mean Temperature of Wettest Quarter), BIO9 (Mean Temperature of Driest Quarter), BIO12 (Annual Precipitation), BIO14 (Precipitation of Driest Month), and BIO15 (Precipitation Seasonality). We thus only used the nine variables to present environmental differences. In the GDM, we set the genetic distance as the response variable, and respectively set three independent variables: (1) environment differences and geographic distance, (2) environment differences only, and (3) geographic distance only. The most fitted model and the variable importance were quantified with "gdm.varImp()" in the GDM package. The "gdm.varImp()" returns percent deviance were explained and the p-value was fitted for each model. The model with p < 0.01 and highest explained percent deviance was considered as the most suitable for estimating the variable importance and variable significance. We further used MaxEnt 3.4.1 (Phillips et al., 2006) to evaluate the most important environment variables based on all the 19 Bioclim variables. MaxEnt is capable of parsing the contributions of each variable regardless of their correlation (Elith et al., 2011;Myers et al., 2019). The analysis was set with 10 replicates, with 25% of samples as the training data. Receiver operating characteristic curve (ROC) (or area under curve, AUC) and jackknife method were used to measure fitness and variable importance, respectively.

Sequencing, Mapping, and Single-Nucleotide Polymorphisms Calling
A total of 45.8 Gb clean data and 30.3 Gb filtered BAM files were generated from 26 samples. The average sequencing depth is 12.82× and the average alignment to the reference genome is 97.01% (Supplementary Table 2

Intraspecies Genetic Structure of Sp 1
We conducted a PCA and an ML tree based on SNPs to explore the genome-wide genetic structure within sp1. See Materials and Methods for parameters of SNP filtering. The PCA proves that DA was clearly separated from the mainland populations (VH and SCBG), even with overlap on Comp 1 (15.87%) and Comp 2 (6.42%) (Figure 1B). On the ML tree (Figure 1C), all samples fell into two distinct clades in which DA forms a monophyletic one. VH and SCBG do not form clear sub-clades separated from each other. Therefore, our results are consistent with previous studies that showed that genetical divergence exists between DA and the continental populations. We noticed that MAF and HWE (>0.0001 or >0.001) barely affected the filtering results let alone the genetic structure results. Different max missing, sites distance, and severe HWE settings presented similar results (Supplementary Figure 1).

Genetic Diversity
To compare the diversity pattern among the three populations, the Tajima's D and π of each population were calculated on 100kb non-overlap windows. The LD decay was variable among the three populations (Supplementary Figure 2). DA has the highest decay ratio while the continental populations present strong linkage. The distributions of π and Tajima's D of each population and Fst of each combination were all non-normal according to the Shapiro-Wilk test (Supplementary Table 3). Therefore, the Wilcoxon rank-sum test was performed for comparing their medians (Supplementary Table 4). Tajima's D of both continental populations is significantly higher than DA with p < 0.01 (Figure 2A). The π presents a similar pattern with D ( Figure 2B). Hence, we assume that the polymorphism in DA is significantly reduced. The median of Fst scores between DA and SCBG is significantly higher (p < 0.05) than that between DA and VH, and both of them are significantly higher (p < 0.01) than the one between VH and SCBG ( Figure 2C). The Fst result confirms the clear genetic structure between DA and the continental populations, while the continental populations barely differ. The upper quartiles of Fst between DA and the continental populations are below 0.2 (Figure 2C), indicating the genetic differentiation is still moderate due to incomplete block of gene flow or recent divergence.

Identification and Gene Ontology Enrichment of Differentiated Islands
A total of 32 differentiated islands with Z-Fst > 3 were identified between DA and the mainland (Supplementary Figure 3 and Supplementary Table 5). Continuous islands were merged into a single one. The distribution of π and Tajima's D within or outside islands are all non-normal, except the Tajima's D of islands in DA (Supplementary Table 3). Hence, we still applied the Wilcoxon rank-sum test for comparison. The differentiated islands of DA have significantly lower π and Tajima's D (Figures 3A,B and Supplementary Table 4), with p < 0.05 and < 0.01, respectively. In the mainland, π of differentiated islands is significantly reduced (p < 0.05) but the Tajima's D shows no significant difference (Figures 3A,B). The GO enrichment analysis of the 32 windows detected 204 genes, but only linked to three significant GO terms with p < 0.05. These 3 GO terms are sodium:potassium-exchanging ATPase complex (GO:0005890), cation-transporting ATPase complex (GO:0090533), and glycerol-3-phosphate catabolic process (GO:0046168). Each of the GO term contains three study terms, which means that most of the 204 genes fail to be enriched by any term.

Demography
To explore whether the demographic history differs between island and continental populations, we estimated change of the effective population size (Ne) along time. Due to the very short generation time (0.25 year) and rapid mutation rate (2.8 × 10 −8 per site per year), the estimation of demographic history can only be traced back 4,000 generations (1,000 years) at most. According to the result, three populations had declined 1,000 years ago (ya) then expanded lately (Figure 4). The Ne of VH has been always higher than the other two populations with a recent explosion. Nevertheless, the demography of SCBG and DA has been similar. Thus, the recent demographic history between the Hainan and continental populations, at least between DA and SCBG, has not separated.

Statistics and Annotation of Structural Variations
We obtained a total of 3,244 SVs and 1,881 population-specific SVs ( Table 1). DA has the most SV (1,811), exponentially higher than VH (584) and SCBG (849). The number of populationspecific SV in DA (1,532) are also multiples higher than the two continental populations (VH: 311; SCBG: 488). As a result, DA has the highest population-specific SV proportion (84.59% vs. VH: 53.25% and SCBG: 57.48%). Hence, in addition to SNPs, SVs contribute to the genetic difference of DA as well. All three populations showed a similar pattern on the type of SVs ( Figure 5A). In every population, DEL takes the highest proportion. The population-specific SVs are mostly located at inter-gene regions and introns ( Figure 5B). In VH and DA, an unusually high proportion of SV is on splice site donner and transcript, respectively. The results imply that the population-specific SVs may lead to transcriptomic differences in VH and DA. We failed to detect any gene that overlaps the population-specific SVs of VH or SCBG. 4,141 genes were discovered overlapping the population-specific SVs of DA. The GO enrichment analysis of those 4,141 genes enriched to 59 significant GO terms (p < 0.05) (Figure 6). The GO terms, such as signaling pathway, cell communication, or stimulus response, are important for specific host cognization. Besides, GO terms linked to development regulation imply a flexible life cycle.

Contributions of Environment vs. Geographic Distance
To compare the effects of environment and geographic distance, and evaluate the most important environment variables, we applied the GDM to estimate the correlation between genetic distance and environment difference and/or geographic distance. We performed three random samples from the COI sequences. In each run, one sequence was selected from each site. We then separately calculated the Nei's D as response variable. The GDM results based on the three samples (random 1-3) were presented in Table 2. Three GDM analyses returned a similar conclusion, in which models involving both environment and geographic distance as potential predictor variables explained the most deviance (52.590-57.894%). When only the environment was included, the predictor variables still explained almost half of the deviance (49.778-53.433%). However, when only geographic distance included, the predictor variables explained 11.693-13.791% of the deviance, or even failed to predict response variable. The result confirmed the findings of previous studies (Tian et al., 2015;Yu et al., 2019), which detected the lack of genetic isolation by distance within the mainland populations of sp1. The most important Bioclim variable was either BIO8 (Mean Temperature of Wettest Quarter) or BIO9 (Mean Temperature of Driest Quarter); both present the synchronization of temperature and humidity. Therefore, we assume that the genetic distance in sp1 is due to environment, implying differentiation driven by adaptation. Heat stress accompanied by dryness stress could be a crucial influence. We further estimated the importance of all 19 Bioclim variables. The AUC values of the result are not excellent but acceptable (training AUC: 0.8231; test AUC: 0.5897), due to the strapped number of sampling sites. According to the jackknife method, BIO5 (Max Temperature of Warmest Month, Percent contribution = 50.613%) contributes the most to the distribution of sp1, while the model highly depends on BIO7 (Temperature Annual Range, permutation importance = 54.902%). The result is consistent with GDM, both of which predict the importance of heat stress.

Genome-Wide Patterns of Divergence Within sp1
The potential divergence between the Hainan and the continental populations of sp1 was hinted by neutral gene markers (Tian et al., 2015;Yu et al., 2019). Our results confirm the genetic structure within sp1 at genome-level, suggesting the Hainan population (DA) has been genetically differing from the continental populations (VH and SCBG). Furthermore, we reveal the divergence pattern across the whole genome.
The PCA ( Figure 1B) and ML tree ( Figure 1C) based on resequencing data both confirm the previous pattern. In addition, the Fst between DA and either VH or SCBG is significantly higher than that between the two continental populations ( Figure 2C). All those results indicate a larger genetic distance between DA and continental populations than within the continental ones. We additionally proved that both SNPs and SVs contributed to the genetic difference of DA. The SNP number (Table 1), Tajima's D (Figure 2A), and π ( Figure 2B) of DA are immensely lower than the continental populations. The population-specific SVs of DA, however, are larger in number and higher in proportion than the other two populations (Table 1). Thus, the genetic difference of DA is caused by a decrease on SNP polymorphism and a differentiation on chromosome structure. The LD decayed rapidly in DA but maintained strong linkage in both of the continental populations. Such a phenomenon could be due to the influence of a high proportion of SVs. Moreover, the pattern of variant localization is heterogeneous across the genome but differs among populations. Firstly, we detected 32 differentiated islands with Z-Fst > 3 based on the SNP. Those differentiated islands have significantly reduced diversity than the genomic background (Figure 3 and Supplementary Table 4), and cluster on a few contigs (Supplementary Figure 3 and Supplementary  Table 5). Secondly, the proportion of SVs varies among genomic regions ( Figure 5B). SVs are significant components of genomic variants and affect large genomic regions (Feuk et al., 2006;Wellenreuther et al., 2019;Ho et al., 2020). The present study discovered a great number of population-specific SVs in DA (1,532), which is times higher than in VH (311) or SCBG (488). DA also presents a higher proportion of SVs in the transcript regions than the others. Therefore, SVs cause divergence not only in genome but also in transcriptome. We detected 4,141 genes overlapping the population-specific SVs of DA, more than the 204 genes detected in the differentiated islands based on SNPs. Furthermore, the functions of those 204 genes are still vague, indicating that they are probably hypothetical genes without thorough study. One SV may overlaps a significant amount of genes and lead to enormous genetic diversity in the process of evolution (Bertolotti et al., 2020;Weissensteiner et al., 2020) and domestication (Liu et al., 2019;Kou et al., 2020). We assume that the divergence between DA and the continental populations is more likely caused by SVs than SNPs.
Our results show some differences from the previous studies. According to mitochondrial gene COI, the Tajima's Ds of all sp1 populations are significantly negative (Tian et al., 2015). In the present study, however, the whole-genome data reveal an overall positive D (Figure 2A). A probable explanation is that the evolution rate of cytoplasmic genome is much slower than nuclear genome. Therefore, gene markers surely have fewer variants than the nuclear genome.

Geographical Barrier Contributes to the Genetic Divergence
The severe reduced SNP polymorphism and high proportion of population-specific SV in DA represent a signature of gene flow restriction from the continental populations. The Qiongzhou Strait may provide a geographical barrier. Qiongzhou Strait separates Hainan Island from Southeast China, which is about 30 km wide and opened nearly 8,000 years ago (Zhao et al., 2007;Ni et al., 2014). In the pollinator wasp, the lack of genetic structure across a large distance on continent was reported, such as Valisia (Tian et al., 2015) and Wiebesia species (Liu et al., 2015) across Southeast China, or Ceratosolen fusciceps from Southeast China to Thailand (Kobmoo et al., 2010). However, population fragmentation also occurs when their habitats are islands. The signatures of isolation-by-distance was discovered in a Wiebesia species among the islands of Zhoushan Archipelago (Liu et al., 2013), even though the same species are genetically homogeneous over a distance of 1,000 km across Southeast China (Liu et al., 2015). It is thus reasonable to hypothesize that for pollinator wasp, the dispersal capability might be weakened by water bodies, or limited by islands. The dispersal of wasps totally depends on the adult female. After emerging from figs, an adult female wasp only survives for 1-2 days and is vulnerable to abiotic stresses. For them, high temperature and dryness are lethal while moisture is a relief (Ware and Compton, 1994;Dunn et al., 2008;Jevanandam et al., 2013;Gigante et al., 2020). Thus, the dispersal of wasps could be limited by the drastically changing weather above the Qiongzhou Strait, which is open to direct sunlight, and often experiences high temperatures and strong winds. The host of sp1, F. hirta, is a dioecious shrub existing in secondary jungle and the edge of forests. Pollinator wasps of such a host prefer the understory with gentle air (Harrison and Rasplus, 2006;Chen et al., 2011). It is reasonable to assume that the sensitivity and/or preference on habitats delimit the dispersal of wasp, and then restrict the gene flow. Previous studies (Tian et al., 2015;Yu et al., 2019) and our own (Table 2) all confirm the limited influence of the geographic distance. To be clear, the Qiongzhou Strait contributes to the gene flow barrier not because of its width but its unfavorable weather, suggesting that adaptation may also be an underlying drive.

The Role of Adaptation in the Genetic Divergence
The climate of Hainan Island is a tropical maritime climate while the habitats of sp1 continental populations are mainly under subtropical continental monsoon climate. As we have discussed before, in the absence of moisture, heat and dryness lead to high mortality of fig wasp (Dunn et al., 2008;Jevanandam et al., 2013;Gigante et al., 2020). Our own results also prove that the synchronization of temperature and humidity is the most important influence underlying the genetic divergence within sp1 ( Table 2). Furthermore, heat and dryness are major restrictions of sp1 distribution (See "Contributions of Environment vs. Geographic Distance"). Therefore, it is reasonable to say that local adaptation contributes to the genetic structure between Hainan and the mainland population of sp1. In addition, different from the densely populated continent, Hainan Island is a biodiversity hotspot, still maintaining anatural tropical rainforest (Li, 1995;Luo et al., 2020). Human activity such as industrial pollution or pesticide may also cause a strong influence, for example, the industrial melanism of moth (Cook et al., 1970;Cook and Saccheri, 2013).
The differentiated islands in DA as well as islands in the mainland show decreased diversity. The results of GO enrichment on the differentiated islands also failed to linked to any genes that maybe related with speciation (See "Identification and Gene Ontology Enrichment of Differentiated Islands"). Hence, we assume that background selection before population dividing is a more likely causation. The GO enrichment analyses performed on the genes overlapping population-specific SVs, however, presented a more informative result. No gene was detected in the population-specific SVs of VH or SCBG. On the contrary, enormously higher number of genes (4,141) overlap the population-specific SVs of DA. We then performed GO enrichment analysis on those 4,141 genes and obtained 59 significant GO terms with p < 0.05 (Figure 6). It is worth noting that most of the 59 GO terms link to regulation of signaling pathway and external stimulus response. Pollinator wasps recognize the host through VOCs produced by figs (Weiblen, 2002;Chen Y. et al., 2016). Geographic divergence of VOCs was reported in F. hirta, the host of sp1 (Deng et al., 2021). The floral odors of F. hirta in Hainan clearly distinguished from those in the continent. The differences that occur in hosts could also drive the adaptation of pollinator wasps. For example, Ficus tikoua presents morphological differences of figs along its habitat (Deng et al., 2016). Its pollinator wasps show a genetic structure that loosely corresponds to the distribution of those morphologically different figs (Deng et al., 2020). Hence, the intra-species divergence of sp1 is probably a consequence of the co-evolution with its host. The 59 GO terms also link to the developmental process, indicating that DA may present difference in Besides the divergence of genome, the differences on transcription and alternative splicing may result in the local adaptation of each population as well. The distribution of SVs' localization differs among populations ( Figure 5B). VH had an unusually higher proportion of population-specific SVs on the donor of splice sites and DA on the transcript region. Therefore, studies on transcriptome would provide more information. In brief, we confirm that local adaptation is an important force underlying the genetic divergence between DA and the continental populations. For adapting the divergence of its host population and the tropic environment, the Hainan wasp population may evolve special mechanisms for host recognition and life cycle regulation. SVs contribute more than SNPs to the genetic basis of adaptation.
Structural variation is an important source of materials for evolution (Frazer et al., 2009;Weissensteiner et al., 2020). Analyses based on SVs are still rare in insects. Most studies focus on the well-studied model species, Drosophila melanogaster (Dopman and Hartl, 2007;Emerson et al., 2008;Zichner et al., 2013;Long et al., 2018). Those works collected SVs across D. melanogaster genome, and calculated the SV types. In D. melanogaster, deletions also take the largest part of SVs. Long et al. (2018) annotated the distribution of SVs in exons and pointed out that the types and numbers of SVs highly differed among populations as well as in sp1. Hence, SVs contribute to the genetic divergence within D. melanogaster. A recent preprint study on Manduca sexta sheds light on the importance of a single inversion which enhances adaptation of the Z chromosome (Mongue and Kawahara, 2020). Generally speaking, SVs attract much less attention than SNPs. The present study highlights its importance in local adaptation. Nevertheless, it is still challenging to precisely annotate SVs with short-read sequencing data (Tattini et al., 2015). Complex SVs such as TRA are excluded in our study and its effect on the intraspecies divergence of Sp1 unfortunately remains unknown. The high proportion of DEL in SVs seems unusual in sp1 and D. melanogaster. Transposon activity is a possible explanation, but more accurate calling of SVs is required for transposon annotation. Further, the methods for downstream analysis for SV are much less than for SNP. Suitable methods for SV are urgently required.

The Demographic Reconstruction Within Sp 1
The reconstruction of a demographic history for fig wasp is intractable due to their relatively short adult longevity and life cycle. The varied mutation rates along the genome complicate the demographic estimation even more. Thirdly, frequent mutation within the population reduces the robustness of its demography reconstruction when repeated sampling is applied. Most important, local extinction-recolonization usually occurs in insects (Harrison, 2000;Devoto et al., 2005;Liu et al., 2013;Lever et al., 2014). The factors affecting fig production could also affect the survival of wasps. For example, high temperature and dryness are not only lethal for adult wasp, but also cause stress for fig trees. The fig trees in Borneo failed to produce figs during a drought, which then caused local extinction of their pollinator wasp (Harrison, 2001). The demography of fig wasp is therefore remarkably changeable even in a very recent period. The demographic studies of wasps usually based on field observation in a relatively short duration focus on the yearly fluctuations of demographic factors (e.g., number, daily survival rates, and sex ratio) (Forouzan et al., 2013;Asadi et al., 2019). For example, the 3-year field experiments (1997)(1998)(1999) in a social wasp species (Polistes fuscatus) (Nadeau and Stamp, 2003), and the observation of a primitively eusocial wasp (Ropalidia fasciata), in Japan began in 1983 (Ito, 1996;Ito and Kasuya, 2005). Some other studies estimated a rough demographic history based on a few highly conservative gene markers (Wang et al., 2013;Tian et al., 2015;Yu et al., 2019). To our knowledge, the present study is the first that attempted to reconstruct the demographic history of fig wasp by whole-genome re-sequencing data. The demographic histories of the three populations were traced back 4,000 generations (1,000 ya). The Ne of all populations declined since 1,000 ya with a recent expansion (Figure 5). In one continental population VH, a recent population expansion was detected. However, the demography history does not differ between DA and the other continental population, SCBG.
The similar results of DA and SCBG indicate that Hainan and continental populations still share an evolutionary process due to incompletely blocked gene flow, or incomplete lineage sorting. VH is geographically remote from DA and SCBG ( Figure 1A) and locates in a landlocked habitat. Hence, we assume that the climatic contrasts between inland and coastal areas is the critical factor. Unfortunately, we failed to provide a more accurate estimation on the divergence time of sp1. Comparing with the whole-genome re-sequencing data and population genomics, we suggest the transcriptome approaches because the transcriptome is more conservative than genome. To be specific, we hope a molecular clock based on single copy genes could provide more reliable evidence for the divergence time estimation of fig wasp. We also performed the demography reconstruction with an approximated mutation rate according to the substitution rate at silent sites of Nasonia (Oliveira et al., 2008). Specific estimation of Valisia species mutation rate, for accuracy, will offer more interesting information.

DATA AVAILABILITY STATEMENT
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive in National Genomics Data Center (https://ngdc.cncb.ac.cn/), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences, under project number: PRJCA005939, accession number CRA004600 that are publicly accessible at https://ngdc. cncb.ac.cn/gsa.

AUTHOR CONTRIBUTIONS
HY conducted the sample collection and preparation of wasp materials. XX conducted the bioinformatic analyses and wrote the manuscript. B-SW and HY conceived of the study. All authors contributed to the article and approved the submitted version.