Lack of Genetic Structure Among Populations of Striped Flea Beetle Phyllotreta striolata (Coleoptera: Chrysomelidae) Across Southern China

The striped flea beetle (SFB) Phyllotreta striolata (Fabricius) (Coleoptera: Chrysomelidae) is a major pest of cruciferous vegetables in southern China. The population diversity and genetic structure of SFB are unknown. Here, we assembled a draft genome for the SFB and characterized the distribution of microsatellites. Then, we developed 12 novel microsatellite markers across the genome. We used a segment of the cox1 gene and newly developed microsatellite markers to genotype the genetic diversity of SFB across southern China. There were 44 mitochondrial haplotypes in the SFB populations, with haplotype 2 as the most widespread. The population genetic differentiation was very low, indicated by FST-values (<0.05 except for Guangxi population with other populations based on cox1), high gene flow (4.10 and 44.88 of cox1 and microsatellite, respectively) and Principal Coordinate Analysis across all populations. Mantel test showed genetic distance in SFB was significantly associated with geographic distance based on microsatellites (R2 = 0.2373, P = 0.014) while result based on cox1 (R2 = 0.0365, P = 0.155) showed no significant difference. The phylogenetic analysis did not find any geographically related clades among all haplotypes. Analyses based on microsatellites showed a lack of population genetic structure among all populations. Our study provides a foundation for the future understanding of the ecology and evolution of SFB and its management.

The SFB is mainly distributed in Asia, Europe, and North America (Soroka et al., 2018;Cao et al., 2020;Atirach et al., 2021). Native to Eurasia, SFB was introduced to North America in 1663 from Europe (Rousseau and Lesage, 2016). In China, the SFB is one of the most important cruciferous crops pests in southern areas. When the adult density of SFB is 20∼50 individuals every 100 Brassica pekinensis, the damage rate will be 50∼76%; if the adult density more than 50, the damage rate will be 100% in Shenzhen (Zhang et al., 2000). There are six to nine generations of SFB throughout the year without overwintering in southern China , compared to one to two generations per year in North America (Olfert et al., 2017). Recently, damage of SFB has increased. Its geographic range expanded both in China and North America (Chai, 2010;Sun, 2010;Lee et al., 2011;Kielen, 2012). Current studies mainly focus on this pest's biology and control methods (Gao et al., 2000;Chai, 2010;Lee et al., 2011;Soroka et al., 2018;Yan et al., 2018;Atirach et al., 2021); however, no studies that we are aware of on the ecology and evolution of the SFB.
In this study, we examined the population genetic diversity and genetic structure of the SFB among populations across its main distribution range in China. Due to the lack of genetic markers, we developed a novel set of microsatellite markers from first-time assembled random genomic sequences of SFB using Illumina sequencing. We used both microsatellites and a segment of the mitochondrial cytochrome oxidase subunit I (cox1) gene to determine the genetic diversity and differentiation among eight representative populations across southern China. We tested the hypothesis that levels of genetic diversity and population differentiation for the SFB are low due to high levels of gene flow among populations. These results provide a basis to understand the ecology and evolution of this pest and its management.

Sample Collection and DNA Extraction
Adults of SFB were collected across southern China from cruciferous vegetables (Table 1 and Figure 1). To avoid collecting siblings, we collected specimens from about 20 sites at least 10 m apart at each sampling location. In total, eight populations were collected. The species were first identified by morphology (He et al., 2012) and then validated by molecular identification using the mitochondrial cox1 gene (see below). All samples were kept in 100% ethanol before DNA extraction.
To construct a high-throughput sequencing library for microsatellites development, we extracted genomic DNA from 50 adults collected in Xiaoshan, Zhejiang province, using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany). Then, we used 192 individuals for population-level genotyping, with 24 individuals from each population. Total genomic DNA was extracted from individual whole adult using the DNeasy Blood & Tissue Kit. The voucher specimens were stored at -80 • C in the Integrated Pest Management Laboratory of the Beijing Academy of Agriculture and Forestry Sciences.

Library Construction, Genome Sequencing, and Assembly
The high-throughput sequencing library with 500-bp insert size was prepared using the Illumina TruSeq DNA PCR-Free HT Library Prep Kit (Illumina, San Diego, CA, United States). The prepared library was sequenced on an Illumina Hiseq4000 Sequencer using the Hiseq Reagent Kit v3 (Illumina, San Diego, CA, United States) by Beijing BerryGenomics Co., Ltd. The paired-end 150 bp raw data were trimmed by removing the low-quality reads using Trimmomatic 0.36 (Bolger et al., 2014), and then the sequences were evaluated by FastQC v 0.11.5 (Andrews, 2004). The genome size of P. striolata was estimated by JELLYFISH v2.2.6 software with a K-mer method (Kingsford, 2011). IDBA_UD v1.1.1 was used to assemble the generated genomic sequences with K-mer from 20 to 140 (Peng et al., 2010).

Development of Universal Microsatellite Markers
The Microsatellite Search and Building Database (MSDB) 1 was used to identify potential microsatellite sites from the whole genome sequences of P. striolata. A minimum of 12, 5, 5, 5, 5, and 5 repeats was used to distinguish mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide motifs, respectively (Lianming et al., 2013). The QDD3 program (Emese et al., 2010) was then used to extract microsatellites and their flanking sequences (300 bp each) and design the primers. The criteria and parameters for the primer design for the isolated microsatellite markers followed Cao et al. (2016). The primer design was set according to the following parameters: annealing temperature (Tm): 54 • C < Tm < 60 • C; the difference in Tm between paired upstream and downstream primers was < 4 • C; only one pair of microsatellite primers is kept on each locus; the repeat motif of the amplified product was ≥ 3. The microsatellite loci were further filtered under stringent criteria as described in previous studies (Song et al., 2018): (i) the microsatellites had to be pure and specific, (ii) the design strategy of "A" was used, and (iii) the minimum distance between the 3 end of a primer pair and its target region had to be no shorter than 10 bp.
We selected 70 pairs of primers for initial validation using one individual from each representative population (AH, FJ, GD, GX, HN, JS, JX, ZJ). We used three fluorescences (FAM, ROX, HEX) to label the amplified products of each locus (Schuelke, 2000;Blacket et al., 2012). The fluorescences were independently added to a primer C tail (PC-tail) (5 CAGGACCAGGCTACCGTG 3 ). The sequence of the PC-tail was added to the 5 end of each upstream primer (Blacket et al., 2012). The reaction system was set to 10 µL, including 0.5 µL of template DNA (30-160 ng/µL), 5 µL of Master Mix (Promega, Madison, WI, United States), 0.08 µL of the PC tail modified forward primer (10 mM), 0.16 µL of reverse primer (10 mM), 0.32 µL of fluorescence-labeled PC-tail (10 mM) and 3.94 µL of ddH 2 O. The following PCR amplification program was used to amplify the microsatellites: 3 min at 94 • C; 35 cycles of 15 s at 94 • C, 30 s at 56 • C and 1 min at 72 • C, followed by a final 10 min extension at 72 • C. Primer pairs fitting the following criteria were retained in subsequent analysis: (i) primers with PCR amplification rate equal or higher than 75%; (ii) at least two alleles present in tested individuals; (iii) without non-specific amplification. Length of PCR products was analyzed on an ABI 3730xl DNA Analyzer (Applied Biosystems, United States) using the GeneScan TM 500 LIZ TM dye Size Standard (Applied Biosystems, United States) for sizing DNA fragments. Genotypes were determined using GENEMAPPER v4.0 (Applied Biosystems, United States) (Chatterji and Pachter, 2006). The remaining primer pairs were used for population-level genotyping.

Mitochondrial Gene Amplification, Sequencing, and Homologous Gene Downloading
To characterize the mitochondrial variation and validate correct identification of the specimens, a fragment of the cox1 gene involved the DNA barcoding region of insects was sequenced using the universal primers LCO1490 (5 GGTCAACAAATCATAAAGATATTGG 3 ) and HCO2198 (5 TAAACTTCAGGGTGACCAAAAAATCA 3 ) (Folmer et al., 1994). Polymerase chain reactions (PCR) were conducted using the Mastercycler Pro system (Eppendorf, Germany) in a 15 µL volume consisting of 1 µL of template DNA (30-160 ng/µL), 7.5 µL of Master Mix (Promega, Madison, WI, United States), 0.6 µL of forward primer (10 mM), 0.6 µL of reverse primer (10 mM) and 5.3 µL of ddH 2 O. The thermal profiles for DNA amplification were as follows: 3 min at 94 • C; 35 cycles of 15 s at 94 • C, 20 s at 54 • C, and 40 s at 72 • C, followed by a final 10-min extension at 72 • C. Amplified products were purified and sequenced on an ABI 3730xl DNA Analyzer by Tsingke Biotechnology Co. Ltd. (Beijing, China).
Apart from sequences obtained in this study, we searched orthologous cox1 gene sequences from the NCBI nucleotide database and included those with sampling locations in our phylogenetic analysis (Pentinsaari et al., 2014;Hendrich et al., 2015;Nie et al., 2017;Coralşahin et al., 2018;Lalrinfeli et al., Table 1). Sequences with obvious errors were deleted, and others were used for follow-up analysis.

Population Genetic Diversity Analysis
For mitochondrial DNA, sequencing results from both strands were assembled using SeqMan in the LASERGENE version 7.1.2 (DNASTAR, Inc., United States). Sequences of the cox1 were aligned with MUSCLE (Edgar, 2004) implemented in MEGA version X (Kumar et al., 2018). The number of polymorphic sites (S), number of haplotypes (H), haplotype diversity (H d ), nucleotide diversity (Pi), Tajima's D, Fu's F s and gene flow (N m ) were analyzed in DnaSP version 6 (Rozas et al., 2017). Pairwise mean population differentiation (F ST ) was estimated using Arlequin 3.5 (Excoffier and Lischer, 2010).
For microsatellites, the number of alleles and the observed heterozygosity (H o ) were analyzed using the macros Microsatellite Tools (Park, 2001). The null allele frequencies and F ST with excluding of null alleles (ENA) were estimated using the software FreeNA (Chapuis and Estoup, 2007). Deviation from Hardy-Weinberg equilibrium (HWE) for each locus/population combination, F ST and inbreeding coefficients (F IS ) were estimated in GENEPOP v4.0.11 (Rousset, 2008). Principal Coordinate Analysis (PCoA) and genetic differentiation coefficient (G ST ) were performed with the GenAlex program ver. 6.502 (Peakall and Smouse, 2012). The gene flow was calculated using the formula N m = 0.5 (1-G ST )/G ST . HP-RARE v1.1 (Kalinowski, 2010) was used to test allelic richness (A R ) and allelic richness of private alleles (P AR ) of each site. GENCLONE v2.0 (Arnaud-Haond and Belkhir, 2006) was used to estimate the total number of alleles (A T ) and the unbiased expected heterozygosity (H E ). We compared the number of alleles (A S ) among samples with different sample sizes in GENCLONE. Mantel test (Mantel, 1967), used to estimate correlation between genetic and geographic distances of populations to test isolation by distance, was performed with 1,000 permutations based on Genepop. 2

Phylogenetic and Population Genetic Structure Analysis
A Bayesian-inference phylogenetic tree was constructed for mitochondrial DNA using Mrbayes v3.2.2 (Ronquist et al., 2012) to examine phylogenetic relationships among cox1 haplotypes of SFB. Four independent Markov chains for 50 million MCMC generations were run with tree sampling every 5,000 generations and a burn-in of 2,500 trees.
For microsatellite data, the population genetic structure was first investigated using the STRUCTURE v2.3.4 program (Pritchard et al., 2000). We used 30 replicates of each K-value from 1 to 8, with 200,000 Markov chain Monte Carlo iterations and a burn-in of 100,000 iterations. The results were uploaded to the online software Structure Harvester v0.6.94 (Earl and Vonholdt, 2012) to determine the optimal K-value by a Delta K method (last accessed on 28, Dec. 2020). 3 The K-value with the highest K represents the number of potential genetic clusters in the population. Membership coefficient matrices (Q-matrices) associated with the optimal K were processed using CLUMPP v1.12 (Jakobsson and Rosenberg, 2007) and then visualized using the DISTRUCT v1.1 (Taubert et al., 2019). Finally, we used discriminant analysis of principal component (DAPC) to analyze population genetic structure under default settings to complement the STRUCTURE analysis. This analysis was run using an R-package adegenet v1.4-2 (Jombart et al., 2008).

Genome Assembly and Characterization of Microsatellites Across the Genome
A total of 47.88 Gb paired-end (PE) sequences (159,598,840 reads each with a length of 150 bp) were obtained. Trimmed reads were assembled into 595,192 scaffolds with a total length of 342.147 MB ranging from 200 bp to 118.198 KB with an N50 of 600 bp. These contigs were used for microsatellite discovery.

Development of Microsatellite Markers
The QDD3 program designed 2,585 primer pairs for 1,994 din-, 359 tri-, 70 tetra-, 29 penta-, and 21 hexa-nucleotide microsatellites. Following our stringent filter, 70 primer pairs flanking perfect microsatellites were retained. The number of primer pairs flanking tri-, tetra-and penta-nucleotide microsatellites was 67, 1, and 2, respectively. In our initial test of these 70 primer pairs, 16 pairs generated polymorphic genotypes, 12 pairs failed to amplify in any samples, and 46 pairs failed to amplify in more than two individuals. In the final test, 12 microsatellite markers were retained for population-level genotyping ( Table 2).

Population Genetic Diversity
Based on the mitochondrial cox1 genes which were 647 bp long, we found 44 haplotypes from the collected SFB populations. Haplotype #2 was the most common (61.17% of individuals). Thirty-four haplotypes appeared in only one population. The GX population had the most haplotypes with thirteen, while the ZJ population had the least number of haplotypes (n = 4). The haplotype diversity ranged from 0.338 to 0.931, and the nucleotide diversity ranged from 0.00059 to 0.00497. There were 11 polymorphic sites in the HN population which represented the highest number across the most in all populations ( Table 3).
The data of Tajima's D and Fu's F s of each population were all negative except for F S of GX (F S = 0.57 without statistical significance). For all populations Tajima's D and Fu's F s were -2.53 and -6.01, respectively, with statistical significance, which showed the SFB populations followed the neutral evolution model and there was a population expansion in recent history. The Nei's N m of haplotypes and all sequences was 4.10 and 3.37, respectively.
Using all cox1 sequences from the public database, haplotypes were reanalyzed and all populations were divided into thirty-four haplotypes. Haplotypes #30-34 were unique to populations outside China. Haplotype #2 was shared by populations in and out of China. The other twenty-eight haplotypes were unique to the populations of China. The pairwise distance among all haplotypes ranged from 0.0018 to 0.0143 ( Table 4). We also calculated the pairwise population genetic differentiation among eight populations and F ST -values ranged from -0.0187 (FJ-ZJ) to 0.1374 (JX-GX). A moderate genetic differentiation was observed between GX and other populations based on the F ST -values ranging from 0.0367 to 0.1374 (Table 5).   Frontiers in Ecology and Evolution | www.frontiersin.org   Based on the 12 developed microsatellite markers, we evaluated the genetic diversity in eight populations of SFB. Significant departures from HWE (p < 0.05) were detected in 53 of the 96 population-locus combinations after sequential Bonferroni correction. The population-locus pairs colonized by primer PS-22 all showed significant departures from HWE. Thirty-five of the 528 locus-locus pairs showed linkage disequilibrium in at least one population (p < 0.05), whereas five (S08-S21, S20-S13, S09-S22, S22-S33, and S07-S33 pair) of 66 locus pairs showed linkage disequilibrium across all populations.
Genetic diversity parameters varied among populations. The observed (H o ) and expected (H E ) heterozygosity values ranged from 0.387 to 0.507 and from 0.486 to 0.577, respectively. The total number of alleles was the highest in population AH, with a value of 50. In calculating the standardized total number of alleles, we removed locus S06 because of the failure of amplification in the population JX. The inbreeding coefficient of all populations ranged from 0.125 to 0.273 (Table 3) (Table 6). Populations GX and JS have the highest geographical distance and genetic distance. However, the JX and FJ have the nearest geographical distance but not the least genetic distance. The Principal Coordinate Analysis (PCoA) showed no obvious geographic structure (Figure 2). The first and second principal coordinates explained 21.78 and 16.65% of the total variation, respectively.
To determine whether the genetic distance in SFB is significantly associated with geographic distance, we conducted Mantel tests between pairwise genetic differentiation [F ST /(1 -F ST )] and geographical distance matrix based on haplotypes and microsatellites (Figure 3). The relationships of genetic distance and geographic distance were not identical based on microsatellites (R 2 = 0.2373, P = 0.014) and mtDNA (R 2 = 0.0365, P = 0.155).

Population Genetic Structure and Phylogenetic Relationship Analysis
We reconstructed the phylogenetic tree based on thirty-two haplotypes (Figure 4). Haplotypes from different geographical regions were mixed in the phylogenetic tree, which was consistent with PCoA analysis. We did not find any geographically related clades with low supporting data, except for one clade composed of two haplotypes (30 and 32) unique to populations outside of China, and another clade composed of three haplotypes (28, 29, and 31) unique to populations outside of China.
The STRUCTURE analysis indicated that the optimal K-value was two. There was a lack of population structure when K increased from 2 to 3 ( Figure 5A). DAPC analysis revealed similar patterns to those obtained from the

DISCUSSION
This study developed universal microsatellite markers of SFB based on a de novo assembled draft genome. Both mitochondrial DNA and microsatellite markers point to the low population genetic diversity of SFB in southern China. The haplotype analyses using partial mitochondrial cox1 genes also confirms the low genetic diversity in Europe and Asia since only two haplotypes were identified from the public databases. A lack of population structure was supported by population genetic structure analysis results using STRUCTURE and DAPC.
The mononucleotide repeat sites included four types as (A) n , (T) n , (C) n , and (G) n . The number of (A) n and (T) n far exceeded that of (C) n and (G) n in SFB, which is similar to other eukaryotic species (Katti et al., 2001). (A) n , (AT) n, and (AAT) n were the most frequent motifs in SFB, as in other beetle species (Song et al., 2020). The frequency and density of microsatellite markers varied among species (Selkoe and Toonen, 2010;Xu et al., 2017;Liu et al., 2019). The dominant repeats in SFB are dinucleotides except for an occasional mononucleotide. Interestingly, this differs from other beetle species that have been sequenced in the Family Chrysomelidae, in which the dominant repeats are trinucleotide or tetranucleotide repeats in Leptinotarsa decemlineata (Coleoptera, Chrysomelidae) and Diabrotica virgifera (Coleoptera, Chrysomelidae), respectively (Kim et al., 2008;Liu et al., 2018).
In general, the results based on microsatellites and haplotypes of the cox1 gene support the notion that the sampled populations of SFB have low genetic variation and lack population genetic structure in southern China, as suggested by multiple lines of analyses such as pairwise F-statistics, PCoA, DAPC, STRUCTURE and phylogenetic reconstruction. The lack of population genetic structure was found in some species, such as recently introduced species Obolodiplosis robiniae and Frankliniella occidentalis (Shang et al., 2015;Cao et al., 2017) or migratory species like Plutella xylostella, whose Sichuan populations are sources populations for northern immigrants, and southern China and Yunnan populations are sources populations for central-eastern populations . In order to escape suboptimal environmental or weather conditions, some insects have annual migrations (Zhan et al., 2011). In Coleoptera, some species, like Leptinotarsa decemlineata and Pachysternum capense can migrate (Ferro et al., 1991;Greń and Górz, 2020). We did not find a records of migration events for the of SFB and studied have shown that it has been established in Eurasia for several centuries (Rousseau and Lesage, 2016). Our hypothesis, that the low levels of genetic diversity and population differentiation for the SLB are due to high levels of gene flow among populations, needs to be further tested.
Consistently low variation among groups is indicative of high levels of gene flow across the geographic range as seen microsatellite data. Pairwise comparisons of F ST -values from mtDNA data are also consistent with high gene flow across the range, with generally low F ST -values between populations. Haplotypes (n = 2) widely exist in all populations also suggest a high-level of gene flow among 8 populations. Cruciferous vegetables are a kind of common vegetable in China and is the preferential host of SFB. The high gene flow maybe due to the transportation of fresh vegetable from south to north in China, such as from Anhui Province to Zhejiang Province. The passive transportation accelerates gene flow among different populations (Martinez-Hernandez et al., 2021). The high geneflow, however, may be the evidence of its migration (Seymour et al., 2016;Yang et al., 2020).
But there are also some divergences between these two types of molecular markers. A moderate genetic differentiation was observed between GX and other populations based on genetic distances of mtDNA, which were not observed based on microsatellites. Because mtDNA is sensitive to founder effects and small population size, the probable loss or gain of a mtDNA haplotype will be greater for small populations (Roderick, 1996). Furthermore, the Mantel test of microsatellites suggests evidence of a positive relationship in genetic distances (F ST ) between genetic and geographic distances, while the result based on mtDNA had no significant difference. This may be caused by the use of only a single marker of mtDNA vs. 12 markers of microsatellite. Different parts of mtDNA evolve at different rates, thus different parts of mitochondrial DNA should be considered for future studies to find higher-level population differentiation (Avise et al., 1992). The significant Mantel test suggests that while genetic differences between populations are rather small, they tend to accumulate as geographic distances increase, supporting the non-negligible population structure.
We found that the population collected from Guangxi province of southwest China had the highest number of unique haplotypes. Many species in East Asia were found to have originated from southwestern China and expanded their distribution range northward during the interglacial periods in the Quaternary (Dong et al., 2013;Wei et al., 2015;Yang et al., 2020). Our study did not include samples collected from southwestern areas such as Yunnan and Sichuan. The inclusion of Guangxi populations and the relatively higher level of genetic diversity in this population may indicate that the SFB may also have originated from southwestern China and subsequently dispersed to other areas.

CONCLUSION
In conclusion, our study provides a draft genome and a set of microsatellites developed from the genome. Based on these microsatellites and a segment of mitochondrial gene, we investigated the population genetic structure of the SFB in southern China, where it is a common pest. We found a lack of population differentiation of SFB among populations in southern China, which might be caused by high frequencies of gene flow among populations. Either passive transportation or migration events will accelerate the gene flow among different populations. Species collected from Guangxi Province showed more genetic divergence than others based on mitochondrial gene which indicates that the SFB may originate from southwestern China. In consideration of the limited collection range, the hypothesized movements need further investigation.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/genbank/, PRJNA761897. Data are also available on the Barcode of Life Datasystems (BOLD) database under Project-SFB "Barcoding of Phyllotreta striolata in southern China" (doi: 10.5883/DS-188PSLQ).

AUTHOR CONTRIBUTIONS
QL analyzed the data and wrote the article. QL and G-ML collected the data. Y-LZ and S-JW revised the article. All authors contributed significantly to the drafts and gave final approval for publication.

FUNDING
This research was funded by the China Agriculture Research System of MOF and MARA (CARS-23-C05), and "Three rural issues and six participants" of Scientific and Technological Cooperation Projects of Zhejiang Province-9#. The integration, demonstration and promotion of green technology of prevention and control of Phyllotreta striolata (CTZB-F180706LWZ-SNY1).