Insights Into the Significance of the Chinense Loess Plateau for Preserving Biodiversity From the Phylogeography of Speranskia tuberculata (Euphorbiaceae)

The significance of the Chinese Loess Plateau (CLP) in maintaining biodiversity for northern China has rarely been shown, as previous phylogeographic studies are mostly woody species and they have revealed that Quaternary refugia are mainly located in mountain regions. We selected a drought-enduring endemic herb, Speranskia tuberculata (Euphorbiaceae), to determine its glacial refugia and postglacial demographic history. To this end, we sampled 423 individuals from 38 populations covering its entire geographic distribution. Three chloroplast DNA (cpDNA) fragments, two low-copy nuclear genes, and six nuclear microsatellites (nSSRs) were used and supplemented with ecological niche modeling (ENM) to infer the phylogeographic history of this species. Populations with private haplotypes and high haplotype diversity of cpDNA are mainly located in the CLP or scattered around northeastern China and the coastal region. Spatial expansion, detected using a neutrality test and mismatch distribution, may have resulted in a widely distributed ancestral cpDNA haplotype, especially outside of the CLP. For nuclear DNA, private haplotypes are also distributed mainly in the CLP. In nSSRs, STRUCTURE clustering identified two genetic clusters, which are distributed in the west (western cluster) and east (eastern cluster), respectively. Many populations belonged, with little to no admixture, to the western cluster while (hardly) pure populations of the eastern cluster were barely found. Genetic differentiation is significantly correlated with geographic distance, although genetic diversity is uniformly distributed. ENM suggests that the distribution of S. tuberculata has recently expanded northwards from the southern CLP, whereas it has experienced habitat loss in the south. Thus, S. tuberculata populations probably survived the last glacial maximum (LGM) in the southern CLP and experienced post-glacial expansion. Wind-dispersed pollen could bring the majority of genotypes to the front during spatial expansion, resulting in uniformly distributed genetic diversity. Based on evidence from molecular data and vegetation and climate changes since the LGM, we conclude that drought-enduring species, especially herbaceous species, are likely to have persisted in the CLP during the LGM and to have experienced expansion to other regions in northern China.


INTRODUCTION
The Chinese Loess Plateau (CLP), which spans from 34 -40 • N and 101 -113 • E in northern China, contains heterogeneous topography and climate (Zhang et al., 2002) and is at the intersection of multiple floristic regions (Wang, 1999;Wu et al., 2010). A number of relict taxa and a diverse drought-enduring plants indicate that the CLP could have acted as a macrorefugium during Quaternary climate fluctuations (Harrison et al., 2001;Zhang et al., 2002;Ni et al., 2010). However, phylogeographic investigations of plants in northern China are limited (Qiu et al., 2011) as previous studies are focused on northwestern China (Meng et al., 2015), subtropical China (Ye et al., 2017a), northeastern China (Ye et al., 2017b) or southwestern China (Liu et al., 2017). Thus, the significance of the CLP for preserving the biodiversity of adjacent regions may be underestimated.
Paleovegetation reconstruction for northern China indicates that steppe and even desert vegetation covered this area during the last glacial maximum (LGM, ca. 26,000 -18,000 years ago), while the present coniferous and deciduous forests experienced southward retreat to below 30 • N (Harrison et al., 2001;Ni et al., 2010). Phylogeographic studies, however, challenged this hypothesis by discovering multiple LGM refugia, which have allowed species to persist across northern China (Chen et al., 2008;Tian et al., 2009;Zeng et al., 2011). In an endemic and dominant coniferous tree, Pinus tabulaeformis, Chen et al. (2008) suggested that there are multiple geographically isolated refugia based on restrictedly distributed mitotypes, although the plastid haplotypes were almost homogeneously distributed, which may be due to greater mobility of pollen relative to seed. Multiple localized refugia patterns were also discovered in a tree, Quercus liaotungensis (Zeng et al., 2011) and a shrub, Ostryopsis davidiana (Tian et al., 2009), as different plastid haplotypes were geographically restricted. Other studies on widely distributed East Asian species have shown that northern China is likely a contact zone between southern subtropical and northern cold temperate populations (Guo et al., 2014;Bai et al., 2016).
Previous studies have largely focused on woody plants (Chen et al., 2008;Tian et al., 2009;Zeng et al., 2011); therefore, these results may not be representative of the biome as a whole. Herbaceous species, especially drought-enduring plants that are mainly distributed in the CLP (Zhang et al., 2002) with much shorter generation times compared to woody species, may have responded differently to past climate changes (Wang et al., 2015). Second, only haplotypes of cytoplasmic markers are used to infer population demographic histories (Chen et al., 2008;Tian et al., 2009;Zeng et al., 2011), whereas the importance of pollen-mediated gene flow has been largely ignored except for P. tabulaeformis (Chen et al., 2008). As a result, many aspects of past population and range dynamics in northern China still need further exploration to provide insights into the preservation of biodiversity.
To assess the significance of the CLP for preserving biodiversity, Speranskia tuberculata (Bunge) Baillon (Euphorbiaceae), belonging to an East Asian endemic genus with two species only, was selected. This species is a drought-enduring endemic herb in northern China that is mainly distributed in the CLP and adjacent areas, where it usually occurs in dry habitats such as xerothermophilous grasslands, sparse thickets, and forest edges. Its pollen is dispersed by wind, while seeds are dispersed by gravity (personal observations). Its generation time is about one year base on personal observations. Genetic variation estimated from chloroplast (cp) and low-copy nuclear (n) DNA as well as nuclear microsatellites (nSSRs) was combined with ecological niche modeling (ENM) to investigate (a) the population genetic structure of S. tuberculata and to test (b) whether any refugia were located in the CLP.

Sampling and DNA Extraction
A total of 423 S. tuberculata individuals representing 38 populations were sampled through its entire geographic distribution ( Table 1). We avoided collecting clones by sampling individuals more than 10 m apart. Genomic DNA was extracted from silica gel-dried leaves using the Ezup DNA Extraction Kit (Sangon Biotech, Shanghai, China). Voucher specimens were deposited in the herbarium SWFC (Southwest Forestry University), Kunming, China.

Chloroplast and Nuclear DNA Sequence Analyses
Three chloroplast fragments, psbJ-petA, psbB-psbF, and trnL-trnF, and two species-specific low-copy nuclear genes (6146 and 38274) developed using transcriptome data (Fu et al., 2016) following the procedure described by Ye et al. (2017c) were used to detect potential genetic variation in S. tuberculata (Supplementary Table 1). The PCRs were performed following the protocol of Ma et al. (2019), except for different annealing temperatures (Supplementary Table 1), and the products were sequenced by Sangon Biotech (Shanghai, China).
cpDNA and nDNA chromatograms were read, edited, and aligned in MEGA 6.0 using the ClustalW algorithm (Tamura et al., 2013) and the alignments were carefully checked manually. For cpDNA, we deleted a region from psbJ-petA because it contained complex indels information and was too divergent to be aligned. The three plastid fragments were concatenated after the indels were recoded as single mutations. For nDNA, heterozygote sites were resolved using the coalescent-based Bayesian method of the phase function in DnaSP 5.10 (Librado and Rozas, 2009). Then, for both cpDNA and nDNA, the number of variable sites, number of haplotypes, haplotype diversity (H d ), and nucleotide diversity (π) were calculated in DnaSP. The most parsimonious networks for both plastid and nuclear data were inferred using Network 5.0 (Bandelt et al., 1999). Haplotype distributions were displayed in ArcGis 10.2 (ESRI. Inc.).
Potential population spatial expansion was detected using mismatch distribution analysis and a neutrality test, Tajima's D (Tajima, 1989) and Fu's F S (Fu, 1997), with default settings for all S. tuberculata populations using cpDNA genetic variations in Arlequin 3.5.1.2 (Excoffier and Lischer, 2010). Neither the absolute expansion time nor a Bayesian skyline plot (BSP) (Heled and Drummond, 2008) that can detect changes in effective population size through time were estimated, as no reasonable substitution rate of cpDNA is provided in S. tuberculata.

Nuclear Microsatellites Analyses
Six of the 18 nSSR markers developed for Speranskia species (Fu et al., 2016), which produced reproducible, scorable, and polymorphic products, were used to detect genetic variation in S. tuberculata ( Table 2). The six nSSRs were amplified (Fu et al., 2016) and then genotyped using a 3730xl automated Genetic Analyzer (Applied Biosystems, Foster City, CA, United States). Allele sizes were determined using GENEMAPPER 3.7 (Applied Biosystems). Null alleles were checked using Micro-checker 2.2 (Oosterhout et al., 2004). Linkage disequilibrium was tested using sequential Bonferroni corrections, and the deviation of F is (fixation index) from zero was used to test for Hardy-Weinberg equilibrium using FSTAT 2.9.3 (Goudet, 2001).  (Smouse et al., 2017). The R S (allele richness) and P AR (private allele richness) in populations with more than eight samples were calculated in hp-rare 1.0 using rarefaction with a sample number of eight (Kalinowski, 2005). To test isolation by distance (IBD), a Mantel test with 10,000 random permutations was performed between the matrix of pairwise F ST and that of the natural logarithm of the geographic distances using GenALEx.
The genetic structure was inferred using the Bayesian clustering method in STRUCTURE 2.3 (Falush et al., 2007) with an admixture model and assuming allele frequencies to be correlated among populations. Of the 38 populations sampled, populations DL, JB, and QYS were excluded due to limited sample size or a high proportion of missing data. Ten independent runs were performed for each K from 1 to 20 with 1 × 10 5 burn-in cycles followed by 1 × 10 6 MCMC cycles. Potential clusters (K) were determined by LnP(D), the change in log-likelihood of the data for each run (Pritchard et al., 2000), and K, the secondorder rate of change of LnP(D) between successive K-values (Evanno et al., 2005).

Ecological Niche Modeling
To explore the historical distribution range shifts of S. tuberculata, its potential distribution at present and during the LGM were modeled using MAXENT (Phillips et al., 2006). A total of 94 occurrence points were collected from our field records (35) and herbarium records (59). Eight low-correlated (r < 0.75) climate variables (Supplementary Table 2, available at 1 ) with 2.5 arc-min resolution were used to model the niche at present (Hijmans et al., 2005). The established model was projected onto the LGM climatic conditions reconstructed using the Community Climate System Model (CCSM) (Collins et al., 2006) at 2.5 arc-min resolution. We used 80% of the species records for training and 20% for model testing with 20 replicates. 1 http://www.worldclim.org/ The accuracy of the model's performance was evaluated based on the area under the receiver operating characteristic curve (AUC) (Fawcett, 2006).

cpDNA Analyses
Overall, 34 variable sites including 15 indels were detected in 423 S. tuberculata individuals, and 26 haplotypes were identified with a total of 2,600 bp using three chloroplast fragments (Supplementary Table 3). The most parsimonious network showed that most haplotypes were closely related, except H22 and H24. The most abundant and centrally located haplotype H1 showed the closest relationship with outgroup S. cantonensis (Figure 1a). Haplotype H1 is widely distributed across the entire range of S. tuberculata, especially outside of the CLP (Figure 1b). Two distantly related haplotypes (H15 and H16) were both distributed in population GS, and three relatively closely related haplotypes (H22-H24) showed disjunct distributions between the CLP (YA and YJ population) and coastal regions (YT population) (Figure 1b). Populations that have private haplotypes were mainly distributed in the CLP except two found in the coastal region (DL and YT population) (Figure 1b). Populations with high haplotype diversity were mainly located in the southern CLP and adjacent regions except the AHQ population in northeastern China (Figure 2a).

nDNA Analyses
Low genetic variation was found in both nuclear loci (6146 and 38274, Supplementary Table 4). In locus 38274, eight variable sites in 405 bp of 181 individuals resulted in nine haplotypes. Haplotypes H2-H9 were all one mutation away from the centrally located haplotype H1 (Figure 3a). Private haplotypes were distributed in populations south of the Qinling Mountains (WC population), the southern CLP (GH, FF, TC, LL, and YJ population), and coastal regions (DL population) (Figure 3b). In locus 6146, six haplotypes were detected through five variable sites along 294 bp in 184 individuals. H1 was separated from H2-H5 by only one mutation site (Figure 3c). H1 was widely distributed, and three populations that have private haplotypes (H3 in GH, H2 in FF, and H4 in the HM population) were distributed in the southern CLP (Figure 3d).

nSSRs Analyses
Null alleles were not found in the six nSSR loci, and no significant genotypic disequilibrium was observed among the 15 locus pairs in any populations. Several populations had a fixation index (F is ) that is significantly deviated from zero, suggesting that these populations are not under Hardy-Weinberg equilibrium ( Table 1). The genetic diversity indices of the 35  sampled populations and six loci are listed in Tables 1 and 2, respectively. Within-population genetic diversities, H E , P AR , and R S , are all uniformly distributed (Figures 2b-d). A Mantel test (r = 0.171, P = 0.037) indicated a significant correlation between genetic differentiation and geographic distance among all sampled populations. Using STRUCTURE, the highest K is observed at K = 2, while the LnP (D) gradually increased from K = 2 to K = 11 (Supplementary Figure 2). We chose K = 2 as the most possible clustering (Figure 4a), and the histogram of the STRUCTURE assignment test for all populations from K = 2 is shown in Figure 4b. The western green cluster is mainly distributed in the CLP and adjacent regions, while the eastern red cluster is in the eastern part of the present S. tuberculata distribution range (Figure 4a). Pure western cluster populations were observed in many populations, while pure eastern populations were barely found.

Ecological Niche Modeling
High ROC values (0.968 ± 0.012) indicate good accuracy of model predictions. At present, S. tuberculata is distributed mainly in the CLP and adjacent regions with extensions from Taihang-Yan mountains to northeastern China and from the North China Plain to the coastal region (Figure 5a). During the LGM, the most suitable habitat is predicted to have been located in the southern CLP and the Qinling Mountains with extension to the Sichuan Basin (Figure 5b). Comparison of absolute changes in habitat suitability showed that the northern distribution of S. tuberculata was reached after the LGM from the southern CLP and that the species has experienced habitat loss in the Sichuan Basin (Figure 5c).

Southern Macrorefugium and Multiple Distant Microrefugia
In the present study, the phylogeography history of S. tuberculata was investigated using multiple genetic markers and potential habitat modeling. Chloroplast and nuclear DNA as well as ENM consistently support the hypothesis that S. tuberculata survived the unsuitable climate in the macrorefugium in the southern CLP and in multiple distant microrefugia. In contrast, nSSRs are ambiguous in inferring refugia and postglacial demographic history.
In chloroplast and nuclear sequences, populations with high haplotype diversity and private haplotypes are mainly distributed in the southern CLP, indicating that the macrorefugium of S. tuberculata is located there (Hewitt, 1996(Hewitt, , 2004. Disjointly distributed closely related or co-distributed distant chloroplast haplotypes are evidence of population loss during past climate changes (Qiu et al., 2011), likely due to the cold and dry climate during the LGM. The mismatch distribution and the neutrality tests of chloroplast sequences support spatial expansion, which results in a wide distribution of haplotype H1 (Figure 1b). ENM also shows that most of the present S. tuberculata distribution north of the Qinling Mountains has been gained after the LGM (Figure 5). Post-glacial colonizations to the eastern part of the present distribution are likely through the Taihang-Yan Mountains and the North China Plain. Private chloroplast or nuclear haplotypes in northeastern China (KLQ and AHQ population) or in coastal regions (DL and YT population) may indicate multiple restricted microrefugia (Provan and Bennett, 2008). Therefore, S. tuberculata is likely to have persisted in the southern CLP macrorefugium and in multiple microrefugia in northeastern China and the coastal region (Ye et al., 2017b).
Seemingly, the nSSRs data tell a different story as two different scenarios can both generate genetic patterns displayed by nSSRs. First, expansions from an eastern and a western macrorefugium can result in genetically mixed populations in the central contact zone (Figure 4), similar to some widely distributed species (e.g., Acer mono and Juglans spp.) that show a genetic mixture in northern China (Guo et al., 2014;Bai et al., 2016). Such populations in contact zones are expected to have higher genetic diversity (Petit et al., 2003). However, this is not the case for S. tuberculata (Figures 3b-d), additionally pure populations belonging to the eastern cluster are barely found. Besides, uniformly distributed genetic diversity is probably attributed to massive gene flow via wind-dispersed pollen, which can reduce the surfing effect during spatial expansion and bring the majority of genetic composition to the expansion front . This scenario could explain that genetic structure changes gradually from west to east in agreement with a significant IBS pattern (Figure 4). Thus, nSSR a data are compatible with the southern macrorefugium scenario.

The Significance of the CLP in Preserving Biodiversity
In previous phylogeographic studies in northern China, detected refugia were mainly located in mountain regions, such as the Qingling Mountains (Tian et al., 2009) or the Yan-Taihang Mountains (Zeng et al., 2011), but not in the CLP. In some other species mainly distributed in arid northwestern China, the CLP acts as a dispersal corridor for colonization (Lagochilus ilicifolius, Meng and Zhang, 2011;Ribes meyeri, Xie and Zhang, 2013). In this study, using the drought-enduring herb S. tuberculata as a case, multiple genetic data and potential habitat modeling uncovered the role of the CLP as a macrorefugium, indicating its importance in preserving biodiversity in northern China.
The CLP is mainly influenced by the Asian winter monsoon, which brings dust from the deserts of the Asian interior. The initial eolian dust deposits could dates back to the Eocene-Oligocene (Licht et al., 2014;Clift et al., 2015).
Such deposits experienced several intensifications up to the Quaternary (Wang et al., 2007). Precipitation, which is higher in the southeast and lower in the northwest, has a direct influence on the vegetation of the CLP as it is located in the dry interior of Asia (Zhang et al., 2012), and this gradient was also present in the interglacial periods (Jiang and Ding, 2005). During the LGM, as the CLP is located in the marginal zone of the Asian summer monsoon, the climate is prone to be the same compared to prominent north-south and west-east gradients (warmer and wetter conditions in the south/east) during the interglacial period (Jiang et al., 2014). Higher precipitation is needed to develop wood forest while lower precipitation is needed for glass forest (Lv et al., 1999), and herbaceous species are thought to be more likely to find refugia in the CLP than tree species that survived the LGM in mountain regions (Tian et al., 2009;Zeng et al., 2011). In the CLP flora, many genera that are adapted to arid environments (Wang, 1999), such as Caragana, Ephedra, and Tamarix, are highly diversified, and majority of the CLP endemic species show tolerance to aridity (Zhang et al., 2002). In addition to the herb S. tuberculata, other drought-enduring species are also likely to have persisted FIGURE 5 | Potential species distribution of Speranskia tuberculata estimated by ecological niche modeling at present (a) and during the last glacial maximum (LGM) using the CCSM3 model (b). Absolute changes in habitat suitability are compared between present and LGM (c); Blue represents suitability gains and green represents losses in suitability. Black dots represent locations of sampled populations, while white dots represent occurrence points collected from herbarium records.
in the CLP macrorefugium and to have experienced postglacial expansions.
The CLP contributes not only to the biodiversity of northern China, but also to other regions in East Asia. First, Meng et al. (2015) have suggested the CLP to be an important refugium for the dry-adapted flora of northwestern China. Furthermore, disjoint distribution of same haplotype between CLP and northeastern China may result from long distance dispersal from the CLP (Tian et al., 2009;Song et al., 2020).
Finally, haplotypes of S. tuberculata in the Sichuan Basin (H13-14) are derived from the ancestral haplotype (H1), indicating the CLP may be the source of genotypes in southwestern China.

CONCLUSION
The present phylogeographic work is an important supplement to previous studies in East Asia, which are much more numerous for northwestern China (Meng et al., 2015), subtropical China (Ye et al., 2017a), northeastern China (Ye et al., 2017b) or southwestern China (Liu et al., 2017). However, further phylogeographic studies on various species are still needed to depict a comprehensive response of the flora from northern China to Quaternary climate changes (Zhang et al., 2002).

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
J-WY and BT conceived the ideas. H-YW, M-JF, and PZ contributed to the sampling and the data analyses. J-WY and BT wrote the manuscript. All authors have read and approved the final manuscript.

FUNDING
This study was supported by the National Natural Science Foundation of China (NSFC) (31260050 and 41861008) and the Ten-thousand Talents Program of Yunnan Province (YNWR-QNBJ-2020).

ACKNOWLEDGMENTS
We thank Mr. Yi Fu for help during the field survey. We are grateful to Dr. Kang-Shan Mao for very helpful comments on earlier drafts of this manuscript.