Genetic Diversity of Phyllanthus emblica From Two Different Climate Type Areas

Phyllanthus emblica L. is a well-known medicinal and edible plant species. Various medicinal compounds in the fruit make it an important medicinal and promising economic material. The plant is widely distributed in Southwestern and Southern China. However, due to massive deforestation and land reclamation as well as deterioration of its natural habitat in recent years, the wild resources of this species have been sharply reduced, and it is rare to see large-scale wild P. emblica forests so far. In order to effectively protect and rationally utilize this species, we investigated the genetic diversity, genetic structure, and population dynamics of 260 individuals from 10 populations of P. emblica sampled from the dry climate area in Yunnan and wet climate area in Guangxi using 20 polymorphic EST-SSR markers. We found high genetic diversity at the species level (He = 0.796) and within populations (He = 0.792), but low genetic differentiation among populations (FST = 0.084). In addition, most genetic variation existed within populations (92.44%) compared with variation among the populations (7.56%). Meanwhile, the NJ tree, STRUCTURE, and hierarchical analysis suggested that the sampled individuals were clustered into two distinct genetic groups. In contrast, the genetic diversity of the dry climate group (He = 0.786, Na = 11.790, I = 1.962) was higher than that of the wet climate group (He = 0.673, Na = 9.060, I = 1.555), which might be attributed to the combined effects of altitude, precipitation, and geographic distance. Interestingly, only altitude and precipitation had significant pure effects on the genetic diversity, and the former was slightly stronger. In addition, DIYABC analysis suggested the effective population size of P. emblica might have contracted in the beginning of the Last Glacial Maximum. These genetic features provided vital information for the conservation and sustainable development of genetic resources of P. emblica, and they also provided new insights and guidelines for ecological restoration and economic development in dry-hot valleys of Yunnan and karst areas in Guangxi.


INTRODUCTION
Phyllanthus emblica is a perennial deciduous tree or shrub distributed across tropical and subtropical areas. It contains many medicinal compounds such as vitamin C, flavonoids, emblicol, emblicanin A/B, and is used as a daily medication by many ethnic minorities such as Tibetan group in China; its various therapeutic effects such as hepatoprotective, antimicrobials have also been confirmed in modern medical research (Variya et al., 2016;Chaphalkar et al., 2017). The World Health Organization has listed it as one of the three health-care plants that are widely planted throughout the world (Li and Zhao, 2007). Its seeds contain high contents of unsaturated fatty acids, such as linoleic and linolenic acid, making it an important potential raw industrial material (Choudhary and Grover, 2019). In nature, P. emblica is mainly pollinated by wind and bees (Bajpai, 1957), and the long-distance transmission of pollen provides the chances of long-distance genetic migration between different populations (Nakanishi et al., 2020). In nature, P. emblica usually depends on seed propagation, and long-distance seed dispersal must depend on certain fruit-eating animals (Li and Zhao, 2007). In China, P. emblica is mainly distributed south of the Yangtze River with a large distribution in the east-west direction, from the humid area in the east to the dry-hot valley formed by the wind-burning effect in the west. The Yunnan dry-hot valley is located on the southern edge of the Hengduan Mountains in the Biodiversity Center in Southwest China. It exhibits unique characteristics of dry-hot valleys due to its geographical location and climate: the average annual precipitation is 600-800 mm, and the annual average evaporation is 3-6 times higher than the precipitation. Precipitation is mainly concentrated in the rainy season, whereas the dry season is extremely arid, which is similar to the typical savannah climate (Jin and Ou, 2000). In contrast, the area near the Tropic of Cancer in central Guangxi belongs to the south subtropical monsoon climate zone with abundant rainfall. Although these two climate types are clearly different, P. emblica is distributed in both of these areas and has adapted well to their respective habitats during its long-term evolution. However, since the 1980s, the wild genetic resources of P. emblica in China have sharply decreased, and its natural habitat has become severely fragmented as a consequence of unreasonably high levels of development and land utilization (Li and Zhao, 2007). Thus, it is highly important to explore the genetic diversity of P. emblica to be able to effectively protect this medicinal, economic and industrial plant species along with rationally developing and utilizing it as an important resource.
Rational utilization and protection of a species depends on understanding its distribution, differentiation, and factors influencing its genetic diversity, which is the basis of species' adaptation to changing environment (Solbrig, 1991). Particularly, for long-lived woody species, genetic diversity not only determines their ability to adapt to the environment but also forms the basis for maintaining long-term stability of forest ecosystems (Hamrick et al., 1992;Ony et al., 2020;Wells et al., 2020). With the increase in genetic diversity within species, their ability to adapt to environmental changes increases, whereas the decrease or disappearance of genetic diversity often results in low adaptability, reproduction, and disease resistance (Hamrick et al., 1992). In general, the decline in population size may cause a decline in genetic diversity through genetic drift and inbreeding (Hamrick et al., 1992). In the long term, the decrease in genetic diversity may lead to the loss of the ability of adaptation to environmental changes and evolutionary capabilities (Lande, 1993;Kaljund and Jaaska, 2010). Therefore, the protection and maintenance of genetic diversity has important theoretical and practical implications for the protection and utilization of plant resources and genetic improvement of forest tree species. The genetic diversity and structure of populations or species can be attributed to multiple factors, such as distribution ranges, life forms, breeding systems, seed dispersal mechanisms, evolutionary history, climate factors, and human interference (overexploitation, overgrazing, landslides, and fragmentation) (Hamrick and Godt, 1996;Sarin et al., 2015;Gaisberger et al., 2020;Zhou et al., 2020). Identifying climatic factors affecting genetic diversity may provide insights into genetic conservation of species in the context of rapid global climate change. Many studies have explored the impact of climate change on species distribution and genetic structure, confirming that climate change may indirectly affect species distribution to reduce the richness of genetic resources (Abeysinghe et al., 2000;Tan et al., 2018;Garot et al., 2019;Helmstetter et al., 2020). However, different climatic factors play different roles in this process. So far, little is understood about how precipitation and altitude shape the genetic diversity of species. It is known that P. emblica is only distributed in Southern and Southwestern areas in China, both in typical dry climate areas and wet climate areas (Pathak, 2003). In addition, in the field investigation, we observed that this species had larger population sizes and better fruit traits (such as fruit size, color and taste) in dry climate areas than that in wet climate areas. Therefore, we hypothesized that altitude and geographical location as well as the environmental stress caused by the longterm extreme precipitation synthetically made the P. emblica populations in dry climate areas have high genetic diversity and environmental adaptability. SSR markers have been widely used in genetic diversity studies because of their co-dominance, high polymorphism rate, good reproducibility, and rapid excavation at a low cost (Bouck and Vision, 2007;Nasim et al., 2020). Based on 20 highly polymorphic EST-SSR markers developed in our previous study (Liu et al., 2018), we evaluated the genetic structure, bottleneck effect, and effective population size of 10 P. emblica populations distributed in typical dry-hot climate areas and typical humid climate areas. We aimed to answer the following questions: (1) What is the level of genetic diversity of P. emblica in both of these typical climate areas? (2) What is the relative importance of geographic and environmental distance as well as precipitation and altitude to the genetic diversity of P. emblica? (3) What is the population dynamics of the species? We hoped to provide a reference for the protection and rational utilization of the wild resources of P. emblica as well as theoretical guidance for the ecological restoration and economic development of mountainous areas in China.

Plant Materials and Study Sites
A total of 260 P. emblica samples were collected from 10 wild populations (YMA, YMB, BCA, BCB, BCC, MSA, MSB, MSC, MSD, and MSE) which were distributed in two main climate areas: arid areas (Yuanmou County and Binchuan County in Yunnan) and humid areas (Mashan County in Guangxi) (Figures 1, 2). The altitude and 12 bioclimatic factors (Fick and Hijmans, 2017) showed significant differences between these two climate areas (Supplementary Table S1). The five populations in the Yunnan dry-hot area were located above 1000 m, whereas the five populations in the humid area of Guangxi were all located below 500 m. The average annual precipitation in the area of the five populations in Yunnan was lower than 900 mm, whereas that in the area of the five populations in Guangxi was higher than 1700 mm. In addition, in order to avoid repeated clone sampling, the sampled individuals in every population were at least 15 m apart from each other when collecting leaves. The collected leaf tissues were quickly dried with silica gel and stored at −20 • C until further analyses.

DNA Extraction and EST-SSR Genotyping
Genomic DNA was isolated with the modified CTAB method (Doyle and Doyle, 1987). All samples were genotyped using 20 EST-SSR markers previously developed for the species (Liu et al., 2018). For all markers, PCR amplification (three replicates of each sample) was performed on ABI 2720 Thermal Cycler (Thermo Fisher Scientific, Waltham, MA, United States). The 15 µL reaction mixture contained 1 µL (15-20 ng) of genomic  The PCR program was as follows: 5 min at 94 • C; 30 cycles of 30 s at 94 • C, annealing at 58 • C for 30 s, 30 s at 72 • C; and a final extension step at 72 • C for 5 min. The PCR products were pooled and purified using the Wizard SV Gel and PCR Clean-Up System (Promega) following the manufacturer's instructions. The 5 end of each forward primer for the 20 markers was tagged with one of three fluorescent dyes (FAM, HEX, or ROX), and multiplex PCR amplifications were performed for the 260 individuals of P. emblica using the PCR conditions described above. Allele sizes for the tagged PCR products were obtained using an ABI 3730 sequencer with a GeneScan 500 LIZ Size Standard and GeneMapper 4.1 (Thermo Fisher Scientific, Waltham, MA, United States).
The standard WorldClim bioclimatic variables with a spatial resolution of 30 s (∼1 km 2 ) were obtained from WorldClim version 2 (Fick and Hijmans, 2017), and they were the average values for the period of 1970-2000. Based on the latitude and longitude information obtained in the field, the bioclimatic variables of the 10 investigated populations were extracted using ArcGIS version 10 (ESRI, 2012). Principal component analysis (PCA) of altitude and bioclimatic variables of the 10 populations was conducted using the FactoMineR package (Sébastien et al., 2008).
Mantel and partial Mantel tests were performed using the R vegan package to assess the relationships between genetic diversity indices (Bray-Curtis distance) and geographical distance or environmental variables (Euclidean distance) with 9999 permutations. To further explore the relative importance of altitude and precipitation to genetic diversity, we also conducted a similar correlation analysis with them as environmental distance (Euclidean distance), respectively.
In order to evaluate the population genetic structure of P. emblica, several approaches were used in this study. Firstly, the Bayesian clustering analysis of the population structure was carried out using STRUCTURE version 2.3.4 (Pritchard et al., 2000). The program was run with the admixture model and independent-allele frequency using a burn-in period of 3 × 10 5 and 1 × 10 5 Markov chain Monte Carlo (MCMC) replications. A total of 20 independent runs were performed for each K value ranging from 1 to 20. The most likely number of clusters was evaluated using both K and the log-likelihood value with STRUCTURE HARVESTER version 0.6.94 (Evanno et al., 2005;Earl and VonHoldt, 2012). Furthermore, the hierarchical analysis on 260 P. emblica individuals was performed in the R stats package with UPGMA algorithm based on the Hellinger distance. Following this, cluster analysis on the basis of shared allele distance (DAS) was performed using POPULATIONS version 1.2.30 (Langella, 2007) with the neighbor-joining (NJ) method. The NJ tree was visualized and edited in iTOL version 3 (Letunic and Bork, 2016). Finally, based on Nei's genetic distance (Nei, 1973), principal coordinate analysis (PCoA) of 260 individuals was conducted with ape package version 5.4 (Paradis and Schliep, 2019).
Analysis of molecular variance (AMOVA) was carried out to estimate the partitioning of genetic variation between and within the populations using ARLEQUIN version 3.0 (Excoffier et al., 2005). The effective population sizes for the 10 populations were calculated using LDNE version 1.31 (Waples and Do, 2008) at three levels of lowest allele frequency (P crit = 0.05, 0.02, 0.01) at 95% confidence interval (CI). In order to gain insight into the population dynamic history of P. emblica, the ABC framework in DIYABC v. 2.0 (Cornuet et al., 2014) was used to test the applicability of seven plausible scenarios based on 20 highly polymorphic SSR loci, and the posterior probability of the optimal model was estimated with two methods: direct estimate and logistic regression. The scenario with the highest posterior probability was selected, and the prior settings of the detailed parameters used to simulate the DIYABC scenarios are shown in Supplementary Table S2. Times and effective population sizes are not strictly to scale. Since DIYABC analysis adopts generation numbers, the generation times of P. emblica was assumed to be 5 years and were converted to a calendar year.

Genetic Diversity
A total of 562 alleles were detected in the 20 analyzed SSR loci. The number of alleles (Na), expected heterozygosity (He), and Shannon's diversity index (I) ranged from 18 to 40, 0.485 to 0.921, and 1.324 to 2.851, respectively, with the means of 28.1, 0.796, and 2.232, respectively (  SSR makers were highly polymorphic. Polymorphic information content (PIC ; Table 1), which was also used to evaluate loci polymorphisms (Xie et al., 2010;Wang et al., 2014), showed that 19 loci were highly polymorphic (PIC > 0.5), and only one locus (PE14171) was moderately polymorphic (0.25 < PIC < 0.5), further indicating high polymorphism of these 20 loci. At the population level, the percentages of polymorphic loci (PPL) in all 10 populations were 100%; however, the genetic diversity of each population was different. The YMB population had the highest genetic diversity (He = 0.819), whereas MSE had lowest genetic diversity (He = 0.604) ( Table 2). Moreover, the mean genetic diversity of the 10 populations investigated in the present study (He = 0.792) was at a high level; however, the average genetic diversity of the five populations of P. emblica from dry climate areas of Yunnan (He = 0.786) was higher than that of the five populations from wet climate areas of Guangxi (He = 0.673) ( Table 2). In addition, at the species level, we found that the genetic diversity (He = 0.796) of P. emblica was also at a high level ( Table 2).

Effects of Geographical and Environmental Distances on Genetic Diversity Indices of 10 P. emblica Populations
The PCA results show that PC1 (97.81%) not only divided the seven precipitation-related bioclimatic variables and altitude into two clusters, but also made a clear distinction between the populations in dry and wet climatic areas (Figure 3). The altitude might be the main factor of environmental difference between these two climatic areas, showing a positive effect in the dry climatic area, whereas the precipitation-related bioclimatic variables showed a positive effect in the wet climatic area.
According to the Mantel tests, both environmental distance and geographic distance had significant effects on three genetic diversity indices (Figure 4). Compared with geographic distance, environmental distance had a stronger effect on average number of alleles (Na) and Shannon's diversity index (I), while geographic distance explains more variation of expected heterozygosity (He) (Figure 4). In addition, there was a significant correlation between geographic distance and environmental distance. However, according to the partial Mantel tests, only the pure effect of environmental distance on Shannon's information index (I) was significant (Figure 5; p = 0.036 < 0.05).
Furthermore, the Mantel tests of precipitation and altitude showed that altitude had a greater impact on genetic diversity parameters than precipitation ( Table 3 and Supplementary  Figures S1, S2). In addition, under controlling for geographic distance, altitude was significantly related to both average number of alleles (Na) and Shannon's diversity index (I), while precipitation was only significantly related to Shannon's diversity index (I) and its effect was weaker than that of altitude (Table 3 and Supplementary Figure S2). These results implied that both altitude and precipitation had significant effects on the genetic diversity of P. emblica, however, compared with precipitation, the pure effect of altitude was slightly stronger.

Genetic Differentiation and Genetic Structure of Populations
The results showed that the genetic differentiation between populations (F ST ) was 0.084 (Table 2), indicating a moderate genetic differentiation among the 10 populations (0.05 < F ST < 0.15). The populations YMB, BCA, BCB, BCC, and MSD had significant deviations from the Hardy-Weinberg equilibrium (p < 0.05 or p < 0.01) ( Table 2). The inbreeding coefficients (F IS ) of these five populations were positive; correspondingly, their outcrossing rates (t) were < 1 ( Table 2), indicating the presence of inbreeding in these five populations.
The STRUCTURE analysis showed that K reached the maximum value ( K = 577.307) at K = 2 (Figures 6A,B), which suggested that the 260 individuals from the 10 populations most likely belonged into two principal genetic clusters ( Figure 6C). Cluster II mainly included individuals from the dry climate group, containing YMA, YMB, BCA, BCB, and BCC populations, whereas cluster I consisted of most of the individuals from the wet climate group, containing MSA, MSB, MSC, MSD, and MSE populations (Figures 6D,E). However, there were some individuals with admixed genotype in both two groups, indicating ongoing gene flow or weak genetic differentiation. Although L(K) has the highest value and least variance at K = 4 (Figure 6A), the division of genetic clusters was unclear and the gene introgression between populations was relatively disordered (Supplementary Figure S3), which could not reasonably explain the background of the sampling populations. The result of the  hierarchical analysis showed that 260 individuals were divided into two main groups (dry and wet climate groups) at the distance of 0.02 (Supplementary Figure S4). The neighborjoining tree showed that 260 individuals were grouped into two principal clades (Figure 7). One clade included the majority of the individuals from the dry climate group, and the other clade mainly included the individuals from the wet climate group. However, the PCoA results show that PCoA1 and PCoA2 explained 6.54% and 2.91% of the total genetic variance, respectively (Figure 8). At the species level, the AMOVA revealed that the majority of the observed genetic variation could be attributed to differences within populations (92.44%) rather than to the variation among populations (7.56%) ( Table 4). At the group level, higher genetic variation was observed within populations than among populations in both climate groups; however, the genetic variation within the populations from the dry climate group (96.24%) was higher than that within the populations from the wet climate group (93.36%) ( Table 4). There was moderate genetic differentiation between the two climate groups (F = 0.095); 4.66% of the genetic variations were attributed to the differences among the groups, 90.53% were attributed to the differences within populations, and 4.81% were attributed to the differences among populations within groups ( Table 4).

Effective Population Size and Population Dynamic History Based on ABC Model
Estimations of effective population sizes with the lowest allele frequency (P crit ) are listed in Table 5. At P crit = 0.02 and P crit = 0.01, effective population size of BCC was the largest, whereas MSA had the smallest effective population size. The DIYABC results showed that the optimal model was Scenario 7 (Supplementary Figure S5), i.e., an ancestral range expansion (the ancestral range expansion was the ancient expansion in Figure 9) followed by a recent range bottleneck (Figure 9). The posterior probabilities obtained from this scenario under direct estimate and logistic regression were 76.25% and 93.60%, respectively; followed by Scenario 6, with posterior probabilities (15.80% and 6.00%). In addition, Scenario 7 had the lowest average type I and type II error rates (14% and 0%, respectively) (Supplementary Table S3). The posterior distribution of the dynamic historical parameters evaluated under the best model (Scenario 7) showed that the initial effective population size of the studied P. emblica populations was 8980 (95% confidence interval 7120-9950) individuals (N5; Figure 10 and Supplementary  Table S4), and there was a strong expansion at 126 (55-227.50) Ka, resulting in the effective population size up to 53.79 (25.28-76.50) times (N3) of the original. Next, a certain degree of bottleneck appeared at 49.85 (49.55-50.00) Ka, and the effective population size was further reduced to 0.2 (0.04-0.39) times (N1; Figure 10 and Supplementary Table S4) that of the expansion FIGURE 5 | Plot of partial Mantel tests for the correlation between genetic diversity parameter (Bray-Curtis distance) and explanatory distances (geographic and environmental distance) using Spearman's coefficient.
period. Although the 95% confidence interval was wide, the initial expansion seems to occur in the Last Interglacial (LIG), whereas the most recent bottleneck occurred in the beginning of the Last Glacial Maximum (LGM).

High Genetic Diversity Within Populations and Low Genetic Differentiation Among Populations
The evolutionary potential of a species and its ability to withstand adverse environments depend on both genetic diversity of the species and its population genetic structure (Stebbins, 1950). Different levels of genetic variation and genetic structure of populations can be attributed to multiple factors, such as distribution ranges, life forms, breeding systems, seed dispersal mechanisms, evolutionary history, natural selection, and human interference (Hamrick and Godt, 1996;Sarin et al., 2015;Zhou et al., 2020). In general, the genetic diversity of widespread plant species is higher than that of narrowly distributed species (Hamrick and Godt, 1996), and compared to annual and perennial herbaceous plants, perennial woody plants tend to have higher genetic diversity at the species level and within Alt, altitude as environmental distance measured as Euclidean distance; Geo, geographic distance; Pre, precipitation as environmental distance measured as Euclidean distance using seven precipitation-related bioclimatic variables shown in Figure 3; Na, average number of alleles; He, expected heterozygosity; I, Shannon's information index. *p < 0.05, **p < 0.01, ***p < 0.001. The significances are tested based on 9999 permutations. populations, but lower genetic variation among populations (Hamrick et al., 1979(Hamrick et al., , 1992Chung et al., 2020). As a longlived perennial woody plant widely distributed in tropical and subtropical areas, P. emblica was found to have high genetic diversity at the species level (He = 0.796), and the mean genetic diversity of the 10 populations investigated in the present study (He = 0.792) was similar to that reported for this species in Thailand (He = 0.619) (Pandey and Changtragoon, 2012) and Sri Lanka (He = 0.753) (Mawalagedera et al., 2014), but significantly higher than that reported by Nybom (2004) for multiple plant species (He = 0.22 or 0.23). Compared to congeneric herbs, such as P. amarus (total diversity h T = 0.62; average intrapopulation diversity h S = 0.47; level of genetic differentiation G ST = 0.24) and P. fraternus (h T = 0.77; h S = 0.57; G ST = 0.26) (Sarin et al., 2015), P. emblica showed higher genetic variation at the species level (0.796) and within populations (0.792), but lower genetic differentiation among populations (F ST = 0.084) in the present study. Outcrossing species usually have higher genetic diversity than inbreeding species (Hamrick and Godt, 1989;Nybom, 2004). A previous study revealed that the mating system of P. emblica is dominated by outcrossing (Singh et al., 1998). In the present study, the MSB and MSE populations were mainly outcrossing populations (F IS < 0, t > 1); however, for the other eight populations, the inbreeding coefficients (F IS ) were positive, and the outcrossing rates (t) were lower than 1, indicating insufficient heterozygotes and the possibility of inbreeding. According to our long-term field investigations, this could be attributed to human interference in the habitats of these eight populations, resulting in insufficient heterozygotes and a high inbreeding coefficient. This might also be attributed to hybridization (inbreeding) between adjacent individuals with the same genotype. The richness of plant genetic diversity is related to the complexity of its habitat and ecological factors such as precipitation, temperature, and soil (Abeysinghe et al., 2000), which is a possible reason for different genetic diversity of the same species in different habitats.
The high genetic diversity of species with a high inbreeding coefficient may be related to the heterogeneity and complexity of their habitat. In China, P. emblica is a widely distributed species and occupies different habitats, which could create a rich gene pool through long-term adaptive evolution, leading to increased genetic diversity. In addition, it is also possible that these populations exhibit a high degree of genetic diversity from ancestral populations originating from southeastern tropical Asia (Morton, 1987;Setoguchi et al., 2011). Gene flow plays an important role in population genetic differentiation. Genetic migration causes an increase in genetic variation within populations and a decrease in differentiation among populations, which is the opposite of the genetic drift effect (Whitlock and McCauley, 1999). In addition, the genetic differentiation of natural populations is thought to be a dynamic process which relies on the equilibrium relationship between gene flow and selection (Whitlock and McCauley, 1999). In the present study, the populations YMA, MSA, MSB, MSC, and MSD had not significant deviations from the Hardy-Weinberg equilibrium, the estimation of gene flow (Nm) of these populations of P. emblica was >1, indicating that gene flow was high enough to resist the genetic differentiation among populations caused by genetic drift, thus achieving a homogenizing effect (Wright, 1949). STRUCTURE analysis, NJ tree, and hierarchical analysis revealed that the 260 investigated individuals were grouped into two genetic clusters, and this genetic structure was consistent with geographical distribution, except for some FIGURE 8 | Principal coordinate analysis (PCoA) of 260 P. emblica individuals from 10 populations, based on a genetic distance matrix of 20 EST-SSRs. The PCoA1 explains 6.54% of the total genetic variance, and the PCoA2 accounts for 2.91%. The confidence interval of 95% is showed as polygon. admixed genotypes, which may be a consequence of high genetic variation within populations, resulting in the occurrence of similar genetic variations in the individuals of the two groups. This may be also attributed to genetic migration between the two groups, or to some genes from the ancestral populations shared in the two groups. Pollen spread and seed dispersal are the most important determinants of gene flow. Woody species, especially trees, usually have low population density and large statures, which is beneficial to the long-distance migration of pollen and seeds (Paffetti et al., 1996). In nature, P. emblica is mainly pollinated by wind and bees (Bajpai, 1957), and the long-distance transmission of pollen provides the chances of long-distance genetic migration between different populations (Nakanishi et al., 2020). In nature, P. emblica usually depends on seed propagation, and long-distance seed dispersal must depend on certain fruit-eating animals (Li and Zhao, 2007). Because of the limited range of activities of fruit-eating animals and the limited number of seeds they are able to carry, the contribution of seed dispersal to the homogenization of P. emblica populations is expected to be small. We presumed that the frequent gene flow among P. emblica populations might be caused by pollen flow by wind or pollinators. Therefore, we concluded that the investigated species exhibited high levels of genetic diversity and low levels of genetic differentiation.

Differences Between the Genetic Diversity of P. emblica Populations From the Two Climate Areas
According to the niche breadth variation hypothesis, molecular genetic variation is related to environmental variation, and high level of genetic variation is an adaptive strategy of plants for responding to heterogeneous environments (Levins, 1968). The current research supported the original hypothesis, i.e., the level of genetic diversity in the dry climate group (Na = 11.790, He = 0.786, I = 1.962) was higher than that in the wet climate group (Na = 9.060, He = 0.673, I = 1.555), and both of these values were higher than the genetic diversity of Camellia nitidissima at the population level ; Na = 3.881, He = 0.546, I = 0.988). Moreover, the AMOVA suggested that genetic variation within the populations of the dry climate group (96.24%) was also higher than that within the populations of the wet climate group (93.36%). The Mantel tests confirmed that the genetic diversity of this species was affected by altitude, precipitation and geographic distance. Interestingly, the partial Mantel tests showed that the pure effects of altitude and precipitation only on Shannon's diversity index (I) was significant, which implied that this genetic diversity index was more sensitive to the environmental response to some extent. Just as it is widely used in microbial and forest landscape researches (Allen et al., 2009;Trivedi et al., 2020), Shannon's diversity index (I) is also a key indicator for genetic diversity assessment of woody plants. This study supported the conclusion that altitude significantly affects the distribution of genetic variation in populations (Chen et al., 2008;Hahn et al., 2012). In this study, the populations in low-altitude wet climate areas had lower genetic diversity. On the one hand, it might be because of undergoing stronger genetic bottlenecks and founder effects for these populations (possible range contractions and expansions during the Pleistocene as supported by Scenario 7) (Hensen et al., 2012). On the other hand, it might also be explained by the restriction of sexual reproduction, as the yield and quality of P. emblica populations in the wet climate areas were significantly lower than those in the wet climate areas (Liu et al., unpublished data). Recent studies have proposed that genetic diversity basically exhibited a high-low-high pattern with increasing altitude (Wang et al., 2019). However, this study did not involve the genetic diversity analysis of the populations distributed at intermediate altitudes. However, the characteristics of higher genetic diversity in the populations at high altitudes lay a solid foundation for carrying out artificial assisted transplantation of this species in the context of climate warming. As far as precipitation is concerned, according to the statistical data of 79 vascular plants, the precipitation and seasonal drought had important implications for the genetic diversity of vascular plants species (Tan et al., 2018). Although drought is a limiting factor for plant growth, after long-term adaptive evolution, plants in arid environments can developed different droughtresistant mechanisms (such as resistance genes) as well as genetic structures for adaptation to the environment (Ceccarelli and Grando, 1996;Ricardo-Trejo and O'Connell, 2005). The genetic variation and diversity of the populations in arid environments is predicted to increase as a consequence of drought stress, and the rate of evolution of plant populations in these environments is higher than that of the populations in humid environments (Stebbins, 1952). Therefore, this can explain why P. emblica in the dry-hot valley of Yunnan had higher genetic diversity than that in the humid area of Guangxi. In addition, it is worth noting that the Mantel tests also revealed the significant correlation between environmental and geographical distance, implying a certain FIGURE 9 | Schematic representation of the seven demographic scenarios (including model parameters) for P. emblica tested by the DIYABC method using the SSR loci. (Scenario 1) constant effective population size (N1); (Scenario 2) an old expansion (N2 to N1, t2); (Scenario 3) a recent expansion (N2 to N1, t1); (Scenario 4) an old bottleneck (N3 to N1, t2); (Scenario 5) an recent bottleneck (N3 to N1, t1); (Scenario 6) an old bottleneck (N4 to N2, t2) followed by a recent expansion that led to current effective size (N2 to N1, t1); and (Scenario 7) an old expansion (N5 to N3, t2) followed by a recent bottleneck (N3 to N1, t1).
FIGURE 10 | Schematic representation for the estimates of effective population size and the effective population size transition time of the best DIYABC scenario (Scenario 7). Black vertical bars and white horizontal bars represent 95% confidence intervals for the estimates of effective population size and effective population size transition time, respectively. Note that the y-axis of effective population size value is in log 10 format.
degree of isolation by distance (IBD). Given that intermediate populations between the two climate zones were not sampled and compared, in future researches, further work should be carried out to study the populations distributed in more climatic gradients, altitude gradients and distance gradients to reveal the general effects of altitude, precipitation and geographical distance on genetic diversity. Our results showed moderate genetic differentiation between the two climate groups (F = 0.095). The genetic differentiation among the populations in the wet climate group (F ST = 0.071) was higher than that among the populations in the dry climate group (F ST = 0.042), and it might be the result of combined effects of isolation by distance (IBD) and environmental selection. The effective population size of the dry climate group was larger than that of the wet climate group, and the effective population sizes of BCC was the largest (83.5), whereas MSA had the smallest effective population size (10.1). However, DIYABC analysis suggested the effective population size of P. emblica might have contracted in the beginning of the Last Glacial Maximum. The above results supported that genetic differentiation was associated with decline in the effective population size (Frankel et al., 1995).

Population Dynamics
Compared with the expansion period, the most likely Scenario 7 revealed that the effective population size of P. emblica shrank by approximately 0.8 times at 49.85 Ka. Although there was a certain degree of uncertainty in the estimation of population dynamics history in terms of the 95% confidence interval, the detected bottleneck events corresponded well to the beginning of LGM (Clark et al., 2009). The effective population size of P. emblica after experiencing a genetic bottleneck was lower than the estimated value of the ancestral population of the model plant Poplar (Populus) (Ingvarsson, 2008), implying the intensity of the bottleneck experienced to some extent. After experiencing the bottleneck effect to some extent, the populations of P. emblica still maintained high genetic diversity, which may have been a consequence of gene mutation or high gene flow, and consequently, the level of genetic variation remained close to or returned to its original level. However, if human disturbance continues in the future, it will probably result in population decline, habitat fragmentation, or even population isolation of this species, and P. emblica may experience the bottleneck effect once again. According to our long-term field investigations, significant deviations from Hardy-Weinberg equilibrium in the YMB, BCA, BCB, BCC, and MSD populations may be a consequence of human activities which negatively influenced the habitats of these five populations, resulting in heterozygote deficiency and high inbreeding coefficient in these populations (Bourguiba et al., 2020). It is worth noting that P. emblica is not distributed in Northern China, whereas almost continuous distribution areas are formed in Southern and Southwestern China (Pathak, 2003). Based on dynamic history of this species preliminarily speculated in this study, the unique distribution bottleneck in China may be explained by the detected genetic bottleneck event that occurred in the beginning of LGM.

Implications for Ecological Restoration
China has abundant resources of P. emblica. However, this famous plant species which has important medicinal, commercial, and ecological values is being consumed at a very high and alarming rate, with the increase in human population leading to the increase in the demand for such plant materials. In Yunnan and Guangxi, the natural P. emblica forests are commonly being cut down for fodder, fuel, and wood. Because of the negative influence of climate change and human activities, the native vegetation types in the dry-hot P. emblica habitat have declined and are almost non-existent. The existing habitat types mainly include semi-savannah vegetation similar to the valley savannah vegetation in Africa (Jin and Ou, 2000), and in some places, they have even become bare lands and barren hills, leading to the destruction of biodiversity and ecosystem stability as well as serious soil erosion. The habitat of P. emblica in the humid climate area of Guangxi is karst landform, where plants live in harsh and extremely fragile conditions with serious soil erosion and high ecological sensitivity. Over-exploitation is a serious threat to the survival of this species, and currently, it is difficult to find well-preserved wild P. emblica forests in China. At the same time, the harsh and fragile habitats of P. emblica are also facing various threats, including climate change and human activities, and it is difficult for the ecosystem to recover naturally after these disturbances. Therefore, it is crucial to protect and rationally utilize the wild resources of P. emblica as well as to put efforts into the ecological restoration of its habitats.
P. emblica is a species with excellent tolerance to drought and barren soil because of its well-developed roots, and is often used as a pioneer tree species for forestation. Meanwhile, its high genetic diversity provides more possibilities to breed highquality varieties of this species to be used as an important medicinal economic plant. The genetic resource conservation of local P. emblica populations should be considered mainly for the ecological restoration of dry-hot valleys and karst areas. In addition to in situ conservation of the populations with high genetic diversity, ex situ germplasm collection for different purposes, such as breeding and conservation, is also significant for achieving our goals. Compared to the five populations from the Guangxi area, the five populations from the dry-hot valley of Yunnan had higher genetic diversity, particularly the YMA and YMB populations, and attention should be payed to this difference. Furthermore, given that the genetic variation mainly existed within populations, more individuals should be selected and propagated within populations used for ecological restoration in Yunnan dry-hot valley area and Guangxi karst area. In the present study, P. emblica samples were divided into two gene clusters, which should be taken into account when collecting germplasm resources and carrying out ecological restoration. Finally, environmental protection should be advocated in order to increase local farmers' awareness of the protection of this valuable species, and it should be forbidden to cut down P. emblica trees in its ecological restoration areas.

CONCLUSION
According to our investigations of the genetic diversity and genetic structure of P. emblica in two different typical climate areas of China by polymorphic EST-SSR markers, P. emblica had more genetic diversity at the species level and within populations, but less genetic variation among populations. However, the genetic diversity of P. emblica in the dry climate area of Yunnan was higher than that in the wet climate area of Guangxi, and both altitude and precipitation had significant pure effects on this result. In addition, we found that P. emblica populations had experienced a bottleneck event in the beginning of the Last Glacial Maximum. The findings of this study will provide new ideas and guidance for the protection and rational development and utilization of P. emblica resources, and they will also provide a scientific basis for the ecological restoration and economic development in Yunnan dry-hot valley areas and Guangxi karst areas.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials, further inquiries can be directed to the corresponding author/s.