Species Delimitation and Lineage Separation History of a Species Complex of Aspens in China

Species delimitation in tree species is notoriously challenging due to shared polymorphisms among species. An integrative survey that considers multiple operational criteria is a possible solution, and we aimed to test it in a species complex of aspens in China. Genetic [four chloroplast DNA (cpDNA) fragments and 14 nuclear microsatellite loci (nSSR)] and morphological variations were collected for 76 populations and 53 populations, respectively, covering the major geographic distribution of the Populus davidiana-rotundifolia complex. Bayesian clustering, analysis of molecular variance (AMOVA), Principle Coordinate Analysis (PCoA), ecological niche modeling (ENM), and gene flow (migrants per generation), were employed to detect and test genetic clustering, morphological and habitat differentiation, and gene flow between/among putative species. The nSSR data and ENM suggested that there are two separately evolving meta-population lineages that correspond to P. davidiana (pd) and P. rotundifolia (pr). Furthermore, several lines of evidence supported a subdivision of P. davidiana into Northeastern (NEC) and Central-North (CNC) groups, yet they are still functioning as one species. CpDNA data revealed that five haplotype clades formed a pattern of [pdNEC, ((pdCNC, pr), (pdCNC, pr))], but most haplotypes are species-specific. Meanwhile, PCA based on morphology suggested a closer relationship between the CNC group (P. davidiana) and P. rontundifolia. Discrepancy of nSSR and ENM vs. cpDNA and morphology could have reflected a complex lineage divergence and convergence history. P. davidiana and P. rotundifolia can be regarded as a recently diverged species pair that experienced parapatric speciation due to ecological differentiation in the face of gene flow. Our findings highlight the importance of integrative surveys at population level, as we have undertaken, is an important approach to detect the boundary of a group of species that have experienced complex evolutionary history.


INTRODUCTION
Species is a fundamental unit of biology, but there has been much debate about how to define species (e.g., Sites and Marshall, 2003;De Queiroz, 2007). During the last decade, great efforts have been made to delimit plant species based on DNA sequence variation (Kress et al., 2005;China Plant BOL Group et al., 2011;CBOL Plant Working Group et al., 2009), yet species delimitation between closely related plant species remains a challenge (Naciri and Linder, 2015). Speciation is a temporally extended process, typically requiring millions of years before total reproductive isolation is achieved (Coyne and Orr, 2004;Seehausen et al., 2014). During speciation, divergence does not happen at an even rate across the genome, because of selection, genetic drift, reinforcement (if sympatric), and varying mutation rates between DNA regions (Noor and Feder, 2006;Nosil et al., 2009;Nosil and Feder, 2012;Abbott et al., 2013).
Of particular concern to efforts to delimit plant species based on DNA markers is lineage sorting (Naciri and Linder, 2015). Lineage sorting ultimately renders diverging species reciprocally monophyletic for genetic markers, but until this process is completed, one or both species may appear non-monophyletic for some DNA markers, even if they have achieved complete reproductive isolation (Shaffer and Thomson, 2007;Freeland et al., 2011).
Species need not be completely reproductively isolated, provided there is some ecological separation ; indeed a unifying concept defining species as separately evolving meta-population lineages is now widely accepted (De Queiroz, 2007;Fujita et al., 2012;Su et al., 2015). Hence interspecific gene flow (i.e., introgression) can, like incomplete lineage sorting, also lead to shared polymorphisms between closely related species (Degnan and Rosenberg, 2009). Given that ecological niches commonly overlap within plants, there is no single operational criterion that can consistently reveal true boundaries between closely related species (Givnish, 2010). A potential solution to this is to simultaneously evaluate multiple operational criteria, for example reciprocally monophyletic haplotypes or genotypes, reproductive isolation, ecological divergence, and distinct morphology (De Queiroz, 2007;Bond and Stockman, 2008;Fujita et al., 2012;Su et al., 2015); boundaries between species will be found where these criteria are largely in agreement (e.g., Leaché et al., 2009;Satler et al., 2013;Su et al., 2015).
The genus Populus L. (Poplars, Salicaceae) is widely distributed in the Northern Hemisphere (Bradshaw et al., 2000;Hamzeh and Dayanandan, 2004;Cervera et al., 2005), and plays an important ecological role in boreal and temperate forests, serving as wildlife habitats and watersheds; they can dominate riparian forests, but are ecologically adaptable (Braatne et al., 1992;Dickmann, 2001). In addition, they are widely cultivated for their wood (Dickmann and Stuart, 1983;Stettler et al., 1996;Heilman, 1999). However, due to high levels of morphological variation and extensive interspecific hybridization, species delimitation within Populus is highly contentious (Eckenwalder, 1996;Hamzeh and Dayanandan, 2004;Fladung and Buschbom, 2009;Schroeder et al., 2012). The number of proposed species in Populus has ranged from 22 to 85, plus hundreds of hybrids, varieties and cultivars (Eckenwalder, 1977(Eckenwalder, , 1996Dickmann and Stuart, 1983;Hamzeh and Dayanandan, 2004). Various markers have been tested for use in differentiating species, hybrids, and even clones of Populus, i.e., nuclear DNA fragments, simple sequence repeats (SSRs), amplified fragment-length polymorphisms (AFLPs), chloroplast DNA fragments, and mitochondrial DNA fragments (Cervera et al., 2005;Smulders et al., 2008;Feng et al., 2013;Wan et al., 2013). Based on a sample of 95 individuals from 21 native Chinese Populus species, it was found that the sharing of chloroplast haplotypes and nuclear genotypes among closely related species is common (Feng et al., 2013). From this, Populus in China might better be regarded as a series of species complexes, i.e., groups of closely related species that are difficult to differentiate and may still exchange some germplasm. Species complexes could be separated from one another relatively easily using sparse sampling and a universal DNA barcode, but species delimitation within a species complex would require dense, population-level sampling, and highly variable markers (Feng et al., 2013).
The P. davidiana-rotundifolia complex, within section Populus, comprises P. davidiana and P. rotundifolia (Fang et al., 1999). Populus davidiana occurs in northern and central parts of China, plus Mongolia, Korea, and the Far East of Russia. Populus rotundifolia occurs in southwestern China, specifically the southeastern Qinghai-Tibetan Plateau, the Hengduan Mountains, and the Yunnan-Guizhou Plateau; also Bhutan (Fang et al., 1999). Where allopatric, the two species differ consistently in subtle morphological traits (see Table S1). However, transitional morphological traits blur the distinction between them where their ranges meet, i.e., the eastern Qinghai-Tibetan Plateau to central China. Based on 14 individuals, these two species together formed a monophyletic group for cpDNA and were identical for nuclear ITS (Feng et al., 2013); they were shown to be closely related to each other in phylogenetic and population genetic studies (Wang Z. et al., 2014;Du et al., 2015).
In the current study, we sought to identify independent evolutionary lineages within the P. davidiana-rotundifolia complex. We surveyed and analyzed genetic variation of four chloroplast DNA (cpDNA) regions and 14 nuclear microsatellite loci (nSSR) for 375 individuals from 76 populations, and conducted morphometric analyses of leaf traits for representative populations across the distribution range of P. davidiana and P. rotundifolia. Subsequently, Bayesian clustering of nSSR genotypes were adopted to differentiate separate evolutionary lineages, Principal Component Analysis (PCA) was used to examine morphological variation across lineages, a maximum likelihood model (MIGRATE) was employed to assess gene flow between/among lineages, and ecological niche modeling was conducted to quantify niche differentiation between/among lineages. We aimed to address the following questions: (1) How many species are there in the P. davidiana-rotundifolia complex using an integrative survey, e.g., genetic variation, morphological variation and ecological divergence? (2) What lineage separation history has the species complex experienced?

Sample Collection
Populations of the Populus davidiana-rotundifolia complex were sampled throughout its geographical distribution in China. We sampled a total of 375 trees from 76 populations. From 3 to 5 trees at least 100 m apart in each population, leaf samples were taken and dried immediately in silica gel for DNA extraction. No population was encountered that appeared to contain both P. davidiana and P. rotundifolia. Latitude, longitude, and altitude for each sampled population were recorded using an Etrex GIS monitor (Garmin, Taiwan; Table S2; Figure 1).

DNA Isolation, PCR, Genotyping, and Sequencing
Total genomic DNA was isolated from each individual using the hexadecyltrimethyl ammonium bromide (CTAB) method (Doyle and Doyle, 1987), following the modifications of Su et al. (2015). The total genomic DNA was then subject to nSSR genotyping and cpDNA fragments sequencing. A total of 16 SSRs primers were developed based on the genome sequences of Populus euphratica Oliv. and P. trichocarpa Torr. using the MIcroSAtellite (MISA) program (Thiel et al., 2003;Ma et al., 2013;Jiang et al., 2016); these were used to genotype each individual (see Table S3A for details of each SSR primer pair). The PCRs were performed in a volume of 25 ml, which contained: 50-100 ng diluted genomic DNA, 0.5 mM of each dNTP, 0.5 µl of each primer, 2.5 µl 10× Taq buffer and 0.5 unit of Taq polymerase (Vazyme Biotech, Nanjing, China). The PCR program used was: initially a single cycle at 95 • C for 5 min, followed by 36 cycles at 95 • C for 45 s, 55 • C for 40 s, and 72 • C for 80 s, with a final extension at 72 • C for 10 min. The PCR products were checked on 1% agarose gels and sent to Honor Tech (Beijing, China) for nSSR genotyping. Allele sizes for each nSSR locus were analyzed with GeneMarker version 2.2.0 (Softgenetics, Pennsylvania, USA).
We also sequenced four chloroplast DNA (cpDNA) fragments: matK, trnG-psbK, psbK-psbI, and ndhC-trnV, for three individuals from each sampled population; in addition, one individual of P. adenopoda was sequenced as outgroup. Primers for the ndhC-trnV fragment were designed according to the complete chloroplast genomes of P. rotundifolia (GenBank accession number KX425853; Zheng et al., 2016) and P. alba (GenBank accession number AP008956; Okumura et al., 2006; Table S3B). Primers for the other cpDNA fragments were taken from Feng et al. (2013) (Table S3B). Protocols for all cpDNA PCRs followed Schroeder et al. (2012) and the China Plant BOL Group et al. (2011). The PCR products were checked with 1% agarose gels and sent to Tsingke Biological Technology (Beijing, China) for DNA sequencing.

Nuclear Microsatellite Data: Genetic Diversity and Population Structure
Steps were taken to minimize two types of potential error at each nSSR locus. First, the effective allele sizes that are generated by ABI sequencers may often be longer or shorter than the true allele size. Therefore, we used the Program FlexiBin (Amos et al., 2007) to automate the binning of nSSR alleles in order to obtain accurate genotyping results. Second, null alleles are alleles that fail to amplify in a PCR (Oddou-Muratorio et al., 2009); these are common in microsatellites. The inclusion of null alleles in population genetic analyses can lead to false results, e.g., an apparent excess of the proportion of homozygous genotypes within a population as compared to the expected proportion under Hardy-Weinberg equilibrium (Paetkau and Strobeck, 1995). Therefore, we checked for null alleles using CERVUS version 3.0 (Kalinowski et al., 2007;http://www.fieldgenetics. com). Two SSR loci that showed high null allele frequencies (GCPM_126, PeuSSR_4817; p > 0.40; Dakin and Avise, 2004) were excluded from all subsequent analyses.
Since all population genetic analyses will require a delimitation of separate evolutionary lineages, we first conducted a Bayesian clustering approach implemented in STRUCTURE version 2.3.4 (Pritchard et al., 2000) to infer the number of randomly mating groups in the P. davidiana-rotundifolia complex. STRUCTURE simulations were run, using the admixture model separately with each of correlated allele frequencies and independent allele frequencies (Miao et al., 2013;Havrdová et al., 2015;Zeng et al., 2015), under K-values from 1 to 8. Each simulation had 20 independent repeats, and comprised a burn-in of 500,000 steps followed by 1,500,000 MCMC (Monte Carlo Markov Chain) steps. The optimal value of K was determined using the method of Pritchard et al. (2000) and Evanno et al. (2005). To visualize the STRUCTURE output, we used Structure Harvester (http://taylor0.biology.ucla.edu/ structureHarvester/; Earl and von Holdt, 2012). To cross-validate the results of STRUCTURE, we also conducted a Principal Coordinates Analysis (PCoA) on the nSSR data using GenAlEx version 6.5 (Peakall and Smouse, 2012).
Having identified separate groupings, here termed evolutionary lineages (i.e., potential species), using STRUCTURE, we conducted a series analyses. Genetic diversity indices were estimated in GenAlEx version 6.5 (Peakall and Smouse, 2012), for each population in each presumed evolutionary lineage across all SSR loci. For each nSSR locus, descriptive statistics were assessed by estimating the average number of alleles (A a ), effective number of alleles (A e ), observed heterozygosity (H o ), expected heterozygosity (H e ), Shannon's information index (I) (Lewontin, 1972), Nei's (1973) expected heterozygosity, and F-statistics (Wright, 1965(Wright, , 1978. The distribution of genetic variation was examined using analysis of molecular variance (AMOVA) as implemented in ARLEQUIN version 3.0 (Excoffier et al., 2005), with significance tests based on 1,000 permutations. Genetic variation was hierarchically partitioned into three levels: among evolutionary lineages, among populations within evolutionary lineage, and within populations.
To test the significance of isolation by distance, we performed a Mantel test on the matrix of genetic distances and the matrix of geographical distances between populations with 1,000 random permutations, using GenAlEx version 6.5 (Peakall and Smouse, 2012). This was done separately for each evolutionary lineage, and for all samples grouped as one species complex.

CpDNA Data: Genetic Variation and Phylogeographic Structure
Sequences were edited and aligned with ClustalW in MEGA 5 (Tamura et al., 2011) with subsequent manual adjustments. All sequences were then deposited in GenBank (Accession Numbers: KY285968-KY285983, KY285946-KY285967). Haplotypes and variable sites were identified in DnaSP v5 (Librado and Rozas, 2009), and indels were coded as single binary characters using Gapcoder (Young and Healy, 2003). Network version 4.2.0.1 (Bandelt et al., 1999) was then used to construct the network of relationships between haplotypes according to the median-joining model. Additionally, a phylogenetic tree of haplotypes was constructed based on cpDNA sequences using Bayesian method. Bayesian analysis was conducted using the parallel version of MrBayes 3.2 (Ronquist et al., 2012). MrBayes was run for 10,000,000 generations, sampling and printing every 1,000 generations. Two independent Markov chain Monte Carlo (MCMC) chains runs with four chains (one cold and three hot) were conducted per Bayesian analysis. Subsequently, genetic diversity was estimated, including haplotype diversity (H d ) and nucleotide diversity (π s ) at the species and the population level respectively using DNASP v5. Furthermore, using 1,000 permutations within PERMUT (available at: http://www.pierroton.inra.fr/genetics/ labo/Software/PermutCpSSR) we estimated the population differentiation coefficients G ST and N ST , the total genetic diversity (H T ), and average genetic diversity within population (H S ). A significantly larger N ST than G ST implies the presence of significant phylogeographic structure (Pons and Petit, 1996). Finally, a hierarchical analysis of genetic differentiation for cpDNA was examined between and within the evolutionary lineages by analysis of molecular variance (AMOVA) as implemented in ARLEQUIN version 3.0 (Excoffier et al., 2005), with significance tests based on 1,000 permutations.

Examination of Gene Flow between the Two Evolutionary Lineages
Based on nSSR variation, cpDNA variation, and previous phylogeographic hypotheses Liu et al., 2014;Bai et al., 2016), we divided populations of the species complex into three range sectors to aid analysis: Southwestern China ("SWC", P1-P42), Central-North China ("CNC", P43-P60), and Northeastern China ("NEC", P61-P76; Table S2; Figure 6B). Historical gene flow among different evolutionary lineages and range sectors of the P. davidiana-rotundifolia complex was assessed using the software package MIGRATE version 3.2.6 (Beerli, 2006). The amount of immigrants received per generation from neighboring populations and direction of gene flow was estimated from the nSSR data by calculating mutationscaled effective population sizes (θ = 4N e µ) and mutationscaled migration rates (M = m/µ), under an assumption that θ is variable between/among populations and M is symmetric between any pair of populations. We adopted the mutation rate of 10 −3 per gamete per generation in Populus (Lexer et al., 2005), and applied the continuous Brownian motion model. We used uniform priors for both effective population size (θ = 4N e µ) and mutation-scaled migration (M = m/µ) with ranges (0, 0.05) and (0, 5,000), respectively. The initial 100,000 steps were discarded as burn-in, and followed by two long chains of 10,000,000 steps, with sampling every 100 steps, under a constant mutation model. After checking for data convergence from two parallel runs, we estimated for each parameter the mode, mean, median, and 95% highest posterior density (HPD) values. The MIGRATE analyses were evaluated based on effective sampling size as well as the posterior distribution of each parameter.

Ecological Niche Modeling
To determine the degree of ecological divergence between the evolutionary lineages comprising the P. davidiana-rotundifolia complex, we employed ecological niche modeling (ENM) to predict their potential distribution at present, during the Middle Holocene [MH, ca. 6,000 years ago (Kya) before present] and the last glacial maximum (LGM, ca. 21 to 18 Kya before present). To model the ecological niches of each evolutionary lineage, the maximum entropy machine-learning algorithm was implemented in MAXENT v3.3.3 software package (Phillips et al., 2006;Pearson et al., 2007;Phillips and Dudík, 2008). All 76 locilities of occurence, from our field survey dataset (Table S2) were assigned to lineages and range sectors according to our DNA analyses, creating a subset of occurrence points for each lineage and range sector. Because the northwestern part of the range of the P. davidiana-rotundifolia complex contained only one genealogical cluster from the STRUCTURE analysis when K = 2, five additional records from Korea (Table S4; Lee et al., 2011) were assumed to belong to this lineage, and used to provide additional presence points for it.
For each of the three periods, 20 environmental variables (altitude and 19 bioclimatic variables) were downloaded from WorldClim database (Hijmans et al., 2005). Data layers were of 2.5 arc-min spatial resolutions. To avoid contradiction between different global climate models during the MH (CCSM4, MIROC-ESM, MPI-ESM-P, and the other six models available at the WorldClim database) and the LGM (CCSM4, MIROC-ESM, MPI-ESM-P), we generated average-over-pixel bioclimatic variables for these two periods using DIVA-GIS 7.5 (http://www. diva-gis.org/; Hijmans et al., 2001). To reduce over-fitting of ecological niche modeling, we conducted Pearson correlation for environmental variables using the methods of Sheppard (2013). Any environmental variable that possessed pairwise Pearson correlation (with any other variable) greater than 0.75 with two or more other environmental variables was excluded; this reduced to 10 the number of environmental variables. These 10 ( Table S5) were used to model the distributional ranges of each evolutionary lineage.
ENMs were constructed according to the present-day environmental layers and then projected onto the MH and the LGM periods. The maximum entropy model was simulated for 20 replicates, 80% of the distribution coordinates for training and 20% for testing, and the maximum number of iterations was set to 5,000. The "10 percentile presence" threshold was applied because presence-only data were available. The output format was set to be logistic, and for each grid cell the probability of suitable environmental conditions may range from 0 to 1. DIVA-GIS version 7.5 (Hijmans et al., 2001) was employed to draw the graphics for the potential distributions of niche model for each period.
To evaluate the performance of each niche model, the area under the ROC curve (AUC) can quantify the ability of the model to discriminate between sites with or without the presence of the species in question (Peterson et al., 2008;Elith and Leathwick, 2009). AUC values range from 0 to 1, where a score of 1 indicates perfect discrimination, a score of 0.5 indicates that the model performs no better than random, and scores above 0.7 are considered to indicate good model performance (Fielding and Bell, 1997). We also performed a jackknife test to measure the percent contributions of different environmental variables to model simulations.
To measure niche differences between evolutionary lineages or range sectors, we calculated Schoener's D (Schoener, 1968) and standardized Hellinger distance (calculated as I) in ENMTOOLS version 1.3 (Warren et al., 2008(Warren et al., , 2010. Both D and I ranged from 0 (no niche overlap) to 1 (identical niches). We then performed an identity test with 100 replicates to estimate the similarity of the ENMs of the identified evolutionary lineages. The observed niche overlaps were compared with the null distribution, and tested for significance; histograms were drawn using R 2.13 (http://www.rproject.org/).

Detecting the Differentiation of Morphological Traits
Finally, to test whether or not the differentiation of morphological traits corroborates with genetic divergence, we examined a set of morphological traits of leaves and analyzed their pattern of variation. We took images from 252 representative herbarium specimens, gathered from 53 of the sampled populations during fieldwork (23 populations, 118 specimens for P. rotundifolia; 30 populations, 134 specimens for P. davidiana; Figure S1), and transformed every image into a vector diagram using tpsUtil32 software. We recorded the x and y coordinates of 16 landmarks from the leaf blade, and 1 ruler landmark from each image by using TPSDIG (Rohlf, 2001). We implemented morphometrics analyses in MORPHOJ software package, within which a principal component analysis of morphological variations was conducted and plotted (http:// www.flywings.org.uk/MorphoJ_page.htm; Klingenberg, 2011).

Population Genetic Diversity and Structure Inferred From Nuclear Microsatellite Markers
We genotyped 16 nSSR loci for 375 sampled individuals from 76 populations of the P. davidiana-rotundifolia complex. Two loci that showed a high frequency of null alleles were eliminated from further analyses. A total of 141 alleles were scored for the remaining 14 loci, and across all populations the number of alleles per locus varied from 4 to 19 alleles, with an average of 10.071 (Table S6). Averaged across all 76 populations, allele number (A a ) was 34.408, effective allele number (A e ) per locus was 1.990, observed heterozygosity (H o ) was 0.390, and expected heterozygosity (H e ) was 0.376 (Table S7). The fixation index averaged across all loci (average F ST = 0.363; Table S6) indicated a pronounced level of genetic differentiation among populations.
Our Bayesian clustering analyses using STRUCTURE with correlated allele frequencies suggested that the optimal number of free mating meta-populations across the 76 sampled populations is two (K = 2). The log-likelihood value reached a plateau after K = 2, although it increased gradually as K raised from 2 to 8; meanwhile, the delta K had a single peak value at K = 2 (Figure 2). When K = 2, the southwestern populations clustered into one group and the northeastern and central populations clustered into the other group, although lineage admixture was observed in a few populations where the distribution of the two lineages overlapped (Figure 3). Similar results were obtained using STRUCTURE with independent allele frequencies (see Figure S2). The PCoA based on genetic distance revealed a clear separation between the same two lineages (Figure 4). Under STRUCTURE analysis with K = 3, southwestern populations remained as one group, whereas the northeastern populations now formed a separate group from the central populations although there was considerable admixture between them ( Figure S3).
Mantel tests revealed a significant correlation between geographical distance and genetic differentiation across the P. davidiana-rotundifolia complex (r 2 = 0.0407, P = 0.01; Figure 5). However, when Mantel tests were applied to each of the two evolutionary lineages separately, no significant correlation between geographic structure and genetic differentiation was detected (southwestern cluster: r 2 = 0.0003, P = 0.420; central/northeastern cluster: r 2 = 0.0001, P = 0.430). These results suggested that the hierarchical population structure is credible, as geographic isolation may have contributed to differentiation between lineages but not within them.
At the same time, AMOVA analyses revealed that 15.58% of genetic variation was attributed to genetic differentiation between the two groups (i.e., evolutionary lineages), 17.47% was due to genetic differentiation among populations within groups, and 66.95% was ascribed to genetic differentiation between individuals within populations ( Table 1).
Therefore, the allocation of genetic variation at our 14 sampled nSSR loci suggested that the species complex comprises   Figure 2C for the histogram of STRUCTURE assignment test. Brown and orange dashed lines encompass the putative assignment of populations to P. davidiana and P. rotundifolia, respectively. two separate evolutionary lineages. Since the geographic distributions of them roughly correspond to that of P. davidiana and P. rotundifolia, from here on we will refer to the southwestern populations (SWC sector, and Group 1 in Figures 2-4) as P. rotundifolia, and the northeastern and central populations as P. davidiana (NEC + CNC sectors; Group 2 in Figures 2-4).

Genetic Variation of CpDNA Markers
The total length of the alignment matrix of concatenated cpDNA sequences is 2,113 bp, within which 14 substitutions and 21 indels were detected (Table S8). These polymorphisms differentiated a total of 21 haplotypes (Figure 6), which were clustered into five clades (I-V) according to NETWORK analysis ( Figure 6A). Among these, haplotypes H6-H12 were present only in the populations of P. rotundifolia; these form subgroups III and IV in the network analysis, which occur only in the SWC range sector (Figure 6). Likewise haplotypes H18-H21 form Group I, a monophyletic clade present only in the NEC range sector of P. davidiana. The other two groups, II and V, each formed monophyletic clades, but were shared between evolutionary lineages. Group V comprised five haplotypes (H13-H17), of which four H13-H16 were present in P. davidiana's CNC range sector, whereas H17 was only in the westernmost edge of the NEC range sector. Notably, H13 also occurred disjunctly in the southern part of P. rotundifolia's range (Figure 6; Figure S4). Group II likewise occurred mainly in the western part of P. davidiana's range; but was also in three populations of P. rotundifolia. Among the haplotype groups, I is basal, Group V is derived from Group IV, and these two together are sister to a clade wherein Group II is sister to Group III ( Figure 6B). Bayesian analysis also supported the basal position of Group I, but could not resolve relationships among the other four groups ( Figure S5). A significant phylogeographic structure was detected for cpDNA across the whole species complex, as well as within each of P. davidiana and P. rotundifolia individually (N ST > G ST , P < 0.05; Pons and Petit, 1996; Table 2), reflecting that genetic closely related haplotypes tend to occur in adjacent areas (Figure 6). Furthermore, AMOVA analyses of the cpDNA sequence dataset revealed a high F CT value, which indicated significant differentiation between P. davidiana and P. rotundifolia (F CT = 0.3162, P < 0.01; Table 1), although the percentage of variation among populations within evolutionary lineages (45.90%) is higher than variation between evolutionary lineages (31.62%).
The total genetic diversity (H T ) and the diversity within populations (H S ) based on cpDNA were higher in P. davidiana than P. rotundifolia ( Table 2). The within-population haplotype diversity (H d ) was 0.7523 for P. rotundifolia, 0.8884 for P. davidiana, and 0.8950 across all populations. Nucleotide diversity (π s ) was 0.00112 for P. rotundifolia, 0.00207 for P. davidiana, and 0.00189 across all populations.

Examination of Gene Flow between and Within the Two Evolutionary Lineages
The MIGRATE analysis produced a single module posterior distribution for θ and M parameters and the effective sampling size of all parameters are >5,000. θ for P. rotundifolia was slightly higher than P. davidiana, and effective immigration was slightly  Table S2. In (A), the black dot represents an outgroup haplotype from P. adenopoda that was involved as outgroup for rooting purpose; each circle represents a haplotype and circle sizes are proportional to the number of samples per haplotype; oval black dashed lines encompass haplotypes representing the five cpDNA haplotype groups. Brown and orange dashed lines in (B) delineate P. davidiana and P. rotundifolia, and gray dashed line in (B) delineate the Central-North China (CNC) and Northeastern China (NEC) regions within P. davidiana.  Figure S6).

Ecological Niche Modeling
The predicted distributions of the two evolutionary lineages at present, during the MH and the LGM are illustrated in Figure 7A. The respective areas under the receiver operating characteristic curve (AUC) values for the present-day model, the MH model and the LGM model, for different groupings, were as follows: P. rotundifolia, 0.989 ± 0.004, 0.988 ± 0.007, 0.985 ± 0.006; P. davidiana, 0.954 ± 0.019, 0.948 ± 0.022, 0.957 ± 0.018. This indicates that all models were better than random expectation. According to variable jackknife analyses, the environmental variables that contributed most to potential models were Altitude, Isothermality (bio 3) and Mean Temperature of the Driest Quarters (bio 9) for P. rotundifolia, and Mean Temperature of the Driest Quarters (bio 9), Precipitation of Wettest Month (bio 13) and Precipitation of Warmest Quarter (bio 18) for P. davidiana ( Figure S7). Although, the potential distribution of both evolutionary lineages for the present day partly overlaps in the southwestern China (Figure 7A), the niche identity tests of the pair of evolutionary lineages showed that observed values for both I and D were significantly smaller than the predicted scores under the null hypothesis ( Figure 7B), suggesting that they occupy significantly different ecological niches. Meanwhile, considering that P. davidiana populations in the NEC and CNC are genetically different from each other, we further tested ecological niche differentiation between them, as well as between the CNC populations and P. rotundifolia (i.e., the SWC populations) ( Figure S8A). The AUC values for the present-day model of CNC and NEC populations of P. davidiana were 0.966 ± 0.017 and 0.968 ± 0.023, respectively. Niche identity tests suggested that CNC occupies a significantly different ecological niche from both NEC and SWC (Figures S8B,C).

Principal Component Analysis on Morphology
Based on morphological traits of all specimens sampled across the entire species complex, statistical analysis detected no clear differentiation between P. davidiana and P. rotundifolia (Figure 8). However, when populations of P. davidiana from the CNC and NEC sectors are treated separately, then a clear dividing line appears between NEC and P. rotundifolia, whereas CNC overlaps with P. rotundifolia more than it overlaps with NEC (Figures S9A,B). Hence the CNC populations might contain an admixture of characters from the two lineages.

Recognition of Two Species Based on Multiple Nuclear Loci and Ecological Niche Differentiation
An integrative survey of 76 representative populations of the P. davidiana-rotundifolia complex revealed clear evidence from nSSRs for two separate evolutionary lineages (Figures 2, 4). Bayesian clustering suggested that the most likely number of free mating meta-populations is two (Figure 2), and PCoA of genetic variation based on genetic distance also supported the same genetic clustering pattern (Figure 4). Admixture between lineages for these markers is only detectable in those populations close to the contact zone of the two evolutionary lineages (Figure 3), consistent with the limited gene flow between lineages that was indicated by our coalescent-based approach ( Table 3A).
The two lineages occur in central (CNC) to northeastern (NEC) China, and in southwestern China (SWC; Figure 3), which corresponds with the respective distributions of the described species P. davidiana and P. rotundifolia (Fang et al., 1999). Ecological niche modeling confirms that these two lineages have a clear ecological separation (Figure 7). Considering only areas predicted to have high habitat suitability (>0.50), the distributions of the two lineages have no overlap; however when predicted ranges also incorporate areas of lower suitability (0.15-0.50), then there is overlap in the eastern Hengduan mountains and northern Yunnan-Guizhou Plateau (Figure 7A). This visible pattern is supported by the niche identity test, where both indices (D and I) indicated that they occupy significantly different niches ( Figure 7B).
Out of 21 cpDNA haplotypes detected, 17 were lineage specific (Figure S10A), and 31.62% of cpDNA variation occurred between lineages (Table 1). Despite this, the network of haplotypes resolved neither of the two lineages as monophyletic ( Figure 6A; Figure S10B). Morphological separation was also incomplete: P. rotundifolia was clearly differentiated from NEC populations of P. davidiana (Figure S9B), but CNC populations of P. davidiana overlapped the morphology of both (Figure 8).
According to a unified species concept that defines species as separately evolving meta-population lineages (De Queiroz, 2007), sister species may become genetically isolated, and then subsequently become genetically monophyletic; they also gradually become morphologically and ecologically distinct as the speciation process advances. In the case of the P. davidianarotundifolia complex, separation has arisen at the functional level-i.e., for nuclear germplasm (multiple loci) and ecology; however, for other markers often used to distinguish taxa, i.e., morphology and cpDNA, separation is incomplete. Hence, P. davidiana and P. rotundifolia are functioning as two distinct species. Our findings support arguments that multiple criteria should be considered when delimitating closely related species in evolutionarily complex taxonomic groups (De Queiroz, 2007;Leaché et al., 2009;Fujita et al., 2012;Hendrixson et al., 2013;Su et al., 2015), and demonstrate the value of using population level sampling and highly variable markers, as a part of a tiered barcoding system, for such a purpose (Feng et al., 2013).

Parapatric Speciation between P. davidiana and P. rotundifolia
In the geographic context, the modes of speciation could be classified as allopatric, sympatric or parapatric speciation depending on the degree of range overlaps between evolutionary lineages during the speciation process (Mayr, 1942;Schluter, 2001;Coyne and Orr, 2004;Butlin et al., 2008). Where there is total (sympatric) or partial (parapatric) range overlap, speciation can proceed despite a degree of gene flow between speciating lineages (Schluter, 2001;Butlin et al., 2008). In these modes, ecological divergence and the formation of an intrinsic barrier to gene flow may have played important roles .
In the case of P. davidiana and P. rotundifolia, multiple lines of evidence suggest that they most likely have undergone parapatric speciation. First of all, as noted above, the two species occupy significant different ecological niches (Figures 7A,B) yet their distribution ranges are adjacent. Interestingly, the distribution of each species roughly matches a different floristic subkingdom in China (Wu and Wu, 1996;Qiu et al., 2011). The range of P. davidiana corresponds roughly to the northern part of the "Sino-Japanese Forest" floristic subkingdom (Wu and Wu, 1996;Qiu et al., 2011), which comprises North China (30/33N-42N), subtropical (Central/East/South) China (22N-30/33N), the Korean Peninsula, and the Japanese Archipelago. Likewise, P. rotundifolia occurs within the "Sino-Himalayan Forest" floristic subkingdom (Wu and Wu, 1996;Qiu et al., 2011), stretching from the eastern Himalaya Mountains through the Hengduan Mountains to the Yunnan-Guizhou Plateau. Hence the genetic divergence between these two species might reflect adaptation to these two environmentally divergent subkingdoms. Genetic divergence between sister species or sister lineages within species due to ecological differentiation have been observed in many plant species in China (Li et al., 2013;Wang et al., 2013;Sun et al., 2015;Yin et al., 2016), especially within the "Sino-Himalayan Forest" floristic subkingdom (Fan et al., 2013;Liu et al., 2013;Zhao et al., 2016) where environmental heterogeneity is relatively high.
A second line of evidence is the lack of functional geographic barriers that would prevent gene flow between these species, based both on currently known sites (Figure 1) and the range predicted by ecological niche modeling ( Figure 7A). The mountains at the eastern edge of the Qinghai-Tibetan Plateau form a potential geographic barrier, yet four populations containing P. davidiana occur to the south of these, very close to P. rotundifolia (Figure 3). Moreover, the predicted distributions of these species during the MH and LGM periods revealed a greater degree of overlap than exists at present (Figure 7), showing no evidence for geographic separation nor functional geographic barriers to gene flow between them. Hence there is no obvious mechanism for allopatric speciation based on current geography or reconstructed past ranges; for it to have happened, there would have to have been some other separating factor not detectable by our analysis.
MIGRATE analyses based on nSSR markers, which may be dispersed via both seeds and pollen, revealed that a considerable level of gene flow has occurred in both directions between the two lineages (2N e m ≥ 7.43, Table 3A). Seeds and pollen of poplars are wind-dispersed (e.g., Fang et al., 1999); hence both are able to travel a long distance, mediating gene flow between populations. A similar level of bi-directional gene flow was detected using nuclear markers in a pair of ecologically diverged Populus species (euphratica and pruinosa), which cooccur in desert regions in Western China, and which underwent speciation during the Pleistocene (Wang et al., 2011b;. Similarly, nuclear gene flow occurs between P. trichocarpa and P. balsamifera, which diverged even more recently, and occupy significantly different habitats in the North America (Levsen et al., 2012). Hence, speciation in the face of gene flow appears to be a common pattern in the genus Populus.
Finally, our data is consistent with gene flow having occurred at different periods during the speciation process. In addition to gene flow we have detected using nSSR loci, NETWORK analysis revealed sharing between lineages of both haplotypes and clades (haplotype groups). The 21 detected cpDNA haplotypes were clustered into five groups ( Figure 6A), with the basal Group I confined to the northeastern range of P. davidiana (NEC). Otherwise, the haplotypes formed two pairs of groups, with each pair containing one group that was mainly in P. davidiana and another mainly in P. rotundifolia. Group III is almost exclusively P. rotundifolia, the exception being its presence in two nearby populations that are mainly P. davidiana according to nuclear data. However, it is sister to Group II, which occurs mainly in P. davidiana but also three populations of P. rotundifolia that are close to where the species overlap. A very similar pattern occurs in Groups IV and V: Group IV occurs in P. rotundifolia, plus one nearby population that is mainly P. davidiana according to nuclear data; Group IV is derived from Group V, all of whose haplotypes occur in P. davidiana, although haplotype H13 is also present in eight populations in the southeastern range of P. rotundifolia ( Figure S11). Taken at face value, such a pattern fits the stochastic nature of lineage sorting (Freeland et al., 2011), with the five groups predating the divergence of the two species, and then the diverging P. rotundifolia happening to contain groups III and IV while the others were in P. davidiana. Limited sharing of groups II-IV near the contact zone between these species could be attributed to recent and possibly ongoing but limited gene flow, as detected by our data.
Our data is also consistent with an alternative hypothesis, wherein the initially diverging lineages gave rise to the current NEC and SWC populations, which subsequently gave rise to the CNC populations through ongoing admixture and hybridization, with SWC populations contributing cpDNA haplotypes and NEC contributing most of the nuclear genomes (based on nSSR clusters). In this scenario, the sister relationship of two pairs of cpDNA haplotype groups could represent two independent rounds of hybridization and introgression between lineages shortly after initial divergence. A third possibility is that the initial split was between NEC and the common ancestor of CNC and SWC populations, following which CNC diverged from SWC, and then finally the CNC populations were homogenized by nuclear gene flow from NEC, which left their chloroplast genomes unaffected.
Regardless of exactly how speciation occurred, the interspecific sharing of H13 must reflect a gene flow event that occurred long after speciation, because it concerns one haplotype, rather than a clade or its common ancestor. Even so, the event must have occurred some time ago, because the haplotype is spread across eight populations of P. rotundifolia. Hence it might have been caused by Quaternary climate oscillations.

Intraspecific Differentiation of P. davidiana
Our data indicates that the NEC and CNC populations function as a single species, P. davidiana, yet multiple lines of evidence support a subdivision between NEC and CNC. First, as we have discussed above, NEC harbors predominantly haplotype Group I, and CNC has groups II and V (Figure 6; Figures S4,S10B). Second, STRUCTURE analysis using the admittedly suboptimal value of K = 3 split P. davidiana into two groups, roughly matching this geographic divide, although mixed populations occurred especially in CNC ( Figure S3). Third, it appears from MIGRATE analysis that the CNC group received similar levels of gene flow from both NEC and P. rotundifolia (SWC ; Table 3B). Fourth, while NEC material had a consistent morphological separation from P. rotundifolia, CNC material was closer in morphology to P. rotundifolia than to NEC material (Figure 8). This morphological pattern might be due to recent divergence between the species (Figure 6), bi-directional gene flow between them (Table 3B; Figure S6), or both, but either is consistent with a degree of separation between NEC and CNC material. Finally, the ecological niche of the CNC group is significantly different from the NEC group (Figures S8A,C). The geographical dividing line between these NEC and CNC roughly corresponds to an infraspecific genetic divide within Acer mono Liu et al., 2014), and a significant between-species dividing line among members of Juglans section Cardiocaryon (Bai et al., 2016). This suggests the presence of an enduring of incomplete phytogeographic barrier between these regions.
Remarkably, our cpDNA data appears to indicate that NEC material diverged from CNC material before the divergence between the latter and P. rotundifolia (SWC material), because NEC material is dominated by the earliest diverging haplotype group, I (Figure 6; Figure S5). Conversely, nuclear divergence between NEC and CNC, indicated by nSSR data, happened later, and after the divergence of P. rotundifolia. This fits with a hypothesis of ongoing, or periodic, nuclear gene flow between CNC and NEC, but could also reflect separation followed by a phase of nuclear homogenization. Either way, the two currently seem to function as a single species, and are separated neither by the optimal STRUCTURE value of K (K = 2; Figure 3), nor PCoA analysis (Figure 4). Genetic differentiation between NEC and CNC occurs when K = 3 ( Figure S3), but it is much weaker than that between the species. Moreover, Mantel tests on nSSR data within P. davidiana found no significant correlation between geographic structure and genetic differentiation (r 2 = 0.0001, P = 0.430), indicating at most weak geographical structuring, hence minimal separation between NEC and CNC. Conversely, significant geographic structuring is revealed when both species are examined together (r 2 = 0.0407, P = 0.01; Figure 5). In summary, weak differentiation and periodic or ongoing gene flow between populations of NEC and CNC based on nSSR data suggest that they are currently functioned as one species.
In summary, our integrative approach which considering multiple lines of evidence suggests that although P. davidiana and P. rotundifolia are not completely separated according to cpDNA and morphology data, they are functionally two species that possess distinct nuclear germplasms and habitats. The species pair most likely have experienced a lineage separation history that is consistent with parapatric speciation in the face of gene flow due to adaptation to different ecological niches. A further subdivision of P. davidiana into Central-North and Northeastern groups is supported by multiple lines of evidence, with the former sharing morphological traits and some cpDNA with P. rotundifolia. This indicates a complex history, with interspecific gene flow likely occurring after the incipient species began to diverge. Hence, P. davidiana and P. rotundifolia can be regarded as a recently diverged species pair, where the speciation process is more or less complete, but the signature of the early divergence stages is still visible. Our findings emphasize that taking integrative survey at population level, as we have undertaken here, is an important approach to detect the boundary of a group of species that have experienced complex evolutionary history.

ACKNOWLEDGMENTS
The authors thank Drs. Matthew Olson, Markus Rhusam, and Jianquan Liu for their constructive suggestions on earlier version of the manuscript, Dr. Frederic Chain and two reviewers for their constructive comments, and Qianlong Liang for his helps with ecological niche modeling.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017. 00375/full#supplementary-material Table S1 | Morphological difference between Populus davidiana and P. rotundifolia according to the Flora of China (Fang et al., 1999). Table S2 | Detailed information for the 76 sampled populations of the Populus davidiana-rotundifolia complex that were adopted for genetic survey using nSSR and cpDNA.      Figure S1 | The sampling location of the 53 representative populations of P. rotundifolia (small red pie), CNC P. davidiana (small green pie), and NEC P. davidiana (small blue pie), respectively, that were adopted for morphologically statistical analysis.   Figure 2D for the histogram of STRUCTURE assignment test. Brown and orange dashed lines encompass the putative assignment of populations to P. davidiana and P. rotundifolia, respectively.  Table 1. In (A), the black dot represents an outgroup haplotype from P. adenopoda that was involved as outgroup for rooting purpose; each circle represents a haplotype and circle sizes are proportional to the number of samples per haplotype; oval black dashed lines encompass haplotypes representing the five cpDNA haplotype groups. Brown and orange dashed lines in (B) delineate P. davidiana and P. rotundifolia, and gray dashed line in (B) delineate the Central-North China (NCN) and Northeastern China (NEC) regions within P. davidiana.        Table S2. In (A), the black dot represents an outgroup haplotype from P. adenopoda that was involved as outgroup for rooting purpose; each circle represents a haplotype and circle sizes are proportional to the number of samples per haplotype; oval black dashed lines encompass haplotypes representing the five cpDNA haplotype groups. Brown and orange dashed lines in (B) delineate P. davidiana and P. rotundifolia. The boundary between the CNC and NEC populations of P. davidiana runs between Pop 60 and 61.