Phylogenomics and Biogeography of Populus Based on Comprehensive Sampling Reveal Deep-Level Relationships and Multiple Intercontinental Dispersals

Populus not only has significant economic and ecological values, but also serves as a model tree that is widely used in the basic research of tree growth, physiology, and genetics. However, high levels of morphological variation and extensive interspecific hybridization of Populus pose an obstacle for taxonomy, and also to the understanding of phylogenetic interspecific relationships and biogeographical history. In this study, a total of 103 accessions representing almost all the wild species of Populus were collected and whole-genome re-sequenced to examine the phylogenetic relationships and biogeography history. On the basis of 12,916,788 nuclear single nucleotide polymorphisms (SNPs), we reconstructed backbone phylogenies using concatenate and coalescent methods, we highly disentangled the species relationships of Populus, and several problematic taxa were treated as species complexes. Furthermore, the phylogeny of the chloroplast genome showed extensive discordance with the trees from the nuclear genome data, and due to extensive chloroplast capture and hybridization of Populus species, plastomes could not accurately evaluate interspecies relationships. Ancient gene flow between clades and some hybridization events were also identified by ABBA–BABA analysis. The reconstruction of chronogram and ancestral distributions suggested that North America was the original region of this genus, and subsequent long dispersal and migration across land bridges were contributed to the modern range of Populus. The diversification of Populus mainly occurred in East Asia in recent 15 Ma, possibly promoted by the uplift of the Tibetan Plateau. This study provided comprehensive evidence on the phylogeny of Populus and proposed a four-subgeneric classification and a new status, subgenus Abaso. Meanwhile, ancestral distribution reconstruction with nuclear data advanced the understanding of the biogeographic history of Populus.


INTRODUCTION
Populus, collectively known as poplar, aspen, cottonwood, or "Yang Shu" in Chinese, is widely distributed from subtropical to boreal forests in the northern hemisphere, primarily in temperate forests (Ding, 1995;Eckenwalder, 1996). In native ecosystems, trees of the genus Populus play a major role, recolonizing sites after disturbances and providing important habitats for wildlife. Owing to its excellent features, poplars have been planted in many parts of the world and have become a highly promising crop option. They are not only widely used in windbreaks, protective stands to prevent soil erosion, and in row or gallery plantings to stabilize the banks of streams and canals, but also show impressive productivity for desirable wood products (Stettler et al., 1996). Furthermore, poplar is also recognized as an excellent model tree for the study of genetics, tree growth, and its underlying physiology for its fast growth rate, easy vegetative propagation, and relatively small genome (Stettler et al., 1996;Bradshaw et al., 2000;Cronk, 2005). The first completion of tree genome sequence of Populus trichocarpa also shed new light on the exploration of wood development, nutrient and water movement, crown development, and disease resistance in trees (Tuskan et al., 2006). Moreover, a clear understanding of the evolutionary history of Populus will provide an important foundation for biological studies and genetic breeding programs.
Even though vastly valuable, the genus Populus has consistently been considered a troublesome taxonomic group. It is widely accepted that Populus consists of six sections [Abaso, Turanga, Populus (synonym: Leuce), Leucoides, Aigeiros, and Tacamahaca], primarily based on morphological evidence (Eckenwalder, 1996). However, there is still much debate over species delimitation and classification. Depending on classification predilections, approximately 22-100 natural species and hundreds of hybrids and cultivars have been proposed for this genus (Dickmann and Stuart, 1983;Eckenwalder, 1996;Fang et al., 1999;Chao et al., 2009;Slavov and Zhelev, 2012). Flora of China (Fang et al., 1999) recorded 71 species within Populus in China and reckoned that nearly 100 species exist throughout the world. On the other hand, Chinese taxonomists were criticized as "splitters, " who wish to give formal recognition to any entries that can be recognized in the field and herbaria and were reluctant to accept within species variation, instead of "lumpers, " who accept a broader range of variation within species (Eckenwalder, 1996). Thus, it is particularly important to detect the interspecies variation and phylogenetic relationships of Populus using genetic evidence.
Molecular methods have been used to analyze the phylogenetic relationships of Populus, but the results have not consistently supported morphological classifications, and intersectional relationships have been the subject of controversy. Three non-coding regions of chloroplast trnT-trnF and two ITS sequences were used to reconstruct the phylogeny of 17 species represented sect. Tacamahaca, sect. Aigeiros, and sect. Populus; the results showed that sections Tacamahaca and Aigeiros were polyphyletic (Hamzeh and Dayanandan, 2004). In another phylogenetic study, based on four chloroplast fragments (rbcL-a, psbI-psbK, psbA-trnH, and trnL-trnF), the monophyly of sections Leucoides and Populus were recovered, but sect. Tacamahaca were divided into two distinct clades (Yun et al., 2015). Moreover, a study based on 23 single-copy nuclear DNA and 34 chloroplast fragments firstly suggested the phylogenetic positions of the sixth section, sect. Abaso (only include Populus mexicana) . With the combination of nuclear and chloroplast DNA fragments, sect. Tacamahaca and sect. Aigeiros have usually been inferred to be polyphyletic or hybrid origin (Hamzeh and Dayanandan, 2004;Wang et al., 2014;Yun et al., 2015;. Nevertheless, phylogenetic positions of many species have not been resolved owing to limited genetic information from molecular markers and DNA fragments. Chloroplast genomes can provide much more informative sites to reconstruct a more solid phylogeny which illustrated the relationship of maternal lineage. Five major clades have been recovered within Populus in previous plastome studies (Zhang L. et al., 2018;Zong et al., 2019). Sect. Populus formed a monophyletic clade with Populus nigra nested, species of sect. Leucoides, sect. Tacamahaca, and sect. Aigeiros expressed polyphyletic relationships. However, topology based on the whole length of genomes and result from 77 chloroplast protein-coding genes were inconsistent (Zhang L. et al., 2018;Zong et al., 2019). For Populus, chloroplast data seldom accurately reflect species relationships due to the severe effects of chloroplast capture  and hybridization. Whole-genomic data extracted from next-generation sequencing has been extensively used to resolve complicated systematic problems (Xu et al., 2012(Xu et al., , 2014Zhou et al., 2014;Chen et al., 2016Chen et al., , 2018. With the support of low coverage whole-genome sequencing, four main clades from 29 Populus species were recovered with high bootstrap supports (Wang et al., 2020) Sect. Abaso, sect. Turanga, and sect. Populus were all well-founded monophylies. The last clade, which was named ATL clade in Wang et al. (2020), consisted of three polyphyletic sections, sect. Leucoides, sect. Aigeiros, and sect. Tacamahaca. These three sections were ever morphologically identified as subg. Eupopulus (Dode, 1905). Hitherto, a relatively large number of Eupopulus species were not involved in previous phylogenetic studies, and molecular makers also should be improved to provide higher resolution. In addition, hybridization prevented the sole use of chloroplast genome to address these issues. Therefore, whole-genome data and comprehensive sampling should be conducted to decipher the phylogenetic relationships of Populus.
Populus has a widespread distribution throughout the northern hemisphere, and its diversification center and distribution center are in East Asia (Gong, 2004). However, the origin and diffusion of Populus still remain unclear and controversial. Two main hypotheses about the area of origin, China or North America, have been proposed. First, owing to the highest degree of diversification and the earliest fossils (Populus latior) in East Asia, Populus was considered to have originated from East Asia and spread from east to west (Ding, 1995;Gong, 2004). In contrast, another hypothesis is that the Populus species first appeared in North America and then dispersed to other continents, primarily because the earliest fossil records were found in North America . The earliest leaves fossils in North America that resembled Populus were also present in the Paleocene, and the first credible fossils with a rare twig bearing both leaves and fruits of Populus were identified in UT, United States and attributed to sect. Abaso (Collinson, 1992;Eckenwalder, 1996). Fossils of the extinct sister genus of Populus and Salix, Pseudosalix, were also found at the same formation (Boucher et al., 2003). The occurrence of them raised the possibility of a North American origin of Salicaceae. Alternatively, the phylogenetic position of P. mexicana provides additional evidence for this hypothesis. This only extant species of sect. Abaso, which is restricted to Mexico, was supported as the basal taxon of Populus phylogeny using nuclear markers Wang et al., 2020). Both assumptions merely consulted morphological and fossil evidence, and no robust phylogenetic studies have been reconstructed to provide evidence for the origin and evolution of Populus.
Since previous phylogenetic examinations involved limited sequencing techniques or incomplete sampling, interspecific relationships of Populus still remain unclear. Thus, its divergence and biogeographical history were not yet well defined. A wellresolved phylogeny is the key to understand the evolutionary history and patterns of species diversification of Populus. In this study, with extensive and nearly complete sampling, we respectively reconstructed the backbone phylogeny for Populus based on variants identified from nuclear genomic data and complete chloroplast genomes to clarify the phylogenetic relationships of Populus, especially subg. Eupopulus. These two phylogenies were then compared to examine the conflicts between biparental and maternal inheritance. We further reconstructed the divergence and biogeographical history of Populus.

Plant Material
Sampling was conducted primarily following the taxonomy system of Eckenwalder (Eckenwalder, 1996) and the Flora of China (Fang et al., 1999). We also referred to other treatises and databases, including The Woody Plants of Korea 1 , Salicaceae of Japan (Hiroyoshi, 2001), the flora of USSR (Nazarov, 1936), Kenya trees, shrubs, and Lianas (Beentje et al., 1994), the Euro + Med PlantBase 2 , and The Plant List 3 . In total, 103 accessions of 54 Populus species were collected, which represented nearly all of the native species, as well as two variants and two hybrid species (Supplementary Table 1). The cultivated species and artificial hybrids were not included. Fresh leaves of 80 samples were collected from natural adult trees and dried using silica gel. The voucher herbarium specimens were deposited in the Museum of Beijing Forestry University (BJFC) and Herbarium, Institute of Botany, CAS (PE), Beijing, China. Three tissue samples were obtained from New York Botanical Garden DNA Bank (New York, NY, United States), and the images of voucher herbarium specimens were available on the web of the New York Botanical Garden 4 . Two tissue samples of P. mexicana were transferred from the Florida Museum of Natural History (Gainesville, FL, United States) and University of Arizona Campus Arboretum (Tucson, AZ, United States), respectively. The images of voucher herbarium specimens or live tree were also available on their website 5,6 . Six DNA samples were obtained from the DNA and Tissue Bank at Kew 7 . In summary, most of taxa were sampled by 1-2 accessions. Considering the wide distribution or disputed classification, P. nigra, Populus pseudoglauca, Populus szechuanica, Populus rockii, P. szechuanica var. tibetica, Populus cathayana, and Populus koreana were collected from 3 to 5 accessions, respectively. In addition, previous sequencing data of 12 samples were included from Sequence Read Archive (SRA) in the National Center for Biotechnology Information (NCBI) database and the Genome Sequence Archive (GSA) in the BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, respectively. Two Salix species were used as outgroups.

Sequencing and Quality Control
We utilized the CTAB protocol with minor modifications to extract the whole-genomic DNA from the tissue samples (Doyle, 1987). All of the DNA samples were shipped to Novogene 8 (China) for subsequent sequencing. After the DNA quality assessment, each sample was sheared to construct a pair-end sequencing library with an insert size of c. 350 bp and sequenced using an Illumina HiSeq 4000 platform (Illumina, San Diego, CA, United States).
Low-quality reads of raw sequencing data were removed with the following criteria: (1) the proportion of N was greater than 10%, and (2) the proportion of low-quality bases (Q ≤ 5) was greater than 50%. The clean data from sequencing company were than inspected and filtered by FastQC 9 and trimmomatic v.0.36 (Bolger et al., 2014). The bases with a quality <3 at the head and tail were filtered. A sliding window of size 4 bp was used to filter the bases with a mean quality <15. At last, reads with a size not <50 bp were retained.

Complete Chloroplast Genomes Assembly
Geneious Primer (Kearse et al., 2012) was used to assemble the chloroplast genome. The paired-end reads were assembled to the reference chloroplast genome of P. trichocarpa (Tuskan et al., 2006). The Map tool with default parameters was used to filter the available reads, which were used for de novo assembling. Medium-low sensitivity was used. We then used the Fine Tuning function to bridge the gaps. Complete and partial assemblages for the organelle genomes were remapped using original reads. Plann (Huang and Cronk, 2015) was employed to annotate the chloroplast genome sequence, and all the annotations were manually corrected using the Geneious Primer.
We also conducted two coalescent approaches to reconstruct the species tree. The sliding-window method is the most common practice to build a species tree, and has been utilized in studies of bacteria, insects, mammals, and plants (Takuno et al., 2012;Zhang et al., 2016;Liu Y. et al., 2017;Chen et al., 2019). Thus, concatenated SNPs were filtered as 10-kb non-overlapping windows across the genomic alignments as "super-genes" for tree constructions. The gene trees were constructed using RAxML v8.2.11 (Stamatakis, 2014) based on the GTRGAMMA model. The fast bootstrap was conducted for 100 times with the option "f a." Finally, the species tree was inferred using ASTRAL v5.15.1 (Zhang C. et al., 2018). In addition to evaluating the gene tree discordance, an alternative quartet topologies test to show quartet supports for the tree topologies was conducted using the -t 8 option in ASTRAL. Additionally, we performed an SVDquartets analysis (Chifman and Kubatko, 2014) in PAUP * v4.0a168 (Swofford, 2003) using the concatenated SNPs with 100 bootstrap replicates and all quartets sampled to infer a species tree.
The plastome sequences with one inverted repeat (IR) removed were aligned using the MAFFT online version with

ABBA-BABA Analysis
At first, an alignment-based method was used to infer the ancestral state and the nuclear genome of Salix dunnii (He et al., 2021) was aligned to the P. trichocarpa reference genome (Tuskan et al., 2006) using minimap2 (Li, 2018). The alignment was transferred to variant call format (VCF) using SAMtools v1.6 ) and BCFtools v1.11 (Danecek and McCarthy, 2017). Then, the ABBA-BABA test was performed using Dsuite v0.3 (Malinsky et al., 2021) to assess evidence for introgression of Populus. The Patterson's D-statistic (Durand et al., 2011) was calculated using the Dtrios program from Dsuite. In brief, for the ordered alignment {[(P1, P2), P3], O}, the ABBA site pattern refers the shared derived alleles of P2 and P3, and BABA site pattern refers to the shared derived alleles of P1 and P3. Under the null hypothesis of incomplete lineage sorting (ILS), the number of ABBA sites and BABA sites is expected to be equal (D = 0). Alternatively, significant deviation of D from 0 suggests other events, in particular P3 exchanging genes with P1 or P2 (Durand et al., 2011).

Divergence Time Estimation
The best-preserved fossils of Populus with a rare twig that bore both leaves and fruits originate from the Eocene Green River Formation in Utah, Colorado, and Wyoming in the United States, and these fossils occurred in the early Middle Eocene (∼48 Ma) (Boucher et al., 2003;Manchester et al., 2006). All of them were similar to P. mexicana (Manchester et al., 1986) and attributed to sect. Abaso (Eckenwalder, 1977;Manchester et al., 2006). At the same formation, another famous fossil is Pseudosalix handleyi, which also has leaves and fruits attached and represents a lineage separate from Populus and indicates that Populus (and likely Salix) represented a unique lineage(s) at this time (Boucher et al., 2003). Thus, the divergence time between Populus and Salix was constrained to 48.13 and 48.22 Ma. Another relatively reliable fossil was recorded in the Middle Miocene, designated Populus zhenyuanensis (Liang et al., 2016), which is similar to extant P. szechuanica in leaf shape, size, and surface detail. Therefore, the divergence age between the P. szechuanica clade (including P. szechuanica, P. rockii, and Populus xiangchengensis) and the Populus ciliata clade (including P. ciliata, Populus yatungensis, P. szechuanica var. tibetica, Populus haoana, P. pseudoglauca, and Populus mainlingensis) was calibrated to 11.61-15.97 Ma.
Since the monophyly of many species was not recovered in the plastomes tree, which only represented the maternal lineage, nuclear genome SNPs data was used to determine the divergence times. With the exception of two hybrids, Populus wenxianica (presented as a cultispecies) and redundant accessions, a new set, including 55 Populus individuals and two outgroups, was generated to estimated divergence dates. Each species or variety remained as only one accession, but P. szechuanica and P. koreana remained as two. At first, a new tree was reconstructed as the input tree using IQ-TREE v1.6.9 (Nguyen et al., 2014) with the same parameter as above. Then, two distinct runs were conducted with the same approximation likelihood analysis using MCMCtree of the PAML package (Yang, 2007). The model of site substitute was set as GTR. A Markov Chain Monte Carlo (MCMC) run was discarded for the first 40,000,000 burnin iterations. Sampling was taken at every 1,000 iterations with 100,000 times.

Ancestral Area Reconstruction
To infer the ancestral geographic distributions, the R package "BioGeoBEARS" (Matzke, 2013) implemented in RASP (Yu et al., 2015) was used to test the best model. The DEC + J (Dispersal-Extinction-Cladogenesis) model with the highest AIC_wt value was selected to reconstruct the ancestral geographic distributions. The divergence tree reconstructed by MCMCtree was used as the input tree. According to the distribution of extant Populus species and extinct fossils, four operational geographic areas were divided into (A) Europe, North Africa, West-Central Asia and Xinjiang, China; (B) North Asia, East Asia and the edge of Himalayas; (C) Kenya; and (D) North America. The boundaries of the four areas were defined referring to the biogeographical regionalization of distributions of dicotyledonous plants in the world (Shen et al., 2018). In addition, the maximum number of areas was set to four.

Sequence Data Processing
We resequenced 91 accessions, and 676.58 Gb raw data was generated. Totally, 735.14 Gb clean data of 103 individuals were obtained for subsequent analysis. After a series of strict analyses, an average of 56.71% reads was mapped to the genome P. trichocarpa. The average coverage depth and average mapping coverage of the genome P. trichocarpa were 10.93× and 78.20%, respectively (Supplementary Table 1). Finally, 12,916,788 highquality SNPs were obtained (Supplementary Table 3).
In total, we assembled 103 complete chloroplast genomes, which had a common typical quadripartite structure, containing a large single-copy (LSC) region and a small single-copy (SSC) region separated by two IRs. The plastid genomes length of these 103 accessions ranged from 155,170 (Populus grandidentata #1) to 158,474 bp (Populus ilicifolia #1) (Supplementary Table 2). The plastomes of almost all the species have been identified with 129 genes, composing 85 coding sequences (CDSs), 8 rRNAs, and 36 tRNAs. Owing to the loss of two rps7 genes, four species, Populus pruinosa, P. pseudoglauca, P. mainlingensis, and Populus glauca, contained only 127 genes, respectively (Supplementary Table 2).

Phylogeny Based on Nuclear Single Nucleotide Polymorphisms
For the 12,916,788 SNPs data set, 3,110,298 were constant; 1,878,806 were singleton sites, and 7,927,684 were parsimonyinformative sites. The ML tree recovered four main clades of Populus. P. mexicana, the only extant species of sect. Abaso, diverged first (Figure 1). Sect. Turanga was monophyletic and was the sister of another monophyletic clade, sect. Populus. Species of sect. Leucoides, sect. Aigeiros, and sect. Tacamahaca together formed a monophyletic clade, which was a sister to sect. Turanga + sect. Populus clade. These three sections were ever morphologically identified as subg. Eupopulus (Dode, 1905) and named ATL clade in Wang et al. (2020). Three species of sect. Leucoides diverged successively at the base of subg. Eupopulus (Figures 1, 2A). In addition, we also reconstructed a concatenate phylogeny using SNPs from Wang et al. (2020) with the same method as above and recovered the same topology (Supplementary Figure 2).
Five subclades were recovered within sect. Aigeiros + sect. Tacamahaca. Subclade I diverged first and contained seven species and one variety of sect. Tacamahaca, two species and a hybrid of sect. Aigeiros. Subclade II subsequently diverged and included two species of sect. Aigeiros and one species of sect. Tacamahaca. Subclade III included three species of sect. Tacamahaca. The remaining species of sect. Tacamahaca were divided into subclades IV and V. Seven species and one variety were contained in subclade IV, and 11 species composed of subclade V. The species of each subclade and its distribution are shown in Figure 1 and Supplementary Table 4.
The species trees inferred from two coalescent methods also supported the four main clades of Populus and similar five subclades within sect. Aigeiros + sect. Tacamahaca, but topologies conflicted with the concatenated genome tree and each other (Figure 3 and Supplementary Figure 1). The basal position of sect. Abaso was also recovered based on ASTAL method, but sect. Populus diverged earlier than sect. Turanga and subg. Eupopulus (Figure 3). The SVDquartets analysis revealed the first divergence of subg. Eupopulus, and successive sect. Populus, sect. Abaso and sect. Turanga (Supplementary Figure 1). Additionally, the phylogenetic position of some supposed hybrid taxa of subg. Eupopulus severely conflicted with the concatenated genome tree, such as P. wenxianica, P. szechuanica var. tibetica, and Populus pseudomaximowiczii.

Phylogeny Based on Chloroplast Genomes
The plastome data matrix consisted of 174,083 characters, of which 166,358 were constant; 1,677 were singleton sites, and 6,048 were parsimony-informative sites. Compared with the nuclear genome tree, the phylogeny reconstructed by complete chloroplast genomes presented a quite different topology ( Figure 2B and Supplementary Figure 3). The genus Populus was divided into five clades (Alrt > 95, UFB ≥ 99), and only sect. Turanga was monophyletic with full support (Clade A). Moreover, Sect. Populus clustered together as Clade B, with P. nigra, Populus × irtyschensis #1 of sect. Aigeiros and Populus iliensis of sect. Tacamahaca nested. Clade C was composed of Populus afghanica of sect. Aigeiros, Populus lasiocarpa of sect. Leucoides and 15 Asian species of sect. Tacamahaca. With the exception of two species of sect. Populus, Populus tremuloides and P. grandidentata, all of the other North American species formed a monophyletic clade, clade D, which was the sister to clade E.  Clade E was composed of P. glauca of sect. Leucoides and the rest of Asian species of sect. Tacamahaca with P. × irtyschensis #2 of sect. Aigeiros embedded. Not only do the relationships between sections differ from morphological classifications and the nuclear SNPs tree, but many species did not recover as monophyly, particularly in clade C and E, such as P. lasiocarpa and P. koreana.

Introgression Analysis
Totally, the ancestral state identified from S. dunnii (He et al., 2021) covered about 82.8% of the whole genome SNPs. Taking S. dunnii as the outgroup for all comparisons, the introgression analysis between clades showed that sect. Abaso was more closely related to subg. Eupopulus than to other species of sections Turanga and Populus (Supplementary Figure 5a), which was similar to the results of Wang et al. (2020). We also found that P. nigra was more closed related to Populus alba than to any other species of sect. Populus (Supplementary  Figure 5b). Further ABBA-BABA analysis showed that the degree of introgression of P. szechuanica var. tibetica, and P. pamirica was higher than that of other species of subg. Eupopulus (except P. ciliata), while P. pseudomaximowiczii was more closely related to P. koreana than any other species of subg. Eupopulus (except P. rockii, Supplementary Figures 5c,d).
These results suggested an extensive and frequently gene flow history of Populus. These complicated admixture histories were also verified by the pervasive discordance between the nuclear tree and chloroplast tree (Figure 2).

Analysis of Molecular Dating
The divergence times s of genus Populus were estimated nearly identical by the two MCMCtree runs. The divergence age between Populus and Salix was constrained to 48.175 ± 0.045 Ma (million years ago) in the early Eocene. Section Abaso was estimated to diverge with other Populus species at 46.93 Ma [95% HPD (highest probability density): 43.80-48.14 Ma; node 1; Table 1 and Figure 4] in the middle Eocene, the next split of Populus   Figure 4), respectively.

Ancestral Area Reconstruction
The reconstruction of ancestral geographic distributions inferred that North America was the most probable original area of the genus Populus [D(0.34)/B(0.22)/BD(0.16), node 1; Figure 5A]. The ancestral area of subg. Eupopulus was also presumed to be North America [D(0.47)/B(0.32)/BD(0.20), node2; Figure 5A]. The common ancestral area of sect. Aigeiros + sect. Tacamahaca was reconstructed as North Asia, East Asia, and the edge of Himalayas [B(0.86), node 3; Figure 5A]. Moreover, 11 migration events were inferred with DEC + J model, and a pair of migration events from East Asia toward to North America and then spread back were also presumed to have occurred sequentially, as the common ancestral area subclade of III, IV, and V of sect. Aigeiros + sect. Tacamahaca was inferred to be North America [D(0.50)/B(0.45), node 5; Figure 5].

Phylogenetic Relationships of Populus
With unprecedentedly complete sampling, the genus Populus was recovered as four clades with high supports based on nuclear SNPs (Figures 1, 3 and Supplementary Figure 1), while five clades were divided based on complete chloroplast genomes ( Figure 2B and Supplementary Figure 3). Similar to the results of Zong et al. (2019), the complete plastome dataset provided high support for the divergence of five clades (Alrt > 95, UFB ≥ 99; Figure 2B and Supplementary Figure 3), but a previous study based on 77 chloroplast protein coding genes presented a different topology among the five clades and less supports in the second and the third clades (Zhang L. et al., 2018). Complete plastomes could more effectively resolve the phylogeny of poorly diverged Populus taxa compared with those diverged from protein coding gene sequences. Both biparental and maternal molecular phylogenies were contrasted with six sections acknowledged based on morphological characters (Eckenwalder, 1996). According to our results, the plastome phylogeny was highly inferior at solving interspecific relationships with few informative sites (5,720 parsimony-informative sites). In contrast, the nuclear genome could provide millions of variants (7,927,684 parsimony-informative sites) and be more effective at dissecting the deep phylogeny of Populus. In addition, different accessions of some species did not cluster with each other in the chloroplast genome tree, which reflected complex hybridizations. Therefore, the phylogenetic issues and the relationships among species primarily resulted from nuclear genomic phylogeny as follows.
With respect to nuclear genomic phylogeny, four clades were well supported, sect. Abaso, sect. Turanga, sect. Populus, and subg. Eupopulus. Sect. Abaso, the monotypic section distributed in the south of North America, was identified as a basal monotypic clade in most analysis, followed by the successive divergence of the other three clades: sect. Turanga, sect. Populus, and subg. Eupopulus (Figures 1, 2A, 3). SVDquartets analysis showed a quite different topology of these four clades. Wang et al. (2020) have recovered more topologies among these four clades and discovered more phylogenetic conflicts among different datasets or analysis methods. Especially, the internodes between these major clades were relatively short and each topology had low support values from gene trees [see Figures S4 and S5 from Wang et al. (2020) and Figure 3 in this study]. Furthermore, extensive gene flow among clades was also detected using ABBA-BABA and IBD (identity-by-descent) analysis (Wang et al., 2020). Species tree inference has also been presented to be inconsonant in the presence of gene flow (Solís-Lemus and Ané, 2016;Long and Kubatko, 2018). Thus, both gene flow and incomplete lineage sorting likely led to the phylogenetic inconsistence, and gene flow was likely to play the more important role (Wu, 1991;Wang et al., 2020).
Within subg. Eupopulus, there are no clear differences in the flowers and inflorescences between sect. Tacamahaca and sect. Aigeiros, and they have ever been considered to be accommodated in a single section (Eckenwalder, 1996). Alternatively, species of sect. Tacamahaca and sect. Aigeiros could be freely interfertile (Zsuffa, 1975;Eckenwalder, 1984), which primarily contributed to the conflict between the nuclear and chloroplast genome trees. These two sections can be divided into five subclades (Figures 1, 3). With more extensive sampling, more complex relationships were revealed within sect. Aigeiros + sect. Tacamahaca clade, even though the results were not entirely consistent with previous nuclear genome phylogeny described by Wang et al. (2020). For the subclade I, Populus simonii (sect. Tacamahaca) presented a close relationship with P. nigra (sect. Aigeiros); both of these species share the same characters of a flat petiole and narrowly ovoid and 2-valved capsules (Fang et al., 1999). However, Wang et al. (2020) showed that Populus yunnanensis, rather than P. simonii, was more closely related to P. nigra. In consideration of the fact that both P. yunnanensis and P. simonii have been cultivated for a long time, their origins are intricate and difficult to trace. Further studies based on more samples that utilize population genetic methods are needed to unravel the problem.
According to our results, some taxa were expected to be treated synonyms for another species. For instance, P. mainlingensis seemed to be a synonym of P. pseudoglauca, as they are monophyly in all the data sets (Figures 1-3). Alternatively, no significant morphological characters can distinguish these two species, and they are sympatrically distributed in southeast Tibet. Alternatively, a previous study also suggested that they are the same species (Chao, 1994). Likewise, both Populus intramogolica and Populus shanxiensis were highly similar to P. cathayana in morphology, although the previous two species were distinguished with the latter one based on the characters of leaf shape or pilosity, respectively (Wang and Tung, 1982;Sun, 1986). In our analysis, close relationships within these three species were recovered in both nuclear and chloroplast phylogenetic trees. Since the leaves of P. cathayana vary greatly, and the potential gene flow intensified the difficulty of resolving the relationship between P. cathayana and related species, P. intromogolica and P. shanxiensis seem to be synonyms of P. cathayana. Of course, deep study based on specimens and population genetics are needed to verify this hypothesis.
Additionally, some hybrid taxa could be detected by our concatenated and coalescent phylogenies. P. wenxianica was endemically distributed in Wenxian County, Gansu Province, China, rather than North America. Since a number of hybrid strains from Populus deltoides were cultivated in China for a long time, and the characters of the shape of leaf blades and the ovary carpel number differed between P. wenxianica and the two sect. Aigeriros species (Fang et al., 1999), we hypothesized that P. wenxianica was a hybrid from P. deltoides and another unknown parental ancestor. The leaf and capsule morphologies of P. pseudomaximowiczii were similar to those of P. rockii, but their distribution was discontinuous and cut off by the Taihang Mountains (Fang et al., 1999). But P. koreana is sympatric with P. pseudomaximowiczii in the Wuling Mountain, which increased the likelihood that hybridization took place. Given the conflict for the phylogenetic position of P. pseudomaximowiczii between concatenate and coalescent phylogenies, we hypothesized that P. pseudomaximowiczii is a hybrid taxon, and P. rockii could be the female parent, as supported by the chloroplast tree and ABBA-BABA test (Supplementary Figure 5c), and P. koreana could be another parent. Significantly, P. szechuanica var. tibetica did not cluster with P. szechuanica. We collected two individuals cultivated at Lhasa City and a native tree from Cuona County, Xizang Autonomous Region, China. They were found to be close to the Himalaya lineage in both biparental and maternal phylogenies, but the two cultivated samples were close to P. pamirica with respect to the chloroplast genome tree (Figure 2). P. szechuanica var. tibetica was first published with the specimen collected from a cultivated tree in Ladakh, and described as having smooth bark and nearly glabrous leaves (Schneider, 1916). P. szechuanica var. tibetica was once considered to be a variety of P. pamirica (Zhao and Liu, 2001) or a synonym of P. ciliata (Skvortsov, 2009). The conflicts among the nuclear concatenated, coalescent, and chloroplast trees and ABBA-BABA test demonstrated not only close relationship between P. szechuanica var. tibetica and P. ciliata, but potential hybridization between P. szechuanica var. tibetica and P. pamirica (Figure 2 and Supplementary Figure 5d).
Furthermore, the phylogenetic relationships of some species were still questionable and were treated as species complexes. P. szechuanica #1, collected from the type locality, was a sister to P. rockii, whereas P. szechuanica #2, #3, and #4 presented a close relationship with P. xiangchengensis, which was parapatric with P. szechuanica (Figures 1, 2A). These three taxa could be distinguished based on their angle branchlets, pilosity of branchlets, petiole, leaf veins, catkin rachis, and capsule (Fang et al., 1999). However, they were confused in the sympatric area. Thus, we considered these three species to be the P. szechuanica complex. The relationships within Populus suaveolens lineage were also confused and conflicted with morphological taxonomy, not only in the nuclear SNPs tree but also in the plastomes tree (Figure 2). It may be caused by extensive gene flow and perplexing classification within these four sympatric species. Further study should be conducted using more samples based on population genetics to further elucidate the relationships within these species complex.

Cytonuclear Discordance in Populus
We reconstructed backbone phylogenetic trees based on nuclear genomic SNPs and complete chloroplast genomes, separately. They presented considerable conflicts and relatively short internodes between major clades (Figure 2). The incongruence between biparental and matrilineal phylogenetic trees commonly contributes to incomplete lineage sorting (ILS), convergent evolution or introgression (Degnan and Rosenberg, 2009;Cristina Acosta and Premoli, 2010;Pelser et al., 2010). There into, lineage sorting would not be expected to exhibit the strong geographical partitioning observed at the chloroplast genome tree (Figure 2B; . Thus, cytonuclear conflicts of Populus were primarily caused by hybridization, or specifically, chloroplast capture and gene flow (recent hybridization).
Chloroplast capture, which introgressed a chloroplast genome from one plant or ongoing backcrossing of F1s with the parental populations, has frequently been thought to explain the inconsistencies between nuclear gene trees and cytoplasmic trees (Rieseberg and Soltis, 1991;Tsitrone et al., 2003;Cristina Acosta and Premoli, 2010). Sect. Abaso (P. mexicana) formed the basal clade in nuclear phylogeny but was close to Populus heterophylla in plastome phylogeny (Figure 2). Significant gene flow among them have been identified using ABBA-BABA tests (Wang et al., 2020). However, no IBD blocks were shared between these two clades, which implied an early gene flow event, and these IBD blocks were expunged by subsequent recombination. All this evidence pointed to a common hypothetical scenario that P. mexicana had suffered a significant chloroplast capture from the P. heterophylla-like ancestor Wang et al., 2020). The high level of cytonuclear conflict also occurred in P. nigra, as P. nigra, P. iliensis, and P. × irtyschensis #1 nested among the members of section Populus in clade B of plastome tree ( Figure 2B). Previous studies based on chloroplast genome data and morphological traits proposed a hypothesis that P. nigra was an introgression of the P. alba lineage and another unknown paternal lineage (Smith and Sytsma, 1990;Hamzeh and Dayanandan, 2004;Zong et al., 2019). Our ABBA-BABA analysis supported this scenario (Supplementary Figure 5b). Cytonuclear conflicts between deep clades were also likely caused by ancient chloroplast genome capture. Because sect. Turanga and sect. Populus maintained similar topologies in two phylogenies, the predecessors of these two sections should not have suffered chloroplast genome capture events. As shown in Figure 2, cytonuclear conflicts primarily occurred in subg. Eupopulus. With the exception of P. nigra, P. iliensis and one sample of P. × irtyschensis, subg. Eupopulus was divided into three clades in the chloroplast genome tree, clade C, D, and E (Supplementary Figure 3). As described above, the ancient chloroplast of these three exceptive taxa was captured from P. alba of sect. Populus. In the plastome tree, clade D, and E formed a monophyly and was a sister to the ancestors of sect. Turanga and sect. Populus, while clade C was a sister to sect. Populus. This topology suggested that the ancestors of clade C might suffered a chloroplast genome capture event from the ancestors of sect. Populus.
Alternatively, the interspecific gene flow is an evolutionary process that is responsible for generating gene tree discordance and therefore, hindering the estimation of species tree (Leaché et al., 2014). Recent hybridization can cause gene tree conflicts not only between species but also within species. IBD and ABBA-BABA analyses detected obvious gene flow between species within each clade in a previous study (Wang et al., 2020). In addition, species within sect. Aigeiros + sect. Tacamahaca that have overlapping ranges in China and North America are sexually compatible and hybridize freely (Eckenwalder, 1984;Mona et al., 2006;Schroeder and Fladung, 2010). The maternal phylogenetic relationships of intraspecies and interspecies of these two sections still remained puzzling, with different accessions of the same species not clustering together ( Figure 2B). In addition, the topology of these species remained exceedingly incompatible within subg. Eupopulus (Figure 2). For example, two samples of Populus trinervis were found to be separated from each other in the chloroplast genome tree (Figure 2B), which suggested its maternal origin (chloroplast) would come from at least two maternal lineages. Obvious hybridization has also been found and indicates that P. × irtyschensis was the F1-dominent hybrid between P. nigra and Populus laurifolia (Jiang et al., 2016). Our chloroplast genome tree also supported this hybridization event ( Figure 2B  and Supplementary Figure 3).

Biogeographical History
Ancestral area reconstruction with DEC + J model based on nuclear SNPs phylogeny indicated that Populus originated in the New World (Figure 5). Both morphological and fossil evidence also support the hypothesis of North American origin (Manchester et al., 1986(Manchester et al., , 2006Collinson, 1992;Eckenwalder, 1996). The first confirmed fossil of Populus in the early Middle Eocene, were attributed to sect. Abaso and occurred in North America (Collinson, 1992;Eckenwalder, 1996;Manchester et al., 2006). Though earlier leaves fossils of Populus were reported both in East Asia and North America, the fossils of these leaves could not be entirely identified as Populus (Guo, 1983;Collinson, 1992;Manchester et al., 2006). In addition, P. mexicana, the only extent species of sect. Abaso, which recovered as the base clade in species tree (Figures 1, 3), was also endemic in North America.
Land bridges also played a major role in the dispersal of Populus. The ancestors of sect. Populus, sect. Turanga, and subg. Eupopulus were presumed to have arrived in the Old World through the North Atlantic Land Bridge (NALB) and rapidly diverged within 5 Ma in the late Eocene (Figures 4, 5). After arriving the Old World, there has been a single migration from Eurasia to Africa that occurred in P. ilicifolia of sect. Turanga. The persistent warm and humid climate in the Eocene (McInerney and Wing, 2011) may have promoted the spread and the divergence of Populus. A dramatic decrease in temperature with 17 million years of cooling occurred at the Eocene-Oligocene boundary (∼33.7 Ma) (Zachos et al., 2001), which may have been the cause of extinction of many species of sect. Abaso and sect. Leucoides (Collinson, 1992;Budantsev, 2005). The extant P. mexicana that survived in Mexico and three sect. Leucoides species discontinuously distributed in southwest China and southeast America could have caused by the temperature drop and subsequent climate fluctuation. This persistent cooling and increasing aridification driven by the Eocene-Oligocene transition (Zachos and Kump, 2005;Dupont-Nivet et al., 2007;Abels et al., 2011;Passchier et al., 2013;Kalyaanamoorthy et al., 2017) seems to have led to the divergence between P. heterophylla and other species of subg. Eupopulus (Figures 4, 5). moreover, an interesting dispersal from East Asia to North America and a subsequent back and forth spread occurred from the late Oligocene to the early Miocene (node 3, 4, and 5; Figure 5). During this period, the Bering Land Bridge (BLB) was certainly available as a route (Tiffney, 2000;Tiffney and Manchester, 2001;Milne and Abbott, 2002). Another curiously intercontinental diffusion of sect. Populus occurred during the Miocene, a date range too young for the NALB yet very consistent with vicariance across the BLB, possibly as a result of climate cooling in Beringia before (Wolfe, 1980;Tiffney and Manchester, 2001;Gladenkov et al., 2002).
Though the main clades split early, most divergence of Populus occurred after 15 Ma (Figure 4), and mainly in East Asia. Southwest China also retained the most abundant Populus species. Similarly, most of the related species with distribution centers in the Tibetan plateau diverged after recent ∼15 Ma (Qiu et al., 2011). The uplift of the Himalayan-Tibetan plateau with significant geographical effects (Harris, 2006;Riesselman et al., 2007;Su et al., 2019) and drove the occurrence of new niches. The changes in climate and terrestrial ecosystems led to the retreat of a number of plants and provided strong opportunity for pioneers.

TAXONOMIC TREATMENT
Phylogenetic results from this and an earlier study (Wang et al., 2020) clearly indicated four clades among species among six sections. The paraphyletic three sections (i.e., sects. Aigeiros, Leucoides, and Tacamahaca) could be resolved by sinking all these taxa into a single subgenus, "Eupopulus, " which was proposed by Dode (1905) for their resinous winter buds. As all the four clades are wellsupported by both morphological and molecular evidence, and diverged prior to ca. 35 Ma, we here propose a new subgenus of Populus, and try to establish a foursubgeneric classification.

CONCLUSION
In this study, we collected almost all the wild species of Populus and reconstructed robust phylogenies from nuclear genome and chloroplast genome data. The deep phylogenetic relationships of Populus species were well resolved based on nuclear genomic phylogeny. According to the phylogeny, a new status, subg. Abaso was proposed, and a new classification system of four subgenera was established. Conflicts among clades and the inconsistency of phylogenetic positions of some hybrid species were also detected by concatenated and coalescent methods. In contrast, chloroplast genomic phylogeny was composed of five clades and inadequately settled the relationships among species of Populus. The cytonuclear discordance within Populus was extensive primarily owing to chloroplast capture and gene flow and was supported by the ABBA-BABA analysis. A New World origin of Populus and several migration events through land bridges were suggested by the phylogeny, divergence time analyses, and biogeographic implications. Based on comprehensive sampling, this study inferred the clear evolutionary history of Populus, and some confused taxa were proposed as species complexes, such as P. szechuanica complex and P. suaveolens complex. Further study based on population genetics method and more samples should be conducted to clarify relationships of close species.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are publicly available. This data can be found here: All newly assembled plastomes were deposited in GeneBank with accession numbers MW376757 -MW376858 and MK288023. Resequencing raw read data are stored at the NCBI Sequence Read Archive (SRA) in the Bioproject PRJNA687326.

AUTHOR CONTRIBUTIONS
YW, CS, and ZXZ conceived the idea. YW, JH, ZFZ, XZ, ZY, and CS collected the plant materials. YW, EL, FG, KL, DL, and XS analyzed the data. YW wrote and revised the manuscript. CS and ZZ revised the manuscript. All authors contributed to the article and approved the submitted version.