Plastome Sequences Help to Resolve Deep-Level Relationships of Populus in the Family Salicaceae

Populus, a core genus of Salicaceae, plays a significant ecological role as a source of pioneer species in boreal forests. However, interspecific hybridization and high levels of morphological variation among poplars have resulted in great difficulty in classifying species for systematic and comparative evolutionary studies. Here, we present phylogenetic analyses of 24 newly sequenced Populus plastomes and 36 plastomes from GenBank, which represent seven genera of Salicaceae, in combination with a matrix of eighteen morphological characters of 40 Populus taxa to reconstruct highly supported relationships of genus Populus. Relationships among the 60 taxa of Salicaceae strongly supported two monophyletic genera: Populus and Salix. Chosenia was nested within the genus Salix, and five clades within Populus were divided. Clade I included the three taxa P. euphratica, P. pruinosa, and P. ilicifolia. Clade II contained thirteen taxa [P. adenopoda, P. alba, P. bolleana, P. davidiana, P. hopeiensis, P. nigra, P. qiongdaoensis, P. rotundifolia, P. rotundifolia var. duclouxiana, P. tremula, P. tremula × alba, P. tomentosa, and P. tomentosa (NC)]. Clade III included the ten taxa P. haoana, P. kangdingensis, P. lasiocarpa, P. pseudoglauca, P. qamdoensis, P. schneideri, P. simonii, P. szechuanica, P. szechuanica var. tibetica, and P. yunnanensis. Clade IV included P. cathayana, P. gonggaensis, P. koreana, P. laurifolia, P. trinervis, P. wilsonii, and P. xiangchengensis. The last clade comprised P. angustifolia, P. balsamifera, P. deltoides, P. deltoides × nigra, P. fremontii, P. mexicana, and P. trichocarpa. This phylogeny is also supported by morphological traits, including bark smoothness, bud size, petiole shape, leaf inflorescence, male anther length and male anther tip.


INTRODUCTION
The family Salicaceae is primarily distributed in cold, tropical and warm temperate regions worldwide and is a dioecious woody and shrub plant throughout the northern hemisphere (Leskinen and Alström-Rapaport, 1999;Wang et al., 2014). Salicaceae includes over 50 genera, which comprise nearly 1000 species (Stevens, 2001;Chase et al., 2002;Byng et al., 2016).
Phylogenetic analyses of Salicaceae using DNA sequences from the chloroplast gene rbcL (Azuma et al., 2000;Chase et al., 2002) and ITS of nuclear rDNA (Leskinen and Alström-Rapaport, 1999) strongly suggest that Populus and Salix are both monophyletic.
The species of the genus Populus, commonly known as poplars, aspen and cottonwood, are widely distributed throughout the northern hemisphere from subtropical to boreal forests (Brawdshaw et al., 2000), and China is one of the most important poplar distribution areas (Wan et al., 2013). Populus species play important roles in ecosystems and serve as model organisms for basic research in molecular biology and genetics, as well as in plant domestication and conservation (Kuzovkina and Vietto, 2014), mainly due to their fast growth rates, easy vegetative propagation, numerous wood uses, and small genome sizes (Braatne et al., 1992;Stettler et al., 1996;Cronk, 2005;Hamzeh et al., 2006). Eckenwalder (1996) classified the genus Populus into 29 species grouped into six sections based on important morphological characteristics: persistence of the floral disk, presence or absence of marked foliar heteroblasty, overall leaf shape, distribution and shape of foliar teeth, number of carpels and flattened vs. round petiole. Section Abaso comprises a single species, P. mexicana; section Aigeiros contains three species and widely distributed in Europe and North America; section Leucoides shows a limited geographic distribution, with two species in China and the remaining species in southeastern United States; section Turanga contains three species and is geographically restricted to Central and West Asia and North Africa; sections Tacamahaca and Populus are relatively large, comprising nine and ten species, respectively (Hamzeh et al., 2006). However, the phylogenetic affinities between sections and the position of several taxa within these sections remain controversial. Many of the Populus species within a section, as well as between sections, are cross-compatible with each other (Hamzeh et al., 2006;Fladung and Schroeder, 2010). For example, species of the sections Aigeiros and Tacamahaca are sexually compatible, and natural hybridization occurs among several species of these sections (Zsuffa, 1975;Rajora and Zsuffa, 1984). Because of interspecific hybridization and high levels of morphological variation among poplars, the number of Populus species currently described in the literature ranges from 22 to 85, and hundreds of Populus hybrids and cultivars exist (Eckenwalder, 1977a(Eckenwalder, ,b, 1996Dickmann and Stuart, 1983;Wang and Gilbert, 2007;Wan et al., 2013). Discrepancies in the total number of Populus species could be attributed to the misinterpretation of some hybrids and difficulties involved in delineating species boundaries (Hamzeh and Dayanandan, 2004).
For the assessment of genetic relationships among Populus species, different methods have been applied. A phylogenetic analysis of poplars using 76 morphological traits of leaves, flowers, fruits, and inflorescences supported the monophyly of all sections except for Tacamahaca, which resolved into two paraphyletic groups (Eckenwalder, 1996). ITS sequences were used in a study of fifteen representative species from five sections to resolve the relationships among the species in the genus Populus (Shi et al., 2001). The results showed that Populus is a monophyletic group and can be divided into two main clades: section Leuce and the remaining sections. Another phylogenetic study, based on three non-coding regions of cpDNA (intron of trnL and intergenic regions of trnT-trnL and trnL-trnF) and two nuclear rDNAs (ITS1 and ITS2), showed polyphyletic relationships among species in the sections Tacamahaca and Aigeiros (Hamzeh and Dayanandan, 2004). The phylogenetic tree of seventeen species or hybrids from four sections in Populus based on the chloroplast marker trnL-F showed that the Populus section formed a separate clade, while sections Tacamahaca and Turanga could not be clearly separated, and P. szechuanica var. tibetica formed a weakly supported clade with the other species of section Tacamahaca (Wei et al., 2010). Moreover, based on the different combinations of plastid genomes (rbcL-a, psbI-psbK, psbA-trnH, and trnL-trnF), the phylogenetic relationship of 63 Populus individuals representing 32 species from five sections suggests that sections Populus and Leucoides formed a separate clade, while section Tacamahaca was divided into two groups, albeit with very weak bootstrap support . Although previous molecular systematic studies have been conducted, they were unable to solve the phylogenic relationships of Populus well. To reconstruct a more reliable phylogeny of Populus, it was necessary to sample more taxa, as well as to use more variable markers and genome sequences to resolve the phylogenetic problems at a lower taxonomic level. In addition, the most recent molecular studies of Populus have not explicitly analyzed morphology, therefore, a combined analysis including both molecular data and morphological data. Here, 24 newly sequenced Populus plastomes, sixteen Populus, fifteen Salix, and four previously reported plastomes and one partial Chosenia chloroplast genome sequence, all of which represent seven genera of the Salicaceae family, were used to reconstruct phylogenetic relationships.

Plant Materials
Twenty-four taxa, P. adenopoda, P. alba, P. bolleana, P. cathayana, P. davidiana, P. deltoides, P. euphratica, P. gonggaensis, P. haoana, P. hopeiensis, P. kangdingensis, P. lasiocarpa, P. pseudoglauca, P. qamdoensis, P. rotundifolia var. duclouxiana, P. schneideri, P. tomentosa, P. tomentosa narrow crown (P. tomentosa (NC)), P. simonii, P. deltoides × nigra, P. szechuanica, P. szechuanica var. tibetica, P. trinervis, and P. yunnanensis, representing the genus Populus of the family Salicaceae, were sampled following the taxonomy system of Eckenwalder (1996) and the Flora of China (Fang et al., 1999;Wu, 1999). We collected healthy, tender and fresh leaves from adult plants of target species. Details of the samples are given in Table 1. The voucher herbarium specimens for the 24 sampled Populus taxa were deposited at Southwest Forestry University (SWFU), Kunming, China. Approximately 15.0 GB of raw data were generated with 150 bp paired-end read lengths. Then, the raw data were used to assemble the complete chloroplast genome using GetOrganelle software (Jin et al., 2018) with P. trichocarpa as the reference. Genome annotation was performed with the program Geneious R8 (Biomatters Ltd., Auckland, New Zealand), and the start and stop codons were manually adjusted by comparison with the cp genome of P. trichocarpa. The tRNA genes were further confirmed using online tRNAscan-SE web servers (Schattner et al., 2005). The gene map of the annotated Populus chloroplast genome was drawn by OGdraw online (Lohse et al., 2007).  (Katoh and Standley, 2013). After manual adjustment with MEGA 5 (Tamura et al., 2011), sliding window analysis was performed to assess the variability (Pi) of the whole plastomes in DnaSP version 5 software (Librado and Rozas, 2009). The window length was set to 600 bp, and the step size was set to 200 bp.

Phylogenetic Analysis
To estimate phylogenetic relationships within the Populus, 59 taxa with available complete plastid genomes and one taxa ( . The complete genome matrix was aligned using MAFFT version 7 software (Katoh and Standley, 2013) and then manually edited using MEGA 5 (Tamura et al., 2011). The matrix of 60 chloroplast genome sequences included 180,546 characters, 4,425 of which were parsimary informative sites. Based on these parsimary informative sites, we performed maximum likelihood and Bayesian inference analyses. A maximum likelihood method for phylogenetic analysis was performed based on the GTR+I+G model in the RAxML version 8 with 1000 bootstrap replicates (Stamatakis, 2014). Maximum likelihood (ML) bootstrap support values (BS) ≥70% were considered well supported, and ML BS <50% were considered poorly supported or unresolved. Bayesian inference (BI) was performed using MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003). jModelTest 2.0 (Darriba et al., 2012)
To evaluate the morphological characters supporting relationships based on molecular data and to evaluate whether the investigated characters were phylogenetically conserved at the level of the whole phylogeny, we calculated mean Pagel's lambda (λ) (Pagel, 1999) and Blomberg's K values (Garland et al., 1992) at the species level for each character, thus obtaining phylogenetic information. Both indices assume the classic Brownian motion (BM) evolutionary model, with values varying from zero to one for λ and from zero to higher than one for K. λ values close to zero indicate there is no phylogenetic signal (the traits have evolved independently of phylogeny, and the traits of close relatives are not more similar than those of distant relatives), and λ values close to one indicate trait evolution according to BM. K-values close to zero indicate the phylogenetic signal is weaker than expected from the BM model of character evolution (low levels of phylogenetic character conservation). Kvalues close to or higher than one indicate a strong phylogenetic signal (Molina-Venegas and Rodríguez, 2017).
The significance of phylogenetic signals was determined by shuffling species' character values (999 times) across the tips of the phylogenetic tree and comparing the resulting K-values to those computed from the observed character data (Eichenberg et al., 2015), whereas the statistical significance of λ was assessed based on a comparison with the likelihood of a model that assumes complete phylogenetic independence (Pagel, 1999). The Bayesian tree based on the complete chloroplast genome sequences provided the standard tree topology. Phylogenetic signal analyses were carried out using the routines provided in the picante package available for R (Kembel et al., 2010).

Range of Variation in Different Plastomes of Populus
In this study, we determined the structure characteristics and gene contents of the complete plastid genomes of 24 Populus taxa. The cp genome lengths of the 24 Populus taxa ranged from 155,177 bp for P. rotundifolia var. duclouxiana to P. euphratica for 157,839 bp (Figure 1). All of these assembled into single circular, double-stranded DNA sequences, presenting a typical quadripartite structure, including one LSC with a length of 85,858 bp (P. euphratica) to 84,450 bp (P. rotundifolia var. duclouxiana), one SSC with a length of 16,421 bp (P. szechuanica var. tibetica) to 16,879 bp (P. hopeiensis), and a pair of IR with a length of 26,903 bp (P. hopeiensis) to 27,672 bp (P. cathayana) ( Table 4). All 24 chloroplast genomes contained 130 genes, of which 112 were unique and eighteen were duplicated in

IR Expansion and Contraction of Populus
The expansion and contraction of the IR region and the single copy (SC) boundary regions is considered a primary mechanism causing the length variation of angiosperm cp genomes (Kim and Lee, 2005). Although the overall genomic structure, including gene number and gene order, was well conserved among the Populus plastid genomes, these genomes exhibited obvious differences in the IR/SC boundary regions. The IR regions of the Populus plastid genomes ranged from 26,903 bp (P. hopeiensis) to 27,672 bp (P. cathayana) in size, and two complete or fragmented copies of rpl22 and ycf1 were located at the boundaries between the LSC or SSC regions and IRs regions in the 24 Populus plastomes. The full lengths of the rpl22 and ycf1 genes were 399 and 5442 to 5472 bp, respectively. The rpl22 gene crossed the IR-LSC boundary, with only 1 bp variation in sequence length among the Populus plastomes. The gene trnH in the LSC region contracted by 14 bp from the junction region of the IR-LSC boundary in four taxa, i.e., P. alba, P. bolleana, P. tomentosa, and P. tomentosa (NC), while the other taxa contracted by 3 bp. In addition, rps19 in the IRa region also contracted by a different number of bases (200-242 bp) among taxa. Gene ycf1 in the IRb region extended from 15 to 170 bp, whereas gene ycf1 in the IRa region extended 979 to 1725 bp (Figure 2).

Comparative Analysis of the Populus Plastid Genomes
Complete plastid genomes are valuable sources of genetic markers for phylogenetic analyses because of their highly conserved genome structure (Provan et al., 2001;Chaney et al., 2016). Although the plastid genome generally has a nearly collinear gene order, changes in the genome, such as sequence inversion and gene expansion or contraction at the boundary of the SC and IR regions, occur over the course of evolution (Cho et al., 2015;Choi et al., 2016). Our results showed that the genome size, gene order, and compositions of the 24 newly sequenced Populus genomes were very similar with each other and with previously sequenced Populus plastid genomes; the genome sizes ranged from 155,177 to 157,839 bp Wang et al., 2016;Zheng et al., 2016;Han et al., 2017). Among the genomes, three plastomes of P. alba, P. tomentosa, and P. tomentosa (NC), two plastomes of P. haoana and P. kangdingensis, two plastomes of P. trinervis and P. xiangchengensis, and two plastomes of P. deltoides and P. deltoides × nigra, each had the same length. All of the newly examined chloroplast genomes of Populus contained more AT than GC contents, with values ranging from 36.5 to 39.8%, and the GC contents of the IR regions were higher than those of the SC regions, possibly due to the presence of rRNA genes. Furthermore, we found that all 24 Populus plastomes encoded 85 protein-coding genes, 37 tRNA genes and eight rRNA genes, and eighteen of these genes were duplicated within the IRs. Moreover, the total intron numbers in the plastome were the same among the 24 Populus genomes, and in each plastome, twelve genes contained one intron, and three genes contained two introns.
The IR/LS boundary regions of the 24 complete Populus cp genomes were compared and showed slight differences in the junction positions. The rpl22 gene crossed the LSC/IRb boundary with only one bp variation in sequence length among the Populus plastomes. The gene trnH was located in the LSC region of all genomes but contracted by 3 or 14 bp from the LSC/IRa junction region, with a contraction of 14 bp in the plastomes of P. alba, P. bolleana, P. tomentosa, and P. tomentosa (NC) and a contraction of 3 bp in the other taxa. Gene ycf1 in the IRb region extended from 15 to 170 bp, whereas gene ycf1 in the IRa region extended 979 to 1725 bp. The length changes in the truncated ycf1 gene directly drive the contraction of the IR regions in the plastome of Populus.

Relationships in Populus
Most previous studies have used sequences from only one or more chloroplast loci in Salicaceae (Hamzeh and Dayanandan, 2004;Wei et al., 2010;Yun et al., 2015). However, the section delimitation and species relationships within Populus remain unclear. Our study included 60 plastid genomes for plants from seven genera of Salicaceae. All of these plastome sequences of Populus and related genera yielded a fully resolved tree, consistent with the Angiosperm Phylogeny Group's most recent phylogeny, APG IV (Chase et al., 2002;Byng et al., 2016). In addition, two monophyletic genera: Populus and Salix, were strongly supported, and Chosenia taxa were nested within the genus Salix, which is consistent with previous studies (Leskinen and Alström-Rapaport, 1999;Chase et al., 2002). The results suggested that Populus is a monophyletic sister group to Salix. Forty Populus taxa were separated into the following five clades in our study. Section Turanga in the study of Wu (1999) formed the first clade, including P. euphratica, P. ilicifolia, and P. pruinosa, in our phylogeny, which share the same character of micro flat petiole. Section Populus in the study of Wu (1999) formed the second clade, and the taxa in this clade shared the same character of flat petiole in contrast to other taxa (Figure 6). The phylogenetic placements of the first and second clades were consistent with previously published phylogenetic relationships Wei et al., 2010). The results of Li et al. (2007) showed that section Turanga and section Populus can be divided into two independent groups based on AFLP markers, and the trnL-F sequence analysis also supported section Populus as a separate clade (Wei et al., 2010). In our morphology analysis, sections Turanga and Populus shared the same characters of bud size, but the characters of bark smoothness, petiole shape and leaf in inflorescence were different (Figure 6). Therefore, we can distinguish the two sections according to the three characters. Interestingly, our phylogenomic analysis showed that P. nigra, which was classified in section Aigeiros in the study of Wu (1999) is nested among the members of section Populus. The character of bud size different between P. nigra and P. deltoides (section Aigeiros), but the characters of small bud, flat petiole, no leaf in inflorescence, short anther length and cuneate anther tip were similar in section Populus. Therefore, the plastid data and morphology characteristics both revealed that P. nigra had a close affinity to species of section Populus, which supported the possibility of ancient hybridization by which P. nigra appeared to be an introgressant of the P. alba lineage and some other presently unknown paternal lineage of section Populus (Smith, 1988;Hamzeh and Dayanandan, 2004).
Section Tacamahaca in the study of Wu (1999) contained nearly 60 species and varieties that share the same characters of petiole eglandular, flower disk, and capsule usually glabrous or rarely pilose (Fang et al., 1999;Song, 2006). In our analysis, the members of section Tacamahaca were divided into clade III, clade IV and clade V, and the three clades shared the same character of a tetrete petiole. Clade III contained P. haoana, P. kangdingensis, P. qamdoensis, P. schneideri, P. simonii, P. szechuanica, P. szechuanica var. tibetica, P. yunnanensis, and the taxa of P. lasiocarpa and P. pseudoglauca, which had previously been retrieved as members of section Leucoides in the study of Wu (1999). Clade IV included the taxa P. cathayana, P. koreana, P. laurifolia, P. trinervis, P. xiangchengensis, P. gonggaensis and P. wilsonii, but the last two taxa were classified in section Leucoides. Clade V included P. angustifolia, P. balsamifera, P. trichocarpa, P. deltoides, P. deltoides × nigra, P. fremontii, and P. mexicana. All of the taxa in clade V had the character of a middle bud unlike the other two clades. Section Aigeiros in the study of Wu (1999), including P. deltoides, P. deltoides × nigra, and P. fremontii, was nested within clade V and shared the same characters of a middle bud, no leaf in inflorescence, short anther length and cuneate anther tip with the taxa of section Tacamahaca (Figure 6). Furthermore, the character of shallowly furrowed bark in P. fremontii was shared with P. cathayana, P. koreana, P. laurifolia, and P. trinervis which were members of section Tacamahaca. Therefore, our analysis did not support section Aigeiros as a sister section of sections Populus, Tacamahaca and Turanga. Curiously, P. deltoides × nigra was placed among the species of section Aigeiros, but the character of bud size was inherited from the parent P. nigra unlike the other members in clade V. Meanwhile, P. mexicana shared the four characters of petiole shape, leaf in inflorescence, male anther length and tip shape with section Turanga. Therefore, we speculated that P. mexicana may be caused by ancient hybridization or chloroplast capture from excluded samples or now-extinct North America poplar, similar to previous studies on two North America species P. trichocarpa and P. balsamifera (Huang et al., 2014;Liu et al., 2017).
Our analysis showed that clade V was a sister clade to clade IV. Clade V includes seven species (P. angustifolia, P. balsamifera, P. deltoides, P. deltoides × nigra, P. fremontii, P. mexicana, and P. trichocarpa) that are primarily distributed in North America. Clade IV comprises seven species that mainly exist within Eurasia. The phylogenetic relationships between clade IV and clade V matched well with the morphology data, such as bud size, although not well with smooth bark and petiole shape (Figure 6). The phylogenetic relationships between clade IV and clade V supported the hypothesis regarding the migration of species from Asia to North America (Tiffney and Manchester, 2001;Milne, 2006;Liu et al., 2017). The authors who proposed this hypothesis speculated that a common ancestor first appeared in North America and then dispersed to other continents via the North Atlantic Land Bridge and the Bering Land Bridge. Then, with the disappearance of the Bering Land Bridge, new species emerged due to the geographic isolation (Tiffney and Manchester, 2001;Milne, 2006;Liu et al., 2017).
The taxa in clade III and clade IV had an overlapping distribution, such as P. xiangchengensis and P. gonggaensis in clade IV, which were distributed in southwest China, while the taxa of P. schneideri and P. kangdingensis in clade III were also distributed in this area. Clade III and IV showed high similarity in morphology characters (Figure 6), thus complicating their distribution. Liu and Fu (2004) considered P. xiangchengensis a hybridization of P. schneideri and P. pseudoglauca based on morphological characteristics, while another study suggested that P. xiangchengensis was a likely hybrid species of P. kangdingensis and P. pseudoglauca (Wan et al., 2009). P. schneideri was also considered a natural hybrid formed by P. kangdingensis and P. cathayana (Chen et al., 2007;Wang, 2012). The uplift of the Qinghai-Tibet (Q-T) plateau began approximately 40 million years ago (Ma) following the collision of India and Asia (Chung et al., 1998). Some new niches were created by the uplifts, while other plants retreated in response to the climatic changes, thus providing strong opportunities for new hybrid species to emerge (Wang et al., 2005;Liu et al., 2006;Wang and Gilbert, 2007;Lu et al., 2014). Meanwhile, the uplift of the Q-T plateau also induced significant geographical effects (He et al., 2015). These complex factors account for section Tacamahaca' s presence among the current phylogenetic relationships. Further research using more genetic data and more individuals per species is needed to deeply illuminate the complex relationships in section Tacamahaca in the future.

AUTHOR CONTRIBUTIONS
DZ performed the experiments, analyzed the data, wrote the manuscript, and prepared the figures and/or tables. PG performed the experiments and prepared the figures and/or tables. AZ, YZ, and XZ analyzed the data and reviewed drafts of the manuscript. AD, YS, and CH conceived and designed the experiments, reviewed the drafts of the manuscript, and approved the final draft. All authors read and approved the final manuscript.

FUNDING
This work was supported by the National Natural Science Foundation of China (Grant Nos. 31460205, 31860219, and 31360184) and the Forestry Public Benefit Research Program (Grant No. 201104076).