Plastid Phylogenomics and Plastome Evolution of Nandinoideae (Berberidaceae)

Subfamily Nandinoideae Heintze (Berberidaceae), comprising four genera and ca. 19 species, is disjunctively distributed in eastern North America vs. Eurasia (eastern Asia, Central Asia, Middle East, and southeastern Europe), and represents an ideal taxon to explore plastid phylogenomics and plastome evolution in Berberidaceae. Many species of this subfamily have been listed as national or international rare and endangered plants. In this study, we sequenced and assembled 20 complete plastomes, representing three genera and 13 species of Nandinoideae. Together with six plastomes from GenBank, a total of 26 plastomes, representing all four genera and 16 species of Nandinoideae, were used for comparative genomic and phylogenomic analyses. These plastomes showed significant differences in overall size (156,626–161,406 bp), which is mainly due to the expansion in inverted repeat (IR) regions and/or insertion/deletion (indel) events in intergenic spacer (IGS) regions. A 75-bp deletion in the ndhF gene occurred in Leontice and Gymnospermium when compared with Nandina and Caulophyllum. We found a severe truncation at the 5’ end of ycf1 in three G. altaicum plastomes, and a premature termination of ropC1 in G. microrrhynchum. Our phylogenomic results support the topology of {Nandina, [Caulophyllum, (Leontice, Gymnospermium)]}. Within the core genus Gymnospermium, we identified G. microrrhynchum from northeastern Asia (Clade A) as the earliest diverging species, followed by G. kiangnanense from eastern China (Clade B), while the rest species clustered into the two sister clades (C and D). Clade C included three species from West Tianshan (G. albertii, G. darwasicum, G. vitellinum). Clade D consisted of G. altaicum from northern Central Asia, plus one species from the Caucasus Mountains (G. smirnovii) and three from southeastern Europe (G. odessanum, G. peloponnesiacum, G. scipetarum). Overall, we identified 21 highly variable plastome regions, including two coding genes (rpl22, ycf1) and 19 intergenic spacer (IGS) regions, all with nucleotide diversity (Pi) values > 0.02. These molecular markers should serve as powerful tools (including DNA barcodes) for future phylogenetic, phylogeographic and conservation genetic studies.

Within Nandinoideae, only six out of the ca. 19 species have fully sequenced plastomes, i.e., Nandina domestica, Caulophyllum robustum, Leontice armeniaca, L. incerta, Gymnospermium microrrhynchum, and G. kiangnanense (Moore et al., 2006;Sun et al., 2018;Yang et al., 2018;He et al., 2019). In this study, we newly sequenced 20 complete plastomes of Nandinoideae, representing three genera and 13 species of this subfamily (including additional accessions of C. robustum, G. microrrhynchum, and G. kiangnanense). Together with the six plastomes previously reported, we included a total of 26 complete plastomes, representing all four genera and 16 species of Nandinoideae (ca. 84% of the group's total species diversity). Based on this extensive plastome dataset, our specific aims were  to (1) characterize and compare the structure as well as gene content and order among these plastomes to gain further insights into their evolution; (2) infer phylogenetic relationships within Nandinoideae, especially for the core genus Gymnospermium; and (3) identify highly variable plastome regions in Nandinoideae for future phylogenetic, phylogeographic and/or conservation genetic studies.

Plant Materials and DNA Extraction
For whole plastome sequencing, we collected fresh leaves of 13 Nandinoideae species, including 10 of Gymnospermium, one of Leontice, and two of Caulophyllum, resulting in 20 individuals overall (1-2 individuals per species, see Supplementary Table 1). Voucher specimens were deposited at the herbarium of Tarim University (TARU), Alar, Xinjiang, China (Table 1). Total genomic DNA was extracted from the silica-gel dried leaf tissues using DNA Plantzol Reagent (Invitrogen, Carlsbad, CA, United States), following the manufacturer's protocol. The quality and quantity of genomic DNA was determined using both 1% agarose gel electrophoresis and an ultraviolet spectrophotometer (K5800, KAIAO, Beijing, China).

Comparative Plastome Analyses
Gene differences among all above 26 Nandinoideae plastomes, including those downloaded from GenBank (see "Genome Sequencing, Assembly, and Annotation"), were performed in Shuffle-LAGAN mode on mVISTA (Frazer et al., 2004). Rearrangements of plastomes were checked by Mauve 2.3.0 (Darling et al., 2004). The structure and junctions between the SSC/LSC regions and the IR regions (see Results) were investigated using IRscope (Amiryousefi et al., 2018).
Protein-coding (CDS), intergenic spacer (IGS), and intron regions with alignment length > 200 bp and containing at least one mutation were extracted sequentially. Nucleotide diversity (Pi) values for each of those regions were calculated in DNASP v6 (Rozas et al., 2017).
were constructed using maximum likelihood (ML) and Bayesian inference (BI) methods in RAxML-HPC2 on XSEDE v8.2.12 (Stamatakis, 2014) and MrBayes on XSEDE v3.2.7 (Ronquist et al., 2012), respectively. Both analyses are implemented on the CIPRES Science Gateway website 2 . The best-fitting nucleotide substitution model (GTR + I + G) is based on the Akaike Information Criterion (AIC) in jModelTest v2.1.6 (Darriba et al., 2012). For the ML analysis, we set 1,000 bootstrap replicates and defaulted the other parameters. For the BI analysis, we run two independent Markov chain MonteCarlo (MCMC) chains, each for 1,000,000 generations, and sampling every 1,000 generations; the first 25% of the trees were discarded.

Plastome Features
All 26 plastomes of Nandinoideae, representing 16 species and four genera, exhibited the typical angiosperm quadripartite structure, including a pair of IR regions (IRa, IRb), separated by SSC and LSC regions (Figure 2 All 26 plastomes of Nandinoideae had their LSC/IRb junctions located within the rps19 gene, and thus contained a ψrps19 (62-173 bp) in IRa (Figure 3). In almost all plastomes, the SSC/IRa boundary was located within the ycf 1 gene, except for G. altaicum (II), in which the SSC/IRa boundary was situated in the spacer region ycf 1-trnN-GUU (Figure 3). Therefore, there was no ψycf 1 in G. altaicum (II), while for the remaining 25 plastomes, there was a ψycf 1 with different length. The shortest ψycf 1 (98 bp) was found in two (of the three) G. altaicum accessions (I and III), and the longest (4,786 bp) in G. smirnovii; in the other 23 plastomes, the length of this pseudogene varied from 868 to 1,184 bp (Figure 3). Hence, within Nandinoideae, the plastome of G. smirnovii (161,406 bp) was significantly longer than other plastomes, and this is mainly due to the increased length of ψycf 1, which resulted in a significant expansion of the respective IR regions.
Of all the 114 unique genes identified across the 26 Nandinoideae plastomes, 26 differed in length, mostly less than 10 bp, albeit with some exceptions (see Supplementary Tables   1, 2). Specifically, in all three accessions of G. altaicum (I-III), the 5 end of the ycf 1 gene was ca. 1,000 bp shorter (I and III: 919 bp; II: 1,127 bp) than in the other plastomes, i.e., the ycf 1 genes were seriously deleted at the 5 end of the three plastomes of G. altaicum. In G. microrrhynchum (I), the ropC1 gene was ca. 750 bp shorter than the other 25 Nandinoideae plastomes due to an insertion of "A" in the locus of 1214 bp/1314, which resulted in its premature termination. For all the plastomes of Leontice and Gymnospermium, the ndhF gene had a 75-bp deletion (in the middle part) when compared with that of Nandina and Caulophyllum, the two early diverging genera of Nandinoideae (see "Comparative Plastome Analysis and Identification of diversity hotspot regions"). The GC content of the 26 plastomes ranged from 37.7% to 38.2%, whereby the highest values occurred in the IR regions (41.5-43.2%), followed by the LSC (36.1-36.6%) and SSC (30.7-32.6%) regions ( Table 2).

Phylogenetic Analyses
The 80 plastome CDS regions of the 16 species (n = 26 accessions) of Nandinoideae and the two Berberis outgroup species were aligned with a total length of 70,192 bp. The topologies of the resulting ML and BI trees (Figure 5) were fully congruent and well resolved, with highly supported nodes. In fact, all three multi-species genera (Caulophyllum, Leontice and Gymnospermium) and all species with multiple accessions formed distinct clades, with both maximum ML bootstrap support (BS = 100%) and highest posterior probability (PP = 1). Only a few nodes (2 for ML tree vs. 4 for BI tree) in the tree received no full support. Within Nandinoideae, Nandina (N. domestica) was identified as the first diverging lineage, followed by Caulophyllum (represented by 2 spp./n = 3) and the sister genera Leontice (3 spp./n = 3) and Gymnospermium (10 spp./n = 19). Within Leontice, L. armeniaca was recovered as sister to L. ewersmannii + L. incerta. Within Gymnospermium, G. microrrhynchum (northeastern Asia) was the first diverging species (Clade A), followed by the East China endemic G. kiangnanense (Clade B). All the remaining species formed two highly supported sister clades (C, D), comprising three species from West Tianshan (Clade C: G. albertii as sister to G. vitellinum + G. darwasicum) vs. five species (Clade D) distributed in the Altai Mountains (G. altaicum), Caucasus Mountains (G. smirnovii), and southeastern Europe (G. odessanum, G. peloponnesiacum, G. scipetarum). Noticeably, within the subclade G. darwasicum (I and II) + G. vitellinum, the latter was embedded within the former, with low support (BS = 62%, PP = 0.8) at the node of G. darwasicum (II) and G. vitellinum. Within Clade D, G. scipetarum + G. smirnovii appeared to be a sister group to a subclade (BS = 91%, PP = 1) comprised of G. altaicum + G. odessanum + G. peloponnesiacum.

Comparative Plastome Genomics
All the 20 plastomes of Nandinoideae newly sequenced in this study contained 114 unique genes and in the same order (Supplementary Table 1 and Supplementary Figures 1, 2). This is consistent with six previously reported plastomes of this subfamily (Moore et al., 2006;Sun et al., 2018;Yang et al., 2018;He et al., 2019), and also agrees with the gene content and order of plastomes of Podophylloideae (Ye et al., 2018). However, these Nandinoideae plastomes have one more gene (rpoA) than the genera of Berberidoideae, such as Alloberberis C. C. Yu and K. F. Chung, Berberis, Mahonia, and Moranothamnus C. C. Yu and K. F. Chung (Hsieh et al., 2022), and two more genes (inf A, ycf 15) than Epimedium , a member of Podophylloideae. In the present study, we also observed that the ψycf 1 pseudogene of Gymnospermium smirnovii was ∼3,700 bp longer than in the 25 other Nandinoideae plastomes, which resulted in a significant IR expansion, as likewise found in several species of Berberidoideae and Podophylloideae (Epimedium ecalcaratum G. Y. Zhong, MN939634; E. brevicornu Maxim., MN371716) (Ma et al., 2013;Sun et al., 2018;Zheng et al., 2019;Hsieh et al., 2022). Insertions/deletions (indels) in the intergenic spacer (IGS) regions also contributed significantly to plastome length variations among species of Nandinoideae. As a case in point, the plastomes of two accessions of Gymnospermium microrrhynchum (I and KM057373) and one of G. kiangnanense (MH298010) were > 160,100 bp, while most other plastomes were < 158,600 bp, except for G. smirnovii (161,406 bp). Moreover, within G. kiangnanense, the plastome of accession 'I' was 2,180 bp shorter than that of 'MH298010, ' mainly due to indels in the IGS regions. However, such large differences were not found in other species of Nandinoideae, and further study is needed to clarify the apparently dynamic evolution of the G. kiangnanense plastome. In general, indel and premature gene termination events are well known factors of plastome evolution in angiosperms (Fu et al., 2017;Wang et al., 2019;Zhang et al., 2020), including species of Berberidaceae (Sun et al., 2016(Sun et al., , 2018Hsieh et al., 2022). However, the severe truncation at the the 5 end of the ycf 1 gene in the three analyzed plastomes of G. altaicum plastomes (I and III: 919 bp; II: 1,127 bp), and the premature termination in the ropC1 gene have not been reported in the previous complete plastome analyses of Berberidaceae. Notably, a 75-bp deletion in the ndhF gene occurred in Leontice and Gymnospermium, which thus can be interpreted to present a molecular synapomorphy of these sister genera ( Figure 5). As to another interesting observation, the length of the rps7 gene was consistently 468 bp across the 26 Nandinoideae plastomes analyzed, similar length ( 460 bp) with most genera of Berberidaceae (Berberis, Mahonia, Achlys DC., Bongardia C. A. Mey., Vancouveria C. Morren and Decne., Epimedium and Jeffersonia W. Bartram), while this gene is considerably short (63-78 bp) in some species of Podophylloideae Sun et al., 2018;Ye et al., 2018;Hsieh et al., 2022). On the other hand, significant variation in the length of the accD gene has recently been reported for Berberidoideae (Hsieh et al., 2022), but this is not observed in Nandinoideae (Supplementary Table 2).

Phylogenetic, Taxonomic and Biogeographic Inferences
Our phylogenomic ML and BI analyses of Nandinoideae, based on 80 plastome-derived CDS genes from 16 (out of ca. 20) species and 26 accessions, plus two species of Berberis as outgroup (Figure 5), is the most comprehensive phylogenetic study of this subfamily by now, and provided a robust resolution for both generic and species relationships. Our results revealed a sister relationship between Gymnospermium and Leontice, which is consistent with previous studies (Kim and Jansen, 1996;Wang et al., 2007;Sun et al., 2018;Hsieh et al., 2022). However, this result differs from previous morphologicalcladistic studies (Loconte and Estes, 1989;Nickol, 1995), which inferred Leontice as sister to either Caulophyllum or Bongardia. Within Leontice, we identified L. armeniaca from Western Asia as sister to a clade comprising two other species from Central Asia (L. ewersmannii, L. incerta). In recent studies, Gymnospermium maloi Kit Tan & Shuka has been reduced to a heterotypic synonym of G. scipetarum, based on morphological, karyological and molecular evidence (Tan et al., 2011;Barina et al., 2015Barina et al., , 2017. Hence, our sampling of Gymnospermium contained all species except one, G. silvaticum from western Tianshan. Rosati et al. (2018) sampled six Gymnospermium species for molecular analysis based on ITS and trnL-F sequences, and identified Italian (southern Apennine) populations of Gymnospermium representing the genus' western range limit, as a subspecies of G. scipetarum (subsp. eddae Rosati, Farris, Fascetti and Selvi). In an earlier phylogenetic study, Barina et al. (2017) had sampled seven species of Gymnospermium to infer the genus' 'European' evolutionary history, but lacked particular sequence information for G. microrrhynchum (ITS) and G. kiangnanense (ndhF-trnL), and neither included any West Tianshan species. Our phylogeny (Figure 5) is the first to identify G. microrrhynchum (Clade A) from northeastern Asia as the genus' earliest diverging species, followed by G. kiangnanense (Clade B) from eastern China (representing the genus' southern range limit). Three species from West Tianshan (G. albertii, G. darwasicum, G. vitellinum) formed a highly supported derived clade ('C'), with G. vitellinum showing closer relationships to the two samples of G. darwasicum (Figure 5). Indeed, based on morphological characters, G. vitellinum has previously been inferred to be most closely related to G. darwasicum (Kral, 1981), yet due to lack of resolution, our phylogenomic data cannot exclude the possibility that these two species might be conspecific and/or hybridizing (etc.). As sister to the West Tianshan Clade C, we identified a highly supported group (Clade 'D') of mostly (but not exclusively) westerly distributed species (see also Tian and Mullaj, 2001a,b;Barina et al., 2007;2015;Rosati et al., 2014), including G. smirnovii (Caucasus Mountains) + G. scipetarum (central Albania, southern Montenegro, Italy/southern Apennines) as likely sister group of a subclade (BS = 91, PP = 1) comprising G. altaicum (Central Asia; Abidkulova et al., 2021) as putative sister to G. odessanum (northern Black Sea area) + G. peloponnesiacum (southern Greece/Peloponnese). Clearly, these relationships have to be treated with some caution given that one species of Gymnospermium is missing in our taxon sampling, and further studies are required to clarify their biogeographic history, but which already now seems to point at an east-to-west expansion. Considering the apparently close relationship between G. peloponnesiacum and G. altaicum, it is noteworthy they retained several distinct morphological features even when cultivated together in the Göteborg Botanical Garden (Karl and Strid, 2009). Moreover, with regard to the putative sister pair G. smirnovii/G. scipetarum (I, II), it is worth to recall that the plastome of only the former species accession shows a significant expansion of the IR regions (caused by ycf 1 length variation); yet again a wider sampling within both species would be needed to clarify whether the presence/absence of this IR expansion could serve as a species-diagnostic molecular marker. Altogether, we suggest that the five species of Clade D should be considered as independent species, rather than subspecies or varieties of G. altaicum (Stearn et al., 1993;Tian and Mullaj, 2001a;Phitos et al., 2003;Doroftei and Mierla, 2007).
Considering broader, yet preliminary biogeographic inferences from our phylogeny (Figure 5), it seems possible that Gymnospermium originated in northeast Asia and subsequently spread southward and westward, resulting in the genus' high species diversity seen today. Notably, in our field surveys in China (Pan Li and Zhaoping Yang, personal observation), species of the Eurasian/North American genus Erythronium L. (Liliaceae; ca. 27-32 spp.) were often found to be associated with Gymnospermium, for instance, G. microrrhynchum with E. japonicum Decne. in northeast China, and G. altaicum with E. sibiricum (Fisch. and C. A. Mey.) Krylov in the Altai Mountains of northwest China. Based on an earlier molecular (plastid/nuclear) phylogeny of Erythronium (Allen et al., 2003), there is relatively little divergence among the four Eurasian species (distributed from Portugal to Japan). In addition, we found many other spring ephemerals accompanying Gymnospermium and Erythronium in northeast and northwest China, such as Gagea Salisb. and Tussilago L., all of which share similar distribution patterns across Eurasia (Peterson et al., 2008;Chen and Nordenstam, 2011). Hence, it would be extremely interesting to clarify in future comparative phylogeographic studies whether all these spring ephemerals share a similar biogeographical history of east-to-west expansion routes across Eurasia, including joint areas of (glacial) refugial survival.
Finally, it worth to briefly review previous time estimates of divergence within Nandinoideae. For example, in their plastid (rbcL) study on plant (sister-) species pairs with disjunct distributions in East Asia (EA) and eastern North America (ENA), Xiang et al. (2000) estimated the divergence time between Caulophyllum robustum (EA) and C. thalictroides (ENA) to the Early Pleistocene, ca. 2.38 ± 1.69 million years ago (Mya). By contrast, Barina et al. (2017), using nuclear (ITS) and plastid sequences (ndhF-trnL), dated the split between Leontice and Gymnospermium (likely in East Asia) to the mid-Miocene, ca. 16.4 (32.7-7.4) Mya, while the onset of diversification the latter genus, including the westward spread of the G. altaicum lineage to the Black Sea area, likely occurred in the early Late Miocene, ca. 11.4 (23.4-4.8) Mya. Based on a plastid phylogenomic approach, Sun et al. (2018) dated the divergence of Nandina and the other three genera of Nandinoideae at 33-13 Mya, and ca. 5 Mya for Leontice and Gymnospermium. Altogether, these authors included only one species for each genus of Nandinoideae, and failed to reconstruct the ancestral areas of Caulophyllum, Leontice, and Gymnospermium, with uncertain ancestral area (Sun et al., 2018). As a result, the evolutionary history of Nandinoideae was still unclear. In total, these inferences of divergence times and dispersal routes were mainly based on a limited sampling of taxa and mostly plastid (Xiang et al., 2000;Sun et al., 2018), or more rarely, plastid and nuclear (ITS) data (Barina et al., 2017). Hence, further phylogenomic analyses of both plastome and nuclear genomic (or transcriptomic) data are required to fully uncover the evolutionary and biogeographic history of Nandinoideae.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
PL and ZY designed the research and provided the research resources. PL, DZ, ZY, SS, XL, XZ, and JL collected the plant materials. SS, HL, and ZY assembled and analyzed the data and prepared the figures and tables. ZY, SS, PL, and HC wrote the manuscript. All authors revised the manuscript.

ACKNOWLEDGMENTS
We are grateful to Xinjie Jin and Xiaokai Fan for helpful suggestions in the data analyses.