Species Delimitation and Interspecific Relationships of the Genus Orychophragmus (Brassicaceae) Inferred from Whole Chloroplast Genomes

Genetic variations from few chloroplast DNA fragments show lower discriminatory power in the delimitation of closely related species and less resolution ability in discerning interspecific relationships than from nrITS. Here we use Orychophragmus (Brassicaceae) as a model system to test the hypothesis that the whole chloroplast genomes (plastomes), with accumulation of more variations despite the slow evolution, can overcome these weaknesses. We used Illumina sequencing technology via a reference-guided assembly to construct complete plastomes of 17 individuals from six putatively assumed species in the genus. All plastomes are highly conserved in genome structure, gene order, and orientation, and they are around 153 kb in length and contain 113 unique genes. However, nucleotide variations are quite substantial to support the delimitation of all sampled species and to resolve interspecific relationships with high statistical supports. As expected, the estimated divergences between major clades and species are lower than those estimated from nrITS probably due to the slow substitution rate of the plastomes. However, the plastome and nrITS phylogenies were contradictory in the placements of most species, thus suggesting that these species may have experienced complex non-bifurcating evolutions with incomplete lineage sorting and/or hybrid introgressions. Overall, our case study highlights the importance of using plastomes to examine species boundaries and establish an independent phylogeny to infer the speciation history of plants.


INTRODUCTION
It is rather difficult to delimit recently diverged species and construct their interspecific relationships because of insufficient informative variations of sampled DNA fragments (Schluter, 2000;Arnold, 2006). The genome-scale sequence variations were found to increase the phylogenetic resolutions of both high-and low-taxonomic groups (e.g., Yoder et al., 2013;Lamichhaney et al., 2015). It is still expensive to collect nuclear genome variations between species for most none-model genera without the reference genome. However, chloroplast genomes (plastome) are relatively easy to be assembled to examine interspecific relationships for phylogenetic analyses, especially in addressing unresolved relationship at low taxonomic levels (Wu et al., 2010;Nock et al., 2011;Yang et al., 2013;Huang et al., 2014;Carbonell-Caballero et al., 2015). Plastomes are haploid with maternal inheritance in most angiosperms (Corriveau and Coleman, 1988;Zhang and Liu, 2003;Hagemann, 2004) and are highly conservative in gene order and genome structure with rare recombinations (Jansen et al., 2007;Moore et al., 2010). In this study, we aimed to examine species delimitation and interspecific relationships in Orychophragmus through assembling chloroplast genomes of multiple individuals of tentatively delimited species (Hu H. et al., 2015).
Orychophragmus is a small genus in the mustard family (Brassicaceae, Cruciferae) distributed in northern, central, and southeastern China (Zhou et al., 2001). Its plants have been widely cultivated as ornamentals, vegetables, or source of seed oil (Sun et al., 2011). Despite controversial species delimitations in the genus (Zhou, 1987;Tan et al., 1998;Al-Shehbaz and Yang, 2000;Zhou et al., 2001;Wu and Zhao, 2003;Sun et al., 2012), our recent study based on nuclear (nr) ITS sequence variations suggested the recognition of seven species (Hu H. et al., 2015). Orychophragmus is sister to Sinalliaria, which is a genus endemic to China with one (Zhou et al., 2014) or two independent species (Hu H. et al., 2015). Nuclear ITS sequence variations support the recognition of seven species and strongly resolve their interspecific relationships, but four DNA fragments (matK, rbcL, trnH-psbA, and trnL-F) of the chloroplast genome failed to do so (Hu H. et al., 2015). It was hypothesized that this was caused by slow rates of chloroplast DNA evolution, frequent introgression, and incomplete lineage sorting. In order to test these hypotheses, we assembled 17 plastomes from six species of the genus using next-generation Illumina genomeanalyzer platform with a guided-reference plastome. We aimed to address the following questions: (1) could the plastome sequence variations confirm the species delimitation obtained by the ITS dataset? (2) are interspecific relationships well resolved with high statistical supports? (3) if so, are phylogenetic relationships based on the plastome sequence variations consistent or inconsistent with those obtained from the nuclear ITS dataset?

Plant Materials
We used at least two individuals from two different populations for each of six species, except the recently extinct Orychophragmus ziguiensis. In total, 17 individuals from 17 populations were sampled (see Table S1), and fresh leaves of each individual were immediately dried in silica gel for DNA extraction. For the final analyses, we downloaded five complete plastomes from five populations of two species of Sinalliara (Zeng et al., 2016). We also included the plastomes of Brassica napus L., B. juncea (L.) Czern., Eutrema heterophyllum (W. W. Sm) H. Hara, Lobularia maritima (L.) Desv., Capsella grandiflora (Fauché & Chaub.) Boiss., Arabidopsis thaliana (L.) Heynh., and Aethionema grandiflorum Boiss. and Hohen. That we downloaded from GenBank (Genbank accessions see in Table S1). The plastome of Carica papaya L. was used as the outgroup for phylogenetic analyses.

DNA Extraction and Plastome Sequencing
Total genomic DNA was extracted from 20 mg silica gel-dried leaves by using a modified 2 × cetyltrimethylammonium bromide (CTAB) procedure (Doyle, 1987). The library construction and sequencing were finished at the Beijing Novogene bio Mdt InfoTech Ltd (Beijing, China). The qualified and purified DNA samples were randomly fragmented with a Covaris sonication device. The DNA fragments were endrepaired, phosphorylated, and A-tailed. Adapters were then ligated with index adapters. The ligated fragments were amplified for library construction. The qualified libraries were applied to an Illumina flowcell for cBOT cluster generation. Sequencing was performed on an Illumina MiSeq instrument.

Plastome Assembly and Annotation
The raw reads for all samples contained a few adapterrelated paired reads. Reads with over 10% containing N or with low quality (Q <= 5) were trimmed to acquire clean reads to ensure the high-quality following analysis. All of the clean reads were initially mapped to all published Brassicacae chloroplast genomes (29 species) using BWA v.0.7.12 (Li and Durbin, 2009) and SAMtools v.1.2 . We then applied Velvet v.1.2.10 (Zerbino and Birney, 2008) to assemble these reads into the complete plastid genomes, and gaps were filled with GapCloser v.1.12 (http://soap.genomics.org.cn/index.html). We finally annotated the chloroplast genomes using Plann v.1.1.2 (Huang and Cronk, 2015) and manually corrected for start and stop codons and for intron/exon boundaries to match gene predictions with Geneious v.R.8.1.4 (Kearse et al., 2012) and Sequin v.15.10 (http://www.ncbi.nlm.nih.gov/Sequin/) based on Arabidopsis thaliana chloroplast genome as a reference annotation. The visual images about annotation information were generated by OGDRAW v.1.1 (http://ogdraw.mpimp-golm.mpg.de/) (Lohse et al., 2013). Full alignments with annotation information were plotted using the mVISTA (Mayor et al., 2000). All plastomes are reported here for the first time and were submitted to GenBank with the accession numbers of KX364399 and from KX756547 to KX756551.

Phylogenetic Analyses
All plastome sequences were aligned using MAFFT v.7 (Katoh and Standley, 2013) and adjusted manually where necessary using MEGA v.6 (Tamura et al., 2013). Phylogenetic analyses were performed by using the whole plastome data and the aligned data including only the hotspot mutation regions. We used JModeltest v.2.1.1 (Posada, 2008;Darriba et al., 2012) to select the most appropriate nucleotide substitution model and parameter settings for Bayesian analyses based on Bayesian Information Criterion (BIC) (the best-fit model chosen was TVM+I+G). To avoid the potential heterogeneity within whole plastome sequences resulting in un-reliable phylogenetic reconstruction (Arbiza et al., 2011;Zhong et al., 2011; and Huelsenbeck, 2003;Ronquist et al., 2012) for the Bayesian inference analyses. MrBayes was run for 1,000,000 generations, sampling and printing every 100 generations. We conducted two independent Markov Chain Monte Carlo (MCMC) runs with four chains (one cold and three hot). We estimated branch supports from 500 ML bootstrap values (BS) and from the posterior probabilities (PP) of Bayesian trees after a 50% "burn-in." FIGURE 1 | Gene map of the Orychophragmus chloroplast genomes. Genes shown outside of the map circle are transcribed clockwise, while those drawn inside are transcribed counterclockwise. Genes belonging to different functional groups were color-coded. The innermost darker gray corresponds to GC while the lighter gray corresponds to AT content of the chloroplast genomes.
Frontiers in Plant Science | www.frontiersin.org

Estimate of Divergence Time
We used the whole plastome data and the hotspot mutation region data to estimate divergence time. Due to the lack of fossil record, we used the average substitution rate 0.051952 ± 0.000537 × 10 −8 substitutions per site per year or two fixed calibrations (23.5 million years ago (Ma) between the Arabidopsis clade vs. the sister clade and 20.85 Ma between the Lobularia subclade vs. the sister subclade) estimated from the calibrated plastome phylogeny of Brassicaceae (Hohmann et al., 2015) to estimate the species divergence within the genus. The Bayesian dating analysis was performed with a relaxed clock approach using BEAST v.2.4.0 (Bouckaert et al., 2014). BEAST runs were conducted by choosing the general time-reversible model GTR+I+G and getting relative parameter settings (TVM+I+G) from JModeltest software. For each analysis, we ran 100,000,000 generations of MCMC run, sampling parameters every 10,000 generations, using a lognormal relaxed clock model (Drummond et al., 2006) under a Yule speciation tree prior with the substitution rate. The convergence of the MCMC searches and the effective samples size (ESS) of the posterior probability were in most cases >200, and always >150 for every estimated parameter were checked in Tracer v.1.6 (Rambaut et al., 2014). Two replicates were combined by removing 25% as burn-in using LogCombiner v.2.4.0 (Bouckaert et al., 2014). TreeAnnotator v.2.4.0 (Bouckaert et al., 2014) was used to produce maximum clade credibility trees (MCCT) from the post-burn-in trees and to determine the 95% posterior density of ages for all nodes in the tree by setting burning-in of 25% and a posterior probability limit of 0.  (Figure 1). Most genes occurred in a single copy, while 16 were duplicated on the IR regions, including 3 rRNA (4.5S, 16S, and 23S rRNA), 7 tRNA, and 6 PCG species (rpl2, rpl23, ycf 2, ndhB, rps7, and ycf 1). The rps12 gene was a unique trans-spliced gene with three exons. The rps19 gene was located in the boundary regions between LSC/IRb, and the ndhF gene was situated in the boundary regions between IRb/SSC. The gene ycf 1 was crossed at the junction of IRb/SSC and SSC/IRa, leading to incomplete duplication of the protein-coding gene within IRs (Figure 1). The overall GC contents of cpDNA ranged from 36.28 to 36.35% ( Table 1), suggesting that the AT-rich contents of this genus are similar to other Brassicaceae plastid genomes sequenced so far (Hu S. L. et al., 2015). In general, the genome features of six species were found to be quite similar in gene content, gene order, introns, intergenic spacers, and AT content. The overall sequences identity of 16 plastomes was visualized using the mVISTA tool (Mayor et al., 2000) based on the annotation of one of them (O. taibaiensis) as a reference with LAGAN mode. The sequences identity was 98% between all plastomes. Moderate genetic divergences were detected, and the most divergent regions were located in the intergenic spacers while nine divergence hotspot regions were identified (Figure 2).

Phylogenetic Analyses and Divergence Estimations Based on Plastome Sequence Variations
Phylogenetic trees were reconstructed by RAxML and Mrbayse softwares, rooted by the outgroup Carica papaya. The ML tree was congruent with the Bayesian consensus tree in the phylogenetic topologies based on the whole plastome data, although statistical supports (BP and PP) were different in some clades or subclades (Figure 3). All posterior probabilities (PP) were higher than bootstrap supports (BP). All sampled individuals of each species were found to cluster together as one monophyletic lineage, although the BP support of Orychophragmus longisiliqus was lower than 90%, but larger than 50%.  Individual number is given after original species name. The right-hand tree was cited from our previous study (Hu H. et al., 2015).
analyses of the hotspot mutation regions produced the similar tree topologies, but relationships between species or subclades of the genus Orychophragmus remained unsolved ( Figure S1). We further estimated divergence time of the genus and among its species using the general plastome substitution rate (Figure 4) and two secondary calibration points ( Figure S2). Divergence times estimated by substitution rate the major nodes were general older than times from the secondary calibration points. For instance, the divergence between Orychophragmus and sister genus Sinalliaria was dated to about 13.92 million years ago (Mya) using the average rate, while this divergence was four million years younger by using two calibration points ( Figure S2 and Table S3). In the similar way, the first divergence between the two major clades of Orychophragmus was calculated to have occurred around 5.91 or 3.72, while the divergence between the subclades and species ranged from 0.96∼0.59 to 5.16∼3.25 Mya (Figure 4 and Figure S2; Table S3). As expected, all divergence times of three major nodes (Figure 4) estimated based only on the hotspot mutation regions and the average substation rate were much older (Table S3) than those based on the whole plastome dataset because of the more accumulated divergence. However, when two assumed diverged points were adopted, the divergences of the major nodes were estimated to be similar to those based on the whole plastomes (Table S3).

DISCUSSION
The plastomes of land plants are known to be highly conserved in genome structure, gene order, and gene content with a quardripartite structure and two copies of large inverted repeat (IR) separating two regions (large single-copy region LSC and small single-copy region SSC) (Raubeson and Jansen, 2005;Jansen and Ruhlman, 2012). Most plastomes are circular and ranged from 120 to 160 kb in length, with 110 to 130 different genes, including 70 (gymnosperms) to 88 (liverworts) protein-coding genes mostly involved in photosynthesis or gene expression, and 33 (most eudicots) to 35 (liverworts) structure RNA genes (Raubeson and Jansen, 2005;Bock, 2007;Wicke et al., 2011). The markers (or DNA fragments) based on sequence variations of plastomes have been widely used to examine population genetic structure, delimit species boundaries, and construct phylogeny due to their high-copy number (as many as 1000 per cell), easy amplification, and relative conservation of all targeted regions (Raubeson and Jansen, 2005;Wicke et al., 2011). In addition, sequence variations of total plastome was found to provide higher resolution in constructing plant phylogenies than few DNA fragments (Parks et al., 2009).
In this study, 17 plastomes for six species of Orychophragmus were assembled for the first time, and all have typical quadripartite structure, as in most angiosperms, including LSC, SSC, and a pair of IRa and IRb (Palmer, 1991). The overall sequence identity of all plastomes was high (around 98%), and the divergence between them, especially the insertionsdeletions (INDELS), commonly occurred in the intergenic spacers. The nine intergenic spacers of trnH-psbA, atpI-rps2, trnM-atpE, atpB-rbcL, ndhC-trnV, accD-psaI, petB-petD, rpl23-trnL, and ndhE-ndhG were identified as the divergence hotspots of sequence similarities below 50% (Figure 2), which would be FIGURE 4 | Divergence time estimates using the average substitution rate based on the whole chloroplast genome sequences. Divergence times of species based on uncorrelated relaxed clock method, using a substitution rate of 5.1952E-4 per base per million years calculated by BEAST program over the whole chloroplast genome sequences. The legend describes the divergence time in million years, and the gray boxes represent the 95% highest probability density of divergence times. In addition, three major nodes were used for divergence comparisons in the Table S3. useful if these regions were developed as potential molecular markers for identification of the different species units for the genus. However, phylogenetic analyses of these hotspot regions (4807 bp in length) failed to discern the interrelationships of the closely related species and subclades of the genus ( Figure S1) although this dataset could delimitate species units. Our previous study of Orychophragmus sequenced four cpDNA regions (matK, rbcL, trnH-psbA, and trnL-F), and the total sequence alignment was around 3060 bp long (Hu H. et al., 2015). However, that cpDNA dataset failed to clarify species boundaries and construct interspecific relationships, whereas the nuclear ITS sequence dataset did despite only around 640 bp in length (Hu H. et al., 2015). Based on total plastome sequences or the hotspot mutation regions, the present study successfully differentiated all sampled species units although fewer individuals from different populations of each assumed species were used (Figure 3 and Figure S1). The study clearly suggests that the total plastomes or the hotspot mutation regions accumulated more mutations than few cpDNAs and were by far quite useful in the delimitation of boundaries of closely related species, as did nuclear ITS region (Hu H. et al., 2015). However, the interspecific relationships recovered by the plastomes differs from that from the nuclear ITS sequence variations (Figure 3). For example, O. violaceus is sister to O. hupehensis on the plastome tree while O. violaceus is closely related to O. longisiliqus on the ITS tree. In fact, none of the interspecific relationships inferred from the ITS sequence variations were confirmed by the plastome dataset (Figure 3). Only the accumulated mutations along the total plastome can delimit species boundaries (Figure 3), whereas the much shorter ITS fragments most likely had enough mutations to delineate species (Hu H. et al., 2015). In addition, we found that the estimated divergences between Orychophragmus and Sinalliaria, subclades, and species in the plastome phylogeny were also lower than those inferred from the ITS dataset. For example, the divergence time between Orychophragmus and Sinalliaria was estimated to occur around 13.92∼9.16 Mya (Figure 4 and Figure S2), while that estimated on ITS sequence variations was about 20 Mya (Hu H. et al., 2015). The divergences within Orychophragmus were dated to between 5.9∼3.25 to 0.96∼0.59 Mya, in contrast to 7.7 to 2.7 Mya (Hu H. et al., 2015). This discrepancy was likely the result of different mutation rates between cpDNA and ITS. In addition, the estimations based on the hotspot mutation regions (Table S3) might be older than on the total plastomes when using the average substitution rate because of the accumulated divergences from these regions. It should be cautioned that all of the present estimates were conducted based on the average substitution rate or two secondary calibrations from the calibrated whole family plastome phylogeny (Hohmann et al., 2015). This second calibration might result in the unavoidable bias (Schenk, 2016). The generic assignment of the only relatively old fossil (Thlaspi primaevum) for the family and the age reliability are highly debated (Brenner, 1996;Beilstein et al., 2010;Franzke et al., 2016). All of these limitations restrict the direct and accurate calibration of the family and within-family lineages. However, the average plastome mutation rate (0.051952 ± 0.000537 × 10 −8 ) fell well within the range rate recorded for chloroplast DNAs (Wolfe et al., 1987(Wolfe et al., , 1989Koch et al., 2000Koch et al., , 2001. Therefore, the estimated divergences presented here can still serve as a rough temporal framework for understanding the evolutionary diversification of the genus Orychophragmus, although further refined calibrations and estimates are highly needed. Numerous studies (e.g., Ellstrand, 2014;Suh et al., 2015;Mallet et al., 2016;Novikova et al., 2016;Pease et al., 2016), reported inconsistent phylogenetic relationships within plants if constructed based on different genes or genomes, especially cpDNAs and ITS (Maddison, 1997;Morando et al., 2004;Arnold, 2006;Pollard et al., 2006). In most angiosperms, cpDNA has uniparental inheritance while the nuclear ITS is biparental (Hagemann, 2004;Petit et al., 2005;Wicke et al., 2011). Both incomplete lineage sorting and hybrid introgressions were suggested to explain such inconsistent relationships between ITSand cpDNA-based phylogenies (Arnold, 2006 It is also highly likely that incomplete lineage sorting occurred during the fast speciation of this genus that produced the current species, which resulted in the phylogenetic inconsistences between plastome and ITS datasets. During the fast radiative speciation of the genus Arabidopsis, ancestral polymorphisms at different loci were randomly fixed, and recent gene flow mediated the trans-specific introgressison of newly derived alleles (Novikova et al., 2016). Both incomplete lineage sorting and recent gene flow may have together resulted in the widespread inconsistences in gene trees and non-bifurcating speciation of Arabidopsis. Although, species divergences within Orychophragmus, as estimated on plastome or ITS (Hu H. et al., 2015), were older than those within Arabidopsis (Novikova et al., 2016), it is highly likely that non-bifurcating radiations with both incomplete lineage sorting and hybrid introgressions might have also occurred in the speciation history of Orychophragmus. Further studies based on nuclear genomic population data, as well as modeling tests, are needed to test the occurrences of both incomplete lineage sorting and trans-specific introgressions in Orychophragmus.

AUTHOR CONTRIBUTIONS
HH and JL conceived and designed this study. HH and TZ collected the samples. HH, XL, and XG extracted total genomic DNA. HH and QH analyzed the data and wrote the manuscript. IA and JL revised the manuscript.
Team (2014TD003), both of which we are profoundly grateful for.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2016. 01826/full#supplementary-material Figure S1 | Maximum likelihood tree based on analyses of nine hotspot intergenic regions for 17 Orychophragmus individuals. The ML tree is topologically congruent with the Bayesian consensus tree. Statistical support from maximum likelihood with different values were shown as different line forms. The different taxa of Orychophragmus and Sinalliaria are marked by different colors. Figure S2 | Divergence time estimates using two secondary points based on the whole plastid genome sequences. Divergence times of species based on uncorrelated relaxed clock method, using two calibrations (23.5 million years ago (Ma) between the Arabidopsis clade vs. the sister clade and 20.85 Ma between the Lobularia subclade vs. the sister subclade) calculated by BEAST program over the whole chloroplast genome sequences. The legend describes the divergence time in million years, and the gray boxes represent the 95% highest probability density of divergence times.
Table S1 | List of samples used in this chloroplast phylogenomic analyses. Table S2 | The best-fit model of each plastome section chosen by Jmodeltest using Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC). Table S3 | Relaxed clock age estimates obtained with BEAST for the nodes of interest shown and numbered in Figure 4. Ma, million years ago.