Comparative Genomic Analysis Reveals the Mechanism Driving the Diversification of Plastomic Structure in Taxaceae Species

Inverted repeat (IR) regions in the plastomes from land plants induce homologous recombination, generating isomeric plastomes. While the plastomes of Taxaceae species often lose one of the IR regions, considerable isomeric plastomes were created in Taxaceae species with a hitherto unclarified mechanism. To investigate the detailed mechanism underpinning the IR-independent genesis of plastomic diversity, we sequenced four Taxaceae plastomes, including Taxus cuspidata Siebold & Zuccarini, Taxus fauna Nan Li & R. R. Mill, and two individuals of Taxus wallichiana Zuccarini. Then we compared these structures with those of previously reported Taxaceae plastomes. Our analysis identified four distinct plastome forms that originated from the rearrangements of two IR-flanking inverted fragments. The presence of isomeric plastomes was then verified in T. cuspidata individuals. Both rearrangement analyses and phylogenetic results indicated that Taxaceae were separated into two clades, one including Taxus and Pseudotaxus and another formed by Amentotaxus and Torreya. Our reconstructed scenario suggests that the minimum number of inversion events required for the transformation of the plastome of Cephalotaxus oliveri Masters into the diversified Taxaceae plastomes ranged from three to six. To sum up, our study reveals a distinct pattern and the mechanism driving the structural diversification of Taxaceae plastomes, which will advance our understanding of the maintenance of plastomic diversity and complexity in conifers.


INTRODUCTION
Chloroplast (cp) is the organelle responsible for photosynthesis and providing energy for plants and photosynthetic algae (Dyall et al., 2004). Each chloroplast has its own genome (plastome) with a typical circular organization (Palmer, 1991). With the development of next-generation sequencing (NGS) and other methods for obtaining the plastomic sequences, the availability of plastome sequences has increased dramatically for land plants, offering opportunities for the comprehensive comparison and dissection of plastomic structure and variation (Du et al., 2015).
The plastomes of most land plants and algae are composed of four parts, namely, two copies of large inverted repeats (IRs) that contain four ribosomal RNA genes (rrn16, rrn23, rrn4.5, and rrn5), a large single copy (LSC) and a small single copy (SSC) region (Wang et al., 2008;Wicke et al., 2011;Davis et al., 2014). However, not all gymnosperms follow this rule. Gymnosperms are generally categorized into five groups, namely, cycads, ginkgo, gnetophytes, Pinaceae (conifers I), and cupressophytes (conifers II) (Chaw et al., 1997;Rai et al., 2008). While the plastomes of cycads, ginkgo, and gnetophytes are known to be quadripartite, recent comparative analyses of conifer plastomes have revealed that Pinaceae and cupressophyte species possess reduced IR copies. Comparative analysis have demonstrated that the Pinaceae have lost inverted repeat region B (IRB) and the cupressophytes have lost inverted repeat region A (IRA), suggesting the IR loss is homoplasious rather than synapomorphic (Wu et al., 2011;Wu and Chaw, 2014).
So far, two mechanisms have been proposed to explain the genesis of isomeric plastomes. For plants wherein the two IRs are present, the IRs can trigger homologous recombination, leading to the coexistence of two isomeric plastomes within individual plants (Palmer, 1983). However, conifers wherein only one IR is present have evolved short IRs that are associated with inversions in their plastomes . In Pinaceae, a 40-to 50-kb inversion that can be used to distinguish Pseudotsuga species from that of Pinus has been found (Tsai and Strauss, 1989) and a 42-kb inversion was detected in Abies and Tsuga species (Tsumura et al., 2000). In cupressophytes, recombination of plastomes always occurs with a 34-to 36-kb inversion containing trnQ-UUG, as evidenced in Cephalotaxus (Yi et al., 2013) and Juniperus species . Although inversion has been shown to contribute to the genesis of isomeric plastomes in plants, the underlying mechanism remains not well studied (Tsumura et al., 2000;Hirao et al., 2008;Wu et al., 2011;Hsu et al., 2014;Wu and Chaw, 2014). Therefore, it is of scientific interest to elucidate the mechanism driving the diversification of plastomic organization in plants with reduced IR copies.
Taxaceae, the smallest family of conifers, belongs to the conifer II group. It consists of four genera, namely, Amentotaxus, Pseudotaxus, Taxus, and Torreya (Fu et al., 1999;Cheng et al., 2000). Some Taxaceae species have been commercially exploited for the extraction of the anticancer chemotherapeutic drug Taxol since 1990s (e.g., Poudel et al., 2013). Up to now, most of the studies concerning Taxaceae plants focused on the improvement of Taxol production (Wang et al., 2001;Li et al., 2017). For plastomic studies, genetic analyses using plastome fragments have been conducted to reconstruct the phylogeny of the Taxaceae family (e.g., Ran et al., 2010), to develop barcodes to differentiate different species (Gao et al., 2017;Liu et al., 2018), and to explore the demographic history of certain species or species complexes (Ge et al., 2015;Su et al., 2018). Recently, several groups released a number of Taxaceae plastomes Tao et al., 2016;Jia and Liu, 2017), and both intraspecific and interspecific isomeric arrangements have been reported (Hsu et al., 2014;Fu et al., 2019). However, the relative location and genomic organization of the inverted f r a g m e n t s i n T a x a c e a e p l a s t o m e s h a v e n o t b e e n comprehensively studied yet.
In this study, we firstly sequenced and analyzed four newly obtained Taxus plastomes from Taxus cuspidata Siebold & Zuccarini, Taxus fauna Nan Li & R. R. Mill, and two individuals of Taxus wallichiana Zuccarini. Next, we compared them with 11 other published Taxaceae plastomes to explore the evolutionary pattern underlying the structural diversification of Taxaceae plastomes. The information provided here will significantly advance our understanding of the functional significance of inversion and the nature of Taxaceae plastomes.

Sampling, DNA Extraction, and Processing
Fresh leaves were sampled from four individuals of three Taxus species including T. cuspidata, T. fauna, T. wallichiana XI3, and T. wallichiana XU10 (Supplementary Table S1). The vouchers of these species were deposited in Beijing Forestry University, Beijing, China. The chloroplast DNA was extracted and processed following the protocol developed by our group (Du et al., 2015). Next, 5 µg of the purified rolling circle amplification (RCA) product of chloroplast DNA from individual plants was used for library preparation.
Sequences were downloaded from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih. gov/). Dot-plot analysis between the plastomes of Taxaceae species and that of C. oliveri was conducted using the Blast program (https://blast.ncbi.nlm.nih.gov). In order to better understand the structure of Taxaceae plastomes, locally co-linear blocks (LCBs) among the 16 plastomes were identified using Mauve v2.4.0 (Darling et al., 2010). For this analysis, the start point of each genome was manually set to the start codon of psbA. The REPuter software (https://bibiserv.cebitec.uni-bielefeld.de/reputer/) was used to locate inverted repeats which flanked the inverted fragment with a minimum repeat size of 8 bp and sequence identity greater than 80% (Kurtz et al., 2001). The minimum number of inversion events required for the gene order transformations was estimated by GRIMM (Tesler, 2002; http:// grimm.ucsd.edu/cgi-bin/grimm.cgi).

Phylogenetic Analyses
To determine the evolutionary relationship of Taxaceae, we performed phylogenetic analyses using 73 protein-coding genes shared by all plastomes. After alignment by MAFFT v7 (Katoh and Standley, 2013), the third codon sites were deleted by DAMBE (Xia and Xie, 2001). We found model "GTR+I+G" is the fittest model for phylogenetic construction by jModelTest (Posada, 2008). Finally, Bayesian inference (BI) and maximum parsimony (MP) phylogenomic tree of 16 plastome sequences based on the protein-coding genes without the third codon were constructed. BI phylogenetic analyses were performed in MrBayes v3.2.3 (Ronquist and Huelsenbeck, 2003). The Markov chain Monte Carlo (MCMC) algorithm was run for 1,000,000 generations with trees sampled every 500 generations. MP phylogenetic analyses were performed in PAUP v4 (Swofford, 2003) using 1,000 bootstrap replicates.

Detection of Isomeric Plastomes
We investigated the presence of isomeric plastomes in three Taxus species (Supplementary Table S3). Voucher specimens of these samples were deposited in Beijing Forestry University, Beijing, China. Primer pairs listed in Supplementary Table S4 were used to amplify DNA fragments specific to the four isomeric plastomes from Taxaceae species. Each of a 25-µl PCR reaction mixture contained 1.5 µl of total genomic DNA as the template, 0.75 µl of each of the primers (10 µmol/L), 12.5 µl of 2× PCR buffer for KOD FX, 5 µl of 2 mM dNTPs, 0.5 µl of KOD FX, and 4 µl of double-distilled water. The PCR conditions were 94°C for 2 min, followed by 35 cycles of 98°C for 10 s, 56°C for 30 s, and 68°C for 1 min.

Diversification in the Structure of Taxaceae Plastomes
In this study, we found that the rpl23-rps3 cluster (cyan solid boxes in Figure 1) was adjacent to the unique rRNA operon (rrn16, rrn23, rrn4.5, and rrn5). Through a careful check of the conserved gene order as suggested by Wu et al. (2011), we further determined that the lost IR copy was likely to be IRA, rather than IRB. Our dot-plot analyses indicated that there were two inverted fragments. Fragment of infA-rps12 was approximately 18 kb in length and hereafter termed as R1. Fragment of trnQ-IR was approximately 34 kb in length and resulted from the full duplication of trnQ-UUG gene, so it was hereafter termed as R2. A relocated fragment of petN-psbM in R2 separated R2 into two parts, namely, psbK to trnC-GCA (~18 kb) and trnD-GUC to trnT-UGU (~16 kb) (Figures 2 and 3 and Supplementary  Figure 1).
We categorized the Taxaceae plastomes into four forms, namely, A, B, C, and D, based on the relative location and orientation of R1 and R2 (Figure 2 and Supplementary  Figure 1). Plastome assembly form A is considered as the reference hereafter. Form B plastomes carried an inverted R2, resulting in increased adjacencies between chlB and rps4, psbK and trnL and reduced adjacencies between chlB and psbK, rps4 and trnL (Figure 4). Form C plastomes differed from those of form A in the arrangement of R1 and petN-psbM and exhibited increased adjacencies between clpP and rps12, infA and rps8, trnD and psbM, and petN and trnC and reduced adjacencies between clpP and infA, rps12 and rps8, trnC and psbM, and petN and trnD (Figure 4). Form D plastomes had R1 and R2 arrangements distinct from those of the other three forms.
To better understand the plastome organization of Taxaceae species, we compared the plastomes of 15 Taxaceae species or varieties with that of C. oliveri and determined 11 LCBs (Figure 3). We found the form A plastomes were characterized by +R1 and +R2 (+ denotes the forward strand and − denotes the reverse strand); for example, in Pseudotaxus chienii W. C. Cheng and six other Taxus species or varieties, namely, T. cuspidata, T. fauna, T. mairei, T. wallichiana var. chinensis, T. wallichiana XI3, and T. wallichiana XU10. The form B plastomes, characterized by +R1 and −R2, were found in four Taxus species, namely, T. baccata, T. mairei NN014, T. mairei SNJ046, and T. mairei WC052. The form C plastomes, characterized by −R1 and +R2, were found in Amentotaxus argotaenia Pilger and Amentotaxus formosana H. L. Li. The form D plastomes were characterized by −R1 and −R2 and were found in Torreya fargesii Franchet and Torreya grandis Fortune. Among the 15 Taxaceae plastomes, we observed that the organization of T. mairei plastome was of form A and the Frontiers in Genetics | www.frontiersin.org January 2020 | Volume 10 | Article 1295 FIGURE 2 | Dot-plot analyses of the four Taxaceae plastomes. The Cephalotaxus oliveri plastome (horizontal axes) was used as the reference. A positive slope denotes that the plastomic sequences of the two species can be aligned in the same orientation, whereas a negative one indicates that the two sequences are in the opposite orientations. Labeling of the genes is based on their corresponding positions in the C. oliveri plastome. The red line represents the region between infA and rps12 and the blue line represents the region between trnQ-IRs. These two regions were termed R1 and R2, respectively. The form (A) plastomes were characterized by +R1 and +R2, the form (B) plastomes were characterized by +R1 and −R2, the form (C) plastomes were characterized by −R1 and +R2 and the form (D) plastomes were characterized by −R1 and −R2. Plastome form of Taxus cuspidata that showed in the rectangle with dashed lines represents the structure discovered by PCR. Plastome form of T. cuspidata that showed without the rectangle is based on the plastome sequence assembled from next-generation sequencing data.
plastomes from three Taiwan T. mairei individuals (NN014, SNJ046, and WC052) were of form B, indicating that plastomes of these two forms are present in T. mairei (Figure 3).

Experimental Validation of the Four Arrangement Forms
We next performed PCR to verify the presence of these arrangement forms in plants of Taxaceae species (Figure 4). We found that 20 T. cuspidata individuals from 20 natural populations harbored two isomeric plastomes, namely, forms A and B, but not forms C and D (Figure 3, Supplementary  Figure 2, and Supplementary Table S5). However, we did not find isomeric plastomes in T. fauna and T. wallichiana.

Evolutionary Path of Plastome IR Formation in Taxaceae
We detected several pairs of 9-to 12-bp inverted short repeats that specifically flanked the region of R1 in Taxaceae species, and these repeats were similar to those in species within the same genus ( Figure 5). However, we failed to detect any inverted short repeats larger than 9 bp within the petN-psbM fragment. The inverted fragment R2 was flanked by the trnQ-IR that resulted from the full duplication of the trnQ-UUG gene of which the length ranged from 114 to 560 bp in the 15 Taxaceae plastomes ( Table 2). As aforementioned, we identified 11 LCBs among the plastomes of the 15 Taxaceae individuals (Figure 3). The arrangements of these LCBs and that of C. oliveri were used to infer the most parsimonious rearrangement scenario that will provide information for the dissection of plastomic inversion history in Taxaceae species. The minimum number of inversion events required for the transformation from the C. oliveri plastome to the plastomes of forms A, B, C, and D was five, six, three, and four, respectively ( Figure 6). As for the first transformation, the inversion of block 4 (clpP) took place, and then the inversion of block 8 (petN-psbM) occurred in one clade, generating the Taxus and Pseudotaxus plastomes. The other clade underwent the inversion of block 1 (psbA) and became the Amentotaxus and Torreya plastomes.
A total of 41,596 aligned sites, including 5,357 variable and 2,849 parsimony-informative sites, were used to reconstruct the phylogenetic tree. MP and BI results both suggest that Taxaceae are separated into two clades: one including Taxus and Pseudotaxus and another formed by Amentotaxus and Torreya at the basal position (Supplementary Figure 3). These phylogenetic relationships are consistent with our GRIMM analysis results which also recovered these two clades (the A/B and C/D plastome forms) ( Figure 6). Additionally, species from the same genus were placed on a single clade. However, neither T. wallichiana nor T. mairei appear to be monophyletic species in our tree (Supplementary Figure 3).

DISCUSSION
Issues related to the IR region have been a research focus for several decades (Kolodner and Tewari, 1979;Zhu et al., 2016). In recent years, the rapidly increasing genomic resources of plants in the five gymnosperm groups have revealed considerable variations in plastomic structure (Raubeson and Jansen, 1992;Wu et al., 2009;Lin et al., 2012). The presence of only one IR copy in Pinaceae and Cupressophytes species was considered as a homologous character (Raubeson and Jansen, 1992). However, recent comparative analyses of conifer plastomes uncovered distinct evolutionary patterns in Pinaceae and cupressophytes plastomes, namely, the former lost IRB and the latter lost IRA (Wu et al., 2011;Hao et al., 2016). Due to the presence of isomeric plastomes, it has been difficult to determine which IR copy was lost (Yi et al., 2013). In this study, using a comparative plastome profiling, we revealed that the plastomes of all the 12 species from four different genera of Taxaceae contained an rpl23-rps3 cluster that is located downstream of the IRB, thus demonstrating that Taxaceae plastomes lack IRA (Figure 1).
IRs trigger homologous recombination and the coexistence of two isomeric plastomes has been found at the infraspecific level in numerous species (Palmer, 1983). Despite the absence of IRs, conifers contain isomeric plastomes that are produced by inversions, and these inversions are mediated by specific short IRs . In our study, we revealed four new forms of organization in Taxaceae plastomes, which have possibly been generated by the rearrangements of two large inverted fragments (R1: infA-rps12 and R2: trnQ-IR) that are flanked by specific short repeats ( Figure 5 and Table 2). We experimentally verified the presence of the isomeric arrangements of R2 inversion in T. cuspidata. However, we failed to identify isoforms in T. fauna and T. wallichiana. Diversified plastomic structure has also been found to be ubiquitous in other conifers. Four distinct types, generated by two inverted fragments (F1: trnR-UCU to trnE-UUC; F2: rps4 to trnG-GCC), have been uncovered in Pinaceae, and these two inverted fragments are flanked by Pinaceae-specific repeats (Tsumura et al., 2000;Wu et al., 2011). In cupressophytes, several types of inversion have been discovered. For example, a 36-kb inversion mediated by the 216-to 552-bp trnQ-IR has been found in the isomeric plastomes of Cephalotaxus (Yi et al., 2013), Juniperus , Pseudotaxus, and Taxus species (Fu et al., 2019). Besides, a 73-kb inversion in the isomeric plastomes of Sciadopitys species is mediated by a 370-bp IR (Hsu et al., 2016), while the isomeric plastomes of Calocedrus species are found to harbor a 34-kb rpl23-trnL/CAA inversion mediated by an 11-bp IR (Qu et al., 2017). Dealing with the rearrangements of two specific inverted fragments, our study provides novel insights into the structural diversification of Taxaceae plastomes. Phylogeny of the Taxaceae genera has been controversial for many years (Lu et al., 2014;Wu and Chaw, 2016;Zhang et al., 2019). In this study, 73 shared protein-coding genes were used to determine the relationships among the four genera. We successfully obtained a high resolution of evolutionary relationships for the four genera and placed them on two clades of Taxaceae. Both BI and MP trees showed Amentotaxus and Torreya to branch just after the outgroup (Supplementary Figure 3). Our results are consistent with the phylogenic tree based on the rps3 gene (Ran et al., 2010) and transcriptomic data (Ran et al., 2018). Our rearrangement analyses by GRIMM provided the most parsimonious evolutionary scenario, which is consistent with the phylogenic tree ( Figure 6).
It should be noted that several issues remain to be addressed. For example, the identification of the diversified plastomic structures here was based on the published Taxaceae plastomic sequences. Whether these diversified structures exist in other uncharacterized Taxaceae species needs further investigation. Besides, the presence of isomeric plastomes in Taxaceae species was only verified within only a few individuals. At the population level, however, their distribution remains unknown. Further exploration, such as determining whether there is a bias in the structure of plastome at the population level, will surely provide more information regarding the evolution of plastomes.

CONCLUSIONS
Using next-generation sequencing and de novo assembly, we obtained the complete plastomes of four individuals from three Taxus species and profiled their structures along with 11 published Taxaceae plastomes. Our analyses revealed that in Taxaceae plastomes, the rearrangements of two large fragments could lead to the genesis of the four distinct plastome forms. Further experiments verified the presence of two isomeric plastomes in T. cuspidata individuals. Based on these findings, both rearrangement analyses and phylogenetic results indicated that Taxaceae were separated into two clades, one including Taxus and Pseudotaxus and another formed by Amentotaxus and Torreya. These findings highlight that the evolution of plastomes FIGURE 6 | The putative transformation from the plastome of Cephalotaxus oliveri to the four diversified Taxaceae plastomes. Eleven locally collinear blocks (LCBs) were denoted using different colors. Gene orientations were shown by placing them inside or outside of the circles. The minimum number of inversion events required for the transformation from the plastome of C. oliveri to the Taxaceae plastomes was calculated using GRIMM. may be more complicated than previously thought. The information provided here will significantly advance our understanding of the dynamic and complex evolution of plastomes in conifers.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in NCBI SRA BioProject IDs PRJNA554197, PRJNA554072, PRJNA554062 and PRJNA553347.

AUTHOR CONTRIBUTIONS
FD designed the study. YZ, YX, and HC performed the analysis. FD, YZ, KY and LW wrote the manuscript. All the authors revised the manuscript.

ACKNOWLEDGMENTS
We thank suggestions from Dr. Saneyoshi Ueno working in Forest Research and Forest Products Research Institute, Japan, for comments on the ms; Prof. Yanhong Liu, Dr. Wande Liu, Prof. Deyou Qiu, Prof. Zhongling Guo, Prof. Gang Pan, Dr. Bin Tian, and Jia Song for sampling; and Dr. Jinhua Ran for discussion of the phylogeny of Taxaceae.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019. 01295/full#supplementary-material SUPPLEMENTARY FIGURE S1 | Comparative dot-plot analyses between the plastome of Cephalotaxus oliveri and the four Taxaceae plastomes. A positive slope denotes that the two sequences (horizontal and vertical axes) are matched in the same orientation, whereas a negative one indicates that the two sequences can be aligned but in the opposite orientations. Genes are labeled based on their corresponding positions in the C. oliveri plastomes.
SUPPLEMENTARY FIGURE S2 | The PCR results of the verification of isomeric plastomes in 20 individuals of T. cusipidata. The codes in the picture stand for different combination of individuals and primer pairs can be found in Table S5.
SUPPLEMENTARY FIGURE S3 | Phylogenetic relationships of the Taxaceae plastomes. The trees inferred from Bayesian inference (BI) and Maximum Parsimony (MP) are based on a nucleotide supermatrix of 73 shared protein-coding genes excluding the third codon positions. The BI topology is shown with BI posterior probability/MP bootstrap values given at each node.