The Complete Plastome Sequences of Seven Species in Gentiana sect. Kudoa (Gentianaceae): Insights Into Plastid Gene Loss and Molecular Evolution

The chloroplast (cp) genome is useful in the study of phylogenomics, molecular dating, and molecular evolution. Gentiana sect. Kudoa is a predominantly alpine flowering plant that is valued for its contributions to medicine, ecology, and horticulture. Previous evolutionary studies showed that the plastid gene loss pattern and intra-sectional phylogenetics in sect. Kudoa are still unclear. In this study, we compared 11 Gentiana plastomes, including 7 newly sequenced plastomes from sect. Kudoa, to represent its three serious: ser. Ornatae, ser. Verticillatae, and ser. Monanthae. The cp genome sizes of the seven species ranged from 137,278 to 147,156 bp. The plastome size variation mainly occurred in the small single-copy and long single-copy regions rather than the inverted repeat regions. Compared with sect. Cruciata, the plastomes in ser. Ornatae and ser. Verticillatae had lost approximately 11 kb of sequences containing 11 ndh genes. Conversely, far fewer losses were observed in ser. Monanthae. The phylogenetic tree revealed that sect. Kudoa was not monophyletic and that ser. Monanthae was more closely related to other sections rather than sect. Kudoa. The molecular dating analysis indicated that ser. Monanthae and sect. Kudoa diverged around 8.23 Ma. In ser. Ornatae and ser. Verticillatae, the divergence occurred at around 0.07–1.78 Ma. The nucleotide diversity analysis indicated that the intergenic regions trnH-psbA, trnK-trnQ, ycf3-trnS and rpl32-trnL constituted divergence hotspots in both sect. Kudoa and Gentiana, and would be useful for future phylogenetic and population genetic studies.


INTRODUCTION
Encompassing 15 sections and 362 species (Ho and Liu, 2001), Gentiana is the largest genus in the family Gentianaceae (He, 1988). Gentiana is predominantly alpine and occurs in numerous mountain systems of the world. Hosting c. 250 species (Ho and Pringle, 1995), the mountain ranges surrounding the Qinghai-Tibetan Plateau (QTP) are the main diversity center of Gentiana. Gentiana has been widely used in traditional Chinese and Tibetan medicines and acts as an edificator in the QTP alpine meadows (Ho and Liu, 2001). In light of their chemical and horticultural value, several species have already been cultured (Rybczyński et al., 2015). Section Kudoa (Masamune) Satake & Toyokuni ex Toyokuni is characterized by roots arising from a collar and stems branching monopodially. This section contains three series and 28 species (Ho and Liu, 2001). Series Ornatae Marquand is the biggest series in sect. Kudoa and contains 16 species. The species in ser. Ornatae have showy flowers, and several of them have been domesticated for horticultural gardening (Rybczyński et al., 2015). ser. Monanthae (H. Smith) T. H. Ho and Series Verticillatae Marquand contain four and eight species, respectively (Ho and Liu, 2001).
A comparative analysis of the chloroplast genomes between sect. Kudoa and sect. Cruciata Gaudin indicated a loss of an approximately 10 kb sequence that mainly comprised 11 ndh genes in Gentiana lawrencei var. farreri, which belongs to ser. Ornatae . Variable ndh gene loss has been reported in other plant groups such as orchid (Barrett and Davis, 2012;Yang et al., 2013;Lin et al., 2015;Niu et al., 2017). To further assess whether the ndh gene has been lost in sect. Kudoa, additional taxa should be sequenced and included in the comparative analysis. In addition, the phylogenetic relationships in sect. Kudoa are currently controversial. In the latest classification system proposed by Ho and Liu (2001), sect. Kudoa contained three series. A phylogenetic study of the subtribe Gentianinae based on the intergenic spacer (ITS) region and a plastid fragment included only four taxa from sect. Kudoa, which clustered into a monophyletic group (Favre et al., 2010). In another phylogenetic study on Gentiana based on ITS and two plastid fragments, 18 taxa from sect. Kudoa were included and grouped into three clades (Favre et al., 2016), indicating that sect. Kudoa was not monophyletic, and that ser. Verticillatae was embedded in ser. Ornatae. The molecular phylogenetic relationship is not consistent with the classification system proposed by Ho and Liu (2001). Whether the plastid ndh gene loss pattern in sect. Kudoa is helpful to the intra-sectional phylogenetics is worth exploring.
The cp genomes are circular DNA molecules in angiosperms that range in size from 120 to 160 kb and contain 110-130 genes (Palmer, 1985). In land plants, the cp genome typically contains a pair of inverted repeats (IRs) that separate the remaining regions into one large single-copy region (LSC) and one small singlecopy region (SSC) (Palmer, 1985;Jansen et al., 2005). The cp genome is recognized as the "workhorse" in plant systematics research due to its uniparental inheritance, haploid nature, highly conserved structures, and slower evolutionary rate of change compared to nuclear genomes (Wolfe et al., 1987;Shaw et al., 2014). Plastid phylogenomics has been widely applied to reassess classifications, for example, the reassessment of Alismatales (Ross et al., 2015), Rosaceae (Zhang S.D. et al., 2017), Gaultheria series Trichophyllae (Zhang M.Y. et al., 2017), and Leptaspis and Streptochaeta in Poaceae (Burke et al., 2016). In addition to phylogenetic classification, the cp genome is widely used in studies of molecular identification, divergence dating, and molecular evolution (Nikiforova et al., 2013;Carbonell-Caballero et al., 2015). Comparative cp genome analysis can reveal insights into the evolution of the cp genome, including sequence inversion (Cho et al., 2015) and gene loss (Wakasugi et al., 1994;Millen et al., 2001), and has been used in the identification of mutational hotspots for the screening of the most informative regions (e.g., Ahmed et al., 2013;Niu et al., 2017).
Presently, only four complete cp genomes have been sequenced in Gentiana, in which three belong to sect. Cruciata (Ni et al., 2016a,b) and one belongs to sect. Kudoa . The development of more genomic resources for Gentiana should inform our understanding of the phylogenetic relationships and evolutionary history of this large genus. In this study, we focused on sect. Kudoa and sequenced the complete cp genomes of seven species in this section. Based on a comparative analysis of these seven species, as well as four species in sect. Cruciata and sect. Kudoa with available genomes, the genome structure, gene loss, phylogenetic relationships, divergence times and mutational hotspots of sect. Kudoa were analyzed to discover (1) gene loss pattern, particularly ndh, in sect. Kudoa, and (2) plastome phylogenetic implication in sect. Kudoa. This study also makes available sequence information for phylogenetic and evolutionary studies of Gentiana.

Sample Collection, Genome Sequencing, and Assembly
A total of seven species were sampled in the QTP (Supplementary  Table S1) to represent all three series of sect. Kudoa. Five species (G. veitchiorum Hemsley, G. ornata Grisebach, G. caelestis H. Smith, G. obconica T. N. Ho, and G. oreodoxa H. Smith) belong to ser. Ornatae, one (G. stipitata Edgeworth) belongs to ser. Monanthae, and one (G. hexaphylla Maximowicz ex Kusnezow) belongs to ser. Verticillatae. The species were identified by Dr. Peng-Cheng Fu and Dr. Shi-Long Chen. Voucher specimens were deposited in the herbarium of the College of Life Science, Luoyang Normal University. The samples were collected from a single plant of each species. Total genomic DNA isolation, DNA fragmentation, and sequencing library construction followed the process described in Fu et al. (2016). Based on the genome size of some Gentiana taxa (Mishiba et al., 2009) and reported examples of sequenced cp genomes , we expected to obtain approximately 5 Gb raw data for each species. The fragmented genomic DNA of the seven Gentiana species was sequenced using the Illumina HiSeq 4000 platform (Novogene, Tianjing, China), yielding 150-bp paired-end reads from a library of approximately 300-bp DNA fragments.
Reads corresponding to plastid DNA were identified using a BLASTN (E-value: 10 −6 ) search against the plastome sequences of G. lawrencei var. farreri (GenBank accession no. KX096882). The recovered reads were assembled using Velvet 1.2.10 (Zerbino and Birney, 2008). Detailed information regarding the raw reads for each taxon is presented in Supplementary Table S2. All the genomic regions located at the junction between the two assembled contigs were verified by Sanger sequencing. The primers used were designed by PRIMER V. 5.0 software and are listed in Supplementary Table S3. The plastome sequences of the seven species were deposited in GenBank (MG192304-MG192310).

Genome Annotation
For each species, the protein coding genes (PCGs), rRNAs, and tRNAs in the cp genome were predicted and annotated using Dual Organellar GenoMe Annotator (DOGMA) using the default parameters (Wyman et al., 2004). The positions of the start and stop codons, or intron/exon junctions of the PCGs, were manually corrected using a BLAST search against reported cp genomes of other closely related species. The cp gene maps of the seven species were drawn using OGDraw V. 1.2 (Lohse et al., 2007).

Comparative Analysis
In addition to the seven newly sequenced species, the cp genome sequences of G. lawrencei var. farreri (KX096882), which belongs to ser. Ornata, and G. straminea (KJ657732), G. robusta (KT159969), and G. crassicaulis (KJ676538), which belong to sect. Cruciata, were obtained for comparative analysis from the National Center for Biotechnology Information. Genome comparisons were performed to identify the differences among the 11 taxa using mVISTA (Frazer et al., 2004) and Geneious Basic 5.6.4 (Kearse et al., 2012). To identify divergence hotspots, nucleotide diversity (Pi) was determined using DnaSP V. 5.10 (Librado and Rozas, 2009).

Molecular Dating
The PCG dataset was used to estimate divergence times using the Bayesian method implemented in the program BEAST 1.7.5 (Drummond et al., 2012) under the GTR+I+G substitution model, the Yule model, and an uncorrelated lognormal clock model (Drummond et al., 2006). Due to the limited fossils available for Gentiana, we constrained only one of the nodes with a seed fossil of sect. Cruciata. For the seed fossil, we used lognormal priors with an offset at 5.0 Ma, a mean of 0.7, and a standard deviation of 1.0, as applied by Pirie et al. (2015) and Favre et al. (2016). We did not use uniform priors for sect. Cruciata, as they were rejected following comparison with the lognormal priors provided by Favre et al. (2016). We ran three independent Markov chain Monte Carlo analyses for 50 million generations, sampling every 5,000th generation. We assessed the convergence of the estimated parameters in Tracer 1.5 (Rambaut and Drummond, 2010), ensuring that the effective sample size values exceeded 200. Trees were summarized in TreeAnnotator 1.7.5 (Drummond et al., 2012) after setting the burn-in to 10%, and then visualized in FigTree 1.4.0 1 .

Features of the Seven Newly Sequenced Plastomes
Complete plastome sequences of seven Gentiana species were newly sequenced in this study and deposited in GenBank. The seven plastid genomes constituted closed circular molecules whose sizes ranged from 137,278 to 147,156 bp with an average of 138,822 bp ( Table 1). Each cp genome comprised a pair of IR regions (IRa and IRb), one LSC region, and one SSC region. They all possessed the overall typical quadripartite structure that resembles the majority of land plant cp genomes (Shinozaki et al., 1986). The IR regions of the 7 species ranged from 23,864 to 25,229 bp; the LSC regions ranged from 77,754 to 79,712 bp; and the SSC regions ranged from 11,353 to 16,986 bp ( Table 1). Gentiana stipitata possessed the longest LSC, SSC, and IR regions of the seven species. The average GC contents of the LSC, SSC, and IR regions and the whole cp genome in the seven species were 35.7, 30.6, 43.8, and 36.8%, respectively, which corroborates other reported Gentiana cp genomes Ni et al., 2016a,b). Furthermore, these plastid genomes were similar in structure and gene arrangement to previously published Gentiana plastomes Ni et al., 2016a,b). All the plastome maps are presented in Supplementary Figures S1-S7.

Comparison of cp Genomes
The comparative analysis indicated that the six species in Gentiana ser. Ornatae possessed very similar plastomes, with genome sizes ranging from 137,278 to 138,750 bp. The only obvious difference was located at the boundary between IRb and SSC (Figure 1A), in which four sequence patterns were detected. The first pattern, which appeared in G. lawrencei var. farreri, possessed almost all the sequences of ycf 1 and most of the forward sequences of ndhF ( Figure 1A). The second pattern, appearing in G. caelestis and G. ornata, possessed the forward 416 bp of ycf 1, but had lost the forward sequence of ndhF ( Figure 1A). In comparison with the second pattern, the third pattern only possessed the forward 181 bp of ycf 1 (Figure 1A). The third pattern appeared in G. obconica, G. veitchiorum, and G. oreodoxa. In sect. Kudoa, G. hexaphylla from ser. Verticillatae also possessed the third pattern. However, G. stipitata from ser. Monanthae possessed all of the whole sequences of ycf 1 and ndhF, constituting the fourth pattern. This species differed from all the other taxa in sect. Kudoa, but did not differ from the three taxa in sect. Cruciata (Figure 1A).
The comparative analysis in Gentiana revealed that the variation in plastid genome size in the 11 plastomes could mainly be attributed to sequence loss in four locations. One of these locations was the IRb-SSC boundary mentioned above. The second was the region between ccsA to rps15, in which about 5 kb had been lost in some taxa. The lost sequences mainly contained a small section of ndhD, the majority of ndhE and ndhH, and all of ndhG, ndhI, and ndhA ( Figure 1B). These ndh genes are shown in yellow in the Supplementary Figures S1-S7. The third was the region between trnF-GAA and trnV-UAC, in which about 2.2 kb had been lost in some taxa. The lost sequences mainly contained all of ndhJ, ndhK, and ndhC ( Figure 1C). The fourth was the region between trnL-CAA and rps7, in which about 1 kb had been lost in some taxa. The lost sequences mainly contained the whole exon 1 and part of the intron of ndhB ( Figure 1D). All the sequences that have been lost in the three regions were missing in the species of ser. Ornatae and ser. Verticillatae, but present in ser. Monanthae and sect. Cruciata.

Phylogenetic and Molecular Dating Analyses
The ML phylogenetic tree constructed using the 46 PCGs clearly identified the three families (Gentianaceae, Apocynaceae, and Rubiaceae) as being monophyletic with high bootstrap support (Figure 2). Two monophyletic groups were identified within the Gentianaceae clade in these analyses. Taxa from ser. Ornatae and ser. Verticillatae clustered into one monophyletic group whose monophyletic sister group contained taxa from sect. Cruciata and ser. Monanthae (Figure 2). The ML tree showed that G. stipitata was more closely related to other sections rather than sect. Kudoa.

Divergence Hotspots in Gentiana
The coding genes, introns, and non-coding regions were compared to detect divergence hotspots. We compared all 11 species mentioned above as well as the 8 species in sect. Kudoa. A total of 114 regions (49 coding genes, 9 intron regions, and 55 intergenic regions) greater than 200 bp were generated in both comparisons.

Loss of Plastid ndh Genes in Gentiana sect. Kudoa
Variations in plastome length were detected in Gentiana. Compared with sect. Cruciata, ser. Ornatae and ser. Verticillatae lost approximately 11 kb of sequences, while ser. Monanthae lost almost none. The plastomes of ser. Ornatae and ser. Verticillatae were highly similar in size and structure. The majority of length variation of the newly sequenced plastomes The chloroplast genomes of sect. Cruciata were downloaded from GenBank, while the remainder were sequenced in this study.
Frontiers in Plant Science | www.frontiersin.org of ser. Ornatae and ser. Verticillatae in this study mainly occurred in the SSC and LSC regions, rather than the two IR regions. The length variation pattern is similar to that observed in G. lawrencei var. farreri , whose genome size variation was not caused by deletions in the IR regions, but by deletions in the SSC and LSC regions. An explanation for the size variation pattern is that the junction between the IR and LSC region is located within the rps19 gene, which is a coding gene, and thus contributes to the more constant size of the IRs than the SSC and LSC region in the great majority of angiosperms (Palmer, 1985;Fu et al., 2016). As noted in G. lawrencei var. farreri , the genome size variation led to the loss of plastid ndh genes. Plastid ndh genes have been retained in the majority of higher plants (Martín and Sabater, 2010), and appear to have been frequently lost in parasitic and epiphytic plants (e.g., Stefanović and Olmstead, 2005). Along with the publication of numerous plastomes, the independent loss of ndh genes has been detected in increasing numbers of higher plants including orchids (Chang et al., 2006;Yang et al., 2013;Lin et al., 2015;Niu et al., 2017), gnetophytes (Braukmann et al., 2009;Wu et al., 2011), slender naiads (Peredo et al., 2013), and saguaros (Sanderson et al., 2015). The independent ndh gene loss in   various groups could be an example of convergent evolution in plants. In Gentiana, the loss of ndh genes was previously detected in G. lawrencei var. farreri, which belongs to sect. Kudoa, but was not detected in the other three previously sequenced plastomes Ni et al., 2016a). Upon analysis of the plastomes of sect. Kudoa, we discovered that the loss of ndh genes was common in ser. Ornatae and ser. Verticillatae. The 11 ndh genes in the plastome encode a protein complex that catalyzes the transfer of electrons from NADH to plastoquinone at photosystem I (Sazanov et al., 1998;Martín and Sabater, 2010). However, the PGR5-dependent cyclic electron transport pathway already exists in cells. Transgenic plants defective in ndh genes showed impaired photosynthesis rates, demonstrating that the NDH complex is required for the optimization of photophosphorylation rates and might play an important role in regulating CO 2 assimilation under stress conditions (Wang et al., 2006;Martín and Sabater, 2010). However, no deleterious effects have been observed in ndhdeficient mutants under favorable growing conditions (Ruhlman et al., 2015). This suggests that the plastid ndh genes might be dispensable in contemporary plants. A plastome study in orchids proposed that the expansion/contraction of IR boundaries might be associated with the loss of ndh genes, especially ndhF Niu et al., 2017). Previous studies have shown that the expansion of IRs is common to ndh-absent plastomes (Niu et al., 2017). In Gentiana, it is likely that the expansion/contraction of IR boundaries is correlated with the deletion of the ndh gene, particularly ndhF. However, the contraction, rather than expansion, of IRs was observed in ndhdeleted Gentiana plastomes. This suggests that the evolution of ndh genes in plastomes may vary between different taxa, and thus requires further exploration.
The loss of plastid ndh genes was common in sect. Kudoa, with the exception of ser. Monanthae. Compared with sect. Cruciata, ser. Monanthae did not exhibit significant size variation. As observed in sect. Cruciata, G. stipitata in ser. Monanthae maintained all 11 plastid ndh genes. This suggests that ser. Monanthae might be evolutionarily more closely related to other sections rather than sect. Kudoa (although cp genome sequences from additional sections are necessary to confirm this). Considering that gene loss in plastomes is an ongoing process in evolution (Martin et al., 1998), ser. Ornatae and ser. Verticillatae may have shorter evolutionary histories than ser. Monanthae.

Phylogenetic Relationships and Divergence Times
Plastid phylogenomics has been successfully applied in the phylogenetic reassessment of several groups (Burke et al., 2016;Zhang M.Y. et al., 2017;Zhang S.D. et al., 2017). In this study, the phylogenetic relationships constructed using the 46 PCGs were consistent with previous studies whereby the three families (namely Gentianaceae, Apocynaceae, and Rubiaceae) were classified as three monophyletic clades, and also identified Rubiaceae as the base group in Gentianales (Backlund et al., 2000).
The delimitation of Gentiana sect. Kudoa has been controversial. Ho (1985) and Ho and Pringle (1995) erected four serious in sect. Kudoa: ser. Monanthae, ser. Ornatae, ser. Verticillatae, and ser. Apteroideae (H. Smith) T. N. Ho. Based on 61 informative characters from morphology, palynology, and cytology, Ho et al. (1996)   Kudoa was accepted by Ho and Liu (2001), and was also corroborated by the phylogenetic analysis of Favre et al. (2016) based on atpB-rbcL, trnL-trnF, and ITS. The taxa from sect. Monopodiae, namely Kudoa I in Favre et al. (2016), constituted a monophyletic group and were paraphyletic with the other sections (Favre et al., 2016). We therefore adopted the treatment of three series in sect. Kudoa as the starting point of this study.
Contrary to the classical classification, molecular phylogeny shows that sect. Kudoa is paraphyletic. Although the phylogeny of sect. Kudoa was not mainly discussed in Favre et al. (2016), the phylogenetic trees in their study also indicated that the three series are not a monophyletic group. Our plastid phylogenomic relationships is in accord with this. The cp-based phylogenomic tree suggested that ser. Monanthae was more closely related to other sections rather than sect. Kudoa. The previous phylogenetic trees reconstructed based on atpB-rbcL, trnL-trnF and ITS data indicated that ser. Monanthae, sect. Isomeria and sect. Microsperma clustered into one clade (Favre et al., 2016). The instability of ser. Monanthae in the phylogenetic tree may suggest a potential hybrid origin of this group, since they have a partly sympatric distribution and nearly contemporaneous flowering times to other groups like sect. Kudoa and sect. Cruciata (Ho and Liu, 2001), and furthermore, hybridization is an important means of speciation in the QTP (Li et al., 2007;Liu et al., 2014;Wen et al., 2014). In terms of morphology, ser. Verticillatae is characterized by whorled phyllotaxy. However, our phylogenomic tree showed that ser. Verticillatae did not constitute a monophyletic group, but was rather inlaid in ser.
Ornatae. This is also supported by the phylogenetic tree based on atpB-rbcL, trnL-trnF, and ITS (Favre et al., 2016). The results suggested that the present classification in sect. Kudoa based on morphology is not consistent with the molecular phylogenetic reconstruction. Therefore, the phylogenetic relationships in sect. Kudoa require further evaluation.
Molecular dating analysis in this study showed that ser. Monanthae and sect. Cruciata diverged around 6.11 Ma (95% HPD: 5.06-9.25 Ma) when sect. Cruciata diverged from its sister groups (Favre et al., 2016). The divergence of the ser. Verticillatae and ser. Ornatae clade occurred at around 0.07-1.78 Ma, which is much younger than 1.34-9.55 Ma estimated in Favre et al. (2016). Since the ndh gene loss indicated that ser. Verticillatae and ser. Ornatae were younger than ser. Monanthae, we believe that the time of divergence between ser. Verticillatae and ser. Ornatae found in this study may be true. We recommend that the plastomes and ndh gene remain/loss patterns of more taxa that are closely related to sect. Kudoa are included in future studies.

Mutational Hotspot
The comparative analysis indicated that the plastomes of ser. Ornatae and ser. Verticillatae exhibited little variation. The popular barcoding plastid markers such as matK and rbcL (Li et al., 2015;Coissac et al., 2016) exhibited very poor sequence variation in the two series. Some popular genetic markers used in intra-species population genetics such as trnS-trnG, rps15-ycf1, and rpl20-rps12 (e.g., Burnier et al., 2009;Takayama et al., 2013;Khan et al., 2016) show very limited variation in the populations of some species such as G. veitchiorum and G. lawrencei var. farreri (Fu, Unpublished Data). Additionally, trnQ-rps16 (Shepherd et al., 2017) was not detected as rps16 is a pseudogene in Gentiana (Ni et al., 2016a). Since the plastome-wide comparisons could facilitate the screening of mutational hotspots used for inter-species phylogenetics (Shaw et al., 2014) and intra-species discrimination (Ahmed et al., 2013), suitable molecular markers for phylogenetic and population genetic studies could be identified in the mutational hotspots in Gentiana, particularly in sect. Kudoa.

CONCLUSION
The complete cp genome sequences of seven species from Gentiana sect. Kudoa were reported in this study, and the evolutionary characteristics of 11 Gentiana plastomes from two sections were described. We discovered that the loss of plastid ndh genes is common in ser. Ornatae and ser. Verticillatae, but not in ser. Monanthae. The phylogenetic tree and deletion patterns in the plastid ndh genes indicated that ser. Monanthae is more closely related to other sections rather than sect. Kudoa, which is not monophyletic. The sequence and divergence hotspot information presented here could be useful in future studies on the population genetics, phylogenetics, and evolution of Gentiana.

AUTHOR CONTRIBUTIONS
S-SS designed the experiments, performed the experiments, analyzed the data, prepared the figures and/or tables, and revised the manuscript. P-CF conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, and prepared the figures and/or tables. X-JZ and Y-WC performed the experiments and analyzed the data. F-QZ and S-LC collected the samples and reviewed the drafts of the paper. Q-BG analyzed the data, prepared figures and/or tables, reviewed the drafts of the paper, and revised the manuscript.