Abstract
Euphrasia (Orobanchaceae) is a genus that is widely distributed in temperate regions of the southern and northern hemispheres. The taxonomy of Euphrasia is still controversial due to the similarity of morphological characters and a lack of genomic resources. Here, we present the first complete chloroplast (cp) genome of this taxonomically challenging genus. The cp genome of Euphrasia regelii consists of 153,026 bp, including a large single-copy region (83,893 bp), a small single-copy region (15,801 bp) and two inverted repeats (26,666 bp each). There are 105 unique genes, including 71 protein-coding genes, 30 tRNA and 4 rRNA genes. Although the structure and gene order are comparable to those of other angiosperm cp genomes, genes encoding the NAD(P)H dehydrogenase complex are widely pseudogenized due to mutations resulting in frameshifts and premature stop codons. We detected 36 dispersed repeats, 7 tandem repeats and 44 simple sequence repeat loci in the E. regelii plastome. Comparative analyses indicated that the cp genome of E. regelii is more conserved than those of other hemiparasitic taxa in the Pedicularideae and Buchnereae. No structural rearrangements or loss of genes were detected. Our analyses suggested that three genes (clpP, ycf2 and rps14) were under positive selection and the other genes under purifying selection. Phylogenetic analysis of monophyletic Orobanchaceae based on 45 plastomes indicated a close relationship between E. regelii and Neobartsia inaequalis. In addition, autotrophic lineages occupied the earliest diverging branches in our phylogeny, suggesting that autotrophy is the ancestral trait in this parasitic family.
Introduction
The chloroplast (cp) is the most important organelle for green plants as it is the place where photosynthesis and carbon fixation occur. The cp genome is uniparentally inherited and generally has a quadripartite structure consisting of one large single-copy (LSC) region, one small single-copy (SSC) region, and two inverted repeat regions (IRs) of the same length (Bendich, 2004). The cp genome is more conserved than the nuclear and mitochondrial genomes in terms of gene structure and composition (Asaf et al., 2017a). Due to its highly conserved and non-recombinant nature, the cp genome has been shown to be a very useful genetic resource for inferring evolutionary relationships at different taxonomic levels (Caron et al., 2000; Cho et al., 2015). Recently, with the advent of next generation sequencing, it has become comparatively easy to sequence the complete cp genome of non-model taxa and infer phylogenetic relationships based on whole plastomes (Ruhsam et al., 2015; Guo et al., 2017; Saarela et al., 2018).
The genus Euphrasia (Orobanchaceae) is widely distributed throughout temperate regions of the southern and northern hemispheres, and contains about 458 species and subspecies, most of which occur in the northern hemisphere (Gussarova et al., 2008; Secretariat, 2017; Moura et al., 2018). Euphrasia plants are either perennial or annual herbs which mainly parasitise the roots of Gramineae species (Wu et al., 2005; Gussarova et al., 2008). Some species in this genus are used as folk medicine to treat diseases such as blepharitis, conjunctivitis and coughs (Li and Wang, 2003). Euphrasia was once included in the tribe Rhinantheae of the Scrophulariaceae but based on molecular data, was moved with all other parasitic plants in this family to Orobanchaceae (Olmstead et al., 2001). Due to frequent autogamy as well as interspecific hybridization and morphological diversity, Euphrasia comprises a taxonomically complex group of taxa where species delimitation remains challenging (Vitek, 1998; French et al., 2008; Gussarova et al., 2008).
In China, 11 species of Euphrasia are currently recognized which are divided into two sections based on morphological characteristics, namely Sect. Semicalcaratae and Sect. Paradoxae (Hong et al., 1998). The annual herb Euphrasia regelii Wettst., belongs to Sect. Semicalcaratae, and is used for the treatment of hyperglycemia, inflammation, hay fever, conjunctivitis, colds, influenza and coughs (Shuya et al., 2004). Due to the medicinal value of E. regelii, research has mainly focused on identifying the effective chemical constituents of this species (Li and Wang, 2003; Shuya et al., 2004). Few studies have been conducted to infer the phylogenetic position of E. regelii or its genetic diversity due to a lack of informative genetic markers. Additionally, research has been hampered because E. regelii is difficult to distinguish from other Euphrasia species due to morphological similarities. Therefore, more discriminating genetic markers are needed to infer the phylogenetic relationship of E. regelii with other Euphrasia taxa and to facilitate reliable genetic authentication of this important medicinal herb. Although the cp genome of some Orobanchaceae species has been sequenced and utilized in phylogenetic studies (Wicke et al., 2013; Samigullin et al., 2016; Zeng et al., 2017), no cp genome which could have been used for the development of new and variable markers has been published for the genus Euphrasia until now.
In this study, we characterize the complete cp genome of E. regelii and compare it with the available cp genomes of Orobanchaceae taxa. Our results will be useful for marker development, species discrimination, and the inference of phylogenetic relationships in the genus Euphrasia.
Materials and Methods
Plant Material and DNA Extraction
Euphrasia regelii was collected from Taibai mountain (107°16′47.172″ E, 33°59′27.1068″ N) in the Chinese province of Shaanxi. Young leaves were put into silica gel for DNA extraction and a voucher specimen was deposited at the herbarium of Xi’an Jiaotong University (XJTU) (Xi’an, China). Total genomic DNA was extracted using a modified CTAB protocol (Doyle, 1987), and the quantity and quality of the extracted DNA were determined by gel electrophoresis and a NanoDrop 2000 Spectrophotometer.
Chloroplast Genome Sequencing and Assembly
The DNA library with an insert size of 270 bp was constructed using TruSeq DNA sample preparation kits and sequenced on an Illumina HiSeq X Ten platform with an average paired-end read length of 150 bp. The raw reads were filtered to obtain high-quality reads by removing adapters, low-quality sequences such as reads with unknown bases (“N”), and reads with more than 50% low-quality bases (quality value ≤ 10) using the NGS QC Toolkit v2.3.3 (Patel and Jain, 2012). To retain only reads from the chloroplast genome, paired-end high-quality reads were mapped to previously published cp genomes in the Orobanchaceae (NC_034308, NC_027838, NC_022859, KF922718, NC_022859; Supplementary Table S1) using Bowtie v2.2.6 with default parameters (Langmead and Salzberg, 2012). Matched paired-end reads were de novo assembled using SPAdes v3.6.0 (Bankevich et al., 2012), and the longest contig was selected as the seed sequence for further assembly using NOVOPlasty v2.6.2 (Dierckxsens et al., 2017). Finally, all the clean reads were mapped to the unannotated cp genome using the Bowtie 2 algorithm in Geneious v10.1 (Biomatters, Ltd., Auckland, New Zealand) in order to detect assembly errors. Seven regions with low coverage were Sanger sequenced (Supplementary Table S2). The cp genome was aligned to its reverse complement to determine the inverted repeat regions. The boundaries of the inverted repeats and single-copy regions were also verified by Sanger sequencing (Supplementary Table S2).
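The read-level filtering rule described above can be illustrated with a minimal sketch. It is not the NGS QC Toolkit implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and the 50% and Q ≤ 10 thresholds are taken from the text.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(c) - 33 for c in qual_string]


def passes_filter(seq, quals, max_low_frac=0.5, low_q=10):
    """Return True if a read survives the two filters described in the text.

    A read is rejected if it contains an unknown base ("N") or if more than
    max_low_frac of its bases have a quality value <= low_q.
    """
    if "N" in seq.upper():
        return False
    low = sum(1 for q in quals if q <= low_q)
    return low <= max_low_frac * len(quals)


# Toy reads: 'I' decodes to Q40 and '+' decodes to Q10 under Phred+33.
print(passes_filter("ACGT", phred33("IIII")))  # clean read
print(passes_filter("ACNT", phred33("IIII")))  # contains N
print(passes_filter("ACGT", phred33("+++I")))  # 75% low-quality bases
```

For paired-end data, one would typically discard a pair when either mate fails; that policy is also an assumption here, as the text does not state it.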
Genome Annotation, Codon Usage, and Repeat Structure
The complete cp genome was annotated using the automatic annotator DOGMA (Wyman et al., 2004) with manual verification via BLAST searches against the cp genomes of other Orobanchaceae species. During the annotation process, open reading frames (ORFs) that could be matched with known cp genes were annotated, and the remaining ORFs lacking protein evidence were disregarded. Genes that contained one or more frameshift mutations or premature stop codons were considered potential pseudogenes. The circular annotated plastid genome map was drawn using the online program OrganellarGenomeDRAW (Lohse et al., 2013), and the annotated genome was deposited in GenBank (MK070895). The codon usage frequency was calculated based on protein-coding genes using MEGA v6 (Tamura et al., 2013). Tandem repeat sequences were searched for using the Tandem Repeats Finder program (Benson, 1999) with the following alignment parameters: 2 for match and 7 for mismatch and indels. Dispersed and palindromic repeats were identified using REPuter with a minimum repeat size of 30 bp and sequence identity of no less than 90% (Hamming distance equal to 3) (Kurtz et al., 2001). Simple sequence repeats (SSRs) were identified using the software MISA (Thiel et al., 2003) with the following minimum number of repeats: 10 for mono-, 5 for di-, 4 for tri-, and 3 for tetra-, penta-, and hexa-nucleotide SSRs.
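To make the SSR thresholds concrete, the following is a minimal sketch of a perfect-repeat scanner that uses the minimum repeat numbers given above. It is not MISA itself: the function names are hypothetical, compound SSRs are not modeled, and the input sequence is a toy example.

```python
# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                reps = 1
                # Extend the run while the next k bases repeat the motif.
                while seq[i + reps * k:i + (reps + 1) * k] == motif:
                    reps += 1
                if reps >= min_rep:
                    hits.append((i, motif, reps))
                    i += reps * k
                    continue
            i += 1
    return sorted(hits)


# Toy sequence: A x10 and AT x5 meet the thresholds, AGC x3 does not.
toy = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GG" + "AGC" * 3
print(find_ssrs(toy))  # [(2, 'A', 10), (14, 'AT', 5)]
```

The REPuter setting can be checked in the same spirit: a Hamming distance of 3 over a 30 bp repeat corresponds to 27/30 = 90% identity.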
Genome Comparison and Sequence Divergence
Eleven plastome sequences, including two from non-parasitic taxa (Rehmannia glutinosa, NC_034308; Lindenbergia philippensis, NC_022859), three from facultative hemiparasites (Triphysaria versicolor, KU212369; Aureolaria virginica, MF780870; Buchnera americana, MF780871), five from obligate hemiparasites (Neobartsia inaequalis, KF922718; Schwalbea americana, NC_023115; Castilleja paramensis, NC_031805; Pedicularis cheilanthifolia, NC_036010; Striga aspera, MF780872) and one from a holoparasite (Lathraea squamaria, NC_027838), were retrieved from GenBank and used in the subsequent analyses. A comparative genomic analysis of the 12 Orobanchaceae plastomes (the 11 retrieved sequences plus E. regelii) was performed and visualized using the mVISTA software (Frazer et al., 2004) with the annotation of R. glutinosa as a reference. Any large structural changes such as gene order rearrangements were recorded using Mauve v1.1.1 with default settings (Darling et al., 2004). IR expansion/contraction in these plastomes was also analyzed. The nucleotide diversity (Pi) and sequence polymorphism of Rhinantheae species were analyzed using DnaSP v6.0 (Rozas et al., 2017). In order to detect whether plastid genes were under selection, the non-synonymous (dN), synonymous (dS), and dN/dS values of 64 protein-coding genes from Rhinantheae species were calculated using the PAML package v4.0 with the YN algorithm (Yang, 2007). Nucleotide substitution rates were not calculated for pseudogenes due to the existence of premature stop codons.
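As an illustration of how a sliding-window nucleotide diversity (Pi) profile is obtained, the sketch below computes the mean pairwise proportion of differing sites per window. It is not the DnaSP implementation: gaps and ambiguous bases are skipped pair by pair (a simplification), and the window and step sizes are hypothetical because the text does not state them.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites among aligned sequences."""
    total, pairs = 0.0, 0
    for a, b in combinations(aligned, 2):
        # Compare only columns where both sequences carry an unambiguous base.
        sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
        if sites:
            total += sum(x != y for x, y in sites) / len(sites)
            pairs += 1
    return total / pairs if pairs else 0.0


def sliding_pi(aligned, window, step):
    """Return (window_start, Pi) for each full window along the alignment."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment of three sequences (not real plastome data).
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
for start, pi in sliding_pi(toy, window=4, step=4):
    print(start, round(pi, 3))
```

Windows with high Pi in such a profile correspond to candidate variable regions for marker development.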
Phylogenetic Analysis
To infer phylogenetic relationships within Orobanchaceae, a total of 42 cp genomes were used, with Salvia miltiorrhiza (Lamiaceae), Tectona grandis (Lamiaceae), and Solanum lycopersicum (Solanaceae) as outgroups (Supplementary Table S1). All cp genome sequences were aligned using MAFFT v7.402 (Katoh and Standley, 2013) and the most variable positions were excluded from the alignment using Gblocks v0.91b (Talavera and Castresana, 2007). A maximum likelihood (ML) and a Bayesian inference (BI) approach were used to infer phylogenetic relationships. The ML analyses were conducted using IQ-TREE v1.6.1 (Nguyen et al., 2015) with the best-fit model selected by ModelFinder and 1,000 bootstrap replicates. Bayesian inference was conducted using MrBayes v3.2.6 (Ronquist et al., 2012) with a nucleotide substitution model inferred by Modeltest v3.7 (Posada and Crandall, 1998) (Supplementary Table S3). The Markov chain Monte Carlo (MCMC) algorithm was run for 1 million generations and sampled every 100 generations. The first 25% of the resultant trees were discarded as burn-in and the remaining trees were used to build a majority-rule consensus tree with posterior probability (PP) values for each node. As gene loss from the cp genome is a common phenomenon in the parasitic family Orobanchaceae, the most conserved regions (TMCRs) of the cp genomes were retrieved using HomBlocks (Bi et al., 2018). TMCRs were then used to construct phylogenetic trees using the two methods specified above. Additionally, the phylogeny of the genus Euphrasia was inferred using the following chloroplast regions: trnL, trnL-trnF, and atpB-rbcL. The sequences of 39 Euphrasia species were downloaded from TreeBase under Accession No. 224921.
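The MCMC settings above imply a simple bookkeeping step, sketched here under stated assumptions: any generation-0 sample is ignored, trees are represented as sets of clades (a hypothetical toy encoding, not MrBayes output), and clade support is the fraction of retained trees that contain a clade.

```python
from collections import Counter


def retained_samples(generations, sample_every, burnin_frac):
    """Return (sampled, discarded, retained) tree counts for one run."""
    sampled = generations // sample_every
    discarded = int(sampled * burnin_frac)
    return sampled, discarded, sampled - discarded


def clade_support(trees, burnin_frac=0.25):
    """Posterior support per clade after discarding the burn-in fraction.

    trees: list of sets of frozenset clades, in sampling order.
    """
    kept = trees[int(len(trees) * burnin_frac):]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


# Settings from the text: 1 million generations, sampled every 100, 25% burn-in.
print(retained_samples(1_000_000, 100, 0.25))  # (10000, 2500, 7500)

# Toy posterior sample of four trees; the first one falls into the burn-in.
en = frozenset({"E_regelii", "N_inaequalis"})
enl = frozenset({"E_regelii", "N_inaequalis", "L_squamaria"})
nl = frozenset({"N_inaequalis", "L_squamaria"})
support = clade_support([{nl}, {en, enl}, {en, enl}, {en}])
majority = {clade for clade, pp in support.items() if pp > 0.5}
```

In a majority-rule consensus, only clades with support above 0.5 are kept, which is what the `majority` set expresses for the toy sample.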
Results
The Chloroplast Genome of Euphrasia regelii
A total of 7,867,077 paired-end reads were retrieved with a sequence length of 150 bp. A total of 7,861,321 of high quality reads were used for the cp genome assembly. The raw reads were deposited in NCBI SRA database under the Accession No. SRR8237421. Based on a combination of de novo and reference guided assemblies, the cp genome of E. regelii was obtained with the average coverage of 956×. The complete cp genome of E. regelii is 153,026 bp in length and possesses the typical quadripartite structure including a LSC region of 83,893 bp separated from the 15,801 bp long SSC region by two inverted repeats (IRs), each 26,666 bp (Figure 1 and Table 1).
FIGURE 1
TABLE 1
| | E. regelii | L. squamaria | N. inaequalis | S. americana | T. versicolor | L. philippensis | R. glutinosa | A. fasciculatum |
|---|---|---|---|---|---|---|---|---|
| Genome length (bp) | 153,026 | 150,504 | 151,349 | 160,910 | 152,448 | 155,103 | 153,622 | 106,796 |
| LSC length (bp) | 83,893 | 81,981 | 83,806 | 84,756 | 83,650 | 85,606 | 84,605 | 43,970 |
| SSC length (bp) | 15,801 | 16,061 | 16,327 | 6,517 | 17,520 | 17,885 | 17,579 | 530 |
| IR length (bp) | 26,666 | 26,231 | 25,566 | 34,818 | 25,639 | 25,800 | 25,719 | 31,148 |
| No. of different genes | 107 | 78 | 102 | 108 | 106 | 117 | 121 | 66 |
| No. of different protein-coding genes | 69 | 46 | 69 | 74 | 73 | 80 | 82 | 28 |
| No. of different tRNA genes (duplicated in IR) | 30 (7) | 30 (7) | 27 (5) | 30 (7) | 28 (7) | 30 (7) | 30 (7) | 29 (7) |
| No. of different rRNA genes (duplicated in IR) | 4 (4) | 4 (4) | 4 (4) | 4 (4) | 4 (4) | 4 (4) | 4 (4) | 4 (4) |
| No. of genes duplicated in IR | 15 | 14 | 10 | 19 | 15 | 16 | 12 | 17 |
| No. of different genes with introns | 16 | 12 | 18 | 17 | 15 | 18 | 18 | 10 |
| No. of pseudogenes | 11 | 32 | 0 | 2 | 1 | 4 | 0 | 8 |
| GC content (%) | 38.4 | 38.1 | 37.5 | 38.1 | 38.2 | 37.8 | 38.0 | 34.7 |
Statistics of the chloroplast genomes of Euphrasia regelii and seven other Orobanchaceae species.
The plastome of E. regelii was predicted to contain 105 unique genes, including a set of 71 protein-coding genes, 30 tRNA genes and 4 rRNA genes (Table 1 and Supplementary Table S4). Unexpectedly, 10 plastid genes encoding subunits of the NAD(P)H dehydrogenase complex (ndh genes) were pseudogenized, and only ndhF retained an intact ORF. Ycf15 was also found to be a pseudogene due to an internal stop codon in its ORF. Of the 105 genes, four protein-coding genes (rpl2, ycf2, rpl23, rps7), seven tRNA genes (trnH-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, trnN-GUU), and four rRNA genes (rrn16, rrn23, rrn4.5, rrn5) were duplicated in the IR regions. Sixteen intron-containing genes were detected in the E. regelii cp genome: seven protein-coding genes and six tRNA genes had one intron, whereas the remaining three protein-coding genes (clpP, rps12, ycf3) had two introns (Table 2). We found that trnK-UUU had the largest intron (2,472 bp), which included the gene matK. The tRNA gene trnL-UAA had the smallest intron (462 bp) (Table 2). The overall GC content of the E. regelii cp genome was relatively low at 38.4% (the LSC, SSC, and IR regions had 36.2, 33.9, and 42.9% GC content, respectively).
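The region lengths and GC values reported above can be cross-checked with a short calculation. The sketch below only uses numbers stated in the text; the variable names are illustrative, and the length-weighted GC is an approximation because the regional GC values are rounded to one decimal place.

```python
# (length in bp, GC content in %) for each region of the E. regelii plastome.
regions = {
    "LSC": (83_893, 36.2),
    "SSC": (15_801, 33.9),
    "IRa": (26_666, 42.9),
    "IRb": (26_666, 42.9),
}

# The quadripartite structure implies LSC + SSC + 2 x IR = genome length.
total_length = sum(length for length, _ in regions.values())

# Length-weighted mean of the regional GC contents.
weighted_gc = sum(length * gc for length, gc in regions.values()) / total_length

print(total_length)           # 153026
print(round(weighted_gc, 1))  # 38.3
```

The total matches the reported genome length of 153,026 bp, and the weighted GC lies within about 0.1 percentage points of the reported 38.4%, which is the size of difference that rounding of the regional values would produce.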
TABLE 2
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| trnK-UUU | LSC | 37 | 2,472 | 35 | | |
| rps16 | LSC | 40 | 839 | 194 | | |
| trnG-UCC | LSC | 23 | 663 | 48 | | |
| atpF | LSC | 234 | 687 | 411 | | |
| rpoC1 | LSC | 456 | 767 | 1,668 | | |
| ycf3 | LSC | 134 | 698 | 229 | 705 | 153 |
| trnL-UAA | LSC | 35 | 462 | 50 | | |
| trnV-UAC | LSC | 38 | 582 | 35 | | |
| clpP | LSC | 71 | 728 | 292 | 627 | 228 |
| petB | LSC | 6 | 728 | 642 | | |
| petD | LSC | 8 | 765 | 475 | | |
| rpl16 | LSC | 9 | 865 | 399 | | |
| rpl2 | IR | 394 | 669 | 434 | | |
| rps12* | IR/LSC | 114 | | 232 | 538 | 26 |
| trnI-GAU | IR | 37 | 946 | 35 | | |
| trnA-UGC | IR | 38 | 812 | 35 | | |
Genes with introns in the chloroplast genome of Euphrasia regelii.
*rps12 is a trans-spliced gene with the two duplicated 3′ end exons in the IR regions and the 5′ end exon in the LSC region.
Codon Usage Bias of E. regelii cp Genome
The frequency of codons in the E. regelii cp genome was calculated based on protein-coding genes (Table 3). In total, the protein-coding genes were encoded by 23,629 codons. Leucine was the most frequent amino acid (2,427 codons, 10.27%) and cysteine the least frequent (265 codons, 1.1%) (Table 3). Similar to other angiosperm cp genomes, codon usage in the E. regelii plastome was biased toward a high representation of U and A at the third codon position [relative synonymous codon usage (RSCU) values > 1].
TABLE 3
| Codon | Amino acid | Count | RSCU | tRNA | Codon | Amino acid | Count | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| UUU | F | 882 | 1.33 | trnF-GAA | UAU | Y | 667 | 1.62 | trnY-GUA |
| UUC | F | 441 | 0.67 | | UAC | Y | 156 | 0.38 | |
| UUA | L | 747 | 1.85 | trnL-UAA | UAA | * | 42 | 1.68 | |
| UUG | L | 511 | 1.26 | trnL-CAA | UAG | * | 17 | 0.68 | |
| CUU | L | 510 | 1.26 | trnL-UAG | CAU | H | 435 | 1.48 | trnH-GUG |
| CUC | L | 157 | 0.39 | | CAC | H | 152 | 0.52 | |
| CUA | L | 330 | 0.82 | | CAA | Q | 649 | 1.5 | trnQ-UUG |
| CUG | L | 172 | 0.43 | | CAG | Q | 216 | 0.5 | |
| AUU | I | 952 | 1.48 | trnI-GAU | AAU | N | 927 | 1.55 | trnN-GUU |
| AUC | I | 405 | 0.63 | | AAC | N | 272 | 0.45 | |
| AUA | I | 569 | 0.89 | trnI-CAU | AAA | K | 1034 | 1.5 | trnK-UUU |
| AUG | M | 506 | 1 | trnM-CAU | AAG | K | 342 | 0.5 | |
| GUU | V | 452 | 1.41 | trnV-GAC | GAU | D | 753 | 1.6 | trnD-GUC |
| GUC | V | 163 | 0.51 | | GAC | D | 187 | 0.4 | |
| GUA | V | 479 | 1.49 | | GAA | E | 933 | 1.5 | trnE-UUC |
| GUG | V | 192 | 0.6 | trnV-UAC | GAG | E | 314 | 0.5 | |
| UCU | S | 499 | 1.63 | trnS-GGA | UGU | C | 198 | 1.49 | trnC-GCA |
| UCC | S | 302 | 0.99 | | UGC | C | 67 | 0.51 | |
| UCA | S | 326 | 1.07 | | UGA | * | 16 | 0.64 | |
| UCG | S | 214 | 0.7 | trnS-UGA | UGG | W | 400 | 1 | trnW-CCA |
| CCU | P | 334 | 1.37 | trnP-UGG | CGU | R | 312 | 1.25 | trnR-ACG |
| CCC | P | 205 | 0.84 | | CGC | R | 119 | 0.48 | trnR-UCU |
| CCA | P | 276 | 1.13 | | CGA | R | 327 | 1.31 | |
| CCG | P | 160 | 0.66 | | CGG | R | 130 | 0.52 | |
| ACU | T | 507 | 1.63 | | AGA | R | 450 | 1.8 | |
| ACC | T | 223 | 0.72 | | AGG | R | 163 | 0.65 | |
| ACA | T | 365 | 1.17 | trnT-UGU | AGU | S | 385 | 1.26 | trnS-GCU |
| ACG | T | 149 | 0.48 | trnT-GGU | AGC | S | 106 | 0.35 | |
| GCU | A | 533 | 1.72 | trnA-UGC | GGU | G | 503 | 1.27 | trnG-GCC |
| GCC | A | 200 | 0.64 | | GGC | G | 165 | 0.42 | |
| GCA | A | 363 | 1.17 | | GGG | G | 339 | 0.85 | |
| GCG | A | 147 | 0.47 | | GGA | G | 582 | 1.47 | trnG-UCC |
Codon–anticodon recognition pattern and codon usage in the Euphrasia regelii chloroplast genome.
*indicates the stop codon.
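The RSCU values in Table 3 follow from the codon counts: the observed count of a codon is divided by the mean count of all codons in its synonymous family. The sketch below reproduces a few table entries from the reported counts; the function name is illustrative and this is not the MEGA implementation.

```python
def rscu(counts):
    """Relative synonymous codon usage for ONE synonymous codon family.

    counts: dict mapping codon -> observed count.
    """
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


# Counts taken from Table 3.
phe = rscu({"UUU": 882, "UUC": 441})
stop = rscu({"UAA": 42, "UAG": 17, "UGA": 16})
leu_counts = {"UUA": 747, "UUG": 511, "CUU": 510,
              "CUC": 157, "CUA": 330, "CUG": 172}
leu = rscu(leu_counts)

# Share of leucine among the 23,629 codons reported in the text.
leu_share = 100 * sum(leu_counts.values()) / 23_629

print(round(phe["UUU"], 2), round(phe["UUC"], 2))  # 1.33 0.67
print(round(stop["UAA"], 2))                       # 1.68
print(round(leu["UUA"], 2))                        # 1.85
print(round(leu_share, 2))                         # 10.27
```

An RSCU value above 1 means that a codon is used more often than expected under equal usage within its family, which is the case for most A- or U-ending codons in the table.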
Repeat Analysis
In the E. regelii cp genome, 19 forward repeats, 17 palindromic repeats, and 7 tandem repeats were detected (Figure 2). More than half of the repeats (58.3%) were found in intergenic regions and introns, and 74.4% of these repeats had a repeat length between 30 and 50 bp (Figure 2). Within coding regions, repeats were found in only two genes (ycf1 and ycf2), which together contained six forward repeats, six palindromic repeats and two tandem repeats (Supplementary Tables S5, S6). A total of 44 SSRs were detected in the E. regelii cp genome, the majority of which were mononucleotide repeats (22), followed by dinucleotide (12), tetranucleotide (6), and trinucleotide (4) repeats. Most SSRs (29) were distributed in non-coding regions, with the remaining 15 SSRs located in genic regions including rpoC2, psbC, atpB, rpoA, ycf1, and ccsA (Figure 2). Just over half (54.5%) of the SSRs were located in the LSC region, whereas 36.4 and 9.1% were found in the SSC and the IR regions, respectively (Figure 2).
FIGURE 2
Genome Comparison and Selective Pressure Analyses
To investigate cp genome divergence between E. regelii and other Orobanchaceae species, a sequence alignment of 12 cp genomes was conducted using the annotated cp genome of R. glutinosa as a reference. The results indicated that the IR regions are more conserved than the SC regions and that the divergence in intergenic regions is higher than in genic regions (Figure 3). Many differences were found in the SSC regions of these plastomes, and the LSC regions of B. americana and S. aspera differed markedly from those of the other autotrophic and parasitic species (Figure 4). The cp genome of E. regelii is very similar to the plastomes of R. glutinosa, L. philippensis, T. versicolor, L. squamaria, and N. inaequalis. All other plastomes contained multiple rearrangements, especially those of B. americana and S. aspera. No rearrangements were detected in three of the included Rhinantheae species (E. regelii, L. squamaria, N. inaequalis) except that some genes within the SSC region of N. inaequalis were lost. However, the orientation of the SSC region of S. americana was inverted and showed a reverse gene order compared to the other three Rhinantheae species. A sliding window analysis indicated that most of the variation in the cp genomes of the four Rhinantheae species occurred in the LSC and SSC regions (Figure 5). The most divergent non-coding regions among the four Rhinantheae cp genomes were trnH (GUG) – psbA, rps16 – trnQ (UUG), trnS (GCU) – trnG (UCC), atpH – atpI, petN – psbM, trnT (GGU) – psbD, ndhC – trnV (UAC), rbcL – accD, petA – psbJ, clpP – psbB, ndhF – rpl32, and rpl32 – trnL (UAG). Although coding regions were conserved in these cp genomes, minor sequence variation was observed among the four cp genomes in the rpoC2, rpoC1, ndhF, ycf1, and ycf2 genes.
FIGURE 3
FIGURE 4
FIGURE 5
Genomic structure and size varied in the 12 Orobanchaceae cp genomes and the IR/SC border regions of these species were also different (Figure 6). Fifteen genes, including rps19, ycf1, rpl2, ndhF, ndhE, rpl23, rpl32, psbK, ndhA, ndhG, atpF, atpA, psbI, petL and trnH, were found at the LSC/IR and SSC/IR borders of the 12 plastomes. Among the plastomes, those of S. aspera, B. americana, and S. americana all exhibited larger sizes due to increased IR length, and the corresponding genes distributed at the SSC/IR border were quite different from those in the other plastomes. Apart from these three cp genomes, the IRs of E. regelii were much longer than in the other cp genomes, especially in the area of the LSC/IRb and the IRb/SSC regions (Figure 6). The ndhF 3′-end sequence in the cp genomes of E. regelii and L. squamaria shared the region in the IRb with the rest of the ycf1 3′-end sequence, while the IRb-SSC borders of L. philippensis, R. glutinosa, and N. inaequalis were separated from the stop codon of ndhF by 32, 73, and 82 bp, respectively. Notably, genes located at the IR/SSC border of Castilleja paramensis, P. cheilanthifolia, and A. virginica showed a reverse gene order compared to E. regelii.
FIGURE 6
In order to detect whether the protein-coding genes of the four Rhinantheae cp genomes (E. regelii, L. squamaria, N. inaequalis, and S. americana) were under selective pressure, rates of synonymous (dS) and non-synonymous (dN) substitutions, and the dN/dS ratios, were calculated. As many pseudogenes were found in the cp genomes of E. regelii and L. squamaria, only 64 cp genes could be used for this analysis. The average dS values between paired Rhinantheae species (E. regelii-S. americana/E. regelii-L. squamaria/S. americana-L. squamaria/E. regelii-N. inaequalis/S. americana-N. inaequalis/L. squamaria-N. inaequalis) were 0.2175/0.1006/0.2131/0.0781/0.1861/0.0711, and the dN values ranged from 0 to 1.1435, with averages of 0.0718/0.0224/0.0857/0.0113/0.0639/0.0146, respectively (Supplementary Table S7). A total of 305 pairwise dN/dS values were obtained, most of which were less than 1, indicating that the cp genes were under purifying selection. Only three genes (clpP, ycf2, rps14) had dN/dS values > 1, indicating that these genes had undergone positive selection.
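The interpretation of dN/dS used above can be sketched as a small classifier. This is an illustration only, not the PAML calculation: the function name is hypothetical, and the example divides the average dN by the average dS reported for one species pair, which is a ratio of means and not one of the per-gene values in Supplementary Table S7.

```python
def classify_selection(dn, ds):
    """Return (omega, label) for one pairwise comparison.

    omega = dN/dS; values below 1 suggest purifying selection and values
    above 1 suggest positive selection. The ratio is undefined when dS is 0.
    """
    if ds == 0:
        return None, "undefined"
    omega = dn / ds
    if omega > 1:
        return omega, "positive"
    if omega < 1:
        return omega, "purifying"
    return omega, "neutral"


# Average dN and dS reported in the text for E. regelii vs. N. inaequalis.
omega, label = classify_selection(0.0113, 0.0781)
print(round(omega, 3), label)  # 0.145 purifying

# Hypothetical values, for illustration only.
print(classify_selection(0.02, 0.01)[1])  # positive
print(classify_selection(0.0, 0.0)[1])    # undefined
```

Comparisons with dS = 0 yield no defined ratio, so the number of usable dN/dS values can be smaller than the number of gene-by-pair combinations.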
Phylogenetic Analyses Based on Chloroplast Genome Sequence
Forty-five complete chloroplast genomes were used to infer the phylogenetic position of E. regelii (Supplementary Table S1). Phylogenetic analyses were performed using maximum likelihood (ML) and Bayesian inference (BI) with Salvia miltiorrhiza, Tectona grandis, and Solanum lycopersicum as outgroups. Two datasets were used to infer phylogenetic relationships: one included the complete cp genomes and the other only the TMCRs of the 45 cp genomes. Both datasets yielded a consistent phylogenetic signal (Figure 7 and Supplementary Figure S1). Except for P. cheilanthifolia, which clustered with the outgroup, all other species of the Orobanchaceae formed a monophyletic group with high bootstrap and BI support. Similarly, E. regelii and two other Rhinantheae species (L. squamaria and N. inaequalis) formed a highly supported clade, with E. regelii being sister to N. inaequalis. Unexpectedly, S. americana, another species in the Rhinantheae tribe, clustered with Buchnera and Striga. Apart from L. squamaria, all holoparasitic species clustered in the same clade, which was also the most derived in Orobanchaceae. Autotrophic genera including Lindenbergia and Rehmannia belonged to the earliest diverging groups, suggesting that autotrophic lineages may be the ancestors of parasitic lineages in Orobanchaceae.
FIGURE 7
The phylogenetic relationships of 39 Euphrasia species were inferred based on three cpDNA markers. All Euphrasia species formed a highly supported clade; however, species relationships remained unresolved (Supplementary Figure S2).
Discussion
Here we present the complete chloroplast genome of E. regelii, which is the first complete plastome for this hemiparasitic genus. The chloroplast genome of E. regelii displays the typical quadripartite structure with an LSC and an SSC region separated by two inverted repeat regions. The structure is comparable to that of other hemiparasitic species in Orobanchaceae (Wicke et al., 2013, 2016; Cho et al., 2018). In the plastome of E. regelii, only 71 protein-coding genes are retained due to pseudogenization of some plastid genes, especially ndhA – E, ndhG – K, and ycf15. Previous studies indicated that relaxed selective constraints in relation to photosynthesis resulted in extensive pseudogenization of ndh genes in some parasitic genera such as Lathraea, Pedicularis, and Schwalbea (Barrett et al., 2013; Wicke et al., 2013; Cho et al., 2018). In contrast, the ycf15 gene is usually truncated as a pseudogene in many angiosperm chloroplast genomes (Dong et al., 2013; Fajardo et al., 2013; Hu et al., 2016; Lu et al., 2016; Ge et al., 2018). Gene loss or plastome reduction is a common phenomenon in most parasitic plant species (Wicke et al., 2013; Samigullin et al., 2016); however, this was not observed in the cp genome of E. regelii, as ndh genes were only pseudogenized but not lost. It has been shown that ndh genes were pseudogenized or lost entirely several times during land plant evolution, which is largely related to a heterotrophic lifestyle (Wicke et al., 2011; Barrett and Davis, 2012; Barrett et al., 2014; Graham et al., 2017; Wicke and Naumann, 2018). Euphrasia species are facultative hemiparasites which can complete their lifecycle without a host; however, they grow much better when attached to a suitable host (Twyford et al., 2019). This facultative lifestyle probably accounts for the retention and subsequent pseudogenization of ndh genes. Similar to most angiosperm cp genomes, the overall GC content and the codon usage of the E. regelii cp genome are heavily biased.
Repeat elements in plastomes have been shown to play an important role in genomic rearrangements and recombination (Asano et al., 2004; Weng et al., 2013). Fewer repeat elements were found in the cp genome of E. regelii than in the previously published Rehmannia plastome (Zeng et al., 2017). Most repeats were located in intergenic regions or ycf genes (ycf1 and ycf2), which is similar to the situation in other angiosperm lineages (Curci et al., 2015; Yang et al., 2016; Zhou et al., 2016). Chloroplast simple sequence repeats (cpSSRs) have been proven to be important molecular markers for distinguishing species at lower taxonomic levels, and are therefore potentially useful markers for population genetics (Provan et al., 2001; Yang et al., 2011; Xue et al., 2012; Hu et al., 2016; Ruhsam et al., 2016). In the present study, 44 SSRs were detected in the E. regelii cp genome, with mononucleotide repeats (A/T) being the most abundant type. Poly (A)/(T) SSRs are usually more common than other SSR repeat types in many plant cp genomes (Yang et al., 2016; Asaf et al., 2017b; Dong et al., 2017; Li et al., 2018; Wang et al., 2018b; Ye et al., 2018; Zhou et al., 2018). Likewise, most cpSSRs were observed in non-coding regions, and only a small proportion was found in coding regions. CpSSRs located in non-coding regions are generally short mononucleotide tandem repeats and commonly show intraspecific variation in repeat numbers (Eguiluz et al., 2017). Therefore, the cpSSR loci detected in this study will be useful tools for investigating levels of genetic diversity in Euphrasia and might even be able to discriminate between species.
Morphological similarity renders the reliable identification of many Euphrasia species challenging. In addition, standard DNA plant barcodes (Techen et al., 2014; Li et al., 2015) have failed to discriminate between Euphrasia species (Wang et al., 2018a). Therefore, it is necessary to develop Euphrasia-specific DNA barcodes. Here, several highly variable cpDNA markers were obtained based on the comparative chloroplast genome analyses of Orobanchaceae species, and these could be tested as Euphrasia-specific DNA barcodes. These regions might also provide sufficient genetic variation for resolving the phylogenetic relationships between Euphrasia species. Compared to the two photoautotrophic species, our results indicated that there are no structural rearrangements in the cp genome of E. regelii, which is probably related to the facultative hemiparasitic life form of this species (Frailey et al., 2018). No major gene rearrangements were detected among the four Rhinantheae species, except for S. americana, which had a reversed SSC region. The SSC region is frequently flipped in plastomes, and the two SSC orientations often occur in a roughly 50:50 ratio in plant cells (Palmer, 1985; Frailey et al., 2018). A similar situation was also detected in A. virginica and two other Pedicularideae species.
Size variability in cp genomes is usually due to the contraction and expansion of the IRs (He et al., 2017). This was apparent in the plastomes of S. aspera, B. americana, and S. americana, where the IRs were much longer. Interestingly, the cp IR borders of S. americana are quite different from those of the other Rhinantheae species, as most of the repeat region extended into the SSC region. A previous study showed that S. americana belongs to an early diverging lineage in the Rhinantheae clade (McNeal et al., 2013), which suggests that the repeat expansion occurred independently in the S. americana lineage. Apart from these three plastomes, the IRs of E. regelii were the longest among the cp genomes compared and were expanded considerably more than those of L. squamaria and N. inaequalis. The copy of ycf1 at the IRb border is often pseudogenized in angiosperm cp genomes (Daniell et al., 2006; Yao et al., 2016). However, no internal stop codons were detected in the coding sequence of ycf1 in E. regelii; thus, the additional length of ycf1 affected the IR length and the gene distribution at the SC/IR borders. We hypothesize that the expansion of the IR caused a duplication of ycf1, as has been reported for Eucommia ulmoides and Fagopyrum dibotrys (Wang et al., 2016, 2018b).
The sequence divergence analysis of protein-coding genes in four Rhinantheae plastomes indicated low sequence divergence and purifying selection (dN/dS < 1) for most genes, which is consistent with the results of other studies (Rousseau-Gueutin et al., 2015; Xu et al., 2015; Zhou et al., 2016; Yin et al., 2018). Only three protein-coding genes (clpP, ycf2, rps14) were under positive selection. ClpP, which encodes a proteolytic subunit of the ATP-dependent protease, is very important for chloroplast biogenesis (Shikanai et al., 2001). Clp proteases are highly conserved in many organisms (Schirmer et al., 1996; Shikanai et al., 2001), but previous studies indicated that clpP genes showed significantly accelerated substitution rates and were under positive selection in Pelargonium plastid genomes (Weng et al., 2017). It is possible that clpP also has higher substitution rates in parasitic plant species. Ycf2 is one of the largest plastid genes and encodes a putative membrane protein (Drescher et al., 2000; Kikuchi et al., 2013); it has evolved rapidly in several species of Fagopyrum, Ipomoea, Ophrys, and Mimosoideae (Cho et al., 2015; Mensous et al., 2017; Park et al., 2018; Roma et al., 2018). Likewise, ycf2 may have evolved at a faster rate in the Rhinantheae plastomes.
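The interpretation of the dN/dS ratio used above can be sketched in a few lines of code. The function below simply applies the conventional thresholds (ratio below 1: purifying selection; above 1: positive selection; equal to 1: neutral evolution). The function name and the example values are hypothetical and are not taken from Table S7; a ratio above 1 in a pairwise comparison is a heuristic indication rather than a formal statistical test.

```python
def classify_selection(dn: float, ds: float) -> str:
    """Label a pairwise comparison by its dN/dS ratio (omega)."""
    if ds <= 0:
        # Without synonymous substitutions the ratio cannot be computed.
        return "undefined"
    omega = dn / ds
    if omega > 1:
        return "positive"
    if omega < 1:
        return "purifying"
    return "neutral"


# Hypothetical (dN, dS) values for illustration only; NOT results of this study.
examples = {
    "geneA": (0.012, 0.050),  # omega < 1
    "geneB": (0.030, 0.020),  # omega > 1
    "geneC": (0.000, 0.000),  # no substitutions, ratio undefined
}

for gene, (dn, ds) in examples.items():
    print(gene, classify_selection(dn, ds))
```

Treating comparisons with dS = 0 as undefined avoids a division by zero, which matters for highly conserved plastid genes where pairwise comparisons may contain no synonymous changes.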
Chloroplast genomes that contain sufficient informative sites have proven effective in resolving phylogenetic relationships among angiosperms, even at lower taxonomic levels (Ma et al., 2014; Carbonell-Caballero et al., 2015; Yang et al., 2016; Dong et al., 2017; Zhang et al., 2017; Zhao et al., 2018). We retrieved the available cp genomes of non-parasitic (autotrophic) and parasitic species in the Orobanchaceae and inferred the phylogeny of the family using ML and Bayesian methods. Our results were consistent with those of previous studies based on nuclear and plastid markers (McNeal et al., 2013) as well as on 17 cp genomes (Samigullin et al., 2016). Except for P. cheilanthifolia, all the parasitic species formed a highly supported clade. Unexpectedly, the overall genomic structure of P. cheilanthifolia is more similar to the cp genomes of autotrophic species than to those of the closely related Pedicularis species. Thus, the high sequence divergence of the P. cheilanthifolia plastome resulted in a discordant phylogenetic position. Previous phylogenetic analyses based on a few cpDNA markers did not support the monophyly of Rhinantheae (Olmstead et al., 2001), which is consistent with our results for four Rhinantheae species, in which S. americana was not placed in Rhinantheae but was sister to Buchnereae. In addition, Euphrasia was more closely related to Neobartsia than to Lathraea, which is consistent with previous phylogenetic studies of the tribe Rhinantheae (McNeal et al., 2013; Pinto-Carrasco et al., 2017). Our results suggested that all non-parasitic species belonged to the earliest diverging lineages in Orobanchaceae, indicating that autotrophy was the ancestral state in this mainly parasitic family. This has also been highlighted by previous studies (Bennett and Mathews, 2006; McNeal et al., 2013). However, a reliable inference of ancestral states requires comprehensive sampling of all taxa in Orobanchaceae, as limited taxon sampling can result in different tree topologies (Leebens-Mack et al., 2005; Eguiluz et al., 2017).
Due to the recent divergence of many Euphrasia species (Gussarova et al., 2008), the commonly used standard DNA barcodes are not variable enough to resolve phylogenetic relationships in Euphrasia, as is evident from our results based on three cpDNA fragments and from previous phylogenetic studies (Gussarova et al., 2008; Wang et al., 2018a). However, even the complete chloroplast genome might not substantially raise species discriminatory power in evolutionarily young lineages, and very large numbers of characters from the nuclear genome are likely to be required for this task (Ruhsam et al., 2015).
Conclusion
The complete chloroplast genome of E. regelii, the first published cp genome in Euphrasia, provides a valuable genomic resource for this important medicinal plant and for other Euphrasia species. The structure and gene content of the cp genome are comparable to those of other hemiparasitic species and two photoautotrophic species in Orobanchaceae. No structural rearrangements were detected; however, 10 genes encoding the NAD(P)H dehydrogenase complex were pseudogenized but not lost. Sequence divergence analyses of coding genes indicated that only three plastid genes were under positive selection. We also identified cpSSRs that could be used for population genetic studies in Euphrasia. Whole cp genome comparison of E. regelii with other Orobanchaceae species revealed several variable hotspots, which could be used to develop DNA markers for discriminating between Euphrasia species and for inferring phylogenetic relationships.
Statements
Author contributions
XW and TZ conceived and designed the experiments. JW, WL, YX, HZ, and FX performed the experiments and analyzed the data. TZ, XZ, XW, and MR wrote the manuscript. All authors read and approved the final manuscript.
Funding
This research was supported by the Scientific Research Supporting Project for New Teacher of Xi’an Jiaotong University (Grant Nos. YX1K105 and 1191319802). The Royal Botanic Garden Edinburgh was supported by the Scottish Government’s Rural and Environment Science and Analytical Services Division.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00444/full#supplementary-material
FIGURE S1. Phylogenetic relationship inferred from Maximum Likelihood/Bayesian Inference analysis based on the most conserved regions (TMCRs) of the chloroplast genome. The numbers associated with each node are bootstrap support and posterior probability values, respectively. Asterisks indicate support values of 100/1.0.
FIGURE S2. Phylogenetic relationships of Euphrasia using cpDNA trnL intron, trnL-trnF, and atpB-rbcL. (A) Phylogenetic tree inferred from ML analysis. (B) Phylogenetic tree inferred from BI analysis. The numbers associated with each node are bootstrap support and posterior probability values.
TABLE S1. List of plastome sequences included in the phylogenetic analyses.
TABLE S2. Primers used for Sanger re-sequencing.
TABLE S3. Models in ML and BI analysis based on different datasets.
TABLE S4. Genes encoded in the Euphrasia regelii chloroplast genome.
TABLE S5. Forward and palindromic repeats in the Euphrasia regelii chloroplast genome.
TABLE S6. Tandem repeats in the Euphrasia regelii chloroplast genome.
TABLE S7. Pairwise dN/dS ratios of protein-coding sequences among four Rhinantheae species.
References
Asaf, S., Khan, A. L., Aaqil Khan, M., Muhammad Imran, Q., Kang, S.-M., Al-Hosni, K., et al. (2017a). Comparative analysis of complete plastid genomes from wild soybean (Glycine soja) and nine other Glycine species. PLoS One 12:e0182281. doi: 10.1371/journal.pone.0182281
Asaf, S., Waqas, M., Khan, A. L., Khan, M. A., Kang, S.-M., Imran, Q. M., et al. (2017b). The complete chloroplast genome of wild rice (Oryza minuta) and its comparison to related species. Front. Plant Sci. 8:304. doi: 10.3389/fpls.2017.00304
Asano, T., Tsudzuki, T., Takahashi, S., Shimada, H., and Kadowaki, K. (2004). Complete nucleotide sequence of the sugarcane (Saccharum officinarum) chloroplast genome: a comparative analysis of four monocot chloroplast genomes. DNA Res. 11, 93–99. doi: 10.1093/dnares/11.2.93
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477. doi: 10.1089/cmb.2012.0021
Barrett, C. F., and Davis, J. I. (2012). The plastid genome of the mycoheterotrophic Corallorhiza striata (Orchidaceae) is in the relatively early stages of degradation. Am. J. Bot. 99, 1513–1523. doi: 10.3732/ajb.1200256
Barrett, C. F., Davis, J. I., Leebens-Mack, J., Conran, J. G., and Stevenson, D. W. (2013). Plastid genomes and deep relationships among the commelinid monocot angiosperms. Cladistics 29, 65–87. doi: 10.1111/j.1096-0031.2012.00418.x
Barrett, C. F., Freudenstein, J. V., Li, J., Mayfield-Jones, D. R., Perez, L., Pires, J. C., et al. (2014). Investigating the path of plastid genome degradation in an early-transitional clade of heterotrophic orchids, and implications for heterotrophic angiosperms. Mol. Biol. Evol. 31, 3095–3112. doi: 10.1093/molbev/msu252
Bendich, A. J. (2004). Circular chloroplast chromosomes: the grand illusion. Plant Cell 16, 1661–1666. doi: 10.1105/tpc.160771
Bennett, J. R., and Mathews, S. (2006). Phylogeny of the parasitic plant family Orobanchaceae inferred from phytochrome A. Am. J. Bot. 93, 1039–1051. doi: 10.3732/ajb.93.7.1039
Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573. doi: 10.1093/nar/27.2.573
Bi, G., Mao, Y., Xing, Q., and Cao, M. (2018). HomBlocks: a multiple-alignment construction pipeline for organelle phylogenomics based on locally collinear block searching. Genomics 110, 18–22. doi: 10.1016/j.ygeno.2017.08.001
Carbonell-Caballero, J., Alonso, R., Ibañez, V., Terol, J., Talon, M., and Dopazo, J. (2015). A phylogenetic analysis of 34 chloroplast genomes elucidates the relationships between wild and domestic species within the genus Citrus. Mol. Biol. Evol. 32, 2015–2035. doi: 10.1093/molbev/msv082
Caron, H., Dumas, S., Marque, G., Messier, C., Bandou, E., Petit, R. J., et al. (2000). Spatial and temporal distribution of chloroplast DNA polymorphism in a tropical tree species. Mol. Ecol. 9, 1089–1098. doi: 10.1046/j.1365-294x.2000.00970.x
Cho, K.-S., Yun, B.-K., Yoon, Y.-H., Hong, S.-Y., Mekapogu, M., Kim, K.-H., et al. (2015). Complete chloroplast genome sequence of tartary buckwheat (Fagopyrum tataricum) and comparative analysis with common buckwheat (F. esculentum). PLoS One 10:e0125332. doi: 10.1371/journal.pone.0125332
Cho, W.-B., Lee, D.-H., Choi, I.-S., and Lee, J.-H. (2018). The complete chloroplast genome of hemi-parasitic Pedicularis hallaisanensis (Orobanchaceae). Mitochondrial DNA Part B 3, 235–236. doi: 10.1080/23802359.2018.1437820
Curci, P. L., De Paola, D., Danzi, D., Vendramin, G. G., and Sonnante, G. (2015). Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae. PLoS One 10:e0120589. doi: 10.1371/journal.pone.0120589
Daniell, H., Lee, S.-B., Grevich, J., Saski, C., Quesada-Vargas, T., Guda, C., et al. (2006). Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes. Theor. Appl. Genet. 112:1503. doi: 10.1007/s00122-006-0254-x
Darling, A. C. E., Mau, B., Blattner, F. R., and Perna, N. T. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403. doi: 10.1101/gr.2289704
Dierckxsens, N., Mardulyn, P., and Smits, G. (2017). NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45:e18.
Dong, W., Xu, C., Cheng, T., and Zhou, S. (2013). Complete chloroplast genome of Sedum sarmentosum and chloroplast genome evolution in Saxifragales. PLoS One 8:e77965. doi: 10.1371/journal.pone.0077965
Dong, W., Xu, C., Li, W., Xie, X., Lu, Y., Liu, Y., et al. (2017). Phylogenetic resolution in Juglans based on complete chloroplast genomes and nuclear DNA sequences. Front. Plant Sci. 8:1148. doi: 10.3389/fpls.2017.01148
Doyle, J. J. (1987). A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15.
Drescher, A., Ruf, S., Calsa, T., Carrer, H., and Bock, R. (2000). The two largest chloroplast genome-encoded open reading frames of higher plants are essential genes. Plant J. 22, 97–104. doi: 10.1046/j.1365-313x.2000.00722.x
Eguiluz, M., Rodrigues, N. F., Guzman, F., Yuyama, P., and Margis, R. (2017). The chloroplast genome sequence from Eugenia uniflora, a Myrtaceae from Neotropics. Plant Syst. Evol. 303, 1199–1212. doi: 10.1007/s00606-017-1431-x
Fajardo, D., Senalik, D., Ames, M., Zhu, H., Steffan, S. A., Harbut, R., et al. (2013). Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content, and rearrangements revealed by next generation sequencing. Tree Genet. Genomes 9, 489–498. doi: 10.1007/s11295-012-0573-9
Frailey, D. C., Chaluvadi, S. R., Vaughn, J. N., Coatney, C. G., and Bennetzen, J. L. (2018). Gene loss and genome rearrangement in the plastids of five hemiparasites in the family Orobanchaceae. BMC Plant Biol. 18:30. doi: 10.1186/s12870-018-1249-x
Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M., and Dubchak, I. (2004). VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273–W279.
French, G. C., Hollingsworth, P. M., Silverside, A. J., and Ennos, R. A. (2008). Genetics, taxonomy and the conservation of British Euphrasia. Conserv. Genet. 9, 1547–1562. doi: 10.1007/s10592-007-9494-9
Ge, J., Cai, L., Bi, G.-Q., Chen, G., and Sun, W. (2018). Characterization of the complete chloroplast genomes of Buddleja colvilei and B. sessilifolia: implications for the taxonomy of Buddleja L. Molecules 23:1248. doi: 10.3390/molecules23061248
Graham, S. W., Lam, V. K. Y., and Merckx, V. S. F. T. (2017). Plastomes on the edge: the evolutionary breakdown of mycoheterotroph plastid genomes. New Phytol. 214, 48–55. doi: 10.1111/nph.14398
Guo, X., Liu, J., Hao, G., Zhang, L., Mao, K., Wang, X., et al. (2017). Plastome phylogeny and early diversification of Brassicaceae. BMC Genomics 18:176. doi: 10.1186/s12864-017-3555-3
Gussarova, G., Popp, M., Vitek, E., and Brochmann, C. (2008). Molecular phylogeny and biogeography of the bipolar Euphrasia (Orobanchaceae): recent radiations in an old genus. Mol. Phylogenet. Evol. 48, 444–460. doi: 10.1016/j.ympev.2008.05.002
He, L., Qian, J., Li, X., Sun, Z., Xu, X., and Chen, S. (2017). Complete chloroplast genome of medicinal plant Lonicera japonica: genome rearrangement, intron gain and loss, and implications for phylogenetic studies. Molecules 22:249. doi: 10.3390/molecules22020249
Hong, D., Yang, H., Jin, C., and Noel, H. (1998). Scrophulariaceae Flora of China. Beijing: Science Press.
Hu, Y., Woeste, K. E., and Zhao, P. (2016). Completion of the chloroplast genomes of five Chinese Juglans and their contribution to chloroplast phylogeny. Front. Plant Sci. 7:1955. doi: 10.3389/fpls.2016.01955
Katoh, K., and Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. doi: 10.1093/molbev/mst010
Kikuchi, S., Bédard, J., Hirano, M., Hirabayashi, Y., Oishi, M., Imai, M., et al. (2013). Uncovering the protein translocon at the chloroplast inner envelope membrane. Science 339, 571–574. doi: 10.1126/science.1229262
Kurtz, S., Choudhuri, J. V., Ohlebusch, E., Schleiermacher, C., Stoye, J., and Giegerich, R. (2001). REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642. doi: 10.1093/nar/29.22.4633
Langmead, B., and Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. doi: 10.1038/nmeth.1923
Leebens-Mack, J., Raubeson, L. A., Cui, L., Kuehl, J. V., Fourcade, M. H., Chumley, T. W., et al. (2005). Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one’s way out of the Felsenstein zone. Mol. Biol. Evol. 22, 1948–1963. doi: 10.1093/molbev/msi191
Li, L., and Wang, H. (2003). Studies on the chemical constituents from the water-soluble part of Euphrasia regelii. China J. Chin. Mater. Med. 28, 733–734.
Li, X., Li, Y., Zang, M., Li, M., and Fang, Y. (2018). Complete chloroplast genome sequence and phylogenetic analysis of Quercus acutissima. Int. J. Mol. Sci. 19:2443. doi: 10.3390/ijms19082443
Li, X., Yang, Y., Henry, R. J., Rossetto, M., Wang, Y., and Chen, S. (2015). Plant DNA barcoding: from gene to genome. Biol. Rev. 90, 157–166. doi: 10.1111/brv.12104
Lohse, M., Drechsel, O., Kahlau, S., and Bock, R. (2013). OrganellarGenomeDRAW–a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 41, W575–W581.
Lu, S., Hou, M., Du, F. K., Li, J., and Yin, K. (2016). Complete chloroplast genome of the oriental white oak: Quercus aliena Blume. Mitochondrial DNA Part A 27, 2802–2804.
Ma, P.-F., Zhang, Y.-X., Zeng, C.-X., Guo, Z.-H., and Li, D.-Z. (2014). Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable bamboo tribe Arundinarieae (Poaceae). Syst. Biol. 63, 933–950. doi: 10.1093/sysbio/syu054
McNeal, J. R., Bennett, J. R., Wolfe, A. D., and Mathews, S. (2013). Phylogeny and origins of holoparasitism in Orobanchaceae. Am. J. Bot. 100, 971–983. doi: 10.3732/ajb.1200448
Mensous, M., Van De Paer, C., Manzi, S., Bouchez, O., Baâli-Cherif, D., and Besnard, G. (2017). Diversity and evolution of plastomes in Saharan mimosoids: potential use for phylogenetic and population genetic studies. Tree Genet. Genomes 13:48.
Moura, M., Dias, E. F., and Belo Maciel, M. G. (2018). Conservation genetics of the highly endangered Azorean endemics Euphrasia azorica and Euphrasia grandiflora using new SSR data. Conserv. Genet. 19, 1211–1222. doi: 10.1007/s10592-018-1089-0
Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A., and Minh, B. Q. (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274. doi: 10.1093/molbev/msu300
Olmstead, R. G., Pamphilis, C. W., Wolfe, A. D., Young, N. D., Elisons, W. J., and Reeves, P. A. (2001). Disintegration of the Scrophulariaceae. Am. J. Bot. 88, 348–361. doi: 10.2307/2657024
Palmer, J. D. (1985). Comparative organization of chloroplast genomes. Annu. Rev. Genet. 19, 325–354. doi: 10.1146/annurev.genet.19.1.325
Park, I., Yang, S., Kim, W. J., Noh, P., Lee, H. O., and Moon, B. C. (2018). The complete chloroplast genomes of six Ipomoea species and indel marker development for the discrimination of authentic Pharbitidis Semen (Seeds of I. nil or I. purpurea). Front. Plant Sci. 9:965. doi: 10.3389/fpls.2018.00965
Patel, R. K., and Jain, M. (2012). NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7:e30619. doi: 10.1371/journal.pone.0030619
Pinto-Carrasco, D., Scheunert, A., Heubl, G., Rico, E., and Martínez-Ortega, M. M. (2017). Unravelling the phylogeny of the root-hemiparasitic genus Odontites (tribe Rhinantheae, Orobanchaceae): evidence for five main lineages. Taxon 66, 886–908. doi: 10.12705/664.6
Posada, D., and Crandall, K. A. (1998). Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817–818. doi: 10.1093/bioinformatics/14.9.817
Provan, J., Powell, W., and Hollingsworth, P. M. (2001). Chloroplast microsatellites: new tools for studies in plant ecology and evolution. Trends Ecol. Evol. 16, 142–147. doi: 10.1016/s0169-5347(00)02097-8
Roma, L., Cozzolino, S., Schlüter, P. M., Scopece, G., and Cafasso, D. (2018). The complete plastid genomes of Ophrys iricolor and O. sphegodes (Orchidaceae) and comparative analyses with other orchids. PLoS One 13:e0204174. doi: 10.1371/journal.pone.0204174
Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D. L., Darling, A., Höhna, S., et al. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542. doi: 10.1093/sysbio/sys029
Rousseau-Gueutin, M., Bellot, S., Martin, G. E., Boutte, J., Chelaifa, H., Lima, O., et al. (2015). The chloroplast genome of the hexaploid Spartina maritima (Poaceae, Chloridoideae): comparative analyses and molecular dating. Mol. Phylogenet. Evol. 93, 5–16. doi: 10.1016/j.ympev.2015.06.013
Rozas, J., Ferrer-Mata, A., Sánchez-Delbarrio, J. C., Guirao-Rico, S., Librado, P., Ramos-Onsins, S. E., et al. (2017). DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34, 3299–3302. doi: 10.1093/molbev/msx248
Ruhsam, M., Clark, A., Finger, A., Wulff, A. S., Mill, R. R., Thomas, P. I., et al. (2016). Hidden in plain view: cryptic diversity in the emblematic Araucaria of New Caledonia. Am. J. Bot. 103, 888–898. doi: 10.3732/ajb.1500487
Ruhsam, M., Rai, H. S., Mathews, S., Ross, T. G., Graham, S. W., Raubeson, L. A., et al. (2015). Does complete plastid genome sequencing improve species discrimination and phylogenetic resolution in Araucaria? Mol. Ecol. Resour. 15, 1067–1078. doi: 10.1111/1755-0998.12375
Saarela, J. M., Burke, S. V., Wysocki, W. P., Barrett, M. D., Clark, L. G., Craine, J. M., et al. (2018). A 250 plastome phylogeny of the grass family (Poaceae): topological support under different data partitions. PeerJ 6:e4299. doi: 10.7717/peerj.4299
Samigullin, T. H., Logacheva, M. D., Penin, A. A., and Vallejo-Roman, C. M. (2016). Complete plastid genome of the recent holoparasite Lathraea squamaria reveals earliest stages of plastome reduction in Orobanchaceae. PLoS One 11:e0150718. doi: 10.1371/journal.pone.0150718
Schirmer, E. C., Glover, J. R., Singer, M. A., and Lindquist, S. (1996). HSP100/Clp proteins: a common mechanism explains diverse functions. Trends Biochem. Sci. 21, 289–296. doi: 10.1016/s0968-0004(96)10038-4
Secretariat, G. (2017). GBIF Backbone Taxonomy Checklist dataset. Available at: https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c (accessed October 16, 2018).
Shikanai, T., Shimizu, K., Ueda, K., Nishimura, Y., Kuroiwa, T., and Hashimoto, T. (2001). The chloroplast clpP gene, encoding a proteolytic subunit of ATP-dependent protease, is indispensable for chloroplast development in tobacco. Plant Cell Physiol. 42, 264–273. doi: 10.1093/pcp/pce031
Shuya, C., Shengda, Q., Xingguo, C., and Zhide, H. (2004). Identification and determination of effective components in Euphrasia regelii by capillary zone electrophoresis. Biomed. Chromatogr. 18, 857–861. doi: 10.1002/bmc.401
Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564–577. doi: 10.1080/10635150701472164
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729. doi: 10.1093/molbev/mst197
Techen, N., Parveen, I., Pan, Z., and Khan, I. A. (2014). DNA barcoding of medicinal plant material for identification. Curr. Opin. Biotechnol. 25, 103–110. doi: 10.1016/j.copbio.2013.09.010
Thiel, T., Michalek, W., Varshney, R., and Graner, A. (2003). Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422. doi: 10.1007/s00122-002-1031-0
Twyford, A. D., Frachon, N., Wong, E. L. Y., Metherell, C., and Brown, M. R. (2019). Life history evolution and phenotypic plasticity in parasitic eyebrights (Euphrasia, Orobanchaceae). bioRxiv [Preprint]. doi: 10.1101/362400
Vitek, E. (1998). Are the taxonomic concepts of agamospermous genera useful for autogamous groups – A critical discussion using the example of Euphrasia (Scrophulariaceae). Folia Geobot. 33, 349–352. doi: 10.1007/bf03216211
Wang, L., Wuyun, T.-N., Du, H., Wang, D., and Cao, D. (2016). Complete chloroplast genome sequences of Eucommia ulmoides: genome structure and evolution. Tree Genet. Genomes 12, 1–15.
Wang, X., Gussarova, G., Ruhsam, M., Hollingsworth, P. M., De Vere, N., Metherell, C., et al. (2018a). DNA barcoding a taxonomically complex hemiparasitic genus reveals deep divergence between ploidy levels but lack of species-level resolution. AoB Plants 10:ply026.
Wang, X., Zhou, T., Bai, G., and Zhao, Y. (2018b). Complete chloroplast genome sequence of Fagopyrum dibotrys: genome features, comparative analysis and phylogenetic relationships. Sci. Rep. 8:12379.
Weng, M.-L., Blazier, J. C., Govindu, M., and Jansen, R. K. (2013). Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats and nucleotide substitution rates. Mol. Biol. Evol. 31, 645–659. doi: 10.1093/molbev/mst257
Weng, M.-L., Ruhlman, T. A., and Jansen, R. K. (2017). Expansion of inverted repeat does not decrease substitution rates in Pelargonium plastid genomes. New Phytol. 214, 842–851. doi: 10.1111/nph.14375
Wicke, S., Muller, K. F., De Pamphilis, C. W., Quandt, D., Wickett, N. J., Zhang, Y., et al. (2013). Mechanisms of functional and physical genome reduction in photosynthetic and nonphotosynthetic parasitic plants of the broomrape family. Plant Cell 25, 3711–3725. doi: 10.1105/tpc.113.113373
Wicke, S., Müller, K. F., Depamphilis, C. W., Quandt, D., Bellot, S., and Schneeweiss, G. M. (2016). Mechanistic model of evolutionary rate variation en route to a nonphotosynthetic lifestyle in plants. Proc. Natl. Acad. Sci. U.S.A. 113, 9045–9050. doi: 10.1073/pnas.1607576113
Wicke, S., and Naumann, J. (2018). “Chapter eleven – Molecular evolution of plastid genomes in parasitic flowering plants,” in Advances in Botanical Research, eds S.-M. Chaw and R. K. Jansen (Cambridge, MA: Academic Press), 315–347. doi: 10.1016/bs.abr.2017.11.014
Wicke, S., Schneeweiss, G. M., Depamphilis, C. W., Müller, K. F., and Quandt, D. (2011). The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol. Biol. 76, 273–297. doi: 10.1007/s11103-011-9762-4
Wu, M. J., Huang, S. F., Huang, T. C., Lee, P. F., and Lin, T. P. (2005). Evolution of the Euphrasia transmorrisonensis complex (Orobanchaceae) in alpine areas of Taiwan. J. Biogeogr. 32, 1921–1929. doi: 10.1111/j.1365-2699.2005.01327.x
Wyman, S. K., Jansen, R. K., and Boore, J. L. (2004). Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252–3255. doi: 10.1093/bioinformatics/bth352
Xu, J.-H., Liu, Q., Hu, W., Wang, T., Xue, Q., and Messing, J. (2015). Dynamics of chloroplast genomes in green plants. Genomics 106, 221–231. doi: 10.1016/j.ygeno.2015.07.004
Xue, J., Wang, S., and Zhou, S.-L. (2012). Polymorphic chloroplast microsatellite loci in Nelumbo (Nelumbonaceae). Am. J. Bot. 99, e240–e244. doi: 10.3732/ajb.1100547
Yang, A.-H., Zhang, J.-J., Yao, X.-H., and Huang, H.-W. (2011). Chloroplast microsatellite markers in Liriodendron tulipifera (Magnoliaceae) and cross-species amplification in L. chinense. Am. J. Bot. 98, e123–e126. doi: 10.3732/ajb.1000532
Yang, Y., Zhou, T., Duan, D., Yang, J., Feng, L., and Zhao, G. (2016). Comparative analysis of the complete chloroplast genomes of five Quercus species. Front. Plant Sci. 7:959. doi: 10.3389/fpls.2016.00959
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. doi: 10.1093/molbev/msm088
Yao, X., Tan, Y.-H., Liu, Y.-Y., Song, Y., Yang, J.-B., and Corlett, R. T. (2016). Chloroplast genome structure in Ilex (Aquifoliaceae). Sci. Rep. 6:28559.
Ye, W.-Q., Yap, Z.-Y., Li, P., Comes, H. P., and Qiu, Y.-X. (2018). Plastome organization, genome-based phylogeny and evolution of plastid genes in Podophylloideae (Berberidaceae). Mol. Phylogenet. Evol. 127, 978–987. doi: 10.1016/j.ympev.2018.07.001
Yin, K., Zhang, Y., Li, Y., and Du, F. (2018). Different natural selection pressures on the atpF gene in evergreen sclerophyllous and deciduous oak species: evidence from comparative analysis of the complete chloroplast genome of Quercus aquifolioides with other oak species. Int. J. Mol. Sci. 19:1042. doi: 10.3390/ijms19041042
Zeng, S., Zhou, T., Han, K., Yang, Y., Zhao, J., and Liu, Z.-L. (2017). The complete chloroplast genome sequences of six Rehmannia species. Genes 8:103. doi: 10.3390/genes8030103
Zhang, X., Zhou, T., Kanwal, N., Zhao, Y., Bai, G., and Zhao, G. (2017). Completion of eight Gynostemma BL. (Cucurbitaceae) chloroplast genomes: characterization, comparative analysis, and phylogenetic relationships. Front. Plant Sci. 8:1583. doi: 10.3389/fpls.2017.01583
Zhao, M.-L., Song, Y., Ni, J., Yao, X., Tan, Y.-H., and Xu, Z.-F. (2018). Comparative chloroplast genomics and phylogenetics of nine Lindera species (Lauraceae). Sci. Rep. 8:8844.
Zhou, T., Chen, C., Wei, Y., Chang, Y., Bai, G., Li, Z., et al. (2016). Comparative transcriptome and chloroplast genome analyses of two related Dipteronia species. Front. Plant Sci. 7:1512. doi: 10.3389/fpls.2016.01512
Zhou, T., Wang, J., Jia, Y., Li, W., Xu, F., and Wang, X. (2018). Comparative chloroplast genome analyses of species in Gentiana section Cruciata (Gentianaceae) and the development of authentication markers. Int. J. Mol. Sci. 19:1962. doi: 10.3390/ijms19071962
Summary
Keywords
Euphrasia regelii, hemiparasite, chloroplast genome, pseudogenization, phylogenetic analyses
Citation
Zhou T, Ruhsam M, Wang J, Zhu H, Li W, Zhang X, Xu Y, Xu F and Wang X (2019) The Complete Chloroplast Genome of Euphrasia regelii, Pseudogenization of ndh Genes and the Phylogenetic Relationships Within Orobanchaceae. Front. Genet. 10:444. doi: 10.3389/fgene.2019.00444
Received
20 November 2018
Accepted
29 April 2019
Published
14 May 2019
Volume
10 - 2019
Edited by
Fulvio Cruciani, Sapienza University of Rome, Italy
Reviewed by
Quanjun Hu, Sichuan University, China; Yuguo Wang, Fudan University, China; Julia Naumann, Dresden University of Technology, Germany
Copyright
© 2019 Zhou, Ruhsam, Wang, Zhu, Li, Zhang, Xu, Xu and Wang.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xumei Wang, wangxumei@mail.xjtu.edu.cn
This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.