The Complete Chloroplast Genome Sequences of Three Veroniceae Species (Plantaginaceae): Comparative Analysis and Highly Divergent Regions

Previous studies of Veronica and related genera were weakly supported by molecular and paraphyletic taxa. Here, we report the complete chloroplast genome sequence of Veronica nakaiana and the related species Veronica persica and Veronicastrum sibiricum. The chloroplast genome length of V. nakaiana, V. persica, and V. sibiricum ranged from 150,198 bp to 152,930 bp. A total of 112 genes comprising 79 protein coding genes, 29 tRNA genes, and 4 rRNA genes were observed in three chloroplast genomes. The total number of SSRs was 48, 51, and 53 in V. nakaiana, V. persica, and V. sibiricum, respectively. Two SSRs (10 bp of AT and 12 bp of AATA) were observed in the same regions (rpoC2 and ndhD) in three chloroplast genomes. A comparison of coding genes and non-coding regions between V. nakaiana and V. persica revealed divergent sites, with the greatest variation occurring petD-rpoA region. The complete chloroplast genome sequence information regarding the three Veroniceae will be helpful for elucidating Veroniceae phylogenetic relationships.


INTRODUCTION
Chloroplast (cp) are photosynthetic organelles that provide energy to green plants (Douglas, 1990). Chloroplast genomes are valuable sources of phylogenetic information because of their relatively stable genome structure and higher evolutionary rate relative to mitochondrial genomes. Chloroplast genomes consist of a large inverted repeat (IR) separated by a large single-copy (LSC) region and a small single-copy (SSC) region. Approximately 100-130 genes are located along the circular genome structure of chloroplasts. These genes exhibit a highly conserved gene order and contests (Wicke et al., 2011) and typically encode ∼79 proteins, ∼30 transfer RNAs and four ribosomal RNAs. However, some parasitic plants contain fewer genes than photosynthetic plants (Funk et al., 2007;Wicke et al., 2013). Cp genome sequences are useful genetic markers for DNA barcoding, transplastomic studies and evolutionary studies from the population level, as well as for phylogenetic relationships (Bock and Khan, 2004;Jansen et al., 2007).
Here, we report the first chloroplast genome of three Veroniceae (Veronica nakaiana, subgenus Pseudolysimachium; Veronica persica, subgenus Pocilla; Veronicastrim sibiricum). The specific goals of the present study were to: (1) present the complete chloroplast genome sequence of three Veroniceae, (2) evaluate the phylogenetic position of 78 coding genes in three Veroniceae plants, (3) compare the coding and noncoding regions of three Veroniceae plants and suggest effective regions of the chloroplast genome for phylogenic analyses in Veroniceae.

Plant Samples and Sequencing
This research was approved by the Korean National Arboretum. V. nakaiana, V. persica, and V. sibiricum were obtained from the living collections at the greenhouse of Yeungnam University. Total DNA was extracted using a DNeasy Plant Mini Kit (Qiagen Inc., Valencia, CA, USA) and quantified using a HiGenTM Gel and PCR Purification System (Biofact Inc., Daejeon, Korea). Genomic DNA was sequenced using Illumina Miseq (Illumina Inc., San Diego, CA). V. nakaiana, V. persica, and V. sibiricum were sequenced to produce 3,205,654-4,475,346 raw reads and 301 bp were obtained from them. These paired-end reads were aligned with Sesamum indicum (JN637766.2), and 207,913 to 1,450,476 reads were mapped to the reference genomes. The fold coverage of V. nakaiana, V. persica, and V. sibiricum was 1223, 1911, and 404, respectively. The genome coverage was estimated using the CLC Genomics Workbench v7.0.4 software (CLC-bio, Aarhus, Denmark).

Annotation, Genome Mapping, and Sequence Analysis
DOGMA (Wyman et al., 2004) was used to annotate the V. nakaiana, V. persica, and V. sibiricum. Initial annotation, putative starts, stops, and intron positions were determined by comparison with homologous genes in other cp genomes. tRNA genes were annotated using DOGMA and tRNAscan-SE (Schattner et al., 2005). To compare the structure and gene content within the three Veroniceae, plants were aligned using MAFFT (Katoh et al., 2002). A circular cp genome map was drawn using the OGDRAW program (Lohse et al., 2007).

Phylogenetic Analysis
A total of 78 coding genes of 18 Lamiles plants were compiled into a single file of 71,314 bp and aligned with MAFFT (Katoh et al., 2002). Seventeen Lamiales (including V. nakaiana, V. persica, and V. sibiricum) were selected as the in-groups and Buxus was included as the outgroup (Supplementary Table  1). Maximum likelihood (ML) analyses were performed using RAxML v7.4.2 with 1000 bootstrap replicates and the GTR+I+G model (Stamatakis, 2006).

Divergence Hotspot Identification
The three completed chloroplast genome sequences (V. nakaiana, V. persica, and V. sibiricum) were aligned using MAFFT (Katoh et al., 2002). To analyze nucleotide diversity (Pi), the total number of mutations (Eta) and average number of nucleotide differences (K) were determined using DnaSP (Librado and Rozas, 2009).

RESULT Genome Organization of Three Veroniceae
The total genome size of the Veroniceae (Figure 1) (Figure 1, Table 1). Among the tree species, the cp genomes (LSC, SSC, and IR regions) of V. sibiricum are larger than that of the other species, and V. persica is smallest ( Table 1). Both contain 112 different genes, including 79 proteins (nine large ribosomal subunits, 12 small ribosomal subunits, four DNAdependent RNA polymerases, one translation initiation factor, 45 genes encoding photosynthesis-related proteins and eight genes encoding other proteins), 29 tRNA genes and 4 rRNA genes were annotated (Figure 1, Table 1) in three Veroniceae chloroplast genomes.
The overall AT content was 61.7-62.1%, indicating nearly identical levels among the three complete Veroniceae cp genomes. The AT content was 63.5-64, 62.4-68.3, and 56-56.8% of the genome for the LSC, SSC, and IR regions, respectively ( Table 1).

IR Expansion and Contraction
The IR/LSC and IR/SSC borders of the Veroniceae cp genomes were compared (Figure 2). In the V. nakaiana chloroplast genome, the IR extended into the rps19 gene, resulting in a short rps19 pseudogene of 3 bp at the IR/LSC border. However, the rps19 gene of V. persica and V. sibiricum was separated from the LSC/IRb border by 3-47 bp and not duplicated in the IRa/LSC border. The position of ycf1 in the IR regions varied from 1255 to 1323 bp. The ndhF gene and ycf1 gene overlapped in the IRb/SSC border (72 bp in V. nakaiana, 72 bp in V. persica, and 108 bp in V. sibiricum). The trnH gene of the three Veroniceae species was located in the LSC region (separated from the IR/LSC border by 0-7 bp).

SSR and Repeat Analysis
We used REPuter to analyze the repeat sequence of three Veroniceae cp genomes and found forward repeats, palindrome repeats and reverse repeats of at least 30 bp long per repeat unit with a sequence identity of ≥90% (Figure 3). V. nakaiana contained 15 forward repeats, 18 palindrome repeats, and 2 reverse repeats (Supplementary Table 2). Overall, 33 repeats were 30-40 bp long, with two repeats 41 long. V. persica contained 16 forward repeats and 19 palindrome repeats. Of these, 32 repeats were 30-40 bp long, two repeats were 41 bp long, and one repeat was 42 bp long. V. sibiricum contained 22 forward repeats and 21 palindrome repeats. Thirty-two repeats were 30-40 bp long, seven repeats were 40-50 bp long, and four repeats were 56 bp long (Supplementary Table 2). Most of these repeats had lengths between 30 and 41 bp. A total of 113 repeats were detected in these cp genomes, 46% (53 repeats) of which were direct repeats, 51% (58 repeats) were palindromic repeats, and 3% (2 repeats) reverse repeats. The five repeats shared among V. nakaiana, V. persica, and V. sibiricum were as follows: a 30-bp sequence in the rrn4.5-rrn5 spacer, psaB, psaA, ycf2, and a 41 bp repeat that occurred in the rps12-trnV-GAC intergenic spacer. Four repeats were only shared between V. nakaiana and V. sibiricum: a 30 bp direct repeat in the psbI-trnS-GCU intergenic spacer and trnS-GGA tRNA, and a 30 bp palindrome repeat in the accD-psaI intergenic spacer and rpl16 intron. SSRs are repeated DNA sequences consisting of tandem repeats 1-10 bp in length per unit distributed throughout the genome (Figure 3B). SSRs are highly polymorphic and therefore useful as molecular markers and for population genetics (Powell et al., 1995;Zhang et al., 2012). We detected SSRs longer than 10 bp in three chloroplast genomes. Additionally, the total number of SSRs was 48 in V. nakaiana, 51 in V. persica, and 53 in V. sibiricum (Supplementary Table 3). The majority of SSRs in all species are A/T mononucleotides. Chloroplast genome SSRs were composed of adenine or thymine repeats, and rarely contained tandem guanine (G) or cytosine (C) repeats (Kuamg et al., 2011;Qian et al., 2013). Most SSRs were located in intergenic regions (69%), whereas 14 and 17% were located in introns and genes, respectively. Overall, 25 SSRs were located in the genes in three Veroniceae. Among these, three (rpoA, rpoC2, ndhD) were shared by all three Veroniceae species.

Phylogenetic Analyses of 78 Coding Genes in Chloroplast Genome
Phylogenetic analysis of coding genes was not resolved in previous studies of Lamiales plants. Previous phylogenetic studies of Veroniceae have focused on non-coding regions such as rapidly evolving chloroplast DNA (Rahmanzadeh et al., 2005;Schäferhoff et al., 2010). Our analyses of the extended   data with much wider coding genes in Lamiales support a monophyletic family. The monophyly of the Lamiales families, its circumscription is clearly revealed. The position of Veroniceae within the Plantaginaceae corresponded with that reported by Schäferhoff et al. (2010).
In this study, maximum likelihood phylogenetic analysis of 78 protein coding genes was conducted and 18 taxa were aligned (Figure 4). This alignment was used for analyses of 71,314 bp. The Asterids clade was monophyletic and well supported by the bootstrap value (100%). Within the order, two well-supported clades, Lamiales and Solanales, were distinguished. Veroniceae was highly supported (100%) and a sister to Scrophulariaceae, Pedaliaceae, and Lamiaceae.

Divergence Sequence Hotspots in Veroniceae
The coding genes, non-coding regions and intron regions were compared among the three Veroniceae species divergence hotspots. We generated more 200 bp of 114 regions (coding genes, non-coding regions and intron regions) from three Veroniceaea species and the nucleotide variability (Pi) values calculated with the DnaSP 5.0 software.

The Chloroplast Genome of Veroniceae
Recently, some taxonomic studies have used the chloroplast genome to evaluate the relationship of related species. For example, the chloroplast genome of two species of Machilus (Song et al., 2015) and four species of Tilia (Cai et al., 2015) were utilized to evaluate the phylogenomics and relationship within the family. Comparative analyses of three members of the Veroniceae chloroplast genome also showed highly conserved structures and genes. The size of V. nakaiana, V. persica, and V. sibiricum ranged from 150,198 to 152,930 bp. Three Veroniceae contained the same coding genes, tRNAs and rRNAs. However, the start codon of the rps19 gene in V. sibiricum was changed to GTG. Similar findings have been detected in Cymbidium (Yang et al., 2013).
Land plants have a highly conserved chloroplast genome, but four junctions with changed genome sizes (Plunkett and Downie, 2000;Hansen et al., 2007;Qian et al., 2013). For example, Geraniaceae (Blazier et al., 2011) was found to lack a copy of the IR region, Tetracentron (Sun et al., 2013) showed expansion/contraction of the IR region, subfamily Apioideae (Plunkett and Downie, 2000) showed variations in J LB (LSC/IRb region) and Elaeagnaceae contained trnH gene duplication in the IR region (Choi et al., 2015).
We compared three Veroniceae species, among which V. nakaiana was unique in that its rps19 gene was duplicated and J LB (LSC/IRb) and J LA (LSC/IRa) differences occurred (Figure 2). The rpl2 genes of V. nakaiana and V. sibiricum were separated by >50 bp at the J LB , whereas that in V. persica was separated by 21 bp. Three types of J LB based on expansion/contraction have been characterized in Veroniceae.

Divergence Region of Veroniceae Plastid Genome
The ITS region and cpDNA non-coding regions have been widely used to investigate taxonomy and molecular phylogeny at the interspecific level (Taberlet et al., 1991;Baldwin, 1992). Recently, studies of the chloroplast genome showed divergent sequences in plants (Parks et al., 2009;Yang et al., 2013). In Veroniceae, Albach Chase, 2001, 2004;Albach et al., 2004a,b,c;Albach and Meudt, 2010) tested the ITS and CYC2 of nuclear ribosomal DNA and some chloroplast DNA (trnL-F, trnL, rps16, rpoB-trnC, and trnH-psbA). However, the relationships among the 12 genera in Veronica are not well supported.
Here, we showed the eight DNA variable regions that could be used as molecular markers in Veroniceae and Veronica (Figure 5, Table 2 and Supplementary Table 4). The results presented here will be helpful to systematics of Veroniceae and evaluating the relationships among the 12 subgenera of Veronica.

ACKNOWLEDGMENTS
This work supported by grants from Scientific Research (KNA1-2-13, 14-2) of the Korea National Arboretum.