Comparative analyses of Linderniaceae plastomes, with implications for its phylogeny and evolution

Introduction The recently established Linderniaceae, separated from the traditionally defined Scrophulariaceae, is a taxonomically complicated family. Although previous phylogenetic studies based on a few short DNA markers have made great contributions to the taxonomy of Linderniaceae, limited sampling and low resolution of the phylogenetic tree have failed to resolve controversies between some generic circumscriptions. The plastid genome exhibits a powerful ability to solve phylogenetic relationships ranging from shallow to deep taxonomic levels. To date, no plastid phylogenomic studies have been carried out in Linderniaceae. Methods In this study, we newly sequenced 26 plastid genomes of Linderniaceae, including eight genera and 25 species, to explore the phylogenetic relationships and genome evolution of the family through plastid phylogenomic and comparative genomic analyses. Results The plastid genome size of Linderniaceae ranged from 152,386 bp to 154,402 bp, exhibiting a typical quartile structure. All plastomes encoded 114 unique genes, comprising 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. The inverted repeat regions were more conserved compared with the single-copy regions. A total of 1803 microsatellites and 1909 long sequence repeats were identified, and five hypervariable regions (petN-psbM, rps16-trnQ, rpl32-trnL, rpl32, and ycf1) were screened out. Most protein-coding genes were relatively conserved, with only the ycf2 gene found under positive selection in a few species. Phylogenomic analyses confirmed that Linderniaceae was a distinctive lineage and revealed that the presently circumscribed Vandellia and Torenia were non-monophyletic. Discussion Comparative analyses showed the Linderniaceae plastomes were highly conservative in terms of structure, gene order, and gene content. Combining morphological and molecular evidence, we supported the newly established Yamazakia separating from Vandellia and the monotypic Picria as a separate genus. These findings provide further evidence to recognize the phylogenetic relationships among Linderniaceae and new insights into the evolution of the plastid genomes.

Historically, diagnostic characteristics of genera of the Linderniaceae have been controversial among different taxonomists, which results in ambiguous generic boundaries.For example, Bentham (1846) considered Lindernia and the other three closely related genera (Bonnaya, Ilysanthes, and Vandellia) as four distinct genera.Based on the number of fertile stamens, Urban (1884) and Wettstein (1891) included Bonnaya with two fertile stamens into Ilysanthes and Vandellia with four fertile stamens into Lindernia.However, Haines (1922) emphasized the diagnostic value of the leaf vein, reducing Bonnaya with pinnate vein to Vandellia and Ilysanthes with palmate vein to Lindernia.Pennell (1935) suggested that the diagnostic characteristics currently used were too weak and artificial, and included the four controversial genera into a broadly circumscribed genus Lindernia sensu lato (s.l.).Synapomorphies of the Lindernia sensu Pennell (1935) comprise the remarkably uniform corolla, curved anterior filaments, and similar septicidal dehiscence of the capsule.The treatment was followed by the majority of botanists (Philcox, 1968;Yamazaki, 1977;Yamazaki, 1985;Fischer, 1995;Lewis, 2000;Fischer, 2004;Rahmanzadeh et al., 2005).However, the Lindernia s.l.seems not to be a natural group as some species incorporated into the Lindernia s.l.have the same characteristics as calyx, staminal appendages, and floral disc of another genus Torenia (Hara, 1943;Pennell, 1943;Philcox, 1968).On the basis of the morphology of leaf margin and seed, Yamazaki (1954;1955) divided Lindernia s.l.into Lindernia s.s.(entire or slightly dentate leaf margin and non-alveolated seed) and Vandellia (serrate leaf margin and bothrospermous seed), although the revision has not been widely accepted.On the other hand, there were complicated relationships between Torenia and its relatives.The genus Craterostigma mainly distributed in Africa was reduced to a section of Torenia by bearing the winged calyx (Bentham, 1846).Wettstein (1891) supported Craterostigma as a separate genus based on its rosulate habit, but Craterostigma species often were described as Torenia in subsequent investigations (Schlechter, 1924).Since traditional classifications failed to provide a stable taxonomic framework at the generic level, evidence from the molecular level has been given weight.
When establishing Linderniaceae, Rahmanzadeh et al. (2005) included 13 genera of the tribe Lindernieae sensu Fischer (2004) within this family, of which the type genus Lindernia is the largest genus comprising ca. 100 species.Using two DNA markers (trnK and matK), Fischer et al. (2013) revealed the polyphyly of Lindernia and elaborated the first generic classification for Linderniaceae.In this study, 17 genera were recognized, of which Bonnaya and Vandellia were reinstated from Lindernia s.l., and formerly wellcircumscribed Torenia and Craterostigma were expanded to include some members of Lindernia s.l.(Fischer et al., 2013).The narrowly circumscribed Lindernia was kept according to the delimitation of Yamazaki (1955), including Bryodes, Ilysanthes, and Psammetes.In Fischer et al. (2013) treatment, the species not represented by molecular data were assigned to genera according to morphology, of which most taxa with pinnate veins and deeply lobed calyx from Asia and America were transferred to Vandellia.On the basis of expanded phylogenetic sampling and four DNA markers (rps16, matK, trnL-F, and RPB2), however, Liang (2017) revealed the polyphyly of Vandellia sensu Fischer et al. (2013) as other longaccepted genera (e.g., Torenia, Craterostigma, Chamaegigas, and Linderniella) embedded.Based on phylogenetic analyses inferred from the matK marker, Biffin et al. (2018) narrowed Vandellia sensu Fischer et al. (2013) and formally established the new genus Yamazakia separating from Vandellia.Although molecular phylogenetic studies have made great contributions to the circumscription of Linderniaceae, there are still considerable controversies concerning generic circumscription, predominantly due to the limited sampling and the low resolution of phylogenetic trees inferred from a few short molecular markers.
With the rapid development of Next-generation sequencing (NGS) technology, more and more organelle genomes were sequenced and submitted to GenBank.Compared to nuclear and mitochondrial genomes, the plastid genome has been widely applied to resolve phylogenetic relationships at different taxonomic levels, owing to its relatively small genome and slow mutation rate (Parks et al., 2009;Wicke et al., 2011;Wei et al., 2017;Wen et al., 2021;Zhao et al., 2021;Dong et al., 2022).The complete plastome exhibits a powerful ability to solve backbone phylogeny in contrast with universal DNA markers (Alwadani et al., 2019;Chen et al., 2022;Peng et al., 2023).Moreover, comparative analyses of the plastid genomes not only provide plentiful information for phylogenetic relationships but also deepen our understanding of genome evolution (Chu et al., 2022;Ogoma et al., 2022;Wan et al., 2023).However, no comparative analyses related to Linderniaceae plastomes have been conducted so far, only general reports about genome size and gene contents (Cheng et al., 2019;Chen et al., 2021).In this study, for the first time, we attempted to explore the phylogenetic relationships and genome evolution of Linderniaceae using a large number of complete plastomes.Our goals were to (1) investigate the structure and variation of Linderniaceae plastomes, (2) provide insights into the evolution of Linderniaceae plastomes, and (3) infer the phylogeny of Linderniaceae based on plastid genomic data.

Plant material, DNA extraction, and sequencing
In total, we included 31 samples of Linderniaceae representing eight genera and 28 species, of which 26 individuals were newly sequenced and the other five species were downloaded from GenBank (Table 1).Scientific names of Linderniaceae followed by Fischer et al. (2013) andBiffin et al. (2018).Plant samples were collected from the field and fresh leaves from healthy plants were stored in silica gel.These voucher specimens were deposited in the Herbarium of the Natural Museum of Guizhou University (GACP) (Table 1).The modified CTAB method was employed to extract total genomic DNA (Doyle and Doyle, 1987).The DNA purity and concentration were quantified using agarose gel electrophoresis and a NanoDrop 2000 Spectrophotometer.The quantified DNA was used to construct shotgun libraries with fragments of 200-500 bp, and paired-end (150 bp) reads were sequenced using an Illumina Hiseq 2500 platform by Biomarker Technologies, Inc (Shandong, China).Low-quality sequences of raw reads were filtered using the software Trimmomatic v.0.32 (Bolger et al., 2014), yielding at least 3 GB of clean reads for each sample.

Plastome comparative analyses
Gene rearrangements were detected based on collinear blocks using Mauve v2.4.0 (Darling et al., 2004).Genome divergence was identified using mVISTA with Shuffle-LAGAN mode (Frazer et al., 2004).To detect hypervariable regions, nucleotide variability (Pi) was calculated in DnaSP v.5.0 (Librado and Rozas, 2009) with the parameters of 600 bp on window length and 200 bp on step size.We also compared the boundaries of large single copy (LSC), small single copy (SSC), and two inverted repeat (IR) regions by online program IRscope (Amiryousefi et al., 2018).

Repeat sequences analyses
Simple sequence repeats (SSRs) were identified in MISA-web (Beier et al., 2017), with parameters set to ten, five, four, three, three, and three for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotides, respectively.The forward, reverse, palindromic, and complement repeats were recognized using REPuter (Kurtz et al., 2001), with a minimum repeat size of 30 bp and a Hamming distance of three.The repeat numbers in the regions of LSC, SSC, and IR were counted in Geneious v9.0.2 (Kearse et al., 2012).

Condon usage and selective pressure analyses
The codon usage bias of protein-coding genes based on 28 Linderniaceae species was calculated in CodonW v1.4.4 (Peden, 2000).Relative synonymous codon usage (RSCU) was used to evaluate codon usage preference.When the RSCU value > 1 indicates that the codon is preferred, the RSCU value = 1 means that the codon is not preferred, and the RSCU value < 1 shows that the codon usage is low.The effective number of codons (ENC) and GC content of the synonymous third codon positions (GC3s) were used to assess codon usage patterns.To investigate the selective pressure on protein-coding genes, we extracted the shared GenBank accession numbers of the newly sequenced were shown in bold, other sequences were from the previous studies (NA, information is unavailable).
nucleotide and protein sequences based on 28 Linderniaceae species using Geneious v9.0.2 (Kearse et al., 2012).Protein sequences were aligned by MAFFT v.7.388 (Katoh and Standley, 2013), and then transformed data into axt format.Non-synonymous (Ka) substitution and synonymous (Ks) substitution were calculated using KaKs_Calculator v.2.0 (Wang et al., 2010), with Lindernia procumbens as the reference, setting the genetic code to 11 (bacterial and plant plastid codes) and calculation model to Yang-Nielsen algorithm (Nielsen and Yang, 1998).The ratio of non-synonymous and synonymous substitutions (Ka/Ks) is greater than 1, equal to 1, and less than 1, indicating positive selection, neutral selection, and purifying selection, respectively.The Ka value of 0 showed that the Ka/Ks value equaled 0. Ka = 0 and Ks = 0 showed the invalid value was represented by NA.

Phylogenetic analyses
Phylogenetic relationships of Linderniaceae were reconstructed using plastid genomic data (complete plastid genomes and 80 protein-coding genes).The ingroup included 31 individuals of Linderniaceae, representing eight genera and 28 species, with three species from Plantaginaceae and Scrophulariaceae as the outgroups (Table 1).Alignments of sequences were performed in MAFFT v.7.388 with auto strategy (Katoh and Standley, 2013) and manually adjusted the alignment results in PhyDE v.0.9971 (Müller et al., 2010).Phylogenetic trees were conducted using maximum likelihood (ML) and Bayesian inference (BI) methods in CIPRES Science Gateway (https://www.phylo.org/)with the tool of RAxML-HPC2 on XSEDE 8.2.12 and MrBayes on XSEDE 3.2.7a,respectively.The ML tree was inferred under the GTRGAMMA model, with bootstrap replicates of 1,000.For BI analysis, the bestfit model was selected in the software PhyloSuite (Zhang et al., 2020).The GTR+F+I+G4 model was selected for two data sets under the Akaike Information Criterion (AIC) (David and Buckley, 2004).Two Markov Chain Monte Carlo (MCMC) simulations were run for 2,000,000 generations independently, with every 1,000 generations for tree sampling.The initial 25% of trees were removed as burn-in and the remaining data were used to generate consensus trees.Convergences were inspected when the average standard deviation of the split frequencies was less than 0.001.Finally, we visualized and edited these trees using iTOL v3.4.3 (Letunic and Bork, 2016).

Structure and features of Linderniaceae plastomes
The whole plastid genomes of 31 Linderniaceae ranged from 152,386 bp (Bonnaya ciliata) to 154,402 bp (Yamazakia viscosa) (Table 2).These plastomes displayed a typical quadripartite structure (Figure 1), with a large single copy (LSC) region, a small single copy (SSC) region, and a pair of inverted repeats (IRs).The size of the LSC region varied from 84,566 bp in B. ruellioides to 85,837 bp in Y. viscosa.The size of the SSC region was from 18,352 bp in B. ciliata to 19,095 bp in Lindernia procumbens.The size of the IR region ranged from 24,599 bp in L. procumbens to 24,873 bp in Vandellia scutellariiformis.Total GC content did not present significant differences among all plastomes (37.5-37.7%),with 35.3-35.6% in the LSC region, 31.6-32.0% in the SSC region, and 43.3-43.6% in the IR region.Gene content was identical in all Linderniaceae plastomes, with 114 unique genes, comprising 80 protein-coding genes (PCGs), 30 tRNA genes, and four rRNA genes (Table 3).Seven protein-coding genes, seven tRNA genes, and four rRNA genes were duplicated in the IR regions.A total of 17 genes with introns were detected, of which clpP, rps12, and ycf3 genes contained two introns and the other 14 genes had a single intron.
With the first exon situated in the LSC region and the remaining two exons dispersed throughout the IR regions, the rps12 was identified as a trans-spliced gene.Gene order in Linderniaceae was highly conservative and we did not find rearrangement or inversion events (Figure 1 and Supplementary Figure 1).However, the junctions between single copy (SC) and IR regions exhibited slight differences (Supplementary Figure 2).The boundary of LSC and IRb was located in the intergenic region between rps19 and rpl2 among Bonnaya and Torenia anagallis plastomes, and the rps19 gene crossed the LSC-IRb boundary for the other plastomes.Except for Lindernia procumbens and L. rotundifolia, the gene ndhF spanned the SSC-IRb junction in all taxa, with 36 bp to 240 bp in the IRb region and 2019 bp to 2165 bp in the SSC region.At the SSC-IRa junction, the ycf1 and trnN genes were entirely located within the SSC and IRa region near the junction, respectively.For all species, the boundary of IRa and LSC was placed in the intergenic region between rpl2 and trnH, with gene rpl2 being 41 to 110 bp and gene trnH being 3 to 44 bp away from the boundary.

Sequence variation analyses
The mVISTA analyses indicated that the plastid genomes of Linderniaceae were relatively conservative (Supplementary Figure 3).The alignment of the whole plastomes was 143,491 bp in length, comprising 14,984 variable sites (10.44%) and 7,949 parsimony-informative sites (5.54%) (Table 4).The highest percentage of parsimony-informative sites was from the SSC region (1,739,9.84%),and the lowest was from the IR region (300, 1.26%).There was nucleotide variability (Pi) of 0.01737 across the complete plastome.The IR region (Pi = 0.0042) displayed the lowest sequence divergence, while the SSC region (Pi = 0.03225) had the most variance.Hypervariable regions of Linderniaceae species were further identified through the slide window analyses (Figure 2).The nucleotide diversity values in plastid genomes ranged from 0.00024 to 0.06292.The mutational hotspots were defined with Pi values greater than 0.0500.A total of five highly divergent regions from SC regions were detected (Figure 2A), including three intergenic spacers (IGS) (rps16-trnQ, petN-psbM, and rpl32-trnL), and two coding genes (rpl32 and ycf1).Among these hypervariable sites, the ycf1 gene exhibited the highest Pi value (0.06292), and the lowest was from rpl32-trnL (0.05002).Overall, the IR regions exhibited less variation (Figure 3B), and divergence was greater in the non-coding region than in the coding region (Supplementary Figure 3).
A total of 1909 long sequence repeats (LSRs) greater than 30 bp were identified, containing 844 forward repeats, 899 palindromic Representative circular map of the Linderniaceae plastome.Genes inside the circle are transcribed clockwise, while those outside the circle are transcribed counter-clockwise.The genes are color-coded with different functions.The darker gray and lighter gray in the inner correspond to GC content and AT content, respectively.LSC, large single copy; IR, inverted repeat; SSC, small single copy.

Codon usage bias and selective pressure
Codon usage biases with high similarity were observed among 28 Linderniaceae species (Supplementary Table 4).The total number of codons of protein-coding genes ranged from 22,800 (Picria felterrae) to 22,895 (Bonnaya antipoda).Leucine (Leu) was the most abundant (2400-2434), while cysteine (Cys) showed the least abundance (240-252).The RSCU values of 30 codons were greater than 1, almost all of which ended with A or U except for UUG.Specifically, the UUA encoding Leu had the highest RSCU from 1.91 to 1.98, and the AGC encoding serine (Ser) had the lowest RSCU from 0.27 to 0.32.Methionine (Met) and tryptophan (Try) preferred one codon with RSCU = 1, namely AUG and UGG, respectively.To further evaluate the codon usage pattern, the GC content of synonymous third codon positions (GC3s) and the effective number of codons (ENC) were calculated in Supplementary Table 5.The values of ENC showed a slight bias ranging from 46.91 to 47.33.Total GC content in the codons was Large subunit of rubisco rbcL

Other genes
Maturase matK

Protease
a, Gene with one intron.b, Gene with two introns.c, Trans-spliced gene.

Phylogenetic analyses
In this study, we included 31 complete plastomes from Linderniaceae and three outgroups from Plantaginaceae and Scrophulariaceae for phylogenetic inference.After removing ambiguously aligned regions, the alignment of the complete plastomes was 150,629 bp in length, comprising 11,541 parsimony-informative sites.A total of 80 protein-coding genes were aligned and the matrix of 79,174 bp was generated after removing ambiguously aligned regions, including 5,744 parsimony-informative sites.As topologies generated from ML and BI analysis were similar, only the ML tree was presented with ML bootstrap support (BS) and BI posterior probability (PP) values added (Figure 4).Except for the phylogenetic positions of Vandellia elata and Torenia parviflora, the topologies generated from the two datasets were identical.In the complete plastomes tree, Vandellia elata was sister to the group consisting of Legazpia polygonoides and T. anagallis with weak support (BS = 36, PP = 0.35).However, Vandellia elata was strongly supported as a sister to the core Torenia that comprised V. stricta and all Torenia species except T. anagallis in the protein-coding genes tree (BS = 92, PP = 0.84).Torenia parviflora was one of a member of the core Torenia.It was sister to the clade including T. asiatica, T. concolor, T. fournieri, T. benthamiana, T. fordii, and T. violacea in the complete plastomes tree (BS = 100, PP = 1.00) but sister to another group consisting of V. stricta, T. flava, T. crustacea and T. oblonga in the protein-coding genes tree (BS = 100, PP = 1.00).
In both trees, the monophyly of Linderniaceae was strongly supported (BS = 100, PP = 1.00).Within Linderniaceae, Lindernia procumbens was sister to L. rotundifolia, then together sister to the rest of Linderniaceae (BS = 100, PP = 1.00).The two Yamazakia species did not group together as Vandellia montana embedded within the genus (BS = 100, PP = 1.00).Picria felterrae from different populations clustered together and formed a sister relationship with monophyletic Boyanna comprising Boyanna antipoda, B. ciliata, B. ruellioides, and B. tenuifolia here (BS = 100, PP = 1.00).The monotypic Legazpia represented by three different individuals was strongly supported, which was sister to Torenia anagallis (BS = 100, PP = 1.00).The monophyly of Torenia was not supported with V. stricta, V. elata, and Legazpia polygonoides embedded.Similarly, the monophyly of Vandellia was also not supported as species of the genus scattered to other distinct clades.

Conservatism and diversity of Linderniaceae plastomes
In this study, the 26 plastid genomes of Linderniaceae were newly sequenced, and comprehensive analyses were performed in combination with five published data.Comparative analyses revealed conservatism and diversity in the Linderniaceae plastomes.Similar to most angiosperms (Wicke et al., 2011), the plastid genomes of Linderniaceae typically presented circular quadripartite structure, comprising a pair of inverted repeats (IRs), a large single copy (LSC) region, and a small single copy (SSC) region (Figure 1).The 31 plastid genomes of Linderniaceae, ranging from 152,386 bp to 154,402 bp, did not show significant differences in size, and the gene content and order were identical (Tables 2, 3).Within the same species, there is little or no variation in the genome size.The two populations of Picria felterrae differ in size by only 7 bases, and the three plastomes of Legazpia polygonoides had the same 153,477 bp in size (Table 2).Similarly, there are also no significant differences among distantly related species between Torenia and Lindernia, such as Torenia concolor (153,994 bp) and Lindernia procumbens (153,448 bp).These means that the plastid genomes of Linderniaceae are highly conservative in general features with only slight variations in genome size.Previous studies have revealed that the expansion and contraction of the IR, LSC, and SSC regions are common during evolution and are the main reason for variations in plastome size (Ravi et al., 2007;Downie and Jansen, 2015;Daniell et al., 2016).Significant contraction of IR region was observed among Linderniaceae species (Supplementary Figure 2), which was also common in Lamiales, such as Gesneriaceae (Gu et al., 2020), Lamiaceae (Su et al., 2022), Oleaceae (Long et al., 2023), Plantaginaceae (Xie et al., 2023), and Scrophulariaceae (Wang et al., 2022).Notably, the IR boundaries did not coincide exactly in the three populations of L. polygonoides, but they had the same plastome size.The two populations of P. felterrae were completely consistent in IR size, with slight variation occurring in the LSC and SSC regions.Overall, the 31 plastid genomes of Linderniaceae showed size differences mainly in the LSC region (Table 2).Consistent with previous studies (Yang et al., 2022;Bai et al., 2023;Peng et al., 2023), the non-coding and single copy (SC) regions were more divergent than the coding and IR regions (Supplementary Figure 3).We, therefore, speculated that the size variation of Linderniaceae plastomes was mainly attributed to the LSC region, especially in the non-coding regions, similar to the results of studies of Gao et al. (2022) and Li et al. (2023).
In Linderniaceae, the gene composition in each junction was almost identical, while the gene arrangement pattern was slightly different (Supplementary Figure 2).Interestingly, the gene arrangement pattern of each IR boundary in Linderniaceae was more similar among closely related species.For example, based on the phylogenetic analyses in this study, Torenia crustacea, T. oblonga, T. flava, and Vandellia stricta formed a well-supported clade (Figure 4), and their IR boundaries were highly conservative, with only slight differences in the SSC-IRb and SCC-IRa junctions.Particularly, the sister species of T. crustacea and T. oblonga showed completely consistent boundary genes and arrangement patterns, and so did the sister species of T. flava and V. stricta.This correlation has been reported in Aristolochiaceae (Bai et al., 2023), Juglandaceae (Xi et al., 2022), and Musaceae (Song et al., 2022).Therefore, information from the IR boundary may reveal the phylogenetic relationships among species to some extent.
Long sequence repeats (LSRs) play an important role in genome recombination and rearrangement (Do and Kim, 2017).In Linderniaceae, LSRs were found predominantly in the IR regions, and the forward and palindromic repeats with the length mainly Cladograms of Linderniaceae inferred from the complete plastid genomes (A) and protein-coding genes (B) based on Bayesian inference (BI) and maximum likelihood (ML) methods.The ML bootstrap (BS) and BI posterior probability (PP) that supported each node are shown under the branches.Red-bold horizontal lines mean different phylogenetic positions for taxa between two different datasets.* indicates that the sequence is from GenBank.BS values < 50% and PP < 0.5 are indicated by '-'.
concentrated in 30-40 bp were the most abundant (Supplementary Table 3).These findings were largely consistent with previous studies (Tian et al., 2018;Thode and Lohmann, 2019;Fan et al., 2021;Zhu et al., 2021).Single Sequence Repeats (SSRs) with polymorphism have been developed as molecular markers that were widely used in population genetics, polymorphism investigation, and biogeographic inference (Ebert and Peakall, 2009;Xue et al., 2012;Yu et al., 2017).In 28 Linderniaceae species, the mononucleotide SSRs were the most abundant and almost consisted of A or T bases (Supplementary Table 2), which may explain the richness of A/T content in plastomes.The number of SSRs in the single-copy (SC) and non-coding regions was higher than that in the IR and coding regions (Figure 3), consistent with previous research (Huang et al., 2020;Kong et al., 2021;Wu et al., 2021).There are significant differences in the number and distribution pattern of SSRs among species (Figure 3 and Supplementary Table 1), which have great significance for future population genetics studies of Linderniaceae.
Hypervariable regions within the plastid genome have been screened out as candidate barcodes for phylogenetic analyses and species identification (Dong et al., 2012;Dong et al., 2015;Li et al., 2015).As non-coding sequences have higher evolutionary rates than coding regions, they were often identified as the best barcoding candidates (Ravi et al., 2007), which was confirmed in the present study (Figure 2 and Supplementary Figure 3).Five hypervariable regions were screened within the Linderniaceae, namely petN-psbM, rps16-trnQ, rpl32-trnL, rpl32, and ycf1, of which petN-psbM and rps16-trnQ were located in the LSC region and others were distributed in the SSC region.These non-coding regions have exhibited high nucleotide diversity and phylogenetic utility in Lamiales, such as petN-psbM in Lamiaceae (Su et al., 2022;Wang et al., 2023), rps16-trnQ in Scrophulariaceae (Dong et al., 2022) and Gesneriaceae (Gu et al., 2020), and rpl32-trnL in Bignoniaceae (Fonseca and Lohmann, 2018), Lamiaceae (Hu et al., 2020) and Scrophulariaceae (Dong et al., 2022).Furthermore, the gene ycf1, the second largest gene in the plastid genome playing a critical role in plant cell survival (Drescher et al., 2000), has been proposed as the most potential DNA barcode due to its high variability in land plants (Drew and Sytsma, 2011;Dong et al., 2015).The relatively high nucleotide diversity observed in the rpl32 gene is similar to that observed in Dioscoreaceae (Wonok et al., 2023), Fagaceae (Worth et al., 2019), and Poaceae (Guo et al., 2022).In the previous study, the phylogeny of Linderniaceae inferred from a few common plastid markers (rps16, matK, trnL-F, and trnK) or nuclear gene (RPB2) had moderate or weak support values (Fischer et al., 2013;Liang, 2017;Biffin et al., 2018), hindering the investigation for the phylogeny and evolution of Linderniaceae.Hypervariable regions in this study provided additional information for further phylogenetic and biodiversity research in Linderniaceae.

Evolution of protein-coding genes in Linderniaceae
Codon usage bias may help to better understand gene expression and genome evolution (López et al., 2019).The analyses of relative synonymous codon usage (RSCU) showed that the Linderniaceae plastomes were highly similar in codon usage pattern (Supplementary Table 4).Consistent with previous studies (Azarin et al., 2021;Zhao et al., 2022;Li et al., 2023), Leucine (Leu) was the most frequent amino acid, and cysteine (Cys) was the least common.In Linderniaceae, codons mainly ended with A and U when the RSCU value was greater than 1, and codons primarily ended with G and C when the RSCU value was below 1, which appears to be a common phenomenon in gene expression of land plants (Cui et al., 2019;Chakraborty et al., 2020;Gao et al., 2022;Bai et al., 2023;Zhou et al., 2023).Moreover, the GC content of synonymous third codons positions (GC3s) showed that AT content was more abundant in the protein-coding genes (Supplementary Table 5), which may be correlated with the abundant AT content of plastid genomes.The ENC values of protein-coding genes in Linderniaceae plastomes varied from 46.91 to 47.33, demonstrating a conservative codon usage pattern (Wu et al., 2007).
The ratio (Ka/Ks) of non-synonymous (Ka) and synonymous (Ks) substitutions is a valuable parameter to assess the selective pressure in evolution (Ohta, 1995).Positive selection is considered to be closely associated with adaptive evolution in harsh environments while purifying selection is a common evolutionary force responsible for maintaining genomic conservation (Cvijovicé t al., 2018).In Linderniaceae, the Ka/Ks values of almost all genes were less than 1.00, providing evidence for purifying selection.The ycf2 gene had a positive selection in only Bonnaya antipoda, B. ciliata, B. ruellioides, B. tenuifolia, Picria felterrae, and Torenia flava, with a Ka/Ks value greater than 1, indicating that most protein-coding genes were relatively conserved.As the largest known plastid gene in angiosperms, ycf2 is an enigmatic gene due to its unknown function.Drescher et al. (2000) and Kikuchi et al. (2018) pointed out that the ycf2 gene is associated with plant viability and the 2-MD AAA-ATPase complex.Among many angiosperms, ycf2 has been reported under positive selection (Liu et al., 2018;Zhong et al., 2019;Peng et al., 2020;Wu et al., 2020;Song et al., 2022;Fu et al., 2023).Overall, protein-coding genes of plastid genomes of the Linderniaceae were highly conserved during long-term evolution, while the functions of the gene ycf2 in adaptive evolution need to be further verified.

Insights into the phylogeny of Linderniaceae
In this study, the robust phylogenetic backbone of Linderniaceae was constructed with the whole plastomes, in which the support values of these clades were greatly improved compared with previous studies based on a few genetic markers (Fischer et al., 2013;Liang, 2017;Biffin et al., 2018).Based on phylogenomic analyses presented here, we found that Vandellia and Torenia sensu Fischer et al. (2013) were polyphyletic.Meanwhile, strong evidence was provided to support the taxonomic status of two genera, Picria and Yamazakia.
Traditionally, the genus Torenia is characterized by winged calyx on the shallowly lobed calyx tube and completely calyx-enclosed capsule (Bentham, 1846).However, historically, it is very difficult to define the exact circumscription of Torenia.Some species formerly belonging to Craterostigma, Lindernia s.l., and Vandellia were once treated as members of Torenia (Bentham, 1846;Hance, 1868;Pennell, 1943).On the basis of synapomorphy of the calyx tube together with molecular evidence, Fischer et al. (2013) redefined the genus Torenia with a total of 51 species included.In the phylogenetic analyses, Torenia seemed to be monophyletic as the only two species sampled formed a sister relationship (Fischer et al., 2013).Subsequent phylogenetic analyses based on expanded sampling indicated that Torenia sensu Fischer et al. (2013) is not monophyletic as some species of Vandellia, Legazpia, and Schizotorenia are embedded within it (Liang, 2017).Combining morphological and molecular evidence, Liang (2017) lumped Legazpia, Schizotorenia, and some species with claviform appendages of Vandellia into Torenia, with the number of species expanding to 78.According to the taxonomic treatment of Liang (2017), the morphology of the calyx tube was no longer used as a generic diagnostic characteristic, and potential synapomorphies of newly circumscribed Torenia included the pinnate vein, claviform appendages at the abaxial stamens, and bothrospermous seed.In this study, phylogenomic analyses including 11 species of Torenia indicated that Torenia sensu Fischer et al. (2013) is not monophyletic with Vandellia stricta, V. elata, and Legazpia polygonoides embedded.Morphologically, Legazpia can be easily distinguished from other Torenia species by such characteristics as three large semi-circular wings on the calyx, glabrous ovary, small corolla slightly exceeding the calyx, and rounded upper corolla lip (Yamazaki, 1955), and has been accepted as a separate genus by most taxonomists (Yamazaki, 1955;Fischer, 1995;Hong et al., 1998;Fischer, 2004;Rahmanzadeh et al., 2005;Fischer et al., 2013;Fischer et al., 2013).To maintain the monophyly of Torenia, reducing Legazpia to Torenia may be a reasonable choice.
Vandellia has long been treated as a synonym of Lindernia s.l.(Pennell, 1935;Pennell, 1943;Philcox, 1968;Yamazaki, 1977;Yamazaki, 1985;Fischer, 1995;Lewis, 2000;Fischer, 2004;Rahmanzadeh et al., 2005).Based on molecular phylogenetic analyses, Fischer et al. (2013) revealed that Lindernia s.l. is polyphyletic and split this genus into six genera (Lindernia s.s., Vandellia, Torenia, Bonnaya, Linderniella, and Craterostigma), of which most species of Lindernia s.l. were assigned to Vandellia.In this study, phylogenetic analyses showed that the species of Vandellia sensu Fischer et al. (2013) scattered various clades, which was similar to previous studies (Liang, 2017;Biffin et al., 2018).Traditionally, the pinnate vein, deeply lobed calyx, and four fertile stamens were usually used to define the genus Vandellia (Bentham, 1846;Hooker, 1884;Fischer et al., 2013).On the basis of morphological together with molecular evidence, Liang (2017) transferred Chamaegigas, Craterostigma, and Linderniella which have been long-established and accepted genera to Vandellia, although the Vandellia clade had weak support based on combined DNA markers (trnL-F, rps16, and matK).Vandellia sensu Liang (2017) comprised 49 species, almost all of which were transferred from other genera.A large number of species name changes will lead to confusion for end-users of these names (most of them are not taxonomists) (Drew et al., 2017).Due to limited sampling in the current study, we have insufficient evidence to elucidate the circumscription of Vandellia, pending comprehensive phylogenomic analyses to better determine the strength of support for this genus.
The Yamazakia was currently proposed by Biffin et al. (2018) as a replacement name for the genus Tittmannia Rchb.(blocking name: Tittmannia Brongn.), including two sampled species in phylogenetic analyses (Yamazakia pusilla and Y. viscosa).These two species were transferred from Vandellia sensu Fischer et al. (2013).Actually, Liang (2017) has revealed a well-supported clade (BS = 100, PP = 1.00) including three species [Vandellia moillis (synonym of V. montana), V. pusilla, and V. viscosa], and placed them in the genus Tittmannia together with other five species (Tittmannia cyrtotricha, T. longituba, T. rivularis, T. satakei, and T. stolonifera) based on morphological evidence.Traditionally, these species shared similar morphological characteristics and therefore were placed within the Lindernia section Tittmannia (Philcox, 1968).In our study, phylogenetic analyses showed that V. montana was sister to Yamazakia viscosa with strong support (BS = 100, PP = 1.00), then together sister to Y. pusilla (BS = 100, PP = 1.00).Resolution of the three species is significantly improved compared with the study of Liang (2017).Therefore, our results demonstrated V. montana to be a member of Yamazakia, supporting Yamazakia as a separate genus.
The genus Picria was proposed by Loureiro (1790), containing only one species Picria felterrae.The native range of this genus is Himalaya to S.China and Peninsula Malaysia, Philippines to Caroline Islands.Picria is a very distinctive genus morphologically different from the other Linderniaceae plants (Fischer, 2004).The calyx tube of Linderniaceae is composed of four types, namely deeply lobed, shallowly lobed, half lobed, and no calyx tube (Liang, 2017).The last type is in fact that the calyx is completely split to base and divided into four spreading segments, consisting of a pair of cordate outer sepals and another pair of filiform lateral inner sepals.This type of calyx tube only occurs in Picria within the Linderniaceae.Liang (2017) first included Picria in molecular phylogenetic analyses and indicated that the genus was sister to Boyanna with moderate support (BS = 67, PP = 0.64).In this study, we included two individuals from different populations in phylogenetic analyses.The result further confirmed that Picria is closely related to Boyanna with strongly supported values (BS = 100, PP = 1.00).Except for significantly different calyx morphology, the two genera bear similar corolla tubes, clavate staminodes without appendages, and flora discs, which may reflect the relationship between the two genera.Due to the particular calyx and relatively isolated phylogenetic position, we supported the monotypic Picria as a separate genus.

Conclusion
Linderniaceae plastomes were highly conservative in terms of structure, gene order, and gene content, with slight contraction in the IR region.Except for that the ycf2 gene has a positive selection in a few species, most protein-coding genes were highly conserved, indicating that the evolution of Linderniaceae species was relatively slow.The phylogenetic analyses of Linderniaceae based on the complete plastid genome showed a higher resolution compared with previous studies, and the results revealed that the Vandellia and Torenia sensu Fischer et al. (2013) were non-monophyletic.The newly established Yamazakia was supported and the monotypic Picria could be regarded as a separate genus.Molecular phylogenetic evidence is of great importance to taxonomic revision of the complicated Linderniaceae.In this study, the sampling is not comprehensive with only 28 representative species from eight genera.As the plastid genome has been demonstrated to be effective in solving the phylogeny of Linderniaceae, a widely accepted taxonomic treatment is likely to be reached based on comprehensive sampling and plastid genome data.

FIGURE 1
FIGURE 1 (2), ycf3 b , ycf4, ycf15 FIGURE 3 Comparisons of simple sequence repeats (SSRs) and long sequence repeats (LSRs) among 28 Linderniaceae species.(A) Number of six types of SSRs.(B) Number of four types of LSRs.(C) Number of SSRs in the LSC, SSC, and IR regions.(D) Number of LSRs in the LSC, SSC, and IR regions.(E) Number of SSRs in the IGS, exons, and introns.(F) Number of LSRs in the IGS, exons, and introns.

TABLE 1
Voucher information and GenBank accession numbers of the samples in this study.

TABLE 2
Summary of features of the Linderniaceae plastomes.

TABLE 3
Gene content of the Linderniaceae plastomes.

TABLE 4
Sequence divergence of the Linderniaceae plastomes.