Variation in Plastome Sizes Accompanied by Evolutionary History in Monogenomic Triticeae (Poaceae: Triticeae)

To investigate the pattern of chloroplast genome variation in Triticeae, we comprehensively analyzed the indels in protein-coding genes and intergenic sequence, gene loss/pseudonization, intron variation, expansion/contraction in inverted repeat regions, and the relationship between sequence characteristics and chloroplast genome size in 34 monogenomic Triticeae plants. Ancestral genome reconstruction suggests that major length variations occurred in four-stem branches of monogenomic Triticeae followed by independent changes in each genus. It was shown that the chloroplast genome sizes of monogenomic Triticeae were highly variable. The chloroplast genome of Pseudoroegneria, Dasypyrum, Lophopyrum, Thinopyrum, Eremopyrum, Agropyron, Australopyrum, and Henradia in Triticeae had evolved toward size reduction largely because of pseudogenes elimination events and length deletion fragments in intergenic. The Aegilops/Triticum complex, Taeniatherum, Secale, Crithopsis, Herteranthelium, and Hordeum in Triticeae had a larger chloroplast genome size. The large size variation in major lineages and their subclades are most likely consequences of adaptive processes since these variations were significantly correlated with divergence time and historical climatic changes. We also found that several intergenic regions, such as petN–trnC and psbE–petL containing unique genetic information, which can be used as important tools to identify the maternal relationship among Triticeae species. Our results contribute to the novel knowledge of plastid genome evolution in Triticeae.


INTRODUCTION
Chloroplast DNA (cp DNA) is composed of a single circular DNA molecule with a quadripartite structure and encodes multiple proteins, including components of light reactions in the photosynthesis process (Martin et al., 2002). In angiosperms, cp genomes have a very constrained size, ranging from 115 to 165 kb in length, and consist of two copies of an inverted repeat (IR) region, a large-single-copy (LSC) region, and a small single copy (SSC) region (Raubeson and Jansen, 2005;Wicke et al., 2011). Because of a uniparental mode of inheritance and high conservation in gene content and genome structure, the chloroplast genome is generally treated as a single locus (Raubeson and Jansen, 2005;Nock et al., 2011). With a smaller effective population size and being essentially recombinationfree, the chloroplast genome has a shorter coalescent time than nuclear genomes (Birky et al., 1983). The cp genome is highly conserved in structure, size, and gene content within land plants (Palmer, 1985;Wicke et al., 2011), because chloroplast sequences evolve at approximately half the speed of nuclear regions (Walker et al., 2014). However, variations in some regions of the cp genome have been widely reported in plants (Krause, 2011;Weng et al., 2014;Schwarz et al., 2015;Chen et al., 2017;Bedoya et al., 2019;Shrestha et al., 2019). These advantageous features give the chloroplast genome an important value in reconstructing phylogeny, DNA barcoding for accurate identification of plant species, and tracing evolutionary history (Jansen et al., 2008;Walker et al., 2014;Chen et al., 2017;Bedoya et al., 2019;Shrestha et al., 2019). However, associations between DNA composition and cp genome divergence need to be clarified in species over a range of evolutionary time.
Comparison of whole cp genomes can explore sequence variation, permit examination of molecular evolutionary patterns associated with structural rearrangement, and elucidate genetic changes underlying those events. Previous studies on seed plants have suggested that structural rearrangement, intergenic region variation, variation in IR regions, and gene loss were principal factors driving the variation in chloroplast genome size and structure (Kugita et al., 2003;Parks et al., 2009;Wu et al., 2011;Chen et al., 2017;Zheng et al., 2017). It has been reported that contractions into single copy regions with inversions have shaped certain evolutionary features for monophyletic groups (Palmer et al., 1985;Hoot and Palmer, 1994). While these described features in chloroplast genome variation enable researchers to investigate genome divergences over a broad range of evolutionary time, from early land plants to recently domesticated plants, comparisons among very distant relatives have yielded results with uncertain generality (Matsuoka et al., 2002;Zheng et al., 2017). Moreover, a comprehensive search for factors that drive the variation in genome size in a given phylogenetic framework is still lacking.
The wheat tribe (Poaceae: Triticeae), an economically important gene pool for genetic improvement of cereal and forage crops, includes about 450 diploid and polyploid species that distribute in a wide range of ecological habitats over temperate, subtropical, and tropic alpine regions (Dewey, 1984). In Triticeae, 24 major basic genomes in diploid species have been designated and recognized in 18 monogenomic genera (Löve, 1984;Wang et al., 1994). Monogenomic genera have been the focus of numerous evolutionary investigations, partly because of either their economic importance (including barley, rye, and crested-wheat grasses) or their genome donor to the speciation of polyploid species (accounting for 75% of the Triticeae species), as well as their wide variety of species richness, morphology, ecology, and distribution. Previous data from geographical distribution (Sakamoto, 1973) and morphological characteristics (West et al., 1988), and DNA information (Kellogg et al., 1996) have suggested that the first step of phylogenetic diversification in Triticeae occurred at the diploid level and then gave rise to different present-day diploid lineages. Since the tribe was originated, lineage diversification of monogenomic genus in a range of evolutionary time raised the question of how cp genomes have evolved. Comparison of cp genomic structure on several monogenomic Triticeae showed that a 737-bp deletion in Pseudoroegneria libanotica might be related to potential nuclearcytoplasm transfer (Wu et al., 2017). Analysis of the cp genome sequence in Agropyron cristatum suggested that deletion of accD and translocation of rpl23 genes might represent an independent gene-loss event or an additional divergence in Triticeae . However, to better understand the pattern of variation in cp genome structure and size in Triticeae, it is critical to assess the evolutionary processes that drive cp genome variation accompanied by the diversification of Triticeae with relatively well-sampled monogenomic Triticeae genera.
Here, we carried out phylogenetic reconstructions and estimated diversification patterns using the cp genome from 34 species of 17 monogenomic genera within the Triticeae. While generic definitions within the Triticeae have been extremely variable (Barkworth, 1998), our sample represents nearly all monogenomic genera and genome types according to genomebased classifications of the tribe (Yen et al., 2005). The aims of this study were to (1) examine variation in cp genomic structure and size in monogenomic Triticeae; (2) document the process of cp genomic variation accompanied by the diversification history of monogenomic Triticeae. Knowledge of cp genomic variation over a range of evolutionary time would provide a better understanding of the evolutionary history of the Triticeae.

Comparison of 35 Chloroplast Genomes
The software mVISTA was used to compare the cp genomes using the annotation of A. cristatum as a reference (Frazer et al., 2004). The sequences were aligned with MAFFT v6.833 (Katoh and Toh, 2010) using default settings. Sequence variation, such as gene loss, divergent genes, intergenic sequence (IGS) indels, and IR contraction/expansion, were examined by MEGA 6.0  (Tamura et al., 2013). The IR expansion and contraction of cp genomes were analyzed using IRscope (Amiryousefi et al., 2018).

Phylogenetic Analysis
Phylogenetic analysis was performed using maximum likelihood (ML) and Bayesian inference (BI). ML analysis was conducted using RAxML v.8.2.104 (Stamatakis, 2014). The best model was determined by the model test-ng-0.1.6 software with a default parameter (Kozlov et al., 2019). The optimal model identified was GTR + I + G, which was used in both the ML and BI analyses. The robustness of the trees was estimated by bootstrap support (BS). Statistical support for nodes in ML analysis was estimated using 1,000 fast bootstrap replicates, each with three replicates of stepwise random taxon addition. A BS value of less than 50% was not included in the figures. The BI analysis was performed using MrBayes v3.0 (Huelsenbeck and Ronquist, 2001). Four Markov Chain Monte Carlo (MCMC) chains (one cold and three heated), with MrBayes default heating values (t = 0.2), were run for 100,000 generations, each sampling every 100 generations. The first 250 trees were stationarily discarded as "burn-in." The statistical confidence in nodes was estimated by posterior probabilities (PP). A PP value of less than 0.9 was not included in the figures.

Ancestral State Reconstruction
Two data matrixes, complete cp genome sequence data and concatenated non-protein coding data, were used to trace the ancestral states of genome size on the phylogenetic tree using weighted squared-change parsimony in the software Mesquite v2.5 (Maddison and Maddison, 2021). Weighted squared-change parsimony minimizes the sum of squared change along all branches of the tree, weighting branches by their length (Finarelli and Flynn, 2006). The evolutionary history of the two selected data characters was mapped over the single best ML tree based on complete cp genome sequences.

Divergence Dating
Divergence times with 95% confidence intervals were estimated using the Bayesian relaxed molecular clock method, implemented in BEAST v1.4.6 (Drummond and Rambaut, 2007). Calibration points were performed using a relaxed uncorrelated lognormal molecular clock. A complete cp genome sequence dataset was used to conduct the BEAST analysis. The lack of fossils for Triticeae precluded the direct calibration of tree topologies. Instead, dating was based on the divergence time for the basalmost split in Triticeae (Marcussen et al., 2014). Priors on Triticeae crown age (15.32 Ma ± 0.34) were set as inferred by Marcussen et al. (2014), where several macrofossils from grass (Festuca, Berriochloa, and Nassella) were used to calibrate the age of Triticeae. Tracer v1.4 (Rambaut and Drummond, 2018) was used to ensure the convergence of mixing in terms of effective sample size (ESS) values and coefficient rate. The results will be accepted if the values of the estimated sample size were larger than 200, suggesting little autocorrelation between samples. The resultant trees were analyzed using TreeAnnotator in BEAST where the "burn-in" (2,000 trees) was removed, and a maximum credibility tree was constructed. The trees were then viewed in FigTree v. 1.3.1. 1

Statistical Analysis
Statistical analysis was performed by using the R software. Data statistics of the complete cp genome, protein coding region, and non-protein coding region of each clade were compared with Boxplot. Kruskal-Wallis test was performed on the complete cp genome, protein coding region, and non-protein coding region to determine significant differences among the clades. The relationship between divergence time and indel amount was evaluated by using Spearman's correlation coefficient test.

Phylogeny of the Triticeae Species
Chloroplast genomes contain an abundance of phylogenetic information, which has been widely used for phylogeny reconstruction at different taxonomic levels, such as order, family, genus, and species, in plants. Using chloroplast genome data, long-standing controversies related to various phylogenetically difficult groups have been resolved, supporting its importance in systematic studies. To better determine the phylogenetic position of Triticeae and further clarify the evolutionary relationships within the Triticeae tribe, phylogenetic analyses were constructed based on the 34 complete chloroplast genomes using B. distachyon as an outgroup. Phylogenetic reconstruction based on complete cp genome data resulted in a tree with high posterior probability support across most clades ( Figure 1A). The ML and Bayesian analyses of complete cp genome data generated the same tree topology with BS >50% above and PP >0.9 below branches ( Figure 1A). The tree illustrated that four clades (I-IV) were recognized. Clade I included the Aegilops/Triticum complex, Taeniatherum

General Variation of Chloroplast Genomes in Triticeae Species
The complete genome size of 34 cp genomes in Triticeae ranged from 135,003 (Thinopyrum bessarabicum) to 136,968 bp (Hordeum bogdanii), and non-protein coding region size from 75,694 (T. bessarabicum) to 77,672 bp (H. bogdanii). The level of sequence divergence among the 34 cp genomes was compared using the mVISTA program. We found that nonprotein coding regions were more highly variable than protein coding regions, and IRs had lower sequence divergence than SC regions (Supplementary Figure 1). Otherwise, large hotspot variation regions were detected in petN-rpoB, rbcL-psaI, and rpl23-ndhB. Gene loss/pseudonization, indels in protein coding genes, intron variation, and IGS indels within the cp genomes were characterized and mapped on branches of the phylogenetic tree ( Figure 1A) of the Triticeae species based on complete plastid genomes ( Figure 1B). One of the mutation events occurred between the rbcL gene and the psaI gene in the LSC region (Supplementary Figure 2). This region primarily contains the rpl23 gene, the accD gene, and intergenic regions. The rpl23 and accD genes were completely absent in A. cristatum, Agropyron mongolicum, Ere. triticum, Ere. distant, Australopyrum retrofractum, Henradia persica, and Aegilops tauschii. Within this region, the absence of a long IGS and the accD gene was also detected in Aegilops speltoides and Triticum monococcum ssp. aegilopoides (Supplementary Figure 3). Another gene loss event was identified between the trnL gene and the trnI gene in the IRs region (Supplementary Figure 4). These deletion regions primarily contain a ycf 2 gene fragment, the ycf 15 gene, and intergenic regions, and occurred in Pseudoroegneria spicata, P. libanotica, T. bessarabicum, Lophopyrum elongatum, and Frontiers in Plant Science | www.frontiersin.org Dasypyrum villosum, which were grouped in clade II (Figure 1). A hot variation region was also found in the non-protein coding regions. All species in Clade II had a 529-bp deletion between the petN and trnC genes in LSC (Supplementary Figure 5). All species in clade III, A. speltoides, Aegilops speltoides ssp. ligustica and T. monococcum ssp. aegilopoides, had a 438-bp deletion between psbE and petL in LSC (Supplementary Figure 6). All species in Clade I and Hordeum jubatum contained a 172-bp deletion between trnT and trnE in LSC.

Highly Divergent Genes
The total number of protein coding genes was 76 in each cp genome of the Triticeae species. Nucleotide divergence occurred in the coding regions of rpoC2, rps3, rpl22, matK, ycf 1, rps16, rpoC1, ndhH, atpF, rpl16, rpl32, ndhA, psbB, ycf 3, and inf A. Among these, the rpoC2 gene has a 6-bp insertion fragment in the species clustered in clades I and II, and a 42-bp insertion fragment in D. villosum (Supplementary Figures 7, 8). The Rps3 gene in H. bogdanii has a 45-bp deletion (Supplementary Figure 9). The Inf A gene had a deletion of two different 18-bp fragments in L. elongatum, S. cereale, and H. jubatum (Supplementary  Figures 10, 11). A 15-bp deletion in the rpl22 gene was found in L. elongatum (Supplementary Figure 12).

Inverted Repeat Contraction/Expansion
The most common events underlying changes in the plastome size of land plants included the contraction/expansion of IR.
Land plants have a highly conserved chloroplast genome, but four junctions (LSC/IRB/SSC/IRA) vary in genome size and could affect IR contraction and expansion (Saski et al., 2007;Choi and Park, 2015). All IR boundaries among the sampled monogenomic Triticeae species were in a similar location, but IR size varied from 20,806 (D. villosum) to 21,589 bp (Eremopyrum distans) because of variation in the size of pseudogenes ycf 2 and ycf 15. Because of a ∼800-bp deletion of ycf 2 and ycf 15 in the clade II species, IR size varied from 20,806 (D. villosum) to 20,866 bp (P. spicata), which was much smaller than that in the other Triticeae species (>21 kb) (Figure 2). The LSC/IRb/SSC/IRa boundary regions were compared using IRscope (Figure 2). The LSC/IRb junction was between the rpl22 gene and the rps19 gene in 31 Triticeae species, and close to the rps19 gene in the plastomes of Aegilops markgrafii, Aegilops umbellulata, Aegilops umbellulata ssp. transcaucasica, Amblyopyrum muticum, D. villosum, H. jubatum, and T. monococcum ssp. aegilopoides, varying from 17 to 36 bp. The rpl22 gene in the LSC region was close to the LSC/IRb junction in 24 out of the 31 cp genomes and varied from 28 to 42 bp apart from the LSC/IRb junction. The Rpl2 gene in H. vulgare was in IR regions 54 bp away from the LSC/IRb junction, while the rpl2 gene in H. vulgare ssp. spontaneum was in the LSC region 5 bp away from the LSC/IRb junction. The distance between rps15 and the junction of the IRb/SSC region ranged from 13 to 480 bp (except rps15 in H. vulgare that crossed the junction of the IRb/SSC region). At the junction of the SSC/IRa region, the ndhH gene crossed the SSC/IRa boundary in all the genomes with 0-1,007 bp located in the SSC region. The rps19 gene was in the IRa region in 31 of the species 1-51 bp apart from the LSC/IRa junction. The rpl2 gene was located in the IRa region 590 bp apart from the LSC/IRa junction in H. vulgare, and in the LSC region 4 bp apart from the junction in H. vulgare ssp. spontaneum.

Ancestral State Reconstruction
Sizes of complete chloroplast genomes, protein coding sequences, and non-protein coding sequences in monogenomic Triticeae are shown in Table 2. The reconstruction of character evolution revealed that the sizes of complete cp genome sequences and non-protein coding sequences of the species within clades I and IV were gradually enlarged (Figure 3)

Divergence Dating
Based on the complete cp genome sequences dataset, divergence times with 95% CI by using BEAST analyses generated a maternal time-calibrated tree (Figure 4). The divergence time was marked on 33 branch nodes within the ML tree. Time calibration analysis demonstrated that the time to the most recent common maternal ancestor of the Triticeae was dated to 27.63 MYA (95% CI). The maternal ancestor of Hordeum originated about 15.09 MYA (95% CI). The divergence time of maternal ancestor of species in clade III (Agropyron, Australopyrum, Henradia, and Eremopyrum) and clade II (Dasypyrum, Pseudoroegneria, Lophopyrum, and Thinopyrum) was dated to 17.16 MYA (95% CI) and 12.62 MYA (95% CI), respectively. The divergence time of the maternal ancestor of Aegilops/Triticum, Taeniatherum, Secale, Crithopsis, and Heteranthelium in clade I was 12.71 MYA (95% CI).

Statistical Analysis
The insertion and deletion of 33 divergence nodes of the phylogenomic tree are counted in Table 3. The divergence time was highly correlated with the number of variations (R = 0.87, p = 1.6e −10 < 0.05) (Figure 5A), as well as cp genome size and indel number (R = 0.37, p = 0.04) ( Figure 5B). Therefore, both early and late divergence times might lead to a similar trend in the associated number of indels. The highest frequency of indels occurred about 5 MYA.
The sizes of complete genome sequences and nonprotein coding sequences of the species within clades I (Triticum/Aegilops) and IV (Hordeum) were significantly larger than those of the species within clades II (St/E/V genomes) and III (P/F/Xe/W/O genomes) (Figures 6A,C). However, the sizes of the protein coding sequences among these species were not significantly different (Figure 6B). The average sizes of complete cp genome sequences of the species within clades I-IV were 136,362, 135,093, 135,553, and 136,575 bp (Figure 6A), and protein-coding gene sequences were 59,308, 59,324, 59,305, and 59,266 bp (Figure 6B), respectively.

DISCUSSION
Plastome size variations within major lineages are mainly attributable to a combination of three factors: differences in length of intergenic regions, changes in intron content, and contraction/expansion of IRs, which resulted in cp genome size ranging from 115 to 165 kb in the length of angiosperms (Raubeson and Jansen, 2005;Brouard et al., 2010;Wicke et al., 2011;Smith et al., 2013;Turmel et al., 2017). Previous studies have shown that the size of the cp genome of Triticeae species ranged from 134 to 137 kb (Gornicki et al., 2014;Middleton et al., 2014;Chen et al., 2017). The intergenic regions, which represent up to 68% of the genome, contribute to most of the observed genome size variation (Turmel et al., 2015). Our ancestral genome reconstruction suggests that major rearrangements occurred in four branches (clades I-IV) of monogenomic Triticeae followed by minor independent rearrangements in each genus. The sizes of complete genome sequences of the species within clades I (Triticum/Aegilops complex, Taeniatherum, Secale, Crithopsis, and Herteranthelium) and IV (Hordeum) were significantly larger than those of within clades II (Pseudoroegneria, Dasypyrum, Lophopyrum, and  Thinopyrum) and III (Agropyron, Australopyrum, Eremopyrum, and Henradia). The amounts of IGSs showed extensive fluctuations and were attributed to major plastome size variation among the 34 taxa studied (Figures 3, 6), such as the 529-bp deletion between the petN gene and the trnC gene (Supplementary Figure 5) in the species within the clade II (Pseudoroegneria, Dasypyrum, Lophopyrum, and Thinopyrum), and the 438-bp deletion between psbE and petL in the species within clade III (Agropyron, Australopyrum, Eremopyrum, and Henradia) (Supplementary  Figure 6). These deletions might occur in the ancestral plastid genomes of species in clades II and III, inherited by and maintained in their offspring species. Variations in intergenic regions do not affect protein function and structure but cause variations in chloroplast genomes' size. Intergenic regions in the cp genome have been utilized for systematic studies in diverse plants because they provide a wealth of information for distinguishing different species, such as trnL-trnF and trnH-psbA (Sha et al., 2014;Sevindik et al., 2021). Therefore, these intergenic regions (like petN-trnC and psbE-petL) can be used as an important tool to identify the maternal relationship among Triticeae species in the future.
The amount of introns, such as the intron of atpF (Supplementary Figure 16) and the intron of rpl16 (Supplementary Figure 17), also showed extensive changes, in LSC, but no obvious patterns can be discerned in a phylogenetic context. The size variations in those introns were most likely the consequence of non-adaptive processes. However, random genetic drift might also play a central role in the shaping of plastome architecture (Smith, 2016). Inverted repeat expansion and contraction were mostly associated with cp genome size variation (McCoy et al., 2008;Wu et al., 2009). Expansion and contraction in IRs of chloroplast genomes have been widely reported in various kinds of plants (Krause, 2011;Weng et al., 2014;Schwarz et al., 2015;Shrestha et al., 2019;He et al., 2020;Guo et al., 2021). We also found that cp genomes in the 34 Triticeae species exhibited obvious different IR sizes. A ∼800-bp length variation in IR was detected between trnI and trnL, which mainly contained pseudogenes ycf 2 and ycf 15. Wu et al. (2017) suggested that this deletion in the cp of P. libanotica was specific in Triticeae species. We confirmed that the loss of pseudogenes ycf 2 and ycf 15 occurred in the cp of Pseudoroegneria, Dasypyrum, Lophopyrum, and Thinopyrum. The Lophopyrum, Thinopyrum, and Dasypyrum genomes contributed a cytoplasm genome to the Pseudoroegneria species as a result of incomplete lineage sorting and/or chloroplast capture (Chen et al., 2020). A maternal donor of the species in clade II might have lost pseudogenes ycf 2, ycf 15, and adjacent gene spacers, and genetically transmitted to their offspring, leading to the IR size variation that ranged from 20,806 (D. villosum) to 20,866 bp (P. spicata), which was smaller than that in the other Triticeae species (>21 kb) (Figure 2). In some plants, ycf 2-encoded protein and five related nuclearencoded FtsH comprised a 2-MD complex, which can promote ATP synthesis (Kikuchi et al., 2018). The disappearance of ycf 2 might indicate that this function region has been transferred to a nuclear genome. Wu et al. (2017) found that this deletion was exactly similar (identities = 99%) to a genomic scaffold of chromosome 3B of T. aestivum, and that it possesses high similarity (identify = 98%) with cp sequences in Pooideae species, but the specific function is still unknown. Although many species in different lineages contain an intact ycf 15 gene (encoding the beta subunit of acetyl-CoA carboxylase complex) and have been annotated in several sequenced chloroplast genomes, e.g., several Asterids, Magnolia (Kuang et al., 2011), andPiper (Cai et al., 2006), it is almost impossible to determine whether this gene is able to encode a functional protein or how it has evolved in angiosperms so far. The gene was first identified in the Nicotiana chloroplast genome (Shinozaki et al., 1986), and its similar expression was also reported in Solanaceae chloroplasts (Legen et al., 2002). However, the validity of ycf 15 as a protein-coding gene has long been questioned (Steane, 2005;Chumley et al., 2006). Its function was disabled in some basal angiosperms such, as Amborella (Goremykin et al., 2003) and Nuphar (Raubeson et al., 2007), monocots, most rosids, and some species in other lineages. It has been wholly lost in some other lineages, such as Illicium, Acorus, Ceratophyllum, and Ranunculus during their evolution processes. Transcriptome analyses revealed that ycf 15 has transcribed as a precursor polycistronic transcript that contained ycf 2, ycf 15, and antisense trnL-CAA. Pseudogene ycf 15 was mapped by multiple transcripts, which suggested that plastid DNA posttranscriptional splicing might involve a complex cleavage of non-functional genes (Shi et al., 2013). A similar pseudogenetic elimination also occurred in the LSC of clade III species (Agropyron, Eremopyrum, Australopyrum, and Henradia), such as the rpl23 pseudogene copy and the accD pseudogene between the rbcL gene and the psaI gene (Supplementary Figure 2). In the grass plastome, the rpl23 gene is originally located in the IR region and encodes the functional ribosomal protein L23. Present studies have indicated that rpl23 has been non-reciprocally translocated to a region downstream from rbcL (Bowman et al., 1988;Ogihara et al., 1988;He et al., 2017;Lencina et al., 2019). A non-reciprocal translocation of the rpl23 gene occurred during the differentiation of Poaceae from its unknown ancestor (Katayama and Ogihara, 1996). In this study, we detected only ∼260 bp of rpl23 at the 3 end (complete gene size was 282 bp), which was inserted into the region between rbcL and psaI within the LSC of Triticeae species (except for Aegilops tauschii, Agropyron, Australopyrum, Eremopyrum, Henradia, and S. cereale). Meanwhile, sequence analysis detected another inserted pseudogene accD in this region, which is situated downstream of rpl23. In angiosperms, the accD gene encodes the protein of the acetyl-CoA carboxylase subunit D in the plastome. It is notable that the loss of accD is associated with hotspots of rearrangements in each of the families, introducing sensitivity to the herbicides quizalofop and sethoxydim (Konishi and Sasaki, 1994), and causing an alteration of lipid metabolism in the plants. The loss of the accD gene had been found in four lineages of angiosperms, such as grasses (Hiratsuka et al., 1989;Maier et al., 1995;Katayama and Ogihara, 1996), Campanulaceae (Cosner et al., 1997), Geraniaceae (Palmer et al., 1987;Chumley et al., 2006), and Aroideae (Henriquez et al., 2021). Katayama and Ogihara (1996) considered accD loss prior to the divergence of the Poales. However, previous studies have shown the absence of the accD pseudogene in at least one species of Triticeae (Ogihara et al., 2002), and the presence of the accD pseudogene of up to 349 bp in Secale of the Triticeae species (Aagesen et al., 2005). The highly varied pattern of accD pseudogene presence or gene absence in members of the restiid and graminid clades might be due to the pseudogene being carried as the ancestral state throughout most of the divergence of the Poales (Harris et al., 2013). The mechanism for the insertion of variable sizes in this region is uncertain. In this study, the absence of the accD gene was only detected in species of clade III, A. speltoides, and A. tauschii in Triticeae (Supplementary Figure 3). The accD gene loss might accelerate gene relocations by unknown mechanisms, and various movements of genes might induce the sequential loss of accD . The monogenomic Triticeae was diverged about 12-15 MYA to generate lineages, resulting in four branches of plastome species. Our ancestral genome reconstruction suggests that ancestral plastome species of clades II and III lost redundant sequences during divergence from Triticeae. The cp DNA sizes of species in the clades II and III (Pseudoroegneria, Dasypyrum, Eremopyrum, Lophopyrum, Thinopyrum, Agropyron, Australopyrum, and Henradia) evolved toward size reduction (Figures 3, 6) because of elimination of pseudogene and loss of long fragments (>200 bp) in IGS. Compared with species in clades II and III, the cp genomes of species in clades I and IV (Aegilops/Triticum complex, Taeniatherum, Secale, Crithopsis, Herteranthelium, and Hordeum) had larger cp genome size, and retained the invalid genes and lots of redundant fragments of IGS (Figures 3, 6). The cpDNA size reduction in the species of clades II and III (Pseudoroegneria, Dasypyrum, Eremopyrum, Lophopyrum, Thinopyrum, Agropyron, Australopyrum, and Henradia) might increase the efficiency of genome replication. Genome reduction is speculated to be the result of a low-cost strategy that could facilitate rapid genome replication under disadvantageous environmental conditions (McCoy et al., 2008;Wu et al., 2009). The retained pseudogenes and replicates might participate in some nucleo-plasmic interactions to promote gene function. Although the accD, ycf 2, and ycf 15 genes had been lost from the plastid genome several times in angiosperms, their functions were fulfilled by nuclear copies (Nakkaew et al., 2008;Huang et al., 2017;Wu et al., 2017).
Environmental conditions were thought to influence organelle DNA architecture. For example, plastid genomic compaction in the endolithic ulvophyte seaweed (Ostreobium quekettii) and the palmophylalean green alga (Verdigellas peltata) was caused primarily by adaptation to low light conditions (Marcelino et al., 2016). The large size variation in major lineages and their subclades is most likely the consequence of adaptive processes since those variations are highly positively correlated with divergence time (Figure 5). Plastid genomes smaller than 120,000 bp were detected mostly in non-photosynthetic angiosperms that had a deletion of several genes during their diversification (Wicke et al., 2011(Wicke et al., , 2016. The diversification of high plant species has been found to be strongly linked to climate fluctuations (Jaramillo et al., 2006;Hoorn et al., 2010). During the late Miocene (5-10 MYA), the atmospheric CO 2 level was decreased to the bottom after the mid-Miocene climate optimum (14-16 MYA) (Tripati et al., 2009), resulting in climate change from greenhouse to the icehouse. In order to adapt to an extremely cold and low CO 2 concentration climate, most species will reduce metabolism to maintain survival, and correspondingly, the expression of genes related to photosynthesis would be reduced, since CO 2 is required for photosynthesis. The species in the clade I from genera Aegilops, Triticum, Secale, Taeniatherum, Crithopsis, and Heteranthelium were restrictedly distributed in the Mediterranean and adjacent regions (Sakamoto, 1973;Hsiao et al., 1995), where there are hot, dry environments such as the deserts of Mediterranean regions. The sizes of complete cp genome and non-protein coding sequences in Mediterranean lineage were larger than those of other genera as the ancestral species evolved and diverged 0.03-12.71 MYA (Figure 3), and variations appeared most frequently about 5 MYA (Figure 5). The main diversification of Mediterranean lineage in Triticeae occurred about 9 MYA when Mediterranean climates are thought to have arisen. In the late Miocene, when atmospheric CO 2 concentrations and temperatures were extremely low, the Mediterranean climate undoubtedly provided a good shelter for some plants. The development of the Mediterranean climate can be seen as the opening of a new and novel climatic niche, to which lineages have adapted and speciated, by accumulating morphological change in other climate zones (Yesson and Culham, 2006). It is, thus, likely that climate oscillations during the late Miocene, especially the establishment of the Mediterranean climate, might have promoted the Mediterranean lineage of Triticeae rapid diversification and adaptation, and have continued to diversify from the Quaternary to the present (Fan et al., 2013), which FIGURE 6 | Comparisons of complete cp genome, protein coding region, and non-protein coding region within the four clades, with significant differences (p < 0.05, Kruskal-Wallis test) being estimated. (A) Comparisons of complete cp genome sequences; (B) comparisons of protein coding gene sequences; (C) comparisons of non-protein coding sequences. *p < 0.05; **p < 0.01. resulted in plastid genome and nuclear genome change to adapt to climate oscillations. We suggested that the dynamically reduced/enlarged cpDNAs of Triticeae might result from the adaptation to historical climate changes. However, we still need more molecular evidence to determine the role of natural selection in chloroplast genome evolution during the diversification of Triticeae.

CONCLUSION
In this study, we detected gene loss/pseudonization, indels, and intron variation, variation in cp genome sequence size, and expansion/contraction in IRs among 34 chloroplast genomes of monogenomic Triticeae species. We found that the cp genome sequence size variation was mainly caused by the size of nonprotein coding sequences. The monogenomic Triticeae diverged about 12-15 MYA, which resulted in four stem branches of plastome species. Losses of a series of invalid genes or sequence fragments have no effect on genomic function in these plants, which might be an evolutionary mechanism to increase the efficiency of genome replication. According to the distribution and habitat of the species, the species in the Mediterranean region (in clade I, Aegilops/Triticum complex, Taeniatherum, Secale, Crithopsis, and Herteranthelium) might have experienced the change in the Mediterranean climate and expanded the cp genome significantly. Our results enhance the understanding of the complexity and evolution of Triticeae cp DNAs.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: the chloroplast genome sequences in this study, KJ614418,

ACKNOWLEDGMENTS
We are very grateful to the American National Plant Germplasm System (Pullman, WA, United States) for providing the part seed materials.

SUPPLEMENTARY MATERIAL
The