Evolutionary differences in gene loss and pseudogenization among mycoheterotrophic orchids in the tribe Vanilleae (subfamily Vanilloideae)

Introduction Galeola lindleyana is a mycoheterotrophic orchid belonging to the tribe Vanilleae within the subfamily Vanilloideae. Methods In this study, the G. lindleyana plastome was assembled and annotated, and compared with other Vanilleae orchids, revealing the evolutionary variations between the photoautotrophic and mycoheterotrophic plastomes. Results The G. lindleyana plastome was found to include 32 protein-coding genes, 16 tRNA genes and four ribosomal RNA genes, including 11 pseudogenes. Almost all of the genes encoding photosynthesis have been lost physically or functionally, with the exception of six genes encoding ATP synthase and psaJ in photosystem I. The length of the G. lindleyana plastome has decreased to 100,749 bp, while still retaining its typical quadripartite structure. Compared with the photoautotrophic Vanilloideae plastomes, the inverted repeat (IR) regions and the large single copy (LSC) region of the mycoheterotrophic orchid’s plastome have contracted, while the small single copy (SSC) region has expanded significantly. Moreover, the difference in length between the two ndhB genes was found to be 682 bp, with one of them spanning the IRb/SSC boundary. The Vanilloideae plastomes were varied in their structural organization, gene arrangement, and gene content. Even the Cyrtosia septentrionalis plastome which was found to be closest in length to the G. lindleyana plastome, differed in terms of its gene arrangement and gene content. In the LSC region, the psbA, psbK, atpA and psaB retained in the G. lindleyana plastome were missing in the C. septentrionalis plastome, while, the matK, rps16, and atpF were incomplete in the C. septentrionalis plastome, yet still complete in that of the G. lindleyana. Lastly, compared with the G. lindleyana plastome, a 15 kb region located in the SSC area between ndhB-rrn16S was found to be inverted in the C. septentrionalis plastome. These changes in gene content, gene arrangment and gene structure shed light on the polyphyletic evolution of photoautotrophic orchid plastomes to mycoheterotrophic orchid plastomes. Discussion Thus, this study’s decoding of the mycoheterotrophic G. lindleyana plastome provides valuable resource data for future research and conservation of endangered orchids.


Introduction
The Orchidaceae, one of the two largest families of flowering plants, are unique in features such as their labellum, stamen, pollen block and dust-like seeds (Chase et al., 2015). They include approximately 28,000 species in 736 genera (Christenhusz and Byng, 2016), and are distributed throughout the world. Molecular phylogenetic research has revealed the Orchidaceae to be a monophyletic group, divided into five subfamilies, namely Apostasioideae, Cypripedioideae, Vanilloideae, Orchidoideae and Epidendroideae (Chase et al., 2015;Li et al., 2016). Among these, the Vanilloideae, Orchidoideae and Epidendroideae subfamilies include both photoautotrophic and mycoheterotrophic orchids (Merckx, 2013), with 232 mycoheterotrophic species in 43 genera. This feature makes them ideal material for research related to trophic transformation and gene structure change. Molecular evidence has shown that the rate of evolution in the Vanilloideae subfamily has variously accelerated and slowed down (Wang et al., 2018), and it is the first independent clade to include mycoheterotrophic species after the Apostasioideae subfamily . Therefore, further in-depth investigation of the Vanilloideae subfamily plastomes can provide more abundant evidence for the analysis of the genetic evolution mechanism, as well as establish a molecular basis of trophic-type changes in orchids in their early evolutionary stage.
Photosynthesis is known to be the primary means by which plants obtain their nutrients. However, limited by factors such as challenging living conditions or oxidative stress, plants have also adaptively evolved alternative survival strategies such as mycoheterotrophy and parasitism (Leake, 1994;Bidartondo, 2005;Merckx and Freudenstein, 2010;Westwood et al., 2010;Delannoy et al., 2011;Merckx, 2013;Wicke et al., 2016). While such changes in the survival strategies and vegetative organs of plants occurred, research has shown that traces of gene variation and selection remained in their plastomes (Wicke et al., 2011;Bromham et al., 2013;Petersen et al., 2015;Wicke et al., 2016). The transition from autotrophic to parasitic was accompanied by the loss of photosynthesis and housekeeping plastid genes (Wolfe et al., 1992;Funk et al., 2007), as well as the function, size (Lohan and Wolfe, 1998) and transcriptional association with essential genes (Wicke et al., 2013), all of which are linked to the retention or lost of the gene in the plastids of non-photosynthetic plants. Therefore, based on data published regarding the evolutionary transition from phototrophic to parasitic plastomes, an overall trend prediction for progressive gene loss has been proposed (Barrett and Davis, 2012;Barrett et al., 2014a;Bellot and Renner, 2016;Naumann et al., 2016;Wicke et al., 2016). In this hypothesis, the plastid genome reduction process is divided into the following four stages: 1) The loss of the NADH dehydrogenase-like (NDH-1) complex, generally regarded as the earliest loss of plastid-encoded genes; 2) This stage was followed by pseudogenization and the loss of photosynthetic genes, the deprivation of photosynthetic function and the removal of selection pressure to retain photosynthetic plastid genes; 3) Subsequently, the loss of genes for the plastid-encoded subunits of RNA polymerase and photosynthetic enzymes with minor functions (Rubisco and ATP synthase), the relative timing of which had an asynchronous and comparatively wide window; and 4) The delayed loss of the five core non-bioenergetic genes (especially trnE and accD, which encode glutamyl tRNA and acetyl-CoA carboxylase subunits, respectively), observed only in fully parasitic plastomes with large-scale gene loss. The range of changes of mycoheterotrophic plastomes is similar to that of parasitic species (Barrett et al., 2014b;Lam et al., 2015;Lam et al., 2016;Kim et al., 2019;Kim et al., 2020), however, in the evolution of plastid genomes in mycoheterotrophic orchids belonging to the same tribe, gene variation and loss have been specific rather than convergent. Feng et al. (2016), for example, found that the transitions to a fully mycoheterotrophic lifestyle evolved independently at least three times during the evolution of the tribe Neottieae. Despite the general trend of plastid degradation was similar during the transition from autotrophic to mycoheterotrophic and from autotrophic to parasitic, highly lineage-specific plastome degeneration still occurred.
The Vanilloideae subfamily includes 14 genera and 245 species (Chase et al., 2015) with various lifestyles, including epiphytic and terrestrial. Only nine orchid plastomes have thus far been reported, distributed among four genera and two tribes (Cameron, 2009;Lin et al., 2015;Amiryousefi et al., 2017;Niu et al., 2017;Kim et al., 2019;Kim et al., 2020). Of the 14 Vanilloideae subfamily heterotrophs in five genera, only three species in two genera have been reported. Due to the lack of decoded mycoheterotrophic Vanilloideae plastomes, the study of genetic variation in plastomes during the lifestyle transition from photoautotrophic to mycoheterotrophic has thus far been insufficient, and the diversity of evolutionary models has not been verified. Thus, this study sequenced, assembled, and annotated the mycoheterotrophic G. lindleyana plastome, subsequently comparing it with the plastomes of previously researched photosynthetic and heterotrophic orchids in the same subfamily, in order to explore the genetic variations occurring in plastomes that transitioned from photoautotrophic to mycoheterotrophic.

Material and methods
Plant materials, DNA extraction and high-throughput sequencing Since the G. lindleyana plant does not have typical leaf organs, its root was used to obtain its chloroplast genome sequence. The root of G. lindleyana (Figure 1)  Fresh root samples were ground into fine powder with liquid nitrogen in a mortar, and then used to extract the genomic DNA using a Plant Genomic DNA Extraction Kit (Tiangen Biotech (Beijing) Co., Ltd., China). The DNA concentration and the ratios of A260/A280 and A260/A230 were measured using a Thermo Scientific NanoDrop 2000 ultra-micro spectrophotometer (Thermo Fisher Scientific Inc., MA, USA). Following the construction of a 270 bp PCR-free library, the whole genomic DNA of G. lindleyana was sequenced using the Illumina NovaSeq 6000 platform via shotgun sequencing.

Plastome assembly and annotation
The resulting raw high-throughput sequencing reads were trimmed and filtered using Trimmomatic v0.38 (Bolger et al., 2014). The complete G. lindleyana chloroplast genome was assembled using the GetOrganelle(v1.7.7.0) toolkit (Jin et al., 2020), and four junction regions between IRs and LSC/SSC were subsequently confirmed via polymerase chain reaction (PCR) amplifications and Sanger sequencing using the primers listed in Table S1. The genome sequence was automatically annotated using the CPGAVAS2 integrated web server (Shi et al., 2019), and then manually edited using the Apollo editor according to separate BLASTN results comparing the CDS and protein sequences of 238 previously published plastid genomes in the Orchidaceae family.

Genome comparison, divergence and phylogenetic analysis
Genome features, such as size, LSC region, SSR region, IR region, GC content and unique genes, were analyzed or counted using a local Python script. SSRs were predicted via MISA(http:// pgrc.ipk-gatersleben.de/misa/) microsatellite finder (Beier et al., 2017);, and the tandem repeat sequences were found using a tandem repeats finder (TRF) (Benson, 1999). SC-IR junction regions were described via IR Scope among six Vanilleae species (Amiryousefi et al., 2018), while chloroplast genome comparison was completed using mVISTA (http://genome.lbl.gov/vista/mvista/ submit.shtml) with the annotation of four Vanilleae species as references (Frazer et al., 2004). Pairing sequence alignments of the cp genomes were performed using Mummer v3.23 with the six Vanilleae species and five mycoheterotrophic orchids (Kurtz et al., 2004). Synonymous codon usage and RSCU were analyzed using the CodonW v1.4.2 (Sharp and Li, 1987). The plastome sequence gene data of 59 Orchidaceae species were also used to construct a maximum likelihood (ML) phylogenetic tree, using five species (Allium cepa, Eustrephus latifolius, Iris gatesii, Iris sanguinea and Fritillaria hupehensis) as out groups. A total of 79 CDS genes were extracted and aligned using MAFFT software v7.515 (Katoh et al., 2002), and these alignments were subsequently concatenated into a single length of 71, 597 bp. The missing data of CDS genes were treated as insertions/deletions, and filled the alignments with dashs. Concatenated alignments were then used to perform the JModelTest. Finally, an ML tree was constructed via RAxML v8.2.12 (Stamatakis, 2014) with a GTR+G+I model with 1000 bootstrap replicates.

Features of G. lindleyana cpDNA
A total of 43 million pair-end reads were produced with 5.81 Gb of clean data. Data from all of the reads were deposited in the NCBI Genbank under accession number MW528436. The complete plastome was found to be 100,749 bp (Figure 2), and displayed a typical quadripartite structure, including a pair of inverted repeat regions (IR; 11,607 bp) separated by large single copy (LSC; 59,493 bp) and small single copy (SSC; 18,042 bp) regions, covering 11.5%, 59.1% and 17.9% in the plastome, respectively ( Table 1). The total GC content in the whole G. lindleyana plastome was 34.36%, with the LSC region containing the lowest amount of GC contents (31.24%) compared to those of the SSC (38.91%) and IR (38.80%) regions.
In this plastome the most frequently used codon was AAA (n=1374) followed by AAT (n=1032), encoding lysine and asparagine, respectively. The least frequently used codon was stop codon TGA (n=31).

Codon usage
Codon usage patterns of the coding sequences for G. lindleyana were calculated based on the relative synonymous codon usage (RSCU) value. All protein-coding genes in G. lindleyana plastome were encoded by 9749 codons ( Figure S1). A total of 61 codons encoded 20 amino acids, with three stop codons. There were 32 preferred and 32 nonpreferred codon usages. Among them, the most abundant amino acid was leucine, with 962 codons (9.87% of total), followed by isoleucine with 838 codons (8.60% of total) (Figure S1A), while stop codons were the fewest, at just 31 (0.32% of total). Almost all amino acids had more than one synonymous codon, with the exceptions of tryptophan and methionine. Arginine, serine and leucine had the most synonymous codon usages.
The RSCU value was used to evaluate the synonymous codon bias, and codons with greater RSCU values were preferred in each case ( Figure S1B). The most preferred codon was found to be AGA encoding amino acid arginine (Arg), with 1.99 RSCU, followed by UUA encoding leucine (Leu) and UCU encoding serine (Ser) with 1.79 and 1.73 RSCU, respectively. By contrast, the lowest frequency codon was found to be CAC encoding histidine (His) with 0.32 RSCU, followed by CGC encoding arginine (Arg) with 0.37 RSCU. With the exceptions of the leucine encoded by UUG and serine encoded by UCC, amino acid codons (RSCU > 1) in the G. lindleyana plastome preferentially showed A-or U-endings, and non-preferred codon usages ended with base C or G, corresponding to the previously mentioned results that were calculated based on protein-coding sequences.

Simple sequence and tandem repeat analyses
Simple sequence repeats (SSRs, also known as microsatellites) and tandem repeats (TRs) in the G. lindleyana plastome were surveyed in this study. A total of 41 SSRs were detected, following strict performance parameters (unit_size/min_repeats): 1/10 (mononucleotides ≥ 10 nt), 2/ 6 (dinucleotides ≥ 6 repeats), 3/5, 4/5, 5/5 and 6/5 (Table S1). Analysis included 29 mononucleotide repeats, nine dinucleotide repeats and two trinucleotide repeats, with only one hexa-nucleotide SSR identified. No tetra-or penta-nucleotide SSRs were found. The majority of SSRs were mononucleotides repeats (70.7%) in which base T and A were the primary elements with a proportion of 68.3%, only one C motif and no G motifs. The T-repeat unit was found to be most abundant in this study, with a total of 16, while the hexa-nucleotide SSR was repeated most frequently at 23 times in total.
There were 116 copy number variations (CNVs) of TR units (Table S1) in the G. lindleyana plastome ranging in length from 7 to 63 bp, with those of 16 and 18 bp most abundant (12), followed by those of 12 bp (10), and then those of 13 bp (7) and 24 bp (7). Most of the 116 CNVs were found to be present in intergenic regions, and only nine were present in the genic regions of the accD, ycf1 and ycf2 genes, and the psbC pseudogene. The TR units Chloroplast genome map of G. lindleyana. Genes inside the circle are transcribed clockwise, whereas those outside are transcribed counterclockwise. The darker gray in the inner circle corresponds to the GC content, while the lighter gray corresponds to the AT content. Pseudogenes are marked with an asterisk (*). appeared more frequently in the LSC regions (81.0%) than in the SSC (12.9%) and IR (6.0%) regions. CNVs of various TR units related to indel polymorphism were also identified in the G. lindleyana plastome.

Junctions of inverted repeats and single copy regions
A comprehensive assessment of the four junctions (J LA , J LB , J SA and J SB ) between the two IR regions and the two single copy regions (LSC and SSC) of six species in the subfamily Vanilloideae was also performed, and the results are presented in Figure 3. Evidence of substantial expansion and contraction in both the IR regions and the two single copy regions were detected, with the IR regions ranging from 32,683 bp in Pogonia japonica to 10,414 bp in Cyrtosia septentrionalis; the LSC region ranging from 87,447 bp in P. japonica to 28,197 bp in Lecanorchis japonica; and the SSC region ranging from 18,042 bp in G. lindleyana to 2,146 bp in Vanilla aphylla.
Two junctions were conserved between the LSC region and the two IR regions in the same genus, but were variated in different genera. The distance between the J LA border and matK gene was found to be the same in all Lecanorchis plastomes. Similarly, the rpl2 gene that crossed J LB in the two Lecanorchis plastomes was located in the LSC region but expanded to 127 bp to reach the IRb region. However, the J LA and J LB borders were variable in the other four species: The J LB borders of P. "×2" indicates that the number of repeat units is two; One or two asterisks following genes indicate one or two contained introns, respectively. Pseudogenes are marked with an underscore. The IR contraction in the C. septentrionalis plastome brought about the crossing of rps19 over the J LB border, with 41 bp located in the LSC region and the entire rps19 in the IRb region of the P. japonica, V. aphylla and G. lindleyana plastomes. In addition, J LA was found between rps19 and psbA in P. japonica, V. aphylla and G. lindleyana plastomes, although the psbA was not present at the J LA border in C. septentrionalis plastome. The distances between rps19 and J LA were found to be 1 bp in C. septentrionalis plastome, 89 bp in V. aphylla plastome and 101 bp in G. lindleyana plastome, respectively, all of which were longer than the distance of 119 bp between psbA and J LA in P. japonica plastome.
In these six species belonging to the Vanilleae, more variations were found in the J SA and J SB than in the J LA and J LB . In Lecanorchis kiusiana and L. japonica plastomes, the distance between ycf1 and J SB was 1 bp and 2 bp, respectively, and the entire ycf1 was in the IRb region. However, the ycf1 in these two Lecanorchis species crossed the J SA border at 4431 bp and 4532 bp located in SSC region and 1187 bp and 1113 bp in the IRa region in L. kiusiana and L. japonica plastomes, respectively. The genes around the J SA and J SB, and its location were varied in the other four species. In C. septentrionalis plastome, rrn16 and ndhB were located entirely in the SSC region, at 67 bp from the J SB border and 139 bp from the J SA border, respectively.On the J SB border, the ndhA in P. japonica, ccsA in V. aphylla and ndhB in G. lindleyana crossed the IRb/SSC boundary, with 442 bp, 513 bp and 914 bp of each gene located in the IRb regions, respectively, and 153 bp, 407 bp and 1594 bp within the SSC regions, respectively. Moreover, the ndhA in P. japonica, ccsA in V. aphylla and ndhB in G. lindleyana were all at the beginning of the IRa regions, near the J SA borders.

Comparative analysis and sequence divergence analyses
The differences in the four mycotrophic Vanilleae plastomes were evaluated using the phototrophic V. aphylla plastome as a reference, and the results are presented in Figure 4. Compared with the phototrophic species, the deletion of gene sequences in mycotrophic species was found to be significant. In L. kiusiana and L. japonica, all functional genes involved in photosynthesis have been lost. In C. septentrionalis and G. lindleyana, most genes of the photosystem were lost, while others, such as psaA, psaB and psaC, have been retained. Among these five species, the length of G. lindleyana and C. septentrionalis plastomes were found to be similar, but still showed significant differences in their functional gene loss and pseudo-genes. In the LSC area, the psbA, psbK, atpA and psaB retained by G. lindleyana were found to be missing in C. septentrionalis; while the matK, rps16 and atpF present as pseudogenes in C. septentrionalis remain complete in G. lindleyana.

Dynamic chloroplast genome structures of Vanilleae and three typical mycotrophic orchids
Gene block analysis, by which gene rearrangements are identified, was carried out using Mauve software among six Vanilleae orchids and four typical mycotrophic orchids ( Figure 5). In the Vanilleae, rearrangement and inversion of genes appeared frequently, and even the phototrophic orchid plastomes were not fully colinear in all regions. In P. japonica and V. aphylla, approximately 37 kb of inversion occurred in the LSC area. However, compared with the other four mycotrophic orchid plastomes, gene loss and rearrangements of these two phototrophic orchid plastomes were limited. Among the  Vanilleae, the species with the chloroplast genome most similar to that of G. lindleyana was C. septentrionalis, however, with a 15 kb inverted in the SSC area, between ndhB and rrn16S. Compared with G. lindleyana, the photosynthesis genes in the LSC regions of the L. kiusiana and L. japonica chloroplasts were absent, and the rpl2-rps19 in the IR area was inverted to the LSC area.

Phylogenetic relationships
In phylogenetic analysis, the Orchidaceae were divided into five subfamilies using the classification of Orchidaceae proposed by Chase et al. (2015). A total of 59 orchid plastomes from three of these subfamilies were employed to infer the phylogenetic relationship, using the maximum likelihood (ML) methods ( Figure 6). Support values were consistently very high, except for the branch leading to the tribes Epidendreae, Neottieae and Cranichideae. Eight species belonging to Vanilleae formed a monophyletic group with high support. In the tribe Vanilleae, four mycoheterotrophic species were well distinguished from the photoautotrophic species and, within them, genus Lecanorchis was resolved as sister to the branch including G. lindleyana and C. septentrionalis. The respective branch support values for the branch including G. lindleyana, C. septentrionalis and genus Lecanorchis were lower than those for the branch including G. lindleyana and C. septentrionalis. The mycoheterotrophic plastomes belonging to the tribe Vanilleae are divided into two main clades, one of which is represented by genus Lecanorchis, while the other group contains two genera, namely Galeola and Cyrtosia. Among these three genera, the Galeola and Cyrtosia plastomes were found to have lost a similar length and number of genes, with a greater loss than that of Lecanorchis. Since only these four mycoheterotrophic chloroplast genomes have thus far been reported in Vanilleae, more sample data is needed to confirm any differences between the chloroplast genome loss strategies of the two genera species (Galeola and Cyrtosia). Global alignment of five chloroplast genomes of Vanilleae using mVISTA. Y-axis indicates the range of identity (50%-100%). Alignment was performed using V. aphylla as reference.

Rearrangements in plastomes
Although chloroplast genome structures are generally highly conserved, some do undergo changes during the long-term evolutionary process, subsequently appearing as either rearrangements or inversions. Rearrangements and inversions were detected in all chloroplast genomes of both the autotrophic and heterotrophic orchids in the Vanilloideae subfamily. In the Pogonia and Vanilla genera, rearrangement and inversion were seen in the LSC, IR and SSC regions, while these changes were seen only in the IR and SSC regions in the Cyrtosia and Galeola genera, and were confined to the LSC region in Lecanorchis genus. Rearrangements of the IR and SSC regions occur frequently and are thought to be the product of species evolution (Jansen and Palmer, 1987;Doyle et al., 1996;Chumley et al., 2006;Lee et al., 2007;Jansen et al., 2008;Ravi et al., 2008;Weng et al., 2014;Yan et al., 2017;Frailey et al., 2018;Roma et al., 2018). In Pelargonium hortorum, for example, the IR region of the chloroplast genome has extensively expanded, increasing in length to 76 kb, which is three times that of most common plants (Chumley et al., 2006). By contrast, the IR regions in Fabaceae and Cupressaceae have contracted significantly and even, in some, been completely lost (Saski et al., 2005;Hirao et al., 2008). In addition, the rearrangement of SSC regions has been detected in the chloroplast genomes of semi-parasitic species (Frailey et al., 2018), and the inversion of the SSC region, which appears to be a hallmark of the semi-parasitic group in the Orobanchaceae, has a chloroplast genome size similar to that of autotrophs. However, such rearrangement and inversion in all regions, as well as the initial contraction followed by expansion of the IR region, are rare, which suggests that the Vanilloideae subfamily may have been at the forefront of this evolutionary process. Changes in the nutrient types of the Vanilloideae subfamily, thus, closely related to the frequent occurrence of rearrangement and inversion.

Variations in the IR/SC boundary
All chloroplast genomes have a typical four-part structure in the Vanilloideae subfamily, as do mycoheterotrophic orchids. However, differences have been found within the mycoheterotrophic orchids Phylogenetic tree for 59 Orchidaceae (with five non-Orchidaceae species as outgroups) using maximum likelihood (ML), based on the alignments of complete chloroplast genomes. Numbers at the nodes indicate bootstrap values from 1000 replicates.
in the Orchidaceae subfamily, such as the loss of the IR region in Gastrodia elata plastome (Yuan et al., 2018;Kang et al., 2020;Park et al., 2020). The sizes and structures of chloroplast genomes in the Vanilloideae subfamily differ among the genera, and rearrangement and inversion occur in all four regions (Kim et al., 2015;Zeng et al., 2017). Among them, the longest chloroplast genome, at approximately 158,200 bp, is that of P. japonica, a member of the Pogonieae tribe, while the shortest, less than half the size at approximately 70,498 bp, is that of L. japonica belonging to the Vanilleae tribe. Such a dramatic difference in chloroplast genome length has also been found among the Neottieae tribe (Feng et al., 2016). However, some semi-heterotrophic species have also been found in the Orobanchaceae family, whose length of plastomes are similar to those of autotrophic species, suggesting that their plastomes are still in the early stages of their transition from autotroph to parasite (Wicke et al., 2013;Wicke and Naumann, 2018). The size of autotrophic orchid plastomes has been found to range from 147 kb to 151 kb, while those of mycoheterotrophic orchid plastomes fluctuate more widely, ranging from 70 kb to 100 kb. In mycoheterotrophic orchids, the plastomes of the genus Lecanorchis are smaller than 100 kb, while those of the genera Cyrtosia and Galeola are approximately 100 kb. Plastomes of the Lecanorchis orchid have not contracted significantly, possibly because the significant shrinkage of the LSC region is countered by the expansion of the SSC region. A similarly remarkable expansion of the SSC region to the IR region has also been reported in the Orobanchaceae family: Unexpectedly enlarged overall chloroplast genome lengths, attributable to the expansion of IR regions into SSC regions, have been noted in the semiheterotrophic Buchnera americana as well as in three species belonging to the genus Striga in the same family (Frailey et al., 2018). Therefore, it is evident that the expansion of the SSC region in fully heterotrophic plants slowed down the pace of their plastome shrinkage, however, the reasons for this expansion still need to be explored. The expansion of SSC regions in the chloroplast genomes of semi-heterotrophic plants in the Orobanchaceae family and mycoheterotrophic orchids in the Vanilloideae subfamily is highly significant because it may indicate the evolution of key genes for energy metabolism in non-photosynthetic plants.
The IR/SC boundary and its nearby genes were analysed, among six Vanilloideae plastomes. and variations were found at all IR/SC boundaries. The shrinkage or expansion of the IR is known to lead to greater variability in the IR/SC boundary, and has been a recent hotspot in scientific research, especially in the Orchidaceae family (Wu et al., 2009;Wu et al., 2011;Sanderson et al., 2015). The LSC region has contracted in each of the six Vanilloideae plastomes, ranging from photosynthetic to mycoheterotrophic, with the LSC/ IRB boundary gradually shrinking from rpl22 (P. japonica, V. aphylla and G. lindleyana) to rps19 (C. septentrionalis) and then to rpl2 (L. kiusiana and L. japonica). Moreover, the IR regions have contracted and then expanded from photosynthetic to mycoheterotrophic plastomes and the location of cross-boundary genes in the two boundaries was asymmetrical. The cross-boundary genes located in IRB/SSC and IRA/SSC boundaries changed gradually from ndhA (P. japonica) to ccsA (V. aphylla) to ndhB (G. lindleyana). However, the IR regions in L. kiusiana and L. japonica plastomes expanded,so that ycf1 regained its location in the IR regions. All Vanilloideae plastomes have genes across the IRB/SSC or IRA/SSC boundaries, with the exception of C. septentrionalis. Therefore, the cross-boundary genes are located in two IR regions and the SSC region simultaneously. In addition, the IRA/LSC boundary was found to have shrunk and then expanded. In P. japonica, V. aphylla, G. lindleyana and C. septentrionalis plastomes, the IRA region shrank gradually, and the rps19 moved to the IRA/LSC boundary gradually, while in L. kiusiana and L. japonica plastomes, the IRA region expanded until the matK gene was located in the IRA/LSC boundary. These changes in the IR/SC boundaries are consistent with the two evolutionary routes in the Orchidaceae family (Yang et al., 2013;Luo et al., 2014): (1) The ycf1 gene expands continually into the IRA region, so the duplicated pseudogene yycf1 fragment appearanced in the IRB region and overlapped with yndhF. This evolutionary route was matched with changes in the heterotrophic orchid(L. kiusiana and L. japonica) plastomes. The SSC region expanded, causing the ycf1 to relocate in the IRA region; (2) The continuous movement of ycf1 to the SSC region resulted in the shorter length of the replicated ycf1 in the IRB region, which eventually moved completely into the SSC region, while the replicated ycf1 fragment disappeared in the IRB region. In this study, the transformation of plastomes from autotrophic (P. japonica and V. aphylla) to heterotrophic (G. lindleyana and C. septentrionalis) was found to be consistent with this hypothesis. The ycf1 was eventually located in the SSC region (G. lindleyana and C. septentrionalis). Changes in the spanning boundary genes were the major motivation for the contraction or expansion of IR regions (Goulding et al., 1996;Yang et al., 2010). Both of these hypotheses could well explain the changes at the IR/SC boundaries in the Orchidaceae family. Nevertheless, the evolutinonary origin and the chronological framework of cross-boundary gene changes were still largely unknown. Some researchers have shown that changes at the IR/SC boundaries could provide evidence for phylogenetic relationships (Gao et al., 2009;Wu et al., 2009;Wu et al., 2011;Yang et al., 2013;Downie and Jansen, 2015;Sanderson et al., 2015), however, in the Vanilloideae subfamily, the decoded plastomes were limited. In order to relate the changes at the IR/SC boundaries with phylogenetics, more Vanilloideae plastomes with various lifestyles should be collected and explored.

Genes loss in plastid genomes and diversity of mycoheterotrophs in Vanilloideae orchids
In this study, the number of genes in the plastomes of each species in the Vanilloideae subfamily was found to be different. Pseudogenes and gene loss were present in the chloroplast genomes of all species. Most of the ndh gene family have been lost fully or partially, such as the remaining residual fragments of ndhA, ndhB, ndhD and ndhH. Rampant independent loss of the ndh gene family was a common feature across the Orchidaceae family  and, regardless of whether orchids were identified as mycoheterotrophic species or not, most of the ndh gene family were lost in the Vanilloideae subfamily orchid plastomes, with only the ndhB residual fragment still evident or a complete disappearance of the ndh gene (Kim et al., 2019;Kim et al., 2020). Assuming that the plastid genome degrades in a gradual manner, it may be also assumed the nonessential ndh gene pseudogenized before its fully physical deletion Wicke and Naumann, 2018). Pseudogenes could have been generated by premature codon termination and frameshift mutations (Balakirev and Ayala, 2003;Poliseno et al., 2010). The intron region of the ndhB gene in G. lindleyana plastome was found to have been shortened and was located at the IR/SC boundary, both favorable conditions for pseudogene generation. Moreover, the ndh gene family members were found to vary frequently and contained abundant repetitive sequences. Repeated sequences are considered to be one of the main reasons behind the rearrangement of the chloroplast genome (Yue et al., 2008). Therefore, the non-functionalization of the Vanilloideae subfamily chloroplast genome probably began with the pseudogenization and loss of the ndh gene family, as is consistent with the known evolutionary process of trophic changes (Wicke et al., 2013;Zeng et al., 2017). However, non-functionalization of the ndh gene and/or its fully physical loss was independent of trophic type in this species. Its polygenicity has been previously demonstrated in orchids (Barrett and Davis, 2012), carnivorous plants (Wicke et al., 2014) and even gymnosperms (Lin et al., 2010). In addition, Chang et al. (2006) found that ndhA, ndhF and ndhH in the Phalaenopsis aphrodite plastome had been transferred to its nuclear genome. Nonetheless, due to the lack of relevant nuclear genome data, it remains to be confirmed whether the ndh gene family has been transferred to its nuclear genome in the photosynthetic species of the Vanilloideae subfamily.
Currently, several models of chloroplast genome degeneration have been proposed to explain the physical or functional changes associated with the transition to a mycoheterotrophic lifestyle (Wicke et al., 2011;Bromham et al., 2013;Petersen et al., 2015;Wicke et al., 2016). In this study, photosynthetic plants were found to be in their initial stage of chloroplast genome degeneration, and the ndh gene appeared to be first pseudogenized and then lost entirely (Graham et al., 2017). During this period, reduced photosynthetic function begins with the loss of non-essential or stress-related genes (such as ndh genes) in half-heterotrophs. This is followed by pseudogenization and the loss of major photosynthesis-related genes (such as pet, psa and psb genes) and plastid-encoded polymerases. This was evident in this study, with both G. lindleyana and C. septentrionalis (Kim et al., 2019) found to be in this second stage of chloroplast genome degeneration. Among the photosynthesis-related genes, only petN, psaJ, psbM and psbZ remain in C. septentrionalis, while G. lindleyana no longer contains a complete gene of the photosynthesis-related genes and petD, psaA, psaB, psbA, psbC, psbK and psbZ have all pseudogenized. The next stage of plastome degeneration ocurred in the edge of these shift from semi-mycoheterotrophic orchids to fully mycoheterotrophic orchids. Genes with extended or alternative functions, such as atp and rbcL, and non-essential housekeeping genes are lost after the transition to a non-photosynthetic or fully heterotrophic lifestyle, but prior to a plateau or slowing in the rate of gene loss. Herein, L. kiusiana and L. japonica  were found to be at this third stage of the plastome degeneration. In L. kiusiana and L. japonica plastomes, psa (5), psb (15), pet (6), atp (6), rpo (4) and rbcL were no longer evident, and the number of pseudogenes did not exceed three. Further loss of the chloroplast genome is followed by the deletion of other metabolic genes (such as accD, clpP and ycf1/2), along with all other remaining housekeeping genes, including trnE. Thereafter, a stage of 'total deletion' is reached, in which the chloroplast genome is completely eradicated. At present, the data of mycoheterotrophic plastomes belonging to the Vanilloideae subfamily is limited, and it remains to be confirmed whether any species have undergone further plastome degeneration. However, it has been shown that the Vanilleae tribe has a very high evolutionary rate, and the plastome degeneration of mycoheterotrophic orchids in this tribe are various. Therefore, it is imperative to decode more chloroplast genome data of mycoheterotrophic plants belonging to the Vanilleae tribe.

Data availability statement
The original contributions presented in the study are publicly available. This data can be found here: https://www.ncbi.nlm.nih.gov/ nuccore/MW528436.1/.

Author contributions
LZ, SG and JL conceived the research and experimental design. JL collected the plant material. LZ, TC and XQ performed the experiments and analyzed the data. LZ, SG and JL designed the draft manuscript. All authors approved the final draft for submission and take full public responsibility for the content of the manuscript.

Funding
Publisher's note All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2023.1160446/ full#supplementary-material SUPPLEMENTARY FIGURE 1 Codon frequencies and RSCU values of G. lindleyana: (A) Amino acid frequencies in protein-coding genes; (B) RSCU values of 20 amino acids and stop codons in 32 protein-coding genes.