Comprehensive analysis of the mitochondrial genome of Rehmannia glutinosa: insights into repeat-mediated recombinations and RNA editing-induced stop codon acquisition

Rehmannia glutinosa is an economically significant medicinal plant. Yet the structure and sequence of its mitochondrial genome, which plays a crucial role in evolutionary analysis and in regulating the synthesis of respiration-related macromolecules, have not been published. In this study, the R. glutinosa mitogenome was sequenced using a combination of Illumina short reads and Nanopore long reads and assembled with a hybrid strategy. We found that the predominant configuration of the R. glutinosa mitogenome comprises two circular chromosomes. The primary structure of the mitogenome encompasses two mitochondrial chromosomes corresponding to the two major configurations, Mac1-1 and Mac1-2. The R. glutinosa mitogenome encoded an angiosperm-typical set of 24 core genes, nine variable genes, three rRNA genes, and 15 tRNA genes. A phylogenetic analysis using the 16 shared protein-coding genes (PCGs) yielded a tree consistent with the phylogeny of Lamiales species and two outgroup taxa. Mapping RNA-seq data to the coding sequences (CDS) of the PCGs revealed 507 C-to-U RNA editing sites across 31 PCGs of the R. glutinosa mitogenome. Furthermore, one start codon (nad4L) and two stop codons (rpl10 and atp6) were identified as products of RNA editing events in the R. glutinosa mitogenome.


Introduction
Rehmannia glutinosa (Gaertn.) DC. (http://www.theplantlist.org/), a member of the Scrophulariaceae family, has been widely used in traditional Chinese medicine (TCM) and is commonly known as "DiHuang" in China (Li et al., 2022). With a medicinal history spanning over two millennia, R. glutinosa is a vital industrial crop first documented in Shennong's Classic of Materia Medica (Qin and Han Dynasties, 100 BC) (Li et al., 2022). The plant is processed into various forms, including fresh rehmannia root (Xian DiHuang), dried rehmannia rhizome (Sheng DiHuang), and prepared rehmannia root (Shu DiHuang) (Meng et al., 2017; Li et al., 2022). In its fresh form, R. glutinosa root possesses many therapeutic benefits, including antipyretic, salivary secretion enhancement, hematothermal regulation, anti-coagulative, detoxifying, and analgesic properties. It is commonly employed in the treatment of various medical conditions such as fevers, yin imbalances, glossal abnormalities, polydipsia, cutaneous eruptions, hematemesis, epistaxis, and pharyngitis (Liu et al., 2017; Meng et al., 2017; Li et al., 2022). Moreover, the Liuwei Dihuang Pill, a quintessential TCM formulation, features R. glutinosa as its principal component and demonstrates substantial efficacy in ameliorating diabetes and its associated complications (Zheng et al., 2020; Chen et al., 2021; Lu et al., 2022). R. glutinosa holds significant research and development value due to its extensive medicinal history and efficacy. However, cultivation is often challenged by root rot and poor stress resistance (Kim et al., 2020; Wang R. et al., 2018). Traditional artificial domestication is time-consuming, and the ability to directly introduce genes from superior wild varieties is limited (Fernie and Yang, 2019). Next-generation sequencing technology has enabled the integration of bioinformatics with genetic engineering, offering new possibilities for breeding R. glutinosa (Koenig et al., 2013; Meyer and Purugganan, 2013).
Mitochondria are essential cellular organelles that play a critical role in various metabolic processes, including the tricarboxylic acid (TCA) cycle, urea cycle, heme biosynthesis, calcium homeostasis, iron/sulfur cluster formation, gluconeogenesis, amino acid metabolism, and apoptosis (Osellame et al., 2012). Moreover, mitochondria are involved in synthesizing and folding essential biological macromolecules such as proteins, lipids, and nucleic acids, which are fundamental components of cellular structures and processes (Blomain and McMahon, 2012). In contrast to nuclear DNA (nDNA), mitochondrial DNA (mtDNA) is more susceptible to exogenous and endogenous stress due to its proximity to oxidative phosphorylation sites and the absence of protective histones in mitochondria. Although nucleoid structures offer some protection, mtDNA damage frequently occurs within mitochondria (Liao et al., 2022; Palozzi et al., 2022). Current research has demonstrated that the incidence of mtDNA damage in cells significantly surpasses that of nDNA damage (Liao et al., 2022; Palozzi et al., 2022; Roy et al., 2022). mtDNA damage and the associated repair mechanisms, including non-homologous end joining (NHEJ), often lead to homologous recombination in mitochondrial genomes, potentially mediated by repeat sequences (Dahal et al., 2018; Chevigny et al., 2020). Damage to mitochondria can also result in deletions, which are implicated in cytoplasmic male sterility (CMS) (Hu et al., 2014). Several CMS-related genes have been identified and characterized across various species, such as RT98-CMS rice and RT102-CMS rice (Igarashi et al., 2013; Okazaki et al., 2013). Given the central role of mitochondria in synthesizing and maintaining biological macromolecules, investigating the mitochondria of industrial crops holds substantial importance for cultivating high-quality crops. Genome research on R. glutinosa may yield valuable insights into the relationship between mitochondria and macromolecules, furthering our understanding of these complex interactions and their implications for crop improvement.
RNA editing events are prevalent in mitochondrial genomes and have far-reaching implications for protein function. These events often alter the amino acids specified by the genomic sequence. Such modifications not only enhance the conservation of the overall amino acid sequence but also affect the physicochemical attributes of the protein, even influencing its folding dynamics (Takenaka et al., 2008; Small et al., 2020; Kang et al., 2021). These observations underscore the pivotal role of RNA editing sites in maintaining the proper functionality of proteins. Additionally, RNA editing events appear to be intricately linked with the mechanisms of natural selection. Some researchers (Chateigner-Boutin and Small, 2010; Ichinose and Sugita, 2017) have investigated RNA editing events within the mitochondria of 17 angiosperm species. Remarkably, the nonsynonymous editing sites are highly conserved across these species, with approximately 80% conservation observed. The efficiency of the editing process is also notably high, with an editing extent of around 80% across all examined plant species. This high level of conservation and efficiency suggests a crucial functional role for these editing events in plant mitochondrial biology. After reverse transcription into cDNA, some edited transcripts integrate into the genome through homologous recombination and are subsequently preserved. Most RNA editing sites in plant mitochondria are at the second codon position. The most frequent form of editing involves the conversion of cytosine (C) to uracil (U). This specific nucleotide alteration is thought to be correlated with an overall increase in the hydrophobicity of the resultant protein. Approximately 55% of amino acid substitutions resulting from RNA editing events exhibit a transition from hydrophilic to hydrophobic properties. This trend suggests a substantive impact on the edited protein's physicochemical characteristics, potentially affecting its function and interaction within cellular environments (Sun et al., 2016; Mohammed et al., 2022). Additionally, premature stop codons created by RNA editing may result from erroneous editing, leading to the premature termination of translation, a shortened amino acid sequence, and altered protein function.
In the present study, we successfully assembled and characterized the two mitochondrial chromosomes of R. glutinosa. We validated the secondary structure of the mitogenome in terms of homologous recombination mediated by direct repeats. Additionally, we identified 507 RNA editing sites within the protein-coding regions, all of which involved the conversion of cytosine (C) to uracil (U). Our analysis also revealed the presence of two modified stop codons in the CDS of rps10 and atp6 and one altered start codon of nad4L, resulting from RNA editing events. These findings offer novel insights into the complexity and functional implications of RNA editing in the mitochondrial genome of R. glutinosa.

Materials and methods
After cleansing with deionized distilled water (ddH2O), the specimens were cryopreserved at −80°C. The leaf samples were partitioned into two sets designated for DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq).
Genomic DNA was isolated employing the Magnetic Plant Genomic DNA Kit (Catalog No. DP342; Tiangen, China). Total RNA was extracted utilizing the RNAprep Pure Plant Plus Kit (Catalog No. DP441; Tiangen, China). For RNA-seq analysis, mRNA was selectively enriched from the total RNA pool using targeted probes to remove ribosomal RNA (rRNA). Fragmentation was executed using divalent cations at high temperature in the First Strand Synthesis Reaction Buffer (5X). Subsequently, after the adenylation of the 3' ends of DNA fragments, NEBNext Adaptors featuring a hairpin loop structure were ligated, setting the stage for subsequent hybridization. The cDNA fragments with a predominant length range of 370 to 420 base pairs were isolated to construct the fragmented library via the AMPure XP system (Beckman Coulter, Beverly, USA). The sequencing library was constructed using the TIANSeq Fast DNA Library Kit (Illumina; Catalog No. NG102), and sequencing was performed on an Illumina NovaSeq 6000 platform (Illumina, Inc.; San Diego, CA, USA).
For Oxford Nanopore sequencing, high molecular weight (HMW) DNA was isolated using the NEB Monarch HMW DNA Extraction Kit (Catalog No. T3060L; New England Biolabs, England). Mechanical shearing of the genomic DNA to an average fragment size of approximately 10 kb was accomplished using the Covaris g-TUBE (Thermo Fisher, USA). The DNA library was assembled using the DNA Library Kit (Catalog No. SQK-LSK110) and sequenced on a PromethION platform (Novogene Co., Ltd., Beijing, China).

Genome assembly and annotation
Illumina short-read sequences were processed using Trimmomatic software with the default settings (Bolger et al., 2014). Nanopore long-read sequences were filtered using Guppy software, also with default settings (Wick, 2017). A hybrid assembly approach was implemented for the assembly of organelle genomes. For the assembly of the plastid genome (plastome), we utilized GetOrganelle software (Jin et al., 2020) to extract plastid-specific reads from the Illumina dataset, applying parameters "-R 15 -k 21,45,65,85,105 -F embplant_pt". These reads were assembled into a unitig graph, and bifurcation structures corresponding to inverted repeat regions were resolved by aligning the Nanopore reads to these structures via the Unicycler software (Wick et al., 2017). The orientation of the resulting assembled genome was subsequently refined using Novowrap (Wu et al., 2021). For the mitochondrial genome (mitogenome), a similar hybrid assembly strategy was employed. Initially, GetOrganelle was used to isolate mitochondrial reads from the raw data, employing parameters "-R 50 -k 21,45,65,85,105 -P 1000000 -F embplant_mt". These reads were then assembled into a unitig graph, and bifurcation structures were resolved through Nanopore read alignment via Unicycler software (Wick et al., 2017).
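The two GetOrganelle parameter sets quoted above can be written out as full command lines. The following is a minimal sketch, assuming paired-end input passed with the `-1`/`-2` options and an output directory passed with `-o`; the file and directory names are hypothetical placeholders, and only the flags quoted in the text are taken from the study.

```python
import shlex

# Parameter sets quoted in the text; all other GetOrganelle options stay at their defaults.
PRESETS = {
    "embplant_pt": ["-R", "15", "-k", "21,45,65,85,105"],
    "embplant_mt": ["-R", "50", "-k", "21,45,65,85,105", "-P", "1000000"],
}


def getorganelle_cmd(fq1, fq2, out_dir, organelle):
    """Return the argument list for one get_organelle_from_reads.py run."""
    if organelle not in PRESETS:
        raise ValueError(f"no parameter set recorded for {organelle!r}")
    return [
        "get_organelle_from_reads.py",
        "-1", fq1,          # forward reads (hypothetical file name)
        "-2", fq2,          # reverse reads (hypothetical file name)
        "-o", out_dir,      # output directory (hypothetical)
        *PRESETS[organelle],
        "-F", organelle,    # seed database type: plastome or mitogenome
    ]


if __name__ == "__main__":
    for target in ("embplant_pt", "embplant_mt"):
        cmd = getorganelle_cmd("reads_1.fq.gz", "reads_2.fq.gz", f"{target}_out", target)
        print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps the quoted parameters explicit and lets the same function serve both organelles.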
Mitochondrial Plastid Sequences (MTPTs) were discerned through a reciprocal comparison strategy, employing BLASTn (version 2.2.30+) with its default parameters. The plastid genome (plastome) was assembled utilizing Illumina sequence reads via the GetOrganelle software. Comparative analysis between the plastome and the mitochondrial genome (mitogenome) was performed using BLASTn, employing specific parameters: e-value set to 1e-6 and word size configured at 7 (Chen et al., 2015). BLASTn hits shorter than 100 base pairs were excluded from the analysis. Subsequently, MTPT gene clusters within the mitogenome were delineated and defined as contiguous gene assemblies in the plastome devoid of intervening mitochondrial genes. These MTPT gene clusters were visually represented in a circular map generated using TBtools (version 1.076).
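The length filter described above can be illustrated with a short sketch. It assumes the BLASTn results were written in the standard 12-column tabular format (`-outfmt 6`), which the text does not state; the class and function names are illustrative only.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    q_start: int
    q_end: int
    evalue: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse BLAST tabular output (assumed -outfmt 6, 12 tab-separated columns)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), float(f[10])))
    return hits


def filter_mtpt_hits(hits: List[Hit], min_length: int = 100) -> List[Hit]:
    """Drop hits shorter than min_length bp, as described in the text."""
    return [h for h in hits if h.length >= min_length]
```

The e-value threshold is applied by BLASTn itself at search time, so only the 100 bp length cut-off needs to be enforced afterwards.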
To identify putative Nuclear Mitochondrial DNA segments (NUMTs), the nuclear genome of R. glutinosa (GenBank accession JABTTQ000000000.1) was compared against the mitogenome using BLASTn. Specific BLASTn parameters were as follows: e-value of 1e-5, word size of 9, gap opening cost of 5, gap extension cost of 2, match reward of 2, mismatch penalty of -3, and the dust filter turned off. The BLASTn output was visualized using TBtools (Chen et al., 2020). Segments identified as potential NUMTs were further annotated using GeSeq software.
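The NUMT search parameters listed above map directly onto BLAST+ command-line options. The sketch below assembles such a command; which sequence served as query and which as database, the tabular output format, and all file names are assumptions made for illustration.

```python
def numt_blastn_cmd(query_fasta, nuclear_db, out_path):
    """Assemble a blastn call using the NUMT search parameters listed in the text."""
    return [
        "blastn",
        "-query", query_fasta,   # assumed: mitogenome as query (hypothetical file)
        "-db", nuclear_db,       # assumed: nuclear genome as BLAST database (hypothetical)
        "-evalue", "1e-5",
        "-word_size", "9",
        "-gapopen", "5",
        "-gapextend", "2",
        "-reward", "2",
        "-penalty", "-3",
        "-dust", "no",           # dust low-complexity filter turned off
        "-outfmt", "6",          # assumed tabular output
        "-out", out_path,
    ]


if __name__ == "__main__":
    print(" ".join(numt_blastn_cmd("mitogenome.fasta", "nuclear_db", "numt_hits.tsv")))
```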

Identification and validation of repeat mediated recombination
To investigate the influence of repeat sequences on both intermolecular and intramolecular recombination events within the mitochondrial genome of R. glutinosa, we employed BLASTn analysis. The search used an Expectation value (E-value) threshold of 1E-6 and a word size of 7 to identify relevant repeat sequences (Chen et al., 2015). Sequence segments of 500 base pairs (bp) flanking each repeat were extracted, and the sequences expected before and after recombination were constructed to represent potential recombination products. Subsequently, Nanopore long reads were mapped to the extracted sequence segments of the four configurations, and the repeat-spanning reads were enumerated.
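The four configurations mentioned above can be made concrete with a small sketch for a pair of direct repeats. It assumes a linear sequence string with 0-based, end-exclusive coordinates and does not handle repeats that wrap around the origin of a circular chromosome or inverted repeats; the function names are illustrative and are not those of any tool used in the study.

```python
def flanks(genome, start, end, size=500):
    """Return the (upstream, downstream) flanks of the repeat copy genome[start:end]."""
    return genome[max(0, start - size):start], genome[end:end + size]


def direct_repeat_configurations(genome, copy1, copy2, size=500):
    """Build the two reference and two recombinant sequences for one direct-repeat pair.

    copy1 and copy2 are (start, end) tuples. Recombination across a direct
    repeat exchanges the downstream flanks of the two copies.
    """
    up1, down1 = flanks(genome, *copy1, size=size)
    up2, down2 = flanks(genome, *copy2, size=size)
    repeat = genome[copy1[0]:copy1[1]]
    return {
        "reference_1": up1 + repeat + down1,
        "reference_2": up2 + repeat + down2,
        "recombinant_1": up1 + repeat + down2,
        "recombinant_2": up2 + repeat + down1,
    }
```

Long reads that span the whole repeat plus both flanks can then be assigned to whichever of the four sequences they match, and the counts per configuration compared.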
To investigate putative recombination products identified through mapping Nanopore long reads, polymerase chain reaction (PCR) primers were designed at the junction of repetitive sequences and recombination fragments using the Primer 3 web service (Untergasser et al., 2012). PCR reactions were performed in 50 μL volumes, consisting of 23 μL water, 25 μL 2 × Taq PCR Master Mix, 1 μL of each primer, and 1 μL DNA. The reactions were performed on a Pro-Flex PCR system (Applied Biosystems, Waltham, MA, USA). Subsequently, the PCR products were separated and visualized on 1.0% agarose gels. Finally, the PCR amplicons were sequenced using the Sanger method to confirm the recombination events.

Phylogenetic analysis
To construct a phylogenetic tree, we downloaded 21 Lamiales mitogenome sequences, including the original version of R. glutinosa (OM397952.2), from the National Center for Biotechnology Information (NCBI) database. The common genes from the 21 mitochondrial genomes were extracted and concatenated using PhyloSuite (Zhang D et al., 2020). Subsequently, the DNA sequences of the 16 protein-coding genes (PCGs) shared among these mitogenomes were extracted (Table 1). These sequences were aligned with MAFFT (v7.450) (Rozewicki et al., 2019), and a phylogenetic tree was constructed using PhyloSuite with the maximum likelihood (ML) method based on the alignment. The credibility of the phylogenetic tree was assessed by bootstrap testing with 1,000 replications. Finally, the resulting maximum-likelihood tree was visualized using iTOL (https://itol.embl.de/) (Letunic and Bork, 2021).
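The step of selecting shared genes and concatenating their alignments can be sketched as follows. This is an illustration of the general supermatrix idea rather than of PhyloSuite's implementation; the input layout (a dictionary of per-gene alignments keyed by species) is an assumption.

```python
def concatenate_shared(alignments):
    """Concatenate per-gene alignments for genes present in every species.

    alignments maps gene name -> {species name -> aligned sequence}.
    Returns (shared gene names in sorted order, {species -> concatenated sequence}).
    """
    species = sorted({sp for aln in alignments.values() for sp in aln})
    shared = [g for g in sorted(alignments) if set(alignments[g]) == set(species)]
    for gene in shared:
        lengths = {len(seq) for seq in alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
    matrix = {sp: "".join(alignments[g][sp] for g in shared) for sp in species}
    return shared, matrix
```

Genes missing from any species are dropped, so every row of the resulting matrix has the same length and column meaning.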

Identification and validation of RNA editing sites
To delineate both RNA editing sites and single nucleotide polymorphism (SNP) loci, we initially extracted the coding sequence (CDS) of each protein-coding gene (PCG), flanked by 100 base pair (bp) regions, to serve as reference sequences. To detect SNP loci, genomic DNA sequencing reads were aligned to these reference sequences using the Burrows-Wheeler Aligner (BWA; version 0.7.12-r1039) (Li and Durbin, 2010), with all parameters set to default. SNP loci were subsequently identified using REDItools (version 2.0) with the same parameters as for RNA editing site identification: a minimum coverage of 5 reads and a frequency threshold of ≥ 0.1. RNA editing sites were then ascertained using REDItools (version 2.0) (Picardi and Pesole, 2013), with a coverage threshold of ≥ 5 reads and a frequency threshold of ≥ 0.1 (Wu et al., 2017). The  (Milne et al., 2010).
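The thresholds above can be expressed as a simple filter. The sketch assumes that positions already detected as SNPs in the DNA reads are excluded from the RNA editing calls, which is the usual reason for calling both but is not spelled out in the text; the `Site` record is a hypothetical simplification of REDItools output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Site:
    gene: str
    position: int      # 1-based position in the reference sequence
    ref: str
    alt: str
    coverage: int      # number of reads covering the position
    frequency: float   # fraction of reads carrying the alternative base


def call_editing_sites(rna_sites: Iterable[Site],
                       dna_snps: Set[Tuple[str, int]],
                       min_coverage: int = 5,
                       min_frequency: float = 0.1) -> List[Site]:
    """Keep RNA variant sites passing both thresholds and absent from the DNA SNP set."""
    return [
        s for s in rna_sites
        if s.coverage >= min_coverage
        and s.frequency >= min_frequency
        and (s.gene, s.position) not in dna_snps   # assumed exclusion step
    ]
```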

Results
General features of the R. glutinosa mitochondrial genome
The mitochondrial genome of R. glutinosa represents the first published genome of the genus Rehmannia and the fifth mitochondrial genome within the Orobanchaceae family. Genomic assembly was performed using Illumina and Nanopore sequencing technologies, generating 10.2 GB and 19.8 GB of reads, respectively. The coverage depth of the long and short reads mapped to the R. glutinosa mitogenome sequences was obtained using samtools (v1.3.1) (Li et al., 2009) (Supplementary Figures S1, 2). De novo assembly of Illumina short reads was performed using the GetOrganelle software. Repeated sequences were resolved by mapping the Nanopore long reads. Subsequently, Unicycler software was used to extract 29 contigs to construct unitig graphs, including ten double-bifurcating structures (DBS) (Figure 1A). The abundance of each DBS configuration was calculated by mapping Nanopore long reads to the reference sequences using Unicycler. These configurations were further used for final assembly, and the results of the Unicycler analysis were subsequently loaded into Bandage software with the "Merge all possible nodes" module. As a result, two chromosomes of the mitochondrial genome of R. glutinosa were obtained (Figure 1B).
The R. glutinosa mitogenome had two chromosomes totaling 545,523 bp (chromosome 1 with 497,303 bp, chromosome 2 with 48,220 bp), and its overall GC content was 45% (T 27.6%, C 22.5%, A 27.4%, G 22.5%). The GC content of R. glutinosa and its related species ranged from 43.27% to 45.62%, and the genome length ranged from 225,612 bp to 1,860,774 bp (Table 2). We annotated the mitochondrial genome, and the categorization of genes is shown in Table 3. The core genes consisted of five ATP synthase genes, nine NADH dehydrogenase genes, three cytochrome c biogenesis genes, three cytochrome c oxidase genes, ubiquinol cytochrome c reductase, a transport membrane protein, and a maturase. The variable genes consisted of four large subunits of ribosome proteins (rpl2, rpl5, rpl10, and rpl16), seven small subunits of ribosome proteins (rps3, rps4, rps7, rps10, rps12, rps13 and rps14), three rRNA genes (rrn5, rrn18, and rrn26), and two respiratory genes (sdh3 and sdh4). A total of 15 unique tRNA genes were identified based on tRNAscan-SE. The schematic genome is presented in Figure 2.
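The size and base-composition figures above follow from simple counting, as the sketch below illustrates. The functions operate on any sequence string; the constants are the values reported in the text, and the function names are illustrative.

```python
from collections import Counter


def base_percentages(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """GC content in percent: the sum of the G and C percentages."""
    pct = base_percentages(seq)
    return pct["G"] + pct["C"]


# Values reported in the text.
CHROMOSOME_LENGTHS = {"chromosome 1": 497_303, "chromosome 2": 48_220}
REPORTED_PERCENT = {"T": 27.6, "C": 22.5, "A": 27.4, "G": 22.5}

if __name__ == "__main__":
    print(sum(CHROMOSOME_LENGTHS.values()))                         # total mitogenome size in bp
    print(round(REPORTED_PERCENT["G"] + REPORTED_PERCENT["C"], 1))  # overall GC content in percent
```

With the reported figures, the two chromosome lengths add up to the stated total of 545,523 bp, and the G and C percentages add up to the stated 45%.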
We compared the mitochondrial genome of R. glutinosa assembled in this study with the publicly available genome sequence OM397952.2 (Supplementary Figure S1). Our findings indicate strong collinearity between the two genomes from 1 bp up to 352,181 bp. Notably, there are repeated fragments spanning from 352,182 bp to 361,101 bp, and an extensive inverted repeat sequence can be observed from 419,346 bp to 547,032 bp.

Repeat elements analysis
Microsatellites, also known as simple sequence repeats (SSRs), are short repetitive DNA units composed of mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, or hexanucleotide motifs that are predominantly present in eukaryotic genomes [14]. In the mitochondrial genome of R. glutinosa, 100 and 16 SSR markers were identified in the major and secondary chromosomal molecules, respectively (Figure 3; Supplementary Tables S2, 3). All six types of SSRs were detected in the mitochondrial genome, with 30, 17, 9, 41, and 3 SSRs having mono-, di-, tri-, tetra-, and penta- or hexanucleotide repeat units in the major chromosomal molecule, and 3, 3, 1, 8, and 1 SSRs having mono-, di-, tri-, tetra-, and pentanucleotide repeat units in chromosome 2, respectively. The most commonly occurring SSRs in the mitochondrial genome of R. glutinosa had a four-nucleotide repeat unit, accounting for 42.2% of all SSRs. These microsatellite markers have the potential to serve as identification markers of R. glutinosa.
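The SSR totals and the tetranucleotide share quoted above can be reproduced from the per-type counts. The sketch below only tallies the numbers given in the text; the dictionary layout and function names are illustrative.

```python
# SSR counts per repeat-unit type as reported in the text.
SSR_COUNTS = {
    "chromosome 1": {"mono": 30, "di": 17, "tri": 9, "tetra": 41, "penta/hexa": 3},
    "chromosome 2": {"mono": 3, "di": 3, "tri": 1, "tetra": 8, "penta": 1},
}


def totals_per_chromosome(counts):
    """Total number of SSRs on each chromosome."""
    return {chrom: sum(by_type.values()) for chrom, by_type in counts.items()}


def motif_share(counts, motif):
    """Percentage of all SSRs (both chromosomes) that have the given repeat-unit type."""
    total = sum(sum(by_type.values()) for by_type in counts.values())
    hits = sum(by_type.get(motif, 0) for by_type in counts.values())
    return 100.0 * hits / total


if __name__ == "__main__":
    print(totals_per_chromosome(SSR_COUNTS))
    print(round(motif_share(SSR_COUNTS, "tetra"), 1))
```

The tetranucleotide share is (41 + 8) / (100 + 16), which rounds to the 42.2% stated in the text.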
Tandemly repeated DNA sequences are characterized by a unit length greater than six base pairs and are highly variable components of the genome [15]. These repeats are commonly found in intergenic regions, although some can be located within coding sequences or pseudogenes. Six tandem repeat sequences were detected within chromosome 1 of the R. glutinosa mitogenome, with lengths ranging from 14 to 23 base pairs (Supplementary Table S3).

Recombination mediated by repeat sequences
The mitochondrial genome of plants cannot be fully represented by a single circular molecule, as rearrangement mediated by repeated sequences may occur to varying degrees. To investigate possible homologous recombination in the mitochondrial genome of R. glutinosa, we detected 87 pairs of repetitive sequences using BLASTN with an E-value threshold of 1E-5. We examined each pair of repetitive sequences for support from Nanopore long reads and found that three pairs might support homologous recombination (Table 4; Supplementary Table S4). The length of these repeats is between 2,795 and 7,933 bp.
To further investigate the recombination and potential configurations of the R. glutinosa mitogenome, primers were designed at each end of the repeat sequences. Since these repeat sequences exceeded 1,000 bp in length, specific primers were designed at the junction of each pair of repeats present on the primary single circular molecule (Figure 4A; Supplementary Table S5). The PCR products for the recombinant configurations (junctions 1-4) are shown in Figure 4B. The alignments of the Sanger sequencing results of the PCR products with the genomic sequences are shown in Supplementary Figures S4-15. The predicted configurations of the mitochondrial genome are shown in Figure 4C. Three recombination events mediated by repeat sequences were confirmed (R1, R3, and R77, with R77 representing a pair of direct repeat sequences). All three sets of repeat sequences were found to generate secondary configurations, in accordance with the findings obtained through our long-read analysis (Figure 5; Supplementary Figure S16).
Besides chloroplasts, there are homologous sequences between the mitochondria and nuclear genomes (Supplementary Table S7).

A schematic representation of the assembly process for the R. glutinosa mitogenome is provided. (A) A unitig graph for the R. glutinosa mitogenome was generated through de novo assembly of Illumina reads using Unicycler. This unitig graph consisted of seven contigs (depicted in yellow) that formed double bifurcating structures (DBSs). Each DBS exhibited two secondary configurations based on the Nanopore long reads. (B) A schematic diagram of the mitochondrial chromosome 1 (MC1, represented by a blue circle) and mitochondrial chromosome 2 (MC2, represented by a red circle) of R. glutinosa following the resolution of DBSs using long reads is presented. The contigs illustrated in blue and red correspond to chromosomes 1 and 2, respectively.
Comparison with the published whole-genome sequence of R. glutinosa identified 4,395 nuclear DNA fragments, with a total length of 5,073,866 bp, that were similar to mitochondrial sequences. Among them, 3,694 fragments, with a total of 4,742,687 bp, were homologous to mitochondrial chromosome 1, with the longest fragment being 78,947 bp and the shortest being 36 bp. There were 1,701 fragments (331,179 bp) homologous to mitochondrial chromosome 2 (the longest was 32,990 bp, and the shortest was only 34 bp). The total length of homologous fragments in the nuclear DNA far exceeded the total length of the whole mitochondrial genome (545,523 bp), which might be related to multiple migrations of mitochondrial genes (Brigulla and Wackernagel, 2010; McFarlane and Humphrey, 2010; Knoll et al., 2014).

Graph-based method for mitochondrial genome assembly
Early investigations into plant mitochondrial genomes postulated a single master circle configuration akin to chloroplast genomes (Wu et al., 2022). However, subsequent studies have revealed that a solitary reference genome is inadequate for representing the full extent of genetic variation between individuals, particularly in plant mitochondrial genomes (Backert et al., 1996; Gonzalez et al., 1999). Graph-based genomic representations have proven more effective in capturing configurational and structural variations, as demonstrated by the soybean pan-genome comprising 26 plant materials (Tian et al., 2012). Plant mitochondrial genomes exhibit substantial differences in complexity, size, and structure, and a single circular representation fails to encompass all potential configurations. Recent publications have presented graph-based plant mitochondrial genomes of Lamiales species, such as Salvia miltiorrhiza and Scutellaria tsinyunensis (Li et al., 2021; Yang H. et al., 2022). These genomes comprise two chromosomal molecules, with nine configurations reported for Salvia miltiorrhiza. More graph-based assembly tools have been published, including Master graph and PMAT (He et al., 2023; Bi et al., 2024). Additionally, multi-conformation mitochondrial genomes are increasingly being discovered. This study provides a graph-based mitochondrial genome consisting of 29 contigs, including ten repeat regions (DBS structure), from which additional minor configurations can emerge.

Multiple chromosome configurations and homologous recombination
Traditionally, plant mitochondrial genomes were considered single circular molecules, similar to chloroplast genomes, primarily because Illumina short-read sequencing technologies struggled to resolve complex bifurcated structures. Consequently, plant mitochondrial assemblies that failed to form a circle were often considered assembly errors (Sloan, 2013). However, Nanopore long-read sequencing and graph-based genome assembly approaches have provided new solutions for mitochondrial genome assembly, and the homologous recombination of mitochondrial genomes mediated by direct repeat sequences has been confirmed (Wang et al., 2024). These advancements have also facilitated better prediction of the potential complex structures of mitochondrial genomes.
The presence of minor configurations in mitochondrial genomes has been confirmed in various plant species, including Scutellaria tsinyunensis, Ipomoea batatas, Saposhnikovia divaricata, Salvia miltiorrhiza, Cistanche deserticola, and Aeginetia indica (Li et al., 2021; Miao et al., 2022; Ni et al., 2022; Yang H. et al., 2022; Yang Z. et al., 2022; Zhong et al., 2022). Among these, Salvia miltiorrhiza displayed two distinct mitochondrial configurations, which were ascribed to an increased occurrence of homologous recombination events (Yang H. et al., 2022). Our current investigation identified three minor configurations within the R. glutinosa mitochondrial genome, leveraging long-read sequencing technology. While we confirmed the junction sites through polymerase chain reaction (PCR) amplification and Sanger sequencing, additional experimental evidence is necessary to validate this phenomenon further.
Type and quantity of SSR. The red column represents chromosome 1, and the blue column represents chromosome 2.

NUMT
Mitochondria are thought to have originated from endosymbiotic α-proteobacteria, which subsequently experienced gene loss or transfer to the nucleus (Martin et al., 2015; Roger et al., 2017). The mitochondrial genome of flowering plants contains up to 40 known protein-coding genes, with the number of non-core genes varying significantly among species, apart from the 24 core protein-coding genes (Adams et al., 2002). One primary factor accounting for the variability in gene content within mitochondrial genomes is the transfer of mitochondrial genes to the nucleus during eukaryotic evolution (Brigulla and Wackernagel, 2010; McFarlane and Humphrey, 2010; Knoll et al., 2014). This functional gene transfer contributes to the co-evolution of mitochondria and the nucleus (Levin et al., 2014); it has been found in mice and humans and is an ongoing evolutionary process in land plants and some green algae. Because evolutionary and migration rates differ among plant species from various flora, gene migration frequencies also vary. For instance, the rps1 gene has been lost from the mitochondria of most Lamiales plants, a finding consistent with our study results.
For successful activation and expression of genes in the nucleus following physical transfer from mitochondria (Gualberto and Newton, 2017), these newly transferred genes must acquire promoters and other regulatory elements. If the encoded protein lacks the necessary targeting information, the gene must also acquire a sequence that targets the protein product to mitochondria. Several transferred genes have acquired mitochondrial targeting pre-sequences, which are removed from the protein following its import into mitochondria. Some genes have obtained mitochondrial pre-sequences from preexisting mitochondrial protein genes (Liu et al., 2009; Gualberto and Newton, 2017). Once the transferred nuclear copy is activated, both it and the mitochondrial copy can be co-expressed for a period, at least at the transcript level, as demonstrated in the case of cox2 in some legumes, rpl5 in wheat, and sdh4 in poplar. The activation of transferred genes appears to be related to positive selection, but the existence of a nuclear screening mechanism for transferred genes remains uncertain. It is clear, however, that transfer between mitochondria and the nucleus occurs frequently. In our study, we found that the nuclear DNA of R. glutinosa contained 4,329 fragments, with a total length of 4,880,380 bp, exhibiting similarity to mitochondrial sequences.

MTPT
DNA transfer is prevalent in flowering plants, with DNA sequences being exchanged between the nuclear genome and the mitogenome (Wang X. C. et al., 2018). The most ancient transfers of plastid DNA into the mitogenome (MTPTs) occurred approximately 300 million years ago, before the divergence of gymnosperms and angiosperms. Although most MTPTs are non-functional, some notable exceptions have been discovered, such as contributions to the replacement of tRNA genes, the creation of promoter regions and codons, and involvement in posttranscriptional RNA processing. Small plastid DNA fragments typically migrate to mitochondria, while larger fragments are exchanged between the nucleus and mitochondria. In this study, the longest potential transfer fragment from the chloroplast of R. glutinosa was 4,513 bp, whereas the longest from mitochondria was 78,947 bp. The imperfect repair mechanism of mitochondria may facilitate the insertion of foreign sequences, and following the integration of nuclear organelle DNA, this DNA may undergo rearrangement, mutation, elimination, breakage, and proliferation. This process may represent one of the mechanisms driving species evolution.

RNA editing sites
RNA editing events typically occur during the posttranscriptional process in mitochondria, with specific RNA positions affected by RNA editing and their corresponding DNA positions referred to as editing sites (Edera et al., 2018). Early diverging lineages exhibit the highest number of editing sites among angiosperms, with approximately 400 editing sites reported in Arabidopsis (Edera et al., 2018). Our study verified 507 RNA editing events within the protein-coding regions of R. glutinosa, all of which involved C-to-U conversions. The amino acid changes induced by this type of RNA editing may be statistically correlated with alterations in protein hydrophobicity. Cytoplasmic male sterility (CMS) is also associated with reduced, deleted, or incorrect RNA editing of mitochondrial gene transcripts, which modifies gene expression patterns and the functional properties of translation products, ultimately leading to CMS (Hu et al., 2014). For example, in male-sterile lines of Sorghum, the frequency of RNA editing within the atp6 transcript is notably reduced. Additionally, two specific RNA editing sites within the atp9 maintainer transcript in rice alter arginine codons to termination codons. Intriguingly, in R. glutinosa, RNA editing altered the coding sequences of three genes, creating two stop codons (rps10 and atp6) and one start codon (nad4L). This finding provides insights for future molecular breeding of R. glutinosa.
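How a single C-to-U edit can create a start or stop codon follows directly from the standard genetic code, as the sketch below enumerates. It works on the DNA alphabet (C-to-T) and considers only single edits per codon; it does not claim which specific codons were edited in rps10, atp6, or nad4L, which the text does not state.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}


def codons_gaining(targets):
    """Map each codon to the single C-to-T edits that turn it into one of the targets.

    Returns {original codon: [(1-based codon position, edited codon), ...]}.
    """
    result = {}
    for codon in ("".join(p) for p in product("ACGT", repeat=3)):
        if codon in targets:
            continue
        for i, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:i] + "T" + codon[i + 1:]
            if edited in targets:
                result.setdefault(codon, []).append((i + 1, edited))
    return result


if __name__ == "__main__":
    print("stop codons created:", codons_gaining(STOP_CODONS))
    print("start codons created:", codons_gaining(START_CODONS))
```

Under these assumptions, only CAA, CAG, and CGA can become stop codons (by editing the first position), and only ACG can become the ATG start codon (by editing the second position).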
Double-stranded DNA breaks (DSBs) are repaired primarily via two mechanisms: non-homologous end-joining (NHEJ) and homologous recombination (HR) (Roy et al., 2022). RNA can directly repair DSBs in an HR-dependent (RAD51-dependent) process, inhibited by RNases H1 and H2, known to degrade RNA-DNA hybrids (Mishra et al., 2018). Compared to other terrestrial plants, angiosperms have undergone extensive loss of editing sites through the substitution of editable cytidines with thymidines in their genomes. The homologous recombination of cDNA produced by reverse transcription of edited RNA appears to be one of the molecular mechanisms responsible for the loss of editing sites. While RNA editing is essential for DNA damage repair and genetic selection, its specific mechanism requires further investigation. The prediction and identification of these RNA editing sites offer valuable insights into inferring gene function through the introduction of novel codons. Furthermore, these findings highlight the crucial role of RNA editing in regulating mitochondrial gene expression in plants, particularly its impact on protein synthesis and functionality, which subsequently influences plant growth and developmental processes.

A schematic representation of the secondary configuration for the R. glutinosa mitogenome is provided. (A) A unitig graph for the R. glutinosa mitogenome was generated through de novo assembly of Illumina reads using Unicycler. This unitig graph consisted of seven contigs (depicted in yellow) that formed double bifurcating structures (DBSs). (B) The coverage depth of the Illumina short reads mapped to the R. glutinosa mitogenome sequences of the secondary configuration.

Conclusion
In conclusion, this study offers a comprehensive analysis of the R. glutinosa mitochondrial genome, focusing on graph-based genome representation and identifying multiple chromosome configurations. Our findings reveal the presence of three minor configurations of the R. glutinosa mitochondrial genome, which were confirmed through PCR amplification and Sanger sequencing. Additionally, we observed the transfer of mitochondrial and chloroplast sequences to the nuclear genome, highlighting the complex interplay between organelle genomes and the nucleus. The research presented here contributes valuable insights into plant mitochondrial genomes' intricate structure and dynamics, which can inform future molecular breeding efforts for R. glutinosa and other plant species. However, further experimental evidence is needed to fully understand the specific mechanisms of RNA editing and the potential nuclear screening processes for transferred genes. By expanding our understanding of plant mitochondrial genomes, we can better elucidate the factors that drive species evolution and develop targeted strategies for plant improvement.

Molecular phylogenomic analysis of mitogenomes in Lamiales. The tree was constructed using concatenated conserved protein sequences from the mitogenomes of 21 species through maximum likelihood (ML) methods. Bootstrap scores were obtained using 1,000 replicates, and the ML bootstrap support values were indicated at the respective nodes. The tree in the upper left corner initially displays the original branch lengths. Two species from Oleaceae (Ligustrum quihoui and Osmanthus fragrans) were used as outgroups.

Statistics on the type and quantity of RNA editing events. (A) The number of RNA editing events for each gene. (B) The quantity of various amino acid changes.

FIGURE 2
A schematic representation of the circular mitochondrial chromosome 1 and mitochondrial chromosome 2 of Rehmannia glutinosa is provided. Genes depicted on the inner side correspond to the negative strand, while those on the outer side represent the positive strand. Genes containing introns are marked with an asterisk (*). The gray circle illustrates the GC content, with an inner circle within the GC content graph denoting the 50% threshold. Different functional categories are indicated by the colors shown in the accompanying legend.

FIGURE 3
FIGURE 4
PCR validation of recombination products associated with repetitive sequence-mediated secondary configurations. (A) Schematic illustration of junctions related to each repetitive sequence. The corresponding primers are depicted as purple dots. F1-4: forward primers; R1-4: reverse primers. (B) Electrophoretic gel image of PCR products amplified using various forward and reverse primer combinations to amplify the DNA molecules corresponding to junctions 1-4. The name of the repetitive sequence, combinations of forward and reverse primers, expected junctions to be amplified, and lane numbers are displayed above the gel image. Each PCR product's expected size encompasses those of the repetitive sequence and its 200-1000 bp long flanking sequences. The PCR product lengths are a rough evaluation of the successful amplification of fragments representing recombination products. (C) Hypothetical products of homologous recombination mediated by repetitive sequences R1, R3, and R77. Arrows indicate the repeat units of R1, R3, and R77. Arcs connect two repeat units if they are located on the same chromosome. Sequences surrounding the repeat units are displayed in distinct colors. Circles represent circular chromosomes. The genomic configuration is denoted by "C" followed by the configuration and chromosome numbers. Double-headed arrows indicate the source circular chromosomes, the repetitive elements, and the product circular chromosomes. The genomic configuration name is prefixed with "Ma," representing "major" if it is the most abundant configuration; otherwise, the genomic configuration name is prefixed with "Mi," representing "minor." Mac is the genomic configuration containing chromosomes Mac1-1 and Mac1-2. Mac1-1 can undergo recombination mediated by R1 or R3 to form a circular chromosome Mic1-1 or Mic2-1. Mic3 only contains one circular chromosome, and it can undergo recombination mediated by R77 to form two circular chromosomes: Mac1-1 and Mac1-2.
FIGURE 6
Exemplar homologous sequences between the mitogenome and chloroplastome. (A) Similar sequences are shared between the mitogenome and chloroplastome. The yellow and green arcs represent the mitogenome and the chloroplast genome (labeled as cpDNA), respectively. The inner circle arcs represent the MTPT fragments. (B) A bird's eye view of MTPT; the red box represents the enlarged part. (C) Mapping of long reads onto MTPT1 on chromosome 1. The MTPT sequence is highlighted in a green box. The encompassed regions illustrate upstream (mitoDNA) - MTPT - downstream (mitoDNA) sequences. A mitochondrial read is highlighted in yellow, bordered by mitoDNA sequences with the MTPT sequence in the middle.

TABLE 1
Lamiales mitogenome sequences for the construction of the phylogenetic tree.

TABLE 2
Comparative genomic analysis of Lamiales mitogenome sequences.

TABLE 3
Gene contents in the mitogenome of R. glutinosa.

TABLE 4
The details of three direct repeats.