Mitochondrial Phylogenomics of Fagales Provides Insights Into Plant Mitogenome Mosaic Evolution

Fagales are an order of woody plants comprising more than 1,100 species, most of which produce economically important timber, nuts, and fruits. Their nuclear and plastid genomes have been extensively sequenced and provide valuable resources for studying their phylogeny, breeding, disease resistance, and other traits. However, little is known about their mitochondrial genomes (mitogenomes), which hinders a full understanding of their genome evolution. In this study, we assembled complete mitogenomes of 23 species, covering five of the seven families of Fagales. These mitogenomes had similar gene sets but varied 2.4-fold in size. The mitochondrial genes were highly conserved, which limited their utility for phylogenetic inference. The mitogenomic structure was extremely dynamic, and synteny among species was poor. Further analyses revealed that Fagales mitogenomes are extremely mosaic, containing horizontal gene transfer (HGT)-like sequences from almost all seed plant lineages and even from mitoviruses. The largest mitogenome, that of Carpinus cordata, did not contain large amounts of genus-specific sequences but instead contained a high proportion of sequences homologous to other Fagales. Independent and unequal transfers of third-party DNA, including DNA from the nuclear genome and other sources, may partially account for the HGT-like fragments and unbalanced size expansions observed in Fagales mitogenomes. Supporting this, a mitochondrial plasmid-like sequence of nuclear origin was found in Carpinus. Overall, we deciphered the last uncharacterized genetic material of Fagales, and our large-scale analyses provide new insights into plant mitogenome evolution and size variation.


INTRODUCTION
The order Fagales of flowering plants belongs to the Rosids clade of the Eudicotidae. Fagales comprise more than 1,100 species in seven families and 33 genera, according to the Angiosperm Phylogeny Group (APG) system (Sennikov et al., 2016). They are among the most important woody plants of tropical, subtropical, and temperate forests (Xiang et al., 2014). Many Fagales play significant roles in ecosystem support and food supply, including beeches, oaks, and birches, as well as nut and fruit trees such as walnuts, chestnuts, hazels, and bayberries. Some species can fix nitrogen in root nodules through symbiosis with bacteria.
Fagales is one of the most extensively sequenced orders of angiosperms. To date, the nuclear genomes of at least 28 species from five families have been sequenced (https://www.plabipd.de), and more than 150 Fagales plastomes have been released. These genomes provide valuable genetic resources for improving nut quality and disease resistance, and they have also increased our knowledge of Fagales phylogeny, nitrogen fixation, and sex determination (e.g., Griesmann et al., 2018; Jia et al., 2019; Lovell et al., 2021; Lucas et al., 2021; Yang et al., 2021). Despite this, the last genome in the cell, the mitochondrial genome (mitogenome), has seldom been studied in Fagales. So far, only three mitogenomes have been released, namely, those of Betula pendula, Quercus variabilis, and Fagus sylvatica. The B. pendula mitogenome was derived from a whole-genome sequencing (WGS) study that included only minimal information about the mitogenome (Salojärvi et al., 2017). The Q. variabilis mitogenome was similarly sparsely described (Bi et al., 2019), and the F. sylvatica mitogenome was published recently (Mader et al., 2020). How mitogenomes have evolved in Fagales therefore remains an open question. Deciphering this last unknown genetic material is crucial for understanding the evolution of Fagales and for expanding its genomic resources.
Plant mitogenomes exhibit many unique features compared with those of animals and fungi. In angiosperms, mitogenomes are greatly expanded and vary significantly in size among species, ranging from about 200 Kb up to 11 Mb (for an exception, see Skippington et al., 2015). Duplications and foreign DNA, including plastid-derived insertions (mitochondrial plastid insertions, MTPTs), nuclear insertions, and even horizontal gene/DNA transfers (HGTs), contribute significantly to this expansion (Mower et al., 2012; Wynn and Christensen, 2018). Plant mitochondrial DNA has the lowest substitution rate of the three genomes in the cell, whereas its structure is highly dynamic, differing even between close relatives or individuals of the same species (Wolfe et al., 1987; Palmer and Herbon, 1988). Recombination between repeats can generate substoichiometric isomers, and mitogenomic chromosomes often exhibit unusual structures, e.g., multipartite or branched (Cheng et al., 2017; Gualberto and Newton, 2017; Kozik et al., 2019). These unique characteristics of plant mitogenomes hinder the production of complete, high-quality assemblies, and in many plant species the mitogenome is the last genome that remains to be deciphered. To date, mitogenomic publications have usually focused on one or a few species, and large-scale comparisons are still scarce. The full scope of mitogenome evolution therefore remains obscure.
In this study, we assembled complete mitogenomes of 23 Fagales species from 16 genera and five families, covering almost half (16 of 33) of the Fagales genera and 71% (five of seven) of its families. We show that Fagales mitogenomes are extremely mosaic and rich in HGT-like sequences. Mitogenome size varies significantly among species and is likely affected by third-party DNA, such as nuclear DNA or viruses. To our knowledge, this study presents one of the largest sets of new, complete angiosperm mitogenomes produced to date, and it provides comprehensive insights into mitogenome evolution in Fagales and in angiosperms more broadly.

Sequenced Data Acquisition
Raw reads used for our assembly were all obtained from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra). Fagales genome sequencing projects were retrieved from SRA, and 23 species from 16 genera and five families were ultimately selected for assembly (Supplementary Table 1). All data were from whole-genome sequencing, so the reads included sequences from the nuclear, mitochondrial, and plastid genomes. Organelle genomes are much smaller than the nuclear genome but are present at much higher copy numbers, so relatively small amounts of data were sufficient to obtain mitogenomes and plastomes in Fagales.

Genome Assembly
Raw reads of each species were filtered for low-quality bases using TRIMMOMATIC v0.36 (Bolger et al., 2014). Clean reads of ∼2-4 Gb were used for de novo assembly with SPADES v3.13 (Bankevich et al., 2012) (Supplementary Table 1). Plastid contigs of Casuarina equisetifolia, Lithocarpus fenestratus, and Quercus suber were identified by BLASTN v2.9.0 (Camacho et al., 2009) searches of all assembled contigs against the B. pendula plastid genome (plastome, GenBank ID: NC_044852). Clean reads were then mapped to the plastid contigs using GENEIOUS R10 (Biomatters, Inc.), and contigs were extended and connected manually until joined. Inverted repeat (IR) boundaries were identified by searching for repeats with the GENEIOUS "Repeat Finder" plugin. Mitogenomes are often more variable than plastomes in both DNA sequence and structure. Preliminary mitogenome contigs were identified from the total contigs by BLASTN with the B. pendula mitogenome (GenBank ID: LT855379) as a reference (word size: 16; e-value: 1e-20), and all hits longer than 500 bp were extracted. Two subsequent strategies were used to improve completeness and sequence content. For completeness, contigs were annotated using the GENEIOUS "Annotate from Database" function, where the database comprised all known mitochondrial genes. If a known mitochondrial gene was absent, reads were mapped to the reference gene sequence to check coverage and confirm its presence or absence. Mitochondrial genes missing from the preliminary contigs were used to search all contigs, and the identified contigs were added to the preliminary mitogenome contig set. This strategy ensured that the gene sets of the mitogenome assemblies were complete. For DNA content, clean reads were mapped back to the selected contigs. Plastid contigs (higher coverage) and other contigs (unbalanced coverage) were removed from the set to estimate the approximate mitogenome coverage, which was then used to bait other potential mitochondrial contigs from all contigs. Newly selected contigs were again checked by read mapping, and nonmitochondrial contigs were removed as before. This strategy reduced the amount of missing sequence and ensured that the mitogenome assemblies were as complete as possible.
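The contig-selection logic above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: it assumes BLASTN tabular output (`-outfmt 6`, where the fourth column is alignment length) and per-contig mean read depths already computed from the read mapping. The function names and the fold-change thresholds (`plastid_fold`, `window`) are hypothetical choices for illustration.

```python
from statistics import median


def blast_hit_contigs(blast_tsv: str, min_aln_len: int = 500) -> set:
    """Contig IDs with at least one BLASTN hit (-outfmt 6) longer than min_aln_len."""
    hits = set()
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen ...
        if int(fields[3]) > min_aln_len:
            hits.add(fields[0])
    return hits


def bait_by_coverage(depths: dict, seed_ids: set,
                     plastid_fold: float = 5.0, window=(0.5, 3.0)):
    """Estimate mitogenome depth from seed contigs, then bait contigs of similar depth.

    depths: contig ID -> mean read depth; seed_ids: preliminary mitochondrial contigs.
    The thresholds are hypothetical, not values from the study.
    """
    seed = [depths[c] for c in seed_ids if c in depths]
    # Drop plastid-like seeds (much higher depth) before estimating the baseline.
    plastid_cut = plastid_fold * median(seed)
    baseline = median([d for d in seed if d < plastid_cut])
    lo, hi = window
    # The upper bound tolerates multi-copy repeats; very low depth suggests nuclear DNA.
    baited = {c for c, d in depths.items() if lo * baseline <= d <= hi * baseline}
    return baseline, baited


def copy_number(depth: float, baseline: float) -> int:
    """Rough copy-number estimate for a repeat, relative to single-copy depth."""
    return round(depth / baseline)
```

Dividing a contig's depth by the baseline also gives a rough copy-number estimate, which is the same idea that underlies resolving repeats by coverage.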
Next, the comprehensive mitochondrial contig sets were joined together. Contigs typically ended in repeats and/or MTPT sequences. Repeats longer than 50 bp in contigs were found using the GENEIOUS "Repeat Finder," and paired reads were mapped to the contigs. Repeat regions were identified and resolved using sequencing coverage, although connections across long repeats may introduce artificial rearrangements. MTPTs are very similar to plastome sequences, so it is usually not possible to assemble MTPTs directly into contigs. MTPTs can be identified on plastomes (or plastid contigs); however, unlike repeats, MTPT regions cannot easily be resolved by coverage, because plastome coverage is usually much higher than mitogenome coverage. After repeats were filled and contigs were connected at both ends, the MTPT ends were mapped to the plastome. After plastome mapping, the closest ends in the same orientation were most likely derived from the same MTPT. Rearrangements or recombination can occur within MTPTs, resulting in extended distances between ends or opposite orientations with respect to the plastome; in these cases, paired reads could be used to identify the correct connections. Because MTPTs and their plastid counterparts may not be 100% identical, additional steps were needed to correct the MTPTs identified in the previous step. Reads were re-mapped and divergent bases were manually checked and corrected; reads 100% identical to the plastome were filtered out, and the remaining "unused reads" were re-mapped to the mitogenomes to enhance the identification of divergent bases.
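The MTPT-joining rule described above (pair the closest contig ends that map to the plastome in the same orientation) can be expressed as a greedy matching. The sketch below is hypothetical: the `MtptEnd` record, the `max_gap` cutoff, and the greedy strategy are assumptions made for illustration, not the authors' procedure. Ends left unpaired correspond to the cases where the text falls back on paired reads.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class MtptEnd:
    end_id: str        # hypothetical label, e.g. "ctg7_right"
    plastome_pos: int  # plastome coordinate where this end's MTPT fragment maps
    strand: str        # "+" or "-", orientation of the mapping


def plastome_distance(a: int, b: int, plastome_len: Optional[int]) -> int:
    d = abs(a - b)
    # The plastome is circular, so optionally measure distance around the circle.
    return min(d, plastome_len - d) if plastome_len else d


def pair_mtpt_ends(ends: List[MtptEnd], max_gap: int = 2000,
                   plastome_len: Optional[int] = None):
    """Greedily pair the closest same-orientation ends; return (pairs, unresolved IDs)."""
    candidates = []
    for a, b in combinations(ends, 2):
        if a.strand != b.strand:
            continue
        gap = plastome_distance(a.plastome_pos, b.plastome_pos, plastome_len)
        if gap <= max_gap:
            candidates.append((gap, a.end_id, b.end_id))
    candidates.sort()  # closest pairs first
    used, pairs = set(), []
    for _, a_id, b_id in candidates:
        if a_id not in used and b_id not in used:
            pairs.append((a_id, b_id))
            used.update((a_id, b_id))
    # Unpaired ends (distant or opposite orientation) would need paired-read evidence.
    unresolved = [e.end_id for e in ends if e.end_id not in used]
    return pairs, unresolved
```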
Several iterations of the map-check-connect strategy outlined above were usually sufficient to resolve all repeat and MTPT ends and retrieve one or more circular chromosomes. As a final step, paired-end reads were re-mapped to check and correct any misassemblies and to ensure that all bases were correct. The assembly process is depicted in Supplementary Figure 1.

Annotation
Putative mitochondrial protein-coding and rRNA genes were annotated by similarity to known mitochondrial genes, followed by manual corrections, and tRNA genes were predicted using tRNAscan-SE v2.0 (Chan and Lowe, 2019). Coding genes with disrupted reading frames, premature stop codons, or non-triplet frameshifts were annotated as pseudogenes.
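A simple check of the pseudogene criteria listed above might look like the following sketch. It assumes the standard genetic code and an unedited genomic CDS as input; because C-to-U RNA editing in plant mitochondria can create stop codons (for example, genomic CGA can be edited to a UGA stop), a missing terminal stop is flagged for manual review rather than called a pseudogene. The function name and return labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, used by plant mitochondria


def reading_frame_status(cds: str) -> str:
    """Classify a genomic CDS (start codon to expected stop) by reading-frame integrity."""
    seq = cds.upper()
    if not seq:
        return "missing"
    if len(seq) % 3 != 0:
        return "pseudo: length not a multiple of 3 (frameshift)"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        return f"pseudo: premature stop at codon {internal[0] + 1}"
    if codons[-1] not in STOP_CODONS:
        # A C-to-U edit (e.g. genomic CGA -> UGA) may supply the stop at the RNA level.
        return "check: no terminal stop codon in the genomic sequence"
    return "intact"
```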
MTPTs were identified by BLASTN comparison against a collection of plastomes, and hits shorter than 100 bp were masked. Dispersed repeats within each genome were identified by BLASTN searches of the genome against itself, and hits with identity <95% were filtered out. Repeat lengths were determined using a custom Perl script: only one copy of each repeat pair was counted, and overlapping bases were counted only once.
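The counting rule for repeat length (one copy per repeat pair, overlapping bases counted once) can be sketched as follows. This is an illustrative Python re-expression, not the authors' Perl script; it assumes self-BLASTN hits given as 1-based inclusive `(qstart, qend, sstart, send, pident)` tuples, and the function names are hypothetical.

```python
def merged_length(intervals) -> int:
    """Total bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def repeat_length(self_hits, min_identity: float = 95.0) -> int:
    """Repeat length from self-BLASTN hits, counting one copy of each repeat pair."""
    seen_pairs, kept = set(), []
    for qs, qe, ss, se, pident in self_hits:
        q = (min(qs, qe), max(qs, qe))  # min/max also handles inverted repeats
        s = (min(ss, se), max(ss, se))
        if q == s or pident < min_identity:  # trivial self-match or low identity
            continue
        key = tuple(sorted((q, s)))  # A-vs-B and B-vs-A hits collapse to one key
        if key in seen_pairs:
            continue
        seen_pairs.add(key)
        kept.append(key[0])  # count only one copy of the pair
    return merged_length(kept)
```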

Genus-Specific Sequence (GSS) Analysis
A BLAST program was used to compare the mitogenomes against a database comprising all Fagales mitogenomes, with an e-value of 1e-5 and a word size of 16. GSSs (i.e., sequences present only within a specific Fagales genus) longer than 300 bp were extracted using a custom Python script, with hits shorter than 70 bp masked. Because Quercus species were non-monophyletic (Figure 3), Q. robur was treated as a separate genus in the analysis. GSSs were searched against the NCBI nt database with the same parameters, retaining the top 100 hits for each query. The best hits for each GSS were examined using a custom Python script (more than one best hit was possible if sequences matched different targets). Only best hits longer than 100 bp were used, and MTPTs were removed from the results. The best matches were then grouped into orders, and a face-to-face tree was plotted in R using the cophyloplot function of the APE package (Paradis et al., 2004). Connections were colored using RColorBrewer (https://colorbrewer2.org/), and orders were positioned with reference to the Angiosperm Phylogeny Group website (https://www.mobot.org/MOBOT/research/APweb/).
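The core of the GSS extraction, masking short hits and reporting long uncovered regions, can be sketched as an interval complement. This is not the authors' script; it assumes the BLAST hits against mitogenomes from other genera have already been collected as 1-based inclusive query coordinates, and the function name is hypothetical. The 70-bp and 300-bp cutoffs follow the text.

```python
def genus_specific_segments(genome_len: int, hits, min_hit: int = 70, min_gss: int = 300):
    """Regions of a mitogenome not covered by hits from other genera.

    hits: (start, end) query coordinates, 1-based inclusive, in either orientation.
    Hits shorter than min_hit are masked (ignored); gaps longer than min_gss are kept.
    """
    kept = sorted((min(s, e), max(s, e)) for s, e in hits if abs(e - s) + 1 >= min_hit)
    gss = []
    pos = 1  # first position not yet covered by a kept hit
    for start, end in kept:
        if start - pos > min_gss:  # the uncovered gap pos..start-1 is long enough
            gss.append((pos, start - 1))
        pos = max(pos, end + 1)
    if genome_len - pos + 1 > min_gss:  # trailing uncovered region
        gss.append((pos, genome_len))
    return gss
```

Because hits shorter than 70 bp are masked, a short spurious match inside an otherwise genus-specific region does not split it into two GSSs.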

Synteny Inference
Mitogenome synteny between families was plotted using CIRCOS v0.69 (Krzywinski et al., 2009). Links were identified by BLASTN with default parameters, and hits shorter than 500 bp were excluded. Synteny within each family was plotted using the Python version of MCscan in the JCVI utility libraries v1.1.17 (Tang et al., 2008). The mitogenomes were cut into 300-bp pieces, which were treated as genes to search for orthologous regions with the parameter -cscore = 0.99. Synteny between each of Carpinus, Fagus, and J. microcarpa and the other mitogenomes was also plotted using MCscan with -cscore = 0.7.
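Cutting a mitogenome into fixed-length pieces that MCscan can treat as genes can be sketched as below. The sketch produces six-column BED-style records (chromosome, 0-based start, end, name, score, strand) plus the piece sequences; the naming scheme, the `mt` prefix, and keeping the short terminal piece are assumptions, and the exact input files expected by JCVI should be checked against its documentation.

```python
def chop_to_pseudogenes(records: dict, piece: int = 300, prefix: str = "mt"):
    """Split each chromosome into consecutive pieces treated as pseudo-genes.

    records: chromosome name -> sequence.
    Returns (BED-style lines, dict of piece name -> piece sequence).
    """
    bed_lines, pieces = [], {}
    for chrom, seq in records.items():
        for i, start in enumerate(range(0, len(seq), piece), 1):
            end = min(start + piece, len(seq))  # the last piece may be shorter
            name = f"{prefix}_{chrom}_{i:05d}"  # hypothetical naming scheme
            bed_lines.append(f"{chrom}\t{start}\t{end}\t{name}\t0\t+")
            pieces[name] = seq[start:end]
    return bed_lines, pieces
```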

Assembly and Completeness Assessment
Our assembly approach focused on resolving disconnections caused by repeats and MTPTs, the two main difficulties of mitogenome assembly. Sequencing coverage was used to resolve repeats, and MTPTs were connected using their positions and orientations on the plastome (Supplementary Figure 1). For each species, 2-3 Gb of sequence was used for the assembly, and the final coverage depth ranged from 33× to 174× (Supplementary Table 1). One disadvantage of short reads is their inability to resolve long repeats, so the structure of each assembly represents only one possible configuration. Of the 23 species, 13 yielded one or more circular chromosomes, and the remaining 10 species yielded one or more linear chromosomes (Table 1). The mitogenome of Fagus sylvatica was previously assembled using both long and short reads into a single circular chromosome of 504,715 bp (Mader et al., 2020). The sequence content of the published assembly was almost identical to that of the Fagus assembly produced in this study, differing at only two bases, and the only structural difference was an inversion of the sequence located between two 900-bp repeats. This consistency supports the practicability and reliability of our assembly method.

TABLE 1 | Basic information of Fagales mitogenomes. In the column "Chr," the number gives the total number of chromosomes, and the suffixes "C" and "L" denote circular and linear chromosomes, respectively. The two species marked with asterisks were published by others; for Fagus, our assembly was used.

Mitogenome Size and Content
Characteristics of the mitogenome assemblies produced in this study, as well as the previously published B. pendula and Q. variabilis assemblies, are provided in Table 1 and Figure 1. Mitogenome sizes in Casuarinaceae, Fagaceae, and Myricaceae resembled those of distant relatives in Rosales and Fabales (400 Kb and 480 Kb on average, respectively; NCBI data). By contrast, mitogenomes were substantially expanded in Betulaceae and Juglandaceae. The largest mitogenome was that of Carpinus cordata (922 Kb; Betulaceae), which was much larger than those of confamilial species. Mitogenome sequences showed low similarity, structures were highly rearranged, and many sequences had no homologs in other species, whether between or within families (Figure 2 and Supplementary Figure 2). The proportion of repeats in Fagales mitogenomes was small, typically <3% and no more than 6.2% of the total mitogenome length (Table 1). In Betulaceae, short repeats (<200 bp) were more abundant, especially in Alnus (Supplementary Table 2). MTPT percentages were also low, with only two species exceeding 6% (Casu. equisetifolia, 13.5%; and Corylus, ∼9.5%).
The gene content of Fagales resembles that of other angiosperms. The 24 "core" protein-coding genes (atp1, 4, 6, 8, and 9; ccmB, C, Fc, and Fn; cob; cox1-3; nad1-7, 9, and 4L; matR; and mttB), three ribosomal RNA genes (rrn5, rrnS, and rrnL), and two succinate dehydrogenase subunit genes (sdh3 and sdh4) are well preserved. As in many plants, ribosomal protein genes are poorly conserved (Figure 1); only five of them, rpl5, rpl10, rps1, rps4, and rps12, are present in all species. Five of the seven Betulaceae species had rps11 sequences that were nearly 100% identical to one another. Comparison of Betulaceae rps11 sequences with those in the NCBI nt database indicated similarity to rps11 of monocots or early-diverging angiosperms, such as Triantha glutinosa (KX808303, Alismatales) and Liriodendron tulipifera (NC_021152, Magnoliales), consistent with previous research (Bergthorsson et al., 2003). These similarities suggest that rps11 may have been acquired by HGT in a common Betulaceae ancestor, followed by differential losses in some species. Exon 4 of nad1 (nad1e4), matR, and nad1e5 form a colinear block in many angiosperms. In Fagales, this block was disrupted between matR and nad1e5 at least twice but, surprisingly, was restored in J. sigillata and J. regia (Figure 1).

FIGURE 1 | Sampling, mitogenome size, and gene variations. The gray bars in the middle show the sizes of the mitogenomes. The grid on the right shows gene variation, with black, gray, and blank cells indicating intact, pseudo, and missing genes, respectively. The plastid tree is used to show species relationships. Breaks and reunions of the nad1e4-matR-nad1e5 block are marked on the branches.

Phylogenetic Relationship
Phylogeny was reconstructed using four matrices from all three genomes, namely, mitochondrial genes with introns (68,743 bp and 2,126 parsimony-informative sites, PIS), mitochondrial CDSs without RNA-edited sites (31,551 bp and 750 PIS), plastid CDSs (69,243 bp and 6,495 PIS), and nuclear 45S (6,019 bp and 444 PIS). The mitochondrial gene and plastid trees were robust, whereas the mitochondrial CDS and nuclear trees were poorly supported (Figure 3). The nuclear tree was mostly congruent with the plastid tree, except for some nodes in Juglandaceae and Fagaceae. The main incongruence between these two trees concerned the position of Myricaceae, which was placed as the sister group of either "Betulaceae + Casuarinaceae" or "Betulaceae + Casuarinaceae + Juglandaceae".
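Parsimony-informative sites, as counted for the matrices above, are alignment columns in which at least two character states each occur in at least two taxa. A minimal Python sketch follows; it assumes an alignment given as a dict of equal-length sequences, ignores gaps and ambiguity codes, and uses a hypothetical `skip_columns` set to mimic removing RNA-edited positions from the CDS matrix.

```python
from collections import Counter


def parsimony_informative_sites(alignment: dict, skip_columns=frozenset()) -> int:
    """Count columns where at least two nucleotide states each occur in >= 2 taxa.

    alignment: taxon -> aligned sequence (all the same length).
    skip_columns: 0-based columns to exclude, e.g. hypothetical RNA-edited positions.
    """
    seqs = [s.upper() for s in alignment.values()]
    pis = 0
    for col in range(len(seqs[0])):
        if col in skip_columns:
            continue
        # Gaps and ambiguity codes are ignored when tallying states.
        states = Counter(s[col] for s in seqs if s[col] in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            pis += 1
    return pis
```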

Genus-Specific Sequences and Mosaic Origins
Repeats and MTPTs alone could not explain the substantial size variation among mitogenomes of different species (Table 1). We therefore identified GSSs to explore the causes of mitogenome size divergence. Because Quercus species were non-monophyletic (Figure 3), Q. robur was excluded when identifying Quercus-specific sequences. The GSSs and their total length for each species are given in Supplementary Tables 4 and 6, respectively. As expected, Juglans yielded a large amount of GSSs (105 Kb), since six species were analyzed and their mitogenomes are generally larger than those of relatives in the family. However, GSS length and mitogenome size were poorly correlated in many other genera. Casuarina, despite its relatively small mitogenome, had the most GSSs (166 Kb). A similar pattern was observed in Fagus (105 Kb) and Morella (98 Kb). By contrast, Carpinus, which has the largest mitogenome and is much larger than its close relatives, did not contain correspondingly long GSSs (32 Kb).
We then searched these GSSs against the NCBI nt database to detect their potential origins. The best matches for each region were retrieved and grouped by compartment and order (Figure 4; Supplementary Tables 5 and 6). Overall, the GSSs were related to a range of seed plant lineages and were mainly of mitogenomic origin (Figure 4). Some genera contained more best matches from certain orders, such as Casuarina from Amborellales; Morella from Lamiales and Ericales; and Juglans from Lamiales, Malpighiales, and Magnoliales. Most GSSs and best matches were short, although some were quite long (Supplementary Figure 3).

FIGURE 3 | Phylogenetic trees reconstructed from mitochondrial genes, mitochondrial CDSs, plastid CDSs, and nuclear 45S. Numbers at nodes indicate bootstrap support, and full support (100) is marked by asterisks. Each family is shown with a different background color.

Identification of Other DNA
Mitochondrial plasmids are small, autonomous, circular or linear extrachromosomal DNA molecules in mitochondria, and they have been found in several species, including maize, rice, and carrot (McDermott et al., 2008). The origins and functions of mitochondrial plasmids remain unclear. In Carpinus, one 2,888-bp circular plasmid-like sequence was identified among the contigs. Its sequencing coverage resembled that of the mitogenome. Except for a small 240-bp plastid-like region, the circle had no sequence similarity to known angiosperm mitogenomes, including those of Fagales. It could be fully covered by nuclear sequences of Corylus avellana or Car. fangiana from different chromosomes. Its GC content was close to that of the nuclear Carpinus genome (Car. fangiana: 37.6%; Yang et al., 2020) but much lower than that of the mtDNA (Table 1). Two open reading frames (ORFs), ORF244 (732 bp) and ORF162 (486 bp), were predicted on the plasmid-like sequence. BLASTP comparison against the nr database identified homologs of ORF244 in several angiosperm species, including a nearly full-length match in Arabidopsis thaliana (AT1G74875; 34% identity). ORF244 homologs were annotated as putative F-box proteins, and homologs of ORF162 were annotated as DNA methylation four factors in several Rosids. It was unclear whether the two ORFs were expressed, but the evidence was sufficient to conclude that the sequence is of nuclear origin.
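The GC-content comparison and ORF prediction on the 2,888-bp circle can be sketched as follows. This is an illustrative sketch, not the tool the authors used: the standard genetic code, ATG-only starts, and the 300-bp minimum length are assumptions. Doubling the sequence lets ORFs run across the arbitrary origin of a circular molecule, and only the longest ORF ending at each stop codon is kept.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (A/C/G/T only)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def circular_orfs(seq: str, min_len: int = 300):
    """ATG-to-stop ORFs on both strands of a circular sequence.

    Returns sorted (strand, start, length_bp) tuples; start is 0-based on that
    strand's sequence, and length includes the stop codon.
    """
    n = len(seq)
    best = {}  # (strand, stop end on the circle) -> (length_bp, start)
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        doubled = s + s  # lets an ORF run across the origin of the circle
        for start in range(n):
            if doubled[start:start + 3] != "ATG":
                continue
            for pos in range(start, start + n - 2, 3):  # at most one full turn
                if doubled[pos:pos + 3] in STOP_CODONS:
                    length = pos + 3 - start
                    key = (strand, (pos + 3) % n)
                    if length >= min_len and length > best.get(key, (0, 0))[0]:
                        best[key] = (length, start)
                    break
    return sorted((strand, start, length) for (strand, _), (length, start) in best.items())
```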
Mitovirus-like sequences were also found in several Fagales. Mitoviruses, members of the family Narnaviridae, are positive-sense single-stranded RNA viruses that replicate in host mitochondria. Mitovirus genomes are small, approximately 2.1-4.4 Kb in length, and contain a single ORF encoding an RNA-dependent RNA polymerase (RdRP) required for replication (Nibert, 2017). In Betula, a ca. 2-Kb region best matched two mitoviruses (GenBank: MN034926 and MN033122) in the NCBI nt database. After these two mitoviruses, the next long hit was from the mitogenome of Ilex pubescens (Aquifoliaceae, Asterids), and all other hits were much shorter. When searched against Fagales mitogenomes, this region matched sequences in many species. We used hits longer than 700 bp from these two databases to build a phylogeny (Figure 5). The tree indicated that these sequences were likely introduced into Fagales via multiple events.
Fagales belong to the nitrogen-fixing lineage of angiosperms, and at least three genera in this study have nitrogen-fixing capacity: Casuarina, Morella, and Alnus (Yelenik and D'Antonio, 2013; Huisman and Geurts, 2020). However, there was no indication that these genera contained bacterial sequences.

Assembling Plant Mitogenomes With Short Reads
The use of short reads to retrieve plant mitogenomes has been validated in many species, but their shortcomings are also apparent. Short reads cannot span repeats longer than the read or insert length, resulting in fragmented contigs and artificial connections. From our experience and a brief survey, the quality of some published mitogenomes is concerning, with problems including implausibly long repeats and MTPTs, inappropriate circularization, or missing sequences, such as the absence of the ribosomal RNA genes rrnS and rrnL. Such assemblies are difficult to reuse for drawing rigorous conclusions. The assembly method used in this study can obtain complete mitogenomes efficiently, and visual inspection in GENEIOUS allowed every base to be verified. The method may not be new, and similar versions may be used in other laboratories; however, we highlight it here because it may help improve future assemblies.
In addition, the in vivo structure of plant mitogenomes remains mysterious. Despite the complex mitogenome structures observed under microscopy (Backert and Börner, 2000; Manchekar et al., 2006; Cheng et al., 2017), most mitogenomes can still be assembled as circles in silico, and how these circular assemblies relate to the molecules present in vivo is unclear. Plant mitogenomes undergo frequent rearrangements through their long repeats (Kozik et al., 2019), so it is unclear whether these mitogenomes can be considered to have a standard structure.
The Fagus sylvatica mitogenomes from two independent projects (this study and Mader et al., 2020) were almost identical, indicating that mitogenomes are conserved among individuals in at least some plant species.

Mitochondrial DNA and Phylogeny
We used four datasets to reconstruct the trees. The plastid tree is congruent with that of a previous study (Yang et al., 2021). In the nuclear 45S matrix, most PISs are located in the internal transcribed spacers (ITSs) rather than in the rRNA genes. However, ITSs evolve quickly, and some regions were difficult to align between families, which could explain the low support of the nuclear tree. The mitochondrion, like the plastid, may also reflect the evolution of the cytoplasm. Nevertheless, the utility of mtDNA for phylogenetic inference remains to be determined. Mitochondrial genes contain hundreds of RNA-edited sites (Small et al., 2020), which may interfere with tree building (Bowe and dePamphilis, 1996); one remedy is to remove these sites. The poor support of the tree from mitochondrial CDSs without edited sites is mainly due to the low substitution rate (Palmer and Herbon, 1988), which left too few PISs. Although the mitochondrial genes, which include RNA-edited sites and introns, yielded a more robust tree, the incongruence between our two mitochondrial datasets also calls into question the utility of mitochondrial genes for phylogenetic reconstruction in Fagales. In noncoding regions, plant mitochondria often convert DNA lesions into double-strand breaks (DSBs) followed by inaccurate nonhomologous repair (Gualberto and Newton, 2017; Christensen, 2018), which may introduce random mutations that mislead tree building.

FIGURE 5 | Mitovirus-like sequences in Fagales and other species. Long hits (>700 bp) were retrieved by searching the mitovirus-like sequence in Betula against the NCBI nt database and other Fagales mitogenomes. On the right, lines show the positions of the hits, and colors indicate similarity to the mitovirus-like sequence in Betula (red lines). The tree on the left was constructed from these hits using the maximum likelihood (ML) method. Blue and green on the tree highlight Ilex and mitoviruses, respectively.

Mitogenome Size Variation in Fagales
Size variation between closely related species is a common feature of plant mitogenomes and has been observed in a range of taxa, such as Viscum album and V. scurruloideum (565 Kb vs. 66 Kb; Petersen et al., 2015; Skippington et al., 2015), Silene conica and S. noctiflora (11.1 Mb vs. 6.7 Mb; Wu et al., 2015; Wu and Sloan, 2018), and Cucumis melo and C. sativus (2.7 Mb vs. 1.7 Mb; Alverson et al., 2011; Rodríguez-Moreno et al., 2011). The reasons for this size variability may be complex. Duplications, intracellular transfer events, and introductions of foreign DNA all contribute to mitogenome size expansion (Alverson et al., 2011; Rice et al., 2013). In Fagales, the mitogenome of Carpinus is notably larger than those of its close relatives, yet the lengths of its repeats, MTPTs, and GSSs were insufficient to explain the size divergence. Another possibility is that the Carpinus mitogenome shares an unusually large number of homologous sequences with other Fagales, which was confirmed by homology searches between Carpinus and other Fagales (Figure 6). This raises the question of what the ancestral Fagales mitogenome was like. One possibility is that it was as large as that of Carpinus, and sequences were then lost independently in different lineages. This model was used to explain mitogenome size variation in kiwifruits. However, it appears unlikely that all Fagales genera other than Carpinus experienced such large and variable losses, suggesting that sequence transfer is a more likely scenario for Fagales. Intracellular DNA transfer between genome compartments is common, and interactions between the nuclear and mitochondrial genomes may occur frequently and contribute significantly to mitogenome size (Alverson et al., 2011; Goremykin et al., 2012).
Although we attempted to analyze nuclear insertions in the mitogenomes, our efforts were unsuccessful because all Fagales nuclear genome assemblies appear to contain mitochondrial contigs, making it difficult to obtain accurate results. Instead, the plasmid-like sequence of nuclear origin found in Carpinus could indicate that its mitogenome has integrated nuclear insertions. The mitochondrial plasmid may represent an intermediate stage prior to incorporation into the chromosomal mitogenome.

FIGURE 6 | The Carpinus mitogenome shares more homologous sequences with other Fagales. Homologous sequences between Carpinus and other Fagales mitogenomes are linked by gray lines. Because GSSs have no homologs in other mitogenomes, they appear as blank regions without links. Carpinus shows few blank regions, indicating that it shares more homologous sequences with other Fagales. Fagus and J. microcarpa are shown for comparison.

Mosaic Evolution of Mitogenomes
Our GSS analysis showed that Fagales mitogenomes have mosaic characteristics, which could be a general feature of seed plant mitogenomes. Plant mitogenomes readily take up foreign DNA, but this ability has its limits. "Mitochondrial fusion occurs in a fundamentally similar manner" (Rice et al., 2013), so plant mitogenomes more readily acquire DNA from other plants. Our results are consistent with this, as no sequences appeared to be derived from cellular organisms other than seed plants, even though some species form symbioses with nitrogen-fixing bacteria. Some of these GSSs were likely horizontally transferred, especially those matching orders that are distantly related to Fagales yet received many best hits, such as Amborellales. A previous study has shown that Amborella contains HGTs from many species, including Fagales (Rice et al., 2013); we found that these HGT-like sequences were mainly shared with Casuarina. Because the analysis was based on GSSs, the direction of transfer could not be determined. The "wounding-HGT model" could explain massive HGTs between nonparasitic plants (Rice et al., 2013). However, this model seems unconvincing as an explanation for the widespread mosaic sequences in land plants.
Hints may come from the mitovirus-like sequences in this study. Mitovirus sequences, particularly those corresponding to the RdRP region, are widespread in plant nuclear and mitochondrial genomes (Alverson et al., 2011; Bruenn et al., 2015; Nibert, 2017; Silva et al., 2017; Chu et al., 2018; Nibert et al., 2018; Charon et al., 2020). Plant mitovirus-like sequences were thought to derive from interactions with pathogenic fungi followed by HGT (Bruenn et al., 2015). However, direct HGT from fungal to plant mitogenomes is unlikely, as incompatibility hampers fusion between fungal and plant mitochondria (Rice et al., 2013). An alternative route is transfer from fungi to the plant nuclear genome and then from the nucleus to the mitogenome. This route was also ruled out by searching the mitovirus-like sequence against the B. nana and B. pendula nuclear genomes (Wang et al., 2013; Salojärvi et al., 2017). It is therefore possible that mitoviruses infect plants directly and frequently (Figure 5; Vong et al., 2019). The mitovirus-like sequence found in Ilex pubescens could also reflect an independent infection rather than HGT from Fagales.
In conclusion, third-party DNA, including mitoviruses and nuclear insertions, may partially account for the mosaic composition of plant mitogenomes. The mosaic HGT-like sequences in angiosperm mitogenomes may share the same underlying cause as the expanded set of homologs observed in Carpinus. If two species acquire DNA from the same source, this can create the illusion that similar sequences are shared between distant lineages; if different amounts were transferred in independent events, some species may share more homologs with others (Figure 7). Because such transfers between third parties and mitogenomes can occur independently and at any time, and mitogenomes themselves undergo continuous rearrangement and deletion, these processes could eventually produce extremely mosaic mitogenomes.

AUTHOR CONTRIBUTIONS
YF, ZF, and XJ designed the project. YF assembled and annotated the genomes. YF, XX, DA, RP, ZF, and XJ worked together to finish the analyses and the manuscript. All authors contributed to the article and approved the submitted version.