The Multipartite Mitochondrial Genome of Marama (Tylosema esculentum)

Tylosema esculentum (marama bean), a wild legume from tropical Africa, has long been considered as a potential crop for local farmers due to its rich nutritional value. Genomics research of marama is indispensable for the domestication and varietal improvement of the bean. The chloroplast genome of marama has been sequenced and assembled previously using a hybrid approach based on both Illumina and PacBio data. In this study, a similar method was used to assemble the mitochondrial genome of marama. The mitochondrial genome of the experimental individual has been confirmed to have two large circles OK638188 and OK638189, which do not recombine according to the data. However, they may be able to restructure into five smaller circles through recombination on the 4 pairs of long repeats (>1 kb). The total length of marama mitogenome is 399,572 bp. A 9,798 bp DNA fragment has been found that is homologous to the chloroplast genome of marama, accounting for 2.5% of the mitogenome. In the Fabaceae family, the mitogenome of Millettia pinnata is highly similar to marama, including for both the genes present and the total size. Some genes including cox2, rpl10, rps1, and sdh4 have been lost during the evolution of angiosperms and are absent in the mitogenomes of some legumes. However, these remain intact and functional in marama. Another set of genes, rpl2, rps2, rps7, rps11, rps13, and rps19 are either absent, or present as pseudogenes, in the mitogenome of marama.


INTRODUCTION
Tylosema esculentum, also known as gemsbok bean, tamani berry, and marama bean, is a longlived perennial legume native to the Kalahari Desert and adjacent arid and semi-arid regions in Botswana, Namibia, and South Africa (Bower et al., 1988; Figure 1). Marama naturally grows in environments of long-term drought, low rainfall (100-900 mm), and extreme high temperatures (daily maximum of 37 • C in the growing season) (van der Maesen, 2006). Marama is often referred to as "the green gold of Africa, " and has long been considered as a potential crop due to its highly nutritious seeds and tubers.
The protein content of marama seeds reaches 30-39% dry matter (dm) (Belitz et al., 2004;Dakora, 2013) and the lipid content ranges from 35 to 48% dm (Amarteifio and Moholo, 1998), both of which are comparable to commercial crops like soybean and peanut (Belitz et al., 2004). Marama FIGURE 1 | Morphology of marama (Tylosema esculentum) plants from Namibia. (A) Double-lobed leaves, soft and red-brown when young, become leathery and gray-green over time (National Research Council [NRC], 2006). (B) Brownish-black seeds, edible after roasting and rich in protein, lipids, and micronutrients (Jackson et al., 2010). (C) Yellow-orange flowers, bloom every midsummer from the 3rd or 4th year after planting (Bower et al., 1988). (D) Prostrate vines, can grow up to six meters or longer for mature plants (Vietmeyer, 1986). also provides many micronutrients including calcium, iron, zinc, phosphate, magnesium and B vitamins (folate), as well as various phytonutrients, such as phenolic compounds (tannins), trypsin inhibitors, phytates and oligosaccharides, which can help prevent cardiovascular diseases, diabetes, and certain cancers (Jackson et al., 2010).
Despite its social-economic value, as an orphan crop, marama is still underdeveloped in terms of research and breeding. Domesticating marama and increasing its seed yield could increase the food security in arid zones where traditional crops are not suitable for cultivation, and diversify the livelihoods of people living there.
Marama is a basal legume that does not form nodules, which may be beneficial for drought avoidance and make the plant better adapted to arid environments (Cullis et al., 2018). The genome size of marama was estimated at about 1 gigabase through Feulgen staining (Takundwa et al., 2012). The number of chromosomes is 42 and it is speculated that marama is a hexaploid (7 chromosomes for haploid) (Takundwa et al., 2012). The complete T. esculentum chloroplast genome was assembled and found to be 161,537 bp in length and contains a specific inversion of 7,479 bp, which includes the six genes rbcL, accD, psaI, ycf4, cemA, and petA, and has not been found in other legumes (Kim and Cullis, 2017). A large amount of genetic diversity was found in Namibian marama germplasm within populations, but no significant differences between populations, through assessment using SSR markers, and the level of genetic diversity was thought to be related to the plant growth environment and stress response (Nepolo, 2010).
As the powerhouse of cells, mitochondria are where oxidative phosphorylation takes place and the cellular energy source adenosine triphosphate (ATP) is generated (Siekevitz, 1957). Mitochondria play an important role in the growth and development of organisms by regulating respiration, metabolism, programmed cell death (PCD) (Galluzzi et al., 2012) and intracellular signaling (Epstein et al., 2001). Mitochondrial defects are associated with cytoplasmic male sterility (CMS) in plants, preventing them from producing functional pollen. This can be utilized by breeders to ensure crossing of selfpollinating plants and obtain hybrid seeds (Kempken and Pring, 1999). Mitochondrial function is also believed to be related to the adaptability of organisms to the environment and their response to abiotic stress. For example, during the development of mung bean (Vigna radiata) cotyledons, in vivo freeze-thaw treatment converted the mitochondrial rosette genome structure to a longer linear DNA shown by electron microscopy (Cheng et al., 2016). Knowing whether different mtDNA structures coexist in marama individuals and the transition between them may provide information to understand better why marama can perform well in harsh environments. In addition, since plant mitochondrial genes are highly conserved with a low nucleotide mutation rate, they can be used in the study of plant evolution (Palmer and Herbon, 1988).
Self-incompatibility (SI) is a common genetically determined mechanism in flowering plants that promotes outcrossing and prevents inbreeding to increase genetic diversity. Marama is selfincompatible. This is one major obstacle to its cultivation and breeding, since SI not only reduces the yield of seeds and/or fruits for crops (Miller and Gross, 2011), but also makes it extremely difficult to combine desirable traits of two incompatible parents through simple cross-pollination (Claessen et al., 2019). The interaction between NaSIPP, a mitochondrial protein with phosphate transporter activity, exclusively transcribed in mature pollen and stigma protein NaStEP has been found essential to SI in Nicotiana spp., and this may destabilize mitochondria and stop pollen tube growth (García-Valencia et al., 2017). Since most self-incompatible Fabaceae plants are believed to apply a similar gametophytic SI system, the sequencing of marama mtDNA is indispensable for studying marama selfincompatibility (Aguiar et al., 2015).
Plant mitochondrial genomes are very complex and diverse in size, sequence arrangement, repeat content and structure (Kubo and Mikami, 2007;Mower et al., 2012). Unlike animal mitochondrial genomes, which are usually only 16-20 kb, the mitochondrial genome of higher plants can vary from 200 to 2,000 kb in size (Fauron et al., 1995). Some newly sequenced plant mitogenomes exceed this range, even reaching 11.3 Mb in the flowering plant Silene conica . Although many plant mitochondrial genomes can be combined into one ring, in fact, their physical structure may be very complex and composed of multiple circular, linear, and branched molecules suggested by electron microscopy (Backert et al., 1997). There is a large number of repeats in plant mitochondrial genomes, including long repeats (>1 kb), short repeats (<100 nt), and tandem repeats. Repeat-mediated recombination is believed to be the main cause of changes in the structure of mtDNA and plays an important role in mitochondrial replication and repair (Maréchal and Brisson, 2010). Although the size of plant mitogenomes is often large, their gene pool (24 core genes with 17 variable genes) is usually small, since many genes have been either lost or transferred to nucleus during angiosperm evolution, but the coding sequences of the remaining genes are highly conserved (Adams et al., 2002).
The existence of many repetitive sequences and multipartite subgenomes makes the assembly of plant mitochondrial genome difficult. The hybrid assembly approach using PacBio long reads to fill the gaps together with high coverage Illumina short reads to correct sequencing errors has, however, proven to be effective in accomplishing this task (Kozik et al., 2019). In this study, a similar method has been used to unveil the mitochondrial genome of marama.

Plant Materials Sources and Genome Sequencing
The marama samples were collected from plants growing at the University of Pretoria Farm in South Africa, and from various locations in Namibia. Total DNA was extracted from fresh young leaves using a Qiagen Plant miniprep kit following the manufacturer's protocol. The purity of DNA samples was assessed using a NanoDrop spectrophotometer to measure the ratio of 260/280 and 260/230 and was confirmed by electrophoresing 20 µL of the sample on a 1.0% agarose TBE gel at 100 V running for 50 min, using GelRed R for DNA staining. The samples were then sent to the Génome Québec Innovation Centre, CWRU Genomics Core and Novogene Corporation for the Illumina sequencing and the PacBio RSII SMRT/Québec platform was used to generate the long-read sequencing data. 179,470,509 reads covering 26.9 Gb were generated from the Illumina HiSeq 2000 platform for an individual collected in Namibia and the data was used for the first round assembly due to the high coverage. 21,373,859 reads, 37,816,777 reads and 46,425,865 reads were obtained from the same Illumina platform for one individual growing at the University of Pretoria Farm and two other plants grown from seeds collected in Namibia. The sample from the University of Pretoria Farm was also sequenced in five PacBio SMRT cells, producing 1.78 Gb of sequence with an average coverage of 15×-20× for the mitogenomes. All raw Illumina and PacBio reads were submitted to the NCBI SRA database 1 under the accession number PRJNA779273.

Mitochondrial Genome Assembly
The paired end Illumina reads were assembled de novo by ABySS, which were further elongated using DBG2OLC to get the preliminary genome assembly. The mitochondrial contigs  According to the connections shown in the dashed boxes, the 16 scaffolds can also be assembled into two long rings, as shown in Figure 3. The PacBio long reads data confirmed that these two structures co-exist in the experimental individual and they are close to the same frequency.
were retrieved based on the alignment and BLAST with known plant's, especially Fabaceae, mitogenomes. Ten mitochondrial genomes from 7 legumes and 3 other model plants including Arabidopsis thaliana, Glycine max, Lotus japonicas, Medicago truncatula, Millettia pinnata, Vicia faba, Vigna angularis, Vigna radiate, and Zea mays (2) were retrieved from NCBI Organelle Genome Resources. The Illumina reads of the studied individual were aligned to the 10 mitogenomes by Bowtie 2 using the i-Plant Discovery Environment (Cyverse). The mitochondrial genome of Millettia pinnata (NC_016742.1) showed the highest alignment rate and was used as the reference to recover the marama mitochondrial genome. The coding sequences of Millettia pinnata mitochondrial genes including atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFc, ccmFn, cob, cox1, cox2, cox3, matR, mttB, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rpl2, rpl5, rpl16, rps1, rps2, rps3, rps4, rps7, rps10, rps11, rps12, rps14, rps19, sdh3, and sdh4 were obtained from NCBI GenBank and used to find corresponding homologs in the marama genome assembly using BLAST. The acquired homologous sequences were then extended and assembled manually based on both the Illumina and PacBio sequencing data. A 500 bp sequence at both ends of each contig was BLASTed against the PacBio database on Geneious 9, and all obtained results were BLASTed back to the genome assembly to show all the possible connections. The ID and number of PacBio reads going across each contig junction were recorded, manually counted and saved in a document. Two long sequences were eventually generated from these steps.
The Illumina reads of the sample from the University of Pretoria Farm were aligned to the two assembled sequences using Frontiers in Plant Science | www.frontiersin.org Bowtie 2. In total, 16 primary scaffolds were obtained and then they were used for the second round of assembly. The process was repeated until the complete mitochondrial genome of marama has been recovered. The results were verified through alignment using the Illumina reads of other marama individuals against the assembled sequences by Bowtie 2 and the alignments were visualized in IGV.

Gene Annotation
The protein coding and rRNA genes were annotated using MITOFY 2 and AGORA 3 . The two programs gave similar annotation results including gene sets, positions and matching rates, which were summarized and recorded in one spreadsheet. MITOFY BLASTN was used to annotate tRNA genes in the assembled genome. The sequence of the two circles comprising the marama mitochondrial genome and corresponding annotations were uploaded to GenBank under the accession numbers OK638188 and OK638189. The circular visualization of the gene annotation was generated using OGDRAW 4 .

RESULTS AND DISCUSSION
The Bowtie2 alignments resulted in 0.48% of the marama Illumina reads mapping to the mitochondrial genome of Millettia pinnata, 0.43% to Lotus japonicus, 0.31% to Glycine max and 0.30% to Vigna radiate. Therefore, the homologs of Millettia pinnata mitochondrial genes in the marama genome assembly were used as the starting point for the assembly. The sequencing data was initially assembled into 16 primary scaffolds, varying in size from 2,351 to 56,817 bp with a median of 27,406 bp ( Table 1). According to the PacBio long reads at the junctions, these scaffolds joined to form five possible circular molecules, the sizes of which are 169,330, 44,455, 39,474, 32,520, and 113,793 bp (Figure 2 and Table 2). The Illumina short
reads were remapped to the final assembly using Bowtie 2 and the proportion of mitochondrial reads in the whole genome sequence data varied from 2 to 3% between individuals. Four large repeats of 5,212, 2,351, 3,908, and 4,926 bp were found in these molecules (H, I, J, and O on Figure 2), accounting for 8.2% of the genome (Table 1 and Figure 2). The depths of coverage of these four regions are double compared to the single copy sequences (Supplementary Figures 1-4). Four long fragments with lengths of 9,798 bp (M on Figure 2), 859, 342, and 273 bp in marama mtDNA were identified as homologous to regions of the marama chloroplast genome with similarity ranging from 74 to 98%, covering 7.2% of the chloroplast genome and 2.8% of the mitochondrial genome. These fragments are all on one circular molecule (M5 on Figure 3) and contain some chloroplast genes including psaA, psaB and part of psbC (Supplementary Figure 5). When these chloroplast insertions occurred in the evolution and their role in the mitogenome remains unknown. Homologous recombination between long repeats can reorganize the plant mitochondrial genome structure causing the formation of different subgenomic molecules. Two large circular molecules have been confirmed existing in the marama mitochondrial genome, and they may be able to reversibly transform into the five basic molecules through such recombination (Figure 3). No connection was found between the two large molecules in the mitochondrial genome of the experimental marama individual. 12 long contigs were obtained from the direct assembly of PacBio reads using Canu v2.2 and the contigs were aligned to the structure units of LS1 and LS2 using BLAST to verify the connections as shown in Supplementary  Figures 6-17. However, whether they can further combine to form one large circular molecule like the mtDNA of other legumes including Millettia pinnata, Vigna radiate, and Glycine max (Alverson et al., 2011;Kazakoff et al., 2012;Chang et al., 2013) in other marama individuals still needs to be determined. In addition, from the PacBio data, the ratio of five basic rings to the two large molecules was found to be approximately 1:1 (Figure 2), but the sample size is very small and more PacBio data or qPCR amplifications at the junctions are required to verify the ratios.
The total length of the marama mitochondrial genome is 399,572 bp, which is slightly shorter than the mitochondrial genomes of some other Fabaceae, 402,558 bp for Glycine max, 401,262 bp for Vigna radiate and 425,718 bp for Millettia pinnata (Alverson et al., 2011;Kazakoff et al., 2012;Chang et al., 2013;Negruk, 2013), but larger than Arabidopsis thaliana (366,924 bp) (Unseld et al., 1997). The GC content is 44.71%, close to 44.8% of Arabidopsis thaliana (Unseld et al., 1997), 45.4% of Millettia pinnata (Kazakoff et al., 2012), and 45.1% of Vigna radiata (Alverson et al., 2011). 35 protein coding genes, 3 rRNA genes, and 15 tRNA genes were identified from the assembled sequences using MITOFY and AGORA organelle genome annotation platforms (Table 3) and the positions are shown in Figure 4. Four protein coding genes rpl2, rps2, rps11, and rps13 are completely missing from the mitochondrial genome of marama (Table 4). Two protein coding genes rps7 and rps19 are present as pseudogenes since each contains one internal stop codon. Genes atp1, atp8 and one open reading frame (ORF) of nad6 have two copies since they are on long repeats H, J, and I separately. The two copies of atp1 have ORFs of different lengths since both the stop codons are outside the repeated sequence (Supplementary Figure 18). Whether marama mtDNA recombination and structural variation have altered the expression of these genes remains to be determined. Genes including nad1, nad2, nad4, nad5, nad7, and ccmFc contain introns as shown in Figure 4. The tRNA gene trnfM-CAT has two copies in the marama mitochondrial genome, while it has four copies in the mitogenomes of soybean (Chang et al., 2013). Some genes have been lost during angiosperm evolution ( Table 4). For example, rpl10 is absent from the mitochondrial genomes of Millettia pinnata and Vigna radiata, however, it is present and functional in marama (Alverson et al., 2011;Kazakoff et al., 2012). Genes including cox2 and rps1 are missing from the mtDNA of Vigna radiata and Lotus japonicus, but remain intact in marama, Millettia pinnata, and Glycine max mtDNA (Kazakoff et al., 2012;Chang et al., 2013). sdh4 exists as a pseudogene in Millettia pinnata and soybean, but it was found to be functional in the mitogenome of marama (Kazakoff et al., 2012;Chang et al., 2013). A phylogenetic tree was drawn based on 29 chloroplast protein coding genes to study the relationship between marama and other related plants (Kim and Cullis, 2017). A similar phylogenetic study can be carried out in the future according to the mitochondrial genes mentioned above to further verify the taxonomic relationship between marama and other Fabaceae, which will provide an in-depth understanding of how marama has evolved.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/genbank/, OK638188 and OK638189; https://www.ncbi. nlm.nih.gov/Traces/study/?acc=PRJNA779273, PRJNA779273.

AUTHOR CONTRIBUTIONS
JL carried out the bioinformatics assembly and drafted the manuscript. CC provided extracted DNAs, involved in assembly of mitochondrial genome, and assisted in writing and editing the manuscript. Both authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the National Science Foundation under Award Numbers DBI-0735191, DBI-1265383, and DBI-1743442. URL: www.cyverse.org and CWRU biology department teaching resources.