Comparative genomics analysis of endangered wild Egyptian Moringa peregrina (Forssk.) Fiori plastome, with implications for the evolution of Brassicales order

Moringa is the sole genus of the Moringaceae family, which includes 13 species. Among them, Moringa peregrina is a plant species native to the Arabian Peninsula, Southern Sinai in Egypt, and the Horn of Africa, and comprehensive studies on its nutritional, industrial, and medicinal values have been performed. Herein, we sequenced and analyzed the first complete chloroplast genome of Moringa peregrina. We also compared the new chloroplast genome with 25 chloroplast genomes of related species representing eight families of the Brassicales order. The results indicate that the plastome of M. peregrina contains 131 genes, with an average GC content of 39.23%. The IR regions of the 26 species vary in length from 25,804 to 31,477 bp. Plastome structural variations generated 20 hotspot regions that could be considered prospective DNA barcode loci in the Brassicales order. Tandem repeats and SSR structures are reported as significant evidence of structural variation among the 26 tested specimens. Furthermore, selective pressure analysis was performed to estimate the substitution rate within the Moringaceae family, which revealed that the ndhA and accD genes are under positive selective pressure. The phylogenetic analysis of the Brassicales order resolved a monophyletic cluster of the Moringaceae and Capparaceae species, clearly distinguishing M. oleifera and M. peregrina, which are genetically closely related, without overlapping groups. Divergence time estimation suggests that the two Moringa species diversified recently, 0.467 Ma. Our findings present the first complete plastome of the Egyptian wild type of M. peregrina, which can be used to determine plastome phylogenetic relationships and the systematic evolutionary history of the Moringaceae family.


Introduction
The Moringaceae is a family of dicotyledonous flowering plants consisting of 13 species (Olson, 2002). The family is commonly known as the horseradish or drumstick family (Abd Rani et al., 2018). It is classified under the morphologically diverse order Brassicales together with the Brassicaceae, Salvadoraceae, Capparaceae, Cleomaceae, Pentadiplandraceae, Akaniaceae, and Caricaceae (Group et al., 2016;Leite and Castilho, 2017). Species of the Moringaceae are widely distributed, mostly in tropical and sub-tropical regions, occurring mainly in India, China, mainland Africa, and the Arabian Peninsula (Fahey, 2005;Amaglo et al., 2010;Moerman, 2013;Abd Rani et al., 2018). Moringa species are primarily shrubs or subshrubs with opposite leaves, tuberous roots, and zygomorphic flowers (Olson and Committee, 1993). Moringa oleifera is recorded as native to India, Pakistan, and Nepal (Padayachee and Baijnath, 2012;Liu et al., 2019), whereas M. peregrina (Habb El Yasar) is reported to be an Egyptian native tree with a wide biogeographic distribution along the Red Sea coast up to the Sinai Mountains in the Arabian desert (Boopathi and Abubakar, 2021). Recently, the genus Moringa has gained attention because of its rich pharmaceutical properties, which have been used against more than 300 health disorders (Ebert et al., 2019). Previous studies of M. oleifera have focused primarily on its phytochemistry (El Sohaimy et al., 2015;Jaja-Chimedza et al., 2017), and some molecular studies have characterized M. oleifera using genetic markers (Barbosa et al., 2019;Sodvadiya et al., 2020). The unique composition of the seed oil (ben oil) extracted from M. peregrina has also attracted scientific interest (Al-kahtani, 1995).
Egyptian M. oleifera and M. peregrina extracts were reported to contain anti-cancer and anti-microbial phytochemicals based on GC-MS analysis (Mansour et al., 2019). In addition, their essential oils were reported to be rich in anti-cancer components in a study in which zebrafish embryos were treated with seed extracts (Elsayed et al., 2020). Bark extract was also reported to enhance the ratio of B6 and cytochrome genes in experimental rats (Rizk et al., 2021). Recently, Menon et al. (2021) reported that M. peregrina leaf extract is rich in anti-inflammatory and antioxidant components, which can improve prostatic hyperplasia in rat glands.
Previous studies on Moringa species have relied on morphological characters, namely irregular flowers and winged seeds (Verdcourt, 1986), and on habitat (Olson and Carlquist, 2001). Earlier taxonomic studies placed the Moringaceae and Caricaceae in a subclade of the higher order Capparales (Karol et al., 1999). More recently, both families were assigned to the order Brassicales, which is regarded as highly diversified and advanced among angiosperms and comprises 17 families (Group, 2009). M. oleifera and M. peregrina are phylogenetically closely related (Verdcourt, 1986;Olson and Carlquist, 2001), and M. peregrina in particular has significant medicinal value.
In a molecular phylogenetic study based on genes from seven Moringa species, M. oleifera and M. peregrina showed a close relationship (Abdel-Hameed, 2015). Additionally, phylogenetic analyses using rbcL showed a close relationship between the sister families Moringaceae and Caricaceae (Karol et al., 1999). However, few studies have focused on systematic analyses in the Brassicales. Next-generation sequencing (NGS) technology has made various genomic resources available during the past two decades, including complete cpDNA sequences. Because it has a lower substitution rate than the nuclear genome, the plastome has been used in several plant systematic studies (Zhang et al., 2012;Asaf et al., 2017). In addition, plastome sequences have become an inexpensive and practical resource for non-model genotypes and for inferring phylogenetic relationships and gene evolution (Asaf et al., 2016;Khan et al., 2019). About 6,500 plastomes of pharmaceutically and commercially important species are currently available in the chloroplast genome database (CPGDB) (Singh et al., 2020). Among them, only a limited number of Brassicales plastome sequences are available in the National Center for Biotechnology Information (NCBI) databases. Moreover, the Moringaceae are a complex family with uncertainties in classical taxonomy owing to floral and morphological similarities (Olson and Carlquist, 2001). Despite the undisputed distinctiveness of the family, phylogenetic data for its species remain scarce. The only plastome reported in the NCBI database among the 13 species of the Moringaceae is that of M. oleifera (Lin et al., 2019;Yang et al., 2019). To date, no study has reported a complete plastome sequence for an Egyptian wild Moringa species. Therefore, further comparative research on the Moringaceae is still needed.
Here, we report the first complete plastome of the endangered wild type of Egyptian Moringa peregrina to improve our understanding of plastome characteristics, structural diversity, and evolution within the Brassicales order. The main objectives of this study were to 1) assemble and annotate the plastome of Moringa peregrina; 2) reveal structural and size variations and trace the evolutionary patterns of the IR boundaries in the plastomes of M. peregrina and 25 species belonging to eight families of the Brassicales order; 3) identify highly variable plastome hotspots for advanced phylogenetic and systematic revision of the Moringaceae; and 4) assess the phylogenetic framework and infer the divergence times and adaptive evolution of species in this section, which have yet to be determined. We selected 24 species because few Moringaceae species have been studied so far, as M. oleifera has received nearly all the research attention (Lin et al., 2019;Liu et al., 2019). Moreover, we aimed to increase the number of species analyzed relative to a previous study, in which only 10 species of the Brassicales order were examined (Khan et al., 2021). The present genomic information therefore provides vital genetic datasets for determining phylogenetic relationships and genome diversity in future evolutionary studies of M. peregrina within the complex families of the Brassicales order.

DNA extraction and sequencing
Fresh young leaves of the wild type of M. peregrina were collected from the natural habitat at Wadi Zaghra, Saint Catherine, Egypt (28°39′05″N 34°189.7″E), in collaboration with the seed collection team of the Desert Research Center, Cairo, Egypt. Leaves were dried using silica gel (Wilkie et al., 2013). About 0.5 g of dried leaves was used for DNA extraction with a modified CTAB protocol (Doyle, 1991). Sequencing was performed by Novogene (Beijing, China) on an Illumina paired-end sequencing platform.

Selective pressure analysis
The substitution rate within Moringa species was estimated using 74 protein-coding genes from the plastomes of the two representative Moringa species (M. peregrina and M. oleifera). The regions were extracted using PhyloSuite ver. 1.2.2. Stop codons were manually removed, and gaps within the sequences were removed using Gblocks as implemented in PhyloSuite. MAFFT ver. 7, included in PhyloSuite ver. 1.2.2, was used to align the combined files after stop codons and gaps had been removed. The aligned files were then manually converted to AXT format. The non-synonymous (Ka) and synonymous (Ks) substitution rates of the 74 protein-coding genes, as well as the Ka/Ks ratio of each region, were calculated using KaKs_Calculator ver. 2.0 with the maximum-likelihood-based model-averaging (MA) and model-selection (MS) methods (Li et al., 2009).
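The trimming and AXT conversion steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually run in PhyloSuite: it assumes the two CDSs are already aligned in frame, it uses a simple hypothetical rule (drop every codon column that contains a gap or a stop codon) in place of Gblocks' block-selection criteria, and the function names are our own. The record layout written here (a name line, the two sequences, then a blank line) is the paired-sequence AXT layout read by KaKs_Calculator.

```python
# Minimal sketch (hypothetical helpers): prepare an in-frame codon alignment
# of two CDSs for KaKs_Calculator by dropping gap/stop codon columns and
# writing one AXT record.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def clean_codon_alignment(seq1: str, seq2: str) -> tuple[str, str]:
    """Remove codon columns containing a gap or a stop codon in either sequence.

    Assumes seq1 and seq2 are aligned in frame and have equal length; this
    simple rule stands in for Gblocks and is an illustrative assumption.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    kept1, kept2 = [], []
    for c1, c2 in zip(split_codons(seq1.upper()), split_codons(seq2.upper())):
        if "-" in c1 or "-" in c2:
            continue  # codon column containing an alignment gap
        if c1 in STOP_CODONS or c2 in STOP_CODONS:
            continue  # stop codon
        kept1.append(c1)
        kept2.append(c2)
    return "".join(kept1), "".join(kept2)


def to_axt(name: str, seq1: str, seq2: str) -> str:
    """Format one AXT record: name line, first sequence, second sequence, blank line."""
    return f"{name}\n{seq1}\n{seq2}\n\n"


if __name__ == "__main__":
    # Toy sequences, not real Moringa data.
    a, b = clean_codon_alignment("ATGAAA---CCCTAA", "ATGAAG---CCTTAA")
    print(to_axt("geneX", a, b), end="")
```

Concatenating such records, one per gene, gives a single AXT file covering all 74 genes.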

Phylogenetic analysis
The present study used 34 plastomes belonging to the Brassicales order to determine plastome phylogenetic relationships. Following a previous study, two Gossypium species were used as outgroups (Khan et al., 2021). Fifty-four protein-coding sequences (CDSs) were aligned in batches with MAFFT ver. 7.313 (Supplementary Table S1) (Katoh and Standley, 2013). The MAFFT alignments were used to compare gene arrangement, to detect missing genes after annotation, and to estimate pairwise sequence variation. Phylogenies were inferred with maximum likelihood (ML) and Bayesian inference (BI), using IQ-TREE ver. 1.6.8 and MrBayes ver. 3.2.6, respectively, as integrated in PhyloSuite ver. 1.2.2. ModelFinder was used to select the best-fit model according to the Bayesian information criterion (BIC) (Kalyaanamoorthy et al., 2017). The best-fit model for the ML analysis was GTR + R3 + F, while that for the Bayesian analysis was JTT + F + R3. Branch support was assessed with 1,000 ultrafast bootstrap replicates (Minh et al., 2013) and the Shimodaira-Hasegawa-like approximate likelihood-ratio test (Guindon et al., 2010). Finally, FigTree ver. 1.4.2 (Drummond et al., 2012) was used to visualize the resulting topologies.
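The BIC-based model choice made by ModelFinder rests on the criterion BIC = k ln(n) - 2 ln L, where L is the maximized likelihood, k the number of free parameters, and n the number of alignment sites; the model with the lowest BIC is preferred. The sketch below applies that formula to hypothetical log-likelihoods and parameter counts. The numbers are invented for illustration and are not results from this study, and ModelFinder's own parameter counting (for example, whether branch lengths are included) is not reproduced.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: dict[str, tuple[float, int]], n_sites: int):
    """Return the model name with the lowest BIC, plus all scores.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical values for a 1,000-site alignment (illustration only).
    models = {
        "HKY+G": (-5000.0, 5),
        "GTR+G": (-4990.0, 9),
        "GTR+I+G": (-4970.0, 10),
    }
    name, scores = best_model(models, n_sites=1000)
    print(name, round(scores[name], 2))
```

Because the ln(n) penalty grows with alignment length, BIC tends to favor simpler models than AIC (penalty 2k) on long concatenated plastome alignments.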

Divergence time estimation
Divergence dates were estimated using Bayesian methods under a relaxed molecular clock to account for rate variation among lineages (Drummond et al., 2012). In BEAST ver. 1.8.0, we used an uncorrelated relaxed lognormal model of rate evolution to estimate phylogeny and divergence times simultaneously. To reflect the reasonable assumption that the first calibrated node was not older than the fossil's first recorded age, the fossil calibration was constrained as follows: a normal prior with mean 125 and standard deviation 1.0, and a 95% upper limit equal to the stratigraphic age plus 10% (Ho and Phillips, 2009). To constrain the age of the Brassicales, the second node was calibrated using a secondary calibration (lognormal prior; offset 114; mean 0.5; standard deviation 1.0) (Cardinal-McTeague et al., 2016). To estimate the divergence time of Moringa peregrina and Moringa oleifera, we used BEAUti ver. 1.8.0 to generate XML files from a partitioned 54-gene data set. The rate of molecular evolution and rate-variation parameters were estimated using an uncorrelated relaxed clock model. A Yule speciation process (Yule, 1925) was used as the tree prior, starting from a randomly generated tree. Brassicales nodes were given uniform height priors ranging from 0 to 125 Mya, implying that these nodes could not be older than the earliest recorded eudicot fossils (Brenner, 1996;Sun et al., 2011). Lognormal distributions were used for the node-age priors, with the mean and standard deviation set at the mean and median limits, and GTR + I + G was set as the nucleotide substitution model. BEAST ver. 1.8.0 on XSEDE via the CIPRES web server was used to estimate divergence times. The MCMC analysis was run for 10 million generations, sampling every 1,000 generations. Tracer ver. 1.7.1 was then used to assess convergence and effective sample sizes (ESS).
When convergence was reached and ESS values exceeded 200, runs were combined using LogCombiner ver. 1.8.0 after discarding the first 25% of the 10 million generations as burn-in, and summarized with TreeAnnotator ver. 1.8.0 to produce a maximum clade credibility tree with median node heights. The tree result file was visualized
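The calibration priors and MCMC bookkeeping described above can be checked with a few lines of Python. This sketch assumes that the lognormal "mean" (0.5) and "standard deviation" (1.0) are given in log space, with the offset (114) added afterwards, which is our understanding of BEAUti's default offset-lognormal parameterization; if the mean was entered in real space, the quantiles differ. It also computes how many samples remain after a 25% burn-in when 10 million generations are sampled every 1,000 generations. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def lognormal_quantile(p: float, log_mean: float, log_sd: float, offset: float = 0.0) -> float:
    """Quantile of an offset lognormal prior (mean and sd assumed to be in log space)."""
    return offset + math.exp(log_mean + log_sd * STANDARD_NORMAL.inv_cdf(p))


def normal_interval(mean: float, sd: float, level: float = 0.95) -> tuple[float, float]:
    """Central interval of a normal prior, e.g. the 95% range of the fossil calibration."""
    z = STANDARD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd


def retained_samples(n_generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Number of logged samples kept after discarding the burn-in fraction."""
    n_samples = n_generations // sample_every
    return n_samples - int(n_samples * burnin_fraction)


if __name__ == "__main__":
    # Secondary Brassicales calibration: offset 114, log-mean 0.5, log-sd 1.0.
    for p in (0.025, 0.5, 0.975):
        print(f"lognormal {p:.3f} quantile: {lognormal_quantile(p, 0.5, 1.0, 114.0):.2f} Ma")
    # Fossil calibration: normal prior with mean 125 and sd 1.0.
    low, high = normal_interval(125.0, 1.0)
    print(f"normal 95% interval: {low:.2f}-{high:.2f} Ma")
    # 10 million generations, sampled every 1,000, with a 25% burn-in.
    print("samples kept:", retained_samples(10_000_000, 1_000, 0.25))
```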

Comparative analyses of Moringa peregrina plastome structural features
The assembled M. peregrina plastome is a double-stranded circular DNA molecule of 160,600 bp. It has the typical quadripartite structure, comprising the LSC (88,577 bp) and SSC (18,883 bp) regions separated by two inverted repeats (26,250 bp) (Figure 1; Table 1). The plastome of M. peregrina contains 131 genes, comprising 87 protein-coding genes, four rRNA genes, and 30 tRNA genes, some of which are duplicated (Table 1; Figure 1). The protein-coding genes include 9 large and 12 small ribosomal subunit genes, four DNA-dependent RNA polymerase genes, and 10 genes encoding other proteins (Table 2). In total, 18 intron-containing genes were identified in the M. peregrina plastome, comprising seven tRNA genes and 11 protein-coding genes, of which ycf3 and clpP contain two introns each (Table 2). The largest intron belongs to the trnK-UUU gene (2,541 bp), while the smallest exon is that of the trnL-UAA gene (Table 2). All 26 compared plastomes share the same four regions: a small single-copy (SSC) region, a large single-copy (LSC) region, and two inverted repeat (IR) regions (Table 3). Plastome length ranges from 152,860 bp (Brassica napus) to 160,600 bp (M. oleifera and M. peregrina). All related plastomes show a typical quadripartite structure, consisting of a pair of IR regions (25,804-31,477 bp) separated by the LSC region (83,030-88,749 bp) and the SSC region (9,631-18,883 bp) (Table 3). Moreover, the total number of genes ranges from 127 to 132, and the number of protein-coding genes ranges from 73 to 87. The number of tRNA genes ranges from 35 to 38, and all species have eight rRNA genes except A. tetracantha, which has seven (Table 3).
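The quadripartite layout and GC content reported above can be illustrated with a small Python sketch. It assumes the conventional linear order LSC, IRb, SSC, IRa, with IRa being the reverse complement of IRb, and it uses a toy sequence rather than the M. peregrina assembly; the helper names are hypothetical.

```python
# Minimal sketch: split a linearized plastome into LSC, IRb, SSC, and IRa
# and compute GC content (toy data, hypothetical helper names).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C among all bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def split_quadripartite(seq: str, lsc_len: int, ir_len: int, ssc_len: int) -> dict[str, str]:
    """Split a linearized plastome (LSC-IRb-SSC-IRa) into its four regions."""
    expected = lsc_len + 2 * ir_len + ssc_len
    if len(seq) != expected:
        raise ValueError(f"length {len(seq)} differs from LSC + SSC + 2 x IR = {expected}")
    irb_end = lsc_len + ir_len
    ssc_end = irb_end + ssc_len
    return {
        "LSC": seq[:lsc_len],
        "IRb": seq[lsc_len:irb_end],
        "SSC": seq[irb_end:ssc_end],
        "IRa": seq[ssc_end:],
    }


if __name__ == "__main__":
    # Toy genome built so that IRa is the reverse complement of IRb.
    lsc, ir, ssc = "ATGCAT", "GGCAAT", "TTAC"
    genome = lsc + ir + ssc + reverse_complement(ir)
    regions = split_quadripartite(genome, len(lsc), len(ir), len(ssc))
    print({name: round(gc_content(part), 2) for name, part in regions.items()})
    print("IRs inverted:", regions["IRa"] == reverse_complement(regions["IRb"]))
    print("total GC %:", round(gc_content(genome), 2))
```

The length check also makes it easy to confirm that reported LSC, SSC, and IR lengths sum to the total plastome length.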

IR junction variations in Moringa peregrina plastome
In most angiosperm plastomes, the IR regions are the most conserved of the plastome regions, and there is a positive correlation between IR length and plastome length. In our study, the IR length of M. peregrina is similar to those of previously reported angiosperm plastomes. The current study examined the four junctions (JLB, JSB, JSA, and JLA) between the three plastome regions (LSC, SSC, and IRa/IRb) in M. peregrina. Comparison with the 25 related plastome sequences (Figure 2) shows that IR length varies among the 26 species, ranging from 25,804 bp (Tarenaya hassleriana) to 31,477 bp (Capparis spinosa var herbacea). In Salvadora persica, the rps19 gene lies 36 bp away from the JLB junction at the end of the LSC region, whereas in Pentadiplandra brazzeana rps19 is found in the IRa and IRb regions. In the M. peregrina plastome, the ycf1 gene spans the JSA junction, with 4,373 bp in the SSC region and 1,171 bp in the IRa region, while trnH is located 36 bp from the JLA junction, toward the LSC region (Figure 2). The psbA gene was located in the LSC region in all the studied species, whereas in Pentadiplandra brazzeana the rpl22 gene is in the IRb region. The position of the rps19 gene in all related plastomes is close to that in A. thaliana, except in Salvadora persica (where it lies in the LSC region) and Pentadiplandra brazzeana (where it lies in the IRa and IRb regions). Likewise, the location of the ycf1 gene in the various species is similar to that in A. thaliana, except in Aethionema grandiflorum, Aethionema arabicum, Bretschneidera sinensis, Crateva tapia, and Salvadora persica (where the gene is located at the JSB junction) and M. oleifera (where it is absent).

FIGURE 1
The map of the M. peregrina plastome. Genes shown inside the circle are transcribed clockwise, and genes outside the circle are transcribed counterclockwise. Each color corresponds to a functional gene group. The thick lines indicate the extent of IRa and IRb, which separate the LSC and SSC regions.
Frontiers in Genetics frontiersin.org 04

Repeats and codon bias in related species
In this section, we present the results of the codon usage analysis, summarized in Figure 2 and Supplementary Table S1. The results indicate that all 20 amino acids are encoded by the protein-coding genes of the M. peregrina plastome. All CDSs together comprise 26,665 codons; among them, codons encoding isoleucine were the most frequently used, accounting for 8.75% of the total, while codons encoding cysteine were the least used, accounting for 1.19% of the total. Additionally, RSCU (relative synonymous codon usage) values increase with the number of codons encoding a given amino acid, as shown in Figure 4. Notably, the most prominent codons were AUG (ATG) and UGG (TGG), which encode methionine and tryptophan, respectively.
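RSCU, as used above, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of its synonymous codons. Values above 1 indicate preferred codons, and single-codon amino acids such as methionine (AUG) and tryptophan (UGG) always have RSCU = 1. The Python sketch below uses the standard genetic code (plastid genes use the same amino acid assignments) and a toy CDS, not the study's data.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order ("*" marks stop codons); plastid genes
# use the same amino acid assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(cds_list: list[str]) -> dict[str, float]:
    """Relative synonymous codon usage over a set of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)

    values = {}
    for aa, family in synonyms.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(family)  # count under equal synonymous usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    toy_cds = ["ATG" + "TGG" + "TTA" * 3 + "CTG" + "TAA"]
    table = rscu(toy_cds)
    for codon in ("ATG", "TGG", "TTA", "CTG", "CTT"):
        print(codon, CODON_TABLE[codon], round(table[codon], 2))
```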
Further analysis showed that the repeat structures in the 26 plastomes comprise various long repeats, including palindromic, forward, reverse, and complement repeats (Figure 5A). M. oleifera and M. peregrina contain the same numbers of each repeat type, i.e., 26 forward, 22 palindromic, and two reverse repeats (Figure 5A). The number of forward repeats ranged from 11 (in S. persica and Capsella grandiflora) to 29 (in Brassica nigra and A. grandiflorum), while the number of palindromic repeats ranged from 13 (in R. aucheri) to 27 (in the Caricaceae species). The highest number of reverse repeats was observed in R. aucheri, and the most abundant complement repeats were found in C. s. var spinosa (Figure 5A). The present study also investigated the occurrence and types of SSRs in the plastome sequences of M. peregrina and 25 related species (Figures 5B, C). The SSRs in the 26 plastome sequences include mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats. The number of mononucleotide motifs ranged from 52 (Azima tetracantha) to 112 (Capparis spinosa var spinosa). Dinucleotide repeats ranged from 6 (in R. cretica) to 19 (in S. persica). The highest numbers of tri- and tetranucleotide repeats were 8 (in T. hassleriana) and 22 (in Carica papaya), respectively. Penta- and hexanucleotide repeats were most numerous in R. aucheri. A/T mononucleotide repeats were the most abundant across all the plastomes of the related species (Figure 5C). These results indicate that the SSRs in these plastome sequences consist mainly of poly-A and poly-T repeats, which contribute substantially to the AT richness of the plastomes of the 26 species.
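MISA-style SSR detection can be sketched with regular expressions: for each motif length from one to six nucleotides, find tandem runs that meet a minimum repeat number, and discard motifs that are themselves repeats of a shorter unit (for example, "ATAT"). The thresholds below (10, 5, 4, 3, 3, and 3 repeats for mono- to hexanucleotides) are a setting often used for plastomes but are an assumption here, since the study does not state its MISA parameters; compound SSRs and MISA's exact reporting rules are not reproduced, and the function names are hypothetical.

```python
import re

# Assumed minimum repeat numbers per motif length (not stated in the study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy = "G" + "A" * 12 + "GC" + "AT" * 6 + "GC" + "ATG" * 4 + "C"
    for start, motif, count in find_ssrs(toy):
        print(f"start={start} motif={motif} repeats={count}")
```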

Selective pressure analysis
The rates of non-synonymous (Ka) and synonymous (Ks) substitution were calculated for a total of 74 regions (protein-coding genes) extracted from the plastomes of M. peregrina and M. oleifera. Among all the extracted regions, ndhA had the highest Ks value (0.00426136), and accD had the highest Ka value (0.000000325799). The Ka/Ks ratio was also evaluated to determine the strength of the selective pressure acting on specific genes (Ka/Ks = 1, neutral selection; Ka/Ks < 1, purifying selection; and Ka/Ks > 1, positive selection) (IPNI, 2022). The Ka/Ks values of the shared protein-coding genes suggest evolutionary pressure to conserve the ancestral state, that is, purifying selection (Figure 6). Purifying selection is common in many protein-coding regions, and all the genes had a Ka/Ks ratio less than one (Figure 6). Finally, our results suggest that the protein-coding genes in the plastomes of different plant species were exposed to diverse selection pressures.
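The interpretation thresholds above can be written as a small helper. This sketch classifies a gene from its Ka and Ks values by the bare ratio and treats Ks = 0 as undefined, because without synonymous changes no ratio can be computed. In practice, departures from Ka/Ks = 1 should be judged with a statistical test rather than a point estimate alone. The function name and example values are hypothetical.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Label selection from Ka/Ks: <1 purifying, =1 neutral, >1 positive."""
    if ks == 0:
        return "undefined"  # no synonymous substitutions: ratio cannot be computed
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


if __name__ == "__main__":
    # Hypothetical (Ka, Ks) pairs for illustration only.
    examples = {
        "gene1": (0.001, 0.01),
        "gene2": (0.02, 0.02),
        "gene3": (0.03, 0.01),
        "gene4": (0.001, 0.0),
    }
    for gene, (ka, ks) in examples.items():
        print(gene, classify_selection(ka, ks))
```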

Phylogenetic analysis
The phylogeny of M. peregrina was reconstructed using 54 shared protein-coding genes from representatives of eight families of the order Brassicales to determine the phylogenetic position of M. peregrina (Figure 7). All eight Brassicales families (Akaniaceae, Caricaceae, Salvadoraceae, Pentadiplandraceae, Capparaceae, Cleomaceae, Brassicaceae, and Moringaceae) were grouped in a single clade, whereas the Malvaceae, which belong to the order Malvales, fell outside it. M. peregrina is sister to M. oleifera, with high bootstrap support and in the Bayesian analysis. Moreover, the results suggest that the Moringaceae are closely related to the Caricaceae and Akaniaceae.

Divergence time estimation
Evolutionary divergence times were estimated using 34 taxa of the Brassicales order to calculate the age of the wild type

FIGURE 3
Plastome genomic variation among the 26 related species, visualized with mVISTA. The gray axis above shows gene positions and order in the plastomes, together with the LSC, SSC, and IR regions.

FIGURE 4
Codon usage bias in all protein-coding genes of the M. peregrina plastome.

Moringa peregrina plastome structure and comparative variations
High-throughput sequencing has attracted increasing attention in recent years. These genomic tools are crucial for comparative genomics and genome-wide association studies (GWASs), and they offer opportunities to determine phylogenetic relationships among closely related species. Recent surveys show that M. peregrina has become threatened, and may even be on the verge of extinction, primarily because of global climate change and desertification caused by extreme drought (Abdel Raouf et al., 2012). Hence, urgent conservation and restoration actions must be taken to protect M. peregrina. To support these efforts, we constructed the first complete plastome of wild Egyptian M. peregrina to provide a fundamental genetic resource for future research on comparative population genomics. Our results reveal that the cpDNA of M. peregrina is highly conserved in terms of architecture and linear gene order. The length of the M. peregrina plastome is similar to that previously reported for M. oleifera (Lin et al., 2019). The total plastome length of the related species ranges from 153,415 bp to 160,600 bp, and the IR length ranges from 25,804 bp to 31,477 bp. The shortest plastome length was recorded in A. Our results indicate that sequence divergence was greater in the LSC and SSC regions, while fewer sequence differences were found in the two IR regions.

IR junction variations among related species
In most angiosperm plastomes, the most stable sequences lie mainly in the conserved IR regions. Contraction and expansion of the IR regions can lead to the creation of pseudogenes (Ni et al., 2016a). Our findings show that the ycf1 gene is located in the JSB region, whereas the rpl22 gene is in the IRb region at the LSC/IRb border, and rps19 was detected at the LSC/IRa border in all related plastomes. In addition, rps19 was located in a position close to that in A. thaliana, but not in Salvadora persica (where it is located in the LSC region) or Pentadiplandra brazzeana (where it is located in the IRa and IRb regions), possibly because of incomplete duplication and its inability to encode proteins. The contraction and expansion of the two IR regions therefore represent important evolutionary events responsible for size differences among cpDNA genomes (Maréchal and Brisson, 2010). This trend is supported by previous genome re-sequencing results, which revealed that the IR regions can be recognized as key to chloroplast genome evolution, even among closely related genera of the same family (Xiong et al., 2009). Our comparative analysis of protein-coding gene sequences reveals a wide range of gene numbers. The number of protein-coding genes ranged from 73 (C. s. var. herbacea) to 88 (M. peregrina). The number of tRNA genes ranged from 36 (M. oleifera and M. peregrina) to 38 (T. hassleriana). The number of rRNA genes is the same (eight) in all related species, the exception being A. thaliana, which has seven (Seol et al., 2017).
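Where a gene sits relative to the four junctions can be computed directly from the region lengths. The sketch below, with hypothetical helper names, lays out LSC, IRb, SSC, and IRa end to end in 0-based, half-open coordinates and reports how many base pairs of a gene fall in each region, the kind of bookkeeping behind statements such as ycf1 spanning JSA. Genes that wrap around the JLA junction at the sequence origin are not handled, and the coordinates in the example are toy values, not annotations from this study.

```python
def region_bounds(lsc: int, ir: int, ssc: int) -> dict[str, tuple[int, int]]:
    """Half-open [start, end) intervals for LSC, IRb, SSC, IRa laid out in that order."""
    bounds, start = {}, 0
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        bounds[name] = (start, start + length)
        start += length
    return bounds


def junctions(bounds: dict[str, tuple[int, int]]) -> dict[str, int]:
    """JLB (LSC/IRb), JSB (IRb/SSC), JSA (SSC/IRa); JLA (IRa/LSC) is the sequence end."""
    return {
        "JLB": bounds["IRb"][0],
        "JSB": bounds["SSC"][0],
        "JSA": bounds["IRa"][0],
        "JLA": bounds["IRa"][1],
    }


def overlap_by_region(gene_start: int, gene_end: int,
                      bounds: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Base pairs of the half-open gene interval [gene_start, gene_end) in each region."""
    overlaps = {}
    for name, (start, end) in bounds.items():
        n = min(gene_end, end) - max(gene_start, start)
        if n > 0:
            overlaps[name] = n
    return overlaps


if __name__ == "__main__":
    b = region_bounds(lsc=100, ir=30, ssc=20)
    print(junctions(b))
    # A toy gene crossing the SSC/IRa junction (JSA).
    print(overlap_by_region(140, 158, b))
```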

Plastome structural variations among species
The plastome sequences analyzed show strong conservation of gene order and composition, as shown by the mVISTA alignment of the complete plastomes. Non-coding regions of the chloroplast genome are known to carry diverse signatures. They are responsible for cpDNA size variation and offer higher levels of variation for barcoding and phylogenetic studies at the subspecies level (Amar, 2020). In our study, twelve non-coding regions showed higher variation than coding regions, in agreement with previous results (Perry and Wolfe, 2002). We detected less variation in the coding regions than in the non-coding regions. These findings could be used in future DNA barcoding studies, and this information on genetic structure could help provide parameters for inferring phylogenetic relationships and divergence among species of the Brassicales order (Edger et al., 2018).
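The idea of scanning aligned plastomes for divergent windows can be sketched as sliding-window percent identity against a reference. This is a generic illustration, not mVISTA's algorithm (mVISTA uses its own alignment and conservation settings); the window size, step, identity threshold, and helper names here are assumptions, and the input must already be a pairwise alignment of equal length.

```python
def windowed_identity(ref: str, query: str, window: int = 100,
                      step: int = 25) -> list[tuple[int, float]]:
    """Percent identity of query vs. ref in sliding windows over an alignment."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(ref) - window + 1, step):
        matches = compared = 0
        for a, b in zip(ref[start:start + window], query[start:start + window]):
            if a == "-" and b == "-":
                continue  # column empty in both sequences
            compared += 1
            matches += a == b
        identity = 100.0 * matches / compared if compared else float("nan")
        profile.append((start, identity))
    return profile


def divergent_windows(profile: list[tuple[int, float]], threshold: float = 90.0) -> list[int]:
    """Start positions of windows whose identity falls below the threshold."""
    return [start for start, identity in profile if identity < threshold]


if __name__ == "__main__":
    ref = "A" * 20
    query = "A" * 10 + "C" * 10  # second half fully divergent (toy data)
    prof = windowed_identity(ref, query, window=10, step=10)
    print(prof, divergent_windows(prof))
```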

Repeats and codon bias variations
Simple sequence repeats (SSRs) are valuable DNA markers because of features such as high reproducibility, co-dominant inheritance, abundance, multiallelic nature, and broad genome coverage (Brake et al., 2022). Repeat sequences are known to be widespread in both non-coding and coding regions of the cpDNA genome and to play a vital role in plastome recombination; they are therefore used in phylogenetic, population genetic, and evolutionary studies (Cavalier-Smith, 2002;Ni et al., 2016b). In the present study, we used REPuter and MISA to evaluate the repeat sequences in the whole plastome sequence of M. peregrina. Numerous polymorphic SSRs with repeat units of one to six nucleotides were identified in M. peregrina, with an average length between 30 and 90 bp, consistent with lengths previously recorded in other angiosperms (Li et al., 2017;Greiner et al., 2019). The frequency of genic SSRs is higher than previously reported in other Moringaceae (Kaila et al., 2017). These genic SSRs may be potential genetic markers suitable for phylogenetic and population genetic studies in the Brassicales order (Li et al., 2017). Taken together, these results show that the plastome of wild-type M. peregrina contains several highly divergent hotspot regions, which may be suitable for DNA barcoding, phylogenetic, and molecular evolution studies in the Brassicales order. In the same context, codon usage analysis is essential for recognizing selection pressure on genes and the evolution of plastome structure (Yang et al., 2014). In the current analyses, codons encoding isoleucine were the most frequently used in the M. peregrina plastome. The RSCU patterns of codon bias are consistent with research on other angiosperm chloroplast genomes (Asaf et al., 2017;Fan and Ma, 2022).
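REPuter's four repeat classes differ only in how the second copy relates to the first: forward (identical), reverse (reversed), complement (base-complemented), and palindromic (reverse complement). The brute-force sketch below finds exact k-mer pairs of each class. Unlike REPuter, it does not extend repeats to maximal length or allow mismatches, and its function names and fixed k are illustrative assumptions.

```python
from collections import defaultdict

COMP = str.maketrans("ACGT", "TGCA")

# How the second copy is derived from the first for each repeat class.
TRANSFORMS = {
    "forward": lambda s: s,
    "reverse": lambda s: s[::-1],
    "complement": lambda s: s.translate(COMP),
    "palindromic": lambda s: s.translate(COMP)[::-1],
}


def repeat_type(first: str, second: str):
    """Return the first matching class name, or None if the pair is unrelated."""
    for kind, transform in TRANSFORMS.items():
        if second == transform(first):
            return kind
    return None


def find_kmer_repeats(seq: str, k: int) -> set[tuple[int, int, str]]:
    """Exact repeat pairs (i, j, class) of length k with i < j (0-based)."""
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    hits = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for kind, transform in TRANSFORMS.items():
            for j in index.get(transform(kmer), []):
                if j > i:  # report each pair once and skip self-matches
                    hits.add((i, j, kind))
    return hits


if __name__ == "__main__":
    print(repeat_type("AAGC", "GCTT"))
    print(sorted(find_kmer_repeats("AAGCTTTAAGC", 4)))
```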
Statistical procedures for selective pressure analysis are now applicable to whole-plastome data, which aids the identification of alleles that may be vital to our understanding of plant evolution. Overall, the results of selective pressure analysis vary considerably across genes. An accelerated rate of amino acid substitution in a particular gene during a particular evolutionary period suggests that the gene plays an essential role in adaptive evolution in a given species.

Selective pressure analysis in both Moringa species
The synonymous (Ks) and non-synonymous (Ka) substitution rates, and their ratio (Ka/Ks, also known as dN/dS), are widely used to estimate nucleotide evolution rates and natural selection pressure (Yang, 2005). Previous research has found that the Ka/Ks ratio is typically less than one (Yang et al., 2019), because synonymous substitution rates are usually higher than non-synonymous substitution rates. In this study, the ratio was below one for all genes, indicating that purifying selection predominates (Wu et al., 2020;Zhao et al., 2020). Our results show that two genes, ndhA and accD, have the highest Ka/Ks variability owing to the highest ratio of transitions to transversions. These genes may play an essential role in shaping plastome evolution, as transition and transversion substitutions commonly affect amino acid changes in the chloroplast genome (Amar, 2020). The functions of these genes relate mainly to subunits of NADH dehydrogenase (ndhA) and acetyl-CoA carboxylase (accD), and to cytochrome synthesis. They may be under strong positive selection due to specific environmental conditions (Huang et al., 2022). Both genes could therefore be used as candidate barcodes for distinguishing species and for phylogenetic and systematic revision in future studies.
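Transitions (purine to purine, A/G, or pyrimidine to pyrimidine, C/T) and transversions (purine to pyrimidine or vice versa) can be tallied directly from a pairwise alignment. The sketch below counts both and skips gaps and ambiguous bases; it is a simple illustration with a hypothetical helper name and toy sequences, not the procedure used to produce the reported Ka/Ks values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def transitions_transversions(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in VALID or b not in VALID:
            continue  # identical site, gap, or ambiguous base
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


if __name__ == "__main__":
    ts, tv = transitions_transversions("AAGCT-", "GAGTAA")
    ratio = ts / tv if tv else float("inf")
    print(f"Ts={ts} Tv={tv} Ts/Tv={ratio:.2f}")
```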

Phylogenetic analysis
Earlier studies of Brassicales families have provided an adequate understanding of the phylogenetic relationships among families based on molecular, morphological, and taxonomic analyses; however, the placement of the Moringaceae has not been fully determined (Karol et al., 1999;Olson and Carlquist, 2001;Lin et al., 2019). Compared with DNA fragment markers, plastomes have been shown to provide sufficient phylogenetic signal, which is important for resolving deep relationships among plant lineages (Gitzendanner et al., 2018). Recent phylogenetic analyses used a limited number of species and gene regions to represent the Moringaceae (Lin et al., 2019;Yang et al., 2019). In general, the topologies of the five phylogenetic trees were similar across the tested accessions, with slight differences in resolution. Our plastome phylogeny resolved the eight families representing the Brassicales order with high support. In our chloroplast phylogenomic tree, the Cleomaceae and the Capparaceae form a distinct monophyletic clade. El Zayat et al. (2020) clarified the complex relationships between them, supporting the view that the Capparaceae are very closely related to the Cleomaceae as sister clades, yet remain distinct. These results might be applied in systematic and evolutionary studies of Capparis and Cleome. We observed that the Caricaceae and the Moringaceae cluster with the Akaniaceae, suggesting that they may share a common ancestor, in agreement with previous studies (Olson, 2002;Lin et al., 2019;Khan et al., 2021). Our results establish M. peregrina and M. oleifera as a monophyletic clade in the Brassicales order, suggesting that introgression of the wild-type chloroplast genome into cultivated Moringa might have occurred. Compared with previous analyses based on a small number of plastome fragments, our findings clarify the complex relationships among closely related species of the Brassicales order and provide a well-resolved phylogenetic framework (Lin et al., 2019;Khan et al., 2021).

FIGURE 7
Phylogenetic tree of 34 complete plastomes of Brassicales species and two outgroups, based on 54 protein-coding genes shared by all plastomes and inferred by maximum likelihood, with bootstrap values estimated for branches.

Divergence time estimation
In this study, we selected two reliable fossil calibrations for the early history of the Brassicales, both well documented in previous studies of angiosperm divergence times. Our estimate of the Brassicales crown age is similar to those reported in previous studies, which range from 114 to 121 Ma (Beilstein et al., 2010; Edger et al., 2015; Magallón et al., 2015; Cardinal-McTeague et al., 2016). Previous lines of evidence place the Caricaceae and the Akaniaceae together with the Moringaceae within the Brassicales. This placement has gained wide acceptance and is consistent with the evidence presented by Beilstein et al. (2010); it has also fueled speculation that genome-doubling events are linked to diversification in the Brassicaceae. The divergence of the Moringaceae was estimated to fall between the Paleocene and Oligocene epochs, an interval associated with the diversification of the Brassicales among angiosperms (Kagale et al., 2014; Rockinger et al., 2016). Our results suggest that both genome structure and developmental processes have evolved more slowly than previously appreciated. The evolutionary history of M. peregrina, M. oleifera, and their close relatives described here could allow our understanding of these species to be applied more precisely to other flowering plants in the Brassicales. Finally, our estimated divergence time of M. peregrina and M. oleifera (0.467 Ma) is older than previously estimated (Cardinal-McTeague et al., 2016), possibly because of our use of multiple analyses or denser taxon sampling within the Brassicales. Sampling additional species from the Moringaceae and from other Brassicales families will be essential to refine age estimates, evolutionary history, and the phylogenetic framework. Based on the above considerations, our findings could be useful for future analyses of whole-plastome sequences and of the protein-coding genes shared by all the species considered here. The present study provides the first phylogenomic framework that includes the plastome of the Egyptian wild-type M. peregrina.

FIGURE 8
Phylogenetic chronogram showing divergence times within the order Brassicales for 34 taxa. The tree was estimated by Bayesian analysis of 54 protein-coding genes in MCMCTree. Numbers in red circles mark our two nodes of interest.
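The divergence times above come from a Bayesian MCMC analysis with two fossil calibrations, which accounts for rate variation and calibration uncertainty. The basic idea that calibrations turn genetic distance into time can nevertheless be illustrated with a much simpler strict molecular clock: a pairwise distance d accumulates along two diverging lineages, so d = 2rT, where r is the substitution rate and T the time since the split. The Python sketch below is a deliberately simplified illustration of that relationship, not the method used in this study; the function names and all numeric inputs are hypothetical.

```python
def clock_rate(distance_at_calibration: float, calibration_age_ma: float) -> float:
    """Strict-clock rate (substitutions/site/Myr) implied by a calibrated split.

    A pairwise distance accumulates along both lineages, so d = 2 * r * T.
    """
    if calibration_age_ma <= 0:
        raise ValueError("calibration age must be positive")
    return distance_at_calibration / (2.0 * calibration_age_ma)


def divergence_time(distance: float, rate: float) -> float:
    """Time (Myr) since two lineages split under a strict clock: T = d / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical inputs for illustration only (not values from this study):
# a calibrated split 120 Myr old with pairwise distance 0.24 subs/site,
# and a shallow pair of taxa separated by 0.002 subs/site.
rate = clock_rate(distance_at_calibration=0.24, calibration_age_ma=120.0)
t_split = divergence_time(distance=0.002, rate=rate)
```

With these made-up inputs the calibrated rate is about 0.001 substitutions per site per Myr and the implied split is about 1 Myr old. Because the estimate scales directly with the measured distances, changes in taxon sampling or analysis that alter branch lengths also shift the inferred ages.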

Conclusion
The complete plastome of the Egyptian wild type of M. peregrina is presented here. Comparison of the M. peregrina plastome with those of 25 related species belonging to eight families of the Brassicales revealed variation in plastome sequence and composition. These comparative analyses generated new data on plastome diversity across the Brassicales and provide a basis for understanding plastome structural polymorphisms. Genomic features such as SSRs and codon usage bias may prove useful for DNA barcoding and evolutionary studies in the Brassicales, although their utility has yet to be determined. The phylogenetic tree derived from 54 protein-coding genes (CDSs) shared by 34 species shows that M. peregrina and M. oleifera are sister species, which could provide a helpful phylogenetic framework for future studies. The time-calibrated tree suggests that the two Moringa species diverged approximately 0.467 Ma. The comparative genomic data, phylogeny, and divergence time estimates reported here provide novel insights into the plastome evolution of M. peregrina within the Brassicales.

Data availability statement
The datasets presented in this study can be found in online repositories. The complete plastome sequence of M. peregrina is available from NCBI (https://www.ncbi.nlm.nih.gov/) under accession number ON855355.
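For readers who want to retrieve the ON855355 record programmatically, NCBI's E-utilities EFetch endpoint returns nucleotide records as GenBank flat-file or FASTA text. The Python sketch below uses only the standard library; the helper names `efetch_url` and `fetch_record` are illustrative and not part of any NCBI library. The download step needs network access, and NCBI asks frequent users to identify themselves (for example with `tool` and `email` parameters or an API key), which this minimal sketch omits.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an EFetch URL for a nucleotide record ("gb" or "fasta" as text)."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "gb") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, rettype), timeout=30) as resp:
        return resp.read().decode("utf-8")


url = efetch_url("ON855355")
# record = fetch_record("ON855355")           # GenBank flat file
# fasta = fetch_record("ON855355", "fasta")   # FASTA sequence only
```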