- 1College of Horticulture, Shenyang Agricultural University, Shenyang, China
- 2National Field GeneBank for Hawthorn, Shenyang, China
Crataegus spp. plants are valuable horticultural crops because of their extensive use in Chinese herbal medications, cosmetics, food production, and other industries. However, the wide variety of species, similar morphological characteristics, inherent hybridization, apomixis, and polyploidy have led to confusion in terms of their taxonomic status. Herein, a total of 18 complete chloroplast genomes including 17 Crataegus species and 1 Mespilus species were newly sequenced and comprehensively analyzed for comparative genomics and phylogenetic relationships. The 18 chloroplast genomes possessed typical quadripartite structures with lengths from 159,638 to 159,973 bp in size. These chloroplast genomes encode 119–131 genes, including 37 transfer RNA (rRNA) genes, 8 ribosomal RNA (tRNA) genes, and 74–85 protein-coding genes (PCGs). In addition, 23–54 long repeat sequences and 74–87 simple sequence repeats (SSRs) were detected. The examination of Ka/Ks ratios for 18 chloroplast genomes revealed that the rpoC2 gene was significantly positively selected. Additionally, we identified nine distinct hotspot regions (infA, ndhC, pasl, rps19, ndhC~trnV-UAC, psbZ~trnG-UCC, rpl33~rps18, trnH-GUG~psbA, and trnR-UCU~atpA), and verified that ndhC~trnV-UAC might be used as a foundation for subsequent molecular marker studies aimed at identifying Crataegus species. Maximum likelihood and Bayesian phylogenetic trees using chloroplast genome sequences consistently revealed genetic relationships among Crataegus and Mespilus species, and confirmed the taxonomic status of Crataegus accessions (GSSZ, JRY, RR2H, RR3H, ZWSZ). The results of divergence time showed that the crown age of C. subg. Crataegus was about 33.487 Ma, and then started to diverge into the C. subg. Americanae and C. subg. Sanguineae around 27.059 Ma. Based on the results of molecular evidence, we speculate that genus Crataegus originated earliest from European-derived species within C. subg. Crataegus. Biogeographic and molecular dating analyses suggested that China represented a putative maternal origin of Crataegus species. The complete chloroplast genomes of Crataegus not only enable the resolution of phylogenetic relationships within the genus but also offer novel insights into chloroplast genome structure variation and evolution. Additionally, the identified divergent DNA regions hold significant utility for species identification and phylogenetic reconstruction in Crataegus.
Introduction
Crataegus spp. plants are valuable horticultural crops due to their extensive use in Chinese herbal medications, cosmetics, food production, and other industries (Liu et al., 2020; Liang et al., 2022; Zhang et al., 2025). Crataegus spp. plants have been used for centuries as traditional medicines and herbal drugs (Rocchetti et al., 2020). More than 150 substances, such as polysaccharides, phenolics, and flavonoids, have been extracted from its leaves and fruits and have been used to treat hypertension and cardiovascular and cerebrovascular diseases (Tassell et al., 2010; Cloud et al., 2020; Feng et al., 2022). The leaves and flowers of Crataegus sp. can also be used to make nanocapsules, which are widely used in the pharmaceutical, cosmetic, and fragrance industries (Esmaeili et al., 2013). Furthermore, the xylan in Crataegus kernels can be converted into products such as xylose and xylooligosaccharides (Liu et al., 2020).
In the northern temperate zones of North America, Europe, and Asia, Crataegus species, which are members of the Rosaceae family, are extensively distributed (Dong and Li, 2015). Because of its natural hybridization, apomixis, polyploidy, and similar morphological traits, Crataegus is a challenging species to identify. The genus Crataegus contains more than 200 species (Phipps et al., 1990; Campbell et al., 2007; Benli et al., 2008). China is the primary origin of both cultivated and wild Crataegus species, with eighteen species and six varieties (Zhao and Feng, 1996). However, some researchers claim that there are twenty species and seven varieties of Chinese Crataegus (Dong and Li, 2015).
Morphological traits have been identified as significant indices in the identification of Crataegus species (Dickinson et al., 1996). Nonetheless, the conventional classification of Crataegus according to morphological characteristics has been contested and is influenced by environmental factors (Gosler et al., 1994). Most Crataegus plants native to China have corymbs and contain a single white flower. Thus, the classification of Crataegus plants is based mainly on the morphological characteristics of the leaves and fruits. For example, both C. chlorosarca and C. jozana have black and spherical fruits (Figure 1). The fruits of C. dahurica and C. sanguinea share similar morphological traits: both are subspherical and exhibit an orange-red or orange-yellow color. Their leaf morphologies are also comparable (Figure 1). In addition, the taxonomic status of the Crataegus accessions (ZWSZ, GSSZ, RR3H and RR5H) could not be determined based on the morphological traits of leaf and fruit; in the phylogenetic tree constructed using nuclear Simple sequence repeats (nSSR) markers and specific locus amplified fragment sequencing (SLAF-seq) data, these accessions showed a closer genetic relationship with C. maximowiczii and C. sanguinea (Du et al., 2019). Therefore, clarifying the phylogenetic and taxonomic relationships among Crataegus species and establishing a standardized identification system are of great significance for related research and applications.
Molecular markers are among the most accurate ways to determine the genetic links of entire plant populations (Güney et al., 2018). Several innovative DNA-based markers have been rapidly created for recognizing the Crataegus genome and investigating genetic variability within and between wild landraces of this species. These marker techniques include simple sequence repeats (SSRs) (Khiari et al., 2015), intersimple sequence repeats (ISSRs) (Sheng et al., 2017; Tunç et al., 2025), randomly amplified polymorphic DNA (RAPD) (Yilmaz et al., 2010), and start codon targeted polymorphism (SCoT) (Sagbas et al., 2023; Mohammed et al., 2025). Owing to inadequate information and coverage, these molecular markers may not be sufficient for high-resolution genetic studies of Crataegus plants.
Chloroplasts have lived on Earth for billions of years by offering carbohydrates, amino acids, lipids and O2 to humans through the process of photosynthesis (Daniell et al., 2021). Following the discovery of organellar DNA, the first complete chloroplast genome was published in 1986 (Shinozaki et al., 1986). Notably, the chloroplast genome features a low mutation rate (Green, 2011), as its synonymous nucleotide substitution rate (dS) in angiosperms is three-fold higher than that of plant mitochondrial DNA; in contrast, the structural variability of the chloroplast genome is substantially lower than that of both mitochondrial and nuclear genomes (Wang et al., 2024). Structural variations (e.g., insertions, inversions, deletions) in chloroplast genomes modulate genome size and gene content through processes including gene duplication and pseudogenization (Daniell et al., 2016; Qin et al., 2025). These features have made the chloroplast genome become the primary resource for studies into plant phylogeny and evolution (Cauz-Santos, 2025). As next-generation sequencing technology has advances, an increasing number of researchers are utilizing chloroplast genome data to elucidate the evolutionary relationships among various plant species (Wu et al., 2024; Jiang et al., 2025; Li et al., 2025a, 2025, 2025; Xia et al., 2025; Yan et al., 2025).
Owing to frequent hybridization, parthenogenesis and other factors, the classification of subfamilies, tribes and genera of Rosaceae plants remains controversial. The phylogenetic tree of 79 genera and 132 species of Rosaceae, including two Crataegus species (C. pinnatifida var. major, C. chungtienensis) was reconstructed via the chloroplast phylogenomic method, and the phylogenetic relationships among its subfamilies, tribes and genera were successfully analyzed (Zhang et al., 2017). Recent studies proposed a comprehensive subgeneric classification of Crataegus, including five subgenera: C. subg. Crataegus, C. subg. Americanae El Gazzar, C. subg. Brevispinae (Beadle) Ufimov & T. A. Dickinson, C. subg. Mespilus (L.) Ufimov & T. A. Dickinson, and C. subg. Sanguineae Ufimov (Ufimov and Dickinson, 2020). Chinese Crataegus species are classified into two subgenera: C. subg. Crataegus and C. subg. Sanguineae (Gu et al., 2003). Chloroplast genome sequencing has been completed for several Crataegus plants native to China, including C. scabrifolia, C. chungtienensis, and C. oresbia (Wu et al., 2022), C. pinnatifida var. major (Wu et al., 2021), C. bretschneideri, C. maximowiczii, C. maximowiczii var. ninganensis (Zhang et al., 2022), C. rhipidophylla from Europe and C. mollis from North America (Liu et al., 2022). The comprehensive chloroplast genome data will enhance comparative genomic research and serve as a valuable tool to clarify the interspecific relationships of Crataegus species worldwide.
In this study, we selected 17 Crataegus species (encompassing 7 identified species and 5 Crataegus sp. taxa indigenous to China, 3 North American species, and 2 European species) as well as one Mespilus species, aiming to explore the structural variability and genetic diversity of chloroplast genomes from a comparative genomic perspective. To this end, we systematically integrated analyses of relative synonymous codon usage (RSCU), repetitive sequence identification, and selective pressure evaluation to comprehensively characterize the chloroplast genomic features of Crataegus species. Specifically, we screened and experimentally validated highly conserved intergenic regions as universal DNA barcodes for species discrimination. In addition, we accurately estimated the species divergence times, reconstructed the patterns of their ancestral geographic distribution, and thereby inferred the evolutionary pathways of Crataegus species. Collectively, our results will provide comprehensive genetic and evolutionary insights into these genera, and offering valuable tools for future taxonomic, ecological, and conservation research.
Materials and methods
Plant materials and chloroplast genome sequencing
Eighteen individuals of Crataegus and Mespilus species, including C. altaica, C. chlorosarca, C. crus-galli, C. dahurica, C. jozana, C. laevigata, C.× lavalleei, C. monogyna, C. phaenopyrum, C. sanguinea, C. shensiensis, C. songarica, Crataegus sp. (GSSZ, JRY, RR2H, RR3H, ZWSZ), and M. germanica were cultivated at the National Field Genebank for Hawthorn, Shenyang, Liaoning Province, China (41°84′N, 123°56′E) (Supplementary Table S1). Fresh and healthy leaves were collected and stored at –80 °C for chloroplast genome sequencing. The chloroplast DNA from these leaves was sequenced by Nanjing Genepioneer Biotechnologies (Nanjing, China) via an Illumina NovaSeq 6000 system in paired-end (2 × 150 bp) sequencing mode. The processing of raw sequencing data was based on previous research (Zhang et al., 2022). After the quality control process, high quality reads (clean data) were obtained and stored in the FASTQ format. 51 individuals of Crataegus were also cultivated at the National Field Genebank for Hawthorn (Supplementary Table S2). Fresh and healthy leaves were collected and stored at –80 °C for DNA extraction and candidate DNA barcode sequencing.
Assembly and annotation of the chloroplast genome
De novo assembly of the chloroplast genome was performed using GetOrganelle v1.7.7.1 with gradient k-mer sizes (55, 87, and 121) to balance assembly sensitivity and accuracy (Jin et al., 2020). To ensure the reliability of the assembled genome, two quality control (QC) steps were conducted: first, sequencing reads were mapped back to the assembled genome to calculate key metrics (e.g., genome coverage and insert size; Supplementary Table S1); second, the assembled genome was compared against the reference sequence C. maximowiczii (GenBank accession No. NC065485) for further validation.
The GeSeq web service (Tillich et al., 2017) was used to annotate the chloroplast genome. BLAST was performed on the 18 assembled chloroplast genome sequences via the National Center for Biotechnology Information (NCBI) website (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and the most similar annotations were selected as reference genomes. These reference genomes were then uploaded to GeSeq to annotate the 18 chloroplast genomes. The protein-coding genes (PCGs) of the chloroplast genome were annotated via Prodigal v2.6.3 (https://www.github.com/hyattpd/Prodigal), rRNA was predicted via HMMER v3.1b2 (http://www.hmmer.org/), and tRNA was predicted via aragorn v1.2.38 (http://www.ansikte.se/ARAGORN/).
A schematic diagram of the chloroplast genome with the annotation of large single-copy (LSC), small single-copy (SSC), and inverted repeats (IR) regions was obtained via CPGview (Liu et al., 2023). The chloroplast genome of C. altaica was compared to the other 17 whole chloroplast genomes of Crataegus using CGView Server (Grant and Stothard, 2008). GC distributions were measured based on GC skew using the equation: GC skew = (G-C)/(G + C). The exact boundaries of the IR/LSC and IR/SSC regions were confirmed by alignment with homologous sequences from other Crataegus species. The GC content of each section was calculated via EditSeq (Burland, 2000). The genes on the boundaries of the junction sites of the chloroplast genome were analyzed via IRscope (Amiryousefi et al., 2018).
Relative synonymous codon usage, repeat structure, and microsatellite analysis
PhyloSuite v1.2.2 (Wuhan, China) (Xiang et al., 2023) was used to screen the protein coding genes (PCGs) of 18 Crataegus and Mespilus chloroplast genomes and to calculate codon preference, which was obtained from the actual frequency of codon use to the theoretical frequency. The calculation method of RSCU was based on a previous study (Zhang et al., 2024). The visualization of codon usage bias was platformed by interactive tool https://pcg-lab.shinyapps.io/RSCU-Plot/.
Repeat structures, including forward, reverse, complement and palindromic repeats within the 49 chloroplast genomes (18 newly sequenced chloroplast genomes, 26 Crataegus chloroplast genome datasets retrieved from GenBank in the Supplementary Table S3, and 5 Amelanchier species (MN068257, MN068255, MK920297, MN068262, MK920292) were identified via REPuter (Kurtz and Schleiermacher, 1999). The REPuter parameters were set a minimal repeat size of ≥ 30 bp and a Hamming distance of 3 (90% or greater sequence identity) (Zhang et al., 2022). Tandem repeats were screened via the online program Tandem Repeats Finder v4.07 b (Benson, 1999), and the alignment parameters match, mismatch, and indels were set to 2, 7, and 7, respectively. The minimum alignment scores to report repeats and maximum period size were 70 bp and 500 bp, respectively. Otherwise, single sequence repeats (SSRs) within 49 chloroplast genomes were detected via MISA-web (Beier et al., 2017). When the SSR motif length was 1, 2, 3, 4, 5, and 6, the minimum numbers of repeats in the SSR search parameters were 10, 5, 4, 3, 3, and 3, respectively. The maximum sequence length between two SSRs for registration as a compound SSR was 100 bp.
Sequence divergence and selective pressure analyses
The 18 newly sequencing chloroplast whole-genome sequences were visualized via the mVISTA online software. PhyloSuite v1.2.2 was used to perform the genomes alignments. Concatenated datasets of chloroplast PCGs and intergenetic regions were constructed separately. The frequencies of nonsynonymous (Ka) and synonymous (Ks) substitutions and the Ka/Ks ratio for each PCG generated from 18 Crataegus and Mespilus chloroplast genomes were calculated via the software KaKs_Calculator v3.0 (Zhang et al., 2006).
DNA polymorphism analyses (sliding-window analyses) were performed via DnaSP v6 (Rozas et al., 2017) to determine the nucleotide diversity (Pi) of complete chloroplast genomes, PCGs, and intergenetic regions. The window length was set to 600 bp, with a step size of 200 bp for complete chloroplast genomes.
Candidate DNA barcode screening and sequencing
Based on the results of DNA polymorphism analyses, PhyloSuite v1.2.2 was employed to extract the three intergenetic regions with the highest nucleotide diversity from the chloroplast genomes of 44 Crataegus and Mespilus species. Among these sequences, the intergenic region (ndhC_trnV-UAC), which was present in all 44 chloroplast genomes, was selected for sequencing analysis. We amplified the ndhC_trnV-UAC sequences of 51 individuals of Crataegus. The primers were F-5’-AGACGTACTCCTATTAATG-3’, and R-5’-AAACCTAAAAATTCAAAT-3’. PCR amplification was performed in a reaction mixture with a final volume of 20 μL consisting of 1 μL of template DNA (50 ng), 10 μL of Takara ExTaq® (RR001A), and 2 μL of primers. The PCR conditions were as follows: initial denaturation at 94 °C for 3 min; followed by 35 cycles of 30 s at 94 °C, 30 s at 55 °C, 1 min at 72 °C; and a final extension of 5 min at 72 °C. PCR amplification was carried out in a thermal cycler (Applied Biosystems, USA). PCR products submitted to Sangon Biotech (Co., Ltd., Shanghai, China) for sequencing. All sequences were aligned using MAFFT v7 with the FFT-NS-2 module. The IQ-TREE module in PhyloSuite was used to build a maximum likelihood tree under the TVM+F+R2 model with 5,000 ultrafast bootstraps. The maximum likelihood (ML) trees were visualized via using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
Phylogenetic analyses, divergence time estimation and ancestral area reconstruction
The complete chloroplast genomes of total 49 Crataegus, Mespilus, and Amelanchier species were compared and aligned. The IQ-TREE module v2.2.0 (Minh et al., 2020) in PhyloSuite was used to build a maximum likelihood tree under the TVM+F+R2 model with 5,000 ultrafast bootstraps. The MrBayes module v3.2.7 (Ronquist et al., 2012) under the partition model (nst = 6, rates = invgamma, statefreqpr = Dirichlet (1,1,1,1)) was used to build a Bayesian inference tree. The Bayesian inference (BI) and maximum likelihood (ML) trees were visualized via FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
The divergence times of total 49 Crataegus, Mespilus, and Amelanchier species were estimated by BEAST v1.10.4 (Suchard et al., 2018). We assigned the fossils to stem Amelanchier with a minimum age of 33.9 Ma and a mean and standard deviation of 0.5, which was treated as the calibration constraint according to published article (Xiang et al., 2017). The GTR nucleotide substitution model and the prior tree Yule model were selected with an uncorrelated relaxed clock. Each MCMC run had a chain length of 1,000,000, with sampling every 1,000 steps. Tracer (http://beast.community/tracer) was used to read the ESS and trace value of logged statistics to access the results. The divergence time was subsequently accessed via the Tree-annotator program of BEAST2. The settings used were as follows: burn-in percentage = 50, posterior probability limit = 0.1, target tree type = maximum clade credibility tree, and node height = mean height.
Ancestral area reconstruction and assessment of geographic diversification patterns within Crataegus was conducted using BioGeoBEARS (Matzke, 2014) method implemented in RASP v3.2 (Yu et al., 2015). Firstly, the outgroup samples (Amelanchier) were deleted from chloroplast genome datasets BEAST MCMC tree utilizing the outgroup-removal tool in the RASP. The model comparison of BioGeoBEARS in RASP was applied to select the best models. A total of six models calculated from the BioGeoBEARS analysis, and BAYAREALIKE+J was the best model (Supplementary Table S4). Detailed descriptions of the model parameters can be found in the published article (Matzke, 2022; Vargas et al., 2023). The biogeographic data for species within Crataegus was compiled from Plants of the World Online (POWO, https://powo.science.kew.org/), published book and article (Dong and Li, 2015; Meng et al., 2025) Seven biogeographical areas were chosen based on the geographic range: A) South-western China; B) Central Plains and Qinling Mountains of China; C) North-eastern China; D) Mongolia-Siberian region; E) Central and Western Asia; F) Europe; G) North America.
Results
Chloroplast genome characteristics of Crataegus and Mespilus species
In this study, a total of 18 complete chloroplast genomes of Crataegus and Mespilus species were analyzed (Table 1). The 18 sequenced samples produced 10.11 to 15.55 Gb clean reads each after removal of adapters and low-quality reads (Table 1). The 18 complete chloroplast genomes in this study were deposited in the GenBank with accession numbers PX413282 to PX413299 (Supplementary Table S1). The complete chloroplast genomes of Crataegus ranged from 159,638 (Crataegus sp., RR2H and RR3H) to 159,973 bp (C. phaenopyrum) in length, with differences of 4~335 bp (Figure 2; Table 1). The length of the Mespilus germanica chloroplast genome was 159,811 bp, similar in length to that of Crataegus sp. (GSSZ). The Crataegus and Mespilus chloroplast genomes contained a typical quadripartite structure containing IRa and IRb regions (26,311~26,396 bp) separated by the LSC (87,665~88,081 bp) and SSC (19,139~19,295 bp) regions. The chloroplast genome characteristics of 12 Crataegus species, 5 Crataegus sp. plants, and Mespilus germanica were similar. The GC contents of these complete chloroplast genomes ranged from 36.59%~36.65%, 34.29%~34.40% in the LSC region, 30.32%~30.56% in the SSC region, and 42.63%~45.62% in the IR region, revealing high similarity among different Crataegus and Mespilus plants.
Figure 2. Chloroplast genome map of C. altaica (the outermost three rings) and CGView comparison of comparison of 17 complete chloroplast genomes in the Crataegus and Mespilus species (the inter rings with different colors). Genes shown on the outside of the outermost first ring are transcribed counter-clockwise and on the inside clockwise. Outermost second ring with darker gray corresponds to GC content, whereas outermost third ring with the lighter gray corresponds to AT content of C. altaica chloroplast genome by OGDRAW. The gray arrowheads indicate the direction of the genes. LSC, large single copy region; IR, inverted repeat; SSC, small single copy region. The innermost first black ring indicates the chloroplast genome size of C. altaica. The innermost second and third rings indicate GC content and GC skews deviations in chloroplast genome of C. altaica, respectively: GC skew + indicates G > C, and GC skew- indicates G < C. CGView comparison result of 17 complete chloroplast genomes displayed from innermost fourth color ring to outwards 21th ring in turn: C.chlorosarca, C.crus-galli, C. dahurica, C. jozana, C. laevigata, C. monogyna, C. phaenopyrum, C. sanguinea, C. shensiensis, C. songarica, Crataegus × lavalleei, Crataegus sp. (JRY, RR2H, ZWSZ, GSSZ, RR3H), and M. germanica. Sequence identity analysis, conducted via CGView, indicate the level of similarity between the chloroplast genome of each Crataegus and Mespilus species and that of C. altaica.
In general, the complete chloroplast genomes of Crataegus and Mespilus plants encoded 119~131 genes, including 74~85 PCGs (protein coding genes), 37 tRNA genes, and 8 rRNA genes (Table 1). Several PCGs (ndhB, rpl23, rpl2, rps7, ycf2, and rps12), tRNA genes (trnN-GUU, trnR-ACG, trnA-UGC, trnI-GAU, trnV-GAC, trnL-CAA, and trnI-CAU), and rRNA genes (rrn5, rrn4.5, rrn23, and rrn16) had one duplicated gene. The annotated chloroplast genomes of Crataegus and Mespilus, including their gene numbers, orders, and names are represented in a circular map (Figure 2). Among the 113 unique genes, PCGs (rps16, atpF, rpoC1, petB, petD, rpl16, rpl2, ndhB, and ndhA) and tRNA genes (trnK-UUU, trnG-GCC, trnL-UAA, trnV-UAC, trnI-GAU, and trnA-UGC) had one intron; PCGs (rps12, clpP and ycf3) contained two introns (Table 2). The locations and numbers of introns of genes in the Crataegus and Mespilus chloroplast genomes presented similar features. Through CGView-based sequence identity analysis, the similarity levels between the chloroplast genomes of individual Crataegus and Mespilus species and C. altaica were characterized (Figure 2). The results of this analysis showed that the sequence identity ranged from 98.882% to 99.510%, indicating that these 18 chloroplast genomes were relatively conserved and exhibit high sequence similarity.
Border region variations in the chloroplast genomes
IRscope was used to visualize the genes on the boundaries of the junction sites of the Crataegus and Mespilus chloroplast genomes. The adjacent genes and border regions of 18 Crataegus and Mespilus species were analyzed, and C. kansuensis (MF784433) was used as the reference (Figure 3). In general, the genomic structure was relatively conserved. However, 18 Crataegus and Mespilus chloroplast genomes presented variations at the LSC/IRb, IRb/SSC, and IRa/LSC borders. The LSC/IRb regions of nine Crataegus and Mespilus species contained the rpl2 gene, which is located after the rps19 gene. The IRb/SSC regions of the 6 Crataegus species contained the ycf1 gene.
Figure 3. Comparisons of the borders of large single-copy (LSC), small single-copy (SSC), and inverted repeats (IR) regions among 18 chloroplast genomes.
The length of the ndhf gene spans the IRb/SSC regions, and its length in the SSC region was differs among Crataegus and Mespilus species (2,255 bp to 2,276 bp). The trnH gene is closely located 18 bp from the junction in C. crus-galli and 38 bp from the junction in C. phaenopyrum. In other Crataegus species, the trnH gene is located near the junction at distances ranging from 18~81 bp. The yfc1 gene of 9 Crataegus and Mespilus species crossed the SSC/IRa junction, extending the same length in the SSC region (4,551 bp or 4,560 bp) and IRa region (1,074 bp). The variations in these boundary regions resulted in differences in the lengths of the Crataegus and Mespilus chloroplast genomes and their LSC, IRs, and SSC regions.
Codon use analysis of protein coding genes
Protein coding genes (PCGs) of the Crataegus and Mespilus chloroplast genomes were extracted and subjected to codon use analysis. Among the 18 chloroplast genomes, arginine (Arg) (5.9998%~6.0001%), serine (Ser) (5.9999%~6%), and leucine (Leu) (5.9999%~6.0001%) were the most frequently occurring amino acids. In contrast, tryptophan (Trp) (1%) and methionine (Met) (1%) were identified infrequently. In addition, the relative synonymous codon usage (RSCU) was investigated among the 18 chloroplast genomes. The results revealed that 30 types of codons had RSCU values of more than 1.0 in the PCGs of 18 chloroplast genomes, revealing that they were used more than synonymous codons (Supplementary Figure S1; Supplementary Table S5).
Repeat sequence and microsatellite assays
Three types of repeat sequences were analyzed in this study, including dispersed repeats, long tandem repeats, and repeat structure sequences. Dispersed repeat sequences within the 49 chloroplast genomes were identified via VMATCH. A total of 45~50 repeat sequences were present in 49 chloroplast genomes, including 20~35 direct matches and 15~27 palindromic matches in 49 chloroplast genomes (Figure 4A). Long tandem repeats were also identified using the following parameters: “2, 7, 7, 80, 10, 70, 500, -f, -d, -m”. In total, 23~54 long tandem repeat sequences were detected among 49 chloroplast genomes (Figure 4B). Among these, 0%~ 6.98% were shorter than 10 bp, 10.81%~ 39.13% were between 10 and 20 bp, and 56.52%~ 89.19% were longer than 20 bp. Repeat structures, including forward, reverse, complement and palindromic repeats, within these chloroplast genomes were also identified. In general, 49~70 repeat sequences were explored, including 20~36 forward repeat sequences, 11~30 palindromic repeat sequences, 0~30 reverse repeat sequences, and 0~7 complement repeat sequences (Figure 4C). Single sequence repeats (SSRs) within 49 chloroplast genomes were detected (Figure 4D; Supplementary Table S6). In general, 74~87 SSRs were predicted in these chloroplast genomes. Mono-nucleotides (P1) were the most abundant SSRs in each genome, ranging in quantity from 41 (C. marshallii, MK920293) to 56 (A. cusickii, MN068257, A. ovalis, MK920297).
Figure 4. Statistics of repeat elements in 49 chloroplast genomes. (A) Number of dispersed repeat sequences; (B) number of long tandem repeats; (C) number of repeat structures; (D) total number of short sequence repeats (SSRs) classified by repeat unit type and repeat unit count. P1, P2, P3, P4, P5, P6, C represent mono-nucleotide, di-nucleotide, tri-nucleotide, tetra-nucleotide, penta-nucleotide, hexa-nucleotide, and compound SSRs, respectively. OR915924, C. altaica; NC085587, C. aurantia; PQ346305, C. bretschneideri; ON032469, C. chungtienensis; OR915925, C. cuneata; MW201730, C. hupehensis; MF784433, C. kansuensis; OP964012, C. laevigata; MK920293, C. marshallii; PQ346302, C. maximowiczii; PQ346303, C. maximowiczii var. ninganensis; NC062346, C. mollis; NC066002, C. monogyna; ON032470, C. oresbia; OR448914, C. pinnatifida; MW653325, C. pinnatifida; PQ346304, C. pinnatifida; MW653326, C. pinnatifida (f) major; PQ346301, C. pinnatifida (f) major; NC062345, C. rhipidophylla; OR915926, C. sanguinea; ON032471, C. scabrifolia; OR915929, C. songarica; OR915931, C. wilsonii; OP964008, C. viridis; MK920294, Crataegus sp.; MN068257, A. cusickii, MN068255, A. alnifolia, MK920297, A. ovalis, MN068262, A. sanguinea; MK920292, A. spicata.
Sequence divergence and estimation rate analysis
The chloroplast genomes of 18 Crataegus and Mespilus species were compared and analyzed with the chloroplast genome of C. kansuensis used as a reference (Figure 5). The results revealed that the chloroplast genomes of the 18 Crataegus and Mespilus species presented minimal interspecies variation. The exon and UTR regions (shown in blue) presented the highest level of conservation, particularly in ycf2, rrn23, and rrn16. In contrast, intergenetic regions presented the greatest variability, with rapid changes in regions such as trnR-UCU~atpA, trnT-UCU~trnL-UAA, ndhC~trnV-UAC, and rpl32~trnL-UAG.
Figure 5. Visualization of the comparison of 18 chloroplast genomes. The horizontal axis represents the coordinates within the chloroplast genome, and the vertical axis indicates the percentage identity, ranging from 50% to 100%. The colors represent different regions: blue for exons, green for introns, and red for intergenetic regions.
DNA polymorphism analyses were conducted to determine the nucleotide diversity (Pi) of the complete chloroplast genome, PCGs, and intergenetic regions (Figure 6). Intergenetic regions presented greater nucleotide polymorphisms than the PCG regions did, which was consistent with the whole-genome alignment results among 18 Crataegus and Mespilus species. The most highly variable regions included four PCGs (infA, ndhC, pasl, and rps19) along with five intergenetic regions (ndhC~trnV-UAC, psbZ~trnG-UCC, rpl33~rps18, trnH-GUG~psbA, and trnR-UCU~atpA), which may be potential molecular markers for the identification of Crataegus species.
Figure 6. Comparative analysis of nucleotide diversity (Pi). The horizontal coordinates indicate the midpoint position/gene name/intergenetic region, and the vertical coordinates represent the Pi value.
The Ka, Ks, and Ka/Ks ratios of PCGs in 18 Crataegus and Mespilus species were analyzed, revealing the evolutionary rates of these species relative to those of C. kansuensis (Supplementary Figure S2; Supplementary Table S7). Overall, most of the Ka/Ks ratios of the 42 PCGs were less than 1, indicating that these PCGs were under purifying selection. The Ka/Ks ratio of the rpoC2 gene exceeded more than 1 in C. monogyna (2.11), C. laevigata (1.83), C. songarica (1.83), and M. germanica (1.83). The high Ka/Ks ratios for rpoC2 in these species indicate that they may be phylogenetically distant from other Crataegus species. Similarly, ndhB presented a high Ka/Ks ratio in C. crus-galli, C. jozana, C.× lavalleei, and C. phaenopyrum, which notably clustered together.
Validation of candidate DNA barcode
Based on the results of DNA polymorphism analyses (Figure 6), PhyloSuite was employed to extract the three intergenic regions (ndhC~trnV-UAC, psbZ~trnG-UCC, and trnR-UCU~atpA) with the highest nucleotide diversity from the chloroplast genomes of 44 Crataegus and Mespilus species. Only the ndhC~trnV-UAC sequence was present in all 44 chloroplast genomes. Subsequently, we constructed a phylogenetic tree using the ndhC~trnV-UAC sequences (Supplementary Figure S3B), and compared it with the phylogenetic tree constructed based on the complete chloroplast genomes (Supplementary Figure S3A). The results showed that the clustering results of the ndhC~trnV-UAC sequences for the vast majority (87.75%) of Crataegus and Mespilus species were consistent with those of the complete chloroplast genomes, and both could divide these accessions into four clades. The phylogenetic tree constructed based on the ndhC~trnV-UAC sequencing could accurately distinguish the two Crataegus subgroups (C. subg. Crataegus and C. subg. Sanguineae) native to China and also clustered the European-native Crataegus species into a separate clade (Supplementary Figure S4). These results indicated that ndhC~trnV-UAC could serve as a candidate DNA barcode for Crataegus species identification.
Phylogenetic and divergence time analysis
Taxonomic analysis of the chloroplast genomes of 18 Crataegus and Mespilus species, along with 31 published Crataegus and related Amelanchier species, was conducted to investigate the evolution of Crataegus species. Generally, high congruence was observed between the maximum likelihood (ML) and Bayesian inference (BI) trees, and 49 species and variants were divided into four main clades and one outgroup on the basis of their chloroplast genomes (Supplementary Figure S5). Crataegus and Amelanchier were separated into two groups. For the genus Crataegus, 44 species and variants were divided into four distinct clades. The Chinese Crataegus species originating from the northeast (C. sanguinea, C. dahurica, and C. maximowiczii), central (C. wilsonii, C. aurantia) and western regions (C. altaica, C. kansuensis) were within Clade I, which were belong to C. subg. Sanguineae. C. subg. Americanae plants were within Clade II. These two clusters originated from a common ancestor. C. pinnatifida and C. pinnatifida Bge. var. major formed Clade III with Crataegus species originating from the southwest (C. scabrifolia), and central regions (C. hupehensis, C. shensiensis, and C. cuneata) of China. C. songarica and European Crataegus plants were within Clade IV. All species in Clade III and Clade IV were belong to C. subg. Crataegus and C. subg. Mespilus (L.), which originated from a common ancestor. Crataegus spp. plants (GSSZ and ZWSZ) may represent the independent species of Crataegus similar to C. wilsonii and C. kansuensis. Crataegus spp. plants (JRY, RR2H, and RR3H) belong to C. pinnatifida.
The divergence time of Crataegus species was estimated (Figure 7). The divergence clades of these genera were consistent with the phylogenetic trees, and the Crataegus and the outgroup were expected to differentiate 44.42 Ma (Eocene). The differentiation between Clades I + II and III + IV was estimated to have occurred at approximately 33.49 Ma (Oligocene). The divergence time between some Crataegus species from Northeast China and their North American congeners was 27.06 Ma. C. phaenopyrum and other North American Crataegus species differentiated approximately 8.67 Ma in Clade II. For the other group of Chinese Crataegus species, their divergence time from European Crataegus species was estimated at 21.06 Ma. Crataegus rhipidophylla and other European Crataegus species differentiated approximately 12.59 Ma. The divergence between M. germanica and C. laevigata occurred approximately 0.88 Ma.
Figure 7. Divergence time estimation for Crataegus, Mespilus and Amelanchier on the basis of complete chloroplast genomes. The number at each node represents the median divergence time, and the node bars represent the 95% HPD (highest posterior density). The ruler on the lower left represents the geologic timescale. Eocene (33.90 ~ 55.80 Ma); Oligocene (23.03 ~ 33.90 Ma); Miocene (5.33 ~ 23.03 Ma); Pli (Pliocene, 1.81 ~ 5.33 Ma); Ple (Pleistocene, 0.01 ~ 1.81 Ma).
Ancestral area reconstruction analysis
The BioGeoBEARS analyses in RASP identified that BAYAREALIKE + j was the best-fit biogeographical model with highest AICc_wt value among the six models for chloroplast genomes of Crataegus and Mespilus (Supplementary Table S4). Therefore, we have presented the reconstruction result of BioGeoBEARS with BAYAREALIKE + j DEC model (Figure 8). The most recent common ancestor of the entire Crataegus clade at Node 1 had a core distribution spanning South-western China (A), the Central Plains and Qinling Mountains of China (B), and Europe (F), providing compelling evidence for a broad trans-Eurasian distribution of this genus during its early evolutionary radiation. Its evolutionary dynamics fall into three distinct biogeographic pathways.
Figure 8. Biogeographic reconstruction in RASP utilizing the BAYAREALIKE + j model, based on the complete chloroplast genomes phylogenies obtained from BEAST analysis. Pie charts depict the most likely distribution locations for the most recent common ancestor. Other ancestral ranges are represented in black and marked with an asterisk.
Specifically, the East Asian endemic pathway (Nodes 1, 2, 3, 6) reflects long-term in-situ diversification and intra-regional dispersal within the putative East Asian cradle. Node 2 retained the ancestral range (A+B), laying the groundwork for East Asian endemic clades; Node 3 saw the derived lineage persist in South-western to Central China and undergo adaptive radiation to yield endemic taxa represented by C. scabrifolia; Node 6 expanded northeastward from the A–B ancestral range, colonizing North-eastern China (C) and the Mongolian-Siberian region (D) and evolving low-temperature acclimation species (e.g., C. dahurica, C. maximowiczii).
In parallel, the transcontinental dispersal pathway (Nodes 1, 2, 4) delineates intercontinental expansion from East Asia to Europe and Central Asia. Node 2 generated a subclade with westward dispersal potential into Europe; Node 4 extended its range from region B through Europe (F) to Central Asia (E), driving the emergence of the widespread C. monogyna (Europe) and endemic C. songarica (Central and Western Asia). On the other hand, the transoceanic dispersal pathway (Nodes 5, 7) illuminates the trans-Pacific dispersal of Crataegus from East Asia to North America: Node 5 achieved long-distance dispersal from South-western China (A) to North America (G) via the Bering Land Bridge—a key biogeographic corridor—to establish the founder population of the North American clade, while Node 7 underwent independent diversification in North America to produce endemic taxa such as C. phaenopyrum.
Discussion
With minor variations, the whole chloroplast genome is mostly conserved in terms of its quadripartite structure, size, number and placement of genes, and GC content (Daniell et al., 2016). Typical quadripartite structures containing LSC, SSC, and IR sections were found in the newly sequenced chloroplast genomes of 18 Crataegus and Mespilus (Figure 2). These genome structures resembled those of previously identified Crataegus species (Wu et al., 2022; Zhang et al., 2022; Meng et al., 2025). The chloroplast genomes of vegetable species are 120~160 kb in length, whereas those of fruit species are 140~180 kb in length (Daniell et al., 2021). The chloroplast genomes in this study were similar in length and conserved; they ranged from 159,638 bp in Crataegus sp. (RR2H, RR3H) to 159,973 bp in C. phaenopyrum (Table 1). Most land plant plastid genomes contain 110~130 genes, over 80 of which encode proteins involved in photosynthesis and other processes (Daniell et al., 2021). Eighteen Crataegus and Mespilus chloroplast genomes encoded 119~132 genes. The LSC regions were 87,665~88,081 bp in length, and the SSC regions were 19,139~19,295 bp in length. The pair of inverted IRa/IRb regions was 26,311~26,396 bp in length (Table 1) With GC levels and compositions ranging from 36.59% to 36.65%, the Crataegus and Mespilus species were found to be relatively conserved.
Two large inverted repeats (IRs) represent a defining feature of chloroplast genomes (Lin et al., 2012). As a key utility of this structural element, variations in IRs length not only serve as a reliable marker for phylogenetic analysis but also correlate with changes in chloroplast genome size (Wolf et al., 2003). In fact, dynamic expansion and contraction of IR regions have been documented across diverse land plant lineages (Wang et al., 2008; Zhu et al., 2016), with a notable example being the significant IR depletion observed in 13 species belonging to the Papilionoideae subfamily (Qin et al., 2025). According to previous studies, the IRs length of Crataegus species exhibits minimal variation (Wu et al., 2022; Zhang et al., 2022). Consistent with these findings, our results demonstrated that the IR region length across the eighteen chloroplast genomes showed no significant differences, ranging from 26,331 to 26,396 bp. Two distinct types of genes were identified in the IR regions and their adjacent border areas: specifically, rpl2 was localized within either the IRb or IRa region, while ycf1 was situated precisely at the junction between the small single-copy (SSC) region and the IRa region (Figure 3). Therefore, the degree of similarity in the inverted repeat (IR) region reflects the conservation of the chloroplast genome in the genus Crataegus.
Repetitive DNA sequences (repeats) are patterns of nucleic acids that exist in many copies throughout the genome (Biscotti et al., 2015). Repetitive DNA sequences perform important functions in promoting evolution, causing diversity, and regulating gene expression (Liao et al., 2023). Sequence arrangement and evolution of the chloroplast genome are more influenced by larger and more complex repeat sequences (Huang et al., 2014). In 18 newly sequenced and 31 published chloroplast genomes of Crataegus, Mespilus and Amelanchier, we identified three different kinds of repetitive sequences (Figure 4). The quantity and location of scattered repetitions among the genomes varied significantly. The greatest number of forward and palindromic repeats were found in C. monogyna (36) and C. pinnatifida f. major (30), respectively. Chloroplast genomes vary greatly in terms of SSR copy number diversity, which could be used as molecular markers for species identification and population genetics (Li et al., 2020; Ping et al., 2021; Wang et al., 2021). In this study, we identified 74~87 SSRs within 49 chloroplast genomes, respectively (Supplementary Table S6). The mono-nucleotides (P1) were the most abundant SSRs in each genome, ranging in quantity from 41 (C. marshallii, MK920293) to 56 (A. cusickii, MN068257, A. ovalis, MK920297). These SSR markers hold considerable potential as tools for species identification and phylogenetic analysis of Crataegus.
The creation of mutational hotspots for interspecies discriminating and species-level phylogenetic analysis is facilitated by multi-genome alignments (Liu et al., 2019; Abdullah et al., 2020; Wang et al., 2021; Kłubowicz et al., 2024). To resolve phylogenetic issues in different plant species, a number of plastid DNA markers obtained from highly variable coding and noncoding regions may be employed. Prior successes have been reported with the coding gene ndhA in Hordeum (Yuan et al., 2023), the noncoding region near rbcL in the grass family (Morton and Clegg, 1993), and the intergenetic sequence rpoC2-rps2 in Artemisia (Shahzadi et al., 2020). In this study, we determined the nucleotide diversity (Pi) of the complete chloroplast genome, protein coding genes (PCGs) and intergenetic regions among 18 Crataegus and Mespilus species. Four PCGs (infA, ndhC, pasl, and rps19) along with five spacer sequences (ndhC~trnV-UAC, psbZ~trnG-UCC, rpl33~rps18, trnH-GUG~psbA, and trnR-UCU~atpA) showed high variation (Figure 6). The phylogenetic tree constructed based on the ndhC~trnV-UAC sequence displayed a branching topology that is largely consistent with that of the tree built using the complete chloroplast genome. The inconsistent distribution of six accessions in the two phylogenetic trees is precisely due to the selective use of variant nucleotide sequences (Supplementary Figure S3). Validated by our experimental results, ndhC~trnV-UAC could serve as an effective DNA barcode for the identification of Crataegus species, especially for distinguishing the plants of the two Crataegus subgroups native to China (Supplementary Figure S4).
The Ka/Ks ratio serves as a critical metric for deciphering evolutionary pressures acting on genes. In this study, we calculated the Ka/Ks values of 42 protein-coding genes (PCGs) across 18 Crataegus and Mespilus species (Supplementary Table S7). Overall, the majority of PCGs exhibited significantly low Ka/Ks ratios (< 1) indicating that these genes are primarily under purifying selection and thus relatively conserved in sequence. Notably, the rpoC2 gene displayed the highest Ka/Ks ratio (> 1), reflecting non-synonymous mutations in this gene were positively selected. As a member of the RNA polymerase subunit gene family, rpoC2, together with rpoB and rpoC1, encodes the β″, β, and β′ subunits of RNA polymerase, respectively—key components for maintaining the semiautonomous function of chloroplasts (Börner et al., 2015). The rpoC2 gene in C. monogyna, C. laevigata, C. songarica, and M. germanica exhibits high Ka/Ks ratios (Ka/Ks > 1) and clusters as a single clade in the phylogenetic tree (Supplementary Figures S2, S5). These reflected the adaptive evolutionary characteristics of these four species, which have undergone advantageous non-synonymous mutations driven by positive selection to cope with similar environmental pressures. A similar pattern was observed in a single clade of phylogenetic tree: C. crus-galli, C. jozana, C. × lavalleei, and C. phaenopyrum, which all exhibit a high Ka/Ks ratio in the ndhB gene.
It has been difficult to determine phylogenetic relationships in Rosaceae because of apomixis, frequent hybridization, and intricate historical diversification (Xue et al., 2019). The genomes of chloroplasts display typical traits of maternal inheritance. Our results revealed that Clades I and II which include the C. subg. Sanguineae and C. subg. Americanae species were closely related (Supplementary Figure S5). These results support Phipps’s hypothesis (Phipps, 1983) that Eurasian and North American Crataegus species exchange genes across the Bering Strait. The genus Mespilus, which belongs to the C. subg. Mespilus (L.), has two widely distributed species, M. germanica L. and M. canescens J.B. Phipps. The natural distribution range of M. germanica L. is concentrated in the Europe of the Caspian–Black Sea coast (Popović-Djordjević et al., 2023). Since Linnaeus, academic taxonomy has largely maintained Mespilus and Crataegus separately, but not necessarily for adequate reasons. Mespilus and Crataegus differ in morphology, which is why Phipps argued for the retention of a monotypic Mespilus (Phipps, 2016). However, according to the phylogenetic analyses results of intergenetic cpDNA regions and nuclear sequences, Mespilus and Crataegus are related genera that are members of the Rosaceae tribe Pyrea (Lo et al., 2007; Talent et al., 2008). Phylogenetic analysis (Supplementary Figure S5) further confirmed a close phylogenetic relationship among C. monogyna, C. laevigata, and M. germanica in the present study.
With the rapid development of high-throughput sequencing technology, chloroplast genomes have been widely applied as super-barcodes, which can provide effective information for resolving phylogenetic relationships and identifying plants (Liu et al., 2025; Shi et al., 2025). In our previous study, although four Crataegus individuals (GSSZ, ZWSZ, RR3H, RR5H) were classified as C. pinnatifida, their morphological characteristics of fruits and leaves exhibited certain differences from those of typical C. pinnatifida (Figure 1). Furthermore, the phylogenetic tree constructed based on specific locus amplified fragment sequencing (SLAF-seq) clustered GSSZ and ZWSZ into a separate clade; RR3H and RR5H into another distinct clade. These two clades showed a relatively distant genetic relationship with C. pinnatifida (Du et al., 2019). Our results indicated that RR2H and RR3H share the closest genetic relationship with C. pinnatifida (Supplementary Figure S5), confirming that these two Crataegus individuals are either C. pinnatifida or its varieties. GSSZ and ZWSZ belonging to C. subg. Sanguineae are most closely related to C. kansuensis and C. wilsonii (Supplementary Figure S5). These findings also explained the phenotypic similarity among GSSZ, ZWSZ, and C. kansuensis.
The geographic origin of Crataegus species remains controversial. Phipps proposed that the origin regions of Crataegus was East Asia or South America. The original population of C. scabrifolia spread westward to the European continent and on the other hand, evolved northward into most Asian species, including C. sanguineae, C. hupehensis, and C. pinnatifida, while crossing the Bering Strait to evolve into most existing North American hawthorn species (Phipps, 1983). Based on ITS sequences, cpDNA fragments, and LEAFY gene data, Lo et al. (2009) analyzed 37 Crataegus accessions sampled from East Asia, northwestern North America, northeastern North America, and Europe, and proposed that the genus Crataegus originated in northeastern North America and Europe, with C. germanica and C. brachyacantha identified as its ancestral species. However, Crataegus originating from southwestern China were not employed in Lo et al.’s research. Subsequent relevant study indicated that southwestern Chinese Crataegus share a gene pool with European lineages, while northeastern Chinese populations probably originated from North American species (Du et al., 2019). Recently, the plastome-based research indicate that East Asian Crataegus species migrated transoceanically via the Bering Land Bridge to form North American taxa, while the westward dispersal of C. songarica ancestors promoted the evolution of European lineages (Meng et al., 2025). Our biogeographic and molecular dating analyses reveal that the ancestral clade of Crataegus was widely distributed across eastern and western Eurasia (Figure 8). A comprehensive synthesis of prior studies and our novel data further confirms China as one of the genus’ primary centers of origin. The evolutionary trajectories of Crataegus elucidated herein, which are characterized by two major dispersal events (transcontinental expansion from East Asia to Europe and Central Asia, and trans-Pacific migration from East Asia to North America), are highly congruent with the findings of Phillips and Meng while providing empirical support for Du’s hypothesis. Notably, while our study differs from that of Meng et al. (2025) in the sampling size of Crataegus species and geographic partitioning methodology, the latter proposed that the ancestral lineage of C. songarica facilitated the formation of European Crataegus species. In contrast, our plastome-based analyses incorporating a larger number of European Crataegus individuals demonstrate that the formation of European Crataegus species predated that of C. songarica. Collectively, these results clarify the spatiotemporal dynamics underlying the global diversification of Crataegus, laying a robust foundation for future investigations into the genus’ adaptive evolution in response to historical climatic fluctuations.
Conclusion
In the present study, 18 complete chloroplast genomes from Crataegus and Mespilus species were assembled and compared. Numerous facets of the chloroplast genome, such as repeat sequence and microsatellite assays, gene annotation, and codon use studies have been investigated. Regions such as ycf2, rrn23, and rrn16 presented the highest degree of conservation among the exon and UTR regions. Nonetheless, ndhC~trnV-UAC demonstrated significant variability among the chloroplast genomes, and could be regarded as potential molecular markers for further phylogenetic assessments. Maximum likelihood and Bayesian phylogenetic trees revealed genetic relationships among Crataegus and Mespilus species, and confirmed the taxonomic status of Crataegus individuals (GSSZ, JRY, RR2H, RR3H, ZWSZ). The results of divergence time showed that the crown age of C. subg. Crataegus was about 33.487 Ma, and then started to diverge into the C. subg. Americanae and C. subg. Sanguineae around 27.059 Ma. Based on the results of molecular evidence, we speculate that China represents a putative maternal origin of Crataegus species. This study not only enriched the complete chloroplast genome resources of Crataegus, but also provided useful information for further studies of the evolution and phylogeny of Crataegus species.
Data availability statement
The original data presented in the article has been successfully deposited in the following database: NCBI, PX676094-PX676143.
Author contributions
XS: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft. MC: Data curation, Investigation, Methodology, Writing – original draft. BZ: Data curation, Investigation, Methodology, Writing – original draft. YW: Data curation, Investigation, Methodology, Writing – original draft. XZ: Data curation, Formal analysis, Funding acquisition, Resources, Supervision, Writing – original draft, Writing – review & editing. YL: Resources, Supervision, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This research was supported by the Conservation and Utilization of Crop Germplasm Resource–Hawthorn (No. 22250318); China Postdoctoral Science Foundation (No. 2024MD763998); Special Project for National Science and Technology Basic Resource Investigation (No. 2024FY100502-5).
Acknowledgments
We thank the Shanghai Chenshan Botanical Garden (Shanghai, China) for providing the Crataegus × lavalleei material.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2026.1767012/full#supplementary-material
Abbreviations
BI, Bayesian inference; BLAST, Basic local alignment search tool; IR region, Inverted repeats region; ISSRs, Intersimple sequence repeats; Ka, nonsynonymous; Ks, synonymous; LSC region, Large single-copy region; ML, Maximum likelihood; NCBI, National center for biotechnology information; PCGs, Protein-coding genes; RAPD, Randomly amplified polymorphic DNA; rRNA, transfer RNA; RSCU, Relative synonymous codon usage; SCoT, Start codon targeted polymorphism; SSC region, Small single-copy region; SSRs, Simple sequence repeats; tRNA, ribosomal RNA; SLAF-seq, specific locus amplified fragment sequencing.
References
Abdullah, Mehmood, F., Shahzadi, I., Waseem, S., Mirza, B., Ahmed, I., et al. (2020). Chloroplast genome of Hibiscus rosa-sinensis (Malvaceae): Comparative analyses and identification of mutational hotspots. Genomics 112, 581–591. doi: 10.1016/j.ygeno.2019.04.010
Amiryousefi, A., Hyvönen, J., and Poczai, P. (2018). IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics 34, 3030–3031. doi: 10.1093/bioinformatics/bty220
Beier, S., Thiel, T., Münch, T., Scholz, U., and Mascher, M. (2017). MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585. doi: 10.1093/bioinformatics/btx198
Benli, M., Yiğit, N., Geven, F., Güney, K., and Bingöl, U. (2008). Antimicrobial activity of endemic Crataegus tanacetifolia (Lam.) Pers and observation of the inhibition effect on bacterial cells. Cell Biochem. Funct. 26, 844–851. doi: 10.1002/cbf.1515
Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580. doi: 10.1093/nar/27.2.573
Biscotti, M. A., Olmo, E., and Heslop-Harrison, J. S. (2015). Repetitive DNA in eukaryotic genomes. Chromosome Res. 23, 415–420. doi: 10.1007/s10577-015-9499-z
Börner, T., Aleynikova, A. Y., Zubo, Y. O., and Kusnetsov, V. V. (2015). Chloroplast RNA polymerases: Role in chloroplast biogenesis. BBA-Bioenergetics 1847, 761–769. doi: 10.1016/j.bbabio.2015.02.004
Burland, T. G. (2000). DNASTAR’s Lasergene sequence analysis software. Methods Mol. Biol. 132, 71–91. doi: 10.1385/1-59259-192-2:71
Campbell, C. S., Evans, R. C., Morgan, D. R., Dickinson, T. A., and Arsenault, M. P. (2007). Phylogeny of subtribe Pyrinae (formerly the Maloideae, Rosaceae): Limited resolution of a complex evolutionary history. Plant Syst. Evol. 266, 119–145. doi: 10.1007/s00606-007-0545-y
Cauz-Santos, L. A. (2025). Beyond conservation: the landscape of chloroplast genome rearrangements in angiosperms. New Phytol. 247, 2571–2580. doi: 10.1111/nph.70364
Cloud, A., Vilcins, D., and McEwen, B. (2020). The effect of hawthorn (Crataegus spp.) on blood pressure: A systematic review. Adv. Integr. Med. 7, 167–175. doi: 10.1016/j.aimed.2019.09.002
Daniell, H., Jin, S., Zhu, X. G., Gitzendanner, M. A., Soltis, D. E., and Soltis, P. S. (2021). Green giant—a tiny chloroplast genome with mighty power to produce high-value proteins: history and phylogeny. Plant Biotechnol. J. 19, 430–447. doi: 10.1111/pbi.13556
Daniell, H., Lin, C. S., Yu, M., and Chang, W.-J. (2016). Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 17, 134. doi: 10.1186/s13059-016-1004-2
Dickinson, T. A., Belaoussoff, S., Love, R. M., and Muniyamma, M. (1996). North American black-fruited hawthorns. I. Variation in floral construction, breeding system correlates, and their possible evolutionary significance in Crataegus sect. Douglasii London. Folia Geobot 31, 355–371. doi: 10.1007/BF02815380
Dong, W. and Li, Z. X. (2015). The science and practice of Chinese fruit tree: Hawthorn (Shanxi, China: Science Press).
Du, X., Zhang, X., Bu, H., Zhang, T., Lao, Y., and Dong, W. (2019). Molecular analysis of evolution and origins of cultivated hawthorn (Crataegus spp.) and related species in China. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.00443
Esmaeili, A., Rahnamoun, S., and Sharifnia, F. (2013). Effect of O/W process parameters on Crataegus azarolus L nanocapsule properties. J. Nanobiotechnology 11, 16. doi: 10.1186/1477-3155-11-16
Feng, Y., Gao, S., Zhu, T., Sun, G., Zhang, P., Huang, Y., et al. (2022). Hawthorn fruit acid consumption attenuates hyperlipidemia-associated oxidative damage in rats. Front. Nutr. 9. doi: 10.3389/fnut.2022.936229
Gosler, A. G., Kelly, C. K., and Blakey, J. K. (1994). Phenotypic plasticity in leaf morphology of Crataegus monogyna (Rosaceae): an experimental study with taxonomic implications. Bot. J. Linn Soc. 115, 211–219. doi: 10.1111/j.1095-8339.1994.tb01779.x
Grant, J. R. and Stothard, P. (2008). The CGView Server: a comparative genomics tool for circular genomes. Nucleic Acids Res. 36, W181–W184. doi: 10.1093/nar/gkn179
Green, B. R. (2011). Chloroplast genomes of photosynthetic eukaryotes. Plant J. 66, 34–44. doi: 10.1111/j.1365-313X.2011.04541.x
Gu, C. Z., Spongberg, S. A., Wu, Z. Y., Raven, P. H., and Hong, D. Y. (2003). Flora of China. (Beijing: Science Press).
Güney, M., Kafkas, S., Keles, H., Aras, S., and Ercişli, S. (2018). Characterization of hawthorn (Crataegus spp.) genotypes by SSR markers. Physiol. Mol. Biol. Plants 24, 1221–1230. doi: 10.1007/s12298-018-0604-6
Huang, H., Shi, C., Liu, Y., Mao, S. Y., and Gao, L. Z. (2014). Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol. Biol. 14, 151. doi: 10.1186/1471-2148-14-151
Jiang, H., He, S., He, J., Zuo, Y., Guan, W., Zhao, Y., et al. (2025). Plastid genomic features and phylogenetic placement in Rosa (Rosaceae) through comparative analysis. BMC Plant Biol. 25, 752. doi: 10.1186/s12870-025-06734-0
Jin, J. J., Yu, W. B., Yang, J. B., Song, Y., dePamphilis, C. W., Yi, T. S., et al. (2020). GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 21, 241. doi: 10.1186/s13059-020-02154-5
Khiari, S., Boussaid, M., and Messaoud, C. (2015). Genetic diversity and population structure in natural populations of Tunisian Azarole (Crataegus azarolus L. var. aronia L.) assessed by microsatellite markers. Biochem. Syst. Ecol. 59, 264–270. doi: 10.1016/j.bse.2015.01.025
Kłubowicz, K., Sawicki, J., Paukszto, Ł., Ciborowski, K., Maździarz, M., and Krawczyk, K. (2024). Organellar genome evolution in Ribes L.: hotspots of mutation, sequence transfer, codon usage, and phylogenetic context. Tree Genet. Genomes 20, 51. doi: 10.1007/s11295-024-01684-4
Kurtz, S. and Schleiermacher, C. (1999). REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics. 15, 426–427. doi: 10.1093/bioinformatics/15.5.426
Li, B., Lin, F., Huang, P., Guo, W., and Zheng, Y. (2020). Development of nuclear SSR and chloroplast genome markers in diverse Liriodendron chinense germplasm based on low-coverage whole genome sequencing. Biol. Res. 53, 21. doi: 10.1186/s40659-020-00289-0
Li, P. W., Lu, Y. B., Antonelli, A., Zhu, Z. J., Wang, W., Qin, X. M., et al. (2025b). Sliding-window phylogenetic analyses uncover complex interplastomic recombination in the tropical Asian-American disjunct plant genus Hedyosmum (Chloranthaceae). New Phytol. 246, 2405–2415. doi: 10.1111/nph.70120
Li, Z. Z., Wang, Y., He, X. Y., and Li, W. G. (2025c). The Taihangia mitogenome provides new insights into its adaptation and organelle genome evolution in Rosaceae. Planta 261, 59. doi: 10.1007/s00425-025-04629-w
Li, J., Wang, H., Wang, L., Wang, X., Jia, L., and Chen, Z. (2025a). Comprehensive analysis of the complete chloroplast genome of the cultivated soapberry and phylogenetic relationships of Sapindaceae. Indl Crop Prod 228, 120952. doi: 10.1016/j.indcrop.2025.120952
Liang, Y. Q., Wen, Y. T., Fu, Q. Q., Yu, Z. Y., Wang, L. J., Zou, Z. R., et al. (2022). Structure analysis and tyrosinase, melanogenesis, α-glucosidase, and nonenzymatic glycation inhibitory activities of polymeric proanthocyanidins from the pulp of Crataegus pinnatifida Bge. Ind. Crop Prod 187, 115524. doi: 10.1016/j.indcrop.2022.115524
Liao, X., Zhu, W., Zhou, J., Li, H., Xu, X., Zhang, B., et al. (2023). Repetitive DNA sequence detection and its role in the human genome. Commun. Biol. 6, 954. doi: 10.1038/s42003-023-05322-y
Lin, C. P., Wu, C. S., Huang, Y. Y., and Chaw, S. M. (2012). The complete chloroplast genome of Ginkgo biloba reveals the mechanism of inverted repeat contraction. Genome Biol. Evol. 4, 374–381. doi: 10.1093/gbe/evs021
Liu, S., Ni, Y., Li, J., Zhang, X., Yang, H., Chen, H., et al. (2023). CPGView: A package for visualizing detailed chloroplast genome structures. Mol. Ecol. Resour 23, 694–704. doi: 10.1111/1755-0998.13729
Liu, B. B., Ren, C., Kwak, M., Hodel, R. G. J., Xu, C., He, J., et al. (2022). Phylogenomic conflict analyses in the apple genus Malus s.l. reveal widespread hybridization and allopolyploidy driving diversification, with insights into the complex biogeographic history in the Northern Hemisphere. J. Integr. Plant Biol. 64, 1020–1043. doi: 10.1111/jipb.13246
Liu, Y., Shen, F., Wang, L., Dou, J., Dong, T., Li, M., et al. (2025). Accelerating moss identification through the development of specific DNA barcodes based on the whole chloroplast genome. Mol. Ecol. Resour 25, e70004. doi: 10.1111/1755-0998.70004
Liu, H., Su, Z., Yu, S., Liu, J., Yin, X., Zhang, G., et al. (2019). Genome comparison reveals mutation hotspots in the chloroplast genome and phylogenetic relationships of Ormosia Species. BioMed. Res. Int. 2019, 7265030. doi: 10.1155/2019/7265030
Liu, X., Yang, S., Ma, J., Yu, J., Yan, Q., and Jiang, Z. (2020). Efficient production of acetylated xylooligosaccharides from Hawthorn kernels by a xylanase from Paecilomyces aerugineus. Ind. Crop Prod 158, 112962. doi: 10.1016/j.indcrop.2020.112962
Lo, E. Y. Y., Stefanović, S., Christensen, K. I., and Dickinson, T. A. (2009). Evidence for genetic association between East Asian and western North American Crataegus L. (Rosaceae) and rapid divergence of the eastern North American lineages based on multiple DNA sequences. Mol. Phylogenet Evol. 51, 157–168. doi: 10.1016/j.ympev.2009.01.018
Lo, E. Y. Y., Stefanović, S., and Dickinson, T. A. (2007). Molecular reappraisal of relationships between Crataegus and Mespilus (Rosaceae, Pyreae) two genera or one? Syst. Bot. 32, 596–616. doi: 10.1600/036364407782250562
Matzke, N. J. (2014). Model selection in historical biogeography reveals that founder-event speciation is a crucial process in island clades. Syst. Biol. 63, 951–970. doi: 10.1093/sysbio/syu056
Matzke, N. J. (2022). Statistical comparison of DEC and DEC+J is identical to comparison of two ClaSSE submodels, and is therefore valid. J. Biogeogr 49, 1805–1824. doi: 10.1111/jbi.14346
Meng, J., Wang, Y., Song, H., Dong, W., and Dong, N. (2025). Insights into phylogeny, taxonomy, origins and evolution of Crataegus and Mespilus, based on comparative chloroplast genome analysis. Genes 16, 204. doi: 10.3390/genes16020204
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., von Haeseler, A., et al. (2020). IQ-TREE 2: New models and efficient methods for phylogenetic inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534. doi: 10.1093/molbev/msaa015
Mohammed, K. E., Ahmad, N. S., and Ahmad, S. A. (2025). Genetic diversity of wild hawthorn (Crataegus spp.) in Iraqi Kurdistan region using biochemical and molecular markers. Genet. Resour Crop Evol. 72, 6631–6647. doi: 10.1007/s10722-025-02343-w
Morton, B. R. and Clegg, M. T. (1993). A chloroplast DNA mutational hotspot and gene conversion in a noncoding region near rbcL in the grass family (Poaceae). Curr. Genet. 24, 357–365. doi: 10.1007/bf00336789
Phipps, J. B. (1983). Biogeographic, taxonomic, and cladistic relationships between East Asiatic and North American Crataegus. Ann. Missouri Bot. Gard 70, 667–700. doi: 10.2307/2398984
Phipps, J. B. (2016). Studies in Mespilus, Crataegus, and ×Crataemespilus (Rosaceae), II. The academic and folk taxonomy of the medlar, Mespilus germanica, and hawthorns, Crataegus (Rosaceae). Phytotaxa 260, 25–35. doi: 10.11646/phytotaxa.260.1.3
Phipps, J. B., Robertson, K. R., Smith, P. G., and Rohrer, J. R. (1990). A checklist of the subfamily Maloideae (Rosaceae). Can. J. Bot. 68, 2209–2269. doi: 10.1139/b90-288
Ping, J., Feng, P., Li, J., Zhang, R., Su, Y., and Wang, T. (2021). Molecular evolution and SSRs analysis based on the chloroplast genome of Callitropsis funebris. Ecol. Evol. 11, 4786–4802. doi: 10.1002/ece3.7381
Popović-Djordjević, J., Kostić, A.Ž., Kamiloglu, S., Tomas, M., Mićanović, N., and Capanoglu, E. (2023). Chemical composition, nutritional and health related properties of the medlar (Mespilus germanica L.): from medieval glory to underutilized fruit. Phytochem. Rev. 22, 1663–1690. doi: 10.1007/s11101-023-09883-y
Qin, Q., Dong, Y., Chen, J., Wang, B., Peng, W., Zhang, X., et al. (2025). Comparative analysis of chloroplast genomes reveals molecular evolution and phylogenetic relationships within the Papilionoideae of Fabaceae. BMC Plant Biol. 25, 157. doi: 10.1186/s12870-025-06138-0
Rocchetti, G., Senizza, B., Zengin, G., Mahomodally, M. F., Senkardes, I., Lobine, D., et al. (2020). Untargeted metabolomic profiling of three Crataegus species (hawthorn) and their in vitro biological activities. J. Sci. Food Agr 100, 1998–2006. doi: 10.1002/jsfa.10216
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., et al. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542. doi: 10.1093/sysbio/sys029
Rozas, J., Ferrer-Mata, A., Sánchez-DelBarrio, J. C., Guirao-Rico, S., Librado, P., Ramos-Onsins, S. E., et al. (2017). DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34, 3299–3302. doi: 10.1093/molbev/msx248
Sagbas, H. I., Ercisli, S., Aydin, M., Ilhan, E., Aydinyurt, R., Kasapoglu, A. G., et al. (2023). Evaluation of genetic diversity using iPBS-SCoT marker methods in native hawthorn genetic resources and species ıdentification by using DNA barcoding method. Cell Mol. Biol. 69, 43–55. doi: 10.14715/cmb/2023.69.10.6
Shahzadi, I., Abdullah, Mehmood, F., Ali, Z., Ahmed, I., and Mirza, B. (2020). Chloroplast genome sequences of Artemisia maritima and Artemisia absinthium: Comparative analyses, mutational hotspots in genus Artemisia and phylogeny in family Asteraceae. Genomics 112, 1454–1463. doi: 10.1016/j.ygeno.2019.08.016
Sheng, F., Chen, S. Y., Tian, J., Li, P., Qin, X., Wang, L., et al. (2017). Morphological and ISSR molecular markers reveal genetic diversity of wild hawthorns (Crataegus songorica K. Koch.) in Xinjiang, China. J. Integr. Agri 16, 2482–2495. doi: 10.1016/S2095-3119(17)61688-5
Shi, Y., Huang, J., Wan, X., Shi, J., Chen, Z., and Zeng, W. (2025). The population chloroplast genomes of Populus reveal the phylogenetic relationship between three new taxa of sect. Leucoides and their parents. BMC Genomics 26, 156. doi: 10.1186/s12864-024-11099-z
Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., et al. (1986). The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5, 2043–2049. doi: 10.1002/j.1460-2075.1986.tb04464.x
Suchard, M. A., Lemey, P., Baele, G., Ayres, D. L., Drummond, A. J., and Rambaut, A. (2018). Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016. doi: 10.1093/ve/vey016
Talent, N., Eckenwalder, J. E., Lo, E., Ib Christensen, K., and Dickinson, T. A. (2008). (1847) Proposal to conserve the name Crataegus against Mespilus (Rosaceae). TAXON 57, 1007–1008. doi: 10.1002/tax.573042
Tassell, M. C., Kingston, R., Gilroy, D., Lehane, M., and Furey, A. (2010). Hawthorn (Crataegus spp.) in the treatment of cardiovascular disease. Pharmacogn Rev. 4, 32–41. doi: 10.4103/0973-7847.65324
Tillich, M., Lehwark, P., Pellizzer, T., Ulbricht-Jones, E. S., Fischer, A., Bock, R., et al. (2017). GeSeq - versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45, W6–w11. doi: 10.1093/nar/gkx391
Tunç, Y., Yaman, M., Yılmaz, K. U., Khadivi, A., and Goodarzi, S. (2025). Elucidation of genetic diversity in hawthorn (Crataegus azarolus L.) accessions naturally growing in the Eastern Mediterranean region of Türkiye using morphological, pomological, and inter simple sequence repeat markers. BMC Plant Biol. 25, 43. doi: 10.1186/s12870-025-06065-0
Ufimov, R. A. and Dickinson, T. A. (2020). Infrageneric nomenclature adjustments in Crataegus L. (Maleae, Rosaceae). Phytologia 102, 177–199. Available online at: https://www.phytologia.org/uploads/2/3/4/2/23422706/102_3_177-199ufimov_and_dickinsoncratageus9-2-20.pdf
Vargas, O. M., Madriñán, S., and Simpson, B. (2023). Allopatric speciation is more prevalent than parapatric ecological divergence in a recent high-Andean diversification (Linochilus: Asteraceae). PeerJ 11, e15479. doi: 10.7717/peerj.15479
Wang, R. J., Cheng, C. L., Chang, C. C., Wu, C. L., Su, T. M., and Chaw, S. M. (2008). Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol. Biol. 8, 36. doi: 10.1186/1471-2148-8-36
Wang, Y., Wang, S., Liu, Y., Yuan, Q., Sun, J., and Guo, L. (2021). Chloroplast genome variation and phylogenetic relationships of Atractylodes species. BMC Genomics 22, 103. doi: 10.1186/s12864-021-07394-8
Wang, J., Zou, Y., Mower, J. P., Reeve, W., and Wu, Z. (2024). Rethinking the mutation hypotheses of plant organellar DNA. Genomics Commun. 1, e003. doi: 10.48130/gcomm-0024-0003
Wolf, P. G., Rowe, C. A., Sinclair, R. B., and Hasebe, M. (2003). Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res. 10, 59–65. doi: 10.1093/dnares/10.2.59
Wu, L., Cui, Y., Wang, Q., Xu, Z., Wang, Y., Lin, Y., et al. (2021). Identification and phylogenetic analysis of five Crataegus species (Rosaceae) based on complete chloroplast genomes. Planta 254, 14. doi: 10.1007/s00425-021-03667-4
Wu, H., Li, D. Z., and Ma, P. F. (2024). Unprecedented variation pattern of plastid genomes and the potential role in adaptive evolution in Poales. BMC Biol. 22, 97. doi: 10.1186/s12915-024-01890-5
Wu, X., Luo, D., Zhang, Y., Yang, C., Crabbe, M. J. C., Zhang, T., et al. (2022). Comparative genomic and phylogenetic analysis of chloroplast genomes of hawthorn (Crataegus spp.) in Southwest China. Front. Genet. 13. doi: 10.3389/fgene.2022.900357
Xia, P., Huang, Y., and Zhu, J. (2025). The complete chloroplast genome sequences of 11 Panax species: Providing insights for evolution and species identification. Ind. Crop Prod 223, 120160. doi: 10.1016/j.indcrop.2024.120160
Xiang, C. Y., Gao, F., Jakovlić, I., Lei, H. P., Hu, Y., Zhang, H., et al. (2023). Using PhyloSuite for molecular phylogeny and tree-based analyses. iMeta 2, e87. doi: 10.1002/imt2.87
Xiang, Y., Huang, C. H., Hu, Y., Wen, J., Li, S., Yi, T., et al. (2017). Evolution of Rosaceae fruit types based on nuclear phylogeny in the context of geological times and genome duplication. Mol. Biol. Evol. 34, 262–281. doi: 10.1093/molbev/msw242
Xue, S., Shi, T., Luo, W., Ni, X., Iqbal, S., Ni, Z., et al. (2019). Comparative analysis of the complete chloroplast genome among Prunus mume, P. Armeniaca, and P. salicina. Hortic. Res. 6, 89. doi: 10.1038/s41438-019-0171-1
Yan, H. F., Liu, T. J., Yuan, X., Xu, Y., Zhang, S. Y., Hao, G., et al. (2025). Revisiting the phylogeny of Primulaceae s.l. using whole plastid genomes: Highlighting phylogenetic conflicts and their implications. J. Syst. Evol. 63, 788–802. doi: 10.1111/jse.13154
Yilmaz, K. U., Yanar, M., Ercisli, S., Sahiner, H., Taskin, T., and Zengin, Y. (2010). Genetic relationships among some hawthorn (Crataegus spp.) species and genotypes. Biochem. Genet. 48, 873–878. doi: 10.1007/s10528-010-9368-6
Yu, Y., Harris, A. J., Blair, C., and He, X. (2015). RASP (Reconstruct Ancestral State in Phylogenies): a tool for historical biogeography. Mol. Phylogenet Evol. 87, 46–49. doi: 10.1016/j.ympev.2015.03.008
Yuan, S., Nie, C., Jia, S., Liu, T., Zhao, J., Peng, J., et al. (2023). Complete chloroplast genomes of three wild perennial Hordeum species from Central Asia: genome structure, mutation hotspot, phylogenetic relationships, and comparative analysis. Front. Plant Sci. 14. doi: 10.3389/fpls.2023.1170004
Zhang, S. D., Jin, J. J., Chen, S. Y., Chase, M. W., Soltis, D. E., Li, H. T., et al. (2017). Diversification of Rosaceae since the Late Cretaceous based on plastid phylogenomics. New Phytol. 214, 1355–1367. doi: 10.1111/nph.14461
Zhang, X., Li, P., Wang, J., Fu, D., Zhao, B., Dong, W., et al. (2024). Comparative genomic and phylogenetic analyses of mitochondrial genomes of hawthorn (Crataegus spp.) in Northeast China. Int. J. Biol. Macromol 272, 132795. doi: 10.1016/j.ijbiomac.2024.132795
Zhang, Z., Li, J., Zhao, X. Q., Wang, J., Wong, G. K., and Yu, J. (2006). KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genom Proteom Bioinf 4, 259–263. doi: 10.1016/s1672-0229(07)60007-2
Zhang, X., Sun, X., Cui, M., Zhao, B., Liu, Y., and Qi, H. (2025). Integrative metabolome and transcriptome analyses reveal the effects of plant growth regulator diethylaminoethyl hexanoate on flavonoid biosynthesis in hawthorn fruits. Food Qual Saf. 9, fyaf048. doi: 10.1093/fqsafe/fyaf048
Zhang, X., Sun, X., Li, T., Wang, J., Xue, M., Sun, C., et al. (2022). Chloroplast genomes and nuclear sequences reveal the interspecific relationships of Crataegus bretschneideri C. K. Schneid. and related species in China. Tree Geneti Genomes 18, 24. doi: 10.1007/s11295-022-01556-9
Zhao, H. and Feng, B. (1996). “China fruit-plant monograph of hawthorn (Crataegus),” in Flora. (China Forest Press, Beijing, China).
Keywords: biogeography, chloroplast genome, comparative genomics, Crataegus, divergence time, phylogenetic relationships
Citation: Sun X, Cui M, Zhao B, Wang Y, Zhang X and Liu Y (2026) Comparative genomic and phylogenetic analyses of Crataegus chloroplast genomes: insights for evolution and identification. Front. Plant Sci. 17:1767012. doi: 10.3389/fpls.2026.1767012
Received: 13 December 2025; Accepted: 27 January 2026; Revised: 22 January 2026;
Published: 11 February 2026.
Edited by:
Zhiqiang Wu, Chinese Academy of Agricultural Sciences, ChinaReviewed by:
Zinian Wu, Chinese Academy of Agricultural Sciences, ChinaCai Hanbing, Sichuan Agricultural University, China
Copyright © 2026 Sun, Cui, Zhao, Wang, Zhang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiao Zhang, emhhbmd4aWFvODg2NkBzeWF1LmVkdS5jbg==; Yuexue Liu, eXVleHVlbGl1QHN5YXUuZWR1LmNu
Xinyu Sun1