On the Species Delimitation of the Maddenia Group of Prunus (Rosaceae): Evidence From Plastome and Nuclear Sequences and Morphology

The recognition, identification, and differentiation of closely related plant species present significant and notorious challenges to taxonomists. The Maddenia group of Prunus, which comprises four to seven species, is an example of a group in which species delimitation and phylogenetic reconstruction have been difficult, due to the lack of clear morphological distinctions, limited sampling, and low informativeness of molecular evidence. Thus, the precise number of species in the group and the relationships among them remain unclear. Here, we used genome skimming to generate the DNA sequence data for 22 samples, including 17 Maddenia individuals and five outgroups in Amygdaloideae of Rosaceae, from which we assembled the plastome and 446 single-copy nuclear (SCN) genes for each sample. The phylogenetic relationships of the Maddenia group were then reconstructed using both concatenated and coalescent-based methods. We also identified eight highly variable regions and detected simple sequence repeats (SSRs) and repeat sequences in the Maddenia species plastomes. The phylogenetic analysis based on the complete plastomes strongly supported three main subclades in the Maddenia group of Prunus, while five subclades were recognized based on the nuclear tree. The phylogenetic network analysis detected six hybridization events. Integrating the nuclear and morphological evidence, we proposed to recognize five species within the Maddenia group, i.e., Prunus fujianensis, P. himalayana, P. gongshanensis, P. hypoleuca, and P. hypoxantha. Within this group, the first three species are well-supported, while the gene flow occurring throughout the Maddenia group seems to be especially frequent between P. hypoleuca and P. hypoxantha, eroding the barrier between them. The phylogenetic trees based on eight concatenated hypervariable regions had a similar topology with the complete plastomes, showing their potential as molecular markers and effective barcodes for further phylogeographic studies on Maddenia.

The recognition, identification, and differentiation of closely related plant species present significant and notorious challenges to taxonomists. The Maddenia group of Prunus, which comprises four to seven species, is an example of a group in which species delimitation and phylogenetic reconstruction have been difficult, due to the lack of clear morphological distinctions, limited sampling, and low informativeness of molecular evidence. Thus, the precise number of species in the group and the relationships among them remain unclear. Here, we used genome skimming to generate the DNA sequence data for 22 samples, including 17 Maddenia individuals and five outgroups in Amygdaloideae of Rosaceae, from which we assembled the plastome and 446 single-copy nuclear (SCN) genes for each sample. The phylogenetic relationships of the Maddenia group were then reconstructed using both concatenated and coalescent-based methods. We also identified eight highly variable regions and detected simple sequence repeats (SSRs) and repeat sequences in the Maddenia species plastomes. The phylogenetic analysis based on the complete plastomes strongly supported three main subclades in the Maddenia group of Prunus, while five subclades were recognized based on the nuclear tree. The phylogenetic network analysis detected six hybridization events. Integrating the nuclear and morphological evidence, we proposed to recognize five species within the Maddenia group, i.e., Prunus fujianensis, P. himalayana, P. gongshanensis, P. hypoleuca, and P. hypoxantha. Within this group, the first three species are well-supported, while the gene flow occurring throughout the Maddenia group seems to be especially frequent between P. hypoleuca and P. hypoxantha, eroding the barrier between them. The phylogenetic trees based on eight concatenated hypervariable regions had a similar topology with the complete plastomes, showing their potential as molecular markers and effective barcodes for further phylogeographic studies on Maddenia.

INTRODUCTION
Prunus L. is a genus of more than 200 species, widely distributed in the temperate regions of the Northern Hemisphere and in the subtropics and tropics (Rehder, 1956;Yü et al., 1986;Lu et al., 2003;Hodel et al., 2021). Some taxa of Prunus (e.g., almonds, sweet cherries, peaches, and plums) are of significant economic value, and other species have also been used as ornamentals, timber, and medicine (Andro and Riffaud, 1995;Lee and Wen, 2001;Wen et al., 2008).
Maddenia Hook. f. & Thoms was established as a genus by Hooker and Thomson (1854) and was later merged with Prunus by Chin et al. (2010) based on the phylogenetic analyses of nuclear and plastid DNA sequences. This provided strong support for the monophyly of Maddenia but it was resolved as nested within Prunus; these conclusions have also been supported by subsequent studies (Chin et al., 2014;Zhao et al., 2016Zhao et al., , 2018Wang et al., 2021). The Maddenia group of Prunus is characterized by its simple deciduous leaves with a serrate margin, terminal racemose inflorescences, 10 undifferentiated perianth parts at maturity, and drupe fruits (Figure 1; Focke, 1894;Yü et al., 1986;Lu et al., 2003;Kalkman, 2004;Wang et al., 2021). The group includes about 4-7 species endemic to East Asia, mainly distributed in the temperate regions of the Himalayas and eastern China, with China as its center of diversity, and one species in Bhutan, Nepal, and Sikkim of India (Rehder, 1956;Yü et al., 1986;Lu et al., 2003;Chin et al., 2010;Wen and Shi, 2012).
Within the Maddenia group, Prunus himalayana Hook. f. & Thomson was the first species described, followed by six other putative species (i.e., P. hypoleuca Koehne, P. hypoxantha Koehne, P. wilsonii Koehne, P. fujianensis Y. T. Chang, P. incisoserrata T. T. Yü & T. C. Ku, and P. gongshanensis J. Hooker and Thomson, 1854;Koehne, 1911;Chang, 1985;Yü et al., 1985;Lu et al., 2003;Wen and Shi, 2012). The species in this group were originally described based on morphological traits, especially the abaxial leaf pubescence (Yü et al., 1986;Lu et al., 2003;Wen and Shi, 2012). For example, P. hypoxantha and P. wilsonii were considered as two separate species based on the denser pubescence on the veins in P. wilsonii, and the two were also differentiated based on the size of their winter bud scales (Yü et al., 1986;Lu et al., 2003). However, Wen and Shi (2012) noted a continuous variation in the leaf pubescence between P. hypoxantha and P. wilsonii, and therefore treated the latter as a synonym of P. hypoxantha. This treatment was also supported by Shi et al. (2013), based on pollen morphology. Furthermore, the relationships among P. fujianensis, P. hypoleuca, and P. incisoserrata are poorly understood (Chang, 1985;Yü et al., 1986;Wen and Shi, 2012). P. hypoleuca was described based on its abaxially glabrous leaves, while P. incisoserrata and P. fujianensis both have pubescent abaxial leaf surfaces (Lu et al., 2003). Additionally, P. incisoserrata and P. fujianensis were recognized by some workers based on their leaf margin morphology (incised doubly serrate in the former vs. margin incised irregular serrate in the latter; Lu et al., 2003). However, in previous observations, we found that there was a continuous variation in the degree of abaxial pubescence as P. hypoleuca also has abaxially pubescent leaf blades and that there was a broad variation on the margin shape of P. incisoserrata and P. fujianensis, which greatly increased the difficulty in identifying them. In the latest revision of the Maddenia clade, Wen and Shi (2012) treated P. fujianensis and P. incisoserrata as synonyms of P. hypoleuca and they also recognized the former variety P. himalaica var. glabrifolia as a distinct species, P. gongshanensis.
It has been challenging to identify these species due to the existence of intermediate morphological features in the Maddenia clade. Traditional morphological methods alone cannot meet the needs for the species delimitation of the Maddenia group. With the rapid development of the phylogenetic analysis of Prunus s.l., the relationships within the Prunus and the Maddenia group have attracted new attention [see Chin et al. (2014)]. Yet to date, interspecific relationships within the Maddenia group are still unclear due to the limited taxon sampling and phylogenetically informative sites included in previous studies (Wen et al., 2008;Chin et al., 2010Chin et al., , 2014Zhao et al., 2016Zhao et al., , 2018. Deoxyribonucleic acid (DNA) barcoding is an effective way to identify species by using a short DNA sequence (Kress et al., 2005;China Plant BOL Group, 2011;Li et al., 2015;Kress, 2017), however, DNA barcodes generally provide a limited number of informative sites among closely related taxa. As an alternative, genome skimming has been employed to generate complete chloroplast genomes (plastomes), an approach that has been dubbed as "super-barcoding" (Erickson et al., 2008;Yang et al., 2013;Li et al., 2015). The maternal inheritance and conservative genome structure of plastomes have rendered them essential markers in studying the evolutionary history of angiosperms (Gitzendanner et al., 2018;Do et al., 2020;Cai et al., 2021); noteworthy examples include the recent applications in Magnoliaceae , Rosaceae (Liu et al., 2019(Liu et al., , 2020a, and Vitaceae (Wen et al., 2018). However, the uniparental inheritance of plastomes limits their power to fully elucidate the evolutionary histories of lineages with reticulate evolution, which has been proved to be very common in Rosaceae (Liu et al., 2020a,b;Hodel et al., 2021). In a case study on Vitaceae, Liu et al. (2021) proposed a new method for obtaining single-copy nuclear (SCN) genes from deep genome skimming data (minimum 10× coverage for optimal performance), and this approach provided a good opportunity to infer phylogenetic relationships using the uniparentally inherited plastomes and the biparentally inherited nuclear genes. Additionally, with the rapid development of next-generation sequencing, it has been feasible to obtain genome skimming data efficiently and economically .
In this study, we assembled 22 plastomes and captured 446 SCN genes from seven assumed species of Maddenia and five outgroup species in Amygdaloideae of Rosaceae (Xiang et al., 2016;Zhang S. D. et al., 2017). We also examined their morphological and micromorphological characteristics. We identified simple sequence repeats (SSRs) and repeat sequences from the plastomes of Maddenia clade species. Additionally, eight highly variable regions were determined from the plastomes. We aim to test the hypotheses on species delimitations and resolve the interspecific relationships within Maddenia, integrating the plastome, nuclear, and morphological evidence. We also aim to provide potential molecular markers and effective barcodes for further population-level studies on the Maddenia group.

Sampling, DNA Extraction, and Sequencing
For this study, 22 individuals were sampled, including 17 ingroup individuals from the Maddenia group and five outgroup species from the other clades of the Rosaceae subfamily, Amygdaloideae (Table 1), which includes three other species of Prunus. The 17 ingroup samples represented the taxonomic and geographic coverage of Maddenia (Yü et al., 1986;Lu et al., 2003). Total genomic DNAs were extracted from 15 mg of silica gel dried leaves using the Cetyltrimethylammonium Bromide (CTAB) method (Doyle and Doyle, 1987 Plastid Genome and Nuclear Ribosomal DNA (nrDNA) Assembly, Annotation, Visualization, and Phylogenetic Inference The raw Illumina data were filtered for sequence quality using Trimmomatic v. 0.40 (Bolger et al., 2014) under default parameters. The filtered reads were assembled into plastome using the GetOrganelle pipeline (Jin et al., 2020). For a few accessions, Local Blast (Altschul et al., 1997) was used to align the contigs with the reference genomes (Prunus armeniaca (KY420025) and P. salicina (KY420002); Zhang S. D. et al., 2017;Zhang X. et al., 2017). Finally, we concatenated each contig based on the orientation of the reference genome and obtained the consensus sequences through Geneious v.11.0.2 (Kearse et al., 2012). We annotated the assembled chloroplast genomes using Plastid Genome Annotator (PGA: Qu et al., 2019) and made minor manual adjustments using Geneious v.11.0.2. The transfer RNA (tRNA) genes were checked using tRNAscan-SE v.2.0 (Lowe and Chen, 2016). The circular plastid genome diagram was generated using the online OGDRAW (Lohse et al., 2013). The newly generated plastome sequence data of Maddenia and the other species of Rosaceae from this study have been submitted to GenBank (Table 1).
To obtain high-quality nuclear ribosomal DNA (nrDNA), including the Internal Transcribed Spacer (ITS) 1, 5.8S, and ITS2, a modified reference-based and de novo method (Zhang et al., 2015;Liu et al., 2019Liu et al., , 2020a was employed for the assembly of the ITS sequences. The clean reads generated by Trimmomatic v. 0.40 (Bolger et al., 2014) were mapped to the reference sequence (Prunus hypoleuca: MH711078) using Bowtie2 v. 2.4.2 (Langmead and Salzberg, 2012), and then the draft sequence for each sample was generated. In addition, we conducted a de novo assembly using SPAdes v. 3.15.0 (Bankevich et al., 2012); the resulting scaffolds were used to correct the errors and ambiguities in the consensus sequences. Finally, we obtained high-quality nuclear ribosomal DNA (nrDNA) sequences for each sample using reference-based and de novo assembly methods.

Plastome Comparisons and Identification of Hypervariable Regions
Gene rearrangement events within the Maddenia clade were detected using the Mauve v2.4.0 (Darling et al., 2010) software. We chose one Maddenia sequence from each species for plastome comparisons, which were performed online using mVISTA in Shuffle-LAGAN mode (Frazer et al., 2004). The reference sequence used was P. wilsonii WX202.
To identify the hypervariable regions, we used 22 plastomes to conduct the sliding window analysis in DnaSP v5 (Librado and Rozas, 2009) using a step size of 200 bp and a window length of 600 bp. We chose the sequences with relatively higher values of nucleotide diversity (Pi) as the hypervariable regions. The Pi refers to the difference of the chloroplast genome sequences among sequenced samples.

Single-Copy Nuclear Marker Development, Gene Assembly, Alignment, and Phylogenetic Inference
As a part of the integrative systematic studies of Prunus, Hodel et al. (2021) identified 591 single-copy nuclear exons based on 17 transcriptomes of Prunus. Our genome skimming data were sequenced from the whole genomic DNA, which provided the opportunity to capture nuclear genes, including exon and intron sequences. We used three genomes of Prunus (P. dulcis (Mill.) D. A. Webb (https://www.ncbi.nlm.nih.gov/genome/ 10947), P. mume (Siebold) Sieb. et Zucc. (https://www.ncbi. nlm.nih.gov/genome/13911), and P. persica (L.) Batsch (https:// www.ncbi.nlm.nih.gov/genome/388) as references to discover the corresponding complete genes (introns and exons) for the 591 exons. The resulting nuclear genes were used as references in the following gene assembly.
For assembling the SCN genes, we followed the pipelines of Liu et al. (2021). Briefly, the adapters and low-quality reads were trimmed using Trimmomatic v. 0.40 (Bolger et al., 2014), and the results were quality-checked using FastQC v. 0.11.9 (Andrews, 2018). The resulting clean reads were counted to calculate the sequencing coverage, assuming the genome size (352.9 Mb: Shirasawa et al., 2017) of P. avium (L.) L. HybPiper pipeline v. 1.3.1 (Johnson et al., 2016), with the default settings, was used to target the SCN genes; BWA v. 0.7.1 (Li and Durbin, 2009) was used to align and distribute the reads to the target genes; SPAdes v. 3.15.0 (Bankevich et al., 2012), with a coverage cutoff value of 5, was used to assemble the reads to the contigs; and Exonerate v. 2.2.0 (Slater and Birney, 2005) was used to align the assembled contigs to the target sequences and determine the exon-intron boundaries. To balance the quality and quantity of the captured SCN genes from the uneven sequencing coverage of genome skimming data (cf . Table 1), we used a relatively lower coverage cutoff for generating the contigs in SPAdes v. 3.15.0. Python and R scripts included in the HybPiper pipeline (Johnson et al., 2016) were used to retrieve the recovered gene sequences, and to summarize and visualize the recovery efficiency.
The sequences in each SCN gene were aligned using MAFFT v. 7.475 (Nakamura et al., 2018) with the settings: "-localpair -maxiterate1000." Due to the variable sequencing depth in the genome skimming data, we employed three steps to remove the poorly aligned regions. In the first step, we used trimAL v. 1.2 (Capella-Gutiérrez et al., 2009) to trim the alignment of each SCN gene, in which all columns with gaps in more than 20% of the sequences or with a similarity score lower than 0.001 were removed. Considering the low-quality assembly in some regions, we used Spruceup (Borowiec, 2016) to discover, visualize, and remove the outlier sequences in the concatenated multiple sequence alignments with a window size of 50 and an overlap of 25. Because the Spruceup algorithm works better the more data it has, we concatenated all the SCN gene alignments using AMAS v. 1.0 (Borowiec, 2016) before running Spruceup, and we also used AMAS v. 1.0 (Borowiec, 2016) to split the processed/trimmed alignment back into single-locus alignments. The resulting alignments for each SCN gene were trimmed again using trimAL v. 1.2 (Capella-Gutiérrez et al., 2009) with the same parameters described above. At the third step, we excluded the sequences with <250 bp in each alignment using our customized python script (exclude_short_sequences.py), as the short sequences in each alignment have limited informative sites for the following coalescent-based species tree inference. Phylogenetic inference of the nuclear data of the Maddenia group was performed using both concatenated and coalescentbased methods. To reduce the effect of the missing data, gene alignments with at least 1,000 characters and 18 out of 22 taxa were retained. For the concatenation analysis, the bestfit partitioning schemes and nucleotide substitution models for the nuclear dataset were estimated using PartitionFinder2 (Stamatakis, 2006;Lanfear et al., 2016), under the corrected AICc and linked branch lengths, and with rcluster (Lanfear et al., 2014) algorithm options. The resulting scheme was then used to infer the ML trees using IQ-TREE 2 (Minh et al., 2020) and RAxML 8.2.12 (Stamatakis, 2014), respectively. To estimate the coalescent-based species tree, first, we inferred the individual ML gene trees using RAxML 8.2.12 (Stamatakis, 2014) with a GTRGAMMA model and 100 bootstrap replicates to assess the clade support, in which the low support branches (≤10) of gene trees were contracted by Newick Utilities (Junier and Zdobnov, 2010). The gene trees were then used to infer a species tree with ASTRAL-III v. 5.7.7  using local posterior probabilities (LPP; Sayyari and Mirarab, 2016) to assess clade support.

Retrieving Standard DNA Barcodes
To determine if standard DNA barcodes can resolve the interspecific relationships of Maddenia species, we extracted the gene sequences of matK, rbcL, and trnH-psbA from annotated plastomes, and then concatenated them into a single aligned dataset in Geneious v.11.0.2 (Kearse et al., 2012).

Phylogenetic Network Analyses
To explore the possibility of gene flow as a cause of discordance in the Maddenia group, we utilized 18 samples, including 17 Maddenia ingroups and one outgroup (Prunus davidiana (Carrière) Franch) for the phylogenetic network analyses. Species Networks applying Quartets (SNaQ: Solís-Lemus and Ané, 2016), as implemented in the Julia package PhyloNetworks (Solís-Lemus et al., 2017), was used to examine the contribution of the incomplete lineage sorting (ILS) and reticulation to the phylogenetic history of the Maddenia group. We used the ML tree inferred by RAxML for calculating the Concordance Factors (CFs), and the ASTRAL species tree was used as the input tree for SNaQ. We first tested the fit of the models, allowing from 0 to 8 reticulation events (h), and compared the models using their pseudolikelihood scores. For each number of hybrid nodes, we ran 50 SNaQ searches using the best topology from the previous run as a starting tree and retained the highest pseudolikelihood value. To distinguish the best fitting model, we used the log pseudolikelihood profile with h. A sharp improvement is expected until h reaches the best value and a slower, linear improvement thereafter. The best network was visualized in Julia using R.

Characterization of SSRs and Repeat Sequences in Plastomes
We searched for SSRs in seven Maddenia species using MISA (Thiel et al., 2003) with the settings at 10, 5, 4, 3, 3, 3 repeat units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs, respectively. Tandem Repeat Finder (Benson, 1999) was used to analyze the tandem repeat sequences with the default parameters. One inverted repeat sequence was removed before detecting large repeat sequences. We employed REPuter (Kurtz et al., 2001) to identify the large repeat sequences, including forward, reverse, complement, and palindromic repeats. The minimal repeat size and Hamming distance were set at 30 bp and 3, respectively.

Morphological and Micromorphological Characteristics Detection
Images of mature leaves were taken with a Nikon SM225 Stereo microscope (Japan). To show the micromorphological traits, a scanning electron microscope (SEM) was used. The mature leaves were fixed in Formaldehyde-acetic acid-ethanol (FAA) (methanol: acetic acid: ethanol: water = 10:5:50:35), cut into small pieces, and washed in 70% alcohol. Then, they were dehydrated in an increasing alcohol series and iso-amyl acetate series. Afterward, the material was critical-point dried using liquid CO 2 with a K850 critical-point dryer (Quorum). The leaf pieces were then mounted on aluminum stubs and sputter-coated with gold using a JS-1600 sputter coater (HTCY). Photos were taken with a Hitachi S-3400 SEM (Hitachi, Tokyo, Japan).

Characteristics of Maddenia Plastomes
We used genome skimming to generate DNA sequence data for 22 samples, including seven Maddenia species (17 individuals) and five outgroup species. The size of the Maddenia plastomes ranged from 158,479 to 158,972 bp in length. The plastomes of all the Maddenia species had a quadripartite structure (Figure 2), including a large single-copy region (LSC, 86,939-87,405 bp), a small single-copy region (SSC,18,930 bp), and two inverted repeated regions (IRs, 26,292-26,363 bp) ( Table 2). The total Guanine-cytosine (GC) content of all the Maddenia plastomes was 36.6%, but the GC content in IRs (42.5-42.6%) was higher than that in LSC (34.4%) and SSC (30.4-30.5%). All the Maddenia plastomes encoded 113 unique genes, including 79 protein-coding genes (CDS), four ribosomal RNAs (rRNAs), and 30 tRNAs. In addition, 17 genes were duplicated in the IRs, of which 6, 4, and 7 encoded proteins, rRNAs, and tRNAs, respectively ( Table 2). In Maddenia plastomes, 14 unique genes had introns, of which two (ycf3 and clpP) had two introns (Supplementary Table 1). The genome size, GC content, gene number, and order in all the Maddenia plastomes were relatively conserved in comparison to the outgroups ( Table 2).

Plastome Comparisons
Overall, Maddenia plastomes showed high sequence similarity, and non-coding regions had more divergence than coding regions (Figure 3). In the Mauve analysis, no rearrangement event was detected among the Maddenia plastomes (Supplementary Figure 1).
Nucleotide substitution and sequence distance were used to compare the difference of plastomes between the seven Maddenia species. Across all individuals, the number of nucleotide substitutions was 0-266 bp and the pairwise sequence distance percentage among the whole plastome sequences was 0-0.00169. The sequence differences between P. fujianensis, P. himalayana, and P. gongshanensis were much higher than those between P. hypoleuca, P. hypoxantha, P. incisoserrata, and P. wilsonii (Table 3).

IRs Expansion and Contraction
Given that there were no significant differences among the Maddenia plastomes (Supplementary Figures 1, 2), P. wilsonii WX202 in Maddenia was chosen to conduct border comparisons. Six Rosaceae species, i.e., Physocarpus amurensis (Maxim.) Maxim. WX230 and Prinsepia uniflora Batalin WX231 from Amygdaloideae; Rosa multiflora Thunb. (NC_039989) and Fragaria vesca L. (NC_015206) from Rosoideae; Dryas octopetala var. Asiatica (Nakai) Nakai (KY420029) and Purshia tridentata (Pursh) DC. (KY420000) from Dryadoideae, were compared to P. wilsonii WX202. Variation was detected in the expansion and contraction of IR regions (Figure 4). The LSC/IRb borders of Amygdaloideae species and Purshia tridentatawere located in the rps19 gene, which extended 81-134 bp into the IRb. In Rosoideae species, the LSC/IRb borders were in the intergenic spacers, and the intact rps19 gene in the LSC contracted 12-13 bp from the LSC/IRb border. In addition, the SSC/IRs borders were in the ndhF/ψycf1 and ycf1 genes except for the Rosa multiflora (only in ycf1 gene). The IRb/SSC border was located in the pseudogene ycf1 for Fragaria vesca and Dryas octopetala var. asiatica, and between the pseudogene ycf1 and ndhF gene for Rosa multiflora. The ndhF gene extended 9-29 bp into the IRb in the Amygdaloideae species and Purshia tridentata but was completely located in the SSC for Rosoideae species and Dryas octopetala var. asiatica. The SSC/IRa border was located in the ycf1 gene across the Rosaceae species. The gene trnH in the LSC contracted 3-324 bp from the border region of IRa/LSC.

Identification of Hypervariable Regions
The Pi values were used to determine hypervariable regions. The result showed that the Pi values in IRs were less than those in LSC and SSC. We chose the regions with relatively higher Pi values as hypervariable regions. A total of eight hypervariable regions were identified, including seven intergenic spacer regions (trnS-trnG, trnR-atpA, trnC-petN, trnT-trnL, ndhC-trnV, ndhF-rpl32, and rpl32-trnL) and one protein-coding region (ycf1). These sequences were all located in two single-copy regions and none in IR regions (Figure 5). The Pi value of eight hypervariable regions ranged from 0.01619-0.03251 (Table 4).

The number in braces means duplicated gene number in the IR region.
Frontiers in Plant Science | www.frontiersin.org FIGURE 3 | Visualization of alignment of the seven Maddenia chloroplast genome sequences using mVISTA. P. wilsonii WX202 was used as a reference sequence. Blue represents coding regions, and pink represents non-coding regions.
showed the most SSRs, followed by P. hypoxantha, P. wilsonii, P. hypoleuca, P. incisoserrata, and P. himalayana, while the SSR numbers of P. gongshanensis were the least ( Figure 6B). There were many SSR motif types in the Maddenia plastomes, but most of them had few SSRs and only two types (A/T and AT/TA) contained more SSRs (Supplementary Figure 3A). The lengths of SSRs ranged from 10 to 24 bp ( Figure 6C).
In Maddenia plastomes, the repeat sequences included forward, reverse, palindromic, complement, and tandem repeats. P. incisoserrata contained the most repeat sequences, P. himalayana had the least, and P. gongshanensis had no tandem repeats ( Figure 6D). The most common type of repeat sequences by length was 30-34 bp ( Figure 6A).
Most of the SSRs and repeat sequences were located in the LSC, followed by the SSC and IRs (Figure 6E). In addition, repeat sequences were mainly distributed in the intergenic spacers (IGS), but some were also found in the CDS and intron regions (Supplementary Figure 3B).

Phylogenetic Analysis
To resolve the phylogenetic relationship of the Maddenia clade, different trees were reconstructed based on complete plastomes and SCN genes. For plastome data, all the trees had identical topology except SSC and IR (Figure 7A and Supplementary Figure 4). In the rest of this section, the tree based on complete plastomes will be used to discuss the phylogenetic relationships of the Maddenia group, which was monophyletic and was separated into three subclades with high support values. Subclade I only include one species (P. fujianensis) from Fujian Province of eastern China. Subclade II is sister to subclade III, and together they are both sister to subclade I with a posterior probability of 1.00 Subclade II   consists of P. gongshanensis from Yunnan Province (China) and P. himalayana from Tibet (China) and the adjacent Himalayan region. For the individuals sampled, the two species are reciprocally monophyletic. Subclade III is composed of four former species with a posterior probability of 1.00 and bootstrap value of 100%, in which samples from the same geographical position were grouped, although the four species were each not clearly identified.
To save costs of sequencing for further investigations on Maddenia, we also tried to explore four standard DNA barcodes to identify Maddenia species. Concatenated rbcL, matK, and trnH-psbA and ITS datasets were used to construct the phylogenetic trees of Maddenia, respectively. The trees constructed by the standard DNA barcodes were not congruent with those reconstructed using the complete plastomes (Figures 7C,D). In the phylogenic tree based on concatenated rbcL, matK, and trnH-psbA, although the Maddenia species formed a clade, P. fujianensis was sister to subclade III rather than to the remaining Maddenia species. In addition, the two P. himalayana individuals did not group, and subclade III exhibited more polytomies than the tree based on complete plastomes. For the tree based on ITS sequences, there were many polytomies and the interspecific relationships within Maddenia were poorly resolved.
Eight concatenated hypervariable regions ("specific barcodes"; see Discussion) were also employed to reconstruct the phylogenetic relationship of the Maddenia clade. We found that the topology based on the hypervariable regions was similar to that of complete plastomes, though there were lower support values at some branches ( Figure 7B).
The recovery efficiency of each SCN gene is shown in Figure 8. The quality of the nuclear genes recovered was relatively high. In total, we got 446 SCN genes from raw data. We also filtered out genes with <80% samples, leaving  413 SCN genes with more than 600 bp in length. For the tree generated by 446 SCN genes data, the Maddenia clade was monophyletic but its deep nodes were not resolved well (Figure 9). It was obvious that there were five subclades in the Maddenia internal clade according to their geographic positions (e.g., Fujian Province, Yunnan Province, Tibet, Sichuan Province, and Qingling Mountain). Subclades A, B, and C comprise P. fujianensis, P. gongshanensis, and P. himalayana, respectively. The monophyly of each of the three species was well-supported, which was congruent with that of 413 SCN genes (Supplementary Figure 5). Subclade D consists of samples of P. hypoxantha and one individual of P. wilsonii. Subclade E contains P. hypoleuca, P. incisoserrata, and two individuals of P. wilsonii. However, these four species are not identified clearly.

Morphological and Micromorphological Traits
The mature leaves of the Maddenia species are green to deep green adaxially. The shapes of the leaves and leaf bases are multiple ( Figure 11A 1−7 ). The leaf margins are serrulate, irregularly serrate, or doubly serrate. A few glandular teeth were found at the leaf bases of P. hypoleuca, P. incisoserrata, P. wilsonii, P. hypoxantha, and P. fujianensis ( Figure 11B 1−5 ), while many glandular teeth grow at the lower margins of P. gongshanensis (Figure 11A 6 ). For P. himalayana, leaf margins have less glandular on the foliage branch while, in the reproductive branch, the glandular teeth are distributed abundantly near the bases (Figure 11B 7 ). The most notable morphological character differentiating species in Maddenia is the hair distribution on the abaxial leaf surface. In P. fujianensis, few hairs were found at the axils between midvein and secondary veins, and there is no hair on the intercostal area (Figure 11C 1 ,D 1 ). The intercostal area of P. hypoleuca and P. incisoserrata are also glabrous, but the distribution pattern of hairs at the axils shows high diversity (Figure 11C 2 , 3 ,D 2 , 3 ). There were no hairs or only a few hairs growing on the bases of secondary veins, or there was a cluster of hairs growing on the axil. Such three situations could be found on a single blade of the leaf. In P. wilsonii and P. hypoxantha hairs grow all along the veins, but the hairs are present on veinlets in the intercostal area of P. wilsonii, which distinguishes it from P. hypoxantha (Figure 11C 4 , 5 ,D 4 , 5 ). Hairs were also observed at the axils of P. gongshanensis, and sometimes there are a few hairs on the midvein at the base (Figure 11C 6 ,D 6 ). Leaves of P. himalayana are densely pubescent on the abaxial side (Figure 11C 7 ,D 7 ).
In all seven species, stomata are found only on the abaxial surface, and each of them consists of a pair of guard cells encircled by several other cells ( Figure 11E 1−7 ). Distinct circular ornamentations were found on the cell wall of guard cells in P. hypoleuca and P. incisoserrata (Figure 11E 2 , 3 ). In the other species, such ornamentations are relatively obscure or nearly inexistent (Figure 11E 1 , 4−7 ).

Comparative Plastomes of Maddenia
All sequenced Maddenia plastomes share a typical quadripartite structure, which is similar to most photosynthetic angiosperms (Jansen and Ruhlman, 2012;Abdullah et al., 2019;Xu et al., 2019). However, the loss of one complete IR region also occurred in some taxa, such as the inverted-repeat-lacking clade of Fabaceae (Wang et al., 2017), Erodium of Geraniaceae (Guisinger et al., 2011), and Carnegiea of Cactaceae (Sanderson et al., 2015). In addition, the GC content in the IRs was higher than that in LSC and SSC, which is due to the presence of rRNA genes with high GC content (Kim and Lee, 2004). The conserved genome size, GC content, and gene number of Maddenia plastomes resemble other Amygdaloideae species (Wang et al., 2013;Kim et al., 2018). Although gene rearrangement events have been reported in some genera of other families, such as Lasthenia of Asteraceae (Walker et al., 2014), Anemone of Ranunculaceae , and Passiflora of Passifloraceae (Rabah et al., 2018), we observed no such events in Maddenia plastomes (Supplementary Figure 1).
The expansion and contraction of the IR region have an impact on the plastome size to some extent (Jansen and Ruhlman, 2012). Expansion events caused several genes in SC regions to enter the IR region. However, small IR expansions and contractions have a much higher frequency than large ones in seed plants (Goulding et al., 1996;Downie and Jansen, 2015). For Maddenia, no significant variation and slight IR expansions and contractions exist in every border of plastomes (Supplementary Figure 2), demonstrating their conserved traits. Nevertheless, compared with other Rosaceae species, we observe that Rosoideae plastomes have a partial rps19 gene in the LSC region but an intact rps19 gene in the LSC of Amygdaloideae. Variations in the location of the rps19 gene were also documented in other Rosaceae species (e.g., Wang et al., 2013;Kim et al., 2018). However, our results indicate that the location of the rps19 gene is not useful in distinguishing the three subfamilies, since the two Dryadoideae plastomes we analyzed showed two different locations of the rps19 gene: one matching Amygdaoideae and the other matching a member of Rosoideae.
The SSRs are effective molecular markers to population genetic and phylogenetic studies in plants (Powell et al., 1995;Doorduin et al., 2011;Zhang X. et al., 2017;Sun et al., 2020). A total of 558 SSRs were identified in seven Maddenia species (Figure 6). The SSRs motif type (A/T) was quite common and most of SSRs were located in the intergenic spacers, which were similar to other Rosaceae species (Wang et al., 2013). In addition, previous studies supported that the region rich in A/T had the most repeats and indels (Cai et al., 2008). Thus, Maddenia SSRs could be further utilized for population genetics research in the future.

Phylogenetic Analyses and Implications for Species Delimitation
Our results provided strong support for the monophyly of the Maddenia clade based on both plastome and nuclear datasets. The phylogenetic trees based on complete plastomes and nuclear datasets divided Maddenia into three and five major subclades, respectively. Subclade I in the plastome tree (= subclade A in the nuclear tree) included one species (P. fujianensis) only distributed at Wuyi Mountain of Fujian Province in Southeast of China. Maddenia fujianensis has been treated as a synonym of P. hypoleuca, for the differentiating characters between them are continuous (Wen and Shi, 2012). P. fujianensis was, however, sister to all remaining species of the Maddenia group based on plastome data. Even though this relationship was not congruent with that of SCN genes (Figure 9), the monophyly of P. fujianensis was well-supported. The SNaQ analyses suggested one hybridization event between P. fujianensis JR302 and P. incisoserrata JR334 (Figure 10). Therefore, it is likely that P. fujianensis may be a cryptic species, even though it is morphologically and micromorphologically similar to P. hypoleuca (Figure 11). More attention should be focused on the origin of P. fujianensis in our future studies, sampling P. hypoleuca broadly across its entire distribution range.   Subclade II in the plastome tree consists of P. gongshanensis and P. himalayana (= subclades B and C, respectively, in the nuclear tree) distributed in Yunnan Province (Northwest Hengduan Mountains) and Southeast Tibet, respectively. P. gongshanensis is characterized by subcordate to cordate leaf bases and glabrous leaf surfaces. P. himalayana stands out by its abaxially dense pubescent leaf blade. Although these two species can be identified based on both molecular, morphological, and micromorphological evidence, one hybridization event between them was detected in the SNaQ analyses (Figures 10, 11).
Subclade III in the plastome tree (comprising subclades D and E in the nuclear tree) is composed of the remaining four species, but their relationships were not resolved. Interestingly, most different species from the same geographical area were grouped, such as P. hypoxantha JR426 and P. wilsonii WX202 from the Emei Mountain of Sichuan Province and P. hypoxantha JR372 from Kangding of Sichuan Province, and all others from the Qinling Mountain region. Meanwhile, the nuclear tree showed that these four species were divided into two groups according to their geographical distribution. Therefore, we assume that Maddenia subclade III might represent two species, i.e. P. hypoleuca and P. hypoxantha, which is congruent with the treatment of Wen and Shi (2012). P. incisoserrata may best be merged with P. hypoleuca because they cannot be reliably distinguished by either molecular or morphological evidence (Figure 11). Although P. hypoxantha and P. wilsonii can be identified, to some extent, by the distribution of pubescence on the abaxial leaf surface (i.e., pubescence only on veins vs. denser pubescence on veins (Figure 11), we conclude that the latter should be treated as a synonym of the former due to the unsolved relationship between them in the various phylogenetic trees based on our results. Moreover, the sequence differences among species of subclade III of chloroplast tree are minimal ( Table 3). Gene flow might be widespread within these two species as detected by the SNaQ analyses (Figure 10). Future studies should aim to explore the speciation history of subclade III using a broader populational sampling scheme.

On Specific Barcoding of Maddenia
Considering the limitations of the standard DNA barcodes and the higher cost of super barcoding, an alternative approach known as "specific barcoding" was proposed, which combined the advantages of the other two (Li et al., 2015). Specific barcoding uses the sequences in target plastomes with high mutation rates. Compared to standard DNA barcodes, specific barcoding is more applicable for differentiation among closely related taxa (Li et al., 2015). We detected eight hypervariable regions among 22 individual plastomes, most of which are located in intergenic spacers (Figure 5). This result as well as those from mVISTA, SSRs, and repeat sequence analyses support that the intergenic spacers harbor the highest levels of variation in plastomes. High variability regions in the intergenic spacer have been reported in other studies and shown excellent discriminating ability, such as Echinacea of Asteraceae , Rhodiola of Crassulaceae , and Pulsatilla of Ranunculaceae (Li et al., 2020). Therefore, developing specific barcodes in the intergenic spacers is well-founded and should provide a reliable approach to assess the phylogenetic relationships and identification among Maddenia species.
The tree estimated from the specific barcoding regions ( Figure 7B) had a similar topology with that of the complete chloroplast genomes. However, the sister relationship between subclades II and III was relatively weakly supported (posterior probability of 0.65 and bootstrap support of 58%). At the same time, the gene flow between the species of subclade III within Maddenia is active. The utility of these barcodes from chloroplast genomes would be limited for this group. More sampling in population-scale and high-throughput sequencing nuclear data such as RAD or whole-genome resequencing are needed to further explore the relationships of the species of subclade III.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article.

AUTHOR CONTRIBUTIONS
LZ and JW planned and designed the research. NS, B-BL, J-RW, CR, and R-CT performed the experiments and analyzed the data. NS, B-BL, J-RW, R-CT, CR, Z-YC, LZ, DP, and JW wrote the manuscript. All authors approved the final manuscript.

FUNDING
This project was supported by the National Natural Science Foundation of China (Nos. 32170381, 31770200, 32000163, and 31300158) and the Chinese Universities Scientific Fund (No. 2452020179).

ACKNOWLEDGMENTS
We are very grateful to Fuzhen Guo, Xiaohua He, Minrong Luo, Ningjuan Fan, and Guoyun Zhang of Northwest A&F University for their assistance with the SEM and LM. We sincerely thank Prof. Rong Li for the sample collection and Prof. Zhong-hu Li for his help with data analysis.