A Dynamic Tandem Repeat in Monocotyledons Inferred from a Comparative Analysis of Chloroplast Genomes in Melanthiaceae

Chloroplast genomes (cpDNA) are highly valuable resources for evolutionary studies of angiosperms, since they are highly conserved, are small in size, and play critical roles in plants. Slipped-strand mispairing (SSM) was assumed to be a mechanism for generating repeat units in cpDNA. However, research on the employment of different small repeated sequences through SSM events, which may induce the accumulation of distinct types of repeats within the same region in cpDNA, has not been documented. Here, we sequenced two chloroplast genomes from the endemic species Heloniopsis tubiflora (Korea) and Xerophyllum tenax (USA) to cover the gap between molecular data and explore “hot spots” for genomic events in Melanthiaceae. Comparative analysis of 23 complete cpDNA sequences revealed that there were different stages of deletion in the rps16 region across the Melanthiaceae. Based on the partial or complete loss of rps16 gene in cpDNA, we have firstly reported potential molecular markers for recognizing two sections (Veratrum and Fuscoveratrum) of Veratrum. Melathiaceae exhibits a significant change in the junction between large single copy and inverted repeat regions, ranging from trnH_GUG to a part of rps3. Our results show an accumulation of tandem repeats in the rpl23-ycf2 regions of cpDNAs. Small conserved sequences exist and flank tandem repeats in further observation of this region across most of the examined taxa of Liliales. Therefore, we propose three scenarios in which different small repeated sequences were used during SSM events to generate newly distinct types of repeats. Occasionally, prior to the SSM process, point mutation event and double strand break repair occurred and induced the formation of initial repeat units which are indispensable in the SSM process. SSM may have likely occurred more frequently for short repeats than for long repeat sequences in tribe Parideae (Melanthiaceae, Liliales). Collectively, these findings add new evidence of dynamic results from SSM in chloroplast genomes which can be useful for further evolutionary studies in angiosperms. Additionally, genomics events in cpDNA are potential resources for mining molecular markers in Liliales.


INTRODUCTION
Chloroplast genome sequences provide useful information for phylogenetic studies of higher level taxa, including families and orders (Zomlefer et al., 2001;Ji et al., 2006;Barrett et al., 2013;Kim et al., , 2016aNguyen et al., 2013;Ruhfel et al., 2014). Structural changes such as small and large inversions, gene contents (duplication, triplication, and deletion), and pseudogenization have provided valuable resources for examining genome evolution among plants. Gene duplications have been reported in previous studies (Lee et al., 2007;Cai et al., 2008;Schmickl et al., 2009). Specifically, different copies of trnF_GAA were found in several genera of Brassicaceae (Schmickl et al., 2009). Additionally, repeated DNA sequences, which were assumed to have originated from different mechanisms such as gene conversion, unequal recombination, and slipped-strand mispairing (SSM), are main resources for genomic events of duplication, deletion, and rearrangement in chloroplast genomes (Levinson and Gutman, 1987;Cai et al., 2008;Huang et al., 2014;Sveinsson and Cronk, 2014).
Melanthiaceae is a family within the order Liliales that includes 16 genera divided into five tribes: Melanthieae (7 genera), Heloniadeae (3 genera), Parideae (3 genera), Chionographideae (2 genera), and Xerophylleae (1 genus) (Angiosperm Phylogeny Group, 2009Govaerts, 2016;WCSP, 2016). Prior to its grouping within Liliales, these genera were classified into different orders of Dioscoreales and Melanthiales based on the morphological characteristics of their extrorse anthers and ovaries, and often with the presence of three styles . The tribe Parideae, comprising Paris, Psedotrillium, and Trillium, was formerly treated as an independent family, Trilliaceae (Thorne, 1992;Takhtajan, 1997). However, based on molecular and morphological data, this tribe was later reclassified as monophyletic within Liliales (Chase et al., 2000;Angiosperm Phylogeny Group, 2009;. Recently, Pellicer et al. (2014) identified the extreme variations in genome size and a significant reduction in the number of chromosomes in Parideae. The evolution of the chloroplast genome (cpDNA) has been investigated in Veratrum patulum (Do et al., 2013), Chionographis japonica (Bodin et al., 2013), Paris verticillata (Do et al., 2014), Trillium species (Kim et al., 2016b), and Paris sp. (Huang et al., 2016), which represent three tribes, Melanthieae, Chionographideae, and Parideae, respectively. Specifically, different numbers of trnI_CAU and repeat sequences in rpl23-ycf2 regions and inversion was detected in tribe Parideae (Do et al., 2014;Huang et al., 2016;Kim et al., 2016b). The rps16 gene was completely lost in C. japoncica and partially deleted in V. patulum (Bodin et al., 2013;Do et al., 2014). Collectively, these findings suggest that Melanthiaceae possess evidence of different genomic events in cpDNA. Nonetheless, these genomic events in Melanthiaceae have not been fully characterized because of the lack of cpDNA data.
In this study, we sequenced the complete chloroplast genomes of Heloniopsis tubiflora (GenBank Accession number KM078036) and Xerophyllum tenax (GenBank Accession number KM078035), representing the two unreported tribes of Heloniadeae and Xerophylleae, to cover the gap of cpDNA data within Melanthiaceae. Based on the complete cpDNA sequences, we characterized the differentiation, including gene loss, duplication, and fluctuation of IR-LSC boundary, among five tribes of Melanthiaceae. Then, we applied these features to create the first potential molecular marker for recognizing two sections of Veratrum. Additionally, we questioned the pattern and the mechanism of repeat's accumulation in rpl23-ycf2 regions. Therefore, we sequenced this region among representatives from other families in Liliales and conducted comparative analyses of sequence data to (1) investigate the pattern of repeat's accumulation within rpl23-ycf2 regions of examined species, and (2) propose hypothetical scenarios for the duplication process.

MATERIALS AND METHODS
Sample Collection, DNA Extraction, Whole-Genome Sequencing, and Assembly Fresh leaves of H. tubiflora were collected in Deogyusan National Park, South Korea. Voucher specimens were deposited in the Herbarium of Gachon University (GCU). Dried leaves of X. tenax were obtained from the Forestfarm Plant Nursery (Williams, Oregon, USA). The plant materials used in this study were collected through the KNRRC (Medicinal Plants Resources Bank NRF-2010-0005790), supported by the Korea Research Foundation (resources provided by the Ministry of Education, Science and Technology in 2014). Total DNA was extracted using a DNAEasy Plant Mini Kit (Qiagen, Seoul, South Korea). These DNA samples were sequenced using the 454 system for H. tubiflora and the Hiseq2000 system for X. tenax. After removal of reads with ambiguous "N" bases, the remaining reads were trimmed with no more than a 5% chance of error per base before being mapped to the reference chloroplast genome sequences of C. japonica (Bodin et al., 2013) and P. verticillata (Do et al., 2014), to isolate chloroplast genome sequences using Geneious (Biomatters Ltd., Auckland, New Zealand). Based on the tribal relationship in Melanthiaceae (Kim et al., 2016a), we mapped the reads of H. tubiflora and X. tenax to cpDNA sequences of C. japonica and P. verticillata, respectively. The assembled reads were then extracted and reanalyzed in Geneious using the De Novo Assembly tool with the option of "no gaps or mismatches per read." The consensus sequences generated from De Novo Assembly were used as references to reassemble raw reads. These steps were repeated until the complete cpDNA sequences were identified. Occasionally, gaps were present among chloroplast contigs. These remaining gaps were closed using the Sanger method with newly designed primers based on homologous sequences between the reads and reference sequences. Additionally, borders between the LSC, small single copy (SSC), and IR regions, as well as ambiguous regions (i.e., insertion and deletion events in coding regions and low coverage regions) were confirmed by Sanger sequencing methods. A total of 1,093,684 and 8,719,277 reads of H. tubiflora and X. tenax were generated, respectively. The results showed that cpDNA of H. tubiflora consisted of 37,973 (3.47%) out of 1,093,684 reads with a coverage rate of 19.2 x. For X. tenax, 196,299 reads (2.25%) belonged to chloroplast genome sequences with a coverage rate of 112.6 × over the cpDNA. The complete cpDNA sequences of X. tenax and H. tubiflora were deposited into GenBank under accession numbers KM078035 and KM078036, respectively (Table 1).

Genome Annotation, Comparison, Visualization, and Characterization of Repeat Sequences
The complete cpDNA sequences of H. tubiflora and X. tenax were annotated using Geneious. All tRNA sequences were confirmed using the online web-based tool tRNAScan-SE (Schattner et al., 2005). The Mauve alignment, embedded in Geneious, was used to compare the 23 complete cpDNA sequences and to identify significant differentiation, including gene loss, duplication, and fluctuation of IR-LSC boundary with default settings (Darling et al., 2004). Genome maps were generated using OGDraw v1.2 (Lohse et al., 2013), followed by manual modifications. The maps of cpDNA sequences of X. tenax and H. tubiflora, which illustrate the genome structure and gene composition and order, were shown in Figure 1. The locations of repeat sequences were identified using Phobos (Mayer, 2006) with default settings.

Characterization of rps16 Loss and Junction of IR-LSC Regions
After conducting alignment of complete cpDNA sequences using Geneious, we designed primer pairs for amplifying trnK_UUU-trnQ_UUG region, containing rps16 (data not shown). Because of different PCR results of verifying rps16 loss in tribe Melanthieae, primer pairs (rps16-F: 5 ′ -GTCAATATGAATGTTGATAA-3 ′ and rps16-R: 5 ′ -TTTTCTATTCCATACACATG-3 ′ ) were designed using Primer3 (Untergasser et al., 2012). The PCR profile consisted of denaturation at 94 • C for 3 min followed by 35 cycles at 94 • C for 1 min, 54 • C for 1 min, and 72 • C for 1 min, with a final extension at 72 • C for 7 min. These newly designed primer pairs were applied for Veratrum species of two sections including Veratrum (V. oxysepalum-JK Hong 015; V. lobelianum-Chase 19618; V. grandiflorum-KUN0303565) and Fuscoveratrum (V. versicolor-GCU1411788; V. maackii-KWU03358; V. nigrum-GCU121205181). All PCR products were purified using the MEGAquick-spin Total Fragment DNA Purification Kit (iNtRON Biotechnology, Seoul, Korea) and sequenced using the BigDye Terminator Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. These sequences were assembled and annotated in Geneious.
For identifying junction of IR/LSC regions, we designed primer pairs for rpl2-psbA and rpl2-rps3 regions (data not shown). The PCR products were sequenced and then annotated in Geneious. After getting annotation results, these sequences were aligned to identify the border among examined species.

Confirmation of Duplication Events in Liliales
To confirm the duplication patterns in Liliales, we sampled 68 taxa of 9 families within Liliales, except Corsiaceae which contains mycoheterotrophic species ( Table 2). Total genomic DNA was extracted from dried leaves using a modified CTAB method based on Doyle and Doyle (1987). Primer pairs that amplify the entire rpl23-ycf2 IGS were applied with the same PCR profile from the description of Kim et al. (2016b). The further steps for PCR product's treatment were conducted identically as the description above.

Features of cpDNA among Five Families of Liliales and Five Tribes of Melanthiaceae
The lengths of circular double-stranded DNA molecules differed among the families, ranging from 152,793 bp (Lilium longiflorum, Liliaceae) to 158,451 bp (Paris luquanensis, Melanthiaceae, Table 1). In Melanthiaceae, V. patulum (tribe Melanthieae) possesses the smallest cpDNA whereas the biggest cpDNA belongs to Paris species (tribe Parideae, Table 1). Although, the lengths varied, the AT and GC contents were relatively stable among the observed taxa ( Table 1). Most of the examined species have 81 protein-coding genes, 30 tRNAs and 4 rRNAs in cpDNA sequences. However, there were 80 protein-coding genes in C. japonica, Colchicum autumnale, and Alstroemeria aurea because of the complete loss of rps16 in C. japonica and the deletion of whole region of ycf15 in C. autumnale and A. aurea. Comparative genomic analysis among five representative taxa of Melanthiaceae revealed that the deletion of rps16 was only found in tribes Chionographideae and Melanthieae. Further investigation on rps16 among genera of tribe Melanthieae revealed that the loss of rps16 was not common and only found in Veratrum, Toxicoscordion, and Schoenocaulon (Figure 2). Specifically, exon 1 of rps16 was deleted in Schoenocaulon whereas a part of exon 2 of this gene (47 bp) remained in Toxicoscordion. In contrast to complete loss of rps16 in C. japonica, only exon 2 of this gene was deleted in V. patulum, suggesting that there were different stages of this event in Veratrum genus which was divided into two sections: Veratrum and Fuscoveratrum. To track the loss of rps16, one primer pairs which covered the whole coding regions of rps16 was designed and applied for Veratrum species ( Figure 3A). As expected, the PCR results revealed that there were two types of deletion of rps16 in Veratrum ( Figure 3B). The first type was found in section Veratrum of which exon 2 of rps16 was lost. Remaining of exon 1 of rps16 resulted in a PCR product of ∼1.5 kb in three examined taxa of section Veratrum (V. oxysepalum, V. lobelianum, and V. grandiflorum; Figures 3A,B). In contrast, the deletion of whole coding region of rps16 in section Fuscoveratrum caused a 400 bp-PCR product in V. versicolor, V. maackii, and V. nigrum (Figures 3A,B). In Liliaceae, Smilacaceae, Altroemeriaceae, and Colchicaceae, the intact coding sequence of rps16 was found (Table 1, Figure 2).
Sequences flanking the LSC/IR junction were compared between other taxa of the Liliales (Figure 4). The IR/LSC borders varied among the taxa. Specifically, the IR/LSC borders located in coding region of rps19 in Liliaceae and Colchicaceae. Meanwhile, it expanded to a part of rpl22 in Smilacaceae (Figure 4). In Melanthiaceae, it occurred in the trnH_GUG/rps19 intergenic spacer in Veratrum and Toxicoscordion. However, in other taxa, it expanded into a part of rps19 (350 bp in Trillium), into the rps19/rpl22 intergenic spacer (IGS; Anticlea and Stenanthium), into the rpl22/rps3 IGS (Heloniopsis), and into a section of rps3 (6 bp in Xerophyllum and Paris; 83 bp in Chionographis; 65 bp in Schoenocaulon; 161 bp in Zigadenus). Compared to other tribes, Melanthieae possessed a wide range of IR/LSC junctions (Figure 4).

Accumulation of Repeat Sequences in Tribe Parideae
Further investigation of repeat sequences showed that the IGS between rpl23 and ycf2 containing trnI_CAU was extremely variable in length, ranging from 299 to 818 bp among Paris and Trillium, while a more stable length was detected in other species (  Table 2). In Paris species, this region contained a ranging copy number from 2 to 16 whereas the number of copy varied from 2 to 20 in Trillium taxa ( Table 2, Supplementary Data S1). Also, the length of these repeats was different in both genera, ranging from 24 to 155 bp in Paris and from 18 to 209 bp in Trillium (Supplementary Data S1). Additionally, we observed upstream and downstream of repeats because of the important role of initial repeats in SSM mechanism. As a result, we found two groups of small conserved repeated sequences in most of the   surveyed taxa ( Figure 5A, Supplementary Data S1). In the first group, there were two 7 bp-direct repeats which were located upstream and within the coding sequence of trnI_CAU. In the rest group, there was a cluster of direct repeats including R1 (5 ′ -CAAATTCCAAT-3 ′ ), R1 a (5 ′ -CCAATTCCAAT-3 ′ ), and R1 b (5 ′ -ATTCCA-3 ′ ).

Comparative Characteristics of cpDNA among Melanthiaceae Species and Its Implication
The cpDNA structures of representative species of Melanthiaceae consist of typical double-stranded DNA molecules and are highly conserved, as reported in previous angiosperm cpDNA studies (Palmer, 1991;Yang et al., 2010;Liu et al., 2012;Huang et al., 2013Huang et al., , 2014Huang et al., , 2016Luo et al., 2014;Nguyen et al., 2015). In this study, length variations were identified among the Melanthiaceae taxa ( Table 1). The longer sequences of cpDNA were found in Paris, Trillium, Xerophyllum, and Heloniopsis species which possessed either repeat units in rpl23-ycf2 regions or expansion of IR/LSC border. Although, C. japonica has the expansion of IR/LSC junction to rps3 (83 bp), the loss of rps16 caused a shorten length of its cpDNA. Therefore, it is suggested that the length variations within Melanthiaceae could have been led by the deletion and duplication of genes, as well as the expansion of IR regions. The comparative analysis among five families of Liliales revealed a notable variety of length and different losses of genes in cpDNA (Table 1). However, further studies, which cover all 10 families, should be conducted to investigate the overall trends of genomic events in Liliales. The rps16 gene, encoding ribosomal protein S16, is commonly detected in the plant chloroplast genomes. However, the loss of this gene was also recorded in different taxa including Connarus, Epifagus, Pinus, Viola, Fagus, legume species, and etc. (Downie and Palmer, 1992;Doyle et al., 1995). For understanding this loss, it was proposed that the rps16 gene was transferred to the nucleus and its protein product was able to target both chloroplast and mitochondria in the case of Medicago truncatula and Populus alba (Ueda et al., 2008). Additionally, deletion of rps16 was found in a moss species of Physcomitrella patens subsp. patens (Sugiura et al., 2003). These results suggest that the transfer event of rps16 occurred independently at the early divergence of plants. It was lost in C. japonica and partially or completely deleted in V. patulum, S. densum, and T. micranthus among the Melanthiaceae; therefore, ribosomal protein S16 was predicted to be untranscribed and untranslated from cpDNA. However, this deficiency could be compensated from nuclear rps16 products as described in a previous study (Ueda et al., 2008). In contrast to the deletion of exon 2 and complete loss of rps16 in Chionographis and Veratrum, the deletion of exon 1 and remains of a piece of exon 2 were recorded in Schoenocaulon and Toxicoscordion, respectively. Additionally, 22 out of 26 species of Schoenocaulon are endemic from the Southern of United States of America to Peru (Zomlefer and Judd, 2008). Therefore, this genomic feature might contribute to investigating the evolution of cpDNA in this genus. Further studies which cover all species of Schoenocaulon and Toxicoscordion should be conducted to clarify the overview of this feature in the tribe Melanthieae. Furthermore, two types of rps16 deletions were found in two sections of Veratrum which were distinguished by characteristics of leaf, style, and sheath of stem base (Chen and Takahashi, 2000;Zomlefer et al., 2003; Figure 3). Previously, genomic events in chloroplast genome sequences were specifically detected in some species and could be molecular markers. For example, the inversion of the trnV_UAC-atpB region was only detected in species of Trillium subgenus Phyllantherum of Melanthiaceae and the loss of ycf15 was observed in tribe Colchiceae of Colchicaceae (Nguyen et al., 2015;Kim et al., 2016b). In this study, based on the finding of partial or complete loss of rps16, we provide the first potential molecular maker for recognizing two sections among Veratrum (Figure 3). From these results, it is likely that genomic events in chloroplast genomes are effective for making molecular markers and reflect the phylogeny among Liliales taxa.
In general, IR expansion affects length variation in cpDNA. For example, the expansion of the IR region (36,501 bp) into psbB in Mahonia bealei cpDNA led to an increased total genome length (164,792 bp; Ma et al., 2013). In Melanthiaceae, the IR/LSC junctions were also variable (Figure 4). This variability affected the total length of the cpDNA region. For instance, the IR/LSC junction expansion from trnH_GUG into rps3 resulted in an increased length of the IR region from 26,360 bp in V. patulum to 28,373 bp in P. verticillata (Table 1). Wang et al. (2008) suggested that the IR/LSC junctions in the Liliales taxa contained the trnH_GUG-psbA cluster, but variable patterns of junction existed in the order Liliales (Figure 4). Within the same family of monocots, IR/LSC junctions contained similarities; for example, the boundaries located in the rps19 and rpl22 genes of the Arecaceae and Orchidaceae, respectively (Huang et al., 2013;Luo et al., 2014). A similar trend was observed in dicots species of the Araliaceae, in which a common IR/LSC boundary was detected in the rps19 gene (Li et al., 2013). In contrast, the border of IR/LSC varied among Melanthiaceae in which the IR region was expanded from trnH_GUG into a part of rps3 (Figure 4). Significantly, there were three different borders in tribe Melanthieae (Figure 4). The unique expansion into 161 bp of rps3 might be a potentially molecular marker for monotypic species-Z. glaberrimus (Figure 4).

Hypothetical Scenarios for Dynamic SSM Events in cpDNA
Previously, the DSB mechanism induced recombination in Chlomydomonas reinhardtii cpDNA (Dürrenberger et al., 1996). Kwon et al. (2010) reported the DSB repair pathways from both microhomology and no homology in Arabidopsis. Additionally, cpDNA sequences typically contain two inverted repeat regions which can be reversely used as a template for repairing the break of DNA through recombination. Tandem repeats ranging from 6 to 33 bp in IR region was discovered in Oenothera species (Onagraceae, Myrtales) (Blasko et al., 1988;Nimzyk et al., 1993;Sears et al., 1996), and tandem repeats comprising a 29bp sequence have been found in the rps8-rpl14 IGS of the LSC region of Oenothera (Wolfson et al., 1991). The copy correction of IR regions after imprecise alignment, replication slippage, and recombination have also been proposed as a mechanism for the accumulation of tandem repeats in Oenothrea cpDNA (Blasko et al., 1988;Wolfson et al., 1991;Sears et al., 1996). Recently, Massouh et al. (2016) surveyed and found spontaneous mutants in chloroplast genomes of Oenothera which were mostly caused by the replication slippage events. SSM was believed to be a major factor for DNA evolution (Levinson and Gutman, 1987). Although, results of SSM were previously reported in cpDNA of angiosperms, there have not been records of utilization of different small conserved repeats in the same region of cpDNA for generating newly repeated sequences. In this study, due to the presence of the conserved regions which flanked tandem repeats, we proposed three different patterns for generating the repeated sequences among Parideae taxa ( Figure 5A). In the first scenario (I), the R1 b sequence was utilized through SSM mechanism to form three tandem repeats of 164 bp in Trillium govanianum which includes the whole trnI_CAU sequence (Supplementary Data S1). Within the second pathway (II), prior to the process of SSM, a point mutation which changed the adenine base to cytosine base to create a perfect direct repeat between R1 and R1 a occurred. The present of direct repeat (R1 a ) induced SSM process which resulted in formation of two repeats in Trillium taxa. Generally, in the SSM mechanism, initial repeats play an important role. Therefore, in the third case (III), we proposed the formation of initial repeats through the double-strand break (DSB) repair mechanism ( Figure 5B). Specifically, two repair mechanisms may be involved in this case due to the difference in repeat contents among species. First, in the III-A subcase, unequal recombination occurred downstream of trnI_CAU and induced the formation of initial repeats which were employed in SSM process. Meanwhile, in the second subcase (III-B), repairing mechanism of DSB through homology facilitated illegitimate recombination (HFIR) occurred based on direct repeat sequences of 7 bp (5 ′ -ATGGATG-3 ′ ) to create a longer 16 bp-repeat unit (5 ′ -ATGGATGCTTAACAGG-3 ′ ) which was assumed to be an initial repeat unit for SSM event. Because of the different initial repeat units, SSM events occurred and resulted in newly distinct types of repeat sequences in both Paris and Trillium species ( Figure 5B, Table 2, Supplementary Data S1). Albeit the sequence data supported our hypothetical scenarios, there was not essential evidence of in vivo experiment in this study. However, GuhaMajumdar et al. (2008) previously attempted to trace replication slippage in vivo and successfully confirmed this event from results of deletion and duplication in C. reinhardtii and Escherichia coli. Although, this study employed only one type of short tandem repeat, it fundamentally supported the reliability of three hypothetical scenarios in our study. Further studies, which use more types of small sequence repeats in the same region, should be conducted to provide substantial evidence for our hypothesis.
Recently, Kim et al. (2016b) used the number of trnI_CAU to classify the type of duplication events across Parideae. This classification was incongruent with infrageneric circumscription of Paris members, but not for Trillium species. In contrast, in terms of the origin of repeat sequences, there were no relationships between the classification within tribe Parideae and mechanisms of repeat's accumulation. For instance, in Paris, the formation of repeats could be explained by the (III) scenario, except P. incompleta whose repeats have likely arisen from the (I) pathway followed by point mutation events and P. japonica which reflected complex duplication processes ( Table 2, Supplementary Data S1). In Trillium, all three pathways can be found. For example, the (I) scenario was recorded only in T. govanianum. Meanwhile, the (II) and (III-A) pathways could be found in subgenus Phylantherum and subgenus Trillium, respectively. Moreover, repeats were not found in rpl23-ycf2 IGS of T. undulatum (Subgenus Trillium), or in T. decumbens and T. cuneatum (subgenus Phylantherum). These findings suggested that SSM events occurred independently across the tribe Parideae. Additionally, the number of <24 bp-repeat units was more abundant than those of over 24 bp in length, suggesting that that SSM occurred more frequently for short repeats than for long repeat sequences in the tribe Parideae (Melanthiaceae, Liliales). In Anticlea elegans and S. densum, two repeats (19 bp) were found ( Table 2), suggesting that the accumulation of repeats may also occur in tribe Melanthieae, which is composed of 7 genera and 78 species of Melanthiceae.
Although, sequence data provided evidence for different scenarios of SSM and its independence, there was not enough evidence regarding the alternation of initial repeats during the SSM process in Parideae. Notably, small conserved units were found in most of the examined taxa; however, within Liliales, the repeats in the rpl23-ycf2 IGS were mainly present in the tribe Parideae of Melanthiaceae. Therefore, variation within this region may be due to a unique genomic event in Parideae. Previous studies have found diverse genome sizes among Melanthiaceae (Pellicer et al., 2014). In contrast to the trend of reduced genome size in other tribes, Parideae exhibit significant increases in chromosome size and possess the largest nuclear genome in Melanthiaceae. This trend can also been seen in the patterns of repeats between Parideae and other tribes. It is likely that the causes of chromosome changes in Parideae might be related to the accumulation of repeats within this tribe. More studies should be conducted to shed light on the significance of these two unique features of the genomes of Parideae. Additionally, accumulation of repeat sequences was also found in rpl23-ycf2 IGS of monocots such as Acorus calamus (Accession number NC_007407), Sagittaria lichuanensis (Accession number NC_029815), Anomochloa marantoidea (Accession number NC_014062), Eustrephus latifolius (Accession number KM_233639), Curcuma roscoeana (Accession number KF_601574), and Musa acuminata subp malaccensis (Accession number HF677508; Data not shown), suggesting that the rpl23-ycf2 IGS may be one of the "hot spots" for genomic events in angiosperm species.

CONCLUSIONS
In conclusion, comparative analysis of cpDNA in Melanthiaceae revealed that genomic events including pseudogenization, duplication, and deletion in the chloroplast genome are precise sources for mining molecular marker in plants. Specifically, gene loss events of rps16 were potentially valuable molecular data for identifying two sections of the Veratrum species. Melanthiaceae also exhibits a significant change in junctions between LSC and IR regions. Additionally, we provided the first evidence of different employments of small repeat sequences for SSM in chloroplast genomes of monocots species. Though the origin of these differences remains unclear, these data highlight the dynamic molecular evolution in chloroplast genomes. With the increasing number of complete organelle genomes, these patterns could be detected in other species and be useful references for tracing genomic evolution among plants.

AUTHOR CONTRIBUTIONS
HDKD carried out the genomic experiment and drafted the manuscript. HDKD and J-HK participated in the design of the study and revised the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We would like to thank Sang-Chul Kim of Gachon University (Korea), Anett Krämer of the Botanische Gärten der Universität Bonn (Germany), Peter Brownless of Royal Botanic Garden Edinburgh (United Kingdom), Botanical Garden Maise (Belgium), Garden of Auckland (New Zealand), and Ian Christie from the Scottish Rock Garden Club for collecting and providing the plant materials for this study. Also, we would like to thank Dr. Jung Sung Kim (Gachon University, Korea), Dr. Michael A. Vincent (Miami University, Ohio, USA) and reviewers for helpful suggestions for preparing and improving this manuscript.