The first complete chloroplast genome of Thalictrum fargesii: insights into phylogeny and species identification

Introduction Thalictrum fargesii is a medicinal plant belonging to the genus Thalictrum of the Ranunculaceae family and has been used in herbal medicine in the Himalayan regions of China and India. This species is taxonomically challenging because of its morphological similarities to other species within the genus. Thus, herbal drugs from this species are frequently adulterated, substituted, or mixed with other species, thereby endangering consumer safety. Methods The present study aimed to sequence and assemble the entire chloroplast (cp) genome of T. fargesii using the Illumina HiSeq 2500 platform to better understand the genomic architecture, gene composition, and phylogenetic relationships within the Thalictrum. Results and discussion The cp genome was 155,929 bp long and contained large single-copy (85,395 bp) and small single-copy (17,576 bp) regions that were segregated by a pair of inverted repeat regions (26,479 bp) to form a quadripartite structure. The cp genome contains 133 genes, including 88 protein-coding genes (PCGs), 37 tRNA genes, and 8 rRNA genes. Additionally, this genome contains 64 codons that encode 20 amino acids, the most preferred of which are alanine and leucine. We identified 68 SSRs, 27 long repeats, and 242 high-confidence C-to-U RNA-editing sites in the cp genome. Moreover, we discovered seven divergent hotspot regions in the cp genome of T. fargesii, among which ndhD-psaC and rpl16-rps3 may be useful for developing molecular markers for identifying ethnodrug species and their contaminants. A comparative study with eight other species in the genus revealed that pafI and rps19 had highly variable sites in the cp genome of T. fargesii. Additionally, two special features, (i) the shortest length of the ycf1 gene at the IRA-SSC boundary and (ii) the distance between the rps19 fragment and trnH at the IRA-LSC junction, distinguish the cp genome of T. fargesii from those of other species within the genus. 
Furthermore, phylogenetic analysis revealed that T. fargesii was closely related to T. tenue and T. petaloideum. Conclusion Considering all these lines of evidence, our findings offer crucial molecular and evolutionary information that could play a significant role in further species identification, evolution, and phylogenetic studies on T. fargesii.


Introduction
The genus Thalictrum belongs to the subfamily Thalictroideae of Ranunculaceae, with the subgenera Thalictrum and Lecoyerium, which includes approximately 200 species and is distributed in Asia, related to their structural features and roles have been studied in T. fargesii (Khamidullina et al., 2006).
Because of its immense potential in herbal medicine, this species is collected from the wild or grown by rural farmers in China. Consequently, in local herbal markets, the species is often confused with morphologically similar relatives in the genus and is adulterated, altered, or mixed into herbal formulations, intentionally or not (Bainsal et al., 2022). This poses a serious threat to consumer safety. Thus, proper identification of T. fargesii, which is often misunderstood, confused, or incorrectly recorded during plant identification, is of great importance in herbal formulations. For example, during the examination of herbarium specimens, T. fortunei was found to be mixed with T. fargesii Franchet ex Finet & Gagnepain (1903: 608), resulting in wrong records of the geographical distribution of these species (Zeng et al., 2022). Moreover, Li et al. (2016) clarified the variations in T. fargesii and proposed a synonym for T. xingshanicum, revealing that the name of T. fargesii was previously misunderstood. In addition, herbarium collections of T. fargesii from Neijiang, Sichuan Province, were misinterpreted owing to the misidentification of collections of T. ramosum in Gansu Province (Zeng et al., 2023). These misidentifications can be explained by morphological variation related to environmental factors (e.g., weather, soil properties, and elevation) where the plants have grown (Hernández-Nicolás et al., 2017). It is important to eliminate such dilemmas using molecular approaches rather than morphological observations (Duminil and Di Michele, 2009).
Identifying member species within the Ranunculaceae family, especially within Thalictrum, is taxonomically challenging because these species have similar morphological traits and cannot be distinguished with universal molecular markers (Li et al., 2020). Previous studies have revealed that Thalictrum species form a highly supported clade, indicating that they are monophyletic (Cai et al., 2022; Michimoto et al., 2022). Thalictrum species exhibit small chromosomes compared with those of Ranunculus, and this genus is challenging and diverse in terms of taxonomy and phylogeny based on variations in morphological traits (Langlet, 1927; Tamura, 1995; Xiang et al., 2022). As a critical solution, chloroplast (cp) genomic resources have facilitated the classification of members of the Ranunculaceae family, providing data and insight into evolutionary relationships (Zhai et al., 2019), phylogenetic resolution (bootstrap support and tree certainty) (Morales-Briones et al., 2019), and marker exploration (Xiang et al., 2022).
Chloroplasts are the site of photosynthesis in green plants and play a significant role in coordinating the expression of different genes between these organelles and the nuclear genome in response to environmental stress, including cellular signaling (Daniell et al., 2016; Serrano et al., 2016; Yu et al., 2019; Tano et al., 2023). Because of their conserved gene content and organization and their low recombination and substitution rates compared with nuclear genomes, cp genomes serve as resources for phylogenetic analysis and evolutionary studies (Wolfe et al., 1987; Corriveau and Coleman, 1988; Birky, 2001; Provan et al., 2001; Zhang et al., 2003; Ravi et al., 2008). Moreover, DNA barcoding and molecular breeding have been studied using cp genomes (Daniell et al., 2016). Several strategies have been adopted to sequence the organelle genome, including amplification of the whole cp genome using a universal set of primers followed by sequencing (Cronn et al., 2008; Dong et al., 2013), and recovery of the cp genome from total genomic DNA (Velasco et al., 2007; McPherson et al., 2013). The importance of cp genome studies of folk medicinal plants has been demonstrated by several studies (He et al., 2021; Lin et al., 2021; Pu et al., 2022).
Several previous studies have reported the characteristics of cp genomes within Thalictrum, such as T. cirrhosum (Zhe et al., 2023), T. coreanum (Park et al., 2015), T. foliolosum (Pu et al., 2022), and Thalictrum aquilegiifolium var. sibiricum (Michimoto et al., 2022). The present research focuses on T. fargesii, emphasizing the need to construct its whole cp genome using next-generation sequencing and to analyze its characteristics, long repeats and simple sequence repeats (SSRs), phylogeny, nucleotide diversity, and the selection pressure on genes within Thalictrum. This study aimed to generate molecular data for the cp genome of T. fargesii to provide taxonomic and molecular information to biologists, practitioners, and professionals in herbal medicine, and to support the development of molecular markers that avoid misidentification.

DNA extraction and next-generation sequencing
The T. fargesii plant sample used for cp genome sequencing was identified and artificially reproduced in the Wenchuan Botanical Garden of the Aba Tibetan and Qiang Autonomous Prefecture in Sichuan Province, China. The voucher specimen (Accession No. CP00002) was identified and deposited at the Herbarium of Neijiang Normal University (Neijiang City, China; Shixi Chen, saihei@foxmail.com). After collection, leaf specimens were stored at room temperature and packaged with 0.2 g of silicon dioxide. We used a modified cetyltrimethylammonium bromide (CTAB) method to extract total genomic DNA from fresh leaves (Porebski et al., 1997). Short reads of the T. fargesii cp genome were sequenced using a genomic library with an insert size of 260 bp.
The products were prepared and sequenced on an Illumina HiSeq 2500 platform with approximately 436× coverage.

Comparison of the chloroplast genome within Thalictrum
Eight relevant Thalictrum cp genome sequences were downloaded from the GenBank database. Nine sequences, including T. fargesii, were analyzed to determine the degree of variation and sequence conservation within the genus (Supplementary Table 1). We compared the IR regions of all species using the web-based tool IRplus (https://irscope.shinyapps.io/IRplus/) to visualize the IR-SC boundaries and their gene orientations (Díez Menéndez et al., 2023). Sequence alignment was performed to identify variable sites among the nine Thalictrum cp genome sequences on mVISTA using the alignment program LAGAN (Mayor et al., 2000; Frazer et al., 2004). The sites were checked manually using BioEdit version 7 (Hall, 1999). The sites were subsequently analyzed, and the results are presented in Supplementary Table 2.

Relative synonymous codon usage analysis
Each amino acid is encoded by a minimum of one codon and a maximum of six codons owing to the degeneracy of the genetic code. MEGA 11 software was used to calculate the relative synonymous codon usage (RSCU) in the cpDNA of T. fargesii (Tamura et al., 2021).
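The RSCU calculation can be illustrated with a minimal Python sketch. It is not the MEGA 11 implementation: the function name `rscu` and the toy input are assumptions made for illustration, and codons are assigned to amino acids with the standard genetic code. For each codon, RSCU is the observed count divided by the count expected if all synonymous codons of that amino acid were used equally.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} for an iterable of in-frame coding sequences (hypothetical helper)."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid (or stop signal) they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)  # equal usage of all synonymous codons
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy input (not real data): alanine is encoded three times by GCT and once by GCA.
toy = rscu(["GCTGCTGCTGCA"])
print(toy["GCT"], toy["GCA"], toy["GCC"])
```

With this definition, a codon with RSCU > 1 is used more often than expected under equal usage, and the RSCU values within a codon family sum to the number of synonymous codons.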

Repeat analysis
SSR locus analysis was performed on the assembled T. fargesii cp genome sequence using the MIcroSAtellite identification tool (MISA) (Beier et al., 2017). The parameters (repeat unit length-minimum number of repeats) were set as follows: 1-10, 2-5, 3-4, 4-3, 5-3, and 6-3, and the minimum distance between two SSRs was set to 0 bp. The software package REPuter was used to conduct a long-repeat analysis with a minimum repeat size of 30 bp and a Hamming distance of 3 (Kurtz et al., 2001). Tandem Repeats Finder, an online program, was used to detect tandem repeats with the default parameter settings (Benson, 1999).
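The SSR thresholds above can be sketched as a small regular-expression search in Python. This is an illustration of the threshold logic only, not the MISA algorithm: the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical, compound SSRs are not merged, and the left-to-right regex scan may report a slightly different motif phase than MISA would at some loci.

```python
import re

# Repeat unit length -> minimum number of repeats (1-10, 2-5, 3-4, 4-3, 5-3, 6-3).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. do not report poly-A as an 'AA' dinucleotide SSR
                hits.append((m.start() + 1, m.end(), motif, (m.end() - m.start()) // unit))
    return sorted(hits)


# Toy sequence (not real data): a 12-bp poly-A run and six copies of AT.
toy = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCC"
for hit in find_ssrs(toy):
    print(hit)
```

In the toy sequence, the poly-A run is reported once as a mononucleotide SSR and the AT run once as a dinucleotide SSR; the same stretches also match the longer-unit patterns but are discarded because those motifs are not primitive.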

Predicting RNA editing sites
We predicted RNA editing sites in the cp genome of T. fargesii using a convolutional neural network (CNN) model-based tool called Deepred-mt (Edera et al., 2021).We extracted the PCGs of this cp genome and inputted them into the Deepred-mt tool for prediction, considering threshold probability values greater than 0.9 as reliable results.

Synonymous substitution and selective pressure
The natural selection that drives molecular evolution was analyzed for the Thalictrum chloroplast genomes by calculating the ratio of nonsynonymous (dn) to synonymous (ds) substitutions, termed ω (dn/ds). The ω value is an indicator of natural selection acting on PCGs. Values of ω > 1, ω = 1, and ω < 1 indicate positive, neutral, and negative/purifying selection, respectively. The stop codons of all 73 PCGs from these species were deleted, the sequences were aligned, and the dn, ds, and ω values were calculated using MEGA 11 (Tamura et al., 2021).
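The interpretation of the dn/ds ratio described above can be written as a short Python sketch. The function names `omega` and `classify_selection` and the tolerance used to treat a ratio as equal to 1 are assumptions for illustration; the dn and ds estimates themselves would come from a tool such as MEGA 11.

```python
def omega(dn, ds):
    """Return dn/ds, or None when ds is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def classify_selection(ratio, tol=1e-9):
    """Map a dn/ds ratio to the selection regime it indicates."""
    if ratio is None:
        return "undefined"
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Hypothetical dn and ds values, chosen only to show the three regimes.
for dn, ds in [(0.01, 0.02), (0.02, 0.02), (0.03, 0.02), (0.01, 0.0)]:
    print(dn, ds, classify_selection(omega(dn, ds)))
```

Genes with no synonymous substitutions (ds = 0) are reported as undefined rather than being assigned a regime, because the ratio cannot be computed for them.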

Phylogenetic analysis
To ascertain the phylogenetic position of T. fargesii within Thalictrum, nine species were analyzed, and Aconitum delavayi (NC_038097) was chosen as the outgroup. Full-length sequence alignment was performed using the MAFFT v7 plugin integrated into PhyloSuite v1.1.15 (Katoh and Standley, 2013). The dataset was refined using the Gblocks program with stringent parameter settings and then subjected to maximum likelihood (ML) analyses. Substitutional saturation was assessed using DAMBE version 7.0.68 (Xia, 2018), and phylogenetic trees were constructed using ML algorithms. ML phylogenies were inferred using RAxML v8.2.12 with 1,000 bootstrap replicates and the GTRGAMMA model (Stamatakis, 2014).

Features of the chloroplast genome
After removing the adapters and low-quality reads, we obtained 9.4 Gb of data for the T. fargesii cp genome. The cp genome was 155,929 bp in length, displayed a circular structure, and was divided into typical quadripartite regions, similar to most land plants (Figure 1). The cp genome includes two 26,479-bp inverted repeats (IRA and IRB), which separate the genome into a large single-copy (LSC) region of 85,395 bp and a small single-copy (SSC) region of 17,576 bp (Supplementary Table 1). The overall GC content of the cp genome was 38%.
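As a quick consistency check on the quadripartite structure, the region lengths reported above should add up to the total genome length, with the inverted repeat counted twice. A minimal Python sketch using only the figures given in the text (the variable names are illustrative):

```python
# Region lengths in bp, as reported for the T. fargesii cp genome.
LSC = 85_395   # large single-copy region
SSC = 17_576   # small single-copy region
IR = 26_479    # one inverted repeat; the genome carries two copies (IRA and IRB)

total = LSC + SSC + 2 * IR
print(total)  # 155929, matching the reported genome length

# Share of the genome occupied by each region, in percent.
for name, length in [("LSC", LSC), ("SSC", SSC), ("IRA + IRB", 2 * IR)]:
    print(f"{name}: {100 * length / total:.2f}%")
```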
The size of the cp genomes of Thalictrum ranged from 154,889 bp (T. thalictroides) to 156,103 bp (T. tenue). The cp genome of T. fargesii (155,929 bp) was the fourth largest and was similar in size to those of other species within the genus (Supplementary Table 1). Moreover, we found the most extended LSC region (85,395 bp) in the T. fargesii cp genome and the shortest LSC region (84,733 bp) in T. coreanum. The total number of genes ranged from 128 to 134, and the number of PCGs ranged from 83 to 89. Except for T. viscosum (38 tRNAs), most species, including T. fargesii, possess 37 tRNA genes.
As shown in Table 1, the PCGs have different functions, including one acetyl-CoA carboxylase, six ATP synthases, one ATP-dependent CLP protease, five cytochrome b6, one cytochrome c biogenesis protein, one cytochrome b/f, one envelope membrane protein, two hypothetical chloroplast RF2, one maturase K, 12 NADH dehydrogenases, one N-terminal nucleophile amino hydrolase superfamily, seven photosystem I, 14 photosystem II, 25 ribosomal proteins, one ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit, and four RNA polymerases. Each rRNA had two copies, distributed equally between the two IR regions. Duplicated tRNAs were not present in the LSC region. Moreover, rpl32, the rps19 fragment, one copy of ycf1, and infA were pseudogenes.

The base composition of the chloroplast genome
Base compositions of the LSC, SSC, and IR regions and codons at various locations were examined in the cp genome. The percentages of GC content in LSC, SSC, and IRs were 40.57%, 32.53%, and 43.23%, respectively, indicating that the percentage in the IR region was greater than that in the other regions. Moreover, the frequencies of adenine (A), thymine (T), guanine (G), and cytosine (C) were 30.46%, 31.12%, 18.84%, and 19.58%, respectively. The single-copy regions, LSC (59.43%) and SSC (67.48%), had higher AT content than the repeat regions (56.77%) (Table 2). The reduced AT content in the IR regions may be related to the presence of rRNA genes, which contain fewer AT nucleotides, contributing to genome stability and sequence complexity. Similarly, the CDS (protein-coding region) was rich in AT (61.24%), whereas the GC content was 38.73%. The first, second, and third codon positions each had a length of 25,038 bp and contained 29.10%-31.44% A, 23.44%-37.25% T, 16.69%-27.35% G, and 14.61%-20.67% C. These findings showed that the cp genome had a higher AT than GC content and that the codons preferred bases A and T.
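The base-composition figures above are simple percentages over a region or over each codon position. A minimal Python sketch of that calculation follows; the function names and the toy sequence are assumptions for illustration and do not reproduce the tool used in the study.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G, and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100 * counts[b] / n for b in "ACGT"}


def gc_content(seq):
    """GC percentage; the AT percentage is 100 minus this value."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def codon_position_composition(cds):
    """Base composition at the first, second, and third codon positions of an in-frame CDS."""
    return [base_composition(cds[pos::3]) for pos in range(3)]


# Toy CDS (not real data): ATG GCT TAA
toy = "ATGGCTTAA"
print(round(gc_content(toy), 2))
for pos, comp in enumerate(codon_position_composition(toy), start=1):
    print(pos, {base: round(pct, 1) for base, pct in comp.items()})
```

Because GC and AT percentages are complementary, a region with 40.57% GC has 59.43% AT, which is the relationship between the LSC values quoted above.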

Codon usage
The ratio of the observed frequency of a codon to its expected frequency, termed "relative synonymous codon usage" (RSCU), indicates codon usage bias in coding sequences. The protein-coding sequences (CDSs) of the T. fargesii cp genome were used to calculate RSCU via MEGA 11, and 64 codons were found. Three of these were stop codons and 61 encoded 20 amino acids. Methionine (Met) and tryptophan (Trp) are each encoded by a single codon, whereas the other amino acids are encoded by two to six codons (Figure 2). Among the 64 codons, 31 had RSCU > 1 in the CDS of the cp genome, indicating that they were high-frequency and optimal codons. A or U was preferred as the third base of the high-frequency codons, as evidenced by the fact that 16 of these codons ended in U, 13 in A, 1 in G, and 1 in C. The most preferred codon was GCU, which encodes alanine (Ala) and has an RSCU value of 2. UUA, which encodes leucine (Leu) and has an RSCU value of 1.86, was the second-most preferred codon.
Figure 1. Graphical representation of the features identified in the cp genome of T. fargesii. The species name, genome length, GC content, and number of genes are shown in the center of the plot. Genes are represented by arrows; genes inside and outside the circle are transcribed in opposite directions of the quadripartite structure of the cp genome. Different colors are used to distinguish between genes that belong to specific functional categories. At the lower left, the legend for the classification of the cp genes according to their function is shown. The darker shaded region inside the inner circle indicates the GC content, while the light color indicates the AT content of the cp genome. The gradient GC content of the cp genome is represented by the second circle, with a zero level based on the outer circle. The GC content of each gene is displayed as the proportion of shaded areas. The boundaries of the small single-copy (SSC), large single-copy (LSC), and inverted repeat (IRA, IRB) regions are denoted in the inner circle.

Comparison of IR boundaries within Thalictrum chloroplast genomes
The contraction and expansion of IR regions, which result in gene duplication, deletion, and pseudogene generation, contribute significantly to variations among the cp genomes of different species (Song et al., 2022). The locations of LSC/IR and SSC/IR junctions are regarded as indices of cp genome evolution (Zhang et al., 2013a). Figure 5 shows that the size of the cp genome within Thalictrum was 154,889-156,103 bp; the lengths of the LSC and SSC were 84,733-85,507 bp and 17,470-17,657 bp, respectively; and the lengths of the IR regions were 26,272-26,521 bp. The LSC/IRB border is located within rps19 and extends 53-119 bp into the IRB region. Notably, the rps19 gene spanned 119 bp from the LSC to IRB in the cp genome of T. coreanum, whereas it stretched by only 100 bp in the same direction in the T. fargesii plastome. In T. thalictroides, the LSC/IRB border is flanked by an intergenic region of 28 bp between rps19 and the border, and the IRB/SSC border is between the partial ycf1 and ndhF genes.
In the SSC region, the ndhF gene was larger in T. thalictroides than in the other species. Moreover, T. foliolosum lacks ndhF. The ycf1 fragment crosses the IRB/SSC border as a pseudogene in T. viscosum, T. cirrhosum, T. baicalense, and T. fargesii. In contrast, the SSC/IRA junction was positioned on the full-length ycf1 gene in all the species. However, the size of the overlapping ycf1 gene at the SSC/IRA junction varies among the Thalictrum species. Remarkably, this gene extended 1,039-1,152 bp into the IRA region in all species. In addition, the IRA/LSC border was located between the rpl2 and trnH genes, 42-77 bp from trnH toward the LSC region. Notably, exceptions were observed in T. fargesii and T. coreanum, where the rps19 fragment was present at the edge of the IRA/LSC. The rps19 fragment ended at the termination of the IRA region in T. coreanum, whereas it extended 2 bp toward the LSC in T. fargesii. These results demonstrate that the IR and SC borders of the cp genomes within Thalictrum are primarily similar in terms of organization, gene content, and gene order.

Divergence hotspot regions
Highly variable chloroplast genome sequences can be used to ascertain phylogenetic relationships between species and genera (Liu et al., 2018; Wu et al., 2018). To assess the degree of sequence divergence, we computed nucleotide diversity values (pi) to locate hotspots; the results for the entire T. fargesii cp genome are presented in a plot of pi values ranging from 0 to 0.033 (Figure 6). In the plot, the IR regions showed lower variability than the SSC and LSC regions. Remarkably, seven highly variable regions (pi > 0.02) were detected: three in the LSC (psbM-trnD-GUC, trnF-GAA-ndhJ, and atpB), two in the SSC (ycf1 and trnR-ACG-rrn4.5), and two in the IRA (ndhD-psaC and rpl16-rps3). These regions may also be employed as high-potential fragments to distinguish between different Thalictrum species and may help in the development of molecular markers.
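The hotspot scan described above can be sketched in Python as a sliding-window calculation of nucleotide diversity, defined here as the mean proportion of differing sites over all pairs of aligned sequences. The function names, the default window and step sizes (600 bp and 200 bp), and the handling of gaps are assumptions for illustration; the source does not state the software settings used.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites; sites with gaps or ambiguity are skipped."""
    if len(aligned) < 2:
        raise ValueError("at least two aligned sequences are required")
    pairs = list(combinations([s.upper() for s in aligned], 2))
    total = 0.0
    for a, b in pairs:
        sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
        if sites:
            total += sum(x != y for x, y in sites) / len(sites)
    return total / len(pairs)


def sliding_pi(aligned, window=600, step=200):
    """Return (window midpoint, pi) tuples along the alignment (0-based midpoints)."""
    length = len(aligned[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in aligned]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment (not real data): one sequence differs at the last site.
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(nucleotide_diversity(toy))
print(sliding_pi(toy, window=4, step=4))
# Windows above a chosen cutoff (0.02 in the text) would be flagged as hotspots.
```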

Selection pressure on genes
In this study, we analyzed the nonsynonymous (dn) and synonymous (ds) substitution rates and their ratio (ω = dn/ds) among 77 PCGs common to Thalictrum, and the results are shown in Supplementary Table 8. Among the 77 PCGs, the results of the statistical neutrality test indicated that the genes were relatively stable during the evolutionary process, and only 12 genes were found to have ω values of 1 or less (Supplementary Figure 2). Specifically, seven genes (atpF, matK, petG, rpl22, rps2, ycf1, and pafI) were under neutral selection (ω = 1), and four genes (atpE, ccsA, ndhG, and rpl20) had ω values of precisely 0.5, indicating that the genes were under purifying selection. The remaining gene, ndhF, had ω = 0.33, suggesting that it was also under purifying selection.

Phylogenetic analysis
The availability of a complete T. fargesii cp genome provided us with sequence data that could be used to investigate the phylogeny and, therefore, contribute to our understanding of evolutionary relationships within Thalictrum. In the present study, nine species were analyzed to ascertain their phylogenetic position, and Aconitum delavayi (NC_038097) was chosen as an outgroup. ML analysis yielded a single phylogenetic tree with high bootstrap values, in which the complete cp genomes of Thalictrum species formed one cluster divided into two clades (Figure 7). Hence, Thalictrum was strongly supported as a monophyletic group [bootstrap support (bs) = 100%]. This result indicates that T. fargesii is closely related to T. tenue and T. petaloideum.
Figure 2. Relative synonymous codon usage (RSCU) of 20 amino acids and stop codons in all protein-coding genes of the chloroplast genome of T. fargesii. The colors of the histograms correspond to the colors of the codons.

Variations in gene content within Thalictrum
Regarding variation in gene content, we observed that pafI and pafII were found only in the cp genomes of T. fargesii and T. cirrhosum (Supplementary Table 9). Moreover, in the case of pseudogenes, ycf1, infA, and rpl32 were identified in the T. baicalense, T. coreanum, T. thalictroides, and T. fargesii cp genomes, whereas the rps19 fragment (as a pseudogene) existed only in T. coreanum and T. fargesii, indicating significant variations. The remaining species in the genus showed a relative absence of pseudogenes in their cp genomes.

The conserved genome of Thalictrum
T. fargesii is of considerable importance in herbal medicine in Asian countries; however, taxonomic controversies need to be resolved, which may be possible from a detailed study of its plastome (Li et al., 2020). Such a systematic investigation and evolutionary analysis of T. fargesii has not been published previously. Therefore, our study reports the first whole cp genome composition of T. fargesii, which provides potential evidence via phylogenomics. Previous studies on other angiosperm groups have shown that chloroplast genomes are conserved or highly polymorphic (Abdullah et al., 2020, 2021). In the present study, the cp genome was highly conserved in terms of gene order, gene content, and intron number, in line with the findings for many other taxa, including Thalictrum (Tang et al., 2015; Yu et al., 2018; Xiang et al., 2022). However, in some taxa, such as Amorphophallus of Araceae, some genes were lost (Liu et al., 2019), and in others, such as Pelargonium (Chumley et al., 2006), the structure and gene order diverged from those reported here and in most other angiosperms.
Overall, the structure and organization of the nine observed cp genomes in Thalictrum, including T. fargesii, follow the typical structure commonly found in angiosperms, except for a difference in tRNA content in T. viscosum, which contains 38 tRNAs instead of 37 (Park et al., 2015; Daniell et al., 2016; He et al., 2019; Morales-Briones et al., 2019; He et al., 2021; Cai et al., 2022; Zhe et al., 2023). The total length, GC content, and gene composition were almost identical in the cp genomes of all studied species, and these highly conserved features might be due to a typical evolutionary passage in Thalictrum (Khan et al., 2020; Villanueva-Corrales et al., 2021; Feng et al., 2022). Compared to the LSC and SSC regions, the IR region was more conserved among the cp genomes.
In addition, the ratio of nucleotide substitutions (dn/ds) and their occurrence are frequently utilized as powerful tools to elucidate the evolution of protein-encoding genes and species adaptations (Kimura, 1989; Fay and Wu, 2003). It is noteworthy that the chloroplast genome of T. fargesii contains only sites under neutral and purifying selection, suggesting that these genes have conserved functions throughout their evolutionary history (Khakhlova and Bock, 2006). These results offer further evidence that the cp genomes of Thalictrum are conserved.

Spotlights on significant molecular variations within Thalictrum
Next-generation sequencing was used to sequence the cp genome of the ethnodrug species T. fargesii. The plastome of this species differs from the plastomes of other Thalictrum species in terms of genomic structure, nucleotide diversity, structural alterations, and phylogeny (Park et al., 2015; He et al., 2019; Morales-Briones et al., 2019; He et al., 2021; Cai et al., 2022; Zhe et al., 2023). For instance, there is an apparent difference between Thalictrum plastomes in the IR-SC boundary areas, which causes variations in genome size (Xiang et al., 2022). IR contractions or expansions into or out of neighboring single-copy regions are typically observed in angiosperm plastomes, and this process is a primary mechanism of length variation in cp genomes (Kim and Lee, 2004; Yang et al., 2016; Zhang et al., 2016; Ye et al., 2018). Indeed, in our investigation, several genes, including ycf1, rps19, ndhF, ycf15, and ψrpl32, were worth examining more closely. For example, ycf1 is duplicated in the cp genomes of several members of Thalictrum, including T. fargesii, with a larger copy (ycf1, 5,616-5,658 bp) positioned at the SSC-IRA boundary and a fragmented copy (ψycf1, 1,144-1,152 bp) located at the SSC-IRB boundary (Xiang et al., 2022). Other taxonomic groups also contain the pseudogene ycf1 at the same junction (Henriquez et al., 2020; Mehmood et al., 2020). According to our findings, the shortest ycf1 gene was present in the cp genome of T. fargesii (5,520 bp), with more than 5% of the variant sites within the genus. Thus, ycf1 has enormous potential for use in developing molecular markers, as has also been reported in Fritillaria and Pulsatilla (Bi et al., 2018; Li et al., 2018, 2020).
Among such variations in cp genomes, molecular evolutionary phenomena including mutations, substitutions, selection, and random genetic drift play a significant role. One of the molecular features, known as "codon usage bias", also helps unveil chloroplast and nuclear genome relationships and is influenced by the GC content at codon positions in various organisms (Yang et al., 2018). Moreover, it is a critical feature for studying the functions of genes and contributes to understanding their evolutionary relationships (Wang et al., 2023; Yang et al., 2023). Our present analysis revealed AT-rich intergenic regions, which is consistent with the observation that the overall GC content in the cp genomes of terrestrial plants is often low (Yang et al., 2017). Codon usage has a strong relationship with GC content at the codon positions, and it is considered an important parameter in understanding evolutionary processes and selection pressure on genes (Zhang et al., 2013b). Our investigation revealed that of the 31 codons with RSCU values higher than one, 16 ended in U and 13 ended in A, indicating natural selection and mutation (Necsulea and Lobry, 2007). This finding suggests that T. fargesii prefers to employ synonymous codons with A or U as the third base, which is comparable to other Thalictrum species (Xiang et al., 2022; Zhe et al., 2023). This may be attributed to the abundance of A/T nucleotides in angiosperm cp genomes (Abdullah et al., 2020). For instance, alanine and leucine were the amino acids with the most preferred codons in the T. fargesii chloroplast, which might be related to the abundance of A or T in the IR region (Chen et al., 2015), and is consistent with previous studies on the cp genomes of angiosperms (Orešič and Shalloway, 1998). Lauraceae exhibited a similar trend (Cao et al., 2023), and the cp genome of Lespedeza showed comparable results (Somaratne et al., 2019).
Repeat sequences, particularly SSRs, are widely distributed in cp genomes, as microsatellite repeats play an essential role in gene expression, transcription control, and chromosomal construction, and display a high level of polymorphism (Qi et al., 2018; Huang et al., 2022; Xia et al., 2022). Our results support the hypothesis that cp genome SSRs have substantial A/T bias, and this trend frequently occurs in many plants (Vieira et al., 2014; Wang et al., 2018). Similar to T. fargesii, mononucleotide repeats are the most abundant in Fritillaria, Lilium, and Epimedium (Rønsted et al., 2005; Zhang et al., 2016). Xiang et al. (2022) reported 39 to 60 SSRs among 11 Thalictrum species; however, we identified a total of 68 such repeats. In future genetic studies, these SSRs could be used as a source of molecular markers and genomic barcodes for the T. fargesii cp genome. Previous studies have demonstrated that improperly combined repeating or sliding sequences can cause genomic rearrangements (Lu et al., 2017). Long repeat sequences promote cp genome rearrangement and increase the genetic diversity of a population (Guo et al., 2017; Cui et al., 2019). For example, the ycf2 gene provides a suitable resource for genomic studies in the cp genome of T. fargesii, similar to those in other plant species (Cauz-Santos et al., 2017; Shen et al., 2017; Liu et al., 2018). The presence of different repeats in genes such as ycf1, ycf2, and rpl32, which results in pseudogenization or loss of function, has become a significant molecular feature of cp genomes in plant evolution and identification (Santos and Pereira, 2018; Park et al., 2019; Miao et al., 2022).
In higher plants, RNA editing occurs as a codon-specific event that converts cytidine (C) to uridine (U) during post-transcriptional modification. Therefore, this molecular process is crucial for RNA maturation to prevent unwanted mutations and to act as a signal for adaptation (Chu and Wei, 2019; Somaratne et al., 2019). Plants, including Arabidopsis, tobacco, pea, and tomato, have RNA editing sites (Wang et al., 2016). Thus, our findings on the significant C-to-U editing mechanism in the plastid genome of T. fargesii can aid in detecting missense mutations in its genes (Chen et al., 2018; Small et al., 2020). Divergent regions in cp genomes can help to identify closely related species and reveal phylogenetic information (Dong et al., 2012; Du et al., 2017). These regions have been adopted to generate barcodes for plant identification in other families of plants, including woody plants (Tan et al., 2018; Yan et al., 2023). According to our findings, ndhD-psaC and rpl16-rps3 were divergent hotspots in the cp genome of T. fargesii, and they have also been reported in other studies on the plastome of Thalictrum (Xiang et al., 2022). The same phenomenon has been observed in numerous other taxa such as Lagerstroemia and Adinandra (Cui et al., 2019; Dong et al., 2021; Nguyen et al., 2021; Ren et al., 2022; Yang et al., 2022). In addition, these mutational hotspots might aid researchers in studying population genetics and identifying Thalictrum species.
Figure. The ratio of chloroplast genome variant sites in Thalictrum species.
The dn and ds values are important markers for evaluating the selection pressure on genes and genomes in the cp genome. Selection pressure plays a vital role in the adaptation to different environments (Yang and Nielsen, 2000; Gao et al., 2019). Several genes in T. fargesii were found to have a low dn/ds ratio, suggesting that nonsynonymous substitutions in these genes are likely disadvantageous and subject to purifying selection (Sheikh-Assadi et al., 2022). In T. coreanum, T. foeniculaceum, T. foliolosum, and T. thalictroides, the ndhG gene was under significant positive selection, with a lower codon bias index, suggesting a lower expression level in vivo (Xiang et al., 2022). Moreover, they reported that atpF, petG, rpl20, and rpl22 were under positive selection. In contrast, in the present study, ndhG and rpl20 were under purifying selection, and atpF, petG, and rpl22 were under neutral selection in T. fargesii. Thus, purifying selection might be a result of an evolutionary process that preserves the adaptive behavior of this species.
Interestingly, the pafI and rps19 genes in the plastome of T. fargesii exhibited more than 5% variation within Thalictrum, which was consistent with the findings of Li et al. (2014). In addition, the pafI gene contains two mono-type (10 bp each) SSR repeats and a 39-bp forward repeat in one of its introns, similar to the findings reported by Kurt et al. (2023). At the IRA-LSC junction, the distance between the rpl2 and trnH genes was markedly different from that of other species in the genus. Moreover, similar to the phenomenon observed in Fritillaria, the presence of the rps19 fragment at the IRA-LSC junction in the T. fargesii cp genome and its distance from trnH make this region a probable candidate for marker design (Li et al., 2014). Cumulatively, these findings provide molecular evidence for differentiating the ethnodrug plant T. fargesii from other species of Thalictrum.

Insights into the phylogeny of Thalictrum
The phylogeny and classification of Thalictrum have long been debated, and distinguishing T. fargesii from plants with similar morphological traits remains difficult (Xiang et al., 2022). Plastid genome sequences have proven useful for phylogenetic investigations of angiosperms (Jansen et al., 2007; Huang et al., 2014; Kim et al., 2015; Li et al., 2019). This approach is valuable for classifying organisms at lower taxonomic levels (He et al., 2012; Zhang et al., 2016). Previous phylogenetic studies of the genus Thalictrum have used traditional marker-based approaches to classify this genus, including the rpl16 intron and ndhC-trnV-UAC (Soza et al., 2012, 2013), and the ndhA intron, trnL-UAA-trnF-GAA, rpl32-trnL-UAG, and rbcL (Wang et al., 2019). Moreover, Xiang et al. (2022) found that 116 IGS regions had stronger support than 114 gene sequences and revealed that the closest phylogenetic relationships existed between T. minus and T. aquilegiifolium, and between T. coreanum and T. thalictroides. Several studies have uncovered close relationships between T. minus and T. tenue (He et al., 2019; Lin et al., 2021; Pu et al., 2022), T. tenue and T. petaloideum (He et al., 2019; Pu et al., 2022), T. minus and T. petaloideum (He et al., 2019; Pu et al., 2022), T. minus and T. aquilegiifolium (Cai et al., 2022; Michimoto et al., 2022; Xiang et al., 2022), T. viscosum and T. cirrhosum (Cai et al., 2022; Michimoto et al., 2022), and T. coreanum and T. thalictroides (He et al., 2021; Lin et al., 2021; Michimoto et al., 2022; Xiang et al., 2022) based on all sequences and whole cp genomes.
However, none of these studies resolved the position of T. fargesii within the genus. The complete cp genome of T. fargesii reported in the present study provides sufficient information to establish phylogenetic relationships among these taxa. According to our results, T. fargesii is closely related to T. tenue and T. petaloideum, supporting the conclusion of an earlier study that T. baicalense is closely related to T. tenue, T. minus, and T. petaloideum (He et al., 2021). Interestingly, T. viscosum has been reported to be closely related to T. cirrhosum and T. foeniculaceum (Cai et al., 2022). By including more Thalictrum taxon samples, our phylogenetic analysis of cp genome sequences, reported here for the first time, improved phylogenetic resolution and provided robust inferences of interspecific relationships. This finding expands the scope of future research on how mutations in the cp genome relate to genetic evolution and speciation in this species. Further studies should select more samples from wild populations based on their ecotypes and collect more extensive data from both the nuclear and cp genomes to establish more detailed evidence of phylogenetics and species evolution.

Conclusion
Altogether, we report the complete cp genome of T. fargesii, sequenced on the Illumina HiSeq platform, and a comparative analysis with other Thalictrum species that provides a fundamental reference for phylogenetic studies. The chloroplast genomes of Thalictrum were compared, and although they were found to be highly conserved in terms of structure, organization, gene order, IR boundaries, and gene content, we have reported distinguishable features of T. fargesii. In particular, two features clearly demarcate the cp genome of T. fargesii within the genus, namely the length of pseudogenes and their distance from the IR/SC borders. Moreover, pafI and rps19 had highly variable sites in the cp genome of T. fargesii compared with other species in the genus. In addition, the divergent regions identified in the present study via nucleotide divergence analysis could help in developing molecular markers to identify this ethnodrug and its contaminants. Phylogenetic analysis revealed close relationships among T. fargesii, T. tenue, and T. petaloideum. These findings contribute to the understanding of the cp genome of T. fargesii and provide genomic information to aid taxonomic identification and the resolution of phylogenetic relationships within Thalictrum. In conclusion, our study provides resources and scientific references for identifying T. fargesii plants, supporting the safety and effectiveness of ethnodrug use, and contributes to the bioprospecting and conservation of this species.

FIGURE 1
FIGURE 3 Repeat type and number of analyzed SSRs in the cp genome of T. fargesii. (A) SSR-type distribution and their frequency in the complete cp genome. (B) Distribution of SSRs in the LSC, SSC, and IR regions. (C) Type and frequency of each identified SSR.

FIGURE 2
FIGURE 4 Distribution types and number of long repeats in the cp genome of T. fargesii. (A) Frequencies of different groups of repeats according to their length in the cp genome. Different colors indicate repeats of various sizes in bp. (B) Frequencies of different types of repeats: forward (F), palindrome (P), and reverse (R). (C) Distribution of the long repeats in the LSC, SSC, and IR regions. Different colors indicate the types of repeats.

FIGURE 5
FIGURE 5 Comparison of IR boundaries of chloroplast genomes within Thalictrum. The junctions are the LSC-IRB, IRB-SSC, SSC-IRA, and IRA-LSC. The lengths inside the boxes indicate the distance between the end of the gene and the border sites. Arrows indicate the distance from the junction.

FIGURE 6
FIGURE 6 Sliding window analysis of the complete T. fargesii cp genome. Nucleotide diversity analysis (π value) of the complete chloroplast genome of T. fargesii. Window length: 600 bp; step size: 200 bp. X-axis: position of the midpoint of the window. Y-axis: nucleotide diversity of each window.
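The sliding-window calculation summarized in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it takes pre-aligned sequences of equal length, skips alignment columns that contain gaps or ambiguous bases, and defines nucleotide diversity as the mean number of pairwise differences per compared site. The function names `window_pi` and `sliding_pi` and the toy alignment are hypothetical; only the default window length (600 bp) and step size (200 bp) come from the caption.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Mean pairwise differences per site over alignment columns [start, end)."""
    # Assumption of this sketch: only columns where every sequence has A, C, G, or T are used.
    cols = [i for i in range(start, end) if all(s[i] in "ACGT" for s in seqs)]
    if not cols or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(a[i] != b[i] for a, b in pairs for i in cols)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, nucleotide diversity) for each sliding window."""
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append(((start + end) // 2, window_pi(seqs, start, end)))
    return results


# Toy alignment (hypothetical), scanned with a 4-bp window and a 2-bp step.
toy = ["ACGTACGTACGT", "ACGTACGAACGT", "ACGTTCGAACGT"]
for midpoint, pi in sliding_pi(toy, window=4, step=2):
    print(midpoint, round(pi, 3))
```

Windows whose diversity stands well above the genome-wide background are the candidates for divergent hotspots; the cut-off used to call a hotspot is a separate choice that this sketch does not make.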

FIGURE 7
FIGURE 7 Phylogenetic relationships of Thalictrum inferred from maximum likelihood (ML) analysis based on whole chloroplast genomes. The numbers at each node indicate ML bootstrap values. The scale bar shows 0.7 changes.
(A) Distribution of the different variant sites. (B) The frequency of intragenic variant loci in different gene categories. (C) Frequency of variant loci among photosynthesis genes. (D) Frequency of variant loci among genes involved in self-replication. (E) Frequency of variant loci in the tRNA genes. (F) Frequency of variant loci in the unknown genes. (G) Frequency of variant loci in the other genes.

TABLE 1
Genes according to the categories within the T. fargesii chloroplast genome.
*d, duplicate gene; i, genes with one intron; e, genes with two introns.

TABLE 2
Nucleotide composition in different regions of the chloroplast genome of T. fargesii.