Comparative and phylogenetic analysis of the complete chloroplast genomes of Uncaria (Rubiaceae) species

The genus Uncaria is famous for its high medicinal value. However, the high morphological similarities and unclear interspecific genetic relationships have posed challenges to the classification and identification of Uncaria species. Here, we newly sequenced six chloroplast genomes of Uncaria species: U. hirsuta, U. rhynchophylla, U. rhynchophylloides, U. homomalla, U. sinensis, and U. lancifolia. Comparisons among the chloroplast genomes of Uncaria species showed their conservation in structure, gene content, and order. Ten highly variable loci could be potentially used as specific molecular markers in the identification of Uncaria species. The third position of codons tended to use A/U base, and natural selection contributed more to the formation of codon usage bias in comparison to mutation pressure. Four genes (rbcL, ndhF, rps8, and ycf2) were detected to be subjected to positive selection. Phylogenetic analysis showed that the genus Uncaria was a monophyletic group, belonging to the tribe Naucleeae. Moreover, U. sinensis was not a variant of U. rhynchophylla. U. rhynchophylloides and U. rhynchophylla were not the same species. The results of the comparative and phylogenetic analysis provide valuable references for further research studies of classification, identification, breeding improvement, and phylogenetic relationships in Uncaria species.


Introduction
The genus Uncaria Schreb., in the family Rubiaceae, is primarily distributed in tropical Asia and Australia with 12 species distributed in China (Zhang et al., 2015).Uncaria species, commonly known as "Gouteng" in traditional Chinese medicine, have been extensively employed for the cure of various diseases over a considerable period of time (Zhang et al., 2015;Liu et al., 2023).Prior research studies have reported many pharmacological activities of Uncaria species, such as being antihypertensive, antiarrhythmic, anticonvulsant, sedative, antidepressant, and antithrombotic (Qin et al., 2021).The Chinese pharmacopoeia states that U. rhynchophylla, U. macrophylla, U. hirsuta, U. sessilifructus, and U. sinensis are medical Uncaria species (Chinese Pharmacopoeia Commission, 2020).The chemical compositions and bioactivities vary among different Uncaria species, which results in the divergence of curative effect (Liu et al., 2023).However, the high morphological similarities and the long-standing taxonomic controversies bring difficulties to the identification of Uncaria species.
The genus Uncaria was ever classified into the tribe Cinchoneae based on placentation characteristics (Ridsdale, 1978b).However, Razafimandimbison and Bremer (2001) supported the genus Uncaria classified into the tribe Naucleeae based on molecular and morphological data.Ridsdale (1978a) regarded U. rhynchophylloides as a synonym of U. rhynchophylla, but the point was doubted in some studies (Chen and Taylor, 2011;Turner, 2018;Zhu et al., 2018).Zhao et al. (2018) suggested that U. sinensis was possibly a variant of U. rhynchophylla."Flora of China" accepted the species name of U. homomalla and U. scandens, but it presented that further study was needed to verify the separation of the two species due to the deficiency of obviously morphological differences (Chen and Taylor, 2011).DNA molecular markers, uninfluenced by morphological characters, have solved some taxonomic controversies such as the taxonomic position of the genus Uncaria (Razafimandimbison and Bremer, 2001;Manns and Bremer, 2010).Internal transcribed spacer (ITS) and ITS2 have shown excellent performance in the identification of Uncaria species (Zhang et al., 2015;Tang et al., 2016;Zhu et al., 2018;Liu et al., 2023).However, the low bootstrap support rates of some branches within the genus Uncaria were found in the phylogenetic trees (Zhang et al., 2015;Zhu et al., 2018;Liu et al., 2023), and the phylogenetic relationships were not completely resolved.
Chloroplast is a key organelle responsible for photosynthesis and numerous metabolic activities with independent genome and evolution routes (Li et al., 2020;Li et al., 2022).Chloroplast genome typically possesses conserved circular tetrad structure with the size range from 72 kb to 217 kb (Sheng et al., 2021b).In comparison with nuclear and mitochondrial genomes, chloroplast genome has the characteristics of low sequencing costs, large gene density, highly conserved structure, and moderate evolution rate (Huang et al., 2021).Despite relative conservation, mutation events of substitution, translocation, insertion, and deletion in chloroplast genomes still provide sufficient genetic information for exploring interspecific genetic relationships of plants (Sheng et al., 2021b).Moreover, the specific DNA fragments screened from coding and non-coding chloroplast genomic regions having different nucleotide substitution rate are suitable for phylogenetic research studies of different taxonomic levels (Zhang et al., 2022).Chen et al. (2022) investigated the phylogenetic relationships of five Uncaria species using chloroplast genomes, but the limited number of species did not adequately reveal the phylogenetic relationships of Uncaria species.
Here, the complete chloroplast genomes of six Uncaria species (U. hirsuta, U. rhynchophylla, U. rhynchophylloides, U. homomalla, U. sinensis, and U. lancifolia) were newly obtained by high-throughput sequencing.Combined with three previously released chloroplast genomes of Uncaria species, we did comprehensive comparative analysis of chloroplast genomes of Uncaria species.The objectives of the research were to 1) gain insights into chloroplast genome structural features of Uncaria species, 2) seek the high variable regions for use in Uncaria species identification, 3) explore the codon usage bias and affecting factors, and 4) elucidate the phylogenetic relationships of Uncaria and closely related species.

Plant materials and DNA extraction
Six samples of Uncaria species (U. hirsuta, U. rhynchophylla, U. rhynchophylloides, U. homomalla, U. sinensis, and U. lancifolia) were collected from various regions of China and identified by Professor Changqing Zeng (Guangdong Pharmaceutical University, China) based on morphological traits (Supplementary Table S1).Genomic DNA of silica-dried leaves was extracted using modified cetyl trimethyl ammonium bromide protocol (Li et al., 2013).

Codon usage bias
Common protein-coding sequences with more than 300 bp (Li et al., 2021a) were employed for codon usage analysis.Effective number of codons (ENC), relative synonymous codon usage (RSCU), and Guanine and Cytosine (GC) content at the third position of synonymous codons (GC3s) were calculated by CodonW v1.4.2 (http://codonw.sourceforge.net/).The online CUSP program (http://embossgui.sourceforge.net/demo/)was used to calculated GC content at the first (GC1), second (GC2), and third (GC3) position of codons.ENC plot was drawn with ENC value as ordinate and GC3s as abscissa to explore the factors affecting the codon usage.The standard curve was calculated according to the formula (Li et al., 2021a): ENC expected = 2 + GC3s + 29/[GC3s 2 + (1 − GC3s) 2 ].Using GC12 (the average value of GC1 and GC2) as ordinate and GC3 as abscissa to make the neutral plot could investigate the degree of impact of mutation pressure and natural selection on codon usage bias (Chakraborty et al., 2020;Wang et al., 2022b).The slope of the regression line close to zero showed that codon usage bias was primarily influenced by natural selection, whereas the slope of the regression line close to one indicated that the codon usage bias was primarily influenced by mutation pressure (Chakraborty et al., 2020;Wang et al., 2022b).

Adaptive evolution analysis
The common protein-coding genes (PCGs) were aligned by MAFFT package performed in PhyloSuite v1.2.2 with codon alignment pattern (Zhang et al., 2020).The non-synonymous (dN) and synonymous (dS) substitution and their ratio dN/dS were calculated using DnaSP v6.12.03 software (Librado and Rozas, 2009).The ratio dN/dS > 1, dN/dS = 1, and dN/dS < 1 indicated positive, neutral, and negative selection, respectively (Liu et al., 2021).Moreover, the site model was employed to identify potentially positive selection sites using the CODEML algorithm performed in EasyCodeML v1.4 (Gao et al., 2019).Bayes Empirical Bayes method was used to calculate the posterior probabilities for amino acid sites that were potentially under positive selection.The logarithmic likelihood value of four pair of models (M0 vs.M3, M1a vs.M2a, M7 vs.M8, and M8a vs.M8) was calculated by likelihood ratio test and its statistical significance.

Features of chloroplast genome
The raw data of six Uncaria species in this study were in the range of 4.53 Gb to 11.92 Gb (Supplementary Table S3).After trimming low-quality regions and adaptors, 4.46 Gb to 11.45Gb clean data were used for assembly.To estimate the quality and reliability of assembly results, clean reads were mapped to the assembled chloroplast genomes.There were no low read coverage regions detected in assembled chloroplast genomes of Uncaria species (Supplementary Figure S1).The nine chloroplast genomes of Uncaria species ranged in length from 153,780 bp (U. scandens) to 155,177 bp (U. sessilifructus).The chloroplast genomes of the nine Uncaria species exhibited a typically circular tetrad structure comprising a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions (Figure 1).As shown in Table 1, the lengths of LSC regions ranged from 85,311 bp (U. lancifolia) to 85,749 bp (U. macrophylla), and the lengths of SSC regions varied from 16,989 bp (U. scandens) to 18,167 bp (U. lancifolia).The lengths of IR regions had inconspicuous variation among Uncaria species and ranged from 25,623 bp (U. rhynchophylloides) to 25,690 bp (U. sessilifructus).Each of the nine chloroplast genomes possessed 129 to 130 genes, comprising 84 to 85 PCGs, 37 tRNA genes, and eight ribosomal RNA (rRNA) genes (Table 1; Supplementary Table S4).Compared with other Uncaria species, U. scandens lost ccsA gene in its chloroplast genome, which caused a reduction in the length of chloroplast genome.In each chloroplast genome of Uncaria species (Supplementary Table S4), 18 genes containing intron were found, and there were three genes (rps12, ycf3, and clpP) with two introns, whereas the others possessed one intron.Mauve alignment results showed good synteny of the nine chloroplast genomes, and gene rearrangements were not observed (Supplementary Figure S2).Furthermore, the region borders and nearby genes of the chloroplast genomes were compared (Figure 2).Among the nine Uncaria species, the ycf1 gene spanned the IRa/SSC junction with the length of 4,480 bp to 4,530 bp located in SSC region.However, another short ycf1 fragment situated at SSC/IRb border was detected as pseudogene, which approximately lost a fragment of 4,500 bp in length compared with complete ycf1 gene.

Sequence divergence and mutational hotspots
The mVISTA result exhibited a high sequence identity among the chloroplast genomes of nine Uncaria species, and a majority of highly variation regions were founded in non-coding regions (Supplementary Figure S3).The nucleotide diversity (Pi) value ranged from 0 to 0.2931 with the average of 0.004525 (Figure 3).The average Pi value of IR regions (0.0009849) was much lower than that in LSC (0.005740) and SSC regions (0.009168), which indicated lower sequence discrepancy appearing in IR regions.A total of 10 highly variable loci were found in chloroplast genomes of Uncaria species.Among them, eight highly variable loci (trnH-GUG-psbA, atpH-atpI, trnT-UGU-trnL-UAA, rps16-trnQ-UUG, rpl32-trnL-UAG, trnS-GCU-trnG-UCC, petA-psbJ, and ndhF-rpl32) were found in non-coding regions, and two highly variable loci (ccsA and ycf1) were detected in coding regions.In addition, we calculated interspecific genetic distance of Uncaria species.On the basis of the complete chloroplast genome sequences, the pairwise genetic distances varied from 0.009 (U. scandens-U.homomalla) to 0.064 (U. sessilifructus-U.sinensis), with the average of 0.046 (Supplementary Table S5).On the basis of common proteincoding sequences of chloroplast genome, the pairwise genetic distances ranged from 0.007 (U. scandens-U.homomalla) to 0.0052 (U. sessilifructus-U.sinensis), with the average of 0.034 (Supplementary Table S6).

Codon usage bias
The GC content at three codon positions (GC1, GC2, and GC3) of the nine Uncaria species was less than 50% (Table 2).The correlation analysis exhibited that GC1, GC2, and GC12 were not significantly correlated with ENC, whereas GC3 and GC3s were significantly correlated with ENC (Supplementary Table S7).The RSCU values among Uncaria species were very similar (Figure 5), and UUA codon encoding Leucine had the maximum RSCU value (1.93-1.96).Each chloroplast genome possessed 29 high-frequency codons (RSCU > 1), and there were 16 codons ended with U base, 12 codons ended with A base, and only one codon ended with G base.However, among 30 low-frequency codons (RSCU < 1), there were only two codons ended with A base, and the others were ended with G/C base.The third position of codon tended to use A/U base in chloroplast genomes of Uncaria species.
The ENC plot exhibited that the point distribution patterns of coding sequences were similar among Uncaria species (Supplementary Figure S4).Some coding sequences were distributed close to the expected curve, whereas most of coding sequences were distributed far away from the expected curve.The results revealed that natural selection also affected the codon usage preference of chloroplast genomes of Uncaria species, other than mutation pressure, and their strength varied among genes.The neutral plot could further estimate the strength of mutation pressure and natural selection driving the formation of codon usage bias (Supplementary Figure S5).GC3 was not significantly correlated with GC12, and the slope of the regression line was much lower than 0.5, which implied that natural selection was the dominant factor affecting codon usage bias of Uncaria chloroplast genomes.

Adaptive evolution
As shown in Figure 6, the rbcL gene was the only gene with dN/dS value higher than 1, and dN/dS values of other genes were lower than 1, which indicated that most of the PCGs in chloroplast genomes of Uncaria species were mainly subjected to negative selection in the process of evolution.In addition, site model was employed to detect potentially positive selection sites of each gene.Four genes were identified to have positive selection sites (Figure 7; Supplementary Table S8).Among them, the rbcL gene possessed five positive selection sites, the ycf2 gene possessed two positive selection sites, and ndhF and rps8 genes separately contained one positive selection site.

Phylogenetic analysis
We used the complete chloroplast genome sequences and common protein-coding sequences to reconstruct phylogenetic trees of Uncaria and Rubiaceae species (Figure 8; Supplementary Figure S6).The topologies of the two phylogenetic trees were highly similar, and all branches within the genus Uncaria had high values of neighbor joining bootstrap support (NJBS), maximum likelihood Comparison of LSC, SSC, and IR boundary regions among the chloroplast genomes of Uncaria species.j indicates a pseudogene.

FIGURE 3
The nucleotide diversity (Pi) of the chloroplast genomes of nine Uncaria species.Ten highly variable loci (Pi > 0.015) are marked out.

Discussion
In this study, the nine chloroplast genomes of Uncaria species exhibited a conserved circular tetrad structure, which was also found in other Rubiaceae species (Zhang et al., 2021;Amenu et al., 2022).Owing to the loss of ccsA gene, the length of SSC region of U. scandens was greatly lower than other Uncaria species.The ccsA gene encodes a protein to mediate heme attached to Ctype cytochrome (Li et al., 2021b).The pseudogenization and the loss of ccsA gene were found in most heterotrophic plants but were rarely found in autotrophic plant (Wicke and Naumann, 2018;Li et al., 2021b).Further studies were needed to judge whether the ccsA gene was transferred to nuclear genome of U. scandens.Expansion and contraction of IR regions are closely associated with the length change of chloroplast genome, which can cause gene loss, gene duplication, gene rearrangement, and pseudogenization in some angiosperms (Li et al., 2020).Because of the ycf1 gene spanning across SSC/IRa boundary, an incomplete fragment of the ycf1 gene appeared at IRb/SSC boundary in each chloroplast genome of Uncaria species.Although the fragment had intact open reading frames, it was identified as a pseudogene in many prior studies (Sheng et al., 2021b;Zhang et al., 2021;Bai et al., 2023).The gene distributions of region boundaries were highly similar among the chloroplast genomes of Uncaria species, and minor contraction and expansion of IR regions were found.
In spite of relatively low sequence variation levels among the chloroplast genomes of Uncaria species, 10 highly variable loci were detected, which could serve as potentially specific DNA molecular markers in the identification and phylogenetic research studies of Uncaria species.Most of highly variable loci were located in noncoding regions implying that non-coding regions mutated more quickly than coding regions.Furthermore, the IR regions exhibited higher conservation than SSC and LSC regions, and no highly variable loci were detected in IR regions except for ycf1.The higher conservation may be associated with copy correction between IR regions through gene conversion and replication (Liu et al., 2022).
SSRs, extensively distributed in the eukaryotic chloroplast genome, were regarded as effective molecular markers applicable The dN/dS value of common protein-coding genes calculated by DnaSP. to the studies of species identification, individual genetic difference.and population evolution (Li et al., 2020;Wu et al., 2021).The size, type, and number of SSRs varied among different Uncaria species.A large proportion of SSRs were mononucleotide repeats and biased toward A/T base, which were similar to the common SSR characteristics of chloroplast genomes in angiosperms (Li et al., 2020).Moreover, the non-coding and single-copy regions were the main distribution regions of SSRs, probably associated with the high-nucleotide polymorphism of these regions.
Codon usage bias means the frequency discrepancy of synonymous codon usage in organisms (Sheng et al., 2021a).It is universally assumed that codon usage bias can reveal the origin, evolution, and mutation modes of genes or species and have a significant impact on protein expression and gene function (Wang et al., 2020).The GC content of different codon position of the nine Uncaria species was lower than 50%, and most of high-frequency codons were ended with A/U base, which showed a high preference to use A/U-ending codons in chloroplast genomes of Uncaria species.Prior studies demonstrate that the nuclear genes preferred to use A/U-ending and G/C-ending codons in dicots and monocots, respectively (Zhang et al., 2012).However, A/Uending codons were generally found as preference codons in chloroplast genomes (Nie et al., 2013).Furthermore, there were no codons detected with an RSCU value higher than 2, and the ENC values were higher than 45, implying a weak codon usage preference of chloroplast genomes of Uncaria species.Many factors influenced codon usage bias, but natural selection forces and mutation pressure were universally regarded as the major factors, which were extensively used to explain intraspecific and interspecific codon usage variation (Zhang et al., 2012;Sheng et al., 2021a).Mutation pressure can act on nucleotide composition preference by shuffling G/C and A/T pairs.Natural selection can cause the codon preference by maximizing efficiency of protein production in highly expressed genes (Duan et al., 2021).The results of ENC plot and neutral plot indicated that mutation pressure and natural selection were the important factors affecting the codon usage bias of chloroplast genomes of Uncaria species, especially the natural selection made the primary contribution.
Although a majority of PCGs in chloroplast genomes of Uncaria species were mainly subjected to negative selection in the process of evolution, four genes (rbcL, ndhF, rps8, and ycf2) were detected to have undergone positive selection.The rbcL gene is responsible for encoding the large subunit of ribulose-1,5bisphosphate carboxylase/oxygenase (Rubisco), which catalyzes carbon fixation of photosynthesis (Singh et al., 2017).On account of being the target of various environmental selection factors such as drought level, temperature, and carbon dioxide concentration, the rbcL gene was usually under positive selection, which is used as the common molecular marker to explore phylogenetic relationships of plants (Wang et al., 2022a;Zhu et al., 2022).The ndhF gene encodes subunit protein of nicotinamide adenine dinucleotide (NADH) dehydrogenase complex participating in photosynthetic electron transport.Prior studies indicated that ndh genes enabled plants to survive in diverse stressful terrestrial conditions and to maintain efficient photosynthesis (Sabater, 2021).The adaptive evolution of ndhF gene could possibly influence the energy transformation and resistance to environmental stresses (Wang et al., 2022a).The rps8 gene encodes small ribosomal subunit protein that is involved in protein translation.The exact functional role of the ycf2 gene remained unclear.Several studies indicated that the ycf2 gene encoded products that were important to chloroplast protein import and cell survival (Kikuchi et al., 2018;Huang et al., 2021).
The classification and phylogenetic relationships of Uncaria species have been still under debate (Chen and Taylor, 2011;Zhao et al., 2018).Prior phylogenetic research studies using different molecular markers found that some clades within Uncaria showed low support rates (Zhang et al., 2015;Zhu et al., 2018;Liu et al., 2023).For example, Zhang et al. (2015) employed NJ method to reconstruct a phylogenetic tree of Uncaria species based on ITS2 sequences, which showed that U. homomalla and U. hirsuta formed a monophyletic clade with the support rate of 21% (Zhang et al., 2015).In this research, the topologies of the two phylogenetic trees reconstructed by the complete chloroplast genome sequences and common protein-coding sequences showed high similarity, and all the clades within the genus Uncaria were well supported.Uncaria species appeared to be close to Naucleeae species in our two phylogenetic trees, which was in agreement with prior studies based on DNA molecular markers (Razafimandimbison and Bremer, 2001;Manns and Bremer, 2010).Our results did not support the view of the genus Uncaira belonging to Cinchoneae."Flora of China" suggested that further investigation was needed to determine whether U. scandens and U. homomalla should be regarded as two distinct species due to their high morphological similarity (Chen and Taylor, 2011).U. homomalla and U. scandens formed a monophyletic clade in our phylogenetic trees.Nonetheless, compared with U. homomalla, U. scandens lost ccsA gene, and their numbers and distributions of SSRs also had differences.Moreover, Zhu et al. (2018) used NJ, MP, and BI methods to construct the phylogenetic trees based on ITS sequence, which indicated that U. homomalla and U. scandens were not clustered into a monophyletic clade.Here, we agreed that U. homomalla and U. scandens were two distinct species, but it is necessary to expand the sampling range and sample size to further confirm the reliability of this view.On the basis of the chemical composition and phylogenetic analysis, Zhao et al. (2018) proposed that U. sinensis may be a variant of U. rhynchophylla.The phylogenetic analysis of ITS or ITS2 sequences also showed that U. rhynchophylla and U. sinensis formed a monophyletic clade (Zhu et al., 2018;Liu et al., 2023).However, U. rhynchophylla and U. sinensis were not clustered as a monophyletic clade in the present study.The phylogenetic conflicts may be owing to incomplete lineage sorting, evolutionary rate difference and introgressive hybridization (Zhang et al., 2022).From morphological points of view, the stipules of U. sinensis were entire or shallowly emarginate, obviously different from U. rhynchophylla, which had deeply bifid stipules.In addition, the diameter of flowering head of U. sinensis (12 mm to 15 mm, excluding corolla) was larger than that in U. rhynchophylla (4 mm to 8 mm, excluding corolla).The diameter of fruiting head also showed a difference between U. sinensis (20 mm to 30 mm) and U. rhynchophylla (10 mm to 20 mm).Therefore, U. sinensis was not a variant of U. rhynchophylla based on phylogenetic analysis of chloroplast genomes and morphological features.Although Ridsdale (1978a) suggested U. rhynchophylloides as a synonym of U. rhynchophylla, "Flora of China" presented that they were two distinct species due to their differences in stipules morphology and diameter of flower head (Chen and Taylor, 2011).Our study also exhibited that U. rhynchophylloides and U. rhynchophylla were not the same species, as they were located far apart in our phylogenetic trees, which was similar to previous studies based on ITS or ITS2 sequences (Zhu et al., 2018;Liu et al., 2023).In summary, we clearly elucidated the phylogenetic relationships of Uncaria species and resolved some taxonomic disputes.

Conclusion
In this study, the nine chloroplast genomes of Uncaria species were highly conserved in structure, gene content, SSR distributions, sequence divergence, and codon usage bias.Ten highly variable loci and four positive selection genes were identified, which provide a reference for further studies on development of specific molecular markers and adaptive evolutionary evaluation of Uncaria species.In addition, the phylogenetic relationships of Uncaria and closely related species were clearly elucidated.The phylogenetic analysis results based on chloroplast genome showed the genus Uncaira belonging to Naucleeae.U. sinensis was not a variant of U. rhynchophylla.U. rhynchophylloides and U. rhynchophylla were not the same species.In summary, these findings are helpful for better understanding the evolutionary patterns and phylogenetic relationships of Uncaria species.
FIGURE 4 Analysis of SSRs in the chloroplast genomes of Uncaria species.(A) The type and number of SSRs.(B) The number of SSRs distributed in LSC, SSC, and IR regions.(C) The number of SSRs distributed in coding sequence (CDS), intron, and intergenic spacer (IGS) regions.

FIGURE 5 RSCU
FIGURE 5RSCU value among the chloroplast genomes of Uncaria species.
FIGURE 7 Four positive genes detected by site model.The positive selection sites with the posterior probabilities > 0.95 are marked out.(A) ndhF gene.(B) rbcL gene.(C) rps8 gene (D) ycf2 gene.

FIGURE 8
FIGURE 8Phylogenetic tree of Uncaria and closely related species based on the complete chloroplast genome sequences.

TABLE 1
Summary of the chloroplast genome features of Uncaria species.

TABLE 2
Main parameters of codon usage bias.