Comparative chloroplast genomes of Argentina species: genome evolution and phylogenomic implications

The genus Argentina Hill belongs to the tribe Potentilleae Sweet and contains approximately 75 species predominantly distributed in the Sino-Himalayan region and the Malesian archipelago. So far we have less knowledge on the phylogenetic relationships within Argentina owing to limited sampling of Argentina taxa or gene fragments in previous studies. Moreover, to date there is no phylogenetic study on Argentina from the perspective of comparative chloroplast (cp) genomics. Here we performed comparative genomic analyses on the cp genomes of 39 accessions representing 18 taxa of Argentina. The Argentina cp genomes presented the typical quadripartite structure, with the sizes ranging from 155 096 bp to 157 166 bp. The 39 Argentina cp genomes contained a set of 112 unique genes, comprising four ribosomal RNA (rRNA) genes, 30 transfer RNA (tRNA) genes, as well as 78 protein-coding genes (PCGs). The cp genome organization, gene content and order in Argentina were highly conserved, but some visible divergences were present in IR/SC boundary regions. Ten regions (trnH-GUG-psbA, trnG-GCC-trnfM-CAU, trnD-GUC-trnY-GUA, rpl32-trnL-UAG, atpH-atpI, rps16-trnQ-UUG, trnS-GCU-trnG-UCC, ndhF-rpl32, trnR-UCU-atpA, and accD-psaI) were identified as excellent candidate DNA markers for future studies on species identification, population genetics and phylogeny of Argentina. Our results indicated that Argentina is monophyletic. In the current sampling, the A. smithiana - A. anserina clade was sister to the remainder of Argentina. Our results corroborated the previous taxonomic treatments to transfer A. phanerophlebia and A. micropetala from the genus Sibbaldia L. to Argentina. Our results showed close relationships among A. stenophylla, A. microphylla, A. taliensis, and A. tatsienluensis, congruent with previous studies based on the morphology of these species. Twenty-six genes (rps3, rps15, rps16, rps19, rpl16, rpl20, rpl22, rpoA, rpoB, rpoC1, rpoC2, atpA, atpF, psbB, psbF, ndhA, ndhB, ndhC, ndhD, ndhF, rbcL, accD, ccsA, matK, ycf1, ycf2) were with sites under positive selection, and adaptive evolution of these genes might have played crucial roles in Argentina species adaptation to the harsh mountain environment. This study will facilitate future work on taxonomy, phylogenetics, and adaptive evolution of Argentina.

Here we conducted comparative genomic analyses on the cp genomes of 39 accessions representing 18 taxa of Argentina by means of bioinformatics.The objectives were to (1) analyze features of Argentina cp genomes, (2) screen divergence hotspots as candidate molecular markers and potential specific barcodes for Argentina, (3) provide insights into the phylogenetic relationships among Argentina species, and (4) explore the adaptive evolution of chloroplast genes of Argentina species.This study will contribute to further studies on species identification, population genetics, phylogenetics, and cp genome evolution of Argentina, and provide a theoretical basis for conservation efforts of Argentina.

Taxon sampling, DNA extraction, and Illumina sequencing
In our study, a total of 18 Argentina taxa represented by 39 Argentina accessions were sampled.In addition, seven Potentilla species were chosen as outgroups in the phylogenomic analyses based on previous work (Feng et al., 2017;Zhang et al., 2017;Li et al., 2024).GenBank accession numbers and voucher information of the sampled taxa are presented in Table 1.For fresh or silica-dried leaf samples, genomic DNA was isolated by the CTAB protocol (Doyle and Doyle, 1987).For herbarium leaves sample, genomic DNA was extracted by the SDS method (Dellaporta et al., 1983;Johnson et al., 2023).The extracted DNA was sheared into fragments using sonication.These fragments were used for short-insert library construction with 300 bp insert size by NEBNext ® Ultra ™ II DNA Library Prep Kit for Illumina ® .Finally, the Illumina HiSeq platform in Novogene was used to sequence the pooled libraries.

Chloroplast genome assembly and annotation
Trimmomatic v. 0.33 (Bolger et al., 2014) was utilized in order to clean adapters in raw high-throughput sequencing data.FastQC v. 0.11.8 (Andrews, 2018) was employed to evaluate the quality of the filtered paired-end reads.Then the filtered raw reads of each accession were used for assembly of the cp genome sequence by NOVOPlasty (Dierckxsens et al., 2017), with the parameters of genome range 120000-220000 and K-mer 39.In cp genome sequence assemblies of Argentina species, rbcL gene in Argentina phanerophlebia (GenBank accession no.MT114192) was set as the seed and its cp genome was used as the reference.Using cp genome of Argentina phanerophlebia (MT114192) as the reference, cp genome annotation of Argentina species was performed using Geneious Prime (Kearse et al., 2012) by transferring annotations.
The initial annotation results were then manually checked and adjusted in Geneious Prime.

Chloroplast genome comparative analyses
The statistics of genome size, LSC/SSC/IR size, number of genes and GC content were summarized in Geneious Prime.With the purpose of detecting potential rearrangements and inversions, the cp genomes alignment of Argentina species was implemented in MAUVE v. 2.4.0 using the progressiveMauve algorithm (Darling et al., 2004(Darling et al., , 2010)).The divergence in the LSC/IR/SSC boundaries among 39 cp genomes in Argentina was compared and illustrated to detect the IR expansion/contraction.The mVISTA program (Frazer et al., 2004) was employed to visualize the divergence level among 39 Argentina cp genomes using Shuffle-LAGAN mode and with A. anserina (L.) Rydb. 1 as a reference.To screen divergence hotspots, we extracted the coding and noncoding regions separately in 39 Argentina cp genomes by "extract annotations" in Geneious Prime and furthermore aligned these homologous loci by MAFFT v. 7.450 (Katoh and Standley, 2013).Finally, the nucleotide variability (Pi) of each homologous locus was calculated in DnaSP v. 6.0 (Rozas et al., 2017).

Phylogenetic analyses
Phylogenetic relationships among the 18 Argentina taxa were inferred by maximum likelihood (ML) and Bayesian inference (BI) methods, with seven Potentilla species as outgroups to root the trees.A total of 46 cp genome sequences were aligned in MAFFT v. 7.450 (Katoh and Standley, 2013) with default parameters.Software trimAL v. 1.4 (Capella-Gutieŕrez et al., 2009) was subsequently used to trim the alignment properly.The ML analysis was performed by RAxML v. 8.2.12 (Stamatakis, 2014), under GTRGAMMA model (option "-m GTRGAMMA") as suggested in the manual, with analysis of rapid bootstrap and search for best-scoring ML tree (option "-f a") and 1000 replicates bootstrap (option "-N 1000").MrBayes v. 3.2.7a(Ronquist et al., 2012) was employed to conduct the BI analysis under best-fit model GTR+I+G as recommended by PartitionFinder2 (Lanfear et al., 2017) using the Corrected Akaike Information Criterion (AICc; Sugiura, 1978) according to Posada and Buckley (2004).Four parallel runs were performed, each run with three heated and one cold Markov chain Monte Carlo (MCMC) chains for 6000 000 generations, sampling one tree every 100 generations as well as starting from random trees.The initial 25% of the trees were regarded as burn-in and discarded.A majority-rule consensus tree was generated using the remaining trees.FigTree v. 1.4.4 (Rambaut, 2018) was finally used to visualize the phylogenetic trees.

Adaptive evolution analyses
To identify selection pressures on the Argentina cp genomes, nonsynonymous (Ka), synonymous (Ks), and Ka/Ks ratios of 78 proteincoding genes (PCGs) of 39 Argentina accessions were calculated, with Potentilla reptans as the reference.Geneious Prime (Kearse et al., 2012) was employed to extract the 78 PCGs shared among the Argentina species and Potentilla reptans.The amino acids sequences and the relative nucleotide sequences were then aligned and converted into codon alignments by ParaAT v.2.0 (Zhang et al., 2012) with MAFFT as the multiple sequence aligner and with the 11th genetic code (-c 11).The KaKs_Calculator 2.0 program (Wang et al., 2010) was subsequently utilized for the analysis of Ka, Ks, and Ka/Ks ratios, with the 11th genetic code and the default model averaging (MA) method.
We also used site models in CodeML (Yang, 2007) executed in EasyCodeML (Gao et al., 2019) to detect positively selected sites of PCGs in 39 Argentina cp genomes.Firstly, 78 PCGs common to the Argentina species and Potentilla species were extracted in the cp genomes by Geneious Prime (Kearse et al., 2012).Each PCG was aligned according to its codons under MAFFT, followed by manually removing stop codons, and used as input for EasyCodeML.Moreover, these alignments were concatenated into a supermatrix and then the ML tree was established by RAxML v. 8.2.12 (Stamatakis, 2014) as an input tree.Likelihood ratio tests (LRTs) of M7 (beta) vs. M8 (beta and w >1) and M8a (beta and w = 1) vs. M8 were performed for detecting positive selection sites.If the LRTs were significant (p-values < 0.05), the Bayes empirical Bayes (BEB) (Yang et al., 2005) analysis was adopted in order to identify positively selected sites with the posterior probabilities threshold of 0.95.
Mauve alignment analysis of 39 Argentina cp genomes showed no gene rearrangement and inversion and Argentina cp genomes had good collinearity (Supplementary Figure S1).Although the cp genome organization, gene content and order in Argentina were highly conserved, some visible divergences were present in IR/SC boundary regions (Figure 2).The variation of cp genome size across land plants is mainly attributed to expansion and contraction of the IR (Ravi et al., 2008;Mower and Vickrey, 2018).Comparative analysis of IR boundaries indicated that border genes were identical among Argentina cp genomes, but slight differences existed in lengths of these genes (ndhF and ycf1) and relative positions of these genes to the boundaries.In junctions of the LSC/IR and SSC/ IR regions, the genes rps19-rpl2-trnH and ycf1-ndhF were located, with rpl2 and ycf1 duplicated or partially duplicated respectively in IR regions.The LSC/IRb junction (JLB) occurred between rps19 gene and one rpl2 gene.Gene rps19 with 279 bp in length occurred entirely in the LSC region with 6-19 bp away from the JLB, while rpl2 with 825 bp in length was placed completely in the IRb region, and the distance between the rpl2 and the JLB were 48-74bp.The IRb/SSC junction (JSB) was located on the truncated ycf1 (yycf1) and ndhF.The yycf1 pseudogene with 1071-1842 bp in length spanned the JSB boundary, with a length of 3-30 bp and 1062-1824 bp in the SSC and IRb region, separately.The length of ndhF was 2238-2280 bp, and the overlap between ndhF and yycf1 existed, in which ndhF expanded into the IRb region for 0-46 bp.The length of ycf1 was 5703-5802 bp and the gene crossed over the SSC/IRa junction (JSA), with a length of 3909-4719 bp and 1062-1824 bp in the SSC and IR region.The IRa/LSC junction (JLA) was located between the other rpl2 and trnH-GUG.Located in the IRa region, rpl2 was separated from the JLA by a spacer varying from 48 bp to 74 bp.Gene trnH-GUG with 74bp in length was located in the LSC region, with 0-11bp away from the JLA.In general, cp genomes  boundaries of Argentina species were greatly conserved and no obvious contraction and expansion existed in the IR region, which further supported that IR boundary shifts were relatively minor in closely related species (Zhu et al., 2016;Liu et al., 2018;Ren et al., 2022;Zhang et al., 2022).The similar findings were also observed in other genera in the tribe Potentilleae, such as Alchemilla (Rono et al., 2020), Chamaerhodos (Li et al., 2021a), and Fragaria (Li et al., 2021b).In addition, the boundary features of different populations of the same species were basically stable, although there are some exceptions.Comparison of the boundaries of the large single-copy (LSC), the small single-copy (SSC) and inverted repeat (IR) regions among 39 Argentina chloroplast genomes.JLB: junction line between LSC and IRb; JSB: junction line between SSC and IRb; JSA: junction line between SSC and IRa; JLA: junction line between LSC and IRa.The figure is not to scale with respect to sequence length and only shows relative changes at or near the IR/ SC boundaries.
The mVISTA program (Frazer et al., 2004) was employed to visualize the divergence level among 39 Argentina cp genomes using Shuffle-LAGAN mode and with A. anserina 1 as the reference.The divergence level of the 39 Argentina cp genomes was investigated and plotted using mVISTA (Supplementary Figure S2).Global comparisons of the genomic sequences revealed that at the genome-scale level, Argentina cp genomes were conserved, with a high degree of similarity and synteny.Additionally, the IR regions were more conserved than the SSC and LSC regions, which is a common phenomenon in most angiosperm cp genomes.Overall, compared with the coding regions, the non-coding regions were more divergent, with the highly variable non-coding regions occurred in the intergenic spacers (IGS).This finding was consistent with studies on cp genomes of other angiosperm taxa (e.g., Li et al., 2021a;Zhang et al., 2022;Bai et al., 2023).
In summary, the size, structure, GC contents, gene content and order of the Argentina cp genomes were highly conserved, only with slight differences in the cp genome length, GC content, and IR/SC boundary region for each species.

Phylogenetic implications
In recent years, cp genome sequences have been widely applied in plant phylogenetic studies, owning to its own merits such as containing more variable sites than the single fragment or the combination of several fragments (Wu et al., 2021;Zhang et al., 2021).In our present study, phylogenetic relationships among Argentina species were explored based on the cp genome sequences using ML and BI methods.The ML and BI analyses recovered identical topologies, with high maximum likelihood bootstrap support values (ML BS) and posterior probabilities (PP) across most nodes.Therefore, only the ML tree was presented in Figure 4.All the currently sampled Argentina species were clustered together with high support (Figure 4, ML BS = 100%, PP = 1.00).Consistent with former studies (Feng et al., 2015(Feng et al., , 2017;;Li et al., 2024), our phylogenetic results strongly supported that Argentina is monophyletic.Based on our current sampling, it is possible to get a The close relationship of these species indicated by our molecular phylogenetic analyses was congruent with previous studies based on the morphology of these species (Ikeda and Ohba, 1999;Li et al., 2003).

Adaptive evolution
Ka/Ks ratios of PCGs in Argentina cp genomes were less than 1 (Supplementary Table S3), indicating that these genes probably experienced purifying selection (Nei and Kumar, 2000).Adaptive evolution typically occurs at only a few amino acid sites, so averaging rates across all sites can result in low ability to detect positive selection (Anisimova et al., 2001).Considering that KaKs_Calculator 2.0 program (Wang et al., 2010) explores selection pressures by MA method, CodeML (Yang, 2007) implemented in EasyCodeML (Gao et al., 2019) was used to identify positively selected sites of PCGs of 39 Argentina cp genomes.The LRTs of M7 vs. M8 and M8a vs. M8 for PCGs of Argentina cp genomes were significant (p-values < 0.05), indicating that M8 (model of positive selection) should be accepted (Table 2).Based on BEB analysis, a total of 26 genes with sites under positive selection were detected (Table 3).The number of positively selected Comparison of the nucleotide diversity (Pi) values in 39 Argentina chloroplast genomes.Li et al. 10.3389/fpls.2024.1349358Frontiers in Plant Science frontiersin.orgsites among these genes was 1-66: 11 genes (atpA, rpoC1, rpoB, ndhC, psbF, rpl20, psbB, rpl16, rps19, ndhB, rps15) possessing one site, three genes (rpoA, rps3, ndhA) containing two sites, five genes (rps16, atpF, accD, rpl22, ycf2) having three sites, two genes (ccsA, ndhD) harboring six sites, rbcL with 10 sites, rpoC2 with 12 sites, two genes (matK, ndhF) with 13 sites, and ycf1 containing the largest number of sites.These 26 genes included four small subunit of ribosome genes (rps3, rps15, rps16, rps19), three large subunit of ribosome genes (rpl16, rpl20, rpl22), four DNA dependent RNA polymerase genes (rpoA, rpoB, rpoC1, rpoC2), two subunits of ATP synthase genes (atpA, atpF), two subunits of photosystem II genes (psbB, psbF), five subunits of NADH-dehydrogenase genes (ndhA, ndhB, ndhC, ndhD, ndhF), subunit of Rubisco gene (rbcL), subunit of Acetyl-CoA-carboxylase (ACCase) gene (accD), c-type cytochrom synthesis gene (ccsA), maturase gene (matK), and ycf1 and ycf2.The majority of Argentina species occurs at (sub)alpine areas above 3000 meters altitude (Kalkman, 1989(Kalkman, , 1993;;Ikeda and Ohba, 1999;Li et al., 2003;Kechaykin and Shmakov, 2016).High-altitude mountains habitats are characterized by low water availability, intense ultraviolet radiation, strong wind and/or abrasion, low air density, extreme temperatures or large diurnal/seasonal thermal fluctuations (Körner, 2003;Körner et al., 2011).The occupation of the high-altitude mountain environments indicates that Argentina species inhabiting these areas have adapted to conditions of high-altitude habitats.Species of Argentina show several morphological adaptations to high-altitude mountains habitats, such as small and compact rosettes, pinnate leaves with small leaflets, plant surface covered with hairs (Kalkman, 1989(Kalkman, , 1993;;Ikeda and Ohba, 1999;Li et al., 2003).Here possible evidence of positive selection in chloroplast coding genes was detected to reveal the adaptation of Argentina species to high-altitude mountains habitats at the molecular level.A total of 26 genes with sites under positive selection were detected and adaptive evolution of these genes might have helped Argentina species to adapt to the harsh mountain environment.As important constituents of protein synthesis machinery, cp ribosomal proteins participate in various processes of plant growth, development as well as reaction to unfavorable conditions (Schippers and Mueller-Roeber, 2010;Fleischmann et al., 2011;Tiller et al., 2012;Tiller and Bock, 2014;Zhang et al., 2016;Robles and Quesada, 2022).Adaptive evolution of these seven subunits of ribosome genes (rps3, rps15, rps16, rps19, rpl16, rpl20, rpl22) may be helpful for the normal growth and development of Argentina species under the extreme environments.Previous studies revealed that 30S ribosomal protein S15 is essential for the maintenance of high cp translational capacity under the cold stress (Fleischmann et al., 2011).Four enzymatic subunits a, b, b' and b'' encoded by rpoA, rpoB, rpoC1 and rpoC2 respectively, constitute catalytic core of the plastid-encoded plastid RNA polymerase (PEP) (Zhelyazkova et al., 2012).Genes of photosystems I and II are only transcribed by PEP promoters, and PEP represents the main transcription machinery of mature chloroplasts (Hajdukiewicz et al., 1997;Zhelyazkova et al., 2012;Kindgren and Strand, 2015).Genes rpoA, rpoB, rpoC1, rpoC2 were all under positive selection, which may be conducive to the transcription of photosynthetic genes of Argentina species in the harsh environments.Generating ATP from ADP using the proton gradient across the membrane (Hahn et al., 2018), the cp ATP synthase is essential for photosynthesis and plant growth (Yamamoto et al., 2023).Six ATP synthase subunit genes (atpA, atpB, atpE, atpF, atpH, and atpI) are encoded by the cp (Wicke et al., 2011), and two out of six genes (atpA and atpF) were subjected to positive selection in the present study.The 47 kDa chlorophyll a-binding protein (CP47) encoded by psbB, acts as the light-capturing antenna for the core complex of photosystem II (PSII) (Hird et al., 1991;Swiatek et al., 2003).The beta subunit of Cytochrome (cyt) b-559 protein encoded by psbF, is essential for proper assembly and activity of PSII reaction center (Pakrasi et al., 1990(Pakrasi et al., , 1991;;Swiatek et al., 2003;Nakamura et al., 2019).The positive selection pressure on psbB and psbF may reflect adaptation of PSII to the alpine environments such as strong light.The ndh genes (ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK) in cp encode eleven NADH-dehydrogenase subunits of the Ndh1-complex bound to the thylakoid membrane (Ifuku et al., 2011).The thylakoid Ndh1-complex is involved in cyclic electron transfer around photosystem I (PSI) and in chlororespiration (Suorsa et al., 2009), which seems to be important in adaptation to stress conditions, such as high light and low temperature (Endo et al., 1999;Rumeau et al., 2007;Yamori et al., 2011).Positive selection of five ndh genes (ndhA, ndhB, ndhC, ndhD, ndhF) in Argentina may reflect the adaptation to alpine environmental stress, such as low temperature and intense light.The rbcL gene encodes the Rubisco large subunit (Wicke et al., 2011).Rubisco catalyzes the assimilation of atmospheric CO2 during photosynthesis (Wilson and Hayer-Hartl, 2018;Whitney and Sharwood, 2021).In land plants, positive selection of rbcL is quite common (Kapralov and Filatov, 2007).Positive selection of rbcL in Argentina is related to adaptation to low CO2 concentrations in the alpine environments.Gene accD encodes the beta carboxyl transferase subunit of ACCase, and ACCase is an essential enzyme that catalyzes de novo fatty acid biosynthesis (Rawsthorne, 2002;Kode et al., 2005).
The accD gene could affect several biological processes such as cp division, leaf development, and seed development and storage compound metabolism (Madoka et al., 2002;Kode et al., 2005;Caroca et al., 2021).The gene ccsA encodes cytochrome c biogenesis protein, which is essential during c-type cytochromes biogenesis in the heme attachment step (Xie and Merchant, 1996).Maturase K protein encoded by the gene matK, is the only putative group II intron maturase of the cp (Neuhaus and Link, 1987).The maturase matK is required for splicing its own and other additional group II introns (Zoschke et al., 2010) and functions in photosynthesis and plant development (Barthet and Hilu, 2007).Essential genes ycf1 and ycf2 in higher plant cp genomes encode products that are indispensable for cell survival (Drescher et al., 2000).In all, the 26 genes with sites under positive selection are associated with biological processes such as self-replication, photosynthesis and biosynthesis, which may have played crucial roles in Argentina adaptation to the harsh mountain environment.

FIGURE 1
FIGURE 1Chloroplast genome map of Argentina species.Genes inside the circle are transcribed clockwise, whereas those outside are transcribed counterclockwise.The dark and light gray area in the inner circle corresponds to the GC content and AT content, respectively.

FIGURE 4
FIGURE 4 Maximum likelihood (ML) tree based on 39 complete chloroplast genome sequences from Argentina, with seven Potentilla taxa as outgroups.Values along branch represent ML bootstrap percentages (only BS < 100 % are shown) and Bayesian posterior probabilities (only PP < 1.00 are shown), respectively.

TABLE 1
Summary of voucher specimens and chloroplast genome characteristics for Argentina species and related outgroups.