A comparison of 25 complete chloroplast genomes between sister mangrove species Kandelia obovata and Kandelia candel geographically separated by the South China Sea

In 2003, Kandelia obovata was identified as a new mangrove species differentiated from Kandelia candel. However, little is known about their chloroplast (cp) genome differences and their possible ecological significance. In this study, 25 whole cp genomes, with seven samples of K. candel from Malaysia, Thailand, and Bangladesh and 18 samples of K. obovata from China, were sequenced for comparison. The cp genomes of both species encoded 128 genes, namely 83 protein-coding genes, 37 tRNA genes, and eight rRNA genes, but the cp genome size of K. obovata was ~2 kb larger than that of K. candle due to the presence of more and longer repeat sequences. Of these, tandem repeats and simple sequence repeats exhibited great differences. Principal component analysis based on indels, and phylogenetic tree analyses constructed with homologous protein genes from the single-copy genes, as well as 38 homologous pair genes among 13 mangrove species, gave strong support to the separation of the two species within the Kandelia genus. Homologous genes ndhD and atpA showed intraspecific consistency and interspecific differences. Molecular dynamics simulations of their corresponding proteins, NAD(P)H dehydrogenase chain 4 (NDH-D) and ATP synthase subunit alpha (ATP-A), predicted them to be significantly different in the functions of photosynthetic electron transport and ATP generation in the two species. These results suggest that the energy requirement was a pivotal factor in their adaptation to differential environments geographically separated by the South China Sea. Our results also provide clues for future research on their physiological and molecular adaptation mechanisms to light and temperature.

In 2003, Kandelia obovata was identified as a new mangrove species differentiated from Kandelia candel. However, little is known about their chloroplast (cp) genome differences and their possible ecological significance. In this study, 25 whole cp genomes, with seven samples of K. candel from Malaysia, Thailand, and Bangladesh and 18 samples of K. obovata from China, were sequenced for comparison. The cp genomes of both species encoded 128 genes, namely 83 protein-coding genes, 37 tRNA genes, and eight rRNA genes, but the cp genome size of K. obovata was~2 kb larger than that of K. candle due to the presence of more and longer repeat sequences. Of these, tandem repeats and simple sequence repeats exhibited great differences. Principal component analysis based on indels, and phylogenetic tree analyses constructed with homologous protein genes from the singlecopy genes, as well as 38 homologous pair genes among 13 mangrove species, gave strong support to the separation of the two species within the Kandelia genus. Homologous genes ndhD and atpA showed intraspecific consistency and interspecific differences. Molecular dynamics simulations of their corresponding proteins, NAD(P)H dehydrogenase chain 4 (NDH-D) and ATP synthase subunit alpha (ATP-A), predicted them to be significantly different in the functions of photosynthetic electron transport and ATP generation in the two species. These results suggest that the energy requirement was a pivotal

Introduction
Mangroves are woody plant communities distributed in tropical and subtropical intertidal zones that play a vital role in reducing the impact of natural disasters and maintaining the coastal ecological environment (Lin, 1999). Kandelia, a typical viviparous mangrove species in the Rhizophoraceae family, is widely distributed from Ganges Delta, Burma, through the South China Sea to southern China, and southern Japan (Tomlinson, 1986;Sheue et al., 2003). Kandelia obovata (known as K. candel until 2003) is one of the most important and dominant mangrove species naturally distributed along the coastal areas of southeastern China with the widest distribution and the highest latitude, including in Hainan, Guangdong, Guangxi, and Fujian, in addition to Hong Kong, Macao, and Taiwan (Li & Lee, 1997;Lin, 1999). As the strongest cold-resistant true mangrove species, it is not only a high-quality mangrove species that has been artificially planted for ecological restoration of northward coastal wetlands in China (Liao & Zhang, 2014), such as Yueqing in Zhejiang Province, but also an ideal plant material suitable for exploring the relationship between genetic variation and geographical distribution of mangrove species.
The genus Kandelia originated from Malabar, India, and Kandelia candel (L.) Druce was named because of its candle-like hypocotyl propagule. Kandelia candel was once regarded as the only species in the Kandelia genus (Hou, 1958;Juncosa & Tomlinson, 1988). Comparative analysis of the morphological characteristics of leaves and propagules of Kandelia populations collected from Brunei, Thailand, and Hong Kong indicated that distinctive differences exist among them (Maxwell, 1995). Naskar & Mandal (1999) further found that the anatomical structures of the Kandelia populations distributed in India and Taiwan were also significantly different. Analysis of the chromosome numbers and karyotypes between populations in the two geographical regions of Kandelia showed 2n = 38 for the Indian populations and 2n = 36 for the Japanese populations (Yoshioka et al., 1984;Das et al., 1995). Based on the abovementioned differences in the morphology, structure, and chromosome number, Sheue et al. (2003) proposed that the populations distributed in the south of the South China Sea (including in India, Burma, Thailand, and Malaysia) continue to be recognized as K. candel, and the populations growing in the north of the South China Sea (including in northern Vietnam, southeastern China, and southern Japan) are recognized as a new species named K. obovata, after their obovate leaves. Although the atpB-rbcL and trnL-trnF fragments of the chloroplast (cp) genome were used as molecular markers in the two geographical populations (Chiang et al., 2001), the detailed features of the whole cp genomes between the two population groups and the functions of differential genes in their adaptation to the respective environments need to be further clarified (Takeuchi et al., 2001;Giang et al., 2006;Geng et al., 2008).
The cp genome has the unique advantages of small haploid size, abundant copy number, relatively conservative gene number and arrangement, lack of recombination, and maternal inheritance (Birky, 2001;Rousseau Gueutin et al., 2015). With the development of next-generation DNA sequencing technologies, the complete cp genome has been widely used for plant identification, phylogeny, and evolution studies. Given that cps are important for interactions between a plant species and its environment (including responses to cold, heat, drought, salt, and light), they serve as hubs in cellular reactions to signal and respond via retrograde signaling (Bobik & Burch-Smith, 2015). Once there are mutated genes in the cp genome, they might play a pivotal role in the plant's adaption to a varied environment.
In this study, we employed genome sequencing technology and bioinformatics to conduct a comparative analysis of whole cp genomes between K. candel samples collected from Malaysia, Thailand, and Bangladesh, and K. obovata samples collected from the coasts of southeastern China. The results obtained from the present study will advance our understanding of differential adaptation to coastal environments across the world conferred by variations in the cp genomes of closely related mangrove species. On a finer scale, this study will also provide a foundation for further unveiling the differential adaptation mechanisms to light and temperature in K. obovata and K. candel.

DNA extraction and sequencing
The whole genomic DNA of 25 individual samples was individually extracted from leaf tissues using the cetyltrimethylammonium bromide (CTAB) method (Doyle, 1991). The extracted DNA was dissolved in 60 mL of TE buffer. High-quality DNA (concentration >35 ng/µL) was used in Illumina resequencing. Kandelia genome sequencing was performed using the Illumina Hiseq X Ten platform with 150 bp paired-end reads.
2.3 Chloroplast genome assembly, gene annotation, and codon usage Before the de novo assembly of the cp genome, quality control of the raw paired-end reads was tested using Trimmomatic v0.40 (Bolger et al., 2014). High quality reads were mapped to cp seed based on bowtie2 v2.3.2 (Langmead & Salzberg, 2012) software with default parameters. The qualitatively assessed paired-end reads from bowtie2 were assembled to produce the cp genome with NOVOPlasty v4.3.1 software (Dierckxsens et al., 2020) using the published cp genome of Kandelia obovata (GenBank Acc. No. MN117072) as a reference (Yang et al., 2019). GetOrganelle software (Jin et al., 2020) was also applied to complete and double-check the assembly of the cp genomes. The coverage of reads in the nuclear genome was measured, which used passed Trimmomatic reads mapping to published nuclear genome based on bowtie2 v2.3.2 (Langmead & Salzberg, 2012) and the mapped genome reads were counted to coverage depth. We also used the same methods to employed the coverage depth of Kandelia cp genoma. The complete cp genome was annotated and corrected using Geseq (Tillich et al., 2017) and CPGAVAS2 (Shi et al., 2019). The circular gene map was visualized using OGDRAW (Greiner et al., 2019). In order to ensure the reliability and accuracy of assemble and annotation results, we have carried out manual Geographic distribution of 25 Kandelia samples collected from 4 different countries including 11 sampling sites for sequencing and assembly the whole chloroplast genomes. The blue solid circle represented Kandelia candel species and the red tangle represented Kandelia obovata species. Xu et al. 10.3389/fpls.2022.1075353 Frontiers in Plant Science frontiersin.org comparing our results with published cp genomes of K. obovata (Yang et al., 2019a and2019b). Two genes, psbI and psbK, were absent in the previously published cp genome of K. obovata (GenBank Acc. No NC042718, MN313722), but they were located in the cp genomes of the mangrove Rhizophoraceae family, such as in Rhizophora apiculata (GenBank Acc. No. MT129631.1) and non-Rhizophoraceae families, such as Avicennia marina in the Acanthaceae family as shown in our previous research (GenBank accession number: MT108381), as well as in our present study. Therefore, we designed primers around gene psbK and performed PCR amplification of cp genomes for further confirmation. The sequence of ATATTTGA ATTTGAATTGAGTTTCGGT was used as psbK-around-F primers. The sequence of GGTTTGTTGGATGTGCTGTGA was used as psbK-around-R primers. The annotated chloroplast genome sequences for the 25 samples of Kandelia have been deposited in the GenBank database under accession numbers successively from ON969308 to ON969332 (Supplementary Table S1). To identify codon usage patterns, all coding sequences (CDS) were subsequently used for the estimation of relative synonymous codon usage (RSCU) through the CUSP program with EMBOSS v6.5.7 with default parameters (Rice et al., 2000).

Comparative chloroplast genome structure analyses
Tandem repetitive sequences were determined using the online program Tandem Repeats Finder (v 4.09) (Benson, 1999) according to the following criteria: match value of 2, mismatch value of 7, delta value of 7, match probability of 85, indel probability of 10, minscore value of 50, and maxperiod value of 500. The tuple sizes were 0, 4, 5, and 7, and the tuple distances were set to 0, 29, 159, and 500. The program REPuter (Kurtz et al., 2001) was used to identify and determine the locations and sizes of forward, reverse, palindrome, and complement sequences having the following parameters: minimum of 30 bp, sequence identity greater than 90%, and maximum computed repeats of 4500. A perl program MicroSAtellite (MISA v2.1) (Beier et al., 2017) identification tool was used to detect simple sequence repeats (SSRs) in the 25 cp genomes. In this study, only perfect repeats were selected for analysis with the following parameters: basic motifs (1~6 bp) and a minimum repeat length of 8 bp (for mono-and di-), 12 bp (for tri-and tetra-), 15 bp (for penta-), 18 bp (for hexa-), and the minimum distance between two SSRs was set to 100. The variations between K. candel and K. obovata were identified by Genome Varscan (parameters: Diff OneInHundred, VarRange 1 -100) and then the primers were designed by Batch Target Region Primer Design with default parameters, the above two tools were used under TBtools v1.099 (Chen et al., 2018) software.
To determine the sequence divergence of the Kandelia cp genomes among the 25 samples, the online genome comparison tool mVISTA (https://genome.lbl.gov/vista/index.shtml) was used with the K. obovata (YQ-2) annotation as the reference. The default parameters were set to align the cp genome in Shufe-LAGAN mode, and the sequence conservation profile was visualized using an mVISTA plot. Based on the comparative genome size results, K. candel (MALA-2) and K. obovata (YQ-2) were selected to visualize the cp genome gene order and the collinear blocks between the two species under Kandelia. The comparison was performed using Mauve v2.3.1 (Darling et al., 2004) with default iterative alignment, seed weight, sum of pairs LCB scoring, and LCB settings. DnaSP v5.10 (Rozas et al., 2017) was applied to determine the level of nucleotide diversity (Pi) among 25 samples, with the K. obovata (YQ-2) cp genome as the reference. When DnaSP6 calculated the Pi value, the step size was set to 650 bp, and the slide window size was 650 bp. The intraspecific Pi values of K. candel and K. obovata were also calculated using these methods. We extended the region with high pi value (Pi >0.01) up and down 1000bp to find the gene closest to this region for further analysis. The ratio of the number of non-synonymous (Ka) substitutions to the number of synonymous (Ks) substitutions (Ka/Ks) of each proteincoding gene was estimated using perl script ParaAT2.0 (Zhang et al., 2012b) with muscle v3.8.31 (Edgar, 2004) and KaKs_Calculator2.0 (parameters: -c 11 -m MS) .

Structural modeling and molecular dynamics simulation of ndhD and atpA
Each pair of genes of Kandelia was aligned using CLC Main Workbench 6 software (https://digitalinsights.qiagen.com/). Loci with variations in the species-specific genes with nonsynonymous mutations were the focus of our attention between K. candel and K. obovata. We searched the genes with mutations and identified mutated sites located in the domain area with SMART (Letunic et al., 2021). The ndhD and atpA genes met the criteria. The structural models of ndhD and atpA were then built.
Templates for homologous modeling were searched via BLASTP (Johnson et al., 2008) on the Protein Data Bank (PDB) database (Burley et al., 2021) with a criterion of sequence identity > 30% (more is better). With a comprehensive consideration of E-value and query coverage, an optimal template was selected by PDB ID 1FX0_A (Groth & Pohl, 2001) for atpA and 6HUM_D for ndhD (Schuller et al., 2019). Model building for K. candel proteins was then carried out on MOE software (https://www.chemcomp.com/Products. htm) with default settings. K. obovata proteins were obtained by modifying the K. candel model with corresponding mutation positions. MOE Protein Builder was used for specific amino acid mutation followed by an energy minimization choice of "Selected Sidechain + Tether BB".
Molecular dynamics (MD) simulations were performed using the GROMACS v2021.5 (Van Der Spoel et al., 2005) with GROMOS 54A7 force field (Nathan et al., 2011). Systems were solvated with a water model (spc216.gro file) in a dodecahedron box with a boundary distance of 1.0 nm to the box edge. To maintain a neutral condition, counter-ions were added to the systems (6 Na + for atpA, 2 Clfor ndhD). The initial structures were then optimized by an energy minimization process using the Steepest Descent method until the energy gradient was ≤ 10 kJ mol -1 nm -1 . Subsequently, a two-stage equilibration process was conducted. The first 100-ps NVT equilibration for temperature control and the second 100-ps NPT equilibration for pressure control. The final MD simulations lasted 50 ns at a constant temperature of 300 K and a constant pressure of 1 atm for each system. Root Mean Square Deviation (RMSD), root-mean-square fluctuations (RMSF), and b-factor value of the residues were calculated from the MD trajectory files using the embedded commands in GROMACS. Pymol v2.4.1 (https://github.com/schrodinger/ pymol-open-source) was used for protein structure visualization and mapping the b-factor value to the corresponding residue on the graph. As the b-factor value went from small to large, the colors changed in the following order: grey -blue -yelloworange -red.
To study the interaction between ATP-A and ADP, molecular docking was carried out on MOE software with a General docking scenario. The potential binding site was selected based on the result of the Site Finder application in MOE.
To explore the consistency of ATP-A and NDH-D sequences changes and genes differentiation in mangrove plants and model angiosperms Arabidopsis thaliana. The proteins sequence of APA-A and NDH-D of six different species were compared and visualized with CLC Main Workbench 6 software (https:// digitalinsights.qiagen.com/). Based on the comparison results, the phylogenetic trees of these two proteins were established respectively, the parameters as follows: Algorithm NJ, replicates 1000.

General features of Kandelia complete chloroplast genomes
A total of 25 samples of Kandelia were used to obtain 5~10 Gb raw reads with a mean coverage of~25× to 50× of whole genomes and 2298× to 3484× of cp genome base coverage. The amplified sequences were consistent with the assembly results, which illustrated that the psbI and psbK genes were present in Kandelia cp genomes (Supplementary Figure S1), and the assembly of the cp genomes was correct ( Figure 3). The circular maps of the 25 cp genomes are shown in Figure 3. The 25 cp genomes ranged from 165,247 to 168,262 bp in length. The total sizes of K. candel ranged from 165,247 to 166,729 bp, while the sizes of K. obovata ranged from 168,070 to 168,262 bp. The length of K. obovata was approximately 2 kb longer than that of K. candel on average. All cp genomes had a circular assembly with a typical quadripartite structure, which was composed of large and small single-copy (LSC and SSC) regions and two inverted repeats (IRs) (Figure 3, Table 1). The LSC length showed the greatest difference between the two species, being 92,380~93,683 bp for K. candel and 94,711~94,908 bp for K. obovata, which resulted in the differential length of the two species. GC contents in both LSC (38.2~8.3%) and SSC (28.1~28.6%) were slightly lower in K. obovata (Supplementary Table S4).

Codon usage analysis
The ATT codon was the most abundant in the Kandelia cp genomes (42.87%). The TGA codon was the least used in Kandelia cp genomes (2.69%). The analysis of relative synonymous codon usage (RSCU) values revealed the dominance of 20 amino acids. RSCU values were computed for K. candel and K. obovata cp genomes based on their protein-coding sequences. Supplementary Figure S2 shows the codon content of 20 amino acids and stop codons in all protein-coding genes in the cp genomes of the two species. The coding regions of K. candel were composed of 24,542, 24,680, and 24,690 codons. In K. obovata, the coding regions were composed of 24,674, 24,681, and 24,680 codons. The most prevalent amino acid was leucine, with 2,579~2,592 codons in K. candel and 2591 codons in K. obovata, while the rarest was cysteine, with 306 codons in K. candel and 307 codons in K. obovata, showing the sequence diversity between the two species. When codons with no preference value were set to 1.00, codons for leucine, serine, and arginine were the most abundant (RSCU = 6), while those for tryptophan and methionine were the least abundant (RSCU = 1) (Supplementary Figure S2). In addition, nearly all A/T-ending codons had RSCU values >1, whereas G/Cending codons had RSCU values <1.

SNPs analysis and construction phylogenetic trees
There were 1522 indels in the 25 cp genomes of Kandelia, comprising 112 (7.36%) SSR-related indels and 1410 (92.64%) non-SSR-related indels. In total, 73.52% of the indels were present in 1141 intergenic space regions, while 2.46% of the indels were located in exons and 24.02% were present in the introns ( Figure 4A). The atpE-atpB-rbcL regions contained 60 A B indels, followed by trnS (GCU)-rps4-trnT (UGU) with 44 indels and trnD (GUC)-psbM-petN with 33 indels. The single nucleotide site generally displayed a high frequency of SSRrelated indels in the present study; however, we also found five indels located in the trnS (GCU)-rps4-trnT (UGU) region, which were 13~56 bp in length. All SSR-related indels were A/T-type SSRs. Three SSR-related indels were located in the coding regions, and the other 41 SSR-related indels were found in the non-coding regions. The sizes of the non-SSR-related indels ranged from 1 to 169 bp, with one bp indel (1,171) being the most common ( Figure 4B). The largest indel (88 bp) in the spacer of trnK (UUU)-trnQ (UUG) was a deletion in K. candel. The second largest indel was in the spacer of ndhG-ndhI (81 bp), which was also a deletion in K. candel. The relationships among the 25 samples were analyzed using principal component analysis (PCA) based on SNPs data, and a phylogenetic tree of the SNP markers in the whole cp genome was identified (Figures 4C, D). Using the longest cp genome of YQ-2 as the reference genome, the other 24 cp genomes were clustered into two main groups, KC and KO, the latter included KO1 and KO2 subgroups. The KC group belonged to K. candel and included all samples (THAI, BGLA-1, BGLA-2, BGLA-3, MALA-1, MALA-2, and MALA-3) of K. candel. The KO1 and KO2 groups belonged to K. obovata, which consisted of DZXY1-3, JLFG1-3, YX1-3, YQ-1, YQ-3, and LZ-1 in the KO1 group and LD-1, LD-2, LD-3, FD-1, and DZG-1 in the KO2 group. The first two principal components comprised 68.32% of the total variance, with PC1 reflecting the variability of the K. candel and K. obovata groups.
The phylogenetic trees of K. candel and K. obovata within the genus Kandelia were established with the UPGMA method by utilizing 76 shared single-copy protein genes (Figure 2A). The phylogenetic trees showed that the individuals of K. candel clustered in one clade as a KC group. Kandelia obovata individuals were gathered in another clade, and samples were gathered further into the KO1 and KO2 subgroups, which was consistent with the phylogenetic tree constructed based on SNPs ( Figure 4D). The KC branch of K. candel was subdivided into two clades: MALA1-3 for one clade and BGLA1-3 together with THAI for another clade. The KO1 and KO2 groups of K. obovata branches showed that most samples from the same location were more closely related; however, some samples collected from neighboring provinces or even far from each other also showed closer kinship. For example, LD 1-3 and DZG-1, both from Hainan Province, were clustered with FD-1 from Fujian Province, and YQ 1-3 from Zhejiang Province was grouped with JLFG-1 from Fujian Province. This might be because Fujian Province was one of the major collection areas for propagules of K. obovata in artificial introductions or transplants. In addition, a phylogenetic tree based on 38 homologous pair genes was constructed among 7 genera, 13 mangrove species, including 39 samples ( Figure 2B). The results illustrated that Kandelia was closely related to Bruguiera and Rhizophora under the Rhizophoraceae family, but they were well differentiated into two species. In terms of the 25 Kandelia samples, this result also coincided with the phylogenetic trees constructed with SNPs ( Figure 4D) and 76 shared single-copy genes in these cp genomes ( Figure 2B), all illustrating that the genus Kandelia consisted of two distinct species.

Comparative analysis of cps genomes structural variations
The distribution of long repeats in Kandelia cp genomic sequences was analyzed and summarized, as shown in Supplementary Figure S3A. There were three types of repeats: tandem repeats (TRs), forward repeats (FRs), and palindromic repeats (PRs) (Supplementary Figure S3A). By searching the repeats of each of the 25 cp genomes, the number of repeats was significantly different in TRs and PRs between K. candel and K. obovata. There were 2~10 TRs in K. candel and 17~22 TRs in K. obovata (Supplementary Figure S3B). The average number of TRs in K. candel was 11, which was significantly lower than that in K. obovata (7). In terms of FRs, the mean number was 37 in the K. candel cp genomes, whereas it was 59 in K. obovata genomes. In contrast, the PR numbers in K. candel were greater than those in K. obovata, at 5~8 and 2~4, respectively. The highest number of repeats (88) was found in the LD-3 cp genome of K. obovata, with 19 TRs, 66 FRs, and three PRs. Among these repeats, 47 of the forward repeats were found with 28~46 bp, six repeats with 47~64 bp, five repeats with 65~83 bp, four repeats with 84~102 bp, three repeats with 103~120 bp, and one repeat with length longer than 121 bp (Supplementary Figure S3C). The lowest number of repeats (a total of 41) was found in the BGLA-1 cp genome of K. candel (nine TRs, 27 FRs, and five PRs), of which 26 of the FRs were found with 28~46 bp, 14 repeats had length lower than the average value (40) of K. obovata, and one repeat had a length of 47~64 bp (Supplementary Figure S3A, C, D).
The distribution of three types of SSRs, namely mononucleotides, dinucleotides, and trinucleotides, is shown in Supplementary Figure S4A. The total number of SSRs were 73 to 78 with an average density from 456.25 SSRs/Mb to 487.5 SSRs/Mb of K. obovata. In K. candel, the SSRs density were from 456.25 SSRs/Mb to 506.25 SSRs/Mb (73~81). AAT repeats were found in Kandelia, and there was only one trinucleotide type duplication per cp genome. In addition, the SSR sequences of the whole cp genome displayed a prevalence o f A T -r i c h m o n o n u c l e o t i d e s ( 9 8~1 0 0 % ) a n d dinucleotides (100%).
The mononucleotide SSR was found to be the most abundant, followed by dinucleotides and then trinucleotides, in both species. They accounted for 91.86%, 6.82%, and 1.32% of total SSRs in K. obovata individuals, on average, while they accounted for 92.33%, 6.37%, and 1.30% in K. candel individuals (Supplementary Table S3). Most of the SSRs in the 25 cp genomes were found in the LSC region (Supplementary Figure S4B). A significant difference in SSR distribution between K. candel and K. obovata was also found in the LSC region. The number of SSRs in the LSC region was 53 (YX-1) to 61 (MALA-1, 2, and 3). The average SSR number in K. candel (59) was greater than that in K. obovata (56). The most mononucleotide SSRs distributed in the LSC region in K. candel were 54, on average, whereas there were 50 in K. obovata.
Similarly, SSRs in the SSC region were mostly mononucleotides, but SSRs as dinucleotides were also found in K. candel individuals. The number of SSRs in the SSC region was 9~12 (average of 14.37%) in K. obovata, which is higher than that in K. candel (i.e., 8~11, average of 11.99%). In the IRa region, four to six mononucleotide SSRs were found. The least SSRs were found in the IRb region with four mononucleotides, except MALA-2 with five mononucleotides. The DZXY cp genome of K. obovata had one dinucleotide SSR in the IRb region, which was not found in any of the other samples. Molecular markers to distinguish these two species were shown as in Supplementary Table S5.
Using the longest K. obovata (YQ-2) cp genome as the reference, the comparative sequence analyses exhibited high sequence similarities and high gene structure order consistency among the 25 cp genomes (Supplementary Figure S5). The colinearity in gene placement between the cp genomes of K. candel and K. obovata was analyzed with MALA-2 and YQ-2 as representatives. The results indicated that gene clusters had no changes, whether in the single-copy regions or in inverted repeat regions between the two species, showing high conservation at the whole cp genome level ( Figure 5). However, a gap of 1,149 bp in YQ-2 was detected in the LSC region ( Figure 5). The gap region was very rich in AT (90.37%).
The nucleotide diversity (Pi) value was calculated using the DnaSP program to evaluate the mutation hotspots in the 25 cp genomes (Figure 6). The results illustrated that the Pi values varied from 0 to 0.04 in the peer window of the 25 Kandelia cp genomes. Seven of these loci, i.e., atpA (0.0156), ndhD (0.0148), matK (0.0152), rps4 (0.0402), atpB (0.0228), and rbcL (0.0157), were located at high values (Pi >0.01) area. Furthermore, sequence coherence analysis showed that the atpA gene in the LSC region and the ndhD gene in the SSC region displayed two distinct patterns between K. candel and K. obovata, which aroused our interest in further comparison. In addition, the matK gene and the rps4 gene in the LSC region displayed a consistent pattern in all samples of K. candel. The atpB gene and the rbcL gene in the LSC region displayed a consistent pattern in all samples of K. obovata. The Pi value of intraspecific cp genome nucleotide diversity in K. candel was between 0 and 0.023, while it was 0~0.015 in K. obovata. The Ka and Ks values estimated for each protein-coding gene showed that they were in the range of 0 to 0.5, and none of the Ka/Ks values for these genes were greater than 1 in the present study. In particular, the Ka/Ks value of atpA gene is 0.10 and that of ndhD gene is 0.26. (Supplementary Figure S6,  Supplementary Table S3).

FIGURE 5
Gene map comparison between K.candel and K.obovata chloroplast genomes aligned using Mauve, showing a big 'gap' of 1,149bp with rich A and T in LSC regions in K. obovata. Xu et al. 10.3389/fpls.2022.1075353 Frontiers in Plant Science frontiersin.org

Structure analysis and molecular dynamics simulation of NDH-D and ATP-A proteins
Analysis of the SNPs between K. candel and K. obovata showed that the ndhD and atpA genes had some SNPs located in highly variable regions ( Figure 6). All the mutation sites of proteins NDH-D and ATP-A were identified in the domain area using the SMART program (Supplementary Figures S7A, S8A). To explore the influences of mutation sites of NDH-D and ATP-A on protein structures, we carried out homology modeling, molecular docking, and a followed-up molecular dynamic (MD) simulation. b-factors (Supplementary Figure S7B), RMSD (Supplementary Figure S8B), and RMS fluctuation ( S u p p l e m e n t a r y F i g ur e S 8 C ) w e r e c a l c u l a t e d f o r better understanding.
The NDH complex D sub-protein (NDH-D) was reconstructed with the K. candel protein model by selecting the appropriate homologous protein, and K. obovata proteins were obtained by modifying the K. candel NDH-D protein model with corresponding mutation positions ( Figure 7). As shown in Figure 7C, Phe22, Leu45, and Met426 of NDH complex D subprotein (NDH-D) in K. candel were substituted by Leu22, Ile45, and Ile 426 in K. obovata. Clearly, due to the sense mutations of the protein amino acids in K. obovata, the protein structure changed accordingly (Figures 7A, B). After MD simulations, bfactors for residues 22, 45, and 426 were calculated. The b-factors for the 22nd, 45th, and 426th sites were all higher in K. candel than in K. obovata (Supplementary Figure S7B), indicating a more flexible conformation at these three sites in K. candel. Although the overall trend of the conformational changes of NDH-D in both K. candel and K. obovata was consistent ( Figure 7D), the marked differences in protein conformation fluctuated around the residues 70~80, 160~180, and 450 ( Figure 7E).
The structure of ATP a subunit protein (ATP-A) coded by the gene atpA was constructed in the same way as the NDH-D protein (Figure 8). According to the SNP result, the Tyr89 in protein ATP-A in K. candel was mutated to the Ser89 in K. obovata, (Figure 8D). The predicted structures of ATP-A were shown in Figures 8B, C. RMSF analysis after MD simulations illustrated that great differences existed in the regions from the residue 26 to 96 amino-acid in ATP-A between species ( Figure 8A). As the docking results showing in Figures 8E, F, a larger absolute value of binding affinity (-6.55 kcal/mol vs. -5.56 kcal/mol) indicated a stronger interaction between ATP-A and ADP in K. obovata, suggesting that the a subunit protein of ATP complex may work more actively in ATP synthesis in K. obovata. The Pi value in sliding-window analysis of the whole chloroplast genomes. The genes highlighted in blue color showcase the SNPs between K. candel and K. obovata. The genes highlighted in orange color were SNPs within K. candel species. The purple color highlighted genes were SNPs within K. obovata species.
Kandelia showed that K. obovata had a mutation site in ATP-A, while K. candel was identical at this site with others mangrove and Arabidopsis, which located in the protein domain (Supplementary Figure S9). Among these, K. obovata was consistent in the amino acid at the position of 22nd with Arabidopsis but different from K. candel and the other two Rhizophoraceae species, Bruguiera sexangula and Rhizophora apiculata. The sites of 22nd, 45th and 426th showed that K. candel and A. thaliana were consistent while K.obovata species were different from all the other compared species (Supplementary Figure S9).
The two animations for the whole 50ns MD simulations demonstrating the ATP-A-ADP binding are provided in the supplementary movie document (Supplementary Animations A, B).
The predicated 3D structural modeling and molecular dimnamics simulation of the NDH-D protein. By setting different colors (gray to blue to yellow to orange to red) according to the b-factor values, the color in the structure is closer to red, the more flexible the structure. Kandelia (Rhizophoraceae) was regarded as a monotypic genus consisting of a single K. candel (Hou, 1958;Juncosa & Tomlinson, 1988). Based on studies on leaf anatomy, coldresistance adaptation, chromosome number, and molecular markers from both cp and mitochondria, K. obovata was identified as a new mangrove species differing from K. candel 20 years ago (Sheue et al., 2003), but the genomic information between them remained unknown, which limited our understanding about how their genes evolved and differentiated during the process of plant speciation. This also causes confusion in scientific exchanges, as some scholars still use K. candel for the plant materials of Kandelia collected either from China or Japan (Geng et al., 2008;Enoki et al., 2009;Zhang et al., 2012a;Feng et al., 2022). The predicated 3D structural modeling and molecular dimnamics simulation of the ATP-A protein. By setting different colors (gray to blue to yellow to orange to red) according to the b-factor values, the color in the structure is closer to red, the more flexible the structure. (A) The conformational variations of protein amino acid residues during the molecular dynamics simulation process showing the distinct difference in the region from 26aa to 96aa. (B) The 3D model of ATP-A protein for K. candel. (C) The 3D model of ATP-A protein for K. obovata. (D) The mutation at the site of the 89 aa of ATP-A protein between K. candel and K. obovata. (E) Molecular dimnamics simulation of ATP-A protein and ligand ADP docking exhibuted by binding affinity for K. candel (E) and K. obovata (F). Xu et al. 10.3389/fpls.2022.1075353 Frontiers in Plant Science frontiersin.org The plastid genome provides valuable information for species identification, population genetics, species differentiation, and phylogenetic analyses (Daniell et al., 2016;Kong et al., 2021;Luo et al., 2021). In this study, we obtained the high-quality cp genomes of Kandelia (Table 1, Figure 3), with the cp genome size ranging from 165,247 to 168,262 bp, which were almost 3 kb larger than previously published K. obovata cp genomes Yang et al., 2019). The number of genes and arrangements were also different in comparison with previous studies. Genes psbK and psbI were assembled and annotated in our study, which were located in the aforementioned 3 kb regions, but they were absent in previous studies Yang et al., 2019). To confirm the correctness of our cp genome assembly, we designed primers around gene psbK and double-checked the locations of psbK and psbI genes by randomly selecting seven samples with PCR amplification. The results verified that they were in the right sequence position in the cp genomes (Supplementary Figure S1). In addition, the fragment from 8,966 to 52,067 bp was in inverse order in the LSC region, which was different from the reported cp genome of K. obovata Yang et al., 2019). This inversion has been previously reported in other cp genomes (Wei et al., 2020;Cui et al., 2021;Kim & Cheon, 2021;Wang et al., 2021).
The genomes of all samples shared 83 protein-coding genes, 37 transfer RNA genes, and eight ribosomal RNA genes. The size of the cp genome in K. obovata, however, was roughly 2 kb longer than that in K. candel, with the greatest difference in the LSC regions. The GC content in the IRs remained the same, but both LSC and SSC were slightly lower in K. obovata in comparison with K. candel (Figure 3, Supplementary Table S4).
Indels are an important class of genetic variations and play important roles in species evolution (Biju et al., 2019;Yang et al., 2019). We identified 1,522 indels in 25 cp genomes of Kandelia, comprising 112 SSR-related indels and 1,410 non-SSR-related indels ( Figure 4). The phylogenetic tree based on the indel data of all cp genomes divided 25 samples into two separate branches. PCA analysis was in accordance with the phylogenetic tree established by the SNPs. Phylogenetic relationship analyses based on 76 homologous protein genes from the single-copy genes in these 25 cp genomes ( Figure 2A) and 38 homologous pair genes among 13 mangrove species, including 40 samples ( Figure 2B), provided strong support to the genus Kandelia consisting of two distinct species. These two phylogenetic analyses also coincided with the results obtained by the SNPs of the whole cp genome tree. Our integrative phylogenetic trees provided detailed molecular information revealing that K. candel and K. obovata are two species geographically separated by the South China Sea, as previously suggested by Sheue et al. (2003). In terms of K. obovata clustering into two subgroups, among the many possible causes, introduction or transplant was likely the main reason for gene homogenization. Among these, YQ in Zhejiang Province and LD in Hainan Province were the two artificial introduction places for K. obovata, and Fujian was the main provenance of the propagules of K. obovata (Li et al., 2001;Wang & Wang, 2007).
Repeat sequences are an important part of the genome and play an important role in the evolution of organisms, genetics, and regulation of gene expression. Comparison of the tandem, forward, and palindromic repeats markedly differed between K. candel and K. obovata (Supplementary Figure S3). Tandem and forward repeats in K. obovata were more frequent than those in K. candel (Supplementary Figure S3B, C). In contrast, palindromic repeats were less frequent in K. obovata than in K. candel (Supplementary Figure S3D). This also resulted from the differences in cp genome sizes between K. candel and K. obovata, due to these different repetitive sequences mainly located in the intergenic regions. Microsatellites, also known as simple sequence repeats (SSRs), are short repeating DNA sequences of one to six base pairs that are ubiquitous in the genome (Powell et al., 1996;Fang et al., 2020). Owing to the number of repeat units that may vary between individual genotypes, SSR is extremely versatile and useful for genetic analysis (Zalapa et al., 2012). In this study, more SSRs were found in K. candel than in K. obovata, especially the type of mononucleotide in the Malaysian samples (Supplementary Figure S4). Moreover, SSRs typically consist of a higher number of AT bases than GC bases (Wang et al., 2022), which is consistent with our observations and the high AT content in the nucleotide composition of K. candel. Given that SSRs were applied to create maps of genetic linkage, variety identification, and molecular markers (Gupta et al., 2022;Steele et al., 2006), detailed information on the SSRs identified in the present study can facilitate future research on selected target regions for more in-depth population studies between these two species within genus Kandelia.

Variations and non-synonymous mutations of the cp genomes between K. candel and K. obovata
Mauve analysis showed an indel with a length of 1,149 bp in the LSC region between K. obovata and K. candel (Figure 5), which explains the striking difference in cp genome sequence composition between K. candel and K. obovata. Near this region with a high nucleotide diversity value (within 2 kb upstream and downstream), there were non-synonymous mutations in the atpA and ndhD genes between K. candel and K. obovata ( Figure 6). MatK and rps4 had non-synonymous mutation sites in different samples of K. candel, while the atpB and rbcL genes had non-synonymous mutation sites in different samples of K. obovata ( Figure 6). Moreover, the genes with a Pi value of >0.01 mentioned above were located in the LSC and SSC regions. Taking cp DNA atpB-rbcL spacer and trnL-trnF spacer as examples, both were used as universal cp DNA markers to identify the two geographically distributed populations of Kandelia (Chiang et al., 2001). In the present study, detailed information obtained by comparison of the 25 cp genomes of Kandelia indicated that there were 58 variant loci in atpB-rbcL and four variant loci in trnL-trnF spacers between the two species. As the cp genome has a copy-dependent repair mechanism, which in turn ensures the conservation and stability of the two IR regions in the cp genome (Khakhlova & Bock, 2006;Wang et al., 2022), the variation in the IR region was much less than in the LSC and SSC regions. Sequence variations existed throughout the cp genomes between the K. candel and K. obovata, especially the non-synonymous mutation sites between the two species. Predicting the mutation sites in nonsynonymous amino acid residues would doubtlessly provide a foundation for further research on this protein in adaptation to habitat.

Structural and functional simulation
analyses of non-synonymous proteins between K. candel and K. obovata Light intensity and temperature are the two key environmental factors for differentiating K. candel and K. obovata geographically distributed in the northern and southern banks of the South China Sea, respectively. Adaptations to these key factors are the most critical characteristics of their differential genes and corresponding protein evolution in the northward colonization of Kandelia. Comparative analyses of sequence divergence and mutation hotspots revealed that the atpA gene in the LSC region and the ndhD gene in the SSC region exhibited distinct patterns between K. candel and K. obovata (Figures 3, 5, 6, 7C, 8D).
The cp NDH complex is composed of 11 plastid genes (ndh A~J) together with at least 18 nuclear genes that function as thylakoid NAD(P)H dehydrogenase (Shen et al., 2021). The complex is involved in electron transfer from NAD(P)H to plastoquinone, protecting the plant cell against photooxidative stress and maintaining optimal rates of photosystem I (PSI) cyclic photophosphorylation (Braukmann et al., 2009;Peredo et al., 2013). NDH-dependent cyclic electron flow (CEF) provides extra DpH and ATP for the CO 2 assimilation, which is essential for balancing the changing demands for ATP/ NADPH and regulating photosynthetic machinery in response to various environmental conditions (Zhu et al., 2001;Yukawa et al., 2005;Shikanai, 2016;Shikanai, 2020;Shen et al., 2021). Therefore, it functions as a value-feeding electron to poise the redox level of the intermediaries to optimize the rate of the cyclic electron transport in accordance with the changes in light intensity (Casano et al., 2000;Joet et al., 2002). More research has shown that ndh genes are of great importance in photosynthesis regulation in adaptation to harsh environments. Small changes in any of the ndh genes significantly decrease the photosynthesis rate (Endo et al., 1999;Silva et al., 2016). Adaptation to submersed environments is accompanied by the complete loss of the NDH complex in an aquatic angiosperm (Peredo et al., 2013). A series of T-to-C inactivating mutations occurred in ndh genes, which were further corrected back to T during evolution (Martıń & Sabater, 2010), implying the importance of the ndh genes in the stability of the complex. Given the location of NDH-D in the transmembrane helix of the NDH complex, it plays a key role in maintaining the activity and plasticity of the complex (Li et al., 2013;Shen et al., 2021). Any variations in NDH-D would impact photosynthetic electron transport, energy generation, and light adaptation.
In the present study, the ndh-D gene showed three nonsynonymous mutation sites, two of which mutated from base C in K. candel to base A in K. obovata and one from base G in K. candel to base A in K. obovata, which therefore caused the amino acids at three sites to vary from F (Phe), L (Leu), and M (Met) to L (Leu), I (Ile), and I (Ile) in K. obovata, showing intraspecific consistency but the obvious interspecific difference at the 22nd, 45th, and 426th amino acids in the protein sequence between the two species (Figure 7). Further comparative analysis of the structural modeling of the NDH-D protein between the two species illustrated that b-factors for the three sites were higher in K. candel, illustrating that the protein possessed higher flexibility at the three amino acid regions in comparison with K. obovata ( Figure 7). As protein structural plasticity is closely related to its active function in light harvesting and photosynthetic electron transport (Tian et al., 2012;Sun et al., 2019), considering K. candel mainly distributed in Southeastern Asia with higher light intensity and temperature in comparison with K. obovata growing in Eastern Asia, we suggested this might relate to the long-term adaptation to differential light radiation in their respective habitats.
Temperature has a strong impact on protein evolution. It has been found that orthologous proteins of species evolved at different temperatures commonly exhibit distinctive differences in function and structural stability in adaptation to temperature (Fields et al., 2015;Cvetkovska et al., 2018). In chloroplasts, F 0 F 1 -ATP synthase (cp H + -ATPase) is a protein complex responsible for ATP synthesis and energy generation. It contains two components with independent functions, a membrane-intrinsic CF 0 , which is responsible for the proton flux across the membrane, and a membraneextrinsic CF 1 , which catalyzes the synthesis of ATP from ADP and phosphate using the energy of electrochemical transmembrane potential of protons (Nelson & Ben-Shem, 2004;Hahn et al., 2018). The core component of CF 1 is a3b3g, consisting of three ATP-A (a) and three ATP-B (b) subunits, which are encoded by the chloroplast DNA (Rodermel & Bogorad, 1987). It has been reported that mutation or editing deficiency at a certain site results in a substitution of amino acid residue, impairing the assembly of chloroplast ATP synthase . A large number of studies have suggested that when exposed to lower temperatures, resistant plants showed higher ATPase activity than susceptible ones, which provide extra energy in plant responses to low temperatures (Gilmore & Bjrkman, 1995;Schttler & Toth, 2014). Ultracytochemical localization studies on the effect of ATPase on low-temperature adaptability also provide direct evidence that chloroplast ATPase is closely related to the plant's tolerance to low temperatures (He, 2020). Our analysis showed that K. obovata had serine (S) at the 89th site in the ATP-A protein domain instead of tyrosine (Y), as in K. candel ( Figure 8D). Molecular dynamics simulation indicated that the ATP-A protein of K. obovata had a higher binding ability with ADP ( Figures 8E, F), which implied that its ATPase possessed higher efficiency in ATP formation. In addition, the gene atpB remained consistent within K. candel individuals but exhibited variations in SNP loci within the different populations of K. obovata ( Figure 6). Hence, the mutation sites in the genes atpA and atpB of K. obovata populations might closely relate to their adaptation to the lower temperature in the northern part of the South China Sea, which endowed it with higher tolerance to lower temperature in the northward colonization of K. obovata.
The above results provide molecular proof for the divergence of K. obovata and K. candel in adaptation to different habitats. This also well explained the previous physiological experimental results obtained both from a common garden trial in Hong Kong (Maxwell, 1995) and in a lab chilling experiment conducted between these two Kandelia species (Short et al., 2021). In 2008, low air temperature in winter caused serious damage to several mangrove species, but K. obovata exhibited the highest cold tolerance among mangroves during extreme cold events in China (Chen et al., 2010), which could be more ecophysiological proof to support our conclusions gained from the cp genomes.

Conclusions
The main finding reported here first provides the whole cp genome of K. candel, and comparative analyses of the whole cp genomes were conducted between K. candel and K. obovata. Although the cp genome is regarded as having a highly conservative nature and a slow evolutionary rate, interspecific differences exist between K. candel and K. obovata in adaptation to differential environments geographically separated by the South China Sea, including the sizes of cp genomes, codon usages, repeat sequences, and indels of genes. Significant interspecific differences were found in ndhD and atpA, which are involved in photosynthetic electron transport and ATP formation, implying that energy generation plays a pivotal role in their ability to adapt to different temperatures and light, two main geographical environmental factors. Future comparative studies on photosynthetic electron transport and photophosphorylation caused by the genetic variations of atpA and ndhD in these species are needed to clarify their response mechanisms to varied light radiation and temperatures.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: GenBank database under accession numbers from ON969308 to ON969332.

Author contributions
XZ designed the research project. XX, YZ performed research. XX, QL, YS and XZ analyzed data. XX, XZ and YS wrote the manuscript. LC, WW, GC, WN, MI, PP, HZ prepared plant materials and contributed to scientific discussion and revision of the manuscript. All authors have read and approved the final manuscript.