Higher Genomic Variation in Wild Than Cultivated Rubber Trees, Hevea brasiliensis, Revealed by Comparative Analyses of Chloroplast Genomes

Rubber tree is the only commercialized natural resource to produce high-quality natural rubber with unique physical and chemical properties. They currently foster in Southeast Asia with marked morphological and productive differences with wild germplasms native to the Amazonian basin of South America. Here, we report complete chloroplast (cp) genomes of six cultivated and six wild accessions of H. brasiliensis using Illumina paired-end sequencing platform. The 12 H. brasiliensis cp genomes ranged from 161,168 to 161,254 bp. All 12 cp genomes displayed a typical quadripartite structure, which consisted of a pair of IR regions (26,787–26,804 bp) separated by a LSC region (89,216–89,284 bp) and a SSC region (18,370–18,377 bp). Phylogenomic analysis revealed that cultivated and wild rubber trees failed formed separate clades. However, we observed that wild rubber trees possessed more variable sites and ∼2.8-fold higher level of nucleotide variation than cultivated rubber trees despite a short domestication history. We drew a comprehensive map of genomic variation across rubber tree plastomes, exhibiting that the density of genomic variants in wild rubber trees was slightly higher than that detected in cultivated ones. The obtainability of genomic variation across cp genomes will provide useful information for better conserving and utilizing rubber tree germplasms.


INTRODUCTION
Chloroplasts are key organelles that play a key role in plant cell for photosynthesis and biochemical pathways, including the biosynthesis of starch, fatty acid, pigmengts, and amino acids (Raman and Park, 2015;Dong et al., 2016;Xu et al., 2017). In angiosperms, chloroplast (cp) genomes contain a circular DNA ranging from 72 to 217 kb in size (Sugiura, 1992(Sugiura, , 1995Shi et al., 2012), consisting of two large inverted repeat (IR) regions, a large-single-copy (LSC) region and a smallsingle-copy (SSC) region (Bendich, 2004;Raubeson and Jansen, 2005;Yang et al., 2010Yang et al., , 2017. Moreover, cp genomes usually exhibit maternal inheritance, enabling the conservation of gene content and genome structure (Sasaki et al., 2003;Parks et al., 2009). A recent study reported that the entire plastome is transcribed in photosynthetic green plants, and that this pattern originated from prokaryotic cyanobacteria -ancestor of the chloroplast genomes that diverged about 1 billion years ago (Shi et al., 2016). Concatenating sequences from a large number of chloroplast genes may overcome the problem of multiple substitutions that results in the loss of phylogenetic information between chloroplast lineages, and thus, can reduce gene-sampling errors due to substitutions. Owing to their small sizes, relatively conserved genome structure, gene content, gene order and simple complexity compared to mitochondrial and particularly nuclear genomes, the cp genomes have recently provided valuable information for taxonomic classification, phylogenetic reconstruction and adaptive evolution as a result of sequence divergence among plant species (e.g., Huang et al., 2014;Zhang et al., 2016;Gao et al., 2019). Given decreased costs to generate increasingly large amount of genome sequences using the Next Generation Sequencing (NGS) platform, up to now (Mardis, 2008), approximately 3,869 plant cp genomes have been sequenced and deposited in the National Center for Biotechnology Information (NCBI), including the cp genome of rubber tree, Hevea brasiliensis (Wild.) Muell. Arg (Tangphatsornruang et al., 2011).
H. brasiliensis is a diploid plant (2n = 2x = 36), a member of the family Euphorbiaceae, which was most commonly known as the rubber tree (Leitch et al., 1998;Lau et al., 2016). It is a cross-pollinated tropical economical tree that grows to 30-40 m tall and can live up to 100 years in the wild (Priyadarshan and Clément-Demange, 2004). The domestication of the rubber tree, which is native to the Amazon basin in South America, began in 1896 and then spread to Southeast Asia with the relocation of H. brasiliensis seedlings (Chan, 2000). Over 100 year of traditional breeding has greatly increased rubber productivity from its wild populations (Priyadarshan and Clément-Demange, 2004). Nevertheless, although artificial selection has led to a slow growth in rubber productivity, it is even more immediately essential to raise new H. brasiliensis varieties with desired agronomic traits. Rubber trees produce natural rubber that is still required for the world's high-performance engineering components, such as heavy tires. Thus, there will be huge potential to be exploited through genetic breeding programs to help enlarge genomic diversity, important for the generation of more environmentally resilient and high-yielding varieties (Sneller et al., 1997). With this regard, long-standing efforts to obtain high-quality reference nuclear genome assemblies (Rahman et al., 2013;Lau et al., 2016;Tang et al., 2016;Pootakham et al., 2017;Liu et al., 2020) and the mitochondrial genome (Shearman et al., 2014) have long put to apply for genomics-assisted selection to make use of the unexploited reservoir of novel alleles from the wild.
In this study, we sequenced, assembled and characterized the cp genomes of six cultivated and six wild rubber trees using the next-generation sequencing platform. We comprehensively characterized the organization, gene content, intraspecific diversity and structural variants across the 12 cp genomes of rubber tree. The obtained results may largely help to improve our understanding of cp genome evolution of rubber tree and will provide abundant genetic resources to assist the exploration of the precious H. brasiliensis wild germplasms for future rubber tree genetic improvement program.

Plant Material Sampling
We selected a panel of world-wide representative rubber trees, including six cultivated and six wild accessions, respectively ( Table 1

DNA Extraction, Genome Sequencing, and Assembly
Total genomic DNA for each sample was extracted from ∼100 mg dried leaves using the Mag-MK Plant Genomic DNA extraction kit (Sangon Biotech, CA). The DNA concentration was quantified using Life Invitrogen Qubit R 3.0 (Life Invitrogen, United States), and DNA concentration >30 ng µL −1 was finally determined using Bioanalyzer 2100 (Agilent Technologies). Short-insert paired-end sequencing libraries were generated according to the Illumina standard protocol. Genome sequencing was performed on Hiseq 2500 sequencing platform (Illumina, San Diego, California, United States). Approximately 5.0 GB of raw data were generated with read length of pair-end 100 bp. First, raw reads were trimmed to obtain the high-quality clean data by removing adaptor sequences and low-quality bases with Q-value ≤20 using CLC-quality trim tool 1 . The cp genome reads were isolated from the mixed DNA of nucleus, mitochondria and chloroplast based on the previously published rubber tree cp genome sequence (Tangphatsornruang et al., 2011). Then, the filtered reads were assembled into contigs using CLC genome assembler v4.06, which were aligned (≥90% similarity and query coverage) and ordered according to the earlier reported cp genome (Tangphatsornruang et al., 2011). Finally, the draft cp genome for each rubber tree was assembled by Geneious 9.0.5 software 2 , and genomic regions with ambiguous alignments were trimmed manually and considered as gaps, which were filled by sequencing fragments yielded by Polymerase Chain Reaction (PCR) using Geneious assembly software.

Repeat Sequence Annotation
Simple sequence repeats including SSRs were predicted using MISA (Thiel et al., 2003) with the parameters set to 10 repeat units ≥10 for mononucleotide SSRs, six repeat units ≥6 for dinucleotide, five repeat units ≥5 for trinucleotide, four repeat units ≥4 for tetranucleotide, and three repeat units ≥3 for pentanucleotide and hexanucleotide SSRs, respectively.

Detection of Genomic Structural Variants
To assess levels of genomic diversity variable and parsimonyinformative base sites and nucleotide diversity (Pi) were calculated for the cultivated and wild H. brasiliensis cp genomes using DnaSP version 6.1 (Rozas et al., 2017). To examine microstructural variants among the H. brasiliensis cp genomes, numbers and positions of insertions/deletions (indels) and SNP across the 12 cp genome sequences were detected using Mummer 3 . The number of SNPs and Indels were identified using BCFtools (Li, 2011). Circos was used to display the detected chloroplast structural variants (Krzywinski et al., 2009).

Phylogenomic Analyses
The phylogenomic analyses were performed for complete cp genome sequences of the 12 cultivated and wild H. brasiliensis accessions using J. curcasas as outgroup. Neighbor-joining (NJ) analysis was executed using MAFFT version 7.221 (Katoh et al., 2005), while Maximum likelihood (ML) analyses of the six Euphorbiaceae species using Gossypium barbadense as outgroup were conducted using MEGA 7.0 (Kumar et al., 2016). RAxML (Stamatakis et al., 2008) searches relied on the general time reversible (GTR) model of nucleotide substitution with the gamma model of rate heterogeneity. Non-parametric bootstrapping test was completed using 500 replicates as implemented in the "fast bootstrap" algorithm of RAxML (Stamatakis et al., 2008). The aligned data matrices are available upon request.

Chloroplast Genome Sequencing and Assembly
We employed the Illumina short-read technology with pairedend libraries on the HiSeq2000 sequencing platform to resequence them. For each H. brasiliensis accession, ∼5 GB raw reads were generated with an average length of ∼100 bp (Liu et al., 2020

Repeat Sequences in the 12 H. brasiliensis cp Genomes
Chloroplast simple sequence repeats (cpSSRs) are usually applied to characterizing genetic diversity and performing phylogenetic analyses (Flannery et al., 2006;Yang et al., 2011;Jiao et al., 2012). In total, we detected 105-113 SSRs in the examined H. brasiliensis cp genomes (Figure 2A and Supplementary  Table S1A). Our results showed slightly more SSRs in wild rubber tree cp genomes than cultivated rubber three cp genomes. It is clear that mononucleotides were the most abundant while tetranucleotides were the lowest across these cp genomes (Figure 2A and Supplementary Table S1A). Mononucleotide repeat (A/T) was the most abundant, ranging from 90 to 97 across the examined H. brasiliensis cp genomes ( Figure 2B and Supplementary Table S1B).

Genomic Structural Variation Between Cultivated and Wild Rubber Tree Plastomes
We assessed levels of nucleotide variation between cultivated and wild rubber tree plastomes. Our results showed that wild rubber trees possessed more variable sites and ∼2.8-fold higher level of nucleotide variation than cultivated rubber trees ( Table 4). This finding highly supports our former population genomic analysis based on nuclear genome resequencing of these rubber tree germplasms (Liu et al., 2020). With these 12 cultivated and wild rubber tree cp genomes at hand, we drew a comprehensive map of their precise genomic variation on the basis of microstructural variants. Using the wild rubber tree cp genome (67) ( Table 1) as a reference, we totally identified 725 SNPs and 493 InDels (Figure 3 and Supplementary Table S2). Our results showed that the wild rubber tree (78) from Acre, Brazil harbored the largest number (193) of genomic variants, while rubber tree cultivar (Reyan8-79) had the fewest number of genomic variants (91). We interestingly observed that the density of genomic variants was as high as ∼3.86/Kb in wild rubber trees, which is slightly larger than ∼3.71/Kb detected in cultivated rubber trees. It is clear that  Frontiers in Ecology and Evolution | www.frontiersin.org genomic variants differently occurred across cultivated and wild rubber tree cp genomes, further supporting the observed patterns of genomic variation detected in the genus Oryza (Gao et al., 2019). In gene regions, such as rpoC1, there were a number of genomic variants detected in wild rubber trees, while cultivated rubber trees showed no genomic variation.

Expansion and Contraction Among H. brasiliensis and the Other Spurge Plastomes
Although the cp genome size, structure, gene order and gene number are well-conserved, the junctions between IR regions and single-copy (SC) boundary regions were considered as a primarily mechanism causing the length variation of angiosperm cp genomes (Huang et al., 2014;Liu et al., 2018). To elucidate the potential expansion and/or contraction of IR regions, we assessed the variation in the IR/LSC and IR/SSC boundary regions across the six spurge plastomes, including cultivated and wild H. brasiliensis plastomes. The junctions between IR and SSC regions were highly conserved between the cultivated and wild H. brasiliensis plastomes. The genes (ycf1 and rps19-rpl2-rpl22-trnH) were located within the junctions of SSC/IR and LSC/IR regions (Figure 4). Two copies of the ycf1 gene crossed SSC/IRa and SSC/IRb. Compared to the relatively fixed location of the ycf1 gene across all cp genomes, the LSC/IR boundary regions were seemingly variable. The rps19 gene in M. esculenta and H. brasiliensis overlapped the LSC/IRb region with 186-bp and 96-bp located at the IRb region, the rpl22 gene in R. communis spanned the LSC/IRb region with 26-bp located at the IRb region, and the intergenic spacer of rps19-rps12 extended 18-bp or 71bp to the LSC region in E. esula and J. curcas, respectively. The junction of LSC/IRa regions was located in the intergenic spacer of rps19-trnH (R. communis, M. esculenta and H. brasiliensis), rpl2-trnH (J. curcas), and rpl2-rps19 (E. esula). We observed that the distances between trnH and IRa region varied from 1 to 198-bp.

Comparative Analyses Among H. brasiliensis and the Other Spurge Plastomes
Pairwise chloroplast genomic alignments among six spurge plastomes showed a relatively high degree of synteny conservation. The R. communis cp genome was used as a reference for plotting the overall sequence identity of the cp genomes of the six spurge plastomes (Figure 5). Our results revealed relatively high sequence identities. The LSC and SSC regions exhibited less similarity than the two IR regions across all Euphorbiaceae species. All the rRNA genes were highly conserved and similar to other plant chloroplast genomes (Huang et al., 2014). In addition, non-coding regions displayed greater sequence divergence than protein-coding regions. These highly divergent regions contained ccsA, ndhA, rbcL, rpoC2, rpl33, ycf3, atpI-atpH, psbM-petN, and petA-psbJ (Figure 5). Such hotspot regions could be developed as molecular markers and species barcoding for future phylogenetic analyses and species identification in the family Euphorbiaceae.

Phylogenomic Analysis of Cultivated and Wild H. brasiliensis Germplasms
To determine evolutionary relationships of cultivated and wild rubber trees, we compared all 12 cp genomes sequences using J. curcas as outgroup. Using neighbor-joining (NJ) method our results showed that cultivated and wild rubber trees failed to form separate clades (Figure 6A), further supporting our former phylogenomic analysis based on nuclear genome resequencing (Liu et al., 2020). We further investigated phylogenetic relationships of the six Euphorbiaceae species established by utilizing complete cp genomes using Gossypium barbadense as outgroup. Phylogenetic analyses of the aligned data matrix using Maximum Likelihood (ML) suggested that that cultivated and wild rubber trees formed together with 100% bootstrap supports, which were further grouped with M. esculenta with a full bootstrap support ( Figure 6B). Thus, the ML tree has well-resolved phylogenetic relationships of these spurge species with ≥95% bootstrap values, which is fairly congruent with former phylogenetic analyses based on complete chloroplast and nuclear genomes (Tangphatsornruang et al., 2011;Rahman et al., 2013;Tang et al., 2016;Pootakham et al., 2017;Liu et al., 2020).

CONCLUSION
In the present study, we present the 12 complete chloroplast genomes of cultivated and wild H. brasiliensis. We characterized the microstructural variation in the five spurge plastomes, including H. brasiliensis, Euphorbia esula, Jatropha curcas, Manihot esculenta, and Ricinus communis. We show that the H. brasiliensis cp genome differed from the other spurge plastomes with large inversions. We find that cp genomes of wild rubber trees are more variable than cultivated rubber tree cp genomes. The detection of precise genomic variation across rubber tree cp genomes further reveals more genomic variants occurred in wild rubber trees than cultivated rubber trees. Further efforts are still needed to obtain a comprehensive map of chloroplast genomic variation in cultivated and wild rubber trees in a broad geographic range, which will help to better conserve and utilize rubber tree germplasms.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.