De novo Transcriptome Characterization of Rhodomyrtus tomentosa Leaves and Identification of Genes Involved in α/β-Pinene and β-Caryophyllene Biosynthesis

Plant-derived terpenes are effective in treating chronic dysentery, rheumatism, hepatitis, and hyperlipemia. Thus, understanding the molecular basis of terpene biosynthesis in terpene-rich Chinese medicinal plants is of great importance. Rhodomyrtus tomentosa (Ait.) Hassk, an evergreen shrub of the family Myrtaceae that is rich in mono- and sesquiterpenes, is widely used in traditional Chinese medicine. In this study, (+)-α-pinene and β-caryophyllene were found to be the two major components in the leaves of R. tomentosa; (+)-α-pinene was more abundant in young leaves than in mature leaves, whereas β-caryophyllene showed the opposite distribution. Genome-wide transcriptome analysis of leaves identified 138 unigenes potentially involved in terpenoid biosynthesis. By integrating known terpenoid biosynthetic pathways, 7 candidate genes encoding terpene synthases (RtTPS1-7) that potentially catalyze the last step in pinene and caryophyllene biosynthesis were further characterized. Sequence alignment showed that RtTPS1, RtTPS3 and RtTPS4 do not contain typical N-terminal transit peptides (62–64 aa), and thus probably produce multiple isomers and enantiomers through terpenoid isomerization. In vitro enzyme assays confirmed that RtTPS1-4 mainly produce (+)-α-pinene and (+)-β-pinene, as well as small amounts of (−)-α-pinene and (−)-β-pinene, from GPP, while RtTPS1 and RtTPS3 are also active with FPP, producing β-caryophyllene along with a smaller amount of α-humulene. Our results deepen the understanding of the molecular mechanisms of terpene biosynthesis in Myrtaceae.

The family Myrtaceae is one of the most significant essential oil-yielding plant families and is known for high terpene concentrations in the foliage (Keszei et al., 2010). Despite an abundance of chemical information, the molecular mechanisms underlying terpene biosynthesis are poorly understood. Keszei et al. (2010) described 70 unique partial terpene synthase transcripts and 8 full-length cDNA clones from 21 myrtaceous species, and were the first to characterize a 1,8-cineole synthase from Eucalyptus sideroxylon and a caryophyllene synthase from Eucalyptus dives based on phylogenetic relationships and leaf oil composition. Recently, high-throughput RNA sequencing of Eugenia uniflora was used to identify genes involved in the terpene biosynthesis pathway, yielding several predicted candidate TPSs associated with mono-, sesqui-, and triterpene biosynthesis (Guzman et al., 2014). However, functional characterization of TPS-encoding genes in Myrtaceae is still lacking. Because the abundance and variety of terpenoids in Myrtaceae arise from structural differences among the TPSs present (Keeling et al., 2008; Keszei et al., 2010), further research into terpene biosynthesis is required.
Rhodomyrtus tomentosa (Ait.) Hassk (Myrtaceae) is an evergreen shrub widely distributed in East Asia and Southeast Asia, including Japan, Thailand, and southern China (Keszei et al., 2008; Saising et al., 2011). The stems, leaves, and fruits of R. tomentosa are widely used in traditional medicine to treat chronic dysentery, rheumatism, hepatitis, and hyperlipemia by virtue of their high content of mono- and sesquiterpenes (Chen, 1984). However, a lack of genomic information hinders our understanding of terpenoid biosynthesis in this species. In this study, the main active components in the leaves of R. tomentosa were identified. Transcriptome analysis revealed 138 unigenes potentially involved in terpenoid biosynthesis, which together depict a complete biosynthetic pathway for α-pinene and β-caryophyllene. In vitro enzyme assays further confirmed that some TPS genes function in α/β-pinene and β-caryophyllene biosynthesis.

Illumina Sequencing, de novo Assembly and Function Annotation
To further understand the molecular mechanism of terpenoid biosynthesis in R. tomentosa, comprehensive transcriptome sequencing of R. tomentosa leaves was performed (Supplementary Table S1). The high-quality reads were deposited in the NCBI SRA database (accession number: SRP132648). Because no reference genome is available for R. tomentosa, Trinity (Trinityrnaseq_r20131110) was used to assemble all of the clean reads de novo (Grabherr et al., 2011). A total of 146,480 contigs ranging from 201 to 21,224 bp, with a mean length of 1,379 bp and an N50 length of 2,161 bp, were assembled (Supplementary Table S1). Overall, these transcripts represented 83,175 unigenes with an average length of 888 bp and an N50 length of 1,702 bp, among which 53,334 coding DNA sequences (CDS) were detected (Supplementary Table S1). By searching against five public protein databases, a total of 53,742 unigenes (64.61%) were annotated. Among them, the numbers of unigenes matched to the NCBI non-redundant protein sequences (NR), SWISS-PROT, eukaryotic ortholog groups (KOG), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) databases were 49,545 (59.57%), 42,547 (51.15%), 35,016 (42.09%), 26,096 (31.37%) and 8,902 (10.7%), respectively (Supplementary Table S2).
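For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of sequence lengths. It is not the script used in this study; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy lengths in bp (hypothetical, not taken from the R. tomentosa assembly)
toy_lengths = [2000, 1500, 1000, 500, 201]
print(n50(toy_lengths))  # 1500
```

Because long sequences dominate the cumulative sum, the N50 is usually larger than the mean length, as is the case for both the contigs and the unigenes reported here.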
RtTPS1-4 contain all the conserved domains of the TPS family, including the RR(P)X8W, RXR, and DDXXD (X is any amino acid) motifs, and the absolutely conserved arginine, cysteine and histidine residues in the active site (Bohlmann et al., 1998). However, our results indicated that RtTPS1, RtTPS3 and RtTPS4 can catalyze the formation of α/β-pinene, β-caryophyllene and α-humulene (Figure 4). The number of amino acids upstream of the conserved RRX8W motif in RtTPS1, RtTPS3 and RtTPS4 is 38, 52, and 48, respectively, which results in an incomplete plastidial targeting sequence (62-64 aa) (Figure 2). Bohlmann et al. (1998) suggested that a short N-terminal sequence may play a role in the isomerization step of the terpenoid cyclization reaction, thus producing multiple pinene isomers and enantiomers. In addition, amino acid differences in the loop and helix regions of the domains may also lead to diverse products. Phillips et al. (2003) reported that these differences could conceivably determine substrate folding, thereby providing a stereochemical switch. Domain swapping and directed mutagenesis in (−)-(4S)-limonene synthase (LS) and (−)-(4S)-limonene/(−)-(1S,5S)-α-pinene synthase (LPS) from Abies grandis suggested that amino acids in the predicted D through F helix regions are critical for product determination (Katoh et al., 2004).
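The motif positions discussed above can be located with simple pattern matching. The sketch below is an illustration only: the regular expressions are a simplified reading of the RRX8W and DDXXD motifs (with "." standing for any amino acid), the function names are hypothetical, and the toy sequence is invented rather than taken from any RtTPS protein.

```python
import re

# Simplified patterns for two motifs named in the text; "." means any residue (X).
MOTIFS = {
    "RRX8W": re.compile(r"RR.{8}W"),
    "DDXXD": re.compile(r"DD..D"),
}


def find_motifs(protein):
    """Return {motif name: 0-based start of the first match, or None}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(protein)
        hits[name] = match.start() if match else None
    return hits


def residues_upstream_of_rrx8w(protein):
    """Number of residues preceding the first RRX8W motif (None if absent)."""
    return find_motifs(protein)["RRX8W"]


# Invented toy sequence with 38 residues placed before the RRX8W motif
toy = "M" + "A" * 37 + "RRSANYQPNLW" + "G" * 20 + "DDIYD" + "K" * 5
print(residues_upstream_of_rrx8w(toy))  # 38
```

Comparing the upstream residue count with the 62-64 aa expected for a typical transit peptide gives a quick first indication of whether the N-terminal targeting sequence is complete.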
Interestingly, RtTPS1 and RtTPS3 were also able to convert FPP into β-caryophyllene (65.71% for RtTPS1 and 93.05% for RtTPS3) and α-humulene (34.29% for RtTPS1 and 6.95% for RtTPS3) (Figure 4B and Supplementary Table S7). In previous reports, G. hirsutum TPS1 (GhTPS1) and OkBCS from O. kilimandscharicum were also found to produce mainly β-caryophyllene and smaller amounts of α-humulene (Huang et al., 2013; Jayaramaiah et al., 2016). Moreover, the ratio of the two products formed by OkBCS is similar to our results (Jayaramaiah et al., 2016). In addition, CsTPS4FN, CsTPS5FN and CsTPS9FN in Cannabis sativa could each accept both GPP and FPP. In particular, CsTPS5FN was an unusual TPS-b member lacking an N-terminal plastidial targeting sequence, which was able to produce sesquiterpenes when incubated with FPP (Booth et al., 2017). It is well known that monoterpenes are synthesized in plastids, while sesquiterpenes are synthesized in the cytosol (Jayaramaiah et al., 2016; Ruan et al., 2016). RtTPS1 and RtTPS3 produced both monoterpenes and sesquiterpenes, but they did not have N-terminal plastid-targeting sequences (Figures 2, 4). This requires the transport of IPP/DMAPP (the precursors of GPP and FPP) between the plastid and the cytosol, as occurs in hop trichomes and snapdragon flowers (Dudareva et al., 2005; Wang et al., 2008). Overall, the diverse structures and broad catalytic activities of RtTPS1-4 suggest divergent and convergent functions of R. tomentosa TPSs in terpene biosynthesis.

Expression Patterns of RtTPS1-7
To study TPS gene expression in the young and mature leaves of R. tomentosa, we measured the accumulation of RtTPS1-7 transcripts using reverse transcription quantitative real-time PCR (qRT-PCR) (Figure 5). The expression levels of RtTPS3, RtTPS4 and RtTPS7 were significantly higher in the young leaves than in the mature leaves, while RtTPS1 was more highly expressed in the mature leaves than in the young leaves, and RtTPS2, RtTPS5 and RtTPS6 showed similar expression in the two tissues. These results suggest that candidate TPS genes involved in pinene and caryophyllene biosynthesis were differentially regulated at the transcriptional level. Among the four candidate genes encoding RtTPS1-4, the expression levels of RtTPS3 and RtTPS4 in the young leaves were higher than those of RtTPS1 and RtTPS2, probably contributing to greater pinene biosynthesis in the young leaves. Similarly, the high expression level of RtTPS3 may also contribute to high biosynthesis of caryophyllene in the young leaves. However, the expression levels of these genes did not correlate well with the synthetic level in the old leaves, suggesting that other TPS transcripts that were not identified in our study may lead to accumulation of pinene and caryophyllene in the old leaves, since a transcriptome represents only the set of transcripts present at one time point.

... , respectively, were detected by GC-MS. pET30a empty vector was used as control. Products were identified using authentic standards. 1, (+)-α-pinene; 2, (−)-α-pinene; 3, (+)-β-pinene; 4, (−)-β-pinene; 5, β-caryophyllene; 6, α-humulene.

CONCLUSION
This was the first attempt to elucidate the R. tomentosa transcriptome using Illumina next-generation sequencing and de novo assembly. A total of 138 unigenes involved in terpenoid biosynthesis were identified in R. tomentosa. Based on GC-MS and transcriptome results, a complete biosynthetic pathway for α-pinene and β-caryophyllene was depicted. In vitro enzyme activity assays confirmed that RtTPS1-4 function in the biosynthesis of α/β-pinene and β-caryophyllene in R. tomentosa, suggesting overlapping and divergent functions of TPSs in plant species.

Plant Materials and GC-MS Analysis
Leaves were collected from 3-year-old plants growing in the South China Botanical Garden, Chinese Academy of Sciences, in Guangzhou, China. The leaves of 5 independent plants formed one biological replicate, and three biological replicates were performed. The young leaves are the opposite leaves of the first node on the branches, showing small leaf area and light color, while the old leaves are the opposite leaves of the eighth node on the branches, showing large leaf area and leathery green color. Fresh leaves of R. tomentosa (50.0 g) were extracted with 500 mL of deionized water and 30 mL of dichloromethane using simultaneous distillation-extraction for 3 h. GC-MS analysis was performed on an Agilent 7890 GC system coupled with a 5975 MS detector (Agilent Technologies, United States). The essential oil, diluted 10 times using dichloromethane, was added as an internal standard. One µL was injected in split mode onto an HP5-MS column (30 m × 250 µm × 0.25 µm film thickness). The temperature program was as follows: an initial oven temperature of 40 °C (1 min hold), followed by a two-step temperature increase, first to 130 °C (at a rate of 4 °C min−1, 5 min hold) and then to 250 °C (at a rate of 10 °C min−1, 5 min hold). MS conditions were: ionization mode: EI, electron energy 70 eV; interface temperature:
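As a worked check of the oven program described above, the total GC run time can be derived from the stated holds and ramp rates. The helper below is a small illustrative sketch; the function name `oven_program_minutes` is an assumption and not part of any instrument software.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total run time for an initial hold followed by (target, rate, hold) ramps.

    Temperatures are in degrees C, rates in degrees C per minute, holds in minutes.
    """
    total = initial_hold
    current = initial_temp
    for target, rate, hold in ramps:
        total += (target - current) / rate + hold
        current = target
    return total


# 40 C (1 min) -> 130 C at 4 C/min (5 min hold) -> 250 C at 10 C/min (5 min hold)
run_time = oven_program_minutes(40, 1, [(130, 4, 5), (250, 10, 5)])
print(run_time)  # 45.5
```

Under the stated program, each injection therefore takes 45.5 min of oven time, excluding any cool-down between runs.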

RNA Isolation and Solexa Sequencing
Total RNA was extracted from the leaves of three independent plants using TRIzol reagent (Invitrogen, United States) and digested with RNase-free DNase I (Qiagen, Germany). The cDNA libraries were constructed following the Illumina manufacturer's instructions. In brief, poly(A)+ RNA was purified from total RNA using Oligo(dT) magnetic beads and broken into short fragments using divalent cations at 94 °C for 5 min. Using these short fragments as templates, random hexamer primers were used to synthesize first-strand cDNA, followed by the synthesis of second-strand cDNA using DNA polymerase I and RNase H. Short fragments were purified with a QiaQuick PCR Extraction Kit (Qiagen) and ligated to sequencing adapters. The products were amplified by PCR to create cDNA libraries. The cDNA libraries were sequenced using the Illumina HiSeq™ 2000 system.

Sequence Assembly and Annotation
Raw image data from sequencing were transformed into raw reads by base calling. Reads were assembled using Trinity software (Grabherr et al., 2011). The longest assembled sequences were referred to as contigs. Paired-end reads were then mapped back to the contigs to detect contigs from the same transcript and the distances between these contigs. Contigs were then connected into scaffolds, with N representing the unknown sequence between each pair of contigs. Finally, sequences that lacked N and could not be extended on either end were defined as unigenes. The unigene sequences were aligned by BLASTx (E-value < 1E-5) to databases including NR, SWISS-PROT, KEGG (Kanehisa et al., 2006), and KOG (Tatusov et al., 1997).
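The E-value threshold described above can be applied to BLAST results with a short script. The sketch below assumes the hits were saved in the 12-column tabular format produced by BLAST+ with `-outfmt 6` (the source does not state the output format used), and keeps the highest-scoring hit per unigene. The function name `best_hits` and the toy table are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(handle, cutoff=E_VALUE_CUTOFF):
    """For each query, keep the hit with the highest bit score among hits
    whose E-value is below the cutoff.

    Assumes 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Invented toy table (not real annotation results)
TOY_TABLE = (
    "unigene1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t200\n"
    "unigene1\tprotB\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-10\t90\n"
    "unigene2\tprotC\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.01\t30\n"
)
hits = best_hits(io.StringIO(TOY_TABLE))
print(hits)  # only unigene1 passes the cutoff, with protA as its best hit
```

Counting the keys of the returned dictionary for each database would give per-database totals of annotated unigenes of the kind reported in Supplementary Table S2.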

Phylogenetic Analysis
Phylogenetic analysis was performed based on the deduced amino acid sequences of TPSs from R. tomentosa and other plants. All of the full-length TPS protein sequences were aligned using Clustal X 2.0 software, and a neighbor-joining evolutionary tree was constructed with MEGA 5.0 software using 1000 bootstrap replicates. The scale represents 0.1 amino acid substitutions per site.

Recombinant Protein Purification and Enzyme Activity Assay
Full-length cDNAs of RtTPS1, RtTPS2, RtTPS3 and RtTPS4 were PCR amplified using specific primers (Supplementary Table S6) and ligated into the pET30a vector. The constructed vector was introduced into Escherichia coli strain BL21 (DE3) for protein expression. Expression of the recombinant protein was induced with IPTG at 15 °C for 16 h. The cells were then harvested and resuspended in binding buffer (20 mM sodium phosphate, 0.5 M sodium chloride, and 40 mM imidazole, pH 7.4). The recombinant enzyme was purified on Ni-IDA-Sepharose CL-6B (Spectrum Chemical Manufacturing, United States) according to the manufacturer's instructions after renaturation with 2 M urea. The purity of the His-tagged protein was determined by SDS-PAGE followed by Coomassie Brilliant Blue staining. Enzyme activity assays were performed in 500 µl of reaction buffer (25 mM HEPES, pH 7.0, 10 mM MgCl2, 5 mM dithiothreitol) containing 10 mM substrate (GPP or FPP) and 10 µg protein. After incubation for 30 min at 30 °C, the reaction mixture was extracted with 500 µl pentane, and 1 µl was analyzed by GC-MS and GC-FID as described above.

qRT-PCR Analysis
Total RNA was isolated from young and mature leaves of R. tomentosa using a Trizol Kit (Promega, United States). First-strand cDNA was synthesized from 2 µg of purified RNA using HiScript QRT SuperMix for qPCR (Vazyme, Nanjing, China). Two microliters (100 ng µL−1) of cDNA in a 20 µL reaction was used for gene expression analysis, performed with SYBR Premix Ex Taq (Takara) on a Roche LightCycler 2.0 system (Roche Applied Science, Branford, CT, United States). The primers are listed in Supplementary Table S8, and the PCR amplification conditions were as follows: 94 °C for 5 min; 40 cycles of 95 °C for 20 s, 55 °C for 20 s, and 72 °C for 30 s. For each gene, expression data were normalized to the expression level of the actin gene and calculated by the 2^−ΔΔCt method. The experiment was carried out with three biological and technical replicates. Significant differences between samples were evaluated by Student's t-test.
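The normalization step above can be illustrated with a short relative-quantification sketch. It assumes the common 2^−ΔΔCt form, in which the target gene is first normalized to the reference gene (actin here) and then to a calibrator sample; the function name `relative_expression` and all Ct values are hypothetical and do not come from this study.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Fold change by 2^-ddCt, using the mean Ct of the replicates.

    dCt  = Ct(target) - Ct(reference), computed per sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    return 2 ** -(delta_sample - delta_calibrator)


# Invented Ct values: sample = young leaves, calibrator = mature leaves
fold = relative_expression(
    target_ct=[22.0, 22.0, 22.0],
    reference_ct=[18.0, 18.0, 18.0],
    calib_target_ct=[24.0, 24.0, 24.0],
    calib_reference_ct=[18.0, 18.0, 18.0],
)
print(fold)  # 4.0
```

In this toy case the target gene crosses the threshold two cycles earlier in the sample than in the calibrator after actin normalization, which corresponds to a four-fold higher relative expression.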

AUTHOR CONTRIBUTIONS
WF and YW conceived the study. S-MH and S-CY designed the experiments. S-MH, XW, Q-MZ, and KC performed the experiments. J-LY performed the chiral analysis of leaf volatiles and enzymatic products. YD and J-JZ analyzed the data. S-MH and WF wrote the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We would like to thank LetPub (www.letpub.com) for providing linguistic assistance during the preparation of this manuscript.