Transcriptional Analysis of Metabolic Pathways and Regulatory Mechanisms of Essential Oil Biosynthesis in the Leaves of Cinnamomum camphora (L.) Presl

The roots, bark, and leaves of Cinnamomum camphora are rich in essential oils, which mainly comprised monoterpenes and sesquiterpenes. Although the essential oils obtained from C. camphora have been widely used in pharmaceutical, medicinal, perfume, and food industries, the molecular mechanisms underlying terpenoid biosynthesis are poorly understood. To address this lack of knowledge, we performed transcriptome analysis to investigate the key regulatory genes involved in terpenoid biosynthesis in C. camphora. High-oil-yield trees of linalool type and low-oil-yield trees were used to assemble a de novo transcriptome of C. camphora. A total of 121,285 unigenes were assembled, and the total length, average length, N50, and GC content of unigenes were 87,869,987, 724, 1,063, and 41.1%, respectively. Comparison of the transcriptome profiles of linalool-type C. camphora with trees of low oil yield resulted in a total of 3,689 differentially expressed unigenes, among them 31 candidate genes had annotations associated with metabolism of terpenoids and polyketides, including four in the monoterpenoid biosynthesis pathway and three in the terpenoid backbone biosynthesis pathway. Collectively, this genome-wide transcriptome provides a valuable tool for future identification of genes related to essential oil biosynthesis. Additionally, the identification of a cohort of genes in the biosynthetic pathways of terpenoids provides a theoretical basis for metabolic engineering of essential oils in C. camphora.


INTRODUCTION
As a member of the Lauraceae family, Cinnamomum camphora is an evergreen tree widely distributed around the southern portions of the Yantze River in China, with the largest numbers in Jiangxi, Guangdong, Fujian, Zhejiang, Yunnan, and Sichuan. Many parts of the camphor tree, including roots, stems, leaves, flowers, fruits, and bark, are rich in essential oils that have a wide range of uses in the pest control, spice, cosmetics, and pharmaceutical industries (Guo et al., 2016;Jiang et al., 2016). In order to achieve sustainable production, large-scale plantations have been established in China. Under this model, camphor trees are maintained as bushes, and the leaves and twigs are harvested every year for essential oil production. C. camphora plantations are characterized by fast growth and high afforestation density, and therefore, they can provide a large number of raw materials for the essential oil industry.
Essential oils in the camphor tree primarily consist of monoterpenes and sesquiterpenes. Depending on the principal chemical component of their essential oils, camphor trees can be divided into different chemical types, such as linalool, borneol, camphor, cineol, and iso-nerolidol (Guo et al., 2017). With the gradual improvements in C. camphora cultivation and essential oil extraction techniques, use of C. camphora as a source of natural linalool, which has long been preferred over its synthetic form for the perfume industry, is becoming more popular. However, the molecular mechanisms underlying the terpenoid biosynthesis in C. camphora are still poorly understood, which impedes the progress of identifying "superior" C. camphora clones that could increase the efficiency of using C. camphora for essential oil production.
The biosynthetic pathway of terpenoids in plants starts with the synthesis of the universal terpenoid precursors isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) through the mevalonate (MVA) or the methylerythritol 4phosphate (MEP) pathways. Isoprenyl diphosphate synthases (IDSs) catalyze reactions involving IPP and DMAPP, forming the intermediate phosphate precursors, geranyl diphosphate (GPP), geranylgeranyl diphosphate (GGPP), and farnesyl diphosphate, which are then catalyzed by terpene synthases (TPSs) to form diverse terpenoids (Chen et al., 2003;Cheng et al., 2007;Ma et al., 2012). The enzymes in the MVA pathway have been shown to be located in cytosol or peroxisomes, whereas all the enzymes involved in the MEP pathway are found in plastids (Reumann et al., 2007;Simkin et al., 2011). In contrast, the localizations of IDSs and TPSs are more diverse and are often at the site of terpenoid biosynthesis (Ma et al., 2012). Although characterizations of the genes involved in terpenoid biosynthesis have been reported in many plant species, including peppermint, Arabidopsis, conifers, Hevea brasiliensis, and Salvia miltiorrhiza (Battaile and Loomis, 1961;Lange and Ghassemian, 2003;Sando et al., 2008a,b;Zulak and Bohlmann, 2010;Ma et al., 2012), many of the enzyme-encoding genes have yet to be elucidated because of the complexity of the terpenoid biosynthesis pathway, especially in the species without a wholegenome sequence.
In the studies of C. camphora, transcriptome analyses for leaves of camphor trees of linalool-, borneol-, camphor-, cineol-, and iso-nerolidol types led to the identification of 424 unigenes in the biosynthetic pathways of terpenoids (Jiang et al., 2014) and transcriptome profiling of linalool and borneol types of C. camphora identified three monoterpene synthase genes possibly involved in the biosynthesis of borneol (Chen et al., 2018). Although these two transcriptome studies have provided valuable insights into the terpenoid biosynthesis pathway, comparisons between only high-oil-yielding individuals could exclude the identification of genes upstream of the biosynthetic pathway of terpenoids. In this study, we used RNA-Seq to identify key enzyme-encoding genes putatively associated with terpenoid biosynthesis in C. camphora, which may partially determine its essential oil composition. Differentially expressed genes (DEGs) between camphor trees of linalool-type and lowoil-yield camphor trees were identified. In addition to four unigenes in the "monoterpenoid biosynthesis" pathway, this analysis also identified three unigenes in the "terpenoid backbone biosynthesis" pathway. Overall, these results contribute to the general understanding of terpenoid biosynthesis in C. camphora and provide a theoretical basis for metabolic engineering of terpenoids in this species.

Plant Materials
Mature leaves of linalool-type and low-oil-yield C. camphora were harvested from 3-year-old clones grown at the experimental garden of NanChang Institute of Technology, China, in October. The clones were propagated from mother trees through cutting propagation. For each biological replicate (n = 3), leaves from at least five tree clones, which were cloned from the same mother tree, were collected and mixed. The three replicates each for the low-oil-yield variety and linalool-type C. camphora were termed N-(1-3) and LI-(1-3) separately. All the samples were harvested and snap-frozen in liquid nitrogen and then stored at −80 • C until RNA extraction. Meanwhile, leaves were also collected for oil extraction and chemical component determination.

Essential Oil Extraction and Determination of Essential Oil Yield
Fresh leaves were harvested and processed for oil extraction immediately after harvesting. Leaf samples were hydrodistilled in a modified Clevenger-type apparatus for 4 h. Extracted essential oils were weighed, and oil yields were calculated using the formula: Essential oil yield (%) = W1/W2 × 100 where W1 is the weight of extracted essential oil, and W2 is the weight of fresh leaves.

Gas Chromatography-Mass Spectrometry Analysis
Analyses of essential oils were performed on a gas chromatography system (Agilent 7890B, United States) equipped with a TG-5MS capillary column (30 m × 0.25-mm internal diameter × 0.25-µm film thickness). Helium with a flow rate of 1 mL/min was used as the carrier gas. The injector was maintained at 220 • C. The initial temperature of the column was set at 40 • C and increased to 280-300 • C at a rate of 6.5 • C/min. The mass spectrometer was set to scan in the range of 50-650 m/z with a scan rate of 0.5 scans/s. Compounds were determined based on their relative retention time and mas spectra data. Details were described in our previous study (Zhang et al., 2019).

cDNA Library Preparation and RNA Sequencing
Total RNA was extracted from samples using the RNeasy Plant Mini Kit (Qiagen, United States) following the manufacturer's instructions. Next, gDNA was removed using DNase I digestion (Qiagen, United States). The integrity of extracted RNA was determined by the RNA Nano 6000 Assay Kit of the Agilent 2100 Bioanalyzer (Agilent Technologies, United States). Extracted RNA was sent to Gene Denovo Biotechnology Co. (Guangzhou, China) for cDNA library preparation and sequencing. Up to 1 µg of total RNA from each sample was used for library construction using the NEB #7530 kit for Illumina R (New England Biolabs, United States), as described by the manufacturer. mRNA was first enriched by oligo(dT) beads and fragmented into small fragments using fragmentation buffer and reverse-transcribed into cDNA with ProtoScript II Reverse Transcriptase and random primers. Second-strand cDNA was synthesized by DNA polymerase I along with RNase H, dNTP, and buffer. Next, cDNA fragments were purified with 1.8 × Agencourt AMPure XP Beads, endrepaired, and ligated to Illumina sequencing adapters after addition of poly(A) tails. Ligated fragments were subjected to size selection by gel electrophoresis and polymerase chain reaction (PCR) amplified. The resulting cDNA library was sequenced using Illumina HiSeq TM 4000.

De novo Assembly and Functional Annotation
Raw reads from each sample were filtered by removing reads containing more than 10% of unknown nucleotides, low-quality reads that contained more than 40% of low-quality (Q value ≤ 20) bases, and adapter sequences according to the Illumina adapter list. Residual ribosome RNA (rRNA) reads were identified by mapping the filtered reads to rRNA, and the identified sequences were then removed. The resulting 87.9 M reads were assembled using short read assembler, Trinity (Grabherr et al., 2011), and further clustered to obtain non-redundant unigenes by Corset (Davidson and Oshlack, 2014). The completeness of the transcriptomic assembly was assessed using BUSCO (Simao et al., 2015) by comparing the 1,440 embryophyta-specific genes to our unigenes. Unigenes were annotated by BLASTx (with an E-value threshold of 1e −5 ) to the Swiss-Prot protein database (Bairoch and Apweiler, 2000) and NCBI non-redundant protein (Nr) database (O'Leary et al., 2016). Nr annotation results of unigenes were used for Gene Ontology (GO) annotation by Blast2GO software (Conesa et al., 2005), and functional classification of unigenes was performed using WEGO software (Ye et al., 2006).

Identification of Differentially Expressed Genes
Expression values of each gene were calculated as reads per kilobase (of exon) per million (RPKM) mapped reads. Biological samples with low and high oil contents were grouped, and comparison was undertaken to identify DEGs between the two experimental groups. Principal component analysis (PCA) was also performed as a quality control. DEGs with a fold change ≥ 2 and p < 0.01 across the two experimental groups were identified using the edgeR package (Robinson et al., 2010).

Quantitative Reverse Transcriptase-PCR
Leaves from five randomly selected 3-year-old tree clones of each chemotype were collected for quantitative reverse transcriptase (RT)-PCR analysis. Total RNA was extracted and reversetranscribed into cDNA using the HiScript II Q RT SuperMix for quantitative PCR (qPCR) (Vazyme, China). The PCRs were performed using ChamQTM SYBR R qPCR Master Mix (Vazyme, China) according to the manufacturer's instructions on an ABI Step One Plus System with three replicates. Sequences of the gene-specific primers are listed in Supplementary Table S1. Relative gene expression was calculated using the 2 − Ct method, and ACTIN was chosen as the internal reference gene to normalize expression levels.

Chemical Composition of Leaf Essential Oils From C. camphora of Different Oil Contents
In general, there were no major differences in morphology between camphor trees of different chemotypes, and the trees were therefore initially classified by odor alone. Fresh leaves of the six samples were harvested, and the essential oils were obtained by water distillation. The average oil outputs of the samples from linalool-and low-oil-yield chemotypes were 1.3 and 0.3%, respectively. Chemical composition of the extracted essential oils was determined by gas chromatography-mass spectrometry (GC-MS). Table 1 lists the major components detected, 1-15 are monoterpenes, and the remaining are sesquiterpenes. In the essential oils of the linalool type, β-linalool was the principal constituent, making up ∼90% in average. Among the various constituents of the essential oils extracted from the different chemotypes of camphor trees, the major difference was observed for the monoterpene content. Therefore, transcriptome analyses between trees of linalool type and trees with low oil yield were performed to investigate the molecular mechanisms underlying the biosynthesis of monoterpenes.

De novo Assembly of the C. camphora Transcriptome
To construct a genome-wide transcriptome of C. camphora, total RNA of the experimental samples was isolated. This pooled total RNA was sequenced and a total of 87.9 M 150 bp paired-end reads remained after raw reads were processed with sequence filtering and quality control (Supplementary Table S2). De novo assembly was carried out with short reads using Trinity (see section "Materials and Methods"). A total of 121,285 unigenes were assembled with a high N50 value (1063), as well as high 1 | Chemical composition of the essential oils from leaf extracts of C. camphora.
values for both the maximum unigene length (15,713) and percentage of mapped reads (>98%), indicating a good assembly quality ( Table 2). In addition, BUSCO analysis was performed to assess the completeness of the final genome assembly. Based on the 1,440 embryophytic-specfic genes, 79.8% were identified as complete genes, including 76.3% complete and single-copy genes and 3.5% complete and duplicated genes, and 8.8% were identified as fragmented genes, indicating good completeness of the genome assembly (Supplementary Figure S1).

RNA-Seq Analyses of DEGs Between C. camphora Leaves With High and Low Oil Contents
To get a better understanding of the dynamic processes of terpenoid biosynthesis in C. camphora, we performed transcriptome analyses of camphor leaves with high and low oil contents based on the oil yield and GC-MS results ( Table 1). Trimmed reads were mapped to the de novo transcriptome assembly of C. camphora. PCA demonstrated a good similarity between samples in the high-and low-oilcontent groups (Supplementary Figure S2A). Similarly, the range of Pearson correlation coefficient between two samples in each experimental group is 0.93-0.96 (Supplementary Figure S2B), also indicating that the samples in the same groups are highly correlated. Further transcriptome analysis identified 3,689 DEGs with p < 0.01 and |log2 fold_change| > 2, which comprised 2,120 up-regulated and 1,569 down-regulated unigenes. The GO terms "metabolic process" in the "biological process" category and "catalytic activity" in the "molecular function" category were the two most highly represented within DEGs (Supplementary Figure S3). In accordance with the GO analysis, KEGG enrichment analysis revealed that most of the 3,689 DEGs were in "metabolism" class (Figure 2A), which included "biosynthesis of secondary metabolites, " "phenylpropanoid biosynthesis, " "monoterpenoid biosynthesis, " "terpenoid backbone biosynthesis, " and other pathways (Supplementary Table S5).

Putative Genes Involved in Terpenoid
Backbone and Terpenoid Biosynthesis in C. camphora Annotated protein-coding enzymes were mapped into the terpenoid backbone biosynthesis pathway, and gene expression data of each enzyme in the two experimental groups were indicated with logRPKM values (Figure 2B and Supplementary  Table S6). Two unigenes encoding key enzymes, AACT-like in the MVA pathway and DXS-like in the MEP pathway, exhibited a higher expression level in the groups with high oil contents. Because both MVA and MEP pathways generate the precursors IPP and DMAPP for the final production of terpenoids, increased expression of these two genes could explain the higher amount of terpenoids we observed in this group (Table 1). Additionally, the GGPPS-like gene also had a higher expression level in the leaves of linalool type. Among the DEGs, 14 unigenes were annotated as terpenoid synthase-encoding genes, including four in monoterpenoid biosynthesis, four in sesquiterpenoid and triterpenoid biosynthesis, and six in diterpenoid biosynthesis ( Figure 2B and Supplementary Table S6). One LIS-like and one GerS-like unigene were up-regulated in the monoterpenoid biosynthesis pathway, indicating that increased availability of terpenoid precursors, along with increased levels of monoterpene synthases, might be the reason for the higher oil yield and higher accumulation of linalool and other monoterpenes in the trees of linalool type. Besides monoterpenoid synthases, four sesquiterpenoid synthase unigenes encoding GerD and four diterpenoid synthase unigenes encoding CLS, CYP88D6, GA2OX1, and GA2OX8 were also up-regulated. In general, the upregulation of terpenoid synthase unigenes in the trees of linalool-type C. camphor is consistent with their increased accumulation of terpenoids, suggesting that the presence of various terpenoid synthases might be the reason for the occurrence of varied terpenoids in the essential oils of C. camphora. To validate the reliability of our RNA-Seq data, four DEGs were chosen for qRT-PCR examination. As shown in Figure 2C, the expression profiles of these unigenes observed by qRT-PCR are identical to their profiles identified in the RNA-Seq analysis.

DISCUSSION
C. camphora oils have a wide variety of uses, including ornamental, medicinal, and industrial, which makes understanding the regulation of essential oil biosynthesis in camphor tree highly important (Guo et al., 2016;Jiang et al., 2016;Chen et al., 2018). Because of earlier unregulated harvesting, many natural sources of camphor trees have been exhausted. In response to this, camphor tree plantations have been established in many countries including China and Japan. These camphor trees are maintained as bushes and are harvested for twigs and leaves before essential oil extraction. Camphor trees can be divided into different chemotypes based on their main terpenoid components. In this study, we identified camphor trees of linalool type and low oil contents without a dominant terpene component. A total of 26 main components were identified by GC-MS. We further demonstrated that the chemical components in the essential oils of C. camphora leaf extracts contained mainly monoterpenes and sesquiterpenes with very minor quantities of diterpenes (Table 1). Moreover, linalool took up ∼90 of the total essential oil contents in linalool type, indicating that monoterpene levels account for the major difference between the essential oils of different chemotypes. Terpenoid biosynthesis and its regulation are extensively documented in many plant species. Despite having high structural diversity, all terpenes are derived from two isomeric basic backbone molecules, IPP and DMAPP, which are synthesized either through the MVA or MEP pathways (Sacchettini and Poulter, 1997). Although many members of the Lauraceae family are rich in terpenoids and are economically important tree species, limited studies are available analyzing genes involved in its terpenoid biosynthesis pathway (Yang et al., 2005;Lin et al., 2014;Chen et al., 2018;Chen et al., 2020). This study was focused on identifying genes involved in terpenoid biosynthesis and key enzymes involved in the biosynthesis of the backbone IPP and DMAPP molecules in C. camphora. A transcriptome of C. camphora was also generated to provide a reference for future RNA-Seq-based expression profiling of terpenoid biosynthesis. Functional annotation analysis showed significant enrichment in genes associated with "metabolic pathways" and "biosynthesis of secondary metabolites" (Figure 1). This is consistent with our observation that September-November is the time of the year when C. camphora harbors the highest oil contents (Qin et al., 2015), and it is also the harvesting time of our samples. Furthermore, 116 and 18 expressed unigenes were identified to be highly homologous with 49 and 11 known enzymes in the terpenoid backbone biosynthesis and monoterpenoid biosynthesis pathways, respectively.
By comparing the transcriptome profiles of leaves of linalool and leaves with low oil contents, 3,689 unigenes were identified as DEGs. Among them, expression levels of AACT-like and DXS-like, catalyzing the first steps toward IPP and DMAPP in the MVA and MEP pathways, were up-regulated in the leaves of linalool type. DXS genes are believed to be a significant control point in the MEP pathway (Wolfertz et al., 2004;Rohdich et al., 2006), and manipulation of gene expression levels of DXS in Arabidopsis altered the levels of tocopherols, carotenoids, chlorophylls, abscisic acid, and various other isoprenoids (Estévez et al., 2001). However, the contribution of each pathway to the generation of the common precursors for varied terpenoid biosynthesis is still unknown, because crosstalk between the two pathways has been documented in many species including chamomile, Arabidopsis, tobacco, and snapdragon (Adam and Zapp, 1998;Bick and Lange, 2003;Hemmerlin et al., 2003;Laule et al., 2003;Dudareva et al., 2005). In addition, the expression level of GGPPS-like in the "terpenoid backbone biosynthesis" pathway was up-regulated in the comparison, indicating increased supply of precursors for diterpenes biosynthesis. Although the percentage of diterpenes in leaf extracts of linalool-type camphor trees is not high, their oil yield is much higher than trees of low oil yield. Therefore, higher GGPPS expression level may simply be correlated with a general increase in oil content. Interestingly, it has been documented that GPP, the precursor for monoterpene, is constitutively and ubiquitously expressed in Arabidopsis (Van Schie et al., 2013), and our results also showed that the level of GPPS was indifferent between the groups regardless of different accumulation of monoterpenes.
Noticeably, LIS-like shared a high sequence identity (>99%) with some of the identified linalool synthases in the Lauraceae family. The conserved DDxxD and NSE/DTE motifs for Mg 2+ or Mn 2+ ion binding as well as the predicted D562 active site cleft were all properly distributed in the LISlike protein (Supplementary Figure S4 and Supplementary  Table S7). However, caution should be used when assigning protein functions to terpenoid synthase genes predicted based on sequence similarities as it has been reported that point mutations alone were sufficient to significantly change the kinetic activity of a linalool synthase gene in Cinnamomum osmophloeum (Lin et al., 2014). Additionally, alternative splicing has been shown to alter linalool biosynthesis, as reported in Camellia sinensis (Liu et al., 2018). Higher levels of monoterpene production could potentially lower production of other terpenes and isoprenoid derivatives due to the competition for common precursors. Indeed, four unigenes in the "carotenoid biosynthesis pathway" and two unigenes in the "steroid biosynthesis pathway" were downregulated in the comparison (Supplementary Table S6). Similar examples can be seen in tobacco plants producing monoterpene where lower levels of β-caryophyllene were detected in the flowers of these plants (Lücker et al., 2004). Additionally, both transgenic tomato and Arabidopsis plants producing a phytoene synthase and the strawberry linalool/nerolidol synthase separately had significant growth retardation due to the severe inhibition of gibberellin biosynthesis (Fray et al., 1995;Aharoni et al., 2004).
Other genes of interest emerged from this study, which include several members of the WRKY family. WRKY21like, WRKY47-like, and WRKY65-like were all up-regulated in the leaves with high oil content (Supplementary Table S6). WRKY transcription factors have been reported to regulate the expression of genes involved in terpene biosynthesis in many species, including Arabidopsis, cotton, Artemisia annua, Catharanthus roseus, and others (Xu et al., 2004;Ma et al., 2009;Mao et al., 2011;Suttipanta et al., 2011). Moreover, it was suggested that the linalool-type C. osmophloeum contains a W-box element (T)(T)TGAC(C/T) in the promoter region of the linalool synthase gene, enabling regulation by WRKY transcription factors (Lin et al., 2014). Based on these previous studies, we suspect that members of the WRKY family might be involved in the regulation of terpene synthase genes of C. camphora and therefore could potentially play critical roles in the quantitative difference of oil contents between different chemotypes.
All in all, this study identified a cohort of genes in the terpenoid backbone biosynthesis and monoterpenoid biosynthesis pathways that were commonly up-regulated in leaves with high linalool contents compared to leaves with low oil contents. The knowledge obtained from this study could facilitate manipulation of camphor tree essential oil production through metabolic engineering of essential oil biosynthesis. Additional studies should focus on the isolation of monoterpene synthases genes from different chemotypes of C. camphora. Additionally, functional analyses are needed to demonstrate the impact of these genes on terpenoid production.

AUTHOR CONTRIBUTIONS
HZ, JH, and ZJ designed the experiments. BZ, JZ, XJ, and JH performed the experiments. HZ and JH analyzed the data. JH wrote the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We thank HuiJuan Cheng and Yiqun Zhang for their assistance in sample harvesting and essential oil extraction. We would also like to thank Yanbo Wang and reviewers for their help in improving the manuscript.