Identification of Putative Genes Involved in Limonoids Biosynthesis in Citrus by Comparative Transcriptomic Analysis

Limonoids produced by citrus are a group of highly bioactive secondary metabolites which provide health benefits for humans. Currently there is a lack of information derived from research on the genetic mechanisms controlling the biosynthesis of limonoids, which has limited the improvement of citrus for high production of limonoids. In this study, the transcriptome sequences of leaves, phloems and seeds of pummelo (Citrus grandis (L.) Osbeck) at different development stages with variances in limonoids contents were used for digital gene expression profiling analysis in order to identify the genes corresponding to the biosynthesis of limonoids. Pair-wise comparison of transcriptional profiles between different tissues identified 924 differentially expressed genes commonly shared between them. Expression pattern analysis suggested that 382 genes from three conjunctive groups of K-means clustering could be possibly related to the biosynthesis of limonoids. Correlation analysis with the samples from different genotypes, and different developing tissues of the citrus revealed that the expression of 15 candidate genes were highly correlated with the contents of limonoids. Among them, the cytochrome P450s (CYP450s) and transcriptional factor MYB demonstrated significantly high correlation coefficients, which indicated the importance of those genes on the biosynthesis of limonoids. CiOSC gene encoding the critical enzyme oxidosqualene cyclase (OSC) for biosynthesis of the precursor of triterpene scaffolds was found positively corresponding to the accumulation of limonoids during the development of seeds. Suppressing the expression of CiOSC with VIGS (Virus-induced gene silencing) demonstrated that the level of gene silencing was significantly correlated to the reduction of limonoids contents. The results indicated that the CiOSC gene plays a pivotal role in biosynthesis of limonoids.

Limonoids produced by citrus are a group of highly bioactive secondary metabolites which provide health benefits for humans. Currently there is a lack of information derived from research on the genetic mechanisms controlling the biosynthesis of limonoids, which has limited the improvement of citrus for high production of limonoids. In this study, the transcriptome sequences of leaves, phloems and seeds of pummelo (Citrus grandis (L.) Osbeck) at different development stages with variances in limonoids contents were used for digital gene expression profiling analysis in order to identify the genes corresponding to the biosynthesis of limonoids. Pair-wise comparison of transcriptional profiles between different tissues identified 924 differentially expressed genes commonly shared between them. Expression pattern analysis suggested that 382 genes from three conjunctive groups of K-means clustering could be possibly related to the biosynthesis of limonoids. Correlation analysis with the samples from different genotypes, and different developing tissues of the citrus revealed that the expression of 15 candidate genes were highly correlated with the contents of limonoids. Among them, the cytochrome P450s (CYP450s) and transcriptional factor MYB demonstrated significantly high correlation coefficients, which indicated the importance of those genes on the biosynthesis of limonoids. CiOSC gene encoding the critical enzyme oxidosqualene cyclase (OSC) for biosynthesis of the precursor of triterpene scaffolds was found positively corresponding to the accumulation of limonoids during the development of seeds. Suppressing the expression of CiOSC with VIGS (Virus-induced gene silencing) demonstrated that the level of gene silencing was significantly correlated to the reduction of limonoids contents. The results indicated that the CiOSC gene plays

INTRODUCTION
Citrus is the largest fruit crop in the world, with over 4,000 years history of cultivation. Limonoids are one of the most important classes of phytochemicals predominantly produced in the Rutaceae, Meliaceae, Cneoraceae, and Simaroubaceae families, with citrus being the only one among them that produce edible fruits. Evaluation of the biological activity of citrus limonoids has revealed the potential of these compounds in anticancer, antimicrobial, antioxidant and insecticidal capacities (Alford and Murray, 2000;Chowdhury et al., 2003;Ono et al., 2011;Kim et al., 2013). Limonoids exist in a number of plant organs and tissues such as stems, leaves, seeds and fruit tissues (McIntosh et al., 1982;Rouseff and Nagy, 1982;Herman et al., 1989;Li et al., 2014). The accumulation of limonoids is influenced by genotype, environment, the type of tissues and developmental stages of the plant (Rouseff and Nagy, 1982;Baldwin et al., 2010;Chebrolu et al., 2012;Li et al., 2014;Chen et al., 2015;Wang et al., 2016).
Limonoids are synthesized via the isoprenoids biosynthetic pathway, starting with cyclization of squalene (Roy and Saraf, 2006). In triterpenoids biosynthetic pathway, oxidosqualene cyclases (OSCs) are the first committed enzymes, which convert triterpenoid carbon structures into precursors of triterpenoid metabolites (Phillips et al., 2006). The large number of OSC enzymes has been characterized in plants with different substrate and product specificities (Thimmappa et al., 2014). These OSC enzymes catalyze the cyclization reactions of 2,3-oxidosqualene to form sterols, brassinosteroids, cucurbitacins, and scaffolds of other triterpenes (Yokota, 1997;Shang et al., 2014;Thimmappa et al., 2014). Those triterpene scaffolds were further modified by tailoring enzymes such as CYP450s, glycosyltransferases, methyltransferases and acyltransferases to produce all kinds of aglycones and glucosides of triterpenes (Thimmappa et al., 2014). Nomilin, the precursor and one of the major limonoids is a triterpenoid (Hasegawa et al., 1986;Ou et al., 1988;Manners, 2007). All kinds of limonoids are produced in the limonoids biosynthetic pathway (Figure 1) via oxidization, isomerization, methylation, acetylation and hydrolyzation of nomilin (Hasegawa et al., 1980;Hasegawa and Herman, 1985). Comparatively in recent decades, while many studies have focused on the how, when and where limonoids are biosynthesized and accumulated in citrus, there has been only a few on the genetic control of biosynthesis of limonoids. The discovery of non-bitter limonoid glucosides and identification of the limonoid glucosyltransferase gene (LGT) responsible for converting limonoid aglycones into corresponding non-bitter limonoid glucosides was the most important breakthrough in understanding naturally occurring limonoid debiting processing (Hasegawa et al., 1997). This provided the possibility to genetically modify the citrus to produce fruits free from limonoid bitterness which still could be beneficial to human health with high concentration of limonoid glucosides. However, improvement of overall production of limonoids is still not possible due to the key genes controlling the biosynthesis of limonoids have not yet to be identified. Therefore, the aim of this study is to analyze the correlation of gene expression and limonoids contents with different citrus samples, from multiple genotypes, tissues and developmental stages to identify the factors that contribute to control of biosynthesis of limonoids.

Plant Materials and Sampling
All the citrus varieties used in this study were provided by the National Citrus Germplasm Repository, located in the Citrus Research Institute of Chinese Academy of Agricultural Sciences (CRIC), in Beibei, Chongqing, China.
For DGE analysis, leaf, phloem and seed samples were collected at same time from Dongfengzao pummelo (C. grandis (L.) Osbeck) on August, 2014. The leaf samples at three developmental stages (LI, LII, LIII) (Supplementary Figure S1) were collected. Phloem section in the internode between two leaves at the LII stage was collected and termed as PII. The collected seeds sample at this time was termed as SII. Those same samples were used for both transcriptome sequencing and determination of limonoids content.
The seeds of GXSTY were germinated for virus-induced gene silencing (VIGS).

RNA Extraction and Sequencing
Total RNA was extracted following the protocol provided by RNAprep pure Plant Kit (Tiangen Biotech (Beijing) Co., LTD., China). Transcriptome sequencing was conducted by BGI Tech Solutions Co., Ltd., Wuhan, China.

Raw Data Processing, Functional Annotation, and Statistical Analyses
The raw reads were processed to remove the adapter sequences and low-quality reads (Length < 30 bp, quality < 9). The data have been submitted to NCBI database (NCBI accessions: SRP071684). Sequencing saturation analysis shows that the number of mapped genes tends to saturation (Supplementary Figure S2). The resulted high quality reads (Length ≥ 30 bp, high quality ≥ 9) were then mapped to the reference genome and gene sequences of Citrus clementina (Phytozome version 10.0) using TMAP software. The mapped reads were normalized by calculating the reads per kilobase per million mapped reads (RPKM) for each gene. Subsequently the p values and log 2 of each gene were then calculated. Those genes were blasted to NCBI Non-redundant database (E-value ≤ 1e −5 ), and annotated with Blast2GO software 1 and Kyoto Encyclopedia of Genes and Genomes database (KEGG) using BLASTx (E-value ≤ 1e −5 ). The differentially expressed genes (DEGs) were identified at threshold of | log 2 | ≥ 1 and FDR < 0.001 based on the method of Poisson distribution (Audic and Claverie, 1997). Localization of DEGs FIGURE 1 | Biosynthetic pathways of limonoids in citrus (Roy and Saraf, 2006). into metabolic pathways was performed using the MapMan software (Thimm et al., 2004).
Particular expression patterns of differential expressed genes were analyzed with K-means clustering based on their log 2 ratio. Visualizations of the clusters of different expression patterns were performed using MultiExperiment Viewer (MEV; version 4.9).
Amino acid sequences of OSC and CYP450s genes were downloaded from NCBI database 2 following published reports 2 http://www.ncbi.nlm.nih.gov/protein/?term (Thimmappa et al., 2014). Sequences alignment and phylogenetic analysis were performed with MEGA5.0 software. Phylogenetic trees were constructed using the neighbor-joining method with 1,000 bootstrap replicates. The accessions of OSC and CYP450s genes are listed in Supplementary Files S1, S2, respectively.

Quantitative Real-Time PCR (qRT-PCR) Analyses
Some highly differential expressed transcripts identified by DGE were selected as candidate genes for qRT-PCR analysis. The leaves of seven varieties and the seeds of GXSTY and ESP from 4 different developmental stages were used to validate the expression of those candidate genes in relation to the biosynthesis of limonoids. The primers of qRT-PCR were designed using Primer premier 5.0 software (Supplementary  Table S1).
1.0 µg total RNA was reversely transcribed into single strand cDNA using RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, USA). qRT-PCR reaction was performed in a total volume of 20 µl. It contained 1 × iTaq TM Universal SYBR R Green Supermix (Bio-Rad, USA), 200 nM of each primer, and 2 µl cDNA template (ten-fold dilution of first strand cDNA). The PCR program was performed as follows: initial denaturation for 10 min at 95 • C, 40 cycles of 15 s at 95 • C and 60 s at 59 • C on a Real-time PCR system (Applied Biosystems 7500 Fast, Applied Biosystems, USA). "No template controls" (NTC) were included in each run and each sample was set in three technical replicates. The citrus Actin and GAPDH genes were used as the endogenous reference genes to normalize the expression level of the genes. Relative quantity of the expression of each gene was calculated using the 2 − Ct method (Livak and Schmittgen, 2001).

Silencing of CiOSC Gene with VIGS
A 367 bp fragment of CiOSC amplified from the RNA of leaves of GXSTY with CiOSC-V2 primers (Supplementary Table S1) was inserted into XbaI and BamHI sites of the TRV vector (TRV-derived vector was supplied by Prof. Yuncong YAO from Beijing University of Agriculture). The constructions of TRV1 and TRV2 were transformed into Agrobacterium tumefaciens strain EHA105 by electroporation. Agrobacterium suspensions of TRV1 and TRV2 were prepared as previously described (Liu et al., 2002). Germinated seeds from GXSTY variety were submerged into Agrobacterium suspensions inside of a vacuum chamber as previously reported (Deng et al., 2012;Yan et al., 2012). After vacuum infiltration, the germinated seeds were rinsed with water, sown in soil in a growth chamber. Thirty days post-infiltration (dpi), the whole plants were ground into fine powder in liquid nitrogen immediately after removal of young leaves and shoot apex, and stored at −80 • C.
The extraction of total RNA and cDNA synthesis were performed using the method described above. The expression of CiOSC gene was determined with qRT-PCR analysis. Plants infected with empty vector, and non-infected plants were used as controls. To normalize the gene expression, CiGAPC2 was used as a reference gene. All the samples were set in three technical replicates.

Determination of Limonoids Contents
The contents of limonin and nomilin were determined by HPLC in two replicates as reported previously (Wang et al., 2016). Nomilin and limonin were extracted from the samples twice  with dichloromethane in an ultrasonic bath for 20 min per extraction. The ratio of dichloromethane (mL) to sample (grams) was 10: 1. For HPLC analysis, the column was maintained at 30 • C. Fifteen microliter sample solutions were injected to run in a mobile phase consisting of acetonitrile: deionized water at 40: 60 (V:V) with a flow rate of 1 ml/min. Nomilin and limonin were detected at 210 nm wavelength. The total content of limonoids is the sum of the contents of limonin and nomilin.  21,789 in SII. Those genes were blasted with public databases of Nr, GO and KEGG. Differential expression analysis was performed by pair-wise comparison using |log 2 | ≥ 1 and FDR < 0.001 as thresholds. LI was defined as the reference sample because of its low limonoids content. As shown in Figure 2A, the numbers of DEGs were 2,086 between LII and LI, 4,082 between LIII and LI, 5,837 between PII and LI, and 9,428 between SII and LI. Venn Diagram analysis indicated that 924 DEGs were shared across all four sets of paring comparisons, which accounted for 7.6% of the total genes ( Figure 2B). However, only 47 out of 924 genes were annotated to be involved in secondary metabolism.

Digital Gene Expression Analyses
To search for the genes involved in the biosynthesis of limonoids, K-means cluster analysis was performed using the genes expressed in leaves at three stages to investigate their expression profiles. The 924 shared DEGs were clustered into ten groups. Genes with the same expression pattern were filtered by MEV software (Figure 3A). More than two-third of genes were characterized to 22 functional categories such as DNA, mics, signaling, RNA, stress, cell wall, transport, hormone metabolism, cell, development, secondary metabolism and lipid metabolism. There were 92 genes not assigned to any functional categories ( Figure 3B).
Functional analysis indicated that in the 382 genes, 7 of them participate in secondary metabolisms of phenylpropanoids, alkaloid, wax, phenols, terpenoids and flavonoids. ciclev10010416m (termed as CiOSC) was the only gene involved in the biosynthesis of triterpenoids. The OSC gene encodes OSC, a key enzyme in terpenoids biosynthetic pathway. Phylogenetic analysis demonstrated that CiOSC was clustered in the same sub-clade with cucurbitadienol synthases genes CPQ and Bi, which regulate biosynthesis of triterpenoids in squash (Cucurbita pepo) and cucumber (Cucumis sativus L.), respectively (Shibuya et al., 2004;Shang et al., 2014; Figure 4A). Therefore, CiOSC could be one of the key genes affecting biosynthesis of limonoids through regulating the production of triterpenoids.
Sixty-six transcription factor genes (TFs) were present among the 924 DEGs, covering 19 families. The AP2/EREBP and bHLH are the two largest families. In this study, only six AP2/EREBP members and one bHLH member were positively correlated with limonoids contents in the leaf samples LII and LIII. The expression of the remaining TFs (WRKY, C2H2, MYB, Trihelix, and GRAS members) also corresponded to limonoids contents.

Correlation Analyses of Candidate Genes
To verify the DEGs obtained from sequencing and the bioinformatic analyses were indeed corresponding to the gene expression profiles, 20 candidate genes correlated with biosynthesis of limonoids were selected for qRT-PCR analysis. The results demonstrated that the expression profiles were consistent between the results of DGE and qRT-PCR (Supplementary Figure S3).
The materials of developing leaves, phloem and seeds from Dongfengzao pummelo, and the seeds at four development stages from ESP and GXSTY varieties were used for validating the genes correlated with biosynthesis of limonoids ( Table 3). The expression patterns of 15 genes were found to correspond to the levels of limonoids contents. The results of Poisson correlation analysis showed that the expression levels of most of the genes were positively correlated with limonoids contents in ESP and   GXSTY (Table 4). Correlation coefficient of CiOSC between gene expression levels and limonoids contents in ESP and GXSTY were at 0.722 and 0.741, but not high enough levels to be significant. The expression levels of CiOSC were significantly different among the varieties tested, and highly linked to the limonoids contents. High levels of expression were found in GXSTY and JC, which had higher limonoids contents. The reverse was observed in the lower limonoids content varieties of HPJG, RAJG, WCZPG, GCMG, and DNXY ( Figure 4B). This verification was carried out with the remaining 14 genes, including four TFs, six CYP450s, three UGTs and one acyltransferase genes, which had the same expression patterns in developing leaves. The amino acid sequences of CYP450s and UGTs are included in the Supplementary Files S2, S3. Phylogenetic analysis classified six CYP450s into three clusters (Supplementary Figure S4). The expression of the candidate genes demonstrated close relationship with the limonoids contents during seed development. In particular, significantly positive correlations were observed between the levels of gene expression and the limonoids contents in ciclev10021695m (MYB), ciclev10030087m (CYP450), ciclev10022473m (CYP450), ciclev 10000886m (CYP450), and ciclev10025448m (CYP450) genes, with the correlation coefficients between 0.958 b and 0.995 a . The expression of three UGTs genes (ciclev10020010m, ciclev 10015042m, ciclev10001658m) showed different correlations with limonoids contents in the developing seeds of ESP and GXSTY. In low limonoids content variety GXSTY, UGTs genes expressed positively (correlation coefficients 0.652∼0.771) with limonoids contents during seeds development, but negative correlations (−0.61 ∼ −0.85) with limonoids contents in high limonoids content variety ESP.

Confirmation of CiOSC Gene Related to the Biosynthesis of Limonoids
Virus-induced gene silencing was used to silence the expression of the CiOSC gene in the seedlings of C. grandis variety GXSTY. Six positive seedlings with significantly reduced expression of CiOSC gene were obtained from ten infiltrated seedlings (Table 5 and Supplementary Figure S5). No obvious morphological differences were observed among untreated seedlings, seedlings infected with empty vector, and TRV-CiOSC  Figure S6). Compared to the seedlings infiltrated with empty vector, the level of gene expression in those six seedlings decreased to 9.09∼51.10% (Table 5). Consequently, the limonin and nomilin contents of those seedlings were reduced to 35.08∼50.59% and 44.49∼60.83%, respectively (Table 5 and Figure 5). The relative expression level of CiOSC gene in one of the transgenic plants was significantly inhibited (only 9.09% of control). Consequently contents of limonin and nomilin were reduced to only 35.08 and 44.49% of the control, respectively.

DISCUSSION
It is generally known that triterpenes are the primary precursor for synthesis of limonoids. Some of the CYP450s have been reported to play critical roles in modification of triterpenes, including hydroxyl, ketone, aldehyde, carboxyl, or introduction of epoxy groups (Shang et al., 2014;Thimmappa et al., 2014). Seven out of nine genes controlling biosynthesis of cucurbitacin C, the bitterness substance in cucumber, were members of CYP450s (Shang et al., 2014). In this study, the CYP450 family members of ciclev10030087m, ciclev10000886m, ciclev10031272m, ciclev10025448m, ciclev10031065m, and ciclev 10022473m were found significantly corresponding to the accumulation of limonoids during the seeds development in ESP and GXSTY. This suggests that those genes are obviously involved in the biosynthesis of limonoids.
UGTs were the other key genes identified by DGE analysis. They are the essential genes regulating glucosylation of triterpenes. This study demonstrated positive correlation with limonoids contents, in low limonoids content pummelo genotype but negative correlation, in high limonoids content pummelo genotype through the gene expression analysis of UGTs (ciclev10020010m, ciclev10015042m and ciclev10001658m). This may be due to the difference in ratios of limonoid glucosides to nomilin and limonin between those two varieties, as the function of UGTs is to convert nomilin and limonin to limonoid glucosides. To prove this, further work on analysis of limonoid glucosides contents in these two varieties will be required. However, it has been reported that in some limonoids rich varieties, most of limonoids accumulated as nomilin and limonin, instead of limonoid glucosides (Kita et al., 2003;Zaare-Nahandi et al., 2008).
Oxidosqualene cyclases are critical enzymes for converting 2,3-oxidosqualene or squalene into the precursor of triterpene scaffolds (Thimmappa et al., 2014). Limonoids is one group of triterpenoids specially synthesized via MVA and MEP pathways (Ou et al., 1988;Endo et al., 2002). However, to date, there has been no report on involvement of OSC genes in biosynthesis of limonoids. In the present study, DGE analysis indicated that the expression of CiOSC, a member of OSC genes in pummelo corresponded to the biosynthesis of limonoids. Further investigation demonstrated that the expression of CiOSC was highly correlated to the limonoids contents among pummelo varieties. Suppression of CiOSC expression with TRV-mediated VIGS resulted in the reduced expression of CiOSC which led to the significantly decreased production of limonoids. The results suggested that CiOSC could be a critical gene participating in biosynthesis of limonoids in citrus. The evidence also showed that the expression of CiOSC was much higher in the leaves and phloem than that of the seeds. Previous studies reported that nomilin is synthesized in the phloem, and transported to the other organs and tissues to synthesize into different forms of limonoids (Hasegawa and Hoagland, 1977;Ou et al., 1988). OSCs are responsible for the synthesis of the pre-structure of triterpenoids in many plant species. The CiOSC should act through controlling the production of triterpenoids to regulate the biosynthesis of nomilin in citrus. Phylogenetic analysis revealed that CiOSC was the ortholog of Bi and CPQ genes, which were the first committed enzymes of cucurbitacin biosynthesis in cucumber and in squash, respectively (Shibuya et al., 2004;Shang et al., 2014). The significant involvement of CiOSC in biosynthesis of limonoids demonstrated in this study suggests that a large portion of triterpenoids synthesized by CiOSC may be supplied to the biosynthesis of limonoids in citrus. Based on the genes related to the biosynthesis of limonoids identified in this study, a preliminary draft of genetic factors in the limonoids biosynthetic pathway could be proposed (see Figure 6).
The first glimpse of information on genetic study of biosynthesis of limonoids needs further investigation and verification of the functions of those putative genes. Identification of the major genetic factors on the limonoids biosynthetic pathway should be possible with more research.

AUTHOR CONTRIBUTIONS
FW designed the study, performed experiment, ran data analysis and prepared the draft of manuscript. MW, XL, and YX performed some of experiments. SZ and WS participated in some of experiments and assisted in the preparation of manuscript. XZ is responsible for overall supervision of the study and revised subsequent versions of the manuscript. All authors have read and approved the final version of the manuscript.

ACKNOWLEDGMENT
We thank Chunzhen Cheng for his assistance on data analysis with MapMan software.