Transcriptome Analysis of Dendrobium officinale and its Application to the Identification of Genes Associated with Polysaccharide Synthesis

Dendrobium officinale is one of the most important Chinese medicinal herbs. Polysaccharides are one of the main active ingredients of D. officinale. To identify the genes that maybe related to polysaccharides synthesis, two cDNA libraries were prepared from juvenile and adult D. officinale, and were named Dendrobium-1 and Dendrobium-2, respectively. Illumina sequencing for Dendrobium-1 generated 102 million high quality reads that were assembled into 93,881 unigenes with an average sequence length of 790 base pairs. The sequencing for Dendrobium-2 generated 86 million reads that were assembled into 114,098 unigenes with an average sequence length of 695 base pairs. Two transcriptome databases were integrated and assembled into a total of 145,791 unigenes. Among them, 17,281 unigenes were assigned to 126 KEGG pathways while 135 unigenes were involved in fructose and mannose metabolism. Gene Ontology analysis revealed that the majority of genes were associated with metabolic and cellular processes. Furthermore, 430 glycosyltransferase and 89 cellulose synthase genes were identified. Comparative analysis of both transcriptome databases revealed a total of 32,794 differential expression genes (DEGs), including 22,051 up-regulated and 10,743 down-regulated genes in Dendrobium-2 compared to Dendrobium-1. Furthermore, a total of 1142 and 7918 unigenes showed unique expression in Dendrobium-1 and Dendrobium-2, respectively. These DEGs were mainly correlated with metabolic pathways and the biosynthesis of secondary metabolites. In addition, 170 DEGs belonged to glycosyltransferase genes, 37 DEGs were related to cellulose synthase genes and 627 DEGs encoded transcription factors. This study substantially expands the transcriptome information for D. officinale and provides valuable clues for identifying candidate genes involved in polysaccharide biosynthesis and elucidating the mechanism of polysaccharide biosynthesis.

Dendrobium officinale is one of the most important Chinese medicinal herbs. Polysaccharides are one of the main active ingredients of D. officinale. To identify the genes that maybe related to polysaccharides synthesis, two cDNA libraries were prepared from juvenile and adult D. officinale, and were named Dendrobium-1 and Dendrobium-2, respectively. Illumina sequencing for Dendrobium-1 generated 102 million high quality reads that were assembled into 93,881 unigenes with an average sequence length of 790 base pairs. The sequencing for Dendrobium-2 generated 86 million reads that were assembled into 114,098 unigenes with an average sequence length of 695 base pairs. Two transcriptome databases were integrated and assembled into a total of 145,791 unigenes. Among them, 17,281 unigenes were assigned to 126 KEGG pathways while 135 unigenes were involved in fructose and mannose metabolism. Gene Ontology analysis revealed that the majority of genes were associated with metabolic and cellular processes. Furthermore, 430 glycosyltransferase and 89 cellulose synthase genes were identified. Comparative analysis of both transcriptome databases revealed a total of 32,794 differential expression genes (DEGs), including 22,051 up-regulated and 10,743 down-regulated genes in Dendrobium-2 compared to Dendrobium-1. Furthermore, a total of 1142 and 7918 unigenes showed unique expression in Dendrobium-1 and Dendrobium-2, respectively. These DEGs were mainly correlated with metabolic pathways and the biosynthesis of secondary metabolites. In addition, 170 DEGs belonged to glycosyltransferase genes, 37 DEGs were related to cellulose synthase genes and 627 DEGs encoded transcription factors. This study substantially expands the transcriptome information for D. officinale and provides valuable clues for identifying candidate genes involved in polysaccharide biosynthesis and elucidating the mechanism of polysaccharide biosynthesis.

INTRODUCTION
The Orchidaceae is one of the largest and most widespread families of flowering plants, with more than 250,000 species (Leitch et al., 2009). The genus Dendrobium is one of the largest genera of the Orchidaceae and has nearly 1100 species throughout the world and is spread widely in India across to Japan, south to Malaysia, and east to Australia, New Guinea, and the Pacific islands (Wu et al., 2009). Dendrobium officinale, a critically endangered orchid in the wild (http://www.iucnredlist. org/details/46665/0), has been one of the most important Chinese herbs in China for hundreds of years and is eaten or used as folk medicine for antipyretic, eye-benefitting and immune regulatory purposes (Yang et al., 2006).
The major active ingredients of D. officinale are polysaccharides, alkaloids, phenols, coumarins, terpenes, flavonoids, amino acids, benzyl compounds, and several trace mineral elements (Weng, 2003;Li et al., 2011). D. officinale has a thick water-soluble polysaccharide-rich stem. Dendrobium polysaccharides are mainly composed of glucose and mannose, as well as a small amount of rhamnose, xylose, and arabinose (Fan et al., 2009;Luo et al., 2010). Polysaccharides have been demonstrated in recent years to show prominent bioactivities, including antioxidant, immune stimulation, and anti-tumor (Hsieh et al., 2008;Fan et al., 2009;Luo et al., 2010;Wang et al., 2010;Liu et al., 2011;Xia et al., 2012). Soluble polysaccharides from D. officinale exerted stronger immune modulatory activity than D. fimbriatum, D. nobile, D. chrysotoxum, and D. huoshanense (Meng et al., 2013). Dendrobium polysaccharides have gained increasing attention in the biomedical and drug delivery fields. On the current market, the quality of D. officinale is mainly determined by the content of soluble polysaccharides. The component of polysaccharides from different Dendrobium species is different. For example, the polysaccharide fractions from D. denneanum are composed of glucose, mannose and galactose in the ratio of 227:59:17, as well as small amounts of xylose and arabinose (Fan et al., 2009). The polysaccharide fraction from D. huoshanense consists of glucose, mannose and galactose in the ratio of 31:10:8 (Zha et al., 2007). The polysaccharides from D. officinale were shown to be a 2-O-acetylglucomannan, composed of mannose, glucose, and arabinose in a 40.2:8.4:1 molar ratio (Hua et al., 2004). On the whole, mannose and glucose are the main monosaccharides in these Dendrobium species. Although the bioactivities, composition, structure, and physicochemical properties of polysaccharides from Dendrobium are well defined, the enzymes and encoding genes responsible for their synthesis and metabolic pathway remain poorly characterized. Therefore, an understanding of the molecular mechanisms underlying the synthesis of Dendrobium polysaccharides is essential.
So far, the transcriptome of only one Dendrobium species has been sequenced (Guo et al., 2013). It only revealed limited information related to genes in the stem in a certain stage, focusing on the putative alkaloid biosynthetic genes and genetic markers. The molecular mechanisms underlying polysaccharides synthesis and the related metabolic pathway for D. officinale remain unknown. In this study, we established two transcription databases for juvenile and adult D. officinale and identified 430 glycosyltransferase genes (GTs) and 89 cellulose synthase genes (CesA). Differentially expressed genes (DEGs) were analyzed. Differentially expressed GTs, CesA and transcription factors (TFs) are also reported. Such data for D. officinale could be used as an important resource to investigate GTs and the metabolic pathway of polysaccharides in D. officinale. Furthermore, this database will supply important clues to explore other biological mechanisms in this Dendrobium species and in other orchids.

Plant Materials and Growth Conditions
D. officinale was grown in the greenhouse of the South China Botanical Garden and used in this study. Seeds derived from selfing were germinated and cultured on half-strength Murashige and Skoog (MS) (Murashige and Skoog, 1962) medium containing 0.1% activated carbon, 2% sucrose, and 0.6% agar (pH = 5.4). The cultures were incubated at 26 ± 1 • C with a 12 h photoperiod under cool white fluorescent lamps delivering a photosynthetic photon flux density (PPFD) of ca. 45 µmol m −2 s −1 . Material was collected from juvenile seedlings and adult plants. The juvenile seedlings were 10 months old after germinating in vitro. The adult plants were 18 months old after juvenile seedlings was transplanted into pots and placed in the greenhouse of the South China Botanical Garden at a day/night temperature of 28/25 • C with a 12-h period. The materials used are shown in Figure 1. cDNA libraries were prepared from entire D. officinale plants at the juvenile and adult stages. Plants were collected in November 2013 at the vegetative stage.
Fresh samples were used to extract total RNA immediately.

cDNA Library Preparation and Illumina Sequencing for Transcriptome Analysis
Total RNA (25 µg) was extracted using Column Plant RNAout 2.0 (Tiandz Inc., Beijing, China) according to the manufacturer's protocol. Preparation of the cDNA library was described in detail in a previous study employed for another orchid, Cymbidium sinense (Zhang et al., 2013). Two cDNA libraries were constructed from juvenile seedlings and adult plants in which equal amounts of total RNA were pooled from three biological replicates. Shenzhen Genome Institute (BGI, Shenzhen, China) and reads were generated in a 100 bp paired-end format according to the manufacturer's instructions (Illumina Inc. San Diego, CA). All raw transcriptome data were deposited in the GeneBank Short Read Archive. The accession numbers were SRR1904494 and SRR1909493for Dendrobium-1 and Dendrobium-2, respectively.

De novo Assembly and Functional Annotation Analysis of Illumina Sequencing
Raw reads from the sequencing machine were generated by base calling. After filtering raw reads by removing adaptor sequences, empty reads, reads with unknown nucleotides larger than 5% and low quality reads (with ambiguous sequences "N"), clean reads were obtained for de novo assembly. De novo assembly of the transcriptome was carried out with Trinity (ver. 2012-10-05) with the default parameters to form contigs (Grabherr et al., 2011). These contigs were then further processed with sequence clustering software, TGICL (Pertea et al., 2003), to form longer sequences defined as unigenes. The generated unigenes were used for BLASTX alignment (E < 0.00001) and annotation  against protein databases, including non-redundant (nr), Swiss-Port, COG, and KEGG protein databases. With nr annotation, the Blast2GO program (Conesa et al., 2005) was used to obtain the Gene ontology (GO) annotation of unigenes, then WEGO software (Ye et al., 2006) was used to perform GO functional classification for all unigenes and to understand the distribution of gene functions. KEGG is a major public pathway-related database (Kanehisa et al., 2008) that is able to analyze a gene product during a metabolic process and related gene function in cellular processes. KEGG pathway annotation was performed using a BLAST search against the KEGG database (KEGG, http:// www.genome.jp/kegg/).

Identification of Differentially Expressed Genes (DEGs)
To compare the differences in gene expression at two developmental stages, the RPKM method (reads per kb per million reads) was used to calculate read density. By taking into account the variations in gene length and the total mapped number of sequencing reads, the RPKM measure provides normalized values of gene expression that enable transcript comparisons between samples. The false discovery rate (FDR) was used to determine the threshold P-value in multiple tests. We used an FDR < 0.001, P ≤ 0.05 and an absolute value of the log 2 ratio >1 as the threshold to determine significant differences in gene expression. The DEGs were used for GO and KEGG enrichment analyses according to a method used for Cymbidium sinense (Zhang et al., 2013).

Quantitative Real-Time PCR Validation
Total RNA was extracted as indicated above. Each RNA sample was treated with RNase-free DNase (Promega, Madison, USA) following the manufacturer's protocol in an effort to remove any residual genomic DNA (gDNA). DNase-treated RNA (2 mg) was subjected to reverse transcriptase reactions using M-MLV reverse transcriptase (Promega, Madison, USA) according to the manufacturer's instructions. The sequences of the specific primer sets are listed in Additional file 1. The constitutively expressed gene, D. officinale actin (cloned by our laboratory; NCBI accession number: JX294908), was used as the internal control. qRT-PCR was performed according to our previously published study (He et al., 2015). The expression level was calculated as 2 − Ct and normalized to the Ct value of D. officinale actin. The qRT-PCR results were obtained from three biological replicates and three technical repeats for each gene and sample.

De novo Assembly and Sequence Annotation
A total of 102 million 100 bp reads were assembled into 107,086 contigs with a mean length of 824 bp in Dendrobium-1, and a total of 86 million 100 bp reads were assembled into 129,235 contigs with a mean length of 728 bp in Dendrobium-2 ( Table 1). Using paired-end reads, the Dendrobium-1 contigs were further assembled into 93,881 unigenes by Trinity with a mean length of 790 bp. The size distribution of these contigs and unigenes in Dendrobium-1 are shown in Figure 2A. The assembly produced a substantial number of large contigs and unigenes: 34,113 contigs were >1000 bp in length and 27,968 unigenes were >1000 bp in length (Figure 2A). The contigs in Dendrobium-2 were further assembled into 114,098 unigenes by Trinity with a mean length of 695 bp. The size distribution of these contigs and unigenes are shown in Figure 2B. The assembly produced a substantial number of large contigs and unigenes: 33,624 contigs were >1000 bp in length and 27,229 unigenes were >1000 bp in length ( Figure 2B). The contigs in two transcriptome sequencing databases were integrated and assembled into a total of 145,791 unigenes. These unigenes were annotated using BLASTX searches against NCBI, Nr, Swiss-Prot, KEGG, and COG databases. In total, there were 67,396 annotated unigenes (46.23% of all unigenes), providing a significant BLAST result. Among them, 66,541 unigenes (98.73% of all annotated unigenes) showed significant similarity to known proteins in the Nr database and 25,982 unigenes (38.55%) were annotated in COG based on sequence homologies.
In the COG classification, 25,982 unigenes were classified into 25 functional classifications (Figure 3). The most dominant term was "General function prediction only" and 7861 unigenes (30%) matched it. "Translation, " "replication, recombination, and repair" also shared a high percentage of genes among the categories, and only 4 and 19 unigenes matched the terms "nuclear structure" and "extracellular structures, " respectively. In addition, 2754 unigenes were annotated as the "carbohydrate transport and metabolism" category and 1386 unigenes in the "secondary metabolites biosynthesis transport and catabolism" category, both of which may play an important role in the biosynthesis of polysaccharides and small molecules with proven bioactivity.

Gene Ontology Classification and Metabolic Pathway Assignment by KEGG
A total of 24,002 annotated unigenes were grouped into 41 functional groups by using GO assignments. Among these groups, 22 groups were involved in biological processes, 9 groups in cellular components and 10 groups in molecular functions. Metabolic processes and cellular processes were dominant in the biological process category. Within the molecular function category, a high percentage of genes were associated with catalytic activity and binding. Most assignments in cellular components were to cell components and cell membranes (Figure 4).
In this study, a total of 67,396 annotated sequences were mapped to reference canonical pathways in KEGG. In total, 17,281 sequences were assigned to 126 KEGG pathways (Additional file 2). The metabolic pathways represented the greatest group (4473 unigenes, or 25.88%), with most unigenes involved in starch and sucrose metabolism (320 unigenes), amino sugar and nucleotide sugar metabolism (288 unigenes), fructose and mannose metabolism (135 unigenes), and galactose metabolism (124 unigenes). A total of 2115 unigenes were involved in the biosynthesis of secondary metabolites, including phenylpropanoid biosynthesis, terpenoid backbone biosynthesis, cyanoamino acid metabolism, carotenoid biosynthesis, and others ( Table 2). These pathways provide a valuable resource for investigating specific processes, functions and pathways during D. officinale development.
Mannose and glucose are the main monosaccharide building blocks in D. officinale. Fructose and mannose metabolism found in the KEGG pathway involved 135 unigenes. A detailed metabolic pathway for fructose and mannose metabolism is shown in Figure 5. Every gene in the pathway was associated with several unigenes. The pathway will be useful for further studies on the effect of the fructose and mannose metabolism pathway on the biosynthesis of active polysaccharides.  (Figure 6). GTs, which are enzymes that synthesize oligosaccharides, polysaccharides, and glycoconjugates, were dominant in carbohydrate-active related unigenes, and 430 GTs were divided into 35 GT families. A comparison of GTs families and numbers among A. thaliana, O. sativa, and D. officinale is shown in Additional file 4. D. officinale lacks several GT families, GT9, GT16, GT19, GT30, GT33, GT37, GT50, GT57, and GT58, which were present in A. thaliana and O. sativa. GT59 and GT76 were present in A. thaliana and O. sativa, but not in D. officinale while GT39 was present only in D. officinale.
The family of mannans is the most widespread group of polysaccharides in higher plants (Moreira and Filho, 2008). Cellulose synthase (CesA) superfamily genes were involved in the biosynthesis of mannan polysaccharides (Liepman et al., 2005). The CesA superfamily is classified into one CesA family and nine cellulose synthase-like (Csl) families, namely CslA/B/C/D/E/F/G/H/J. The CesA superfamily members of A. thaliana and O. sativa were used as bait for blasting the candidate unigenes from D. officinale protein libraries. A total of 89 candidate unigenes for CesA in D. officinale were identified and were listed in Additional file 5. A molecular phylogenetic tree (Figure 7) was constructed by using MEGA4 (Tamura et al., 2007), employing 19 unigenes that were translated into amino acid sequences, together with other CesA superfamily members from A. thaliana and O. sativa. The 19 unigenes were classified into six families, CesA, CslA, CslC, CslD, CslE, and CslH with 6 and 5 unigenes belonging to CesA and CslA families, respectively.

Screening and Identification of DEGs
To identify the DEGs during both developmental stages, the number of clean tags for each gene was calculated, and the genes that were differentially expressed between the two samples were identified according to the method described by Audic and Claverie (1997).
Among 32,794 DEGs, a total of 18,517 (56.46% of all DEGs) unigenes provided a significant BLAST result. Approximately 4722 unigenes could be annotated in KEGG and 6356 unigenes could be annotated in GO based on sequence homologies, while 18046, 13790, and 7490 unigenes were annotated in Nr, SWISSprot, and COG, respectively.
In the KEGG classification, 4722 DEGs were significantly enriched in 42 pathways (Additional file 8). Most genes were correlated to metabolic pathways and biosynthesis of secondary metabolites. Gluconeogenesis was also significantly enriched. Furthermore, the number of up-regulated genes was more than the number of down-regulated genes in the three pathways.
The analysis of biological processes on these DEGs was performed based on GO functional classification. These genes, including specific unigenes in Dendrobium-1 and Dendrobium-2, as well as up-regulated and down-regulated unigenes in Dendrobium-1 vs. Dendrobium-2, were mainly correlated to metabolic and cellular processes and to responses to stimuli (Additional file 9). The main biological process for these DEGs was similar.
TFs have been implicated in a variety of developmental and physiological roles in plants. More TFs have also been isolated and characterized for several plant secondary metabolic pathways. In our D. officinale DEGs database, a total of 627 putative transcripts encoding TFs were identified, including 301 up-regulated unigenes and 326 down-regulated unigenes ( Table 6). They belonged to known TF families, the most abundant being the MYB family, including 82 unigenes. In addition, 75 DEGs belonged to the bHLH family, 66 to the AP2/ERF family, 60 to the WRKY family, 33 to the Homeobox famlily, 30 to the MADS family, 24 to the NAC family, and 23 to the bZIP family. All these TFs have been identified as positive or negative regulators in the biosynthesis of secondary metabolites in other plants (Grotewold et al., 1998;van der Fits and Memelink, 2000).

Illumina Sequencing and Sequence Annotation
D. officinale is a very important traditional Chinese herb within the Orchidaceae. Even though polysaccharides are one of the most important active constituents of D. officinale, little is known about the mechanisms responsible for polysaccharide synthesis and metabolism. The aims of this study were to generate a large amount of cDNA sequence data that would facilitate more detailed studies in D. officinale, and to identify the genes related to polysaccharide synthesis and metabolism. The availability of transcriptome data for D. officinale will meet the initial information needs for functional studies of this species and its relatives. In this study, two RNA-seq was performed using Illumina sequencing, which generated a total of 145,791 unigenes. A total of 67,396 (46.23%) unigenes provided a significant BLAST result. This information far exceeded that reported previously (Guo et al., 2013) and provides more adequate resources to study this Dendrobium species.
Glycosyltransferase Genes and their Differential Expression Patterns in D. officinale D. officinale has a thick and soluble polysaccharide-rich stem. The biosynthesis of polysaccharides involves the action of hundreds of different GTs, which catalyze the transfer of sugar moieties from activated donor molecules to specific acceptor molecules to form glycosidic bonds. At the same time, the glycosylation reactions have a cascading effect, which affect many aspects of plant growth and development. The most recent update of CAZy (http://www.cazy.org/GlycosylTransferases.html) indicates that GTs from diverse species can be classified into 97 families. A total of 463 and 571 GT genes had been listed in A. thaliana and O. sativa, assigned to 42 and 43 families, respectively. We identified 430 possible GTs in the D. officinale transcriptome database that were divided into 35 GT families (Additional file 4).
The category and proportion of GT genes are different in different plants. The function of each GT family also shows differences. GT1 is a major GT family in plants and is commonly known as UDP glycosyltransferase (UGT) (Weis et al., 2008). GT1 plays an indispensable role in the biosynthesis and modification of plant natural products (Jones and Vogt, 2001).  In A. thaliana and O. sativa, GT1 is the major family (26 and 35%, respectively. The second group consists of GT2, GT8, GT31, and GT47 families, each accounting for approximately 6-9% of the genes. In D. officinale, the major GT families (GT1, GT2, and GT41) represent approximately 15-16% each of total GT genes. The third group consists of GT4 and GT8 (5 and 8%, respectively). The amount of GT41 in D. officinale exceeds that in A. thaliana and O. sativa. Moreover, GT51 and GT39 are specific to D. officinale. These specificities may reflect unique metabolic aspects of D. officinale.
In a previous study on D. officinale, polysaccharides were shown to be distributed in all organs, but mainly accumulated in stems while the soluble polysaccharide content changed in different developmental stages (He et al., 2015). The polysaccharide content in the stems of adult plants was higher than in seedlings but the content in the leaves and roots of adult plants was lower than in seedlings (He et al., 2015). In our present study, a total of 170 GTs showed differential expression in a comparison between adult plants and juvenile seedlings, including 70 up-regulated GTs and 100 down-regulated GTs ( Table 3). The up-regulated GTs likely mainly accounted for the synthesis of soluble polysaccharides in adult plants while downregulated GTs were probably used to build plant cell walls and other morphological structures in seedlings in the juvenile stage.
GT1 was the major gene among the down-regulated genes, and GT2 was the major gene among the up-regulated genes. The up-regulated and down-regulated GTs families contained 20 and 19 GT families, respectively. Furthermore, all DEGs, including up-regulated genes, down-regulated genes, as well as specific genes in adult plants or juvenile seedlings, showed similar metabolic pathways and biosynthesis of secondary metabolites (Additional file 9). All these results suggest that a variety of GTs together mediated the synthesis of soluble polysaccharides and the development of morphological structures. More studies  on their expression patterns and functions in the future could be used to elucidate the molecular mechanisms that regulate polysaccharide synthesis and secondary metabolism in D. officinale.

Cellulose Synthase Genes and their Differential Expression Patterns in D. officinale
Soluble polysaccharides are synthesized from monosaccharides such as mannose, glucose, galactose, arabinose, rhamnose, and others (Zha et al., 2007). Mannose is also the major component of polysaccharides from Dendrobium species such as D. officinale, D. huoshanense, D. nobile, D. fimbriatum, and D. chrysotoxum (Fan et al., 2009;Luo et al., 2010;Meng et al., 2013;He et al., 2015). Mannans are also promising bioactive polysaccharides for use in drugs (Alonso-Sande et al., 2009). Many studies have proven that CesA superfamily genes are involved in the biosynthesis of mannan polysaccharides (Liepman et al., 2005;Lerouxel et al., 2006). The CesA superfamily is classified into one cellulose synthase (CesA) family and nine cellulose synthase-like (Csl) families, namely CslA/B/C/D/E/F/G/H/J. Among them, CslF, CslH, and CslJ are specific to monocotyledonous plants while CslB and CslG are found exclusively in dicotyledonous plants (Richmond and Somerville, 2000;Suzuki et al., 2006). Several studies have demonstrated that the Csl families are involved in the biosynthesis of mannan polysaccharides. For example, CslA subfamily members encode β-1,4-mannan synthase (Liepman et al., 2005;Yin et al., 2009), CslC subfamily members encode β-1,4-glucan synthase (Cocuron et al., 2007), while CslF and CslH subfamily members participate in the biosynthesis of β-(1,3;1,4)-D-glucan (Nemeth et al., 2010;Burton et al., 2011;Taketa et al., 2012). The function of the remaining subfamilies members is still unknown. We identified 89 CesA-related genes in the transcriptome database (Additional file 5), which were classified into one CseA family and nine Csl families, including CslB and CslG families. Among these CesA genes, 37 genes showed differential expression between adult plants with juvenile seedlings. These differentially expressed CesA genes only contained five CslA families (Table 5). Furthermore, the up-regulated genes were only found in CslE and CslG families while the down-regulated genes were found exclusively in CslA, CslD, and CslG families. The number of down-regulated genes exceeded that of up-regulated genes. We speculate that these upregulated CslE and CslG family genes might encode some enzyme responsible for the synthesis of mannan polysaccharides in the stem of D. officinale. However, these down-regulated CslA, CslD, and CslG family genes might participate in the synthesis of the backbones of polysaccharides to build plant cell walls and other morphological structures in juvenile seedlings.

Transcription Factors Involved in Polysaccharide Biosynthesis and Other Secondary Metabolism
TFs play diverse roles in regulating the activity of polysaccharide biosynthesis and other secondary metabolism pathways. For   secondary cell wall deposition and to integrate the metabolic flux through the lignin, flavonoid, and polysaccharide pathways in Arabidopsis (Bhargava et al., 2010). Overexpression of PAP1, a MYB TF from Arabidopsis, resulted in strongly enhanced expression of phenylpropanoid biosynthesis genes as well as enhanced accumulation of lignin, hydroxycinnamic acid esters and flavonoids (Borevitz et al., 2000). In our study, 82 MYB TFs were found to be differential expression in both developmental stages, including 35 up-regulated TFs and 47 down-regulated TFs ( Table 6). The up-regulated TFs were probably related with some aspect of secondary metabolism, including polysaccharide and alkaloid biosynthesis, and down-regulated TFs were likely related with morphogenesis, including cell wall formation. In recent years, many WRKY genes have been isolated from medicinal plants and have been shown to play an important role in secondary metabolism. CjWRKY1 from Coptis japonica Makino was the first regulator identified in the biosynthesis of berberine, a benzylisoquinoline alkaloid, and transient expression of CjWRKY1 in C. japonica protoplasts increased the level of transcripts of berberine biosynthetic genes (Kato et al., 2007). AaWRKY1 from Artemisia annua L. could activate Amorpha-4, 11-diene synthase to regulate artemisinin biosynthesis (Ma et al., 2009). SUSIBA2, a WRKY TF from Hordeum vulgare cv. "Pongo, " was participated in sugar signaling by binding to the sugar-responsive elements of the iso1 promoter (Sun et al., 2003). In our study, 60 WARK TFs were discovered in the DEGs database ( Table 6). There have been reports that AP2/ERF family TFs play an important role in plant secondary metabolism. For example, overexpression of the JA-inducible AP2/ERF-domain TF ORCA3 of Catharanthus roseus, led to increased expression of several metabolic biosynthetic genes and consequently increased the accumulation of terpenoid indole alkanoids in suspension cells (van der Fits and Memelink, 2000). The NIC2/ORCA3 ERF subfamily from Nicotiana tabacum was independently recruited to regulate jasmonate-inducible secondary metabolism in distinct plant lineages (Shoji et al., 2010). In our study, 66 AP2/ERF TFs were discovered in the DEGs database and differentially expressed in the two developmental stages: 21 were up-regulated and 45 were down-regulated ( Table 6). In plants, bHLH and bZIP TFs were also isolated and confirmed to regulate secondary metabolism. CrMYC2 belongs to the bHLH TF family and regulates ORCA gene expression, and the AP2/ERF-domain TFs, ORCA2, and ORCA3, in turn regulate a subset of alkaloid biosynthesis genes in C. roseus (Zhang et al., 2011). In our work, 75 HLH TFs were discovered. All these TF types known to be involved in regulating secondary metabolism were found in our dataset. They may play important roles in D. officinale development, stress responses and secondary metabolism.

CONCLUSIONS
D. officinale is a very important Chinese medicinal herb. A total of 145,791 unigenes were obtained in two transcriptome databases of D. officinale, 135 of which were involved in fructose and mannose metabolism. In addition, 430 glycosyltransferase and 89 cellulose synthase genes were identified. Comparative analysis of the transcriptome in juvenile seedlings and adult plants revealed a total of 32,794 DEGs that were mainly correlated with metabolic pathways and the biosynthesis of secondary metabolites. A total of 170 glycosyltransferase genes, 37 cellulose synthase genes and 627 transcription factors showed differential expression. This data could be used to investigate pathways associated with polysaccharide biosynthesis and various secondary metabolites in D. officinale.