Transcriptome Assembly and Systematic Identification of Novel Cytochrome P450s in Taxus chinensis

Taxus spp. is a highly valuable medicinal plant with multiple pharmacological effects on various cancers. Cytochrome P450s (CYP450s) play important roles in the biosynthesis of active compounds in Taxus spp., such as the famous diterpenoid, Taxol. However, some specific CYP450 enzymes involved in the biosynthesis of Taxol remain unknown, and the systematic identification of CYP450s in Taxus has not been reported. In this study, 118 full-length and 175 partial CYP450 genes were identified in Taxus chinensis transcriptomes. The 118 full-length genes were divided into 8 clans and 29 families. The CYP71 clan included all A-type genes (52) belonging to 11 families. The other seven clans possessed 18 families containing 66 non-A-type genes. Two new gymnosperm-specific families were discovered, and were named CYP864 and CYP947 respectively. Protein sequence alignments revealed that all of the T. chinensis CYP450s hold distinct conserved domains. The expression patterns of all 118 CYP450 genes during the long-time subculture and MeJA elicitation were analyzed. Additionally, the expression levels of 15 novel CYP725 genes in different Taxus species were explored. Considering all the evidence, 6 CYP725s were identified to be candidates for Taxol biosynthesis. The cis-regulatory elements involved in the transcriptional regulation were also identified in the promoter regions of CYP725s. This study presents a comprehensive overview of the CYP450 gene family in T. chinensis and can provide important insights into the functional gene studies of Taxol biosynthesis.


INTRODUCTION
Taxol (generic name paclitaxel), the main bioactive component of the Taxus species, is a highly effective anti-cancer agent widely used in the treatment of various sarcomas, melanomas, and carcinomas (Murphy et al., 1993). However, Taxol availability is restricted due to the insufficient natural supply; thus, Taxol supply and cost remain serious concerns because of its increasing requirements in chemotherapy (Cragg et al., 1993). In the near future, Taxol production and its potential precursors must rely on biological methods, either in Taxus tissues or in cell cultures (Ro et al., 2006;Ajikumar et al., 2010;Zhou et al., 2015). Therefore, understanding the biosynthetic pathway of Taxol and the enzymes that catalyze this series of reactions and their underlying molecular mechanism is essential.
The Taxol biosynthetic pathway starts with the cyclization of the universal diterpenoid precursor geranylgeranyl diphosphate to taxa-4(5),11(12)-diene. This taxane core is then decorated with a series of eight Cytochrome P450 (CYP450)-mediated oxidations, three CoA-dependent acylations, and several other transformations that lead to baccatin III, to which the C13side chain is appended to afford Taxol Croteau et al., 2006). Thus, CYP450s play a major role in Taxol biosynthesis (Rasool and Mohamed, 2016). Approximately half of the 20 enzymatic steps of the pathway are thought to be catalyzed by CYP450 oxygenases. The proposed order of the oxygenations on the taxane core begins with C5 and C10, then C13 and C9, and later C7 and C2. The final reactions include the epoxidation of the C4, C20-double bond (leading to the oxetane ring) and C1 oxygenation (Wheeler et al., 2001). In the past few years, studies have concentrated on the molecular biochemistry of Taxol biosynthesis Wang et al., 2016). However, the CYP450 genes responsible for C1 hydroxylation, oxetane formation, C9 oxidation of the taxane core, and C2 ′ -sidechain hydroxylation in Taxus remain unidentified.
CYP450s represent one of the largest gene families and play vital roles in many plant metabolic processes. They catalyze the oxidative modification of various substrates using oxygen and NAD(P)H (Chapple, 1998). Structurally, all plant CYP450s identified so far are membrane-bound enzymes. The vast majority of them are anchored on the endoplasmic reticulum by the hydrophobic signal sequence at the N-terminus (Williams et al., 2000). CYP450 protein sequences contain four unique structures, including the Heme-binding, PERF, and EXXR motifs, located in the K-helix and I-helix. The conserved cystein in the heme-binding motif (F-G-R-C-G) regulates the iron in their heme group. The R amino acid residue of the short string in the PERF motif, together with E and R amino acid residues in the K-helix, forms a salt bridge that is generally considered to be involved in locking the heme pocket into the corresponding position and ensuring the stability of the core structure (Hasemann et al., 1995). CYP450s for all organisms are named and classified by a P450 nomenclature committee (David Nelson: dnelson@uthsc.edu). To differentiate them from other species, plant CYP450s belong to families ranging from CYP71-CYP99, and then from CYP701-CYP999 (Durst and Nelson, 1995;Danielson, 2002). Moreover, CYP450s in plants are classified into two categories: A-type and non-A-type (Paquette et al., 2000). Based on the available sequences, plant CYP450s are further classified into 11 clans (Paquette et al., 2000). Atype CYP450s include only the CYP71 clan, whereas non-A-type CYP450s include 10 clans, namely, CYP51, CYP72, CYP74, CYP85, CYP86, CYP97, CYP710, CYP711, CYP727, and CYP746.
In recent years, the functional characterization of CYP450s involved in terpene biosynthesis has gathered attention. Many studies focusing on the transcriptome-wide identification of CYP450s for terpene biosynthesis have been done. An earlier research performed transcriptomic analyses based on 454 pyrosequencing data from Panax ginseng flowers, roots, stems, and leaves, thereby leading to the identitification of 326 potential CYP450s, including CYP716A47, which is related to the ginsenoside biosynthesis . In addition, ∼300 isotigs similar to CYP450s have been discovered from Salvia miltiorrhiza hairy roots using the RNA-Seq technology, of which six were further studied and the CYP76AH1 was functionally confirmed to catalyze the turnover of miltiradiene in tanshinones biosynthesis . Last year, 70 highly expressed CYP450s in the Xanthium strumarium trichomes were studied using an extensive analysis of transcriptomes. Among them, four CYP71 members (CYP71AV14, CYP71BL7, CYP71DD1, and CYP71AX30) were found to be the candidates involved in sesquiterpene lactone biosynthesis (Li et al., 2016). Transcriptome identification of genes for Taxol biosynthesis has been studied, however, no study has demonstrated interest in the CYP450 genes of Taxus (Wu et al., 2011;Sun et al., 2013). So, a systematic set of CYP450s and a standard nomenclature could be benefit to the functional identification of CYP450s, which might be involved in the biosynthesis of active ingredients in Taxus.
In this study, we established systematic information of CYP450s in Taxus chinensis by mining available transcriptome data. We first identified and classified putative full-length CYP450-encoding sequences. Phylogenetic analysis allowed us to identify groups of paralogs for further evaluation. Next, based on the KEGG database, we investigated the potential involvement in various biosynthesis pathways of these CYP450s, with emphasis on taxane biosynthesis. Moreover, the expression profiles in silico were characterized, some selected CYP450 genes were confirmed by qRT-PCR. And the expression levels of 15 novel CYP725 genes in different Taxus species were also explored. Lastly, the cis-regulatory elements involved in the transcriptional regulation were identified in the promoter regions of CYP725s. Our findings would contribute to advanced research and applications of CYP450 genes in Taxus.

In silico Mining of T. chinense CYP450s
All unigenes (containing contigs and singletons) of assembled T. chinense transcriptomes (Accession Numbers: SRR1343578, SRR1339474, and GSE28539) were obtained, then, they were clustered using CD-HIT software (Version 4.6), and the sequence identity >90% as the cutoff. The HMM model (PF00067) were retrieved from the Pfam database (http://pfam.sanger.ac.uk). After redundant sequences were removed, the HMMER program was used to identify the CYP450s, with an e-value cutoff of 1e-5. The nucleotide sequences of these selected unigenes were subjected to ORF Finder software (http://bioinf.ibun.unal.edu. co/servicios/sms/orf_find.html) for open reading frame (ORF) identification. The full-length CYP450 genes were identified as described by Chen et al. (2014). The search results were further consolidated with previously published genes in Genbank and other existing Taxus transcriptomes, including Taxus × media (SRR534003 and SRR534004), Taxus mairei (SRR350719), Taxus cuspidata (SRR032523).

Phylogenetic Analysis
For comparison, a collection of CYP450s from Picea glauca and corresponding CYP names were obtained from Genbank. Multiple sequence alignment was performed with the ClustalW algorithm-based AlignX module in MEGA 7 software. The phylogenetic tree was constructed by Neighbour-Joining (NJ) method by p-distance in MEGA 7. The significance level for NJ analysis was examined by bootstrapping with 1,000 repeats.

Functional Annotation
Functional annotation was performed by searching T. chinensis CYP450s against the SWISS-PROT, NR, and NT databases, using BLAST with an E-value of 1e-5. The CYP450 genes were also mapped in Kyoto Encyclopedia of Genes and Genomes (KEGG) database to obtain reference metabolic pathways.

Expression Pattern Analysis
To analyze the expression levels of CYP450s in different Taxus cell lines, we used the reported Solexa sequencing libraries from cell lines CA (subcultured for 10 years) and NA (subcultured for 6 months), and Methyl Jasmonate (MeJA)-mediated Taxus cells harvested at 16 h after inoculation (Li et al., 2012;Zhang et al., 2015). Raw expression counts were calculated by FPKM method (reads of fragments per kilobase per million mapped). To identify the differentially expressed genes (DEGs), a greater than two-fold change and a false discovery rate (FDR) ≤ 0.001 were used to determine significant changes in expression. Heatmaps based on raw expression counts were generated with HemI 1.0 (Heatmap Illustrator software, version 1.0). The expression patterns of 17 randomly selected CYP450 genes in CA and NA were validated by qRT-PCR using the same RNA as the Solexa sequencing sample. The RNA were stored at −80 • C.
The expression patterns of 15 novel CYP725 genes in different Taxus species, including T. chinensis, T. cuspidate, and T. media were detected using leaves from 5-year-old plants as sample.
Total RNAs was extracted using the RNAprep Pure Plant kit following manufacturer's protocal (TianGen, Beijing). RNA purity and concentration were detected using a NanoDrop_2000 UV-Vis spectrophotometer (Thermo, USA). Approximately 2 µg total RNA was reverse transcribed using the RevertAid TM First Strand cDNA Synthesis Kit (Thermo, USA). cDNA products were diluted 10-fold prior to use for real-time PCR reaction. Gene specific primers were designed by primer premier 5.0 software, and a housekeeping gene (actin) was chosen as the internal reference gene. All primer sequences were listed in Table S1. qRT-PCR reactions were performed in 10 µl volume containing 1 µl diluted cDNA, 0.3 µM forward primer, 0.3 µM reverse primer, and 5 µl 2×SYBR Green PCR Master Mix (Applied Biosystems). The thermal conditions of the qRT-PCR reactions was 95 • C for 5 min, and 40 cycles of 95 • C for 10 s, 55 • C for 10 s, and 72 • C for 15 s. Each experiment was performed with three biological and technical replicates. The relative expression levels were calculated using the 2 − CT method.

The Construction of T. chinensis Transcriptomes
To globally identify potential genes in T. chinensis transcriptomes were constructed from two Taxus cell lines CA and NA (Zhang et al., 2015), and the MeJA-treated Taxus cells for 16 h (Tm16) and those of mock-treated cells (Tm0; Li et al., 2012), resulting in 67,147 Unigenes with an average length of 910 bp and N50 of 1,552 bp ( Table 1). The GC percentage of the unigenes was 41.37%.

Identification and Classification of T. chinensis CYP450s
A total of 118 full-length and 175 partial CYP450 genes were identified in T. chinensis. The total number of T. chinensis CYP450 genes was 293, which is more than that of Arabidopsis thaliana (272). However, without the whole genome sequence, the strict criteria of all T. chinensis CYP450 genes were unmet. To further screen for full-length CYP450 genes, we performed BLAST search using standard CYP450 domains against other existing Taxus transcriptomes, such as T. media, T. mairei, and T. cuspidata. No new full-length CYP450 gene was discovered, because their sequencing size was highly limited. Then, the 175 partial CYP450s were assembled using previously known genes in Genbank, and no full-length CYP450s were obtained. Classification of the 118 full-length CYP450 genes was executed by alignment with CYP450 database (Nelson, 2009) using standard sequence similarity cutoffs, specifically 97, 55, and 40% for allelic, subfamily, and family variants, respectively. Based on these cutoffs, the 118 full-length CYP450s were classified into 8 clans and 29 families and were divided into two categories: A-type (CYP71 clan) and non-A-type (all other clans; Table 2). Among them, only 6 CYP450s have been previously identified, and the other 112 CYP450s were obtained for the first time (Accession Numbers: MF448573-MF448684).
T. chinensis CYP450s were further compared with three other typical plant species, such as angiosperms A. thaliana and Medicago sativa and the gymnosperm P. glauca (Tables 3, 4). The results showed that CYP750 was the largest A-type family in T. chinensis but absent in A. thaliana and M. sativa. Conversely, CYP71, CYP79, CYP81, CYP82, CYP83, and CYP89 families were found in A. thaliana and M. sativa, but not in T. chinensis and P. glauca. For the non-A-type CYP450s, the CYP725, and CYP866 families were found in T. chinensis and P. glauca, but absent in A. thaliana and M. sativa. CYP72, CYP87, CYP96, CYP714, CYP721, and CYP722 families were only found in A. thaliana and M. sativa. Moreover, two new gymnospermspecific families, named CYP864 and CYP947, were discovered in T. chinensis. The above results showed that distinct genetic differences exist between the gymnosperm and angiosperm, and P. glauca is a good reference for T. chinensis in comparative studies of CYP450 genes.

Phylogenetic Analysis
The sequences of P. glauca CYP450 proteins and the 118 full-length T. chinensis CYP450 proteins were used to construct NJ phylogenetic trees for A-type ( Figure 1) and non-A-type (Figure 2) CYP450s, separately, using the MEGA7 package. The results showed that P. glauca subfamilies are grouped with T. chinensis CYP450s. Based on phylogenetic trees, 44.1% (52 genes) of the 118 full-length CYP450s are A-type and belong to 11 families. The remaining 55.9% (66 genes) CYP450s are non-A-type and are distributed to 18 families and 7 clans. The A-type CYP450s (71 clan) have been identified to be related to the biosynthesis of secondary compounds. The Figure 2 showed that non-A-type CYP450s include a more diverse group of genes belonging to the remaining 7 clans. These genes involved in the metabolic pathways of primary products (such as carotenoid, oxylipin, etc.), plant hormone and secondary products. 5 CYP450s have been previously identified to be involved in Taxol biosynthesis, such as TcCYP725A1 (taxane 5-alpha-taxadienol-10-beta-hydroxylase, T10βH), TcCYP725A2 (taxane 13-alpha-hydroxylase, T13αH), TcCYP725A4 (taxadiene 5-alpha-hydroxylase, T5αH), TcCYP725A5 (taxoid 7-beta-hydroxylase, T7βH), and TcCYP725A6 (taxoid 2alpha-hydroxylase, T2αH). In addition, TcCYP725A3, which is highly homologous with taxane 14β-hydroxylase (T14βH) in T. cuspidata. It is suggested that the CYP725A subfamily underwent independent evolution to carry its unique function.

Physicochemical and Structural Analyses of T. chinensis CYP450s
The physicochemical parameters of each CYP450 gene were calculated using ExPASy. Most of the members had relative molecular weights close to 55 kDa. Approximately four-fifths of the CYP450 proteins had relatively high isoelectric points (pI > 7); the remaining proteins, particularly those in CYP736 family, had pI < 7. TargetP was used to predict the localizations of 118 T. chinensis CYP450 proteins, most of them were predicted to be anchored on the endoplasmic reticulum. Hitherto, no plant CYP450s have been found to be located in the mitochondria.
All typical conserved structures of CYP450 proteins were present in non-A-type T. chinensis CYP450s ( Figure S1), including the cysteine heme-iron ligand signature motif (with PFG element), the PERF motif, K-helix region, and I-helix region. Interestingly, the conserved I-helix region did not exist in A-type T. chinensis CYP450s. For the heme-binding motifs, the Atype CYP450s displayed the signature "PFGxGRRxCxG, " whereas "xFxxGxRxCxG" was found in non-A-type CYP450s. Consistent with previous studies, PERF motifs of A-type and non-A-type CYP450s were different. In T. chinensis, PERF motifs were "PERF" for A-type and "FxPx" for non-A-type. Moreover, the EXXR motif of A-type was consistent with non-A-type CYP450 proteins. These elements ensure structural stability and flexibility, thereby enabling proteins to bind to appropriate substrates.

Annotation of T. chinensis CYP450s
KEGG pathway-based analysis was performed to further understand the functions of the CYP450 genes. In total, the 118 full-length CYP450s were assigned to 16 KEGG pathways ( Figure S2). Significantly, only 2 CYP450s were mapped to the diterpenoid biosynthesis pathway, including TcCYP729B25 and TcCYP701A59. CYP701 is related to the biosynthesis of diterpenoid acids, gibberellins.
Information on the specific metabolic pathways in gymnosperms was highly limited, resulting the CYP725s that are related to diterpenoid Taxol biosynthesis were mapped to the carotenoid biosynthesis pathway. The most represented CYP725 family in Taxus (Table 4) plays an important role in the biosynthesis of the diterpenoid anti-cancer drug, Taxol. Moreover, the acquired T2αH, T5αH, T7βH, T10βH, T13αH, and T14βH have high sequence similarity with each other. With an amino acid sequence similarity higher than 70%, the taxoid CYP450 monooxygenases are more conserved than any other known plant of the CYP450 type. Taxoid hydroxylases, with their unique structures and substrate selectivities, form an especially cohesive group. The novel CYP725 proteins identified in this study may be related to Taxol biosynthesis. However, the functional importance of these proteins remains to be determined.

Expression Profile Analysis of CYP450s
Illumina transcriptome sequencing technology was used to analyze the gene expressions partners of all 118 full-length Frontiers in Plant Science | www.frontiersin.org    CYP450s. These data sets were generated from total RNAs isolated from two Taxus cell lines, CA and NA, and MeJAmediated Taxus cells harvested 16 h after inoculation (Li et al., 2012;Zhang et al., 2015). From these Illumina data, all 118 full-length CYP450 genes expressed in different Taxus cell lines, CA and NA, were hierarchically clustered using HemI Heatmap Illustrator v1.0   software (Figure 3, Table S2). According to previous research (Song et al., 2014), contents of secondary metabolites in NA were significantly higher than in CA. The amount of Taxol was 1.88 times higher than that in CA. The downregulation of secondary metabolites in CA may be due to the decreased activity of specific enzymes, including CYP450 monooxygenases. Figure 4 showed highly different expression profiles in NA and CA. TcCYP73A171, TcCYP74A74, TcCYP75A77, TcCYP75B115, TcCYP76AA71, TcCYP76Z4, TcCYP728Q12, TcCYP736E22, and TcCYP750C20 were lowly expressed in CA, but were highly expressed in NA. Moreover, TcCYP76AA67, TcCYP90A54, and TcCYP750B2 members were not detected in CA. CYP450 oxygenases that potentially related to the Taxol biosynthesis were mainly analyzed. The identified transcripts involved in the Taxol biosynthetic pathway and their specific expression levels in CA and NA were shown in Figure 4. Most enzymes of the native methylerythritol phosphate (MEP) pathway were highly expressed in NA (Table S3). Co-expression with a Taxol biosynthesis marker gene, taxadiene synthase (TS), the acquired taxoid hydroxylases were also highly expressed in NA. The expression levels of T2αH and T7βH greatly increased more than 10-fold, T10βH and T13αH increased two-fold, and T5αH increased slightly. In this study, 10 novel CYP725A genes (TcCYP725A9, TcCYP725A10, TcCYP725A11, TcCYP725A16, TcCYP725A18, TcCYP725A19, TcCYP725A20, TcCYP725A21, TcCYP725A22, and TcCYP725A23) showed higher expression levels in NA. Previous research confirmed that taxane biosynthesis is regulated by MeJA elicitation in T. chinensis cells (Li et al., 2012). Transcriptome profiles of T. chinensis cells at 16 h (Tm16) FIGURE 2 | Phylogenetic tree of non-A-type CYP450 proteins from T. chinensis (Tc) and P. glauca (Pg). The spokes corresponding to Tc and Pg CYP450s are shown in blue and black, respectively. The genes marked in red are described in the text.
To verify expression profiles obtained from Illumina sequencing, we performed qRT-PCR on 17 randomly selected CYP450 genes (Figure 6, Table S5). Consistent with the Illumina data, most genes showed strong expression levels in NA. The expression fold changes of some genes, such as TcCYP73A170, TcCYP716B29, TcCYP750C3, and TcCYP716B were higher than RNA-seq results. qRT-PCR results validated that the RNA-seq data is reliable.
The different expression levels of 15 novel CYP725 genes among three different species such as T. chinensis, T. cuspidate, and T. media were also investigated (Figure 7). The qRT-PCR profiles showed that CYP725 genes had different expression profiles in different Taxus species. All genes but CYP725A13 and CYP725A19 showed a low expression level in T. chinensis. Six genes (TcCYP725A9, TcCYP725A11, TcCYP725A16, TcCYP725A20, TcCYP725A22, and TcCYP725A23) showed the highest expression level in T. media, followed by T. cuspidata, and little amount in T. chinensis. The taxol content in FIGURE 3 | Expression pattern of T. chinensis CYP450s in CA and NA cell lines according to the analysis of RNA-Seq dataset. The color scale shows the expression quantity (red: high expression; blue: low expression). Heat map was created using HemI. Heatmap Illustrator v1.0. "⋆" indicates the known taxoid hydroxylase genes, and " " indicates the candidate genes.

Identification of Candidates CYP450s Involved in Taxol Biosynthesis
CYP450s that were potentially related to Taxol biosynthesis were further identified based on the three following criteria: (1) belonging to CYP725 family. To date, the CYP450 enzymes identified in Taxol biosynthesis are primarily CYP725s, (2) expression level was corresponding with the known Taxol biosynthesis hydroxylases CYP725A1, CYP725A2, CYP725A4, CYP725A5, and CYP725A6, (3) expression profile was consistent with Taxol content. Among the 118 full-length CYP450s, 15 novel genes belong to the CYP725 family (from CYP725A9 to CYP725A23). More importantly, TcCYP725A9, TcCYP725A11, TcCYP725A16, TcCYP725A20, TcCYP725A22, and TcCYP725A23 were highly expressed in NA and upregulated after MeJA elicitation. The expression pattern of these six CYP725s were similar to the known Taxol biosynthesis hydroxylases. Interestingly, all these six CYP725 genes were expressed at the highest level in T. Media, and the lowest in T. chinensis. Their expression rule were coincident with Taxol content in Taxus organs. Therefore, they are likely to involved in Taxol biosynthesis, and their specific functions require further study.

Cis-Regulatory Elements in the Promoters of TcCYP725s
As Taxus genome information was not complete, only the promoter fragments of 5 TcCYP725s were identified (Accession Numbers: MF598831-MF598835). In addition to the common cis elements CAAT-box and TATA-box, 14 types of cis-acting elements in the TcCYP725s were discovered (Figure 8). The Skn-1 motif contributed to gene expression in the endosperm, and the CAT-box was required for gene expression in the meristem. The common cis-acting elements G-box and TG-box were responsive to light. All the other cis-regulatory elements are related to stresses and hormones, such as TC-rich repeats (required for defense and stresses), the TCA element (salicylic acid response), MBS (drought inducibility), HSE (heat response), W-box (responsive to fungal elicitors and plant hormones), the GARE motif (gibberellin response), the TGACG motif (MeJA response), ABRE (abscisic acid response), TGA (auxin response), and ERE (ethylene response). The Skn-1 motif is present in most of CYP725 genes, indicating that CYP725s play important roles in immature tissues. In addition, the W-box was widely found in CYP725 genes. Previous study showed that TcWRKY1 protein regulated Taxol biosynthesis in T. chinensis cells by specifically interacting with the two W-box of 10deacetylbaccatin III-10β-O-acetyl transferase (DBAT; Li S. et al., 2013). These putative cis-acting elements would contribute to future researches of the transcriptional regulation of Taxol biosynthesis.
centralized resource, systematic nomenclature, and biological functions, has significantly hampered research efforts to elucidate biosynthetic pathways for the medicinal ingredients in Taxus. To our knowledge, this study was the first to overcome these limitations by (1) identifying a large set of CYP450s; (2) establishing a systematic nomenclature for these CYP450s; (3) mining the candidate CYP450s that are likely related to Taxol biosynthesis; and (4) analyzing the cis-regulatory elements in the promoters of CYP725 genes to provides useful information to the transcriptional regulation of Taxol biosynthesis. Heat map was created using HemI. Heatmap Illustrator v1.0. "⋆" indicates the known taxoid hydroxylase genes, and " " indicates the candidate genes.

T. chinensis CYP450s Identified in This Study
Totally, 118 full-length and 175 partial CYP450 genes were identified in T. chinensis. The number of CYP450 genes in T. chinensis is of the same order of magnitude as the number of CYP450s found in other gymnosperm and angiosperm plants, that is, 307 in P. glauca, 272 in A. thaliana, 332 in soybean, 334 in flax, and 455 in rice (Guttikonda et al., 2010;Babu et al., 2013;Warren et al., 2015). The previously identified five genes involved in Taxol biosynthesis, such as TcCYP725A1 (T10βH, AAN52360.1), TcCYP725A2 (T13αH, AAX59903.1), TcCYP725A4 (T5αH, AAU93341.1), TcCYP725A5 (T7βH, AAR21106.1), and TcCYP725A6 (T2αH, AAV54171.1), were included in the 118 full-length CYP450s. Moreover, TcCYP725A3, which is highly homologous with T14βH (Accession No. Q84KI1) in T. cuspidata, was explored in this study (Jennewein et al., 2003). This inferred that the CYP450s data in our research were representatives, and the remaining 112 novel full-length CYP450 sequences were of great value for the advanced research and applications of CYP450s in Taxus.

Gymnosperm-Specific CYP450s in T. chinensis
Some known gymnosperm-specific CYP450 subfamilies were discovered in T. chinensis. In the CYP71 clan, 8 CYP76AA and 18 CYP750 gymnosperm-specific members were identified among the 118 full-length CYP450s. Previous research revealed that the gymnosperms CYP76AA25 and CYP750B1 can catalyze the hydroxylation of sabinene to trans-sabin-3-ol from Thuja plicata (Bohlmann et al., 2015). The CYP85 clan, which includes many gymnosperm CYP450s, is involved in the biosynthesis of plant metabolites (Hamberger and Bohlmann, 2006;Zerbe et al., 2013): the CYP720B is a conifer-specific subfamily with four members in T. chinensis, and the CYP720B4 has been characterized in the biosynthesis of dehydroabietic acid, an ingredient related to the insect resistance of Picea sitchensis (Hamberger et al., 2011); the CYP716B is a gymnosperm-specific subfamily, with three members found in T. chinensis, and the sole functionally determined CYP716B gene is a taxoid 9αhydroxylase in Ginkgo biloba . Furthermore, two new gymnosperm-specific families were discovered in this study, namely, CYP864 in the 72 clan and CYP947 in the 85 clan (Communicated with Professor David Nelson, unpublished work. The gymnosperm CYP450 names have just been expanded by naming transcriptome data from the 1KP project). Two new gymnosperm-specific families that were recently discovered in P. glauca, including 71 clan family CYP867 and 72 clan family CYP866, were also found in T. chinensis (Warren et al., 2015). The exact functions of these new gymnosperm-specific families require further investigation.

Potential Candidate CYP450s Involved in Terpenoid Biosynthesis
Terpenoids are one of the most widespread classes of secondary metabolites in higher plants, which are biosynthesized from basic isoprene units (C 5 H 8 ) and further modified by various oxidoreductases, acyltransferases, dehydrogenases, and glucosyltransferases. CYP450-dependent oxidative modification is essential for the terpenoid biosynthesis. Hitherto, more than 50 CYP450 genes, which belong to CYP51, CYP71, CYP72, CYP76, CYP88, CYP93, CYP97, CYP701, CYP705, CYP706, CYP707, CYP714, CYP716, CYP720, CYP725, CYP735, and some unassigned families related to the biosynthesis of terpenoids in medicinal plants have been identified . The structural diversity of terpenoid compounds depends on the rearrangement modifications of their skeletal structures and extensive oxidative modification (Zerbe et al., 2013). Therefore, it shouldn't be surprising that so many CYP450s families have been found related to their biosynthesis.
Taxus spp. specifically employs up to eight CYP450-mediated oxidatives to create the diterpene, Taxol Kaspera and Croteau, 2006). So far, the C-2, C-5, C-7, C-10, and C-13 hydroxylases have been successfully obtained Schoendorf et al., 2001;. Unfortunately, the genes responsible for C-1 hydroxylation, oxetane formation, C-9 oxidation, and C-2 ′ hydroxylation remain unknown. In this study, an intriguing result was that 6 CYP725s (TcCYP725A9, TcCYP725A11, TcCYP725A16, TcCYP725A20, TcCYP725A22, and TcCYP725A23) were found to belong to the candidates that were involved in Taxol biosynthesis. By a blast search, TcCYP725A9, TcCYP725A11, TcCYP725A16, TcCYP725A22, and TcCYP725A23 were found to show high sequence similarity (>66%) to the T10βH (Schoendorf et al., 2001). The T10βH transformed taxadien-5α-yl acetate to taxadien-5α-acetoxy-10βol, which was an important intermediate in the biosynthesis of Taxol. In addition, the BLAST analysis of TcCYP725A20 revealed that the most homology (63%) found in public databases was with T13αH, an enzyme capable of hydrolyzing the taxadien-5αol at its C-13 position. The above information may also suggested that these 6 CYP725 genes were likely candidate contribute to the formation of Taxol.

Future Plans
Because the whole genome information of T. chinensis is unavailable, 5 ′ Race and 3 ′ Race can amplify numerous fulllength CYP450 genes based on the current study. Functional  predictions of the candidate CYP725s will be performed by heterologous expression in yeast. Linking in vivo feeding studies to cell-free enzyme systems, with available taxanes or suspension-cultured cells as ingredients, will enable better understanding of the Taxol biosynthetic pathway. Ultimately, we can reconstruct this secondary metabolite pathway in a microbial system to establish the engineered production of Taxol. Moreover, the further researches of the transcriptional regulation of Taxol biosynthesis will be done, based on the putative cis-acting elements in the promoters of taxoid hydroxylase.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: CF, LY, WL. Performed the experiments: WL, SZ, KD. Analyzed the data: MZ, KD. Contributed reagents/materials/analysis tools: SZ, YC. Wrote the paper: WL.