Transcriptomic Analysis Reveals Key Genes Related to Betalain Biosynthesis in Pulp Coloration of Hylocereus polyrhizus

Betalains have high nutritional value and bioactivities. Red pulp pitaya (Hylocereus polyrhizus) is the only fruit containing abundant betalains for consumer. However, no information is available about genes involved in betalain biosynthesis in H. polyrhizus. Herein, two cDNA libraries of pitaya pulps with two different coloration stages (white and red pulp stages) of Guanhuahong (H. polyrhizus) were constructed. A total of about 12 Gb raw RNA-Seq data was generated and was de novo assembled into 122,677 transcripts with an average length of 1183 bp and an N50 value of 2008. Approximately 99.99% of all transcripts were annotated based on seven public databases. A total of 8871 transcripts were significantly regulated. Thirty-three candidate transcripts related to betalain biosynthesis were obtained from the transcriptome data. Transcripts encoding enzymes involved in betalain biosynthesis were analyzed using RT-qPCR at the whole pulp coloration stages of H. polyrhizus (7-1) and H. undatus (132-4). Nine key transcripts of betalain biosynthesis were identified. They were assigned to four kinds of genes in betalain biosynthetic pathway, including tyrosinase, 4, 5-DOPA dioxygenase extradiol, cytochrome P450 and glucosyltransferase. Ultimately, a preliminary betalain biosynthetic pathway for pitaya was proposed based on betalain analyses, gene expression profiles and published documents.


INTRODUCTION
Plant pigments are classified into four categories i.e., chlorophyll, carotenoid, anthocyanin, and betalains. Betalains not only play important roles in ranging in color from red to violet to yellow in plants, but also have high nutritional value and positive effects in health and disease by high antioxidant and anti-inflammatory capabilities (Swarna et al., 2013;Allegra et al., 2014;Lee et al., 2014;Clifford et al., 2015;Martinez et al., 2015). It is interesting that betalains and anthocyanin cannot co-exist naturally in one plant at the same time (Stafford, 1994). In contrast to anthocyanins and carotenoids, the biosynthetic pathway of betalains is only partially understood (Gandia-Herrero et al., 2005a,b;Tanaka et al., 2008). Studies on betalain biosynthesis are mainly focused on characterizations of several key enzymes, such as tyrosinase (TYR), 4, 5-DOPA-dioxygenase (DOD), and glucosyltransferases (GT). Betalain biosynthetic pathways have been determined or inferred based on characterization and activity analyses of those key enzymes (Gandia-Herrero and Garcia-Carmona, 2013).
Transcription factors or regulatory genes are also involved in betalain-producing plants. MYB1 (BvMYB1) was verified to regulate betalain pathway in B. vulgaris (Hatlestad et al., 2015). Stracke et al. (2014) obtained 70 R2R3-MYB genes as well as genes encoding three other classes of MYB proteins containing multiple MYB repeats from B. vulgaris. Those R2R3-MYB genes were functionally categorized which led to the identification of a sugar beet-specific clade with an atypical amino acid composition in the R3 domain, putatively encoding betalain regulators (Stracke et al., 2014).
Dragon fruit or pitaya is one of the tropical fruits under Hylocereus of the Cactaceae. Currently, there are two types of pitayas i.e., red-flesh pitaya (H. polyrhizus) and white-flesh pitaya (H. undatus) that have been commercially produced by the large-scale as a new type fruit crop. Pitaya has been drawn much attention of the world for its commercial value and excellent nutritional properties (Adnan et al., 2011). Previous studies showed that major pigment compounds of pitaya are water-soluble betalains in term of red-violet betacyanins and yellow/orange betaxanthin (Stintzing et al., 2002). To date, red pulp pitaya (H. polyrhizus) is the only fruit containing betalains for consumer. Pitaya pigment has attracted much attention for its antioxidant and antiproliferative activities and become one of the hot research topics in the world. In the past decades, great progress has been made in pitaya betalain in term of physical and chemical properties (Esquivel et al., 2007;Woo et al., 2011), purification and identification (Stintzing et al., 2002;Wybraniec et al., 2009;Naderi et al., 2010;Rebecca et al., 2010a,b;Lim et al., 2011), antioxidant and radical scavenging capacity (Wu et al., 2006;Tenore et al., 2012;Garcia-Cruz et al., 2013). Several tentative betalain biosynthesisrelated compounds were identified and compared according to metabolite profiling of red pulp (H. polyrhizus) and white pulp (H. undatus) pitayas (Suh et al., 2014). However, genes related to betalain biosynthetic pathway in pitaya is not clear yet. RNA-Seq is a technique that allows rapid and comprehensive understanding of transcriptome level of variations based on nextgeneration sequencing technologies. The application of RNA-seq has accelerated gene expression profiling and gene identification in many plant species. In this study, RNA-Seq technology was used to screen key genes related to betalain biosynthesis in pulp coloration of Guanhuahong (H. polyrhizus). The aim of the present study is to isolate key genes encoding steps in betalain biosynthetic pathway in pitaya.

Plant Materials
One varieties i.e., Guanhuahong (H. polyrhizus) and two superior pitaya selections i.e., 7-1 (H. polyrhizus) and 132-4 (H. undatus) with excellent quality were used as materials. Plants were cultivated in Dalingshan Forest Park, Dongguan City, Guangdong Province, China. The same pulps from Guanghuahong were collected for RNA-Seq and data validation by RT-qPCR on the 28 th (white pulp stage) and 42 nd days (red pulp stage) after artificial pollination (DAAP) (Figures S1A,B) in October and November, 2013. The temperature was 16-32 • C with relatively dry climate. Two transcriptome libraries were generated by pooling equal quantities of RNA from each of three fruit developmental stages. Each of these libraries consisted of equal amounts of RNA from three biological replicates of each developmental stage. A pool of pulps from three fruits of 7-1 and 132-4 were collected from four different plants for betalain measurement and expression analyses of candidate transcripts related to betalain biosynthesis on the 19 th , 22 nd , 23 rd , 24 th , 25 th , 26 th , 27 th , and 28 th (Figures S1C,D) DAAP in July and August, 2014. The temperature is 24-36 • C with relatively humid climate. All samples were immediately frozen in liquid nitrogen and stored at -80 • C until use.

Measurement of Betalains
Betalain contents were extracted and measured following the method of Garcia-Cruz et al. (2013) with a minor modification. 0.5 g fresh pulps were ground into fine powder in liquid nitrogen and extracted with 5 mL 80% aqueous methanol (v/v) solution. Samples were sonicated for 10 min in a ultrasonic cleaner (SB25-12DT, Ningbo, China) and then stirred for 20 min in darkness at room temperature. Supernatants were collected at 2200 × g for 10 min and the residues were subjected to a similar second extraction. The supernatants were measured through spectrophotometry (Infinite M200, Tecan Co.). Betacyanin and betaxanthin contents were calculated by the following formulas: betacyanins or betaxanthin contents (mg/100 g fresh pulps) = (A 538 or A 483 × DF × W × V × 100)/(ε × P × L). A 538 is absorbance for betacyanins at 538 nm (max); and A 483 is absorbance for betaxanthins, DF is a dilution factor, W is the molecular weight (550 g/mol for betanin and 308 g/mol for indicaxanthin); V is the pigment solution volume (ml); ε is the molar extinction coefficient (60,000 L/mol·cm for betanin and 48,000 L/mol·cm for indicaxanthin), and L is the length of the cell (1 cm). P is the fresh pigment weight (g). All determinations were performed in triplicate.

Experimental Procedures of RNA-Seq
Total RNA Extraction, Library construction, and ILLUMINA Deep Sequencing Total RNA extraction, library construction and RNA-Seq were performed by Bo'ao Biotechnology Corporation (Beijing, China). RNA samples were extracted using the TruSeq RNA Sample Preparation Kit (Autolab Biotechnology, Beijing) according to the manufacture's protocol. RNA quality and quantity were analyzed by 1% agarose gel and NanoDrop ND1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). The RNA integrity number (RIN) values (>7.0) were assessed using an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA). Briefly, the poly-A containing mRNA molecules were purified from 3.0 µg total RNA using poly-T oligo-attached magnetic beads. The cleaved RNA fragments were reversely transcribed into the first-strand cDNA using random hexamers, following by the second-strand cDNA synthesis using DNA polymerase I and RNase H. The cDNA fragments were purified, end blunted, "A" tailed, and adaptor ligated. PCR was used to selectively enrich those DNA fragments that have adapter molecules on both ends and to amplify the amount of DNA in the library. The number of PCR cycles was minimized to avoid skewing the representation of the library. The two libraries were qualified by Agilent 2100 bioanalyzer and quantified by Qubit and RT-qPCR. The produced libraries were sequenced on the HiSeq 2500 platform.

De novo Transcriptome Data Processing and Assembly
A Perl program (Bo'ao Biotechnology Corporation, Beijing, China) was used to filter out low quality sequences from raw sequencing data. The quality of each base was checked from the first base of each read. Once a low-quality base (quality < 10) was detected, it was removed together with following sequences. For paired-end reads, if one read was less than 30 bases after the trimming of low quality bases, the whole pairedend reads were eliminated. Then the high quality reads (High-Quality ≥ 10, Length Cutoff ≥ 30 bp) were assembled with software trinityrnaseq-r2013-11-10 ( Grabherr et al., 2011) to construct unique consensus sequences.

Detection of Differentially Expressed Unigene
The edgeR software (Robinson et al., 2010) was used to identify differentially expressed genes between the two libraries based on the negative binomial distribution by empirical Bayes estimation and exact tests. Genes with a P-value ≤ 0.01 and expression ratio ≥ 2 or expression ratio ≤ 0.5 were considered significant difference between the two libraries.

RT-qPCR Validation of Differentially Expressed Transcripts
Some differentially expressed genes were selected to validate the accuracy of RNA-Seq data using RT-qPCR. All betalains-related transcripts were further elucidated their expression patterns at all pulp coloration stages of 7-1 and 132-4. The pitaya actin gene was used as the internal control for normalization of gene expression (Yan et al., 2013). The primers of RT-qPCR (Table S1) were designed using BatchPrimer3 v1.0 (http://batchprimer3. bioinformatics.ucdavis.edu/index.html). Total RNA was isolated from pulp the 19 th , 22 nd , 23 rd , 24 th , 25 th , 26 th , 27 th , and 28 th (Figures S1C,D) DAAP pulp from 7-1 and 132-4 using the Quick RNA isolation Kit (0416-50) (Huayueyang Biotechnology, Beijing) according to the manufacture's protocol, respectively. After treated with DNase I (TaKaRa, Japan), single-stranded

Betacyanin and Betaxanthin Contents At All Pulp Coloration Stages of Pitaya
Betalain contents were determined at all stages of pulp coloration in H. polyrhizus and H. undatus. As shown in Figure 1, higher contents of betacyanin and betaxanthin were detected in H. polyrhizus (7-1) than that of H. undatus (132-4). Betacyanin and betaxanthin contents were increasing at all stages of pulp coloration in H. polyrhizus (7-1) while there were no significant changes in H. undatus (132-4) (Figure 1).

Sequencing and De novo Assembly of Transcriptome
As shown in Table 1, two libraries of the red and white pulp stages produced 67,123,022 and 63,770,742 raw reads (NCBI accessions: SRR2924904), respectively. A single read length was 100 bp. After filtering out low quality reads, 62,639,274 reads (88.71% of the raw data) and 59,004,932 reads (87.33% of the raw data) were obtained, respectively from red and white pulp libraries. Q20 are 96.28 and 95.67%, respectively. The high-quality reads were assembled into 122,677 transcripts with an average length of 1183 bp and an N50 value of 2008 by Trinity software. The range of transcripts length was 201-12,560 bp. The length distribution of these transcripts was shown in Figure S2. The percentages of mapping to transcript were 89.69 and 87.37%, respectively.

Analysis of Differentially Expressed Transcripts Between White and Red Pulp Stages
As shown in Figure 3 and Table S2, 117,185 transcripts were distributed in red pulp library, and 116,582 were in white pulp library. 6086 and 5483 transcripts independently existed in red and white pulp libraries, respectively. Differentially expressed transcripts were identified by comparisons with the two libraries using p ≤ 0.01 and fold changes (ratio ≥ 2 or ratio ≤ 0.5) as thresholds. A total of 8871 transcripts were considered as significantly differentially expressed between the white and red pulp libraries and were assigned to 241 KEGG pathways (File S2). 4107 were up-regulated transcripts. The differentially expressed transcripts were classified into three categories in GO assignments: cellular component, molecular function and biological process (Figure 4). As expected, the differentially expressed transcripts involved in pigmentation were presented in the biological process. In flavone and flavonol biosynthesis (ko00944), differentially expressed transcripts were focused on flavonol 3-O-methyltransferase (E2. 1.1.76, up-regulated expression, comp37084_c6_seq1, comp37084_c6_seq2, comp37385_c0_seq2, comp37385_c0_seq1, comp34873_c0_seq1, comp31547_c0_seq1) and flavonoid 3 ′ -monnooxygenase comp (E1.14.13.21, down-regulated expression, comp32441_c0_seq1). There were 9 up-regulated  transcripts and 10 down-regulated transcripts in flavonoid biosynthesis. All up-regulated transcripts were annotated as chalcone isomerase (CHI) and chalcone synthase (CHS). Dihydroflavonol reductase (DFR), anthocyanidin synthase (ANS), and leucoanthocyanidin reductase (LAR) were downregulated expression genes in pitaya which is consistent with the conjecture of betalain-producing plants (Shimada et al., 2005).

Genes Involved in Betalain Biosynthesis
An integrative pathway of betalain biosynthesis was outlined according to differentially expressed transcripts (Figure 5). Two methods were used to collect genes involved in betalain biosynthesis pathway in H. polyrhizus. One is that the annotated transcripts were searched based on standard gene names and synonyms derived from comprehensive pathway from the KEGG pathway (http://www.kegg.jp/kegg-bin/search_pathway_ text?) and a review from combined functional annotations. A total of 106 transcripts were assigned to the pathway (File S3A). The other way is that key gene sequences were downloaded from NCBI database to align with the two libraries by assembling de novo. Forty nine transcripts were aligned to the key genes of those published paper using p < 0.001 as a threshold (File S3B). Thirty-three transcripts related to betalain biosynthesis were identified from the white and red pulp transcriptome database ( Table 2). Eighteen transcripts including tyrosine 3-monooxygenase, 4, 5-DOPA dioxygenase extradiol, cytochrome P450 and GT were significantly upregulated expression. Fifteen were down-regulated including transcripts encoding TYR and DOPA decarboxylase. Nine transcripts (comp19031_c1_seq1, comp29696_c0_seq1, comp29696_c1_seq1, comp29696_c1_seq2, comp29696_c1_seq3, comp29696_c1_seq4, comp29696_c2_seq1, comp30986_c0_seq1, comp37692_c0_seq1) annotated as DOD were obtained from the two libraries. They all were up-regulated expression in red pulp stage except comp19031_c1_seq1 (File S3C). Phylogenetic analysis showed that comp37692_c0_seq1, comp29696_c0_seq1, and comp30986_c0_seq1 belonged to the same type related to betalain biosynthesis. By contrast, comp29696_c1_seq1, comp29696_c1_seq2, and comp19031_c1_seq1 shared relatively low similarity with the other transcripts (Figure 6). These results suggested that comp37692_c0_seq1, comp29696_c0_seq1, and  comp30986_c0_seq1 are putative candidate transcripts encoding DOD. The roles of TYR were divided into two branches in betalain biosynthetic pathway i.e., dopaxanthin and cyclo-DOPA syntheses. Seven transcripts were identified from the transcriptome database. Among the 7 transcripts, 6 down-regulated transcripts (comp37375_c0_seq1, comp37375_c0_seq2, comp37375_c0_seq3, comp37375_c0_seq4, comp37375_c0_seq5, comp37375_c0_seq6) were hypothesized to be polyphenol oxidase (equivalent TYR) while one up-regulated transcript (comp37674_c0_seq1) was annotated as tyrosine 3-monooxygenase (equivalent tyrosine hydroxylase) (File S3C).
Twelve transcripts were obtained based on reported results of B5GT, B6GT, and CDOPA5GT. Only one transcripts (comp32889_c0_seq1) encoding B6GT had a relatively higher expression level in red than white stage (File S3A). Expression FIGURE 5 | A diagram of the comprehensive betalains biosynthesis pathway in plant from KEGG pathway and Gandia-Herrero's review (Gandia-Herrero and Garcia-Carmona, 2013). A preliminary betalain pathway in H. polyrhizus was shown by green arrows. x, tyrosinase hydroxylation activity; y, tyrosinase oxidation activity. levels of B5GT (2 transcripts) and CDOPA5GT (6 transcripts) were relatively low from white to red pulp stage (File S3A). However, no significant difference in B5GT and CDOPA5GT was detected between the two stages.
Eight transcripts encoding DDC in decarboxylated betalains biosynthesis had down-regulated expression patterns. Very low expression levels of comp38514_c0_seq1 and comp38987_c0_seq1 were detected in the two stages, suggesting they are irrelevant to betalain biosynthesis. COMT was encoded by up-regulated transcripts (comp32369_c0_seq2). But comp32369_c0_seq2 had a low expression level in the two stages (File S3C).
As a whole, a preliminary pathway including some candidate transcripts was proposed according to betalain analyses and differentially expressed transcripts in combination with published documents (Figure 1, File S4, and Figure 5). Most up-regulated transcripts were mainly focused in one pathway and distributed on DOD, cytochrome P450 and GT (Figure 5).
Verification of the Accuracy of the RNA-Seq Data Using RT-qPCR Ten unknown transcripts with significant difference between the two stages and 14 transcripts involved in betalain biosynthesis were selected to validate the accuracy of RNA-Seq data by RT-qPCRs (Figure 7). Expression patterns of these 24 transcripts were consistent with the RNA-Seq data. These results suggested that RNA-Seq data is credible and can be used for subsequent experiments. The IDs, RPKM value (Reads per kilobase of exon model per million mapped read), primers of the 24 transcripts were shown in File S3 and Table S1.

Expression of Candidate Transcripts Related to Betalain Biosynthesis at All Pulp Coloration Stages
To elucidate roles of candidate transcripts in betalain biosynthetic pathway, the expression levels of candidate transcripts related to betalain biosynthesis were detected at all stages of pulp coloration in H. polyrhizus and H. undatus. Most transcripts had higher expression levels in red pitaya except comp37375_c0_seq2 (Figure 8). Higher expression level of comp37375_c0_seq2 was observed in H. undatus than H. polyrhizus ( Figure 8A). Three expression patterns i.e., increasing, increasing at first and decreasing thereafter, and decreasing were detected at all pulp coloration stages of H. polyrhizus and H. undatus.
Cyt P450 (comp36993_c0_seq4) was up-regulated during pulp coloration in H. polyrhizus compared to stable and lower in H. undatus (Figure 8H). TYR (comp37375_c0_seq2), DOD (comp37692_c0_seq1 and comp30986_c0_seq1), B6GT (comp32889_c0_seq1) and cyt P450 (comp16058_c0_seq1 and comp35191_c2_seq2) had the same expression trends (increasing at first and decreasing thereafter) at all pulp coloration stages of H. polyrhizus (Figures 8A-F,J). The highest expression levels of comp37375_c0_seq2, comp35191_c2_seq2, comp32889_c0_seq1, and comp30986_c0_seq1 were detected at the early stage. The maximum expression levels of comp37375_c0_seq2, comp32889_c0_seq1, and comp30986_c0_seq1 were detected in H. polyrhizus and H. undatus on the 23 rd DAAP (pulp color begins to change from white to red) compared to comp35191_c2_seq2 on the 24 th DAAP (Figures 8A,C,F,J). By contrast, comp37692_c0_seq1 and comp16058_c0_seq1 were up-regulated expression at the early stages of pulp coloration and declined slightly at the late stages of pulp coloration (Figures 8D,E). DDC (comp37261_c1_seq14) showed a decreasing trend at all pulp coloration stages and higher expression levels were detected in H. polyrhizus than H. undatus ( Figure 8K). Comp37674_c0_seq1, comp36238_c0_seq2, and comp36993_c0_seq3 had up-regulated trends at all pulp coloration stages in H. polyrhizus and H. undatus. However, no significant difference was detected between H. polyrhizus and H. undatus (Figures 8B,I,G).

DISCUSSION
Betalains have attracted much attention of the world due to their high nutritional value and bioactivities. However, except for the cloning of single gene in the betalain pathway in model plant, the biosynthetic pathway of betalains remains to be fully clarified. In the present study, transcriptome sequencing was performed to screen genes related to betalain biosynthesis from H. polyrhizus. A preliminary pathway (Figure 5) was proposed based on betalain analyses and expression analyses of candidate transcripts in combination with published documents. Genes involved in each step of the betalain biosynthetic pathway were obtained from the transcriptome dataset, suggesting that it had high coverage. Higher expression levels of transcripts related to betalains were detected in H. polyrhizus and 9 transcripts were involved in betalain biosynthesis.
TYR played a primary role in the initial betalain biosynthetic steps. Positive correlations between betacyanin contents and TYR activities were found in P. grandiflora, B. vulgaris, and S. salsa (Steiner et al., 1999;Gandia-Herrero et al., 2004;Wang et al., 2007). In this study, two transcripts (comp37674_c0_seq1 and comp37375_c0_seq2) encoding tyrosine hydroxylase were obtained from the two libraries. Different expression patterns of comp37674_c0_seq1 and comp37375_c0_seq2 were observed. Comp37674_c0_seq1 showed up-regulated trends at all pulp coloration stages in H. polyrhizus and H. undatus (Figure 8B) suggesting TYR hydroxylation activity played equivalent function in H. polyrhizus and H. undatus. Similar results were obtained in H. polyrhizus and H. undatus according to metabolite profiling of L-DOPA (Suh et al., 2014). Comp37375_c0_seq2 increased gradually before 23 DAAP and reduced thereafter in H. polyrhizus and H. undatus (Figure 8A). The results were consistent with founding that TYR hydroxylation activity was increased at first and then decreased, with rising of betacyanin contents during seedling development of B. vulgaris (Steiner et al., 1999). TYR-formed dopa must be protected from the oxidase activity of TYR so that it could be used as substrate of DOD (Steiner et al., 1999). In our study, higher expression level of comp37375_c0_seq2 was detected in H. undatus than H. polyrhizus ( Figure 8A) indicating that comp37375_c0_seq2 keeps high oxidation activity in H. undatus to reduce L-DOPA activity. Cytochrome P450 gene encodes a key enzyme in the betalain synthesis pathway (Hatlestad et al., 2012;Yang et al., 2015). It can convert L-DOPA into cyclo-DOPA which is required for red betalain production in B. Vulgaris (Hatlestad et al., 2012). In our study, three transcripts (comp16058_c0_seq1, comp36993_c0_seq4, and comp35191_c2_seq2) encoding cyt-P450 was achieved from the two libraries. The expression level of comp16058_c0_seq1 increased gradually during pulp color transition from white to red stages and decreased at full maturation stage. Higher expression levels of comp16058_c0_seq1 were detected in H. polyrhizus than H. undatus ( Figure 8E). Similar expression pattern of comp36993_c0_seq4 was also observed in H. polyrhizus during the whole pulp coloration stages, but its expression level was lower than that of comp16058_c0_seq1 ( Figure 8H). It was reported that expression level of CYP76ADs was related to betalains in B. vulgaris (Hatlestad et al., 2012), Amaranthus hypochondriacus (Casique-Arroyo et al., 2014) and Mirabilis jalapa (Suzuki et al., 2014). In this study, betacyanin, and betaxanthin contents were increasing at all stages of pulp coloration in H. polyrhizus (Figure 1). Expression levels of comp16058_c0_seq1 and comp36993_c0_seq4 were positive correlation with betalain contents (Figure 1). In addition, comp35191_c2_seq2 had undergone a sharp increase and a dramatic decrease from 23 rd to 25 th day (Figure 8F), indicating that comp35191_c2_seq2 is an inducible gene involved in betalain pathway and contributed to the early stage. Those results suggested that comp16058_c0_seq1 is an important candidate gene of cyt-P450 involved in betalain biosynthesis. Further work is being carried out to understand its roles in betalain biosynthesis of H. polyrhizus via genetic transformation.
DOD is a key enzyme contributed to 4, 5-seco-DOPA (the precursor of betalamic acid). The expression level of DOD was positive correlation with betalains in B. vulgaris (Gandia-Herrero and Garcia-Carmona, 2012) and Parakeelya (Chung et al., 2015). In the present study, similar trend was observed between expression level of DOD and betalain accumulations (Figures 1,  8D). Lower expression levels of DOD (comp37692_c0_seq1 and comp30986_c0_seq1) were detected in H. undatus while higher expression levels of comp37692_c0_seq1 were observed at late stage of pulp coloration in H. polyrhizus compared to lower at early stage (Figures 8C,D). The expression of comp37692_c0_seq1 is higher than comp30986_c0_seq1 nearly in all stages. The expression trends of comp37692_c0_seq1 and comp30986_c0_seq1 are consistent with comp16058_c0_seq1 (cyt-P450) and comp35191_c2_seq2 (cyt-P450), respectively. Therefore, comp37692_c0_seq1 and comp30986_c0_seq1 are considered as candidate genes of DOD.
GTs catalyze the transferring of sugar moieties to form more complex betacyanins and keep stability of generated pigments (Gandia-Herrero and Garcia-Carmona, 2013). Previous studies showed there are various betacyanins in red pitaya and several glycosylation reactions were predicted (Suh et al., 2014). In this study, 14 transcripts annotated GT were obtained from transcriptome data and only two transcripts (comp32889_c0_seq1 and comp36238_c0_seq2) were upregulated in the red stage (File S3C). The result is not consistent with the above-mentioned conjecture. Furthermore, the expression trends of the two transcripts were not accordant with the betalain contents during the eight stages (Figures 1, 8I,G). The reason may be some GTs contributed to be betacyanins have not been identified.
Aromatic-L-amino-acid decarboxylase (DDC, equivalent Tyrosine/DOPA decarboxylase) and catechol Omethyltransferase (COMT) are involved in the formation of decarboxylated betalains (Gandia-Herrero and Garcia-Carmona, 2013). In this study, DDC (comp37261_c1_seq14) showed a downtrend in white and red pitayas ( Figure 8K). Negative correlation was detected between betalain contents and expression level of comp37261_c1_seq14 (Figures 1, 8K) suggesting comp37261_c1_seq14 isn't a key gene in betalain biosynthesis of pitaya. The result is also consistent with the report of low decarboxylated betalain contents in red pitaya (Suh et al., 2014).

CONCLUSIONS
To date, pitaya is increasingly gaining the public attention due to its high nutritional value and strong antioxidant properties. This study presented the first transcriptome of H. polyrhizus. The large RNA-Seq data could provide valuable information concerning betalain biosynthesis in red pitaya. A total of about 12 Gb raw RNA-Seq data was generated and de novo assembled into 122,677 transcripts, in which 122,668 were annotated. The putative betalain-related genes were identified and characterized in the pulp coloration process of pitaya. Based on betalain analyses, expression analyses of candidate transcripts and published documents, we proposed a preliminary pathway of betalain biosynthesis in pitaya. Further work such as enzyme activity analyses and genetic transformation are being carried out to elucidate their roles in betalain biosynthesis of pitaya.     File S3 | Functional annotation and analyses of transcripts. (A) Transcripts were searched based on standard gene names and synonyms. (B) Transcripts were achieved by aligning pitaya libraries with the key gene sequences downloaded from NCBI database. (C) All differentially expressed transcripts involved in betalain biosynthesis transcripts.

AUTHOR CONTRIBUTIONS
File S4 | Annotation and identification of some candidate transcript sequences from NCBI.