A chromosome-scale genome assembly of turmeric provides insights into curcumin biosynthesis and tuber formation mechanism

Yin, Yanpeng; Xie, Xiaofang; Zhou, Luojing; Yin, Xianmei; Guo, Shuai; Zhou, Xianjian; Li, Qingmiao; Shi, Xiaodong; Peng, Cheng; Gao, Jihai

doi:10.3389/fpls.2022.1003835

ORIGINAL RESEARCH article

Front. Plant Sci., 26 September 2022

Sec. Plant Breeding

Volume 13 - 2022 | https://doi.org/10.3389/fpls.2022.1003835

This article is part of the Research TopicChallenges and Progressions in Herbal GenomicsView all 8 articles

A chromosome-scale genome assembly of turmeric provides insights into curcumin biosynthesis and tuber formation mechanism

Yanpeng Yin^1,2†

Xiaofang Xie^1†

Luojing Zhou¹

Xianmei Yin¹

Shuai Guo¹

Xianjian Zhou³

Qingmiao Li³

Xiaodong Shi⁴

Cheng Peng^1*

Jihai Gao^1*

¹State Key Laboratory of Southwestern Chinese Medicine Resources, Chengdu University of Traditional Chinese Medicine, Chengdu, China
²Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
³Sichuan Provincial Key Laboratory of Quality and Innovation Research of Chinese Materia Medica, Sichuan Academy of Traditional Chinese Medicine Sciences, Chengdu, China
⁴School of Food and Biological Engineering, Chengdu University, Chengdu, China

Curcuma longa, known as the ‘golden spice’ and ‘life spice’, is one of the most commonly utilized spices in the world and also has medicinal, cosmetic, dye and flavoring values. Herein, we present the chromosomal-level genome for turmeric to explore the differences between tubers and rhizomes in the regulation of curcumin biosynthesis and the mechanism of tuber formation. We assembled the turmeric genome into 21 pseudochromosomes using Pacbio long reads complemented with Hi-C technologies, which has a total length of 1.11 Gb with scaffold N50 of 50.12 Mb and contains 49,612 protein−coding genes. Genomic evolutionary analysis indicated that turmeric and ginger have shared a recent WGD event. Contraction analysis of gene families showed possible roles for transcription factors, phytohormone signaling, and plant-pathogen interactions associated genes in adaptation to harsh environments. Transcriptomic data from tubers at different developmental stages indicated that candidate genes related to phytohormone signaling and carbohydrate metabolic responses may be associated with the induction of tuber formation. The difference in curcumin content between rhizomes and tubers reflected the remodeling of secondary metabolites under environmental stress, which was associated with plant defense in response to abiotic stresses. Overall, the availability of the C. longa genome provides insight into tuber formation and curcumin biosynthesis in turmeric as well as facilitating the understanding of other Curcuma species.

Introduction

The genus Curcuma, comprised of 70 species, the largest genus in the Zingiberaceae family, is generally distributed in tropical and subtropical regions, including China, India, Thailand, Indonesia and Malaysia, with a few species occurring in Australia and the South Pacific (Figure 1B) (Kocaadam and Şanlier, 2017; Zhang et al., 2018). It has many cultivars as its highly variable morphology and the wide range of chromosome numbers in the genus, with diploid, triploid and tetraploid plants (Leong-Skornicková et al., 2007; Záveská et al., 2016). Curcuma longa belongs to the triploid species (2n=3x=63), a rhizomatous perennial herb whose rhizome is used as one of the most common sources of spices in the world (Figure 1). Usually, C. longa, also referred as “turmeric”, was said to be widely used for treat a variety of diseases in Asian countries for at least 2,500 years (Gupta et al., 2013). Due to its special flavor, turmeric has been widely used as a flavouring agent, cosmetic, textile dye and so on. Several secondary metabolites have been reported from turmeric roots, including flavonoids (mainly curcuminoids), phenylpropanoids and terpenoids (Sun et al., 2017), which play important roles in plant growth and development, as well as stress resistance (Kroymann, 2011; Geng et al., 2020; Gu et al., 2020). Curcumin, the main medicinal ingredient and natural golden spice distributed the rhizomes of turmeric, belongs to curcuminoid, an extremely rare diketone colored substance in nature, mainly including curcumin, demethoxycurcumin, bisdemethoxycurcumin and so on. According to the modern pharmacological research, curcumin has anti-oxidant, anti-inflammatory, analgesic, anti-tumor, anti-diabetic effects (Prasad et al., 2014; Shehzad et al., 2017). In addition, as one of the natural colors with a high safety, curcumin has been used to add color and flavor to foods (Sharma et al., 2005; Prasad et al., 2014).

FIGURE 1

Figure 1 Morphological traits of Curcuma longa. (A) image of C longa. (B) global distribution range of C longa, which was obtained from the GBIF website (www.gbif.org). (C) a photographs of C longa flowers, (D) rhizomes, (E) tubers, (F) turmeric powder.

In China, the rhizomes and tubers of turmeric are used as two different herbal medicines (named as “Jiang Huang” and “Huang Si Yu Jin”, respectively), which have different clinical uses and differ greatly in their curcuminoid content (Sun et al., 2018; Liao et al., 2020). Turmeric tuber is rarely produced in practice, but it is undoubtedly a valuable part. The formation of tubers is commonly caused by environmental and endogenous factors, e.g., moisture, temperature, light, high sucrose and hormonal changes (Ravi et al., 2009; Aksenova et al., 2012). The transition from rhizomes terminal enlargement to tubers of turmeric is a complex biological process involving many changes of genes. Therefore, increasing our understanding of genetic factors in medicinal plant roots may be important for our understanding of how medicinal compounds, such as curcumin, are biosynthesized and the regulatory relationships between survival and secondary metabolites under abiotic stresses (Jogawat et al., 2021). To date, limited dataset resources restrict further elucidation of the genetics underlying of these interesting traits. Despite the great value of turmeric tubers, there are deficiencies in the mechanism of formation, biosynthesis of active ingredients and other related molecular biological basis.

High-quality genomes enable to understand the basic genes involved in curcumin biosynthesis and the traits for tuber formation. Here, we present the chromosome-level genome of C. longa in Zingiberaceae family, with PacBio, Hi-C and Illumina sequencing technology, and combined transcriptomic data to identify a number of candidate genes for curcumin biosynthetic pathway. Our data derived from developing rhizomes and tubers may be useful in analyzing transcriptomic changes associated with tuber development. The reference genome of C. longa provides insights into the biosynthesis of curcumin and development of applications in Zingiberaceae.

Result

Genome sequencing and assembly

A triploid turmeric (2n=3x=63) plant was selected for genome sequencing and assembly. In total, 89X Illumina short reads, 202X PacBio long reads, and 96X Hi-C reads were generated for turmeric (Table S1). C. longa genome size was estimated to be approximately 1.19 Gb, and the heterozygosity rate was 3.53% based on the K-mer analysis method using Illumina short reads (Figure S1). PacBio long reads were used for the initial assembly, with an average read length of 25.6 kb (Figure S2), and redundant sequences from heterozygous genomic regions were filtered out using purge_haplotigs, then the contig sequences were further polished using Illumina short reads. The de-redundancy assembled genome size was 1.11 Gb with a contig N50 of 2.24 Mb, and the GC content was 40.18% (Table 1 and Table S2). Then the assembly was enhanced and adjusted by 106.73 Gb Hi-C reads to provide a chromosome-level assembly, resulting in a 1.11 Gb genome with a scaffold with N50 of 50.12 Mb (Table 1 and Table S2). 96.25% of the assembled sequences were anchored onto 21 pseudochromosomes (Figure 2; Figure S3 and Table S3). Based on this assembly result, we evaluated the genome completeness of C. longa using BUCSO with the embryophyte_odb10 database. The genome assembly completeness reached 95.2%, while 1.1% and 3.7% were partially present or missing, respectively (Table S4). The final LTR assembly index (LAI) value for turmeric was 13.56, which meets the reference quality (10 ≤ LAI < 20) (Ou et al., 2018)and is higher than that of a previous assembled turmeric (10.26) (Chakraborty et al., 2021), indicating that our assembly was high integrity.

TABLE 1

Table 1 The major characteristics of C. longa (2n=3x=63) genome.

FIGURE 2

Figure 2 Characterization of the turmeric genome. Tracks: (A) Gene density, (B) GC content, (C) Copia density, (D) Gypsy density, (E) LTR assembly index (LAI), and (F) synteny block.

Genome annotation

We annotated the repetitive elements of the turmeric genome through homology-based, methods and ab-initio prediction, and 69.99% of the assembly was determined to be composed of repetitive elements. Among them, retrotransposons accounted for 60.98%, and DNA transposons accounted for 8.40%. More than 50% of the C. longa genome consisted of long terminal repeat retrotransposons (LTR-RTs), The percentages of Gypsy and Copia were 21.40% and 31.09% (Table S5). For protein-coding gene annotation, we used a combination of three methods: homology-based, ab initio, and RNA-Seq prediction methods. Overall, 60,686 protein-coding genes were predicted in our C. longa assembly, among 49,612 protein-coding genes were assigned functions in the five databases (NR, TrEMBL, Swissprot, KOG and KEGG) (Table S5). Moreover, the genes annotated as ncRNAs included 354 microRNAs, 1,829 tRNAs, and 1,285 rRNAs (Table S7).

Comparative genomics analysis

To understand the evolutionary dynamics of the C. longa genome, we collected genome sequences of representative species (including Arabidopsis thaliana, Oryza sativa, Phoenix dactylifera, Zea mays, Ensete glaucum, Musa acuminate, Musa balbisiana and Zingiber officinale), and performed a comparative genomic analysis with the genome sequence of C. longa. In the C. longa genome, there were 52,499 candidate orthologs gene families, 5,222 of which were specific to C. longa (Figure 3A and Table S8). In addition, we focused on the gene families of species closely related to turmeric, with a total of 8,177 gene families were shared by Zingiberales (C. longa, Z. officinale, M. balbisiana and M. acuminate) (Figure 3B). Gene Ontology (GO) annotation showed that these turmeric genome-specific genes were enriched mainly in catalytic activity (GO:0003824), oxidoreductase activity (GO:0016491), hydrolase activity (GO:0016787), membrane (GO:0016020), integral component of membrane (GO:0016021), cellular nitrogen compound metabolic process (GO:0034641), cellular aromatic compound metabolic process (GO:0006725) and organic cyclic compound metabolic process (GO:1901360) (Figure 3C and Table S9). Oxidoreductase activity may have developed to help plants scavenge ROS and eliminate oxidative damage (Lv et al., 2020), and antioxidant defense is fundamentally connected to redox signaling, cellular communication, and acclimation (Luthje et al., 1997; Liebthal et al., 2018). In total, unique gene families in turmeric were mainly related to defense functions, cell membrane composition, and cellular metabolic processes.

FIGURE 3

Figure 3 Comparative genomic analysis of C. longa. (A) Orthologous genes found in different plant species. (B) Venn diagram of the number of shared gene families within C. longa, Z. officinale, M. balbisiana and M. acuminate. The number in the Venn diagram is the number of gene families. (C) GO enrichment analysis of gene families specific to turmeric.

Genome evolution

The species of Zingiberaceae are a controversial and complex (taxonomic status) flowering family, and more genetic data are needed to confirm their relationships (Kress et al., 2002; Liang et al., 2020). A phylogenetic tree was constructed based on 119 single-copy orthologous genes in C. longa and eight other species. The phylogenetic tree revealed that C. longa is strongly related to Z. officinale, these are both from the Zingiberaceae lineage, and diverged from their common ancestor at ~16 Mya. Further estimated the times of divergence among these plants, i.e., the Musaceae (M. acuminate, M. balbisiana and E. glaucum) separated from the Zingiberaceae (C. longa and Z. officinale) at approximately 59 Mya (Figure 4A). Expansion and contraction of gene families are significant features of species selective evolution. We detected 613 significant expansion and 5307 significant contraction gene families in turmeric genomes (Figure 4A). The expanded gene families were enriched in secondary metabolic processes and carbohydrate metabolism, including phenylalanine, polyketides, terpenoids, butanoate and propanoate (Figure S4), while contracted gene families were mainly related to transcription factors, plant hormone signal transduction and plant-pathogen interaction (Figure S5). Selective expansion or contraction of these gene families that have been preserved during evolution suggests that turmeric is adaptively altered to its environment. In addition, a significant expansion of 7,357 gene families was detected in the ginger genome compared to turmeric, which may be one of the reasons for the large difference in genome size between the both (Cheng et al., 2021).

FIGURE 4

Figure 4 Genome comparison and evolution analysis. (A) Phylogenetic tree of C. longa and eight other species based on single-copy protein sequences, and divergence times (Mya) are indicated by the numbers beside the branch nodes. The numbers of contracted gene families are in blue, and those of expanded gene families are in red. (B) Distribution of the synonymous substitution (Ks) rates for C. longa, Z. officinale, and M. acuminate genome, peaks in the intraspecies Ks distribution indicate whole-genome polyploidization events, and peaks in the interspecies Ks distribution indicate speciation events. (C) Synteny conservation between C. longa and Z. officinale genomes. (D) Dot plots of paralogs in the C. longa genome.

Whole-genome duplications events were one of most important drivers of genome evolution (Van De Peer et al., 2009a; Van De Peer et al., 2009b), and we used synonymous substitution rates (Ks) values to estimate the timing of large-scale duplication (Blanc and Wolfe, 2004). A peak of duplication with Ks approximately 0.35 was found in Z. officinale and C. longa genome, suggesting that a recent whole-genome duplication event occurred in the Zingiberaceae clade (Figure 4B) (Cheng et al., 2021; Li et al., 2021). The dot plot (Figure 4D) also illustrates the paralogs inherited from the recent WGD in the turmeric genome (2-2 diagonal relationships). The Musa lineage also experienced an ancient WGD event (Ks= ~ 0.45) (D’hont et al., 2012; Wu et al., 2016) (Figure 4B). Comparison of interspecies between the orthologs gene pairs in C. longa vs. M. acuminate, and Z. officinale vs. M. acuminate, indicated that the divergence of Zingiberaceae and Musaceae (Ks= ~ 0.7) before the recent WGD event (Ks= ~ 0.3-0.4) (Figure 4B). A colinear maps were constructed by comparing the turmeric genome with the ginger genome, we found a total of 564 syntenic blocks (21,829 pairs of collinear genes) between turmeric and ginger (Figure 4C and Figure S6). Synthetic depth analysis showed a 2:2 ratio of genes observed in the comparison of turmeric and ginger genomes (Figure S7).

Genes involved in curcumin biosynthesis

Turmeric has a distinctive spicy aroma and it is well known as a spice. However, only a few species in the Zingiberaceae family have the unique flavor of turmeric, and the compounds of turmeric rhizomes and tubers were analyzed by UPLC-MS/MS. Phenols and terpenoids are the two main classes of compounds that were detected, including curcumin and its derivatives. Curcumin is the recognized one of the main ingredient in C. longa with a variety of biological activities and pharmacological effects (Dosoky et al., 2019). The accumulation of large amounts of curcumin in the rhizome is remarkable distinctive feature of turmeric (Fabianowska-Majewska et al., 2021). Then we analyzed the genes in the curcumin biosynthetic pathways.

Previous studies identified some key enzymes in the curcumin biosynthetic pathway and confirmed the phenylpropanoid pathway are involved in the biosynthesis of curcumin (Ramirez-Ahumada et al., 2006). In this study, we identified genes involved in this pathway based on the genome-wide level of turmeric, and evaluated and compared their expression levels in rhizomes (Rh), adventitious roots (Ar), and tubers at different developmental stages (Tu1-Tu3) (Figure 5A). Transcriptomic data showed that Rh, Ar and Tu1-3 had different gene expression profiles (Figure 5B), indicating the presence of different regulatory events in the roots at different developmental stages. We identified 68 candidate genes associated with the curcumin biosynthesis in five tissues, and these genes were mainly encoding seven key enzymes, e.g., phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate-CoA ligase (4CL), shikimate O-hydroxycinnamoyl transferase (HCT), caffeoyl-CoA O-methyltransferase (CCOMT), diketidyl-CoA synthase (DCS) and curcumin synthase (CURS) (Figure 5C). DCS and CURS are the key enzymes responsible for biosynthesis of curcumin compounds, and they are involved in significantly expanded genes in turmeric same as in ginger (Figure 6B). DCS synthesizes feruloyl diketide-CoA and coumaroyl diketide-CoA, and CURS then converts the diketide-CoA esters to curcuminoid (Katsuyama et al., 2009). DCS and CURS had higher mRNA levels in rhizomes compared with tubers, whereas the expression levels of both increased from Tu1 to Tu2 stage, finally decreased from Tu2 to Tu3 stage according to the three stages of tubers development (Figure 5C). These two genes had similar expression patterns in adventitious roots, and the expression levels were greatly higher compared with that in rhizomes and tubers. As the result of UPLC-MS/MS analyses, curcuminoids were more abundant in the rhizomes than in the tubers (Table 2 and Figure S8). This result was consistent with the correlation between gene expression and metabolite concentrations in plant secondary metabolite biosynthesis, where these related genes were often expressed coordinately (Li et al., 2015). In agricultural production, our investigation showed that the formation of turmeric tubers may be related to water-deficit conditions. A study showed that the expression levels of CURS and DCS were upregulated in turmeric under water-deficit conditions (Chintakovid et al., 2022), which was consistent with the expression pattern of the tuber formation stage (Tu1 to Tu2). In addition, the difference in curcumin content between rhizomes and tubers reflected the remodeling of secondary metabolites under environmental stress, which was related to plant defense in response to abiotic stresses (Obata and Fernie, 2012; Zhu, 2016).

FIGURE 5

Figure 5 Curcumin biosynthesis pathway in turmeric. (A) Phenotypic comparison of five root developmental stages of turmeric. (B) Principal component analysis (PCA) of five turmeric tissues based on the transcriptomic data. Three biological replicates were performed for each developmental stage. (C) Relative expression profiles (yellow-blue scale) of genes encoding enzymes possibly involved in curcumin biosynthesis (heat map). The enzymes involved at each step are shown in red.

FIGURE 6

Figure 6 The PKS Genes Clusters Identified in the C. longa Genome. (A) Phylogenetic analysis of PKS genes from C. longa, Z. officinale, M. acuminate and O. sative. Different colored boxes represent the classification of PKSs. (B) Distribution pattern of the expanded gene families in curcumin biosynthesis pathway and comparison with other species.

TABLE 2

Table 2 The m/z value and fold change of curcuminoid in turmeric different tissues (tubers and rhizomes).

Identification of polyketide synthases (PKS) gene family

The plant type III polyketide synthase (PKS) gene family, which catalyzes and synthesizes a variety of different structural and biologically active plant secondary metabolites, can provide unique polyketide backbones for various metabolic pathways (e.g., curcuminoids, flavonoids, stilbenes, diarylheptanoids and quinolones) associated with various functions of plant growth, development and defense (Austin and Noel, 2003; Dao et al., 2011; Yu et al., 2012a; Pothiraj et al., 2021). To further validate the subtype designation of C. longa PKSs, three species (Z. officinale, M. acuminate, and O. sativa) were selected for comparative analysis of PKS gene families. A total of 30 putative PKS proteins in the C. longa genome were annotated according to the conserved domains (PF00195 and PF02797) using HMMER. We constructed a phylogenetic tree by aligning the PKS proteins among C. longa, Z. officinale, M. acuminate and O. sativa to confirm the classification of PKS family proteins in turmeric. The 30 turmeric PKS genes were clustered into four groups, the known chalcone synthases (CHSs), DCSs, CURSs and anther specific CHS (ASCs) (Pothiraj et al., 2021) (Figure 6A). ASCs were specifically expressed in pollen development and comprise a monophyletic clade (Ageez et al., 2005; Jiang et al., 2008). CHSs, and DCSs along with CURSs were regarded as key enzymes for flavonoids, and curcuminoid biosynthesis, respectively. We further examined the genomic dataset to determine the copy number of genes involved in the biosynthesis pathways of curcumin, which revealed that the genes encoding CURS and DCS were expanded in the turmeric and ginger genomes (Figure 6B). The striking expansion of the DCS and CURS gene families in turmeric and ginger underpin the existence of curcuminoids mainly in Zingiberaceae. Our work identified the taxonomic and functional characteristics of the turmeric type III PKS genes, which provided a reference for systematic analysis of the functions of the PKS gene family and a basis for further studies on the biosynthesis of related secondary metabolites in Zingiberaceae.

Analysis of gene expression patterns in turmeric tuber formation

To further increase our understanding of the tuber formation and development, we performed transcriptomic analysis in rhizomes (Rh), adventitious roots (Ar) and three developmental stages tubers (Tu1-Tu3) of turmeric. Using STEM (Short Time-series Expression Miner) software, 6,868 DEGs (differentially expressed genes) from five tissues of turmeric were clustered into six clusters (from cluster 1 to 6) based on the expression patterns of the genes (Figure 7A). In general, cluster 3, 5 and 6 had 1639 genes and they all displayed a trend of increasing expression from Tu1 to Tu2 phase and decreasing expression from Tu2 to Tu3 phase (Figure 7A). KEGG enrichment analysis showed that these genes were related to starch and sucrose metabolism, secondary metabolism such as phenylpropanoid, terpenoids and polyketides, cytochrome P450 and cutin, suberine and wax biosynthesis, corresponding to tuber expansion, accumulation of secondary metabolites and environmental adaptation (Figure 7B). Another, there were 3,060 genes in cluster 1, 2 and 4, which showed a progressive decreasing expression trend from Tu1 to Tu3 stages, enrichment analysis indicating these genes related to transcription factors and plant hormone signal transduction (Figure 7B). For example, the transcription factor HY5 (HYPOCOTYL5) coordinates the metabolism of carbon and nitrogen in response to light, and hence shoot and root growth (Chen et al., 2016). AP2 (APETALA2) acts primarily in the regulation of developmental programs (Licausi et al., 2013). IAAs (auxin/indole-3-acetic acid) and ARFs (auxin response factors) are the key transcription factors in regulating the expression of auxin-responsive genes (Li et al., 2006), including IAA6/10/15/17/27, ARF11/19 and so on.

FIGURE 7

Figure 7 Expression patterns and enrichment analysis of DEGs in developing turmeric tubers. (A) Patterns of gene expressions across five turmeric tissues (Ar (adventitious roots), Tu1-3 (tubers1-3) and Rh (rhizomes)) inferred by STEM analysis. (B) Heat maps of significantly enriched pathways from six clusters. The yellow and red colors indicate the P-value for significantly enriched pathways. (C) Intersection diagram showing the distribution of differentially expressed genes among five turmeric tissues based on the transcriptomic data. (D) Heat maps for differentially expressed genes related to Tu1-Tu2 and Tu2-Tu3 stage. (E) Heat maps for differentially expressed genes related to phytohormone signaling pathways. (F) Schematic representation of overall effects of phytohormone signaling, carbohydrate on turmeric tuber development.

We also analyzed key DEGs at different developmental stages (Figure 7C). For the DEGs that were up-regulated in the Ar vs. Tu1 stage and were mainly involved in the biosynthesis of cell wall, nutrient element transporters and metabolism, and these genes were enriched in pathways including ‘glycosyltransferases’, ‘transporters’, and ‘starch and sucrose metabolism’ (Figure S9A and Table S10). Among these DEGs, we identified two XylT (xylosyltransferase), GAUT (galacturonosyltransferase), UGT (glucuronosyltransferase), three GBSS (granule-bound starch synthase), TPS (trehalose phosphate synthase), three SWEET (sugars will eventually be exported transporters), STP5 (sugar transport protein 5), NIP5 (NOD26-like intrinsic proteins 5), two TIP1 (tonoplast intrinsic protein 1), two TIP2, PIP2 (plasmamembrane intrinsic proteins), two TPT (triose phosphate/phosphate translocator) and GST (glutathione S-transferase). By comparing the gene expression profiles of Tu1 to Tu2 stage, we found that DEGs with higher transcript abundances in Tu2 stage were enriched in pathways including ‘cytochrome P450’, ‘cutin, suberine and wax biosynthesis’, and ‘flavone and flavonol biosynthesis’ (Figure S9B and Table S11). DEGs with higher expression in the Tu2 to Tu3 stage were enriched in ‘starch and sucrose metabolism’, ‘vitamin B6 metabolism’ and ‘transcription factors’ (Figure S9C and Table S12). Several genes related to cuticle development (CYP86A2), flavone biosynthesis (CYP75A1), starch synthesis (TPP, GBE), threonine biosynthesis (ThrS), regulation of growth and development (HD-Zip, MADS-box) were identified (Figure 7D). In summary, we found that genes involved in tuber development were significantly enriched in the carbohydrate metabolism, and genes of the transporter pathway were significantly enriched in the early stages of tuber formation, which contributed to the synthesis of sucrose in the leaves and its conversion to starch after transport to the tuber. In addition, phytohormones are among the important signaling substances that regulate plant development, growth and reproduction. In turmeric, genes related to tuber development were found to be significantly enriched in the phytohormone signaling pathways (Figure 7E), e.g., auxin, gibberellin, and cytokinin (Abelenda and Prat, 2013; Kondhare et al., 2021). For the auxin pathway, ARF and IAA regulate each other to participate in tuber development. In addition, other phytohormone-related genes including the PYR1 (abscisic acid receptor 1) (Santiago et al., 2009), SRK2E (serine/threonine-protein kinase) (Kulik et al., 2011), ABI5 (abscisic acid insensitive 5) (Collin et al., 2021). in the abscisic acid pathway, GID 1/2 (gibberellin insensitive dwarf 1/2) (Yoshida et al., 2018)in gibberellin pathway, ARR17 (arabidopsis response regulator 17) (Chang et al., 2019) in the cytokinin pathway, and COI1 (coronatine-insensitive protein 1) (Yan et al., 2009), transcription factor MYC2/4 (Van Moerkercke et al., 2019) in the jasmonic acid pathway were also highly expressed in the tuber development stage. These results suggested that regulation of phytohormones and carbohydrate metabolism genes contribute to the development of tubers (Figure 7F).

Discussion

Given the importance of turmeric for edible and medicinal purposes, the genome of turmeric further enriches biological information for functional gene mining, tuber formation and secondary metabolite biosynthesis. The heterozygosity of turmeric is as complex as previously reported for the ginger genome (Li et al., 2021). In addition, turmeric, a triploid species, has a heterozygosity of 3.53%, which was a great challenge for genome assembly. We applied PacBio sequencing to assemble the genome by SMARTdenovo, using purge_haplotigs to filter redundant sequences in heterozygous genomic regions, and combined with Hi-C technology to assemble large-scale scaffolds into pseudochromosomes. In our case, this turmeric genome has a higher quality of assembly and annotation compared to older versions of turmeric draft genomes (Chakraborty et al., 2021). The complex diversity of morphology, chromosome numbers and chromosome sizes in curcuma species indicates that these species have a unique evolutionary history in which polyploid events may play an important role (Leong-Skornicková et al., 2007). Polyploidy is more common in the curcuma and zingiber genus (Anamthawat-Jónsson and Umpunjun, 2020). We identified the polyploidy events in two sequenced Zingiberaceae species, which observed one recent WGD shared between turmeric and ginger. This WGD event occurred before the diverged form ginger, and contraction of multiple gene families were detected in the turmeric genome, which may be responsible for its adaptation to harsh environments. Turmeric is mainly distributed in tropical and subtropical areas, and its smaller chromosomes may be the result of habitat adaptation (Chen et al., 2013). Consistent with previous studies, the changes in chromosome size may be related to the climate of the habitat, which species with smaller chromosomes were mainly distributed in tropical and subtropical climates, while larger chromosomes were distributed in temperate climates (Stebbins, 1966). This result will facilitate the study of the evolution of species and the molecular mechanisms of their adaptation to the environment.

Turmeric is one of the most popular spices in the world and its rhizomes are the main source of its ingredients. Compared to the rhizomes, the tubers of Curcuma species such as C. longa, C. kwangsiensis, C. phaeocaulis, C. wenyujing also have important value, e.g., spices, medicines, cosmetics, dyes and flavourings (Dosoky and Setzer, 2018). Current research on tuber formation focuses on potatoes, sweet potatoes, cassava and other crops (Ding et al., 2020; Zierer et al., 2021), but lack of research on the tuber development and formation mechanism of medicinal plants such as turmeric, which is necessary to fully elucidate the functions of medicinal plant roots in regulating the synthesis and distribution of secondary metabolites. Here, using a high-quality assembled turmeric genome, together with transcriptomic data during the development stages of the root system, we propose a regulatory mechanism for tuber formation and infer the contributions of genes related to phytohormone signaling and carbohydrate metabolism.

Basal metabolism plays a key role in the tuber formation of turmeric, e.g., carbon metabolism and starch and sucrose metabolism (Kondhare et al., 2020; Yoon et al., 2021). Our RNA-seq analysis revealed numerous genes were significantly up-regulated expressed in Tu1-Tu3 stage involved in carbohydrate metabolism, such as XylT, GBSS and SWEET. XylT participates in the biosynthesis of plant cell wall xyloglucan, which is the abundant hemicellulosic component of the primary cell wall (Ehrlich et al.; Ding et al., 2020). A high correlation between starch and sucrose content and tuber size. GBSS is a key enzyme for starch biosynthesis, and its expression promotes starch accumulation (Fulton et al., 2002). Plants that use sucrose as photosynthetic products realize the transport and distribution of photosynthetic products between source and sink organs through long-distance transport, which involves the transmembrane transport of sucrose (Riesmeier et al., 1994). SWEET participates in important physiological processes of plant growth and development by regulating the transportation, distribution and storage of sugar in plants (Chen, 2014; Breia et al., 2021). In addition, a series of enzymes such as GAUT, UGT, STP5, aquaporin (NIP5, TIP1, TIP2 and PIP2), TPT and GST are also involved in the tuber growth and development, including suger transport (Kong et al., 2019), photosynthesis (Lee et al., 2017) and stress response (Gullner et al., 2018; Kurowska, 2020; Zhang et al., 2021), suggesting their important roles in this process. Futhermore, the DEGs of phytohormone signaling pathway were significantly enriched in the developmental stage of turmeric tuber, inferring that hormones play an important regulatory role in the induction and formation of tuber. The formation and development of plant tubers and other storage organs are the result of the coordinated regulation of a variety of endogenous hormones (Hartmann et al., 2011; Roumeliotis et al., 2012). Several genes related to the biosynthesis of cytokinin, ABA, and IAA were identified, e.g., PYR1, SRK2E, ABI5, ARR17. Among them, cytokinin affects tuber formation and expansion, ABA affects tuber expansion by regulating the transfer of carbohydrates to tubers, and IAA promotes tuber expansion (Kondhare et al., 2021). During tubers growth and development, sugar and hormone signaling are usually tightly coupled (Gibson, 2005; Woodward and Bartel, 2005; Eveland and Jackson, 2012). However, the interaction between these genes and how they systematically regulate the formation and expansion of turmeric tubers still need further investigation.

In general, turmeric cannot adapt to harsh environments, such as water shortage and decreasing temperature, it can lead to dormancy of turmeric rhizomes (Rezvanirad et al., 2016; Chintakovid et al., 2021). A previous study screened the environmental factors that affect turmeric yield, such as longitude, latitude, soil pH, annual sunshine, and annual precipitation, and found that annual precipitation was one of the main influencing factors (Wu et al., 2019). However, its tubers can exist as a drought-resistant organ. Higher plants have evolved morphologically and physiologically to cope with challenging environmental conditions, and many have developed belowground storage organs such as tubers (Kondhare et al., 2021). Therefore, based on the results of surveys in turmeric growing areas and our research, we inferred that the formation of turmeric tubers is closely related to the drought environment, and according the above transcriptomic data, we found that some genes that resist abiotic stress also play a role in the development of tubers. Our results showed that genes such as GST, NIP5, TIP and PIP2 were significantly expressed at the tuber development stage, which could explain the presence of turmeric tubers as drought-tolerant organs. Curcumin as a secondary metabolite also plays a key role in the defense mechanism of the plant (Sathiyabama et al., 2016), and the expression of the metabolite curcumin as well as key enzymes for curcumin biosynthesis in the tubers was detected based on transcriptomic and metabolomic data. In other words, drought may be one of the reasons for the induction of tuber formation in turmeric.

Taken together, we provide a chromosome-level genome of turmeric, a reference genome of the Zingiberaceae family. The phylogenetic position and a WGD event of the turmeric genome was revealed by comparative genomic analysis. The integration of transcriptomic and metabolomic data improved our understanding of the curcumin biosynthetic pathway and turmeric tuber formation. Moreover, this provides a basis for understanding the relationship between plant survival and secondary metabolite synthesis under drought stress. These data are an important reference for further studies on molecular breeding in turmeric and agricultural production, as well as for comparative genomic analysis of Zingiberaceae family.

Method

Sample collection and genome sequencing

Curcuma longa was collected in Qianwei County (Leshan, China, coordinates: 104°07′22″E, 029°03′48″N), and total genomic DNA was extracted from the root tissues of C. longa following the CTAB method (Raimundo et al., 2018). Based on the Illumina protocols, we constructed two Illumina libraries with fragment size of ~ 350 bp, and the libraries were subjected to paired-end 150 bp (PE 150) sequencing using the Illumina HiSeq X-ten platform (Illumina, San Diego, CA, USA). Then the raw Illumina reads were filtered to obtain clean reads for estimating genome size, GC content, and heterozygosity level. Another library was constructed using the PacBio single-molecule real-time (SMRT) library construction protocol for the second genomic DNA sequencing approach. The genome was sequenced on the PacBio Sequel platform (Pacific Biosciences, Menlo Park, CA, USA). A total of ~224.90 Gb of clean data were used for genome assembly. As a third approach, a Hi-C library was generated and sequenced on a HiSeq X Ten platform (PE 150 bp). ~106.73 Gb Hi-C clean reads were obtained after removing low-quality reads.

RNA sequencing and gene expression analysis

Rhizomes, tubers and adventitious roots collected from C. longa were used for transcriptome analysis, where tubers were collected according to their growth segments at three developmental stages (Tu1-Tu3). Each tissue was collected in three biological replicates. All samples were immediately stored in liquid nitrogen after being rinsed with Milli-Q water. The RNA Prep Pure Plant Kit (TIANGEN, China) was employed to extract total RNA as described by the manufacturer. The Illumina Novaseq 6000 platform was used for RNA sequencing (Illumina, San Diego, CA, USA). Raw reads were trimmed to remove adaptors using Trimmomatic (Bolger et al., 2014). Clean reads were mapped to the genome using HISAT2 (Kim et al., 2015), and then TPM (per million transcripts) was analyzed using StringTie (Pertea et al., 2016). Differentially expressed genes (DEGs) were determined using DEseq2 package (https://bioconductor.org/packages/release/bioc/html/DESeq2.html). The gene expression profiles of different tissues were presented as heat maps using TBtools (Chen et al., 2020).

Genome survey and genome assembly

We estimated the genome size using k-mer frequencies. Jellyfish (Marçais and Kingsford, 2011) and GenomeScope (Vurture et al., 2017) were used to construct the 21-mer distribution of 98.73 Gb Illumina short reads. As a consequence, we calculated the haploid C. longa genome size to be 1.19 Gb, with a heterozygous rate of 3.53%. Canu (Koren et al., 2018) was used for long read error correction, and SMARTdenovo (Liu et al., 2020) was utilized for de novo assembly of PacBio reads. Additionally, Purge_Haplotigs pipeline (Roach et al., 2018) was performed to filter the redundant contigs of de novo assembly. The Illumina short reads and PacBio long reads were used for further polishing with the NextPolish (Hu et al., 2020) to improve the accuracy of the assembly. The contigs of our draft genome were assembled, ordered, and oriented onto the 21 pseudochromosomes of C. longa (2n=3x=63) by ALLHIC (Zhang et al., 2019) using paired-end reads from Hi-C. The completeness of the genome assembly was assessed by Bench-marking Universal Single-Copy Orthologs (BUSCO) v5.0 (Simão et al., 2015) with embryophyta_odb10. We also calculated the LTR assembly Index (LAI) (Ou et al., 2018) using the LTR_Finder and LTR_retriever package to assess genome quality.

Repeat annotation

De novo identification and homology-based searches were used to annotate repeat sequences in the C. longa genome. For the de novo prediction, a custom repeat library was produced by LTR_Finder (Xu and Wang, 2007) and RepeatModeler (Flynn et al., 2020), and then merged with the known repeat database Repbase (Jurka et al., 2005) to generate the final repeat library. The assembly genome was aligned to the final repeat library using RepeatMasker (Chen, 2004) and for homology−based searches.

Gene annotation

We combined three different strategies: ab initio prediction, homology-based prediction and transcript-based prediction for the protein-coding genes of C. longa genome. For ab initio predictions, we used Augustus (Stanke and Waack, 2003), and GlimmerHMM (Majoros et al., 2004). Based on homologous comparisons, protein sequences from three related species (Musa acuminate, Musa balbisiana and Zingiber officinale) were provided as protein evidence using Exonerate v2.2.0 (Slater and Birney, 2005). Then, the RNA-sequencing data from five tissue of C. longa were assembled into transcripts by Trinity (Grabherr et al., 2011) to perform transcript-based prediction and the gene structure was further predicted using PASA (Haas et al., 2003). Finally, EVidenceModeler (Haas et al., 2008) was employed to integrate the prediction results of three strategies. The functional annotation of protein-coding genes was by BLASTP (Camacho et al., 2009) (e-value cutoff of 1e-5) against non-redundant protein sequence (NR) (Aron et al., 2011), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Ogata et al., 1999), Swiss-Prot (Boeckmann et al., 2003), KOG and TrEMBL (Boeckmann et al., 2003) databases.

Noncoding RNA prediction

For noncoding RNA annotations, the whole C. longa genome was scanned for microRNAs and rRNAs using BLASTN based on the Rfam (Kalvari et al., 2018) database. tRNAs were predicted by tRNAscan-SE (Lowe and Eddy, 1997).

Comparative genomics analyses

The protein-coding genes from C. longa and eight other species (Arabidopsis thaliana, Oryza sativa, Phoenix dactylifera, Zea mays, Ensete glaucum, Musa acuminate, Musa balbisiana, Zingiber officinale) were clustered into orthologous groups using OrthoFinder v2.3.7 (Emms and Kelly, 2019). The longest protein sequences were employed to perform all-against-all comparisons using Diamond (E-value ≤ 1e−5). The single-copy protein sequences were aligned by MAFFT (Katoh et al., 2002), and positions showing poor alignment were eliminated with Gblocks v0.91b (Castresana, 2000). IQ-TREE v2.0.7 (Nguyen et al., 2015) constructed the phylogenetic trees with JTT+G+F model. The bootstrap support values were calculated on 1000 replicates. MCMCTREE (Yang, 2007) was applied to calculate divergence time of above species with parameters (burnin=5,000,000, nsample=5,000,000 and sampfreq=30). Two calibration points from the TimeTree database (http://www.timetree.org/) were selected as normal priors to constrain the age of the nodes, such as published divergence times for M. acuminate - E. glaucum (~ 56–69 Mya), O. sativa - Z. mays (~ 42–52 Mya). CAFE5 (Mendes et al., 2021) was used to calculate the expansion and contraction of OrthoFinder-derived gene clusters based on changes in gene family size. Functional enrichment analysis of these expanded and contracted genes in C. longa genome was performed using the clusterProfiler R package (Yu et al., 2012b).

Genome synteny and synonymous substitution rates (Ks) analysis

All-against-all comparisons BLASTP (E-value ≤ 1e−5) analyses of proteins were conducted between and within the three species (C. longa, Z. officinale and M. acuminate). The distribution of the synonymous substitution rate (Ks) among orthologs (between C. longa-Z. officinale, C. longa-M. acuminate and Z. officinale -M. acuminate) and paralogs (within the C. longa, Z. officinale and M. acuminate genome) revealed species differentiation and whole-genome duplication events. MCScanX (Wang et al., 2012) was used to identify syntenic blocks using BLASTP results and genome annotation files for each species. The Ks values of ortholog pairs or paralog pairs were plotted using the WGDI (https://pypi.org/project/wgdi/), where the Ks distribution was used to evaluate the WGD events, and the divergence time was calculated using the formula T = Ks/2r, where r is the neutral substitution rate.

Identification and analysis of polyketide synthases (PKS) genes

We selected three species (Z. officinale, M. acuminate and O. sativa) for comparative analysis of the PKS gene family together with turmeric. Using HMMER (Eddy, 2009) with the Pfam protein family database (http://pfam.xfam.org/), and genes containing the Chal_sti_synt_N (PF00195) domain and Chal_sti_synt_C (PF02797) domain (E-value ≤ 1e−5) were considered as candidate PKSs. The putative protein sequences of all PKS genes were aligned using MAFFT and then a phylogenetic tree was constructed using IQTREE with 1000 bootstrap replicates.

UPLC-MS/MS analysis of curcumin

Turmeric rhizomes and tubers were collected at maturity and detected for curcumin using a Waters ACQUITY UPLC coupled with a XEVO QTOF mass spectrometer (Waters Corporation, Milford, MA, USA). The analytical conditions were as follows: solvent system of acetonitrile(A), water (0.1% formic acid, B); 0–10min, 5%–15% A, then 10–20 min, 15%–30% B, and 20–30 min, 30%–70% B; flow rate, 0.2 mL/min; temperature, 35°C; injection volume, 1 μL; scanning mode, positive and negative ion mode.

Data availability statement

The data presented in the study are deposited in the China National GeneBank DataBase (CNGBdb) https://db.cngb.org/, accession number CNP0003456.

Author contributions

CP and JG designed the project. LZ, XZ, and QL collected plant materials. YY, XS, and XY conducted experiments. YY and SG analyzed data and conducted bioinformatic analysis. YY, XX, and JG wrote and modified the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by National Interdisciplinary Innovation Team of Traditional Chinese Medicine (ZYYCXTD-D-202209), Major increase and decrease of expenditure at the central level (2060302), Talent Project in Chengdu University of TCM (QNXZ2018017, QNXZ2019001), Multi-dimensional evaluation of characteristic traditional Chinese medicine resources and product development innovation team (2022C001), and Science and Technology Planning Project of Sichuan Provincial Department of Science and Technology (2020YFN0152, 22CXTD0009).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2022.1003835/full#supplementary-material

References

Abelenda, J. A., Prat, S. (2013). Cytokinins: Determinants of sink storage ability. Curr. biol.: CB 23, R561–R563. doi: 10.1016/j.cub.2013.05.020