Analysis of genetic and chemical variability of five Curcuma species based on DNA barcoding and HPLC fingerprints

The rhizomes of Curcuma species have a long medicinal history in Asia. In China, the Curcuma species mainly used to make pharmaceutical products include C. phaeocaulis, C. aromatica, C. wenyujin, C. kwangsiensis and C. longa. In this study, twenty-four samples were selected to study the genetic and chemical variability among these five Curcuma species. The ITS2 and trnK intron gene fragments were used to identify the five Curcuma species; the differences in chemical composition were computed as Euclidean distances based on HPLC characteristic peak areas and the contents of six key components; and agronomic characteristics, including morphological and volatile oil characteristics, were analyzed. The ITS2 and trnK intron gene fragments could distinguish the five Curcuma species clearly. The genetic distance between Curcuma species ranged from 0.0085 to 0.0767 based on ITS2 gene sequences with 32 variation sites, and from 0.0003 to 0.0194 based on trnK intron gene sequences with 39 variation sites. The five Curcuma species showed distinct chemical composition characteristics, with Euclidean distances ranging from 3.373 to 6.998. C. longa showed the greatest variation compared with the other species, with Euclidean distances above 6.239. Among the samples of the original plants of Ezhu, W1 had the highest volatile oil yield, reaching 105.75 mL per single plant. Among all the samples, J6 showed the highest volatile oil yield, reaching 149.42 mL per single plant. The results showed that similarity in chemical composition was the primary basis for selecting the original plants of Curcuma medicinal materials. Genetic distance and chemical variability are important references for discovering new medicinal plant resources.


Introduction
There are many species of Curcuma, with at least 120 different species worldwide (Zaveská et al., 2012). Curcuma species are widely distributed in 17 provinces of China, among which C. longa L., C. aromatica Salisb, C. phaeocaulis Val., C. wenyujin Y. H. Chen et C. Ling and C. kwangsiensis S. G. Lee et C. F. Liang are most commonly used (Zhang et al., 2018). The Chinese Pharmacopoeia records three Chinese medicinal materials derived from Curcuma rhizomes, namely Jianghuang (CURCUMAE LONGAE RHIZOMA), Pianjianghuang (WENYUJIN RHIZOMA CONCISUM) and Ezhu (CURCUMAE RHIZOMA), with the effect of clearing veins and reducing pain. The original plants of Jianghuang and Pianjianghuang are C. longa and C. wenyujin, respectively, while the original plants of Ezhu are C. wenyujin, C. kwangsiensis, and C. phaeocaulis. Volatile oils obtained from the dried rhizomes of Curcuma species have also shown pharmacological effects, including anti-cancer, anti-inflammatory and other properties (Guo et al., 2013; Zhang et al., 2017; Wu et al., 2021). Ezhu You, the volatile oil derived only from the rhizomes of C. wenyujin, is also recorded in the Chinese Pharmacopoeia.
DNA barcoding is a method that can effectively identify species from the information in short DNA fragments. Ribosomal DNA (rDNA), including gene fragments such as ITS1 and ITS2, is commonly used to identify species. In the rDNA region, the internal transcribed spacer (ITS), a highly variable non-coding region, provides more informative sites for phylogenetic analysis and can distinguish Curcuma species (Pedersen, 2004). Compared to ITS1, ITS2 has lower length variation and more universal primer sites, which can better elucidate the genetic relationship between species (Tedersoo et al., 2015; Nilsson et al., 2019). Chloroplast DNA (cpDNA), including gene fragments such as trnH-psbA, matK, and the trnK intron, is also commonly used to identify species (Minami et al., 2009; Zaveská et al., 2012; Vinitha et al., 2014; Chen, 2015; Liu et al., 2022). Several studies have demonstrated that trnK intron gene segments are effective at identifying Curcuma species (Duan et al., 2017; Liu et al., 2022). Therefore, we chose the trnK intron and ITS2 gene fragments to evaluate the genetic distance between different Curcuma species.
In this study, ITS2 and trnK intron gene sequences were used to analyze the genetic distance between different Curcuma species. To examine the variation in chemical composition of the rhizomes of the five Curcuma species, we established HPLC fingerprints and measured the contents of 6 key components in the samples. The relationship between genetic distance and chemical composition variation was then analyzed. Finally, major morphological and volatile oil characteristics were examined, which could provide guidance for the Curcuma medicinal industry.

Experimental materials
A total of 24 experimental materials were collected from major producing areas in China, including 5 samples of C. phaeocaulis, 6 samples of C. aromatica, 4 samples of C. wenyujin, 3 samples of C. kwangsiensis, and 6 samples of C. longa (Table 1). The rhizomes of 22 samples (all except G2 and G3) were harvested and dried in December in Hangzhou (latitude: 30.36784°; longitude: 119.97554°) after a growth period of one year. In addition, two fresh C. kwangsiensis samples (G2 and G3) were collected from Guangxi in June 2023 to study genetic distance, while the corresponding dry rhizomes, harvested and dried in December 2022, were collected to study the chemical composition.

Morphological identification of plants
The plant height, stem color, leaf sheath color, leaf epidermal hair on the front and back sides of the leaf, midrib characteristics, rhizome dry weight per single plant, the weight ratio of primary rhizome to secondary rhizome, rhizome inner section color and other agronomic characteristics were recorded.

Extraction of volatile oil
The volatile oil was extracted according to volatile oil determination method A (general rule 2204) of the 2020 edition of the Chinese Pharmacopoeia. Dried and ground herbs (10.00 g), 500 mL of water and zeolite were added to a 1000 mL round-bottom flask, which was connected to the volatile oil tester and the reflux condenser. Water was added from the upper end of the condenser to fill the graduated part of the volatile oil tester. The flask was slowly heated to boiling in an electric heating mantle and kept gently boiling for 5 hours, until the amount of oil in the tester no longer increased, and heating was then stopped. After the apparatus had cooled to room temperature, the volatile oil volume was read and the volatile oil content of the test samples was calculated.
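The production rate and per-plant yield defined in the Data analysis section (production rate = oil volume / powder weight; yield = production rate × rhizome dry weight per plant) can be sketched as follows. This is a minimal illustration only: the function names and the input readings are hypothetical and are not data from this study.

```python
def production_rate(oil_volume_ml: float, powder_weight_g: float) -> float:
    """Volatile oil production rate in mL per g of dried powder."""
    if powder_weight_g <= 0:
        raise ValueError("powder weight must be positive")
    return oil_volume_ml / powder_weight_g


def yield_per_plant(rate_ml_per_g: float, rhizome_dry_weight_g: float) -> float:
    """Volatile oil yield in mL per single plant."""
    return rate_ml_per_g * rhizome_dry_weight_g


# Hypothetical readings (assumed values, not measurements from the paper).
rate = production_rate(oil_volume_ml=0.5, powder_weight_g=10.0)
plant_yield = yield_per_plant(rate, rhizome_dry_weight_g=200.0)
print(f"rate = {rate:.3f} mL/g, yield = {plant_yield:.1f} mL per plant")
```

Keeping the two steps separate mirrors the definitions in the text, so the same production rate can be reused with the rhizome dry weight of different plants.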

HPLC analysis
The test solutions were prepared by ultrasonically extracting 0.5 g of dried and ground herbs in 10 mL of methanol at room temperature for 30 minutes. The extract was cooled and then made up to its initial weight with methanol. For the standard solutions, curdione (6.50 mg), curcumenol (5.04 mg), germacrone (4.14 mg), curzerene (7.58 mg), furanodienon (6.74 mg) and beta-elemene (12.03 mg) were each placed in a 10 mL volumetric flask and dissolved in methanol as stock solutions. All test and standard solutions were filtered through a 0.45 μm filter before HPLC analysis.

Data analysis
ContigExpress software was used to assemble the bidirectional sequencing chromatograms and to delete the weak or overlapping peak regions at both ends in order to obtain the DNA sequences. Standard sequences were retrieved and downloaded from GenBank. The reference chromatogram was generated using the Similarity Evaluation System for Chromatographic Fingerprint of TCM (Version 2012). Origin Pro 2022 software was used to perform principal component analysis (PCA) and hierarchical clustering analysis (HCA). HCA was applied using the Heat-mapper plug-in with the between-groups linkage method, and the squared Euclidean distance was used as the measure of sample similarity. PCA used the Principal Component Analysis plug-in for unsupervised pattern recognition. IBM SPSS Statistics 26.0 software was used to standardize the data of the 17 HPLC characteristic peak areas and the contents of the 6 key components of the 24 samples, and the Euclidean distance between different Curcuma species was calculated using the standardized data. Spearman correlation analysis of the contents of the 6 key components in the five Curcuma species was performed using IBM SPSS Statistics 26.0 software, and the correlation diagram was generated with the website https://www.chiplot.online/. One-way ANOVA was also performed using IBM SPSS Statistics 26.0 software on the volatile oil production rate and yield of the samples (Production rate = Volatile oil volume (mL)/Powder weight (g); Yield = Production rate × Rhizome dry weight per single plant (g)).
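The standardization and Euclidean distance steps described above can be illustrated with a short sketch. It assumes z-score standardization per variable using the sample standard deviation, and it works at the sample level because the text does not state how samples were aggregated into species-level distances. The peak areas below are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, mu, sd in zip(row, mus, sds)]
        for row in rows
    ]


def euclidean(u, v):
    """Euclidean distance between two equally long numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(rows):
    """Symmetric matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    return [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical peak areas: rows are samples, columns are characteristic peaks.
samples = [
    [120.0, 15.0, 3.2],
    [110.0, 18.0, 2.9],
    [40.0, 60.0, 9.5],
    [35.0, 66.0, 10.1],
]
z = zscore_columns(samples)
d = distance_matrix(z)
for row in d:
    print(" ".join(f"{value:6.3f}" for value in row))
```

Standardizing first keeps peaks with large absolute areas from dominating the distance, which is presumably why the data were standardized before the distances were computed.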
We combined the ITS2 and trnK intron sequences and constructed a p-distance matrix, designated Matrix A. Matrix B and Matrix C were Euclidean distance matrices between samples, built from the standardized data of the 17 HPLC characteristic peak areas and the contents of the 6 chemical components, respectively (Table S4, Table S5 and Table S6). A Mantel test based on the Pearson correlation coefficient with 999 permutations was performed on the p-distance matrix and each Euclidean distance matrix using the mantel function of the vegan package in R to assess how well the chemical composition matched the genetic background.
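The analysis itself was run in R. The logic of a permutation-based Mantel test can nevertheless be sketched in plain Python as below. This is an illustrative re-implementation under stated assumptions, not the vegan code: the function names, the fixed seed and the two toy distance matrices are hypothetical, and the p-value uses the common (hits + 1)/(permutations + 1) convention.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the strict upper triangle, optionally after reordering samples."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=0):
    """One-sided Mantel test: permute the sample order of matrix a."""
    rng = random.Random(seed)
    flat_b = upper_triangle(b)
    r_obs = pearson(upper_triangle(a), flat_b)
    n = len(a)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(upper_triangle(a, order), flat_b) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


def line_distances(positions):
    """Toy distance matrix from points placed on a line."""
    return [[abs(p - q) for q in positions] for p in positions]


# Hypothetical matrices standing in for a genetic and a chemical distance matrix.
genetic = line_distances([0.0, 1.0, 2.0, 5.0, 9.0, 10.0])
chemical = line_distances([0.0, 1.2, 1.9, 5.5, 8.7, 10.4])
r, p = mantel(genetic, chemical)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent, so an ordinary correlation test on the flattened values would overstate significance.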

Agronomic characteristics of Curcuma species
In the germplasm resource garden, the samples were morphologically identified based on the aboveground characteristics of the Curcuma species (plant height, color of the leaf sheath, presence or absence of epidermal hair on the front and back surfaces of the leaf, and characteristics of the leaf midrib) and their underground characteristics (rhizome dry weight per single plant, weight ratio of primary rhizome to secondary rhizome, and color of the rhizome inner section). The biomass of C. phaeocaulis and C. kwangsiensis was higher than that of the other species. The secondary rhizomes of C. longa were more developed, while the primary rhizomes were smaller and less developed. C. wenyujin and C. aromatica were similar in shape, the main distinctions being that the rhizome section of the latter had a darker yellow hue and the back of its leaf was smooth and hairless, as shown in Table 1 and Figure 1. The morphological identification of the five Curcuma species in this study was consistent with previous reports (Kita et al., 2016).
In the production and application of Curcuma medicinal materials, the yield and production rate of volatile oil are important productivity indicators, and color is the most rapid and direct index for quality evaluation. The Chinese Pharmacopoeia also includes the extract Ezhu You (ZEDOARY TURMERIC OIL), which is the volatile oil derived from the rhizomes of C. wenyujin. The volatile oil production rate of the dried rhizomes of the five Curcuma species ranked, from high to low: C. longa > C. wenyujin > C. aromatica > C. kwangsiensis > C. phaeocaulis. Among the samples of the original plants of Ezhu, W1 had the highest volatile oil yield, reaching 105.75 mL per single plant. Among all the samples, J6 showed the highest volatile oil yield, reaching 149.42 mL per single plant (Figure 2, Figure S1). Meanwhile, the volatile oils derived from different species showed distinguishable colors, which were translucent light

DNA barcode result analysis

ITS2 gene sequences
PCR amplification of the ITS2 fragments of all samples was performed using primers ITS2ZF/ITS8ZR, and the products were successfully sequenced, resulting in 24 high-quality ITS2 fragment sequences. Compared with the standard sequences, the ITS2 sequences were between 225 and 239 base pairs long (Figure 3); the shortest sequences, G1-G3, were 225 base pairs and the longest, J1, was 239 base pairs. The samples showed 0-32 differential loci in the ITS2 region (including 7 singleton loci and 9 continuous loci). When C. wenyujin samples W1-W4 were set as reference sequences, the number of differential sites in C. phaeocaulis, C. aromatica, C. longa, and C. kwangsiensis was 0-6, 1-7, 7-11 and 15, respectively. There was no difference in the ITS2 region between samples P2-P5 and W1-W4. Fewer differences existed between Y1-Y6 and W1-W4, whereas G1-G3 differed markedly from W1-W4.
We constructed a Neighbour-joining tree based on the ITS2 gene fragments of the 24 sample sequences and 11 standard sequences to examine the genetic distance between samples (Figure 3). Alpinia officinarum Hance, used as the outgroup, showed a considerably greater genetic distance from the samples. G1-G3, J1-J6, and Y1-Y6 each clustered into one group and were identified as C. kwangsiensis, C. longa and C. aromatica, respectively. W1-W4 and P1-P5 were grouped together and identified as C. wenyujin and C. phaeocaulis. The intraspecific genetic distance ranged from 0.000 to 0.0230, the interspecific genetic distance ranged from 0.0085 to 0.0767, and the average distance was 0.03257. These results indicated that although ITS2 contained enough genetic variation to discriminate most Curcuma species, it was insufficient to distinguish between C. wenyujin and C. phaeocaulis.
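As an illustration of how a pairwise genetic distance can be derived from aligned sequences, the sketch below computes the p-distance (the proportion of differing sites), the measure named for Matrix A in the Data analysis section. Two points are assumptions rather than details from the paper: sites with a gap in either sequence are skipped, and the two short fragments are hypothetical rather than real ITS2 sequences.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip sites where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned fragments (not sequences from this study).
ref = "ACGTACGTAC"
alt = "ACGTTCGT-C"
dist = p_distance(ref, alt)
print(f"p-distance = {dist:.4f}")
```

Here one of the nine comparable sites differs, so the distance is 1/9. Values on the order of 0.01 to 0.08, as reported for ITS2, correspond to a few differing sites per hundred aligned positions.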

TrnK intron gene sequences
According to the description of the trnK intron gene structure of Curcuma species in the literature (Cao et al., 2002; Komatsu et al., 2008; Kita et al., 2016), the sample sequences were divided into 5 categories (K (gl) Wtk, Atk, K (pl) Ztk, Ptk, Ltk), with a run of 10-14 consecutive thymines at locus 501 (Figure 4). For the first time, we found the Ltk (12T) structure in C. longa (J2) and the K (pl) Ztk (14T) structure in C. phaeocaulis (P1). C. aromatica had one or two base substitutions at 146, 645, 2493 and 2584, an 8 bp fragment deletion at 712 and a 14 bp fragment insertion at 747. Samples P2-P5 had a 4 bp fragment insertion at 728. The sequence alignment results showed that the trnK intron gene regions of the five Curcuma species were highly conserved, and only single-base substitutions and small fragment deletions or insertions existed among different samples.
The Neighbour-joining tree was constructed with the sample sequences and the standard sequences (Figure 4). The average genetic distance was 0.0078, the intraspecific genetic distance ranged from 0.0000 to 0.0028 and the interspecific genetic distance ranged from 0.0003 to 0.0194. C. aromatica and C. wenyujin were divided into separate branches, whereas C. phaeocaulis, C. kwangsiensis, and C. longa were placed together.

FIGURE 2
The volatile oil characteristics of five Curcuma species (n = 3). Symbol (*) denotes P < 0.05; symbol (***) denotes P < 0.001.

HPLC fingerprints
The five Curcuma species could be distinguished using the HPLC fingerprints; however, individual samples could not be distinguished with absolute precision (Figure 5). We attempted to identify the five Curcuma species using HPLC and examined the chemical variations among them. The results of HCA and PCA revealed that the chemical compositions of the rhizomes of the five Curcuma species were different (Figure 6). The HPLC fingerprints of C. wenyujin, C. aromatica, C. phaeocaulis, C. kwangsiensis and C. longa showed 10, 6, 3, 3 and 8 distinctive peaks with larger peak areas, respectively. The HPLC fingerprints differed considerably among the samples, with similarity ranging from 0.211 to 0.999. The differences among samples of the same species were relatively small, with similarity greater than 0.599 (Table S1). To identify species with similar chemical composition, we used the 17 characteristic peak areas of the HPLC fingerprints to analyze the Euclidean distance among the five Curcuma species (Table S2).

FIGURE 5
HPLC fingerprints of five Curcuma species. Peak 6 is curdione; peak 8 is furanodienon; peak 9 is curcumenol; peak 10 is germacrone; peak 14 is curzerene; peak 16 is beta-elemene.

Chemical composition analysis
We quantified six compounds in the samples (curdione, curcumenol, germacrone, curzerene, furanodienon and beta-elemene) in order to further investigate the material basis of the distinct chemical compositions across Curcuma species (Table S3, Figure 7). The contents of curzerene (0.7883-1.6192 mg/g) and curdione (8.4492-10.2745 mg/g) were higher in C. wenyujin. The contents of curdione (6.8979-12.1390 mg/g) and curcumenol (1.0091-1.4891 mg/g) were higher in C. aromatica. Furanodienon was abundant in C. phaeocaulis (0.8945-1.9348 mg/g), and beta-elemene was abundant in C. longa (2.4254-5.1776 mg/g). HCA and PCA results revealed that although the Curcuma species contained high levels of the six chemicals, it was challenging to distinguish among the species on the basis of these results (Figure 6).

Metabolic pathways of terpenoids
Spearman correlation analysis was performed on the six components. Significant positive correlations were found between curdione and curcumenol and between curzerene and beta-elemene. Furanodienon had a significant negative correlation with beta-elemene. The contents of germacrone and beta-elemene were positively correlated, with a correlation coefficient of 0.314 (Figure 8). Under certain conditions, curdione can be converted into curcumenol (Shiobara et al., 1985), and germacrone is an upstream compound in beta-elemene synthesis (Barrero et al., 2011), which explains the correlations between curdione and curcumenol and between germacrone and beta-elemene (Figure 8). However, there was little evidence to support a relationship between the biosynthetic processes of the other components. Our findings suggest that, among the Curcuma species, curzerene and beta-elemene might be upstream and downstream products in the same synthetic pathway, and that furanodienon might compete with beta-elemene for the same upstream precursor.
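For readers unfamiliar with the statistic, the Spearman coefficient is the Pearson correlation of the ranks of two variables. The sketch below is a plain-Python illustration with average ranks for ties; the analysis in this study was run in SPSS, and the function names and content values here are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical contents (mg/g) of two compounds across five samples.
compound_a = [2.0, 4.5, 1.0, 6.0, 3.0]
compound_b = [0.2, 0.5, 0.1, 0.9, 0.3]
rho = spearman(compound_a, compound_b)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks are used, the coefficient captures any monotonic relationship between two compound contents, not just a linear one, which suits contents that span different scales.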

Relationship between genetic distance and chemical variation
The Mantel test is a method for comparing the sample distances in two distance matrices. In the current study, we employed the Mantel test to investigate the relationship between the chemical composition of the 24 samples and their degree of genetic variation (Duan et al., 2017). The Mantel test function based on Pearson's coefficient in R was used to test the correspondence between the p-distance matrix and the Euclidean distance matrices. The results showed that Matrix A and Matrix B were weakly positively correlated (r = 0.2994, p = 0.004), and Matrix A and Matrix C were also weakly positively correlated (r = 0.2789, p = 0.005), indicating a weak correspondence between genetic and chemical variability (Duan et al., 2017).

FIGURE 7
Comparison chart of the contents of 6 key components in five Curcuma species. Different lowercase letters represent significant differences.

(Park et al., 2017; Feng et al., 2020). In this study, we found that the ITS2 and trnK intron gene fragments could distinguish the five Curcuma species very well. Similar to previous studies, C. longa showed a relatively greater genetic distance from the other Curcuma species, whereas C. wenyujin, C. aromatica, C. kwangsiensis, and C. phaeocaulis were genetically closer to each other (Kita et al., 2016; Deng et al., 2018). However, the correlation between chemical variability and genetic distance based on a limited number of gene sequences was relatively low. Limited sequences from a few genes cannot fully reflect genetic distance and may not precisely predict the relationship between genetic distance and chemical variation. Therefore, more molecular markers or whole-genome information could be applied to analyze the genetic distance between Curcuma species and to further explore the correlation between genetic and chemical variability. The potential pharmacodynamics of new species may be predicted by analyzing the genetic distance between species of the same genus and known medicinal plants.

Differential effect based on chemical composition variation of different Curcuma medicinal materials
Numerous bioactive ingredients in herbs are the material basis of their pharmacological effects. However, evaluating the efficacy of medicinal herbs is very difficult because of their complex chemical composition and the interactions among components. For instance, furanodienon, beta-elemene, curdione, and germacrone can all act on many cancer cell targets and have anti-cancer effects (Zeng JH. et al., 2012; Zhong et al., 2014; Huang et al., 2017), and breast cancer cell proliferation is significantly reduced when curdione, germacrone, and furanodienon are combined (Kong et al., 2013). Modern pharmacological studies have found that Curcuma rhizome extracts have effective anti-cancer properties. However, different Curcuma species act on different targets in different cancer models. For example, C. wenyujin can increase apoptosis in hepatoma cells by increasing the expression of the pro-apoptotic genes Bid and Bax and decreasing the expression of the anti-apoptotic gene Bcl2 (Liu et al., 2019). Similarly, C. kwangsiensis can induce apoptosis of nasopharyngeal cancer cells by reducing the expression of Bcl2 and promoting the expression of p53 (Zeng J. et al., 2012). Furthermore, C. phaeocaulis can inhibit the growth of liver cancer cells by inhibiting STAT3 activity (Dong et al., 2018), and C. longa acts as a PARP inhibitor to induce apoptosis in cervical cancer cells (Li et al., 2014). According to the results of the current study, the chemical compositions of C. wenyujin and C. aromatica, and of C. phaeocaulis and C. kwangsiensis, were the most similar, with Euclidean distances of 3.373 and 5.209, respectively. The difference in chemical composition between C. longa and the other species was the largest, with Euclidean distances above 6.239. Therefore, we hypothesized that chemical composition was the main factor in the selection of the original plants of a medicinal material, and that plants with similar compositions therefore share similar pharmacological effects.

Conclusion
The original plants of each medicinal material recorded in the Chinese Pharmacopoeia have been clearly specified, which is the primary guarantee of safety and effectiveness in application. Only a limited number of medicinal plants have been recorded in the Chinese Pharmacopoeia, and many medicinal plants of the same genera have not been fully studied. Determining whether a plant has potential efficacy similar to that of a known herb is the first problem to be solved in discovering new medicinal plant resources. According to this research, genetic distance data could provide some reference clues for finding new medicinal plant resources, while the assignment of different original plants to Jianghuang and Ezhu, both derived from Curcuma species, is based mainly on differences in chemical composition. The Euclidean distance between the chemical compositions of C. aromatica and C. wenyujin is 3.373, which is much lower than that between C. wenyujin and C. phaeocaulis (5.332). Therefore, C. aromatica has the potential to be an original plant for Ezhu.

FIGURE 4
TrnK intron gene sequences and genetic relationship among five Curcuma species. Hyphens (-) denote alignment gaps.

FIGURE 6
HCA and PCA analysis based on chemical composition. (A) HCA analysis based on the data of 17 characteristic peak areas; (B) PCA score based on the data of 17 characteristic peak areas; (C) HCA analysis based on the data of 6 chemicals; (D) PCA score based on the data of 6 chemicals.

FIGURE 8
Diagrams of the relationship among the six components of the five Curcuma species. Symbol (*) denotes P < 0.05; symbol (**) denotes P < 0.01.

TABLE 1
The agronomic traits of experimental samples (n = 3).

Our survey showed that a number of Curcuma species are utilized in the manufacture of medicines in China. The four species of Curcuma listed in the Chinese Pharmacopoeia are C. wenyujin, C. longa, C. kwangsiensis, and C. phaeocaulis. C. wenyujin is treated as a variety of C. aromatica Salisb in Flora of China, and C. aromatica Salisb is also used in large quantities in production, so we included it as one of the research objects. Several medicinal materials recorded in the Chinese Pharmacopoeia originate from different plant species. For example, Huangjing (POLYGONATI RHIZOMA) originates from three species: Polygonatum kingianum Coll. et Hemsl., Polygonatum sibiricum Red and Polygonatum cyrtonema Hua. Studies have revealed that the genetic distance among the original plants of Huangjing is much smaller than that among other Polygonatum species

Analysis of genetic distance between Curcuma species