Identification of Chemotypic Markers in Three Chemotype Categories of Cannabis Using Secondary Metabolites Profiled in Inflorescences, Leaves, Stem Bark, and Roots

Previous chemotaxonomic studies of cannabis focused only on tetrahydrocannabinol (THC) dominant strains and excluded cannabidiol (CBD) dominant strains and intermediate strains (THC ≈ CBD). This study investigated the utility of the full spectrum of secondary metabolites in different plant parts of three cannabis chemotypes (THC dominant, intermediate, and CBD dominant) for chemotaxonomic discrimination. Hierarchical clustering, principal component analysis (PCA), and canonical correlation analysis assigned 21 cannabis varieties to three chemotypes using the content and ratios of cannabinoids, terpenoids, flavonoids, sterols, and triterpenoids across inflorescences, leaves, stem bark, and roots. The same clustering results were obtained when THC and CBD were omitted from the secondary metabolites. Significant chemical differences were identified among the three chemotypes. Cannabinoids, terpenoids, and flavonoids had differentiating power, whereas sterols and triterpenoids had none. CBD dominant strains had higher amounts of total CBD, cannabidivarin (CBDV), cannabichromene (CBC), α-pinene, β-myrcene, (−)-guaiol, β-eudesmol, α-eudesmol, α-bisabolol, orientin, vitexin, and isovitexin, while THC dominant strains had higher amounts of total THC, total tetrahydrocannabivarin (THCV), total cannabigerol (CBG), camphene, limonene, ocimene, sabinene hydrate, terpinolene, linalool, fenchol, α-terpineol, β-caryophyllene, trans-β-farnesene, α-humulene, trans-nerolidol, quercetin, and kaempferol. Compound levels in intermediate strains were generally equal to, or between, those in CBD dominant and THC dominant strains. Overall, with higher amounts of β-myrcene, (−)-guaiol, β-eudesmol, α-eudesmol, and α-bisabolol, intermediate strains resemble CBD dominant strains more closely than THC dominant strains. The results of this study provide a comprehensive profile of bioactive compounds in the three chemotypes for medical purposes. 
The simultaneous presence of most of the identified chemotypic markers (with or without THC and CBD) could serve as a chemical fingerprint for quality standardization or strain identification in research, clinical studies, and cannabis product manufacturing.


INTRODUCTION
Cannabis is a complex herbal medicine containing several classes of secondary metabolites, including cannabinoids, terpenoids, flavonoids, and steroids, among 545 identified compounds (Turner et al., 1980; ElSohly and Slade, 2005; Ross et al., 2005; ElSohly and Gul, 2014; Russo and Marcu, 2017; Pollastro et al., 2018; Jin et al., 2020). For medical applications, researchers widely adopt a chemotaxonomic perspective that describes three chemotypes (chemical phenotypes) based on the content of two major cannabinoids: psychoactive tetrahydrocannabinol (THC) and non-psychoactive cannabidiol (CBD) (Small and Beckstead, 1973; Turner et al., 1979; Mandolino et al., 2003; de Meijer et al., 2009). THC dominant strains have a THC/CBD ratio > 1, intermediate strains have THC/CBD ≈ 1, and CBD dominant strains have THC/CBD < 1. Although most clinical studies focus on THC and CBD, a growing body of evidence shows that whole-plant extracts have additional benefits compared with single cannabinoids. In one study, whole cannabis extract was more effective than pure THC in inducing cell death in cancer cell lines (Baram et al., 2019). In addition, individual cannabis extracts with similar amounts of THC had significantly different effects on the survival of specific cancer cells, and specific extracts may selectively and differentially affect different cancer cell lines (Baram et al., 2019). In another study, extracts from five strains with similar CBD concentrations had different anticonvulsant properties in mice (Berman et al., 2018). These studies suggest that there may be therapeutic-enhancing or synergistic interactions among cannabinoids, as well as between cannabinoids and other secondary metabolites, known as the "entourage effect" (McPartland and Russo, 2001; Russo, 2011; Blasco-Benito et al., 2018). 
It is therefore essential to have comprehensive, full-spectrum metabolic fingerprints of the secondary metabolites in cannabis materials for research and clinical studies. Previous research has also focused on female inflorescences; however, in the ancient herbal medicine of various cultures, each part of the plant has been used for a wide range of indications, primarily related to pain and inflammation (Smith and Stuart, 1911; Brand and Wiseman, 2008; Brand and Zhao, 2017; Ryz et al., 2017). Our previous study profiled cannabinoids, terpenoids, flavonoids, sterols, and triterpenoids not only in cannabis inflorescences but also in leaves, stem bark, and roots (Jin et al., 2020). Profiling these compounds in each plant part and associating them with therapeutic benefits could allow cannabis plant material that is currently treated as waste to be developed into natural health products or medications.
Cannabis classification is a fundamental requirement for future medical research and applications, and it is best enabled by an overview of the class and content of potentially therapeutic secondary metabolites in each plant part. Researchers have attempted to discriminate and identify the chemical differences between the categories "Sativa" (narrow-leaflet drug, NLD) and "Indica" (wide-leaflet drug, WLD) (Fischedick et al., 2010; Hazekamp and Fischedick, 2012; Hazekamp et al., 2016). Results of the chemotaxonomic separation of "Sativa" and "Indica" were mixed, and THC and CBD concentrations appeared to have no differentiating value. However, certain terpenoids were more prominent in some strains than in others (Hillig, 2005b; Fischedick et al., 2010; Hazekamp and Fischedick, 2012; Fischedick, 2015, 2017; Hazekamp et al., 2016; Jin et al., 2017; McPartland and Guy, 2017). The mixed results in the current literature may be due to shortcomings in experimental design. First, the vernacular terminology ("Sativa" and "Indica") is inadequate for medical applications because of the misuse of botanical nomenclature, extensive cross-breeding, and unreliable labeling during unrecorded hybridization (McPartland, 2017). Second, samples in most classification studies were collected from disparate sources (Fischedick et al., 2010; Hazekamp et al., 2016) and were subject to inconsistent environmental factors during growth (Aizpurua-Olaizola, 2016) and post-harvest treatment (Jin et al., 2019). Additionally, inappropriate sample preparation and extraction procedures during laboratory analysis may affect classification results (Jin et al., 2020). All these factors contribute to variation in the chemical profiles of the final products, which in turn leads to inconsistent results and poor classification accuracy. 
More accurate classification results are obtainable when plants are grown in a single location, under identical environmental conditions, and uniformly processed (McPartland, 2017).
The chemical profiles of CBD dominant and intermediate strains, which have gained increasing attention because of CBD's therapeutic use (Avraham et al., 2011; French et al., 2017; McGuire et al., 2018; Bloomfield et al., 2020), have not been characterized or compared with those of THC dominant strains in the current literature. In this study, we used unsupervised hierarchical clustering and PCA, as well as supervised canonical correlation analysis, to test the goodness of fit between chemotype labels (THC dominant, intermediate, and CBD dominant) and the chemotypic variation of the full spectrum of secondary metabolites in various plant parts of 21 strains. This study also identifies chemotypic markers for each chemotype, which will facilitate strain selection for further clinical and research studies.
The objectives of this study are to: 1. investigate whether modern cannabis strains can be differentiated into three chemotypes using a full spectrum of secondary metabolites, including 14 cannabinoids, 45 terpenoids, 7 flavonoids, 3 sterols, and 3 triterpenoids, in inflorescences, leaves, stem bark, and roots; 2. investigate whether these secondary metabolites can differentiate strains into the three chemotypes without THC and CBD data; and 3. identify chemotypic markers that can be used to select and distinguish chemotypes.

Plant Material
In this project, 21 commercially available cannabis strains were grown in a commercial greenhouse (Figure 1) under a cannabis research license issued by Health Canada. Where possible, the reported ancestry ("Sativa-dominant," "Indica-dominant," or "hybrid") was obtained from the Leafly online database or from the licensed cultivator providing the strain (Supplementary Table 1). Three to five cuttings per strain were rooted for 2 weeks, grown vegetatively under a 24 h photoperiod for 2 months, and then flowered under a 12 h photoperiod. After 2 months of flowering, the plants were harvested and hung to dry in a closed environment. Cannabis roots were removed and dried in the same room together with the other plant parts. Horticultural fans were used to maintain air circulation, and the temperature was kept below 35 °C. The plants were dried for 7 days until the leaves and stems became brittle. At this point, the plants' moisture content is usually below 10-15% (mg/mg%) (Potter, 2009; Caplan, 2018).

Sample Preparation, Extraction, and Assay
A total of 82 plants representing 21 strains were harvested. Inflorescences, leaves (fan leaves), stem bark, and roots were collected separately from each plant and analyzed for the full spectrum of secondary metabolites. Sugar leaves (small leaves extending from the inflorescences) were treated as part of the inflorescences. Samples were prepared and analyzed according to previously developed and validated methods (Jin et al., 2020). Five to eight flower heads (2-4 g) from each plant were pulverized with a SPEX Geno/Grinder homogenizer (SPEX SamplePrep, Canada). Dried leaf material was crushed with a mortar and pestle and sifted through a 1.18 mm sieve. Dried stem bark and root samples were ground with the SPEX Geno/Grinder homogenizer. For cannabinoid and terpenoid extraction, 400 mg of plant material was extracted with 20 mL of methanol (containing 100 µg/mL tridecane as an internal standard for mono- and sesquiterpenoids) by sonication for 20 min at room temperature. For cannabinoids, the extract was spiked with Δ9-THC-d3 (0.5 µg/mL) as an internal standard prior to LC-MS analysis. One aliquot of the extract was used to quantify mono- and sesquiterpenoids by GC-MS. For flavonoid extraction, 250 mg of sample was extracted with 5 mL of ethanol, water, and hydrochloric acid at a 25:10:4 volume ratio. The extract was hydrolyzed in a 100 °C water bath for 135 min. The tube was then repeatedly rinsed with methanol, and the rinses were combined with the extract in a 50 mL volumetric flask, which was filled to volume with methanol. For the flavonoid assay, HPLC with a UV detector at 350 nm was used to quantify the seven flavonoids, and an MS detector was used for compound identification. For triterpenoid and sterol extraction, 1 g of dried sample was extracted with 20 mL of ethyl acetate by sonication for 1 h, followed by maceration for one day at room temperature. The extract was spiked with cholesterol (50 µg/mL) as an internal standard prior to GC-MS analysis.
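To illustrate the internal-standard step, the following Python sketch computes a single-point relative response factor (RRF), the analyte concentration in the extract, and the content as % (w/w) of dried material. It assumes a linear detector response, one calibration standard, and no dilution beyond the stated extraction volume; the function names, peak areas, and calibration values are hypothetical, and the validated method (Jin et al., 2020) may instead rely on multi-point calibration curves.

```python
def relative_response_factor(area_std, area_is_std, conc_std, conc_is):
    """RRF = (analyte/IS peak-area ratio) / (analyte/IS concentration ratio) in a standard."""
    return (area_std / area_is_std) / (conc_std / conc_is)


def analyte_concentration(area_analyte, area_is, conc_is, rrf):
    """Analyte concentration in the extract, in the same units as conc_is (e.g., µg/mL)."""
    return (area_analyte / area_is) * conc_is / rrf


def percent_w_w(conc_ug_per_ml, extract_volume_ml, sample_mg, dilution_factor=1.0):
    """Convert an extract concentration (µg/mL) into % (w/w) of the dried sample."""
    analyte_ug = conc_ug_per_ml * extract_volume_ml * dilution_factor
    sample_ug = sample_mg * 1000.0  # mg -> µg
    return analyte_ug / sample_ug * 100.0


if __name__ == "__main__":
    # Hypothetical terpenoid run: 400 mg sample in 20 mL methanol, tridecane IS at 100 µg/mL.
    rrf = relative_response_factor(area_std=2.0, area_is_std=1.0, conc_std=50.0, conc_is=100.0)
    conc = analyte_concentration(area_analyte=3.0, area_is=1.5, conc_is=100.0, rrf=rrf)
    print(rrf, conc, round(percent_w_w(conc, 20.0, 400.0), 4))  # 4.0 50.0 0.25
```

With 400 mg extracted into 20 mL, each 1 µg/mL in the extract corresponds to 0.005% (w/w), which is why the hypothetical 50 µg/mL reading maps to 0.25%.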

Statistical Analysis
In total, 82 plants representing 21 strains were included in the following analyses. Each cannabinoid total was calculated as the sum of its neutral form, metabolites (if applicable), and acid form (multiplied by a factor converting the acid into its corresponding neutral form). Specifically, total THC = Δ9-THC + Δ8-THC + CBN (cannabinol, a degradation product of THC) + 0.877 × tetrahydrocannabinolic acid (THCA); total CBD = CBD + 0.877 × cannabidiolic acid (CBDA); total cannabigerol (CBG) = CBG + 0.878 × cannabigerolic acid (CBGA); total cannabichromene (CBC) = CBC + 0.877 × cannabichromenic acid (CBCA); total tetrahydrocannabivarin (THCV) = THCV + 0.867 × tetrahydrocannabivarinic acid (THCVA); and total cannabidivarin (CBDV) = CBDV + 0.867 × cannabidivarinic acid (CBDVA) (Upton et al., 2014; Jin et al., 2020). Total cannabinoids was calculated as the sum of the 14 cannabinoids. Total monoterpenoids (terpenoids with two isoprene units) was the sum of the 29 monoterpenoids in Supplementary Table 2.5, and total sesquiterpenoids (terpenoids with three isoprene units) was the sum of the 16 sesquiterpenoids. Total terpenoids was the sum of total mono- and sesquiterpenoids. Total flavonoids was the sum of seven flavonoids after acid hydrolysis: orientin, vitexin, isovitexin, quercetin, luteolin, kaempferol, and apigenin. Total sterols was the sum of campesterol, stigmasterol, and β-sitosterol. Total triterpenoids was the sum of β-amyrin, epifriedelanol, and friedelin. Compound ratios were calculated by dividing the content of a compound by the total content of its metabolite group; for example, the ratio of β-pinene was its absolute content divided by total terpenoids. Secondary metabolites were quantified in each plant part, but the following analyses were carried out only on each metabolite in the plant part where it was present at the highest level. 
This distinction isolates each metabolite in the plant part where it is present at a sufficiently high concentration (above 0.05%) to be of pharmacological interest (Russo, 2011). First, correlations were calculated among individual cannabinoids, terpenoids, flavonoids, sterols, and triterpenoids. Because absolute values vary with environmental factors whereas relative proportions are more stable (Hillig, 2005a), compound ratios were used. Then, unsupervised hierarchical clustering (no preassigned categories as constraints) using Ward's minimum variance method (Ward, 1963) and PCA (Jolliffe, 2002) were used to examine within-strain and between-cluster variation. Finally, the data were subjected to supervised canonical correlation analysis (with preassigned categories as constraints) using the chemotypes preassigned in Table 1. The full spectrum of secondary metabolites without THC and CBD was also subjected to hierarchical clustering, PCA, and canonical correlation analysis to investigate whether the absence of THC and CBD data would affect the differentiation of strains into chemotypes.
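The totals and ratios defined at the start of this subsection reduce to simple arithmetic. The Python sketch below applies the listed acid-to-neutral factors and the group-ratio definition; the function names are hypothetical, the inputs are assumed to be already-quantified % (w/w) values, and the sample numbers are invented for illustration.

```python
# Acid-to-neutral conversion factors as listed in the text (Upton et al., 2014; Jin et al., 2020).
ACID_FACTORS = {"THC": 0.877, "CBD": 0.877, "CBG": 0.878,
                "CBC": 0.877, "THCV": 0.867, "CBDV": 0.867}


def total_thc(d9_thc, d8_thc, cbn, thca):
    """Total THC = Δ9-THC + Δ8-THC + CBN + 0.877 × THCA (all in % w/w)."""
    return d9_thc + d8_thc + cbn + ACID_FACTORS["THC"] * thca


def total_cannabinoid(name, neutral, acid):
    """Total CBD, CBG, CBC, THCV, or CBDV: neutral form + factor × acid form."""
    return neutral + ACID_FACTORS[name] * acid


def group_ratios(contents):
    """Divide each compound's content by the total content of its metabolite group."""
    group_total = sum(contents.values())
    return {name: value / group_total for name, value in contents.items()}


if __name__ == "__main__":
    print(round(total_thc(0.5, 0.0, 0.1, 20.0), 3))        # 18.14
    print(round(total_cannabinoid("CBD", 0.2, 1.0), 3))    # 1.077
    # Hypothetical terpenoid contents (% w/w) for one inflorescence sample
    print(group_ratios({"beta-pinene": 0.25, "beta-myrcene": 0.75}))
```

In the study, the group total for terpenoids would span all 45 quantified mono- and sesquiterpenoids rather than the two shown here.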
Canonical correlation analysis, also called canonical variates analysis, is a multiple discriminant analysis that calculates the correlation between preassigned clusters and the set of covariates (chemical compounds in this study) describing the observations (Hotelling, 1936). The first canonical variable is the linear combination of the covariates that maximizes the multiple correlation between the clusters and the covariates. The second canonical variable is the linear combination, uncorrelated with the first, that maximizes the remaining multiple correlation. The analysis outputs a biplot of the first two canonical variables, which provide maximum separation among the clusters. To identify the marker metabolites that contribute most to the groupings, one-way ANOVA followed by the Tukey honestly significant difference (HSD) post hoc test at the 0.05 significance level was used to determine whether significant differences existed among all clusters and between each pair of clusters. Statistical analyses were performed with JMP 14.0.0.
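Ward's minimum variance criterion, named among the methods above, merges at each step the two clusters whose union least increases the within-cluster sum of squares. The pure-Python sketch below illustrates that idea with the Lance-Williams update on squared Euclidean distances. It is not the JMP routine used in the study; it assumes each plant is represented by a vector of (preferably standardized) compound ratios, and the six toy points are hypothetical. For real data, SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` is an established implementation.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum variance criterion.

    Starts with one cluster per point, repeatedly merges the pair with the
    smallest Ward distance (Lance-Williams update on squared Euclidean
    distances), and stops when n_clusters remain. Returns a list of sets of
    point indices, ordered by each set's smallest index.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def pair(i, j):
        return (i, j) if i < j else (j, i)

    clusters = {i: {i} for i in range(len(points))}
    dist = {(i, j): sq_dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}
    next_id = len(points)
    while len(clusters) > max(n_clusters, 1):
        i, j = min(dist, key=dist.get)          # cheapest merge
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) | clusters.pop(j)
        d_ij = dist.pop((i, j))
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = dist.pop(pair(k, i))
            d_kj = dist.pop(pair(k, j))
            # Lance-Williams coefficients for Ward's method
            dist[(k, next_id)] = ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj
                                  - n_k * d_ij) / (n_i + n_j + n_k)
        clusters[next_id] = merged
        next_id += 1
    return sorted(clusters.values(), key=min)


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
    print(ward_clusters(toy, 3))  # [{0, 1}, {2, 3}, {4, 5}]
```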

Secondary Metabolites Profiled in Cannabis Inflorescences, Leaves, Stem Bark, and Roots
Secondary metabolites profiled in inflorescences, leaves, stem bark, and roots are provided in Supplementary Table 9. Average total cannabinoid content in the 82 plants of 21 strains decreased in the order inflorescences, leaves, stem bark, and roots (Supplementary Figure 1). Total cannabinoids ranged from 7.06 to 24.42%, with an average of 15.90 ± 4.02% (SD), in inflorescences; from 0.95 to 4.28%, with an average of 2.17 ± 0.71%, in leaves; and from 0.06 to 2.33%, with an average of 0.58 ± 0.28%, in stem bark; they were below 0.03% in roots (Supplementary Table 2.1).
Average total terpenoid content, as the sum of mono- and sesquiterpenoids, in the same population decreased in the order inflorescences, leaves, stem bark, and roots (Supplementary Figure 1). Total terpenoids ranged from 0.753 to 3.305%, with an average of 1.509 ± 0.467%, in inflorescences and from 0.035 to 0.197%, with an average of 0.103 ± 0.032%, in leaves, and were below 0.03% in stem bark and roots (Supplementary Table 2.1). Average total terpenoid contents in inflorescences and leaves of the three chemotypes are summarized in Supplementary Tables 2.5 and 2.6.
Average total flavonoid content, as the sum of orientin, vitexin, isovitexin, quercetin, luteolin, kaempferol, and apigenin, was highest in leaves, lower in inflorescences, and below 0.03% in stem bark and roots (Supplementary Figure 1). Total flavonoids ranged from 0.028 to 0.284%, with an average of 0.091 ± 0.050%, in inflorescences and from 0.051 to 0.470%, with an average of 0.188 ± 0.098%, in leaves (Supplementary Table 2.1). Flavonoids exist in cannabis plants as both aglycones and conjugated glycosides and have been estimated at less than 1% in leaves (McPartland and Russo, 2001). The results of this study were congruent with this estimate, since the flavonoids were not converted to conjugated glycosides. All seven flavonoids were quantifiable in inflorescences of the three chemotypes (Supplementary Table 2.7), whereas quercetin and kaempferol were below the quantification limit in leaves (Supplementary Table 2.8). All flavonoid contents in inflorescences and leaves were lower than those reported in other studies (Flores-Sanchez and Verpoorte, 2008), possibly because of differences in strains and plant growth stage, since flavonoid content fluctuates with plant age (Vanhoenacker et al., 2002).
Total sterol content, as the sum of three phytosterols (campesterol, stigmasterol, and β-sitosterol), was highest in roots, lower in stem bark, and below 0.03% in inflorescences and leaves (Supplementary Figure 1). Total sterols ranged from 0.037 to 0.085%, with an average of 0.066 ± 0.009%, in roots and from 0.037 to 0.082%, with an average of 0.055 ± 0.013%, in stem bark (Supplementary Table 2.1). Average total sterol contents in stem bark and roots of the three chemotypes are summarized in Supplementary Tables 2.9 and 2.10.
Total triterpenoid content, as the sum of β-amyrin, epifriedelanol, and friedelin, was highest in roots, lower in stem bark, and below 0.03% in inflorescences and leaves (Supplementary Figure 1). Total triterpenoids ranged from 0.008 to 0.136%, with an average of 0.039 ± 0.023%, in stem bark and from 0.080 to 0.275%, with an average of 0.182 ± 0.043%, in roots (Supplementary Table 2.1). Average total triterpenoid contents in stem bark and roots of the three chemotypes are summarized in Supplementary Tables 2.11 and 2.12.
The distribution of secondary metabolites in each plant part agreed with the conclusions of our previous study (Jin et al., 2020). Correlation and classification analyses were performed only for metabolites in the plant part where they were present at the highest concentrations for that strain. For example, the average terpenoid content in leaves was low (0.103 ± 0.032%) compared with that in inflorescences (1.509 ± 0.467%), and only 15 of the mono- and sesquiterpenoids detected in inflorescences were above the quantification limit in leaves (Supplementary Table 2.6). In addition, the correlations between cannabinoids and terpenoids in leaves were similar to those in inflorescences, especially for terpenoids that are abundant in both plant parts, including α-pinene, β-pinene, limonene, linalool, β-caryophyllene, trans-β-farnesene, α-humulene, trans-nerolidol, (−)-guaiol, β-eudesmol, α-eudesmol, and α-bisabolol (Supplementary Figure 2 and Supplementary Table 8). Therefore, the terpene profile in inflorescences was adequate for clustering purposes. Flavonoids in both inflorescences and leaves were included in the analysis: leaves had the higher total flavonoid content, but quercetin and kaempferol were quantifiable only in inflorescences. For sterols, the contents and ratios of the three sterols were similar in stem bark and roots; because total sterols in roots (0.064-0.068%) were slightly higher than those in stem bark (0.052-0.059%), the root sterol profiles were used in the data analysis. The root triterpenoid profile was used because total triterpenoid content exceeded the threshold for pharmacological interest only in roots. To summarize, the most abundant secondary metabolites in individual plant parts were used in the statistical analysis to identify differences among the three chemotypes. 
These metabolites were cannabinoids, terpenes, and flavonoids in inflorescences; flavonoids in leaves; and sterols and triterpenoids in roots (Supplementary Table 7).
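The plant-part selection rule above (analyze each metabolite group in the part where it is most abundant, and check that level against the 0.05% threshold of pharmacological interest) can be expressed as a small Python helper. This is a hypothetical sketch: it uses only the group averages reported numerically in this section, omits parts reported only as below 0.03%, and does not capture the exception made for flavonoids, which were analyzed in both inflorescences and leaves.

```python
PHARMACOLOGICAL_THRESHOLD = 0.05  # % (w/w), the level cited from Russo (2011) in the text


def select_plant_part(part_means, threshold=PHARMACOLOGICAL_THRESHOLD):
    """Return (plant part with the highest mean content, whether that mean exceeds threshold)."""
    part = max(part_means, key=part_means.get)
    return part, part_means[part] > threshold


if __name__ == "__main__":
    # Average group totals (% w/w) reported in this section
    reported = {
        "terpenoids": {"inflorescences": 1.509, "leaves": 0.103},
        "sterols": {"stem bark": 0.055, "roots": 0.066},
        "triterpenoids": {"stem bark": 0.039, "roots": 0.182},
    }
    for group, means in reported.items():
        print(group, select_plant_part(means))
```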

Unsupervised Hierarchical Clustering
The same set of data was used to build a dendrogram of the 82 plants using hierarchical clustering, in which almost all plants of the same strain clustered together, except for one 5-CBD plant that grouped with 4-CBD plants and plants of 15-THC that grouped with 23-THC plants (Figure 3). The dendrogram shows two major branches: CBD dominant and intermediate strains together in one branch, and THC dominant strains in the other. The dendrogram based on absolute values of the secondary metabolites is shown in Supplementary Figure 4. These results confirmed the minimal within-strain variation (between plants within each strain) and between-cluster variation (between strains within each chemotype). The full spectrum of secondary metabolites without total THC and total CBD produced a dendrogram with the same grouping (Supplementary Figure 5).
…compounds that contributed most to the separations along PC1 and PC2, with absolute loadings equal to or greater than 0.45. PC1 was positively correlated with three cannabinoids (total CBD, total CBDV, and total CBC), one monoterpenoid (1,8-cineole (eucalyptol)), four sesquiterpenoids (β-eudesmol, (−)-guaiol, α-eudesmol, and α-bisabolol), three flavonoids (orientin, vitexin, and isovitexin), three sterols (campesterol, stigmasterol, and β-sitosterol), and one triterpenoid (epifriedelanol), all of which had been identified as positively correlated with total CBD. PC1 was negatively correlated with one cannabinoid (total THC), four monoterpenoids (limonene, camphene, fenchol, and linalool), four sesquiterpenoids (α-humulene, β-caryophyllene, trans-nerolidol, and trans-β-farnesene), four flavonoids (quercetin, kaempferol, and apigenin), and one triterpenoid (friedelin), all of which had been identified as positively correlated with total THC. THC dominant strains were scattered in both the lower left and upper right quadrants along PC2. 
Compounds positively correlated with PC2 and negatively correlated with PC1 (PC1 < 0 and PC2 > 0), including total THC, total CBG, total THCV, α-terpineol, camphene, fenchol, linalool, ocimene, borneol, α-humulene, β-caryophyllene, trans-nerolidol, quercetin, and kaempferol, were more abundant in THC dominant strains than in CBD dominant and intermediate strains. β-Myrcene was negatively correlated with PC2 and positively correlated with PC1, meaning it was more abundant in CBD dominant and intermediate strains. Two flavonoids, luteolin and apigenin, were negatively correlated with both PC1 and PC2 and were more abundant in the THC dominant strains in the lower left quadrant than in other THC dominant strains. Although some compounds were more correlated with CBD, they may be more abundant in some THC dominant strains. For example, compounds positively correlated with both PC1 and PC2, including orientin (L), vitexin (L), and isovitexin (L), were more abundant in the THC dominant strains in the upper right quadrant than in strains in C1 and C2, even though these flavonoids were positively correlated with CBD. This may be the result of extensive strain crossing and hybridization. PCA using absolute values of the secondary metabolites is shown in Supplementary Figure 6. The full spectrum of secondary metabolites without total THC and total CBD resulted in a similar PCA scatter plot, in which PC1 and PC2 explained 32.6 and 16.1% of the total variance, respectively (Supplementary Figure 7).
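The loading threshold used above (|loading| ≥ 0.45 on PC1 or PC2) can be illustrated with the NumPy sketch below. It is a hedged stand-in for the JMP analysis: it performs PCA on the correlation matrix and defines a loading as the eigenvector entry scaled by the square root of its eigenvalue (the correlation between a variable and a component); whether JMP reports loadings on exactly this scale is an assumption. The three-variable toy matrix and the helper names are hypothetical.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """PCA on the correlation matrix of X (rows = plants, columns = compounds).

    Returns the fraction of total variance explained by each retained component
    and an (n_variables, n_components) array of loadings, where each loading is
    the correlation between a variable and a principal component.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    top = np.clip(eigvals[:n_components], 0.0, None)  # guard tiny negative round-off
    loadings = eigvecs[:, :n_components] * np.sqrt(top)
    return explained[:n_components], loadings


def strong_loadings(names, loadings, threshold=0.45):
    """For each component, the variable names whose |loading| is >= threshold."""
    return [[name for name, value in zip(names, loadings[:, c]) if abs(value) >= threshold]
            for c in range(loadings.shape[1])]


if __name__ == "__main__":
    # Hypothetical ratios: a and b rise together, c varies independently of both.
    X = [[1, 2, 1], [2, 4, 0], [3, 6, 0], [4, 8, 1]]
    explained, loadings = pca_loadings(X)
    print(np.round(explained, 3))                      # [0.667 0.333]
    print(strong_loadings(["a", "b", "c"], loadings))  # [['a', 'b'], ['c']]
```

Because eigenvector signs are arbitrary, only the magnitude of a loading is compared with the threshold; the sign still indicates the direction of correlation within one fitted solution.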

Supervised Canonical Correlation Analysis
The canonical correlation analysis of the 82 plants showed good separation among the three chemotypes (Figure 5). Each plant was predicted to be in its preassigned cluster with 100% accuracy (Supplementary Table 4). Canonical correlation analysis using the absolute values of the 45 compounds was also performed (Supplementary Figure 8) and sorted each plant into its preassigned chemotype with 100% accuracy. The full spectrum of secondary metabolites without total THC and total CBD also predicted each plant's preassigned cluster with 100% accuracy (Supplementary Figure 9). However, the distances between the three clusters along the two canonical axes were smaller, because removing the THC and CBD data reduced the differences among the chemical profiles of the three chemotypes.
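The resubstitution check reported above (each plant reassigned to its preassigned chemotype) can be sketched as Fisher's canonical discriminant analysis followed by nearest-centroid assignment in canonical space. This NumPy sketch is an assumption-laden stand-in for JMP's discriminant platform: it solves the generalized eigenproblem between the between-group and within-group scatter matrices, adds a small hypothetical ridge term so the within-group matrix stays invertible if compounds are collinear, and classifies by Euclidean distance to group centroids rather than by JMP's posterior probabilities. The function names and toy data are hypothetical.

```python
import numpy as np


def canonical_variates(X, labels, ridge=1e-6):
    """Fisher canonical discriminant axes.

    Solves S_b w = lambda S_w w via Cholesky whitening of S_w and returns the
    coefficient matrix W (n_variables x (n_groups - 1)) and the sorted group labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    s_within = np.zeros((p, p))
    s_between = np.zeros((p, p))
    for g in groups:
        Xg = X[labels == g]
        centered = Xg - Xg.mean(axis=0)
        s_within += centered.T @ centered
        diff = (Xg.mean(axis=0) - grand_mean)[:, None]
        s_between += len(Xg) * (diff @ diff.T)
    s_within += ridge * np.eye(p)                    # hypothetical regularization
    l_inv = np.linalg.inv(np.linalg.cholesky(s_within))
    eigvals, eigvecs = np.linalg.eigh(l_inv @ s_between @ l_inv.T)
    order = np.argsort(eigvals)[::-1][: len(groups) - 1]
    return l_inv.T @ eigvecs[:, order], groups


def nearest_centroid(X, labels, W, groups):
    """Assign each observation to the group whose canonical-score centroid is closest."""
    scores = np.asarray(X, dtype=float) @ W
    labels = np.asarray(labels)
    centroids = np.array([scores[labels == g].mean(axis=0) for g in groups])
    sq_dist = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return groups[sq_dist.argmin(axis=1)]


if __name__ == "__main__":
    offsets = [(0, 0, 0.3), (0.5, 0, 0), (0, 0.5, 0), (0.5, 0.5, 0)]
    centers = {"CBD": (0, 0, 0), "INT": (5, 0, 0), "THC": (0, 5, 0)}
    X, y = [], []
    for label, c in centers.items():
        for o in offsets:
            X.append([ci + oi for ci, oi in zip(c, o)])
            y.append(label)
    W, groups = canonical_variates(X, y)
    predicted = nearest_centroid(X, y, W, groups)
    print((predicted == np.asarray(y)).mean())      # 1.0
```

With three chemotypes there are at most two canonical axes, which matches the two-axis biplot described in the Statistical Analysis section.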

Identification of Chemotypic Markers for Three Chemotypes
Means (±SD), Tukey HSD multiple comparisons at the 0.05 significance level, and one-way ANOVA p values of the 45 quantifiable compounds (using ratios) for each of the three chemotypes are listed in Supplementary Table 5 and plotted in Figure 6. The largest number of significant differences (Tukey HSD multiple comparisons at the 0.05 significance level), 37, was between C1 and C3. The most similar pair was C1 and C2, with 14 significant differences. The number of significant differences between C2 and C3 was 23. Strains in C1 had significantly higher amounts of total CBD, total CBDV, total CBC, α-pinene, β-pinene, β-myrcene, (−)-guaiol, β-eudesmol, α-eudesmol, α-bisabolol, orientin (F), vitexin (F), isovitexin (F), orientin (L), campesterol, stigmasterol, β-sitosterol, and epifriedelanol than strains in C3; all of these compounds were positively correlated with total CBD. Strains in C3 had significantly higher amounts of total THC, total THCV, total CBG, camphene, limonene, ocimene, linalool, fenchol, borneol, α-terpineol, β-caryophyllene, trans-β-farnesene, α-humulene, trans-nerolidol, quercetin (F), kaempferol (F), β-amyrin, and friedelin than strains in C1; all of these compounds were positively correlated with total THC. Most compounds in C2 strains were at the same level as in C1 or C3 strains, or at an intermediate level between C1 and C3. Means ± SD, Tukey HSD multiple comparisons at the 0.05 significance level, and one-way ANOVA p values of the absolute values of the 45 compounds for each cluster are summarized in Supplementary Table 6. The largest number of significant differences, 38, was between C1 and C3. The most similar pair was C1 and C2, with 10 significant differences. The number of significant differences between C2 and C3 was 23. The cannabinoids, terpenoids, flavonoids, sterols, and triterpenoids that were significantly higher in C1, C2, and C3 were similar to those identified using ratios.
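The one-way ANOVA statistic behind the p values above can be computed as in the pure-Python sketch below, which returns the F ratio and its degrees of freedom. Converting F into a p value requires the F distribution (for example, `scipy.stats.f.sf(F, df_between, df_within)` in SciPy), and SciPy provides pairwise Tukey HSD comparisons as `scipy.stats.tukey_hsd`; neither step is reproduced here, and this is not the JMP routine used in the study. The three groups in the usage line are hypothetical values for one compound.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers).

    Returns (F, df_between, df_within), where F = MS_between / MS_within.
    """
    observations = [x for g in groups for x in g]
    grand_mean = sum(observations) / len(observations)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(observations) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical values for one compound in C1, C2, and C3 plants
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

With the study's 82 plants in three chemotypes, the degrees of freedom would be 2 and 79.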
Although numerous significant differences in compounds were found among CBD dominant, intermediate, and THC dominant strains, the group means of some compounds differed by less than a factor of two. In addition, some compounds may differ significantly in ratios but not in absolute values. For example, all three sterols (campesterol, stigmasterol, and β-sitosterol) were significantly higher in roots of CBD dominant strains than in THC dominant strains by ratios (one-way ANOVA p < 0.0001, p = 0.1279, and p < 0.0001, respectively), but they were not significantly different by absolute values (one-way ANOVA p = 0.1279, p = 0.0361, and p = 0.0169, respectively). Compounds that were significantly different (one-way ANOVA p < 0.05) and at least two-fold higher, in terms of both ratios and absolute values, in the identified cluster than in the cluster with the lowest values were selected as chemotypic markers. For CBD dominant strains, these were three cannabinoids (total CBD, total CBDV, and total CBC), six terpenoids (α-pinene, β-myrcene, (−)-guaiol, β-eudesmol, α-eudesmol, and α-bisabolol), and three flavonoids (orientin, vitexin, and isovitexin). For THC dominant strains, they were three cannabinoids (total THC, total THCV, and total CBG), twelve terpenoids (camphene, limonene, ocimene, sabinene hydrate, terpinolene, linalool, fenchol, α-terpineol, β-caryophyllene, trans-β-farnesene, α-humulene, and trans-nerolidol), and two flavonoids (quercetin and kaempferol). With higher amounts of β-myrcene, (−)-guaiol, β-eudesmol, α-eudesmol, and α-bisabolol, intermediate strains are more similar to CBD dominant strains than to THC dominant strains. More mono- and sesquiterpenoids were significantly higher in the THC dominant cluster than in the CBD dominant and intermediate clusters. The simultaneous presence of a collection of these compounds can therefore be used to differentiate chemotypes.
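The marker-selection rule above combines a significance test with a fold-change filter on two scales. The Python sketch below encodes that rule for a single compound; the function name and the cluster means in the usage lines are hypothetical, and the one-way ANOVA p values are assumed to have been computed beforehand.

```python
def is_chemotypic_marker(cluster, p_ratio, p_abs, means_ratio, means_abs,
                         alpha=0.05, min_fold=2.0):
    """Apply the selection rule described in the text to one compound.

    The compound is a marker for `cluster` if the one-way ANOVA is significant
    (p < alpha) for both ratios and absolute values, and the cluster mean is at
    least `min_fold` times the lowest cluster mean on both scales.
    """
    def fold_ok(means):
        lowest = min(means.values())
        return means[cluster] > lowest and means[cluster] >= min_fold * lowest

    return (p_ratio < alpha and p_abs < alpha
            and fold_ok(means_ratio) and fold_ok(means_abs))


if __name__ == "__main__":
    # Hypothetical cluster means (C1 = CBD dominant, C2 = intermediate, C3 = THC dominant)
    ratio_means = {"C1": 0.12, "C2": 0.09, "C3": 0.04}
    abs_means = {"C1": 0.20, "C2": 0.15, "C3": 0.06}
    print(is_chemotypic_marker("C1", 0.001, 0.01, ratio_means, abs_means))  # True
    print(is_chemotypic_marker("C1", 0.001, 0.20, ratio_means, abs_means))  # False
```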

Cannabinoids as Chemotypic Markers
In this study, the average THC to CBD ratios in the THC dominant, intermediate, and CBD dominant chemotypes were 247 ± 79, 0.5 ± 0.1, and 0.04 ± 0.01, respectively. These ratios show that THC dominance in THC dominant strains (about 247:1) was far stronger than CBD dominance in CBD dominant strains (about 25:1 CBD to THC). This bias toward higher THC is due to the long history of extensive hybridization for recreational purposes (McPartland, 2017). The THC/CBD ratio of 247:1 in THC dominant strains matches those of "Sativa" and "Indica" strains that are almost devoid of CBD (Fischedick et al., 2010; Hazekamp and Fischedick, 2012; Fischedick, 2015, 2017; Hazekamp et al., 2016; Jin et al., 2020). Because of CBD's therapeutic potential without psychoactive effects (Booz, 2011; Couch et al., 2017; Vallée et al., 2017; Callejas et al., 2018; Mallada Frechín, 2018), breeding for high CBD concentrations began only recently, by integrating hemp-type CBD acid synthase gene clusters into a drug-type cannabis background to elevate CBDA production (Clarke and Merlin, 2016; Grassa et al., 2018). The CBD to THC ratio in intermediate strains (about 2:1) was similar to the 1.8:1 ratio we previously reported (Jin et al., 2020) and also matched the reported cannabinoid profiles of intermediate strains available in the database. These intermediate strains may have been created by crossing purebred THC dominant types with CBD dominant types. Chemotaxonomic research on minor cannabinoids in the three chemotypes is sparse in the current literature. In this study, minor cannabinoids were mostly below 1% in all three chemotypes, and several minor cannabinoids were more abundant in one chemotype than in the others.
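The chemotype definitions from the Introduction (THC/CBD > 1, ≈ 1, or < 1) need an explicit tolerance band before they can label individual plants. The Python sketch below uses a hypothetical band of 0.3 to 3.0 around 1, chosen only so that the mean ratios reported above (247, 0.5, and 0.04) fall into the expected classes; no numeric cut-offs are given in this text, and the study assigned chemotypes in advance (Table 1). The function name is also hypothetical.

```python
def assign_chemotype(total_thc, total_cbd, low=0.3, high=3.0):
    """Label a plant from its total THC / total CBD ratio.

    `low` and `high` bound a hypothetical "THC ≈ CBD" band; they are
    illustrative values, not thresholds reported by the study.
    """
    if total_cbd <= 0:
        return "THC dominant" if total_thc > 0 else "undetermined"
    ratio = total_thc / total_cbd
    if ratio > high:
        return "THC dominant"
    if ratio < low:
        return "CBD dominant"
    return "intermediate"


if __name__ == "__main__":
    # Contents (% w/w) giving THC/CBD ratios equal to the reported means: 247, 0.5, and 0.04
    for thc, cbd in [(24.7, 0.1), (5.0, 10.0), (0.4, 10.0)]:
        print(thc, cbd, assign_chemotype(thc, cbd))
```

Because the intermediate mean (0.5) sits well below 1, a symmetric band on a log scale, as assumed here, is a safer illustration than a narrow window around exactly 1.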

Mono-and Sesquiterpenoids as Chemotypic Markers
In general, sesquiterpenoids are considered more stable markers than monoterpenoids, which are more volatile (McPartland, 2017). In this study, (−)-guaiol, β-eudesmol, α-eudesmol, and α-bisabolol were identified as chemotypic markers in CBD dominant and intermediate strains. These compounds were also noted as signature peaks on chromatograms of pre-hybridization Afghani WLD landraces (Hillig, 2005a) and modern "Indica" dominant strains (WLD), but were present in lower amounts in pre-hybridization NLD landraces and modern "Sativa" dominant strains (NLD) (Fischedick et al., 2010; Hazekamp et al., 2016). CBD dominant strains and pre-hybridization Afghani WLD landraces are similar in that both have elevated CBD concentrations compared with their THC dominant counterparts. According to the correlation analysis in this study, these chemotypic markers for CBD dominant and intermediate strains may be related to CBD production. In modern "Indica" dominant strains (WLD), which are nearly devoid of CBD, these sesquiterpenoids were detected only in trace amounts, even though they were considered to be inherited from WLD landrace ancestors despite selection for elevated THC/CBD ratios (Fischedick et al., 2010; Hazekamp and Fischedick, 2012; Fischedick, 2015, 2017; Hazekamp et al., 2016). In this study, terpinolene, β-caryophyllene, and trans-β-farnesene were identified as chemotypic markers in THC dominant strains. These compounds were also noted as signature peaks on chromatograms of pre-hybridization NLD landraces (Hillig, 2005a) and modern "Sativa" dominant strains (NLD), but were present in lower amounts in pre-hybridization WLD landraces and modern "Indica" dominant strains (WLD) (Fischedick et al., 2010; Hazekamp et al., 2016). THC dominant strains and pre-hybridization NLD landraces both have elevated THC concentrations and are almost devoid of CBD. 
These chemotypic markers of THC dominant strains may therefore be correlated with THC production when CBD is not produced.
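The correlation analysis referred to above can be illustrated with Pearson's r between a candidate marker and total CBD across individual plants. This is a minimal sketch: the per-plant values are invented for illustration, and Pearson correlation is assumed here because the specific correlation method is not restated in this section. A strong correlation would be consistent with, but would not demonstrate, a biosynthetic link.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL per-plant values (percent dry weight), for illustration only.
total_cbd = [0.1, 0.2, 5.0, 6.5, 11.0, 12.5]
beta_eudesmol = [0.001, 0.002, 0.020, 0.025, 0.040, 0.050]
terpinolene = [0.30, 0.28, 0.10, 0.09, 0.02, 0.01]

print(round(pearson_r(total_cbd, beta_eudesmol), 3))  # strongly positive
print(round(pearson_r(total_cbd, terpinolene), 3))    # strongly negative
```

With real data, the same function would be applied to each terpenoid against total CBD and total THC, ideally with a correction for the many comparisons being made.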
Studies have shown that terpenes in cannabis are derived from two pathways: the plastidial methylerythritol phosphate (MEP) pathway and the cytosolic mevalonate (MVA) pathway (Andre et al., 2016; Booth et al., 2017; Zager et al., 2019). Geranyl diphosphate (GPP) is typically derived from the MEP pathway and is the precursor for cannabinoid and monoterpenoid biosynthesis. Farnesyl diphosphate (FPP) is commonly produced by the MVA pathway and is the precursor for sesquiterpenoids, triterpenoids, and sterols. Although the identified chemotypic markers are hypothesized to be related to CBD or THC production, no biochemical studies have yet examined these correlations. Future studies are needed to clarify the biochemical relationship between CBD or THC production and the production of individual terpenoids.
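The precursor assignments described above can be encoded as a small lookup table, which a future analysis might use to group chemotypic markers by precursor pool before testing their relationship to cannabinoid production. This sketch collapses the "typically" and "commonly" qualifiers in the text into a fixed mapping; the names `PRECURSOR`, `COMPOUND_CLASS`, and `group_by_precursor` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Simplified from the text: GPP (typically via MEP) feeds cannabinoids and
# monoterpenoids; FPP (commonly via MVA) feeds sesquiterpenoids,
# triterpenoids and sterols. Real pathways may exchange intermediates.
PRECURSOR = {
    "cannabinoid": ("GPP", "MEP"),
    "monoterpenoid": ("GPP", "MEP"),
    "sesquiterpenoid": ("FPP", "MVA"),
    "triterpenoid": ("FPP", "MVA"),
    "sterol": ("FPP", "MVA"),
}

# Compound classes for some compounds discussed in this study.
COMPOUND_CLASS = {
    "CBD": "cannabinoid",
    "α-pinene": "monoterpenoid",
    "terpinolene": "monoterpenoid",
    "β-caryophyllene": "sesquiterpenoid",
    "(−)-guaiol": "sesquiterpenoid",
    "β-eudesmol": "sesquiterpenoid",
    "α-bisabolol": "sesquiterpenoid",
    "friedelin": "triterpenoid",
    "β-sitosterol": "sterol",
}


def group_by_precursor(compounds):
    """Group compound names by the precursor (GPP or FPP) of their class."""
    groups = defaultdict(list)
    for name in compounds:
        precursor, _pathway = PRECURSOR[COMPOUND_CLASS[name]]
        groups[precursor].append(name)
    return dict(groups)


groups = group_by_precursor(COMPOUND_CLASS)
print(groups["GPP"])
print(groups["FPP"])
```

Grouping in this way would let marker abundances be summed or compared per precursor pool, which is one way to frame the biochemical question raised above.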
Among the strains with a reported Sativa/Hybrid/Indica ancestry label, the CBD dominant group contained two "Sativa" strains, the intermediate group contained one "Sativa" and one "Indica" strain, and the THC dominant group contained ten "Indica" strains and one "50/50 hybrid" strain. Based on the reported ancestry, the results of this study seem to contradict those of other studies. The terpenoid markers in CBD dominant strains (reported as "Sativa" because of narrow leaflets) resembled those identified in "Indica" dominant strains, not those identified in "Sativa" dominant strains, in other studies (Fischedick et al., 2010; Hazekamp and Fischedick, 2012; Fischedick, 2015, 2017; Hazekamp et al., 2016). Similarly, the terpenoid markers in THC dominant strains (reported as "Indica" because of wide leaflets) resembled those identified in "Sativa" dominant strains, not those identified in "Indica" dominant strains, in other studies. These conflicting results reflect the unreliability of the vernacular "Sativa" and "Indica" categories, which are based on visual assessment of leaflet shape, often with no reference data for categorization (Jin et al., 2021). This unreliability may lead to mixed results when modern strains are separated genetically or chemically (Elzinga et al., 2015; Sawler et al., 2015). Another explanation for the discrepancy is that this study focused on differentiating three chemotypes rather than separating "Sativa" from "Indica" strains, which are often THC dominant. Because no "Sativa" strains were reported among the THC dominant strains in this study, we could not verify whether (−)-guaiol, β-eudesmol, α-eudesmol, and α-bisabolol are more abundant in "Indica" dominant strains, or whether terpinolene, β-caryophyllene, and trans-β-farnesene are more abundant in "Sativa" dominant strains, as described in other studies.

Flavonoids as Chemotypic Markers
Flavonoid variation in cannabis was investigated by Clark and Bohm (1979), the only study that has used flavonoids for cannabis chemotaxonomy; their results supported a two-species hypothesis, with luteolin detected more often in C. sativa L. and absent from C. indica Lam. (Clark and Bohm, 1979). No chemotaxonomic studies of flavonoids across the three cannabis chemotypes have yet been reported. We found that orientin, vitexin, and isovitexin were the signature flavonoids of CBD dominant strains, whereas quercetin and kaempferol were detected only in inflorescences and tended to be higher in THC dominant strains.

Sterols and Triterpenoids as Chemotypic Markers
The role of sterols and triterpenoids in the chemotaxonomy of cannabis has not yet been investigated. In this study, CBD dominant strains had significantly higher ratios of three sterols, but these differences were less than twofold and may not provide a firm basis for chemotaxonomic distinction. Similarly, for triterpenoids, although the ratio of epifriedelanol was higher in CBD dominant strains and that of friedelin was higher in THC dominant strains, the differences were not large enough for these compounds to serve as chemotypic markers.
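The reasoning above combines two criteria: a difference must be statistically significant and also large enough to be useful for classification. A minimal sketch of such a screen follows, assuming a significance level of 0.05 and a twofold minimum difference that echoes the factor-of-two comparison in the text; these thresholds are not stated as the study's formal criteria, and the compound names and values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Group means (arbitrary units) and p-value for one compound."""
    compound: str
    mean_cbd_dominant: float
    mean_thc_dominant: float
    p_value: float


def fold_change(a: float, b: float) -> float:
    """Larger group mean divided by the smaller one (always >= 1)."""
    lo, hi = sorted((a, b))
    if hi == 0:
        return 1.0  # absent from both groups: no difference
    if lo <= 0:
        return float("inf")  # detected in one group only
    return hi / lo


def is_candidate_marker(c: Comparison, alpha: float = 0.05,
                        min_fold: float = 2.0) -> bool:
    # Require both statistical significance and a practically large difference.
    significant = c.p_value < alpha
    large_enough = fold_change(c.mean_cbd_dominant, c.mean_thc_dominant) >= min_fold
    return significant and large_enough


# HYPOTHETICAL comparisons for illustration only.
comparisons = [
    Comparison("sterol_A", 1.4, 1.0, 0.01),         # significant, < 2-fold
    Comparison("terpenoid_B", 0.05, 0.01, 0.001),   # significant, 5-fold
    Comparison("terpenoid_C", 3.0, 1.0, 0.20),      # large, not significant
]
markers = [c.compound for c in comparisons if is_candidate_marker(c)]
print(markers)
```

Only the compound that passes both tests is retained, which mirrors why a significant but small sterol difference is treated here as an unreliable chemotypic marker.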

The Potential of Developing Holistic Cannabis-Based Products and Medications
Because cannabinoids are concentrated in the inflorescences, growers normally discard cannabis leaves, stems, and roots. However, in traditional Chinese medicine, cannabis leaves were used to treat conditions such as malaria, panting, roundworm, scorpion stings, hair loss, and graying of hair. Cannabis stem bark was used for strangury and physical injury. Cannabis roots were used for gout, arthritis, joint pain, fever, skin burns, hard tumors, childbirth, and physical injury (Smith and Stuart, 1911; Brand and Wiseman, 2008; Ryz et al., 2017). These traditional uses may serve as points of reference for investigating the medical potential of what is currently a byproduct or plant waste.
To link the traditional therapeutic uses of each plant part with its chemistry, we identified the major groups of compounds in each part and compared them with the benefits described in the literature. Cannabinoids, including THC, CBD, CBG, CBC, THCV, CBN, and CBDV, in both acid and neutral forms, all have broad therapeutic potential, including anti-inflammatory (Bolognini et al., 2010; DeLong et al., 2010; De Petrocellis et al., 2012; Borrelli et al., 2013; Cascio and Pertwee, 2014; Brierley et al., 2016), analgesic (Davis and Hatoum, 1983; Evans, 1991; Cascio and Pertwee, 2014), anticonvulsant (Dwivedi and Harbison, 1975; Hill et al., 2010, 2013), antioxidant, and neuroprotective properties (Gugliandolo et al., 2018). A growing number of studies have shown that minor cannabinoids contribute significantly to the variation among cannabis extracts and may alter or enhance targeted therapeutic effects compared with pure THC or CBD alone (Berman et al., 2018; Baram et al., 2019).
Sterols and triterpenoids are mainly present in cannabis stem bark and roots. Friedelin is the most abundant and most studied triterpenoid in cannabis and has anti-inflammatory, antioxidant, estrogenic, anti-cancer, and liver-protective properties (Ryz et al., 2017). β-Sitosterol, stigmasterol, and campesterol are the most abundant phytosterols in the human diet. Phytosterols are widely recognized for lowering low-density lipoprotein cholesterol levels (Gylling et al., 2014; Ras et al., 2014). They have also been studied for anti-inflammatory, antioxidant, and pain-relieving properties (Kozłowska et al., 2016).
These groups of bioactive compounds may underpin the traditional applications of each plant part, but most of the therapeutic properties of these individual compounds have been studied in other herbal medicines rather than in cannabis. The pharmaceutical value and potential synergies of these compounds need to be investigated directly using cannabis material. Well-designed clinical studies are necessary to turn each part of the cannabis plant into evidence-based medicine. Once the optimal combination of chemical compounds for treating a given condition is determined, the chemotypic markers identified in this study will facilitate strain selection in research and clinical studies.

CONCLUSION
The chemical variation in CBD dominant and intermediate strains has yet to be studied in the literature or compared with that in THC dominant strains. This comprehensive chemotaxonomic investigation profiled cannabinoids, terpenoids, flavonoids, sterols, and triterpenoids in the inflorescences, leaves, stem bark, and roots of 82 plants from 21 cannabis strains. These chemical data were subjected to correlation analysis, unsupervised clustering analysis (hierarchical clustering and PCA), and supervised canonical correlation analysis. In unsupervised clustering, the 82 plants clustered according to their chemotypes. Canonical correlation analysis classified the 82 plants into the three chemotypes with 100% accuracy using the full spectrum of secondary metabolites. Numerous significant differences that could serve as chemotypic markers were found among CBD dominant, intermediate, and THC dominant strains. These compounds were largely consistent across correlation analysis, hierarchical clustering, PCA, and comparisons of mean concentrations and ratios between chemotypes. At each step of the clustering analysis, secondary metabolites excluding total THC and total CBD still sorted strains into their defined chemotypes and achieved the same clustering results. This demonstrates that the clustering results were not driven solely by THC and CBD content or ratio, and that other metabolites can be used as chemotypic markers. However, the robustness of these markers should be tested in different growing environments to fully elucidate the chemical differences between chemotypes or intra-chemotype sub-clusters. The results of this study provide a proof of concept for further collaboration between academia and industry in leveraging chemotypic markers for medical studies and clinical trials.
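The check that clustering does not depend on THC and CBD alone can be sketched with a small, dependency-free example. It assumes z-scored features, Euclidean distance, and average linkage, which are illustrative choices rather than the settings used in this study; the nine plants and their values are hypothetical, and PCA and canonical correlation analysis are not reproduced here.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and unit (population) SD."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def average_linkage(rows, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = mean(dist(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# HYPOTHETICAL plants. Columns: total THC, total CBD, β-myrcene,
# (−)-guaiol, terpinolene, β-caryophyllene (arbitrary units).
PLANTS = [
    [21.0, 0.10, 0.20, 0.010, 0.50, 0.60],
    [19.5, 0.08, 0.22, 0.012, 0.48, 0.62],
    [20.4, 0.12, 0.18, 0.011, 0.52, 0.58],
    [5.0, 10.0, 0.60, 0.080, 0.20, 0.35],
    [5.5, 10.8, 0.62, 0.078, 0.21, 0.34],
    [4.8, 9.6, 0.58, 0.082, 0.19, 0.36],
    [0.5, 14.0, 0.80, 0.120, 0.05, 0.20],
    [0.6, 15.1, 0.82, 0.118, 0.06, 0.21],
    [0.4, 13.2, 0.78, 0.122, 0.04, 0.19],
]
CHEMOTYPE = ["THC"] * 3 + ["intermediate"] * 3 + ["CBD"] * 3

with_thc_cbd = average_linkage(zscore_columns(PLANTS), k=3)
# Drop the first two columns (total THC and total CBD) and re-cluster.
without_thc_cbd = average_linkage(zscore_columns([row[2:] for row in PLANTS]), k=3)

print([[CHEMOTYPE[i] for i in c] for c in without_thc_cbd])
print(with_thc_cbd == without_thc_cbd)
```

If the two partitions agree and each cluster contains a single chemotype, the remaining metabolites carry enough signal on their own, which is the logic of the comparison described above.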

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
DJ conceived the project, designed the experiments, performed the experiments, collected and analyzed the data, and wrote the manuscript. PH contacted the licensed cultivator for this project and proofread the manuscript. JS provided funding, provided suggestions, and proofread the manuscript. JC was the supervisory author, monitored the research progress, provided suggestions, and finalized the manuscript. All authors contributed to the article and approved the submitted version.

Supplementary Table 1 | Strain information and assignment of 21 strains into three chemotypes.