Mining the chemical diversity of the hemp seed (Cannabis sativa L.) metabolome: discovery of a new molecular family widely distributed across hemp

Hemp (Cannabis sativa L.) is a widely researched industrial crop with a variety of applications in the pharmaceutical, nutraceutical, food, cosmetic, textile, and materials industries. Although many of these applications are related to its chemical composition, the chemical diversity of the hemp metabolome has not been explored in detail and new metabolites with unknown properties are likely to be discovered. In the current study, we explored the chemical diversity of the hemp seed metabolome through an untargeted metabolomic study of 52 germplasm accessions to 1) identify new metabolites and 2) link the presence of biologically important molecules to specific accessions on which to focus in future studies. Multivariate analysis of mass spectral data demonstrated large variability of the polar chemistry profile between accessions. Five main groups were annotated based on their similar metabolic fingerprints. The investigation also led to the discovery of a new compound and four structural analogues, belonging to a previously unknown chemical class in hemp seeds: cinnamic acid glycosyl sulphates. Although variability in the fatty acid profiles was not as marked as in the polar components, some accessions had a higher yield of fatty acids, and variation in the ratio of linoleic acid to α-linolenic acid was also observed, with some varieties closer to 3:1 (reported as optimal for human nutrition). We found that cinnamic acid amides and lignanamides, the main chemical classes of bioactive metabolites in hemp seed, were more concentrated in the Spanish accession Kongo Hanf (CAN58) and the French accession CAN37, while the Italian cultivar Eletta Campana (CAN48) demonstrated the greatest yield of fatty acids. Our results indicate that the high variability of bioactive and novel metabolites across the studied hemp seed accessions may influence claims associated with their commercialization and inform breeding programs in cultivar development.


Introduction
Cannabis sativa L., specifically the subspecies, varieties and cultivars known as hemp, is considered one of the most versatile herbaceous plants in agriculture because of its commercial value to the pharmaceutical, food, nutraceutical, cosmetics, textile, papermaking, and construction industries (Schluttenhofer and Yuan, 2017; Farinon et al., 2020). In the context of its nutraceutical use, the focus has been on its seeds, which are rich in polyunsaturated fatty acids, proteins, minerals, and specialized metabolites with noteworthy implications for human and animal nutrition (Callaway, 2004; Russo and Reggiani, 2015; Galasso et al., 2016; Farinon et al., 2020). In this regard, hemp seeds are processed to make edible oil, cake flour and protein powder (Farinon et al., 2020; Leonard et al., 2021b).
Considering that some varieties of C. sativa express psychoactive cannabinoids that are occasionally detected in Cannabis-based products, a distinction between hemp and the recreational variety has been made (Schluttenhofer and Yuan, 2017). The European Industrial Hemp Association (EIHA) defines hemp as "the plant Cannabis sativa L., or any part of it, with a delta-9 tetrahydrocannabinol (THC) concentration up to 0.3% on a dry weight basis" to differentiate it from psychoactive C. sativa in which the THC concentration exceeds 0.3% and can be up to 20% (European Industrial Hemp Association, https://eiha.org/). Subsequently, several countries, mainly in Europe, North America and Asia, legalized the cultivation of low-THC hemp cultivars (Leonard et al., 2021b). By 2018 more than 50,000 hectares were allocated to the cultivation of this crop in Europe, one of the main producers of industrial hemp. Within Europe, France is currently the largest producer, followed by Italy and the Netherlands (European Industrial Hemp Association, https://eiha.org/). Over the years, varieties of hemp were developed to select for specific traits, and seeds from many of these varieties are currently stored in international germplasm collections and botanical gardens. There are currently 75 varieties registered in the EU Catalogue, all of them with a THC content below 0.3% and a diverse profile of other metabolites (European Industrial Hemp Association, https://eiha.org/).
Beyond the well-known cannabinoids, C. sativa contains a diverse metabolome of bioactive metabolites implicated in human health and nutrition. Hemp seeds are low in cannabinoids and rich in phenylpropionamides (cinnamic acid amides and lignanamides) and unsaturated fatty acids, attracting nutraceutical and commercial interest (Crescente et al., 2018; Leonard et al., 2021a; Leonard et al., 2021b). The principal nutraceutical value of hemp seed oil is in its fatty acid composition, dominated by >90% of polyunsaturated fatty acids (Schluttenhofer and Yuan, 2017). It contains two dietary essential fatty acids, linoleic acid and α-linolenic acid, in the ratio of 2.5-3:1. This ratio has been reported as ideal for human nutrition and cardiovascular health (Leizer et al., 2000; Simopoulos, 2008; Galasso et al., 2016). However, the presence of high quantities of phenylpropionamides in hemp seeds has also been linked to some of their biological properties (Chen et al., 2012; Moccia et al., 2020; Leonard et al., 2021a). Hemp seeds accumulate the highest structural diversity of lignanamides among lignanamide-producing taxa, with more than 80 different compounds (Leonard et al., 2021a).
Cinnamic acid amides and lignanamides display potent anti-inflammatory, antioxidant and anti-cancer activities in both in-vitro and in-vivo studies (Chen et al., 2012; Farinon et al., 2020; Moccia et al., 2020; Leonard et al., 2021a). Previous studies have promoted these hemp polyphenols as protective agents against human chronic diseases (Leonard et al., 2021a). For example, phenylpropionamides such as cannabisin B and N-trans-caffeoyltyramine are significantly stronger antioxidants than the standard soy isoflavones (Chen et al., 2012). Subsequent studies demonstrated that cannabisin B also exerts antiproliferative activity by inducing autophagic cell death in liver hepatocarcinoma (HepG2) cells (Chen et al., 2013). Other hemp lignanamides, such as cannabisin F, have demonstrated a potential neuroprotective effect by reducing mRNA levels of proinflammatory mediators, intracellular reactive oxygen species (ROS) and tumor necrosis factor α (TNF-α) in lipopolysaccharide-stimulated BV2 microglia cells (Wang et al., 2019).
Thus, the health and nutritional properties of hemp seeds are dependent upon their chemical composition that differs mainly according to the variety used. With such a high number of hemp varieties currently available in international germplasm collections, a comprehensive chemical assessment of these accessions is especially relevant, specifically to discover new metabolites and link the presence of biologically important molecules to certain accessions, ultimately to exploit the hemp germplasm by designing commercial cultivars with specific attributes. However, to date the phytochemical diversity of the hemp seed metabolome has not been explored in detail. In the present study, we focused on characterizing the chemical diversity of the hemp seed metabolome across cultivars/accessions using an untargeted metabolomics study of 52 hemp seed accessions.

Plant material
A collection of 51 hemp seed accessions from fourteen countries was obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany (http://gbis.ipkgatersleben.de/) (Table 1). These accessions, maintained ex-situ by the IPK, were provided as small quantities of seeds (< 1 g). An additional sample of hemp seeds (1 kg) (S1, Table 1) was acquired in a local supermarket for further isolation of targeted metabolites (see section 2.5), bringing the total number of samples to 52.
The 51 externally sourced samples derive from sixteen recognised cultivars, including five historic cultivars whose seeds have a "deleted" status (Table 1) in the EU database of registered plant varieties (https://ec.europa.eu/info/index_en). Cultivars represented by more than one accession include "Fibrimon" (three accessions, plus two related but deleted strains, Fibrimon 21 and 56), "Cinepa" (four accessions) and "Kompolti" (three accessions) (Table 1).

Table 1. Hemp seed accessions analysed. Columns: Code; Scientific name; Cultivar or accession name; Country of origin; EU catalogue status.

Extraction and LC-MS analyses
The extraction and metabolic profiling by LC-MS were based on the protocol reported by De Vos et al. (2007). Seeds from each of the hemp accessions were ground in liquid nitrogen with a pestle and mortar. Fifty milligrams of the resultant seed powder were then transferred to Eppendorf tubes and 2 mL of 80% MeOH (HPLC grade) were added as extraction solvent. Tubes were then placed in an ultrasonic bath for 15 min at 25°C using a frequency of 40 kHz. After extraction, samples were centrifuged at 19,975 × g for 10 min at room temperature, and the supernatant was transferred to an LC-MS vial. Samples were analyzed in random order, by arbitrarily selecting vials, to minimize statistical bias. Furthermore, 30 µL of each of the 52 samples were pooled into an LC-MS vial labelled as "Quality Control". This sample was analyzed multiple times along the chromatographic sequence to check for reproducibility (De Vos et al., 2007).
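The randomized injection order with repeated pooled-QC injections can be sketched as follows. This is only an illustration: the sample labels, the QC spacing (`qc_every=10`) and the random seed are assumptions, since the text states only that the QC sample was analyzed multiple times along the sequence.

```python
import random

def build_run_sequence(sample_ids, qc_every=10, seed=0):
    """Shuffle samples and interleave pooled-QC injections.

    qc_every and seed are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    order = list(sample_ids)
    rng.shuffle(order)                      # random injection order
    sequence = ["QC"]                       # open the sequence with a QC injection
    for i, sample in enumerate(order, start=1):
        sequence.append(sample)
        if i % qc_every == 0:
            sequence.append("QC")           # periodic QC injection
    if sequence[-1] != "QC":
        sequence.append("QC")               # close the sequence with a QC injection
    return sequence

# Hypothetical labels for the 52 extracts.
samples = [f"S{i:02d}" for i in range(1, 53)]
run_sequence = build_run_sequence(samples)

# Volume of the pooled QC sample: 30 µL from each of the 52 extracts.
qc_volume_ul = 30 * len(samples)
```

With these assumed settings the pooled QC vial holds 1560 µL, and the QC extract is injected at the start, after every tenth sample, and at the end of the run.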
The metabolic profiles of the hemp seed accessions were recorded on a Vanquish UHPLC system coupled to a 100 Hz photodiode array detector (PDA) and an Orbitrap Fusion Tribrid high-resolution tandem mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Chromatographic separation of hemp seed extracts (5 µL) was performed on a Luna C18 column (150 mm × 3 mm i.d., 3 µm, Phenomenex, Torrance, CA, USA) using a linear mobile phase gradient of 0:90:10 to 90:0:10 [MeOH (A): water (C): acetonitrile + 1% formic acid (D)] over 20 min at a flow rate of 400 µL min⁻¹. Ultraviolet data were recorded between 210 nm and 550 nm.
Mass spectrometric detection was performed in the positive and negative ionization modes using the full scan and data-dependent MS² and MS³ acquisition modes. Total ion current (TIC) chromatograms were obtained over the range of 125-1800 m/z using a spray voltage of +3.5 and −2.5 kV for the positive and negative ionization modes, respectively. Four different scan events were recorded for each ionization mode as follows: (1) full scan, (2) MS² of the most intense ion in full scan, (3) MS³ of the most intense ion in MS², and (4) MS³ of the second most intense ion in MS². Additional parameters for the MS included the following: full scan resolution set to 60,000 (full-width at half-maximum, FWHM), capillary temperature set to 350°C, ion transfer tube temperature set to 325°C, RF lens set to 50%, automatic gain control target set to 4.0 × 10⁵ (full scan) or 1.0 × 10⁴ (MS² and MS³), intensity threshold set to 1.0 × 10⁴, collision-induced dissociation energy set to 35 eV, activation Q set to 0.25, and isolation window set to 4 m/z. Nitrogen was used as the drying, nebulizer and fragmentation gas.
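To make the three-solvent gradient notation concrete, the short sketch below interpolates the mobile phase composition at a given time. It assumes a strictly linear change over the 20 min stated above and simply holds the end-point composition outside that window; any hold or re-equilibration step is not described in the text, and the function name is hypothetical.

```python
def mobile_phase(t_min, duration=20.0):
    """Return (%A MeOH, %C water, %D acidified acetonitrile) at time t_min.

    Assumes a linear gradient from 0:90:10 to 90:0:10 over `duration` minutes.
    """
    frac = min(max(t_min / duration, 0.0), 1.0)  # clamp outside the gradient window
    a = 90.0 * frac          # MeOH rises from 0% to 90%
    c = 90.0 - 90.0 * frac   # water falls from 90% to 0%
    d = 10.0                 # acidified acetonitrile is held at 10%
    return a, c, d
```

For example, `mobile_phase(10)` corresponds to the gradient midpoint, where methanol and water are present in equal proportions.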

LC-MS data processing and multivariate analysis
The LC-MS raw data were split into two sets according to the ionization mode (positive and negative) and converted to mzXML format using the MSConvert package from the software ProteoWizard 3.0.9798 (Proteowizard Software Foundation, Palo Alto, CA, USA). Each data set was then processed by MZmine 2.53 (Pluskal et al., 2010) following the protocol previously described (Padilla-González et al., 2020a; Padilla-González et al., 2020b). After MZmine pre-processing, the results were exported as .csv and .mgf files, which were then uploaded to the GNPS platform (https://gnps.ucsd.edu) for Feature-Based Molecular Networking (FBMN) analysis (Wang et al., 2016; Nothias et al., 2020). The .csv file containing quantitative information related to ion abundances in each hemp seed sample was also submitted to multivariate statistical analysis by principal component analysis (PCA), hierarchical clustering analysis (HCA) and heatmaps in the software R 3.0.3 (R Foundation for Statistical Computing, Austria). Prior to the

Feature-based molecular networking and annotation tools
Feature-based molecular networking (FBMN) was performed following the workflow by Nothias et al. (2020) on the GNPS platform (https://gnps.ucsd.edu). For this analysis, the data were first filtered by removing all MS/MS fragment ions within a window of ±17 Da of the precursor ion value and by choosing only the top 6 fragment ions in the ±50 Da window throughout the spectrum. The precursor and fragment ion mass tolerances were set to 0.05 Da and 0.99 Da, respectively, in accordance with the MS settings used. A cosine score above 0.65 and more than six matched peaks were required for creating the molecular network. Furthermore, edges between two nodes were kept in the network only if each of the nodes appeared in the other's respective top 10 most similar nodes. Experimental MS² spectra were then searched against the GNPS spectral libraries (Wang et al., 2016) using the same filters as for the input data. To enhance chemical structural information within the molecular network and provide a more comprehensive overview of the hemp seed chemical space, information from in silico structure annotations from the GNPS library search was incorporated into the network using the GNPS MolNetEnhancer and MS2LDA workflows (Wandy et al., 2018; Ernst et al., 2019). MolNetEnhancer and MS2LDA were performed using the default values reported in GNPS, except for bin width in MS2LDA, which was set at 0.1. Chemical class annotations were performed using the ClassyFire chemical ontology (Feunang et al., 2016). Final molecular networks including quantitative data, substructure information and chemical classifications were then visualized using the software Cytoscape (Ono et al., 2014).
Lastly, to confirm and expand the spectral library annotation made by molecular networking and its in silico match against the GNPS spectral libraries, accurate mass values and MS² and MS³ data of all the annotated metabolites were manually inspected and compared with information available in the literature to assign confidence levels following the Metabolomics Standards Initiative (Sumner et al., 2007).
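The edge criteria described above (cosine score, minimum matched peaks, mutual top-10 neighbours) can be illustrated with a simplified sketch. This is not the GNPS implementation, which uses a modified cosine that also accounts for precursor mass shifts; the greedy peak matching, the function names and the default `tol` value are assumptions made for illustration only.

```python
import math

def cosine_score(spec_a, spec_b, tol=0.05):
    """Greedy cosine similarity between two centroided spectra.

    Each spectrum is a list of (m/z, intensity) pairs.
    Returns (score, number_of_matched_peaks).
    """
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    # All peak pairs within the m/z tolerance, largest intensity products first.
    pairs = sorted(
        ((ia * ib, a, b)
         for a, (mza, ia) in enumerate(spec_a)
         for b, (mzb, ib) in enumerate(spec_b)
         if abs(mza - mzb) <= tol),
        reverse=True,
    )
    used_a, used_b, dot, matched = set(), set(), 0.0, 0
    for prod, a, b in pairs:
        if a not in used_a and b not in used_b:   # each peak is matched at most once
            used_a.add(a)
            used_b.add(b)
            dot += prod
            matched += 1
    return dot / (norm_a * norm_b), matched

def passes_thresholds(score, matched, min_score=0.65, min_matched=6):
    """Cosine score above 0.65 and more than six matched peaks."""
    return score > min_score and matched > min_matched

def mutual_top_k(scores, k=10):
    """Keep an edge only if each node is among the other's k best neighbours.

    scores: dict mapping (node_i, node_j) to a similarity score.
    """
    neighbours = {}
    for (i, j), s in scores.items():
        neighbours.setdefault(i, []).append((s, j))
        neighbours.setdefault(j, []).append((s, i))
    top = {n: {m for _, m in sorted(v, reverse=True)[:k]}
           for n, v in neighbours.items()}
    return {(i, j): s for (i, j), s in scores.items()
            if j in top[i] and i in top[j]}
```

In practice the two filters are applied in sequence: pairwise scores that pass the thresholds are collected into a dictionary, which is then pruned with `mutual_top_k` before the network is drawn.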

Isolation of new metabolites
One kilogram of seeds was acquired in a local supermarket and submitted to an MS-guided isolation protocol to target the isolation of a potentially new metabolite detected by LC-MS, following the same steps as done previously (Padilla-González et al., 2017). Briefly, the seeds were ground into a fine powder and extracted with 80% water in methanol for 24 h, with two repeat extractions of the same material, to exhaust the polar metabolome from the seed flour. Extracts were combined and then evaporated on a centrifugal evaporator (Genevac EZ-2, SP Industries, Warminster, PA, USA) to remove the organic solvent fraction and then lyophilized to remove water and yield a dry crude extract residue. Ten grams of the extract were submitted to Sephadex LH-20 column chromatography (120 g, 400 × 40 mm i.d.) employing 200 mL mixtures of water-methanol (100:0, 98:2, 96:4, 94:6, and 50:50); the eluates were combined to afford ten fractions (Fr1-Fr10) after LC-UV-MS monitoring for similarities. Fr2 was further purified by semipreparative HPLC (Waters e2695, Waters, Milford, MA, USA) using a Luna C18 column (250 × 10 mm i.d., 10 µm, Phenomenex) and a mobile phase of acetonitrile and water, both acidified with 0.1% formic acid. Chromatographic separation was performed using a linear gradient of 5-20% acetonitrile over 30 min, to yield 4 mg of ferulic acid 4-O-glucosyl-6'-O-sulphate. The structure of the isolated metabolite was determined using uni- and bidimensional NMR experiments (Bruker ARX 400, Billerica, MA, USA, MeOD) and by high-resolution MS (Orbitrap Fusion, Thermo Scientific) and low-resolution tandem mass spectrometry (Ion Trap Velos Pro, Thermo Scientific).

Extraction and derivatization of fatty acids
Powdered seeds (100 mg) from each hemp accession were extracted with 2 mL of n-hexane in an ultrasonic bath for 15 min at 25°C using a frequency of 40 kHz. After extraction, samples were evaporated under vacuum to obtain an oily residue corresponding to hemp seed fixed oil. Five microliters of the oil were added to a 2 mL glass vial containing 500 µL of boron trifluoride in methanol (20%). The vial was filled with nitrogen, tightly sealed and gently mixed. The sample vial was subsequently heated at 100°C for 15 min in a heating block. After cooling to room temperature, 500 µL of water and 750 µL of pentane were added to the vial, which was then vigorously shaken. The organic phase (upper) was then transferred to another vial and evaporated under nitrogen. This partition was repeated a second time and the organic phases were combined and evaporated under nitrogen. The concentrated sample containing fatty acid methyl esters was re-dissolved in 200 µL of n-hexane and placed in a glass vial with insert for GC-MS analysis.

GC-MS analyses of fatty acid methyl esters
GC-MS analyses of fatty acid methyl esters were performed on an Agilent 7890A GC chromatograph coupled to a single quadrupole MS analyser (Agilent 5975C). Chromatographic separation was performed on a ZB-WAX column (30 m × 0.25 mm i.d. × 0.25 µm, Phenomenex), using helium as carrier gas at a flow rate of 1 mL per minute. The temperature program ran from 70 to 250°C at a linear increment of 3°C per minute. The injection temperature was set to 220°C and the source temperature to 180°C. The ionization energy was 70 eV and the mass spectrometer was configured to record data from 38 to 650 m/z. Compounds were identified by comparing their retention indices (calculated against a series of n-alkanes analysed under the same experimental conditions) and by comparing their mass spectra with the NIST library.
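For a temperature-programmed run such as this one, retention indices are commonly calculated by linear interpolation between the bracketing n-alkanes (the van den Dool and Kratz approach). The sketch below assumes that this formula was the one applied, which the text does not state explicitly; the function name and the retention times used in any call are hypothetical.

```python
from bisect import bisect_right

def linear_retention_index(rt, alkanes):
    """Linear (temperature-programmed) retention index.

    rt: retention time of the analyte.
    alkanes: dict mapping n-alkane carbon number to its retention time,
             measured under the same conditions as the analyte.
    """
    ladder = sorted(alkanes.items(), key=lambda kv: kv[1])
    times = [t for _, t in ladder]
    pos = bisect_right(times, rt) - 1        # index of the alkane eluting just before
    if pos < 0 or pos >= len(ladder) - 1:
        raise ValueError("retention time outside the n-alkane ladder")
    (n, t_n), (n_next, t_next) = ladder[pos], ladder[pos + 1]
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
```

The calculated index is then compared with literature values for the same stationary phase, alongside the mass spectral match.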
The percentage composition of the fatty acid methyl esters detected in each hemp cultivar/accession was semi-quantitatively obtained by manual integration of GC-MS data. This information was then centered and scaled by unit variance and submitted to statistical analyses in the software R.
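The two pre-processing steps mentioned here, conversion of peak areas to percentage composition and autoscaling (mean centering followed by division by the standard deviation), can be sketched as below. The study performed these steps in R; this Python version is only an illustrative equivalent, and it assumes the sample standard deviation (n − 1 denominator), which is what R's `scale()` uses by default.

```python
from statistics import mean, stdev

def percent_composition(areas):
    """Convert integrated peak areas of one sample to percentages of the total."""
    total = sum(areas)
    return [100 * a / total for a in areas]

def autoscale(rows):
    """Center each variable (column) and scale it to unit variance.

    rows: list of samples, each a list of values (one per fatty acid).
    Columns with zero variance must be removed beforehand.
    """
    scaled_columns = []
    for column in zip(*rows):
        m, s = mean(column), stdev(column)   # sample standard deviation (n - 1)
        scaled_columns.append([(v - m) / s for v in column])
    return [list(sample) for sample in zip(*scaled_columns)]
```

After autoscaling, every fatty acid contributes equally to PCA or clustering regardless of its absolute abundance, so minor components such as stearidonic acid are not swamped by linoleic acid.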

Metabolic profiling by LC-MS/MS
3.1.1 Exploratory methods and mining of the hemp seed metabolome
Metabolic profiling by LC-MS and concomitant data processing by MZmine 2.53 of 52 hemp seed accessions detected 437 and 242 mass features in the positive and negative ionization modes, respectively. A principal component analysis (PCA) of the positive mode dataset explained 37.1% of the total variance in the first two components and grouped the seed extracts according to the similarities of their metabolic fingerprints (Figures 1A, B). This analysis revealed that while most accessions have similar metabolic profiles, the Italian cultivar Eletta Campana (CAN48), the Spanish accession Kongo Hanf (CAN58) and the French accession CAN37 (Table 1) are clear chemical outliers (Figure 1A).
A hierarchical clustering analysis with bootstrap resampling (HCAbp) grouped the 52 hemp seed accessions into five main clusters with support values >80% (clusters 1-5, Figure 1C). This analysis further confirmed that accessions CAN37 and CAN58 form an independent group chemically different from the remaining hemp seed accessions (cluster 1, Figure 1C), while the Italian cultivar Eletta Campana (CAN48) is closely related in chemical composition to the Croatian accession (CAN42) and the German cultivar Kompolti (CAN68) (Figure 1C). Similar results were obtained in the PCA and HCAbp of the negative mode dataset, with accessions CAN37 and CAN58 and cultivars CAN48 and CAN68 as clear outliers based on their metabolic fingerprints (Figure S1). It is noteworthy that the chemical differences between the samples in groups 1-5 are visibly distinguishable by inspection of the base peak chromatograms. To exemplify this, a representative sample from each group is provided in Figure 1D.
Close inspection of the five HCA-based groups reveals that in some cases chemical composition correlates with the hemp seed cultivar/accession. For example, the five Fibrimon cultivars included in the present study clustered in group 5, while all Cinepa accessions clustered in group 4. This pattern, however, was not consistent across all hemp cultivars, since the three investigated Kompolti samples were grouped in different clusters (Figure 1C). A correlation between chemical composition and country of origin was also not evident. The eleven German accessions included in the present study were distributed across three different clusters, as were the four Italian accessions (Figure 1C).
A feature-based molecular networking (FBMN) analysis with in silico annotation tools of the positive mode LC-MS dataset revealed a clustering tendency by chemical class, where cinnamic acid amides (including lignanamides) and lipid-like molecules accounted for the highest number of nodes (Figure 2A). Cinnamic acid amides and lipid-like molecules were consistently the most structurally diverse chemical classes in hemp seeds. Other classes detected include nucleosides and analogues, sugars, and amino acids and their derivatives (Figure 2A). The FBMN analysis, along with database searches and manual inspection of MS data, facilitated the identification of the majority of previously reported cinnamic acid amides in hemp (Table 2), although some nodes from the same chemical class were not identified. This analysis also revealed the accumulation patterns of cinnamic acid amides in the different groups of hemp seed accessions maintained ex-situ in the IPK germplasm collection (Figure 2B). As observed in Figure 2C [...] (Figure 2E). The remaining hemp seed accessions collectively account for only ca. 13% of the total ion counts for this chemical class (Figure 2E). Regarding the lipid-like molecules, the Italian cultivar Eletta Campana (CAN48) showed the highest accumulation of this chemical class with 72% of the total ion counts, while the remaining 51 accessions together represent only 28% of the total ion counts (Figure 2E). FBMN of the negative ionization mode revealed similar patterns to the positive mode dataset, with cinnamic acid amides accounting for the highest number of nodes (Figure S2). However, several unidentified compounds were detected in this mode, grouping into a previously unreported "molecular family" in hemp seeds (Figure 2D).
Manual dereplication of the nodes belonging to this chemical class and further isolation and structure elucidation of targeted metabolites (see section 3.1.3) allowed the identification of this molecular family as "cinnamic acid glycosyl sulphates". This unusual chemical class includes several structurally related metabolites, characterized by the presence of a cinnamic acid moiety (or a related derivative) para-O-linked to a glycosyl sulphate unit (Figure 2D). Interestingly, all hemp seed accessions showed consistently similar amounts of glycosyl sulphates with 15% to 20% of the total ion counts, except for the accessions in cluster 1 (CAN37 and CAN58), which showed a percentage accumulation of ca. 34% (Figure 2E).

Metabolic differences among hemp seed accessions
Although the current study provides a snapshot of the metabolic composition of different hemp seed accessions obtained from plants grown in different conditions and geographic locations, we aimed to compare their metabolic profiles to link the presence of biologically important molecules to certain accessions in order to inform future studies. Therefore, to gain a deeper understanding of semiquantitative differences among the annotated metabolites in each hemp seed accession, a heatmap analysis based on manually extracted peak intensities was performed (Figure 3). The annotation of these compounds was performed by HRMS and MS² spectral matching with literature information and with spectra available in the GNPS database and in our in-house library of MS² data. After in silico spectral matching, manual inspection of annotated peaks was performed to confirm and expand the identifications. This combined analysis allowed the annotation of 44 metabolites belonging to five main chemical classes: cinnamic acid glycosyl sulphates, cinnamic acid amides and their oxidative coupling products, lignanamides, cannabinoids and lipids (Table 2). These compounds were annotated with different levels of confidence according to the Metabolomics Standards Initiative (Sumner et al., 2007).

Cinnamic acid glycosyl sulphates
Detailed analyses of the metabolites assigned as cinnamic acid glycosyl sulphates allowed the annotation of four new compounds (5, 6, 8 and 9) and one known metabolite (compound 7). The assigned structures of compounds 6-9 are based on the absolute chemical structure of compound 5, ferulic acid 4-O-glucosyl-6'-O-sulphate (cannabigail), which was purified and assigned by 1D and 2D NMR interpretation in MeOD, supported by HRMS and MS² data (see section 3.2 for details about its structural elucidation). All the compounds in this chemical class (compounds 5-9, Table 2) showed similarities in their MS² spectra, characterized by the presence of a base peak ion at 241 m/z [M−H]⁻ (Table 2), suggesting close similarities in their chemical structures. However, detailed analyses of MS data revealed that the phenylpropanoid moiety in these compounds differed, being ferulic acid in compounds 5 and 9, caffeic acid in 6, cinnamic acid in 7 (cuinnabis) and dimethoxycinnamic acid in 8. Analysis of accumulation patterns of this chemical class in the 52 hemp seed accessions showed that the accumulation of cinnamic acid glycosyl sulphates is rather homogeneous across all hemp seed accessions, although the Spanish Kongo Hanf (CAN58) and the French accession CAN37 (cluster 1) accumulate an overall higher proportion of these metabolites. Interestingly, an inverse tendency in the accumulation of compound 5 and its isomer, compound 9, was observed across accessions CAN37 and CAN18, while compound 8 is preferentially accumulated in the Schurig cultivar (CAN60, Figure 3).
Figure 3. Heatmap showing the relative accumulation of the annotated metabolites in the 52 hemp seed accessions analysed by LC-MS. For compound identities refer to Table 2.

Cinnamic acid amides and lignanamides
A total of six cinnamic acid amides and 23 lignanamides were identified across the hemp seed accessions based on the HRMS and MS² spectral match with information from the literature and online databases. N-trans-caffeoyltyramine (compound 11) and N-trans-feruloyltyramine (compound 12) are the compounds accumulated in the highest proportion in hemp seeds out of all the detected polar metabolites. These two compounds are accumulated in the highest proportions in the French accession CAN37 and the Spanish accession Kongo Hanf (CAN58), although CAN37 accumulates higher amounts of N-trans-caffeoyltyramine relative to N-trans-feruloyltyramine, while the opposite is true for CAN58. Overall, CAN37 and CAN58 show the highest proportion of cinnamic acid amides and lignanamides among all hemp seed accessions (Figure 3), while an Argentinian accession (CAN51) and a Turkish accession (CAN26) showed the lowest proportions for these two chemical classes, respectively (Figure 3). The Italian cultivar Eletta Campana (CAN48) and most of the accessions in cluster 4 accumulate the lowest proportion of cinnamic acid amides, while intermediate levels were found in accessions in cluster 3 (CAN42 and CAN68) followed by accessions in cluster 5 (Figure 3).
Similar to the monomeric cinnamic acid amides, lignanamides are accumulated in higher proportions in accessions CAN37 and CAN58, followed by the Croatian and German accessions CAN42 and CAN68, respectively, in cluster 3 and those in cluster 5 (Figure 3). Lignanamides constitute the most diverse group of metabolites in hemp seeds, among which cannabisin B (compound 17) and cannabisin C (compound 27) are the two most abundant metabolites of this chemical class. Compounds in this chemical class are generally found as isomeric molecules apparently distinguishable by the intensity of their diagnostic MS² ions (Table 2), although no study has been performed to validate this hypothesis. Interestingly, although accessions CAN42 and CAN68 are closely related based on their overall chemical compositions (Figure 1C), the lignanamide profiles of these two accessions showed salient differences (Figure 3). While both accessions accumulate similar proportions of compounds 16-23, accession CAN68 has a much higher proportion of cannabisins D, E, F, G, N and O, relative to CAN42 (compounds 25-38, Figure 3). Moreover, while most cinnamic acid amides and lignanamides seem to be preferentially accumulated in accessions in cluster 1 (CAN37 and CAN58), compounds 13 (N-trans-coumaroyl tyramine) and 14 (tricoumaroyl spermidine) were accumulated mostly in accessions CAN36 and CAN20, respectively, both in cluster 5.

Cannabinoids and lipid-like molecules
Considering the previous reports and current regulations on the presence of cannabinoids in hemp, we performed target mass searches for the main cannabinoids previously reported in hemp seed oil and organic extracts (Citti et al., 2019;Jang et al., 2020). Trace amounts (< 0.3%) of only three cannabinoids were detected in 39 of the 52 hemp seed accessions: cannabidiol (CBD), tetrahydrocannabinol (THC) and tetrahydrocannabinolic acid (THCA). THC and CBD were detected in comparatively higher proportions in the Italian cultivar Eletta Campana (CAN48, Figure 3), while THCA was mainly detected in the accessions belonging to cluster 5 (especially CAN17 and CAN22), as well as in a Kompolti cultivar (CAN56, Figure 3).
Similarly, three unidentified lipid-like molecules (compounds 42-44, Figure 3) were also detected mainly in the Italian Eletta Campana (CAN48), although trace amounts were also found in other accessions. Manual inspection of MS data indicated that these metabolites likely correspond to oxidized fatty acids. However, their structures remain to be characterized.

Structural characterization of new cinnamic acid glycosyl sulphates
Considering that the analysis by FBMN and our dereplication approach suggested the presence of several potentially new metabolites belonging to a previously unreported molecular family in hemp (cinnamic acid glycosyl sulphates), we performed a target isolation of the most abundant metabolite from this chemical class, detected at 6.84 min with a mass of 435.06061 m/z [M−H] − ( Table 2). The process started with a dried 20% methanolic extract of hemp seeds, which was submitted to the classic isolation processes using Sephadex LH-20 column chromatography followed by semipreparative HPLC (see Materials and Methods section). Approximately 4 mg of ferulic acid 4-O-glucosyl-6'-O-sulphate (compound 5) was afforded. Furthermore, the tentative structures of four additional cinnamic acid glycosyl sulphates (compounds 6 -9; Figure 4) in the crude extracts are tentatively reported based on similarity to 5 according to HRMS and MS 2 data. All the compounds in this chemical class (compounds 5 -9, Figure 4) showed a characteristic base peak ion in their MS 2 spectra at 241 m/z [M−H] − , representing a deprotonated glucopyranosylsulphate unit formed after the neutral loss of the phenylpropanoid moiety. A subsequent neutral loss of a water molecule generated the ion at 223 m/z (Figure 4). In addition to these diagnostic ions, minor peaks representing a deprotonated ferulic acid (193 m/z), caffeic acid (179 m/ z), cinnamic acid (163 m/z) and dimethoxycinnamic acid unit (223 m/ z), were observed in the MS 2 spectra of compounds 5, 6, 7 and 8, respectively. The suggested fragmentation mechanism leading to the formation of these diagnostic ions is provided in Figure 4.
Compounds 5 and 9 were assigned as two isomers of a previously undescribed ferulic acid glucosyl sulphate, based on chromatographic and spectrometric data (HRMS and MS2). Both compounds showed a deprotonated molecule [M−H]− at m/z 435.0606 (Table 2), consistent with the molecular formula C16H20SO12 (calculated for C16H19SO12, 435.0603). Their online UV spectra displayed absorbance maxima at 289 and 315 nm, characteristic of phenylpropanoid derivatives.
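As a consistency check, the calculated m/z of the [M−H]− ion and the mass error of the observed value can be reproduced with a few lines of Python. This is a minimal sketch, not the software used in the study. The monoisotopic masses are standard reference values entered by hand and the helper names are hypothetical. The assumption that the m/z 241 fragment arises by loss of an intact ferulic acid molecule (C10H10O4) is made here for illustration only, since the text gives nominal fragment masses rather than fragment formulas.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "S": 31.97207117}
ELECTRON = 0.00054858


def neutral_mass(formula):
    """Monoisotopic mass of a species given as {element: atom count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def anion_mz(formula):
    """m/z of a singly charged anion with the given composition.

    The composition must already exclude the lost proton, so only the
    mass of the extra electron is added.
    """
    return neutral_mass(formula) + ELECTRON


def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


# [M-H]- of compound 5: C16H19SO12 (composition quoted in the text).
precursor = anion_mz({"C": 16, "H": 19, "O": 12, "S": 1})
error = ppm_error(435.0606, precursor)  # observed value from the text

# ASSUMPTION: the neutral losses are ferulic acid (C10H10O4), then water.
ferulic_acid = neutral_mass({"C": 10, "H": 10, "O": 4})
water = neutral_mass({"H": 2, "O": 1})
fragment_241 = precursor - ferulic_acid
fragment_223 = fragment_241 - water

print(f"calculated [M-H]-: {precursor:.4f}")
print(f"mass error: {error:.2f} ppm")
print(f"fragments: {fragment_241:.4f}, {fragment_223:.4f}")
```

Under these assumptions the calculated precursor agrees with the 435.0603 quoted above, the observed value deviates by less than 1 ppm, and the two fragment masses round to the nominal m/z 241 and 223 described for this compound class.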
The 1H NMR spectrum of compound 5 included a proton spin system consistent with a six-membered meta-para-disubstituted aromatic ring (Table 3), and two adjacent exocyclic olefinic monoprotonated carbons with a coupling constant (15.8 Hz) consistent with the trans configuration. The 13C spectrum is consistent with that of a substituted free ferulic acid moiety. Positions 3 and 4 of the ferulic acid moiety were assigned by HMBC couplings with other protons on the ring, and by an online NMR shifts simulator (nmrdb.org). The position of the methoxy singlet was established by an HMBC coupling to the carbon at position 3, and an NOE coupling to the proton doublet (1.2 Hz) at position 2. The attachment of the glucosyl moiety was established by an HMBC coupling from the anomeric proton to the fully substituted carbon at position 4.
The glucosyl moiety was assigned using COSY couplings, coupling constants and HSQC. The position of attachment of the sulphate was confirmed by the deshielding of the carbon at position 6', which is normally observed at 61-62 ppm in similar glycosylarene structures (Schuster et al., 1986).

Figure 4. Chemical structures of cinnamic acid glycosyl sulphates detected in hemp seed and fragmentation mechanism of the isolated ferulic acid glucosyl sulphate. The chemical structure of ferulic acid-4-O-glucosyl-6'-O-sulphate (compound 5, in red) was confirmed by NMR experiments, while the chemical structures of the remaining compounds (6-9) were suggested by interpretation of HRMS and MS2 data alone and, therefore, await spectroscopic confirmation.

Fatty acid profiles of different hemp seed accessions
Metabolic profiling by GC-MS of the derivatized oil extracted from the 52 hemp seed accessions revealed a homogeneous composition with a predominance of linoleic acid (Table 4). Six fatty acids were detected as the major oil components in all accessions: palmitic acid, oleic acid, linoleic acid, γ-linolenic acid, α-linolenic acid and stearidonic acid. Integration of chromatographic peak areas showed that linoleic acid (53.65 ± 2.18%), α-linolenic acid (20.18 ± 3.30%) and oleic acid (13.62 ± 1.86%) are the three main fatty acids, together representing more than 87% of the total oil components (Table 4). Across all hemp seed accessions, the ratio of linoleic acid to α-linolenic acid was 2.66:1, with accessions CAN19, CAN32, CAN43, CAN44, CAN50, CAN53 and CAN54 showing the values closest to 3:1, reported as optimal for human nutrition (Table 4). Chromatographic peak area comparisons revealed that the Italian cultivar Eletta Campana (CAN48) is the accession with the highest accumulation of fatty acids, followed by samples of the cultivars "Fibrida" (CAN52) and "Forose" (CAN28). However, these cultivars showed interesting differences in the accumulation of specific metabolites. CAN48 showed the highest concentration of palmitic acid, oleic acid and linoleic acid, while CAN28 showed the highest concentration of γ-linolenic acid, α-linolenic acid and stearidonic acid.
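The summary figures quoted above follow directly from the mean relative abundances. The short Python sketch below reproduces them, assuming that the 2.66:1 ratio is the ratio of the two mean values (the text does not state whether it is instead a mean of per-accession ratios). The variable and function names are hypothetical.

```python
# Mean relative abundances (% of total peak area) quoted in the text.
mean_pct = {
    "linoleic": 53.65,
    "alpha_linolenic": 20.18,
    "oleic": 13.62,
}


def la_ala_ratio(linoleic, alpha_linolenic):
    """Ratio of linoleic acid to alpha-linolenic acid."""
    return linoleic / alpha_linolenic


def gap_to_target(ratio, target=3.0):
    """Absolute distance of a ratio from the 3:1 value cited as optimal."""
    return abs(ratio - target)


ratio = la_ala_ratio(mean_pct["linoleic"], mean_pct["alpha_linolenic"])
top_three = sum(mean_pct.values())

print(f"linoleic : alpha-linolenic = {ratio:.2f}:1")
print(f"gap to 3:1 = {gap_to_target(ratio):.2f}")
print(f"three main fatty acids = {top_three:.2f}% of total")
```

Applied to the per-accession values in Table 4, the same `gap_to_target` helper could be used to rank accessions by how close their ratio is to 3:1.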
A boxplot analysis of the fatty acid composition, showing the mean values for the six major fatty acids, further confirmed that CAN48 and CAN28 are clear outliers in the accumulation of specific lipophilic compounds (Figure 5A). Further analysis of the same dataset by PCA revealed that the fatty acid composition of the 52 hemp seed accessions is rather homogeneous, with only two PCs accounting for more than 92% of the total variance (Figures 5B, C). The chemical structures of the main fatty acids detected in hemp seeds are reported in Figure 5D.
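For readers unfamiliar with how a boxplot flags outliers, the sketch below applies the conventional Tukey rule, under which a value is an outlier if it lies more than 1.5 interquartile ranges beyond the quartiles. Whether the boxplots in Figure 5A used exactly this rule is an assumption. The accession labels and numbers are placeholders for illustration only and are not measurements from this study.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the labels whose value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    `values` maps a sample label to a single measurement, for example the
    peak area of one fatty acid in one accession.
    """
    q1, _median, q3 = quantiles(values.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(label for label, v in values.items() if v < low or v > high)


# PLACEHOLDER data: hypothetical labels and values, not taken from Table 4.
demo = {
    "acc_A": 10, "acc_B": 11, "acc_C": 12, "acc_D": 12,
    "acc_E": 13, "acc_F": 13, "acc_G": 14, "acc_H": 30,
}

flagged = tukey_outliers(demo)
print(flagged)
```

Running the same function once per fatty acid would give a per-compound list of outlying accessions, which is the kind of pattern described for CAN48 and CAN28.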

Discussion
Hemp is an industrial crop that has been widely investigated due to its diversity of applications in the pharmaceutical, nutraceutical, food, cosmetic, textile, and material industries (Schluttenhofer and Yuan, 2017; Farinon et al., 2020). However, the chemical diversity of the hemp seed metabolome has not been explored in detail and new metabolites with unknown properties are likely to be discovered. In the current study, following an untargeted metabolomics approach by LC-MS and GC-MS, we discovered a previously unreported molecular family in hemp seeds, the cinnamic acid glycosyl sulphates, and linked the presence of biologically important molecules to certain accessions. Chemical knowledge of these cultivars and accessions could be used in the future as a basis for optimizing commercial cultivars for nutritional and other functional uses.
Metabolic fingerprinting by LC-MS and data mining tools revealed that the 52 hemp seed accessions can be categorized into five main groups based on the similarities of their metabolic fingerprints. Although some differences in the metabolic grouping are to be expected when the same seeds are grown under homogeneous conditions, we believe that an accurate representation of the true metabolic phenotype of an organism can only be obtained by studying the chemical composition of plants or organs from individuals collected in their natural environment. Germplasm and herbarium collections provide valuable samples for this purpose. In the current study, among all the studied accessions and cultivars, the Spanish accession Kongo Hanf (CAN58) and the French accession CAN37 showed a remarkable structural diversity of bioactive and novel metabolites, some of which accumulated in high amounts. These two accessions showed the highest accumulation of phenylpropionamides (including cinnamic acid amides and their oxidative coupling products, the lignanamides), as well as of the new cinnamic acid glycosyl sulphates.
Phenylpropionamides (cinnamic acid amides and lignanamides) represent the main chemical class of bioactive metabolites in hemp seeds (Moccia et al., 2020; Leonard et al., 2021a; Leonard et al., 2021b). These compounds display potent antioxidant, anti-inflammatory, cytotoxic and acetylcholinesterase inhibitory activities, both in vitro and in vivo (Yan et al., 2015; Farinon et al., 2020; Moccia et al., 2020; Leonard et al., 2021a). For instance, previous studies have recognized the strong potential of hemp seed phenylpropionamides as protective agents against human chronic diseases (Leonard et al., 2021a), although results from clinical trials or cohort studies are not available to support seed oil (Galasso et al., 2016; Pavlovic et al., 2019). These metabolites were not reported in the current paper because of discrepancies in the identification and quantification of these low-concentration metabolites. However, considering that the biological properties of hemp seed oil are related to the presence of the major fatty acids, especially the ratio between linoleic acid and α-linolenic acid, our results are biologically relevant. Complete reports of minor metabolites in hemp seed oil can be found in other studies (Galasso et al., 2016; Pavlovic et al., 2019).
A recent study reported the cannabinoid profile of ten commercially available hemp seed oils (Citti et al., 2019). Besides tetrahydrocannabinol (THC) and cannabidiol, another 30 cannabinoids were identified (Citti et al., 2019). In the current study, we found traces of only three cannabinoids using a non-selective LC-MS method. Our results showed that Eletta Campana (CAN48) was the cultivar with the highest concentration of cannabinoids, although their concentration was around 200 times lower than that of the most abundant metabolite (N-trans-caffeoyltyramine). Previous studies on the same accession (CAN48), obtained from the same genebank (IPK), reported an absolute THC content of 0.08%, which is well below the 0.3% threshold set by international regulations (Citti et al., 2019). Because CAN48 showed the highest cannabinoid concentration among the accessions studied here and its THC content is low, we might assume that all the hemp seed accessions included in the current study meet the international regulatory standard of <0.3% THC content.
Recent studies have shown that, despite being heavily studied, hemp seed continues to be a promising source of new metabolites. In the current study we discovered a molecular family previously unknown in hemp seeds, the cinnamic acid glycosyl sulphates, which is widely distributed across all hemp seed accessions in similar concentrations. The identity of one novel compound from this chemical class was confirmed by isolation and interpretation of spectroscopic data, and the structures of four structural analogues, three of them new, are also suggested. The presence of these compounds in all hemp accessions suggests that cinnamic acid glycosyl sulphates might be useful as quality control markers for the authentication of commercial hemp seed products extracted with highly aqueous solvents. Considering that these highly polar extracts, commonly used in the manufacture of cosmetics, are usually devoid of characteristic hemp seed phenylpropionamides due to insolubility issues, the need for more suitable quality control markers is justified. Further studies are still needed to identify the biological properties of these new metabolites.
In conclusion, our results confirm that some hemp accessions present in the IPK germplasm collection appear to contain more interesting metabolite profiles than some of the cultivated hemp cultivars, as first suggested by Galasso et al. (2016). We suggest that accessions CAN37 and CAN58 should be prioritized for the discovery of new metabolites, as well as in selective breeding programs for the development of new cultivars with high contents of bioactive phenylpropionamides. However, despite the functional and nutraceutical properties of this crop, hemp seeds are not totally free of antinutritional compounds, such as phytic acid (Russo, 2013; Galasso et al., 2016). According to previous reports, which included ten of the accessions investigated here, CAN48 presents the highest levels of phytate, although other cultivars not included here, such as Futura75 and Felina32, have even higher values. High phytate contents in some cultivars limit their use in the nutrition of humans and other monogastric animals, since a high level of phytic acid may lead to mineral deficiencies of macro- and microelements, reduced protein digestibility and poor organoleptic properties (Russo, 2013; Russo and Reggiani, 2015; Galasso et al., 2016). Therefore, improvement of this trait might be necessary if the high-yielding oil cultivar CAN48 is prioritized. According to previous studies (Galasso et al., 2016), the Italian accession CAN40 might be a good candidate for reducing the phytate content through hybridization and selective breeding. Other accessions such as CAN32, or the Fibrimon cultivars CAN50 and CAN53, could also be prioritized if a 3:1 ratio of linoleic acid to α-linolenic acid is desired.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp, MSV000090725.

Conflict of interest
The authors declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.