Functional Analysis of Four Terpene Synthases in Rose-Scented Pelargonium Cultivars (Pelargonium × hybridum) and Evolution of Scent in the Pelargonium Genus

The Pelargonium genus contains about 280 species, of which at least 30 are odorant. Aromas produced by scented species are remarkably diverse and include rose, mint, lemon, nutmeg, ginger and many other scents. Amongst odorant species, rose-scented pelargoniums, also named pelargonium rosat, are the most famous hybrids for their production of essential oil (EO), widely used by the perfume and cosmetic industries. Although EO composition has been extensively studied, the underlying biosynthetic pathways and their regulation, most notably those of terpenes, are largely unknown. To gain a better understanding of the terpene metabolic pathways in pelargonium rosat, we generated a transcriptome dataset of pelargonium leaf and used a candidate gene approach to functionally characterise four terpene synthases (TPSs), including a geraniol synthase, a key enzyme responsible for the biosynthesis of the main rose-scented terpenes. We also report for the first time the characterisation of a novel sesquiterpene synthase catalysing the biosynthesis of 10-epi-γ-eudesmol. We found a strong correlation between expression of the four genes encoding the respective TPSs and accumulation of the corresponding products in several pelargonium cultivars and species. Finally, using publicly available RNA-Seq data and de novo transcriptome assemblies, we inferred a maximum likelihood phylogeny from 270 pelargonium TPSs, including the four newly discovered enzymes, providing clues about TPS evolution in the Pelargonium genus. Notably, we show that, in contrast to other TPSs, geraniol synthases from the TPS-g subfamily conserved their molecular function throughout evolution.


INTRODUCTION
The Pelargonium genus belongs to the Geraniaceae family and contains about 280 species exhibiting a wide range of variation in leaf and floral morphology, as well as in body plan organisation (Bakker et al., 2004; Jones et al., 2009; Roeschenbleck et al., 2014; Blerot et al., 2015). Phylogenetic analyses based on nuclear, plastidial and mitochondrial DNA led to the division of the Pelargonium genus into five main clades comprising 16 sections (Bakker et al., 2005). Although most of the odorant pelargoniums belong to clade A1 (section Pelargonium), some species of clade B (sections Reniformia and Peristera), clade C1 (section Jenkinsonia) and clade C2 (sections Ciconium and Subsucculentia) are scented (Blerot et al., 2015).
Most Pelargonium species are indigenous to South Africa and nearby countries. Aromas of odorant species are remarkably diverse and include rose, mint, lemon, nutmeg, ginger and many other scents, reflecting the richness of scented compounds produced in Pelargonium (Demarne and Van der Walt, 1992; Lis-Balchin, 2003; Lalli et al., 2006). As such, several species were introduced into Europe during the 17th century and used to create hybrids, most notably to obtain cultivars with refined fragrance notes. Among the scented Pelargonium hybrids, the rose-scented P. × hybridum cultivars, also named pelargonium rosat, are the most emblematic and are often used to replace the expensive Rosa damascena essential oil (EO). These hybrids descend from several crosses between P. graveolens or P. radens on the one hand and P. capitatum on the other. Cultivars used in EO production are propagated vegetatively from cuttings in their production area, but the knowledge of their exact botanical origin has been lost over time (Demarne, 2003). Because of the economic importance of scented pelargoniums, agronomical research was undertaken to improve EO production and to investigate their pharmaceutical and antimicrobial potential (Saraswathi et al., 2011; Boukhatem et al., 2013). In addition, characterisation of flavones and antioxidants in the hydrolate phase obtained during EO extraction opened new avenues for the valorisation of by-products (Rao et al., 2002; Sangwan and Singh, 2015).
The composition of EOs extracted from the main P. × hybridum cultivars (cv. rosat 'Bourbon', cv. rosat 'China', cv. rosat 'Egypt' and cv. rosat 'Grasse') has been extensively studied (Gauvin et al., 2004; Juliani et al., 2006; Blerot, 2016). Geraniol, citronellol and their derivatives, all acyclic monoterpenes, form the main part of the EOs and give these cultivars their characteristic rose scent. To a lesser extent, limonene, a cyclic monoterpene, and its derivatives with a mint scent, as well as sesquiterpenes like 6,9-guaiadiene or 10-epi-γ-eudesmol with a rose scent and a wood scent, respectively, contribute to EO fragrances. For a detailed composition of pelargonium rosat EOs, see Blerot (2016). The balance between these main terpenes is of crucial importance for EO quality and fragrance, with variable rosy, minty and woody top notes. As such, the use and commercial value of pelargonium rosat EO depend on its composition and fragrance characteristics. For example, EO obtained from pelargonium rosat 'Bourbon' is much appreciated for its pure rosy fragrance.
In pelargonium, biosynthesis of terpenes takes place in leaves in specialised structures known as glandular trichomes, or oil glands (Figure 1A), that are composed of a secretory cell producing EO in a subcuticular storage cavity (Boukhris et al., 2013). Within the secretory cell, prenyl diphosphates such as isopentenyl diphosphate (IPP) and its allylic isomer, dimethylallyl diphosphate (DMAPP), are biosynthesised through both the cytosolic mevalonate (MVA) and the plastidial methyl erythritol phosphate (MEP) pathways (Figure 1B). It is generally accepted that IPP/DMAPP provided by the MEP and MVA pathways are used for the biosynthesis of monoterpenes and sesquiterpenes, respectively, but several studies have pointed out cross-talk between these two metabolic routes (Opitz et al., 2014; Mendoza-Poudereux et al., 2015). IPP/DMAPP are substrates of prenyltransferases, enzymes involved in hydrocarbon chain elongation of prenylated compounds. The head-to-tail condensation of one DMAPP molecule with one IPP molecule produces geranyl diphosphate (GPP). Adding another IPP unit to GPP produces farnesyl diphosphate (FPP). Longer prenyl diphosphates can be synthesised but are not used to produce volatile compounds.
Prenyl diphosphates are substrates of terpene synthases (TPSs), a class of enzymes found in bacteria, fungi and plants, as well as in some organisms from the Excavata and Amoebozoa eukaryotic supergroups (Bohlmann et al., 1998; Wawrzyn et al., 2012; Yamada et al., 2015; Chen et al., 2016). In plants, TPSs likely evolved from the duplication of a gene encoding an enzyme catalysing the formation of ent-kaurene, a precursor in the gibberellin pathway (Hayashi et al., 2006). TPSs, which include monoterpene synthases (mTPSs), sesquiterpene synthases (sTPSs) and diterpene synthases (diTPSs), can be divided into seven subfamilies, of which five are represented in angiosperms: TPS-a contains sTPSs; TPS-b, mTPSs; TPS-c, diTPSs; and TPS-e/f and TPS-g contain mTPSs, sTPSs and diTPSs (Chen et al., 2011). Conversion of GPP and FPP into terpenes is carried out by mTPSs and sTPSs, respectively, although some enzymes are able to accept both substrates when the reaction occurs in vitro (Nagegowda et al., 2008). The catalytic step mediated by TPSs consists of the dephosphorylation of a prenyl diphosphate molecule, forming an unstable carbocation intermediate, followed by molecular rearrangements of the carbocation until loss of a proton or addition of water ends the reaction. A magnesium ion is needed for the dephosphorylation step, and two conserved motifs, DDxxD and NSE/DTE, are known to help stabilise the Mg²⁺ ion (Degenhardt et al., 2009). A third conserved motif, RR(x)8W, facilitates the cyclisation of carbocations (Williams et al., 1998). Many TPSs are promiscuous enzymes forming a large number of terpene products owing to their catalytic mechanism (Tholl et al., 2005).
in rose (Shalit et al., 2003). Complicating the identification of enzymes involved in terpene secondary modifications, the same molecule can be obtained by different molecular routes. For example, citronellol biosynthesis occurs either by a direct reduction of geraniol or through a multistep pathway involving both ADH and reductase enzymes (Iijima et al., 2014;Xu et al., 2017).
The balance between primary terpenes and their secondary derivatives largely influences the fragrance produced by aromatic plants. TPSs and other enzymes involved in terpene biosynthesis are regulated both transcriptionally (Nagegowda, 2010) and by IPP/DMAPP supply (Muñoz-Bertomeu et al., 2006). In pelargonium, a gene encoding 1-deoxy-D-xylulose-5-phosphate synthase (DXS), which catalyses the first step providing IPP/DMAPP by the MEP pathway, has been characterised. Overexpression of the DXS gene led to a slight increase in EO content (Jadaun et al., 2017).
In aromatic non-model plants, the first EST library was derived from isolated secretory trichomes of peppermint (Lange et al., 2000). This strategic resource allowed the characterisation of reductases (Ringer et al., 2003), dehydrogenases and cytochrome P450 monooxygenases (Bertea et al., 2001), all involved in the menthol biosynthetic pathway. Several other libraries were obtained from purified glands of basil (Gang et al., 2001), oregano (Crocoll et al., 2010) and lavender (Lane et al., 2010), resulting in gene characterisation in both isoprenoid and phenylpropanoid biosynthetic pathways. Transcriptome investigation using high-throughput sequencing technologies (RNA-Seq) has opened an entirely new era in biology, allowing for large-scale studies of gene expression, even in non-model organisms for which no genome has been sequenced. In the field of specialised metabolism, RNA-Seq has made it possible to decipher not only entire metabolic pathways (Dugé de Bernonville et al., 2015), but also their regulators, such as transcription factors (Wang et al., 2016). However, pelargonium sequence data are still scarce. Illumina reads for two dozen accessions of Geraniales, of which half are Pelargonium species, are publicly available. Although this dataset encompasses several odorant pelargoniums, TPSs were not investigated. Recently, the first transcriptome analysis of a pelargonium rosat (cv. 'Bourbon') was reported and 13 mTPS, 5 sTPS and 10 diTPS sequences were identified solely by sequence homology.
In this study, we report the identification and functional characterisation of a geraniol synthase (PhGES) and a eudesmol synthase (PhEDS), two major TPSs controlling EO composition in pelargonium rosat cultivars, as well as a cineole synthase and a myrcene synthase (PhCINS and PhMYRS, respectively), using a 454 transcriptome of P. × hybridum cv. rosat 'Grasse'. We show a strong correlation between expression of the genes encoding these enzymes and the presence of their respective products in several pelargonium cultivars and species. Finally, using publicly available RNA-Seq data and de novo transcriptome assemblies, we place the four enzymes in an evolutionary perspective of the TPSs in the Pelargonium genus.

RNA Extraction
Total RNA was extracted using a modified version of the protocols of Chang et al. (1993) and Cock et al. (1997). Briefly, 1 g of young leaves was ground to powder in liquid nitrogen before addition of 5 mL of extraction buffer [2% (w/v) PVP; 2% (w/v) CTAB; 100 mM Tris-HCl pH 8; 25 mM EDTA pH 8; 2 M NaCl] and 100 µL of β-mercaptoethanol. The homogenised sample was incubated for 15 min at 65 °C followed by 1 min at room temperature and then extracted twice with an equal volume of chloroform:isoamyl alcohol (24:1). The supernatant (aqueous phase) was collected by centrifugation at 9,000 × g for 20 min at 4 °C. RNA was precipitated by the addition of lithium chloride (2 M) and incubated for 24 h at 4 °C. After centrifugation at 14,000 rpm for 1 h at 4 °C, the pellet was washed twice with ice-cold 70% ethanol and air dried. The pellet was dissolved in 25 µL of water and 500 µL of SSTE buffer [1 M NaCl; 0.5% (w/v) SDS; 10 mM Tris-HCl pH 8; 1 mM EDTA pH 8]. The aqueous phase was collected by centrifugation at 14,000 rpm for 10 min at 4 °C and RNA was precipitated by the addition of 1 mL of 100% ethanol and 150 µL of 3 M sodium acetate (pH 5.7). The RNA pellet was harvested after overnight incubation at −20 °C and centrifugation at 14,000 rpm for 30 min at 4 °C. The pellet was then washed twice with ice-cold 70% ethanol, air dried and solubilised in pure water.

Genome Walking
One region of the PhGES gene, the promoter and the ATG start site, were amplified using the GenomeWalker™ kit (Clontech, cat. # 638904) according to the manufacturer's instructions. DNA samples were extracted from leaves using NucleoSpin® Plant II Genomic DNA (Macherey-Nagel, cat. # 740770) according to the manufacturer's instructions. As recommended by the GenomeWalker™ kit, PCR and nested PCR were performed using the Advantage 2 PCR kit (Clontech, cat. # 639207). Primers were designed following the guidelines from the kit and are listed in Supplementary Table 1.

Pyrosequencing Library and Reads Analysis
Total RNA from P. × hybridum rosat 'Grasse' was extracted from young leaves using the Tri reagent kit according to the manufacturer's instructions and 75 µg was sent to Eurofins MWG GmbH. A normalised random-primed cDNA library was prepared from the RNA, an emulsion-based PCR was performed and one segment of a sequencing plate was sequenced on a GS FLX+ (454/Roche) to yield more than 698,000 reads delivered as assembled reads in FASTA format with quality scoring files of all clusters. Reads were deposited in the SRA under the number SRP144736. Cleaned reads were checked using fastQC 1 and assembled using CAP3 (Huang and Madan, 1999) with a minimum similarity threshold of 90% and a minimum overlap of 40 bases. Assembled sequences were annotated using Blastx searches performed against the Arabidopsis translated coding sequences [The Arabidopsis Information Resource (TAIR), V10 2 ]. Several Perl scripts were designed for data processing and analysis in assembling, annotation and functional classification of the sequences.

Phylogenetic Analyses
Terpene synthase sequences were retrieved from the assembled pelargonium transcriptomes using hmmer v3.1b2 (Eddy, 2011) and the HMM profile Terpene_synth_C (PF03936) from the PFAM database, corresponding to the α domain found in the C-terminal part of plant enzymes. Short peptides and sequences devoid of the DDxxD conserved motif typical of class I TPSs were removed. Sequences were aligned with clustalω v1.2.4 (Sievers et al., 2011) using the Terpene_synth_C HMM profile as a guide and 10 iterations. Some sequences obviously belonging to the same gene were manually joined together. During exploratory work, FasttreeMP v2.1.10 (Price et al., 2010) was used to generate phylogenies. Sequences of functionally characterised enzymes from neighbouring families of the Pelargonium genus and from Arabidopsis were added to the tree in order to tentatively assign functions to groups of Pelargonium TPSs (Supplementary Table 2); nearly identical sequences with the same biochemical function were omitted. Sequences not belonging to TPS clades -a, -b, -g or -e/f (Chen et al., 2011) were removed from the phylogeny. The final alignment was manually edited to remove non-homologous sites and obvious misalignments (Supplementary File 1). The final tree was obtained using RAxML pthreads v8.2.11 (Stamatakis, 2014), with the JTT evolutionary model, four gamma categories and amino acid frequencies deduced from the alignment, as determined with the help of Prottest v3.4.2 (Darriba et al., 2011). Bootstrap values were obtained during the RAxML run until convergence (300 bootstraps). Pelargonium TPSs were blasted against the NCBI nr database and results were compiled from the first 50 hits to obtain a functional annotation of the sequences.
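The motif-based filtering step described above can be illustrated with a short Python sketch. It is not the pipeline used in this study: the function names, the toy sequences and the `min_length` cut-off are hypothetical (no length threshold for "short peptides" is given in the text), and the regular expressions simply encode the DDxxD and RR(x)8W consensus patterns.

```python
import re

# DDxxD: two aspartates, any two residues, one aspartate (class I TPS motif).
DDXXD = re.compile(r"DD..D")
# RR(x)8W: two arginines, any eight residues, one tryptophan.
RRX8W = re.compile(r"RR.{8}W")


def filter_tps(records, min_length=200):
    """Keep sequences of at least min_length residues that contain DDxxD.

    records maps a sequence name to its amino acid sequence.
    min_length is an illustrative assumption, not a value from the paper.
    """
    kept = {}
    for name, seq in records.items():
        seq = seq.upper()
        if len(seq) >= min_length and DDXXD.search(seq):
            kept[name] = seq
    return kept


def has_rrx8w(seq):
    """Return True when the sequence carries an RR(x)8W motif."""
    return RRX8W.search(seq.upper()) is not None


# Toy peptides (hypothetical, far shorter than real TPSs).
toy_records = {
    "tps_ok": "M" + "A" * 10 + "RR" + "A" * 8 + "W" + "A" * 20 + "DDIYD" + "A" * 20,
    "tps_short": "DDIYD",
    "tps_no_motif": "A" * 60,
}
toy_kept = filter_tps(toy_records, min_length=50)
```

With these toy inputs only `tps_ok` passes both the length and the DDxxD criteria. A check such as `has_rrx8w` could then be used to flag candidates lacking the cyclisation-associated motif.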

Expression of Recombinant PhGES, PhEDS, PhCINS, and PhMYRS in E. coli
Full-length TPS genes amplified using primers listed in Supplementary Table 1 were inserted into the vector pENTR/D-TOPO and transferred by homologous recombination into Busso expression vectors (Busso et al., 2005). Primers used for these steps are described in Supplementary Table 1. E. coli strain Rosetta (DE3) pLysS cells (Novagen, Darmstadt, Germany) were then transformed with the expression vector by heat shock. Production of the heterologous protein was performed for 14-16 h at 16 °C under constant shaking at 180 rpm in terrific broth supplemented with 0.5% glycerol, 0.25 M D-sorbitol and 2.5 mM betaine after induction with 0.2 mM IPTG. The cells recovered by centrifugation were disrupted by sonication in native binding buffer (50 mM NaH₂PO₄, 0.5 M NaCl, 20 mM imidazole, 5% glycerol, 5 mM DTT, pH 8) supplemented with 0.5 mg.mL⁻¹ lysozyme. After clarification of the lysate by centrifugation, the recombinant protein was purified by binding to the Talon® metal affinity resin (Clontech, cat. # 635515) according to the manufacturer's instructions. The resin-bound protein was incubated overnight at 4 °C in 200 µL of native binding buffer supplemented with 10 units of thrombin. The next day the TPS was recovered from the mixture by filtration. Protein concentration was measured using the Bio-Rad reagent (cat. # 500-0006) with bovine serum albumin as standard (Bradford, 1976).
Enzymatic assays were performed in at least three replicates in a final volume of 500 µL containing 15-50 µg of purified recombinant protein, buffer (25 mM Tris-Cl, pH 7.5, 10% glycerol, 1 mM DTT, 1 mg.mL⁻¹ BSA) and cofactors (10 mM MgCl₂, 1 mM MnCl₂). The reaction was initiated by addition of 50 µM GPP or FPP and the mixture was overlaid with 500 µL of hexane. After 2 h incubation at 30 °C, the mixture was vigorously mixed and the upper hexane phase was collected, concentrated under a nitrogen stream and analysed by GC/MS. Negative controls were performed using the purified product from Rosetta (DE3) pLysS without expression vector.

Subcellular Localisation
In silico predictions of proteins subcellular localisation were made with TargetP (Emanuelsson et al., 2007) and Predotar (Small et al., 2004) web services. PhGES, PhCINS, PhMYRS and PhEDS full length coding sequences were amplified using primers reported in Supplementary Table 1. Following cloning in the pENTR/D-TOPO entry vector, sequences were transferred into the expression vector pMDC83 under the control of a double 35S promoter (Curtis and Grossniklaus, 2003), to generate a fusion protein with GFP fused to the C-terminal part of the TPS protein. This construct was transformed into the Agrobacterium tumefaciens strain C58 (pMP90). Agrobacteria cultures expressing independently the four TPSs were co-infiltrated in Nicotiana benthamiana leaves according to Batoko et al. (2000) together with agrobacteria expressing the P19 viral suppressor of silencing (Voinnet et al., 2003). After 5 days, infiltrated leaf sectors were observed under a confocal microscope as reported in Magnard et al. (2015).

RT-PCR Analysis
Quantitative PCR was performed with the CFX96™ real-time detection system (Bio-Rad, Hercules, CA, United States) using the SsoAdvanced™ SYBR Green Supermix (Bio-Rad, cat. # 172-5270). All reactions were carried out in 20 µL using 2 µL of reverse-transcribed cDNA as template and 500 nM of each primer according to the manufacturer's protocol. Gene primers were designed using the Primer3 software. Two biological replicates were analysed for each accession, then cDNA was amplified twice in two independent qPCR runs with each primer combination. The data therefore comprised four replicates per original sample. After the amplification cycles, a melting-curve analysis (65 °C to 95 °C, one fluorescence read every 0.5 °C) was performed to check the specificity of the amplification.
Normalised expression values (2^−ΔΔCq method, Livak and Schmittgen, 2001) of PhGES and PhEDS were calculated by the CFX96™ data manager (Bio-Rad, Hercules, CA, United States) using β-ACTIN, TUBULIN and GAPDH as reference genes (Supplementary Table 1). The stability of expression of these reference genes was evaluated using BestKeeper (Pfaffl et al., 2004), geNorm v. 3.5 (Vandesompele et al., 2002) and NormFinder (Andersen et al., 2004). We used REST 2009 (Pfaffl et al., 2002) to compare the expression level of a gene in a 'sample' group using a 'control' group as a reference by implementing a pairwise fixed reallocation randomisation test (10,000 iterations). Differences in expression between 'sample' and 'control' cDNAs were considered significant for p-values < 0.05.
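For readers unfamiliar with the Livak and Schmittgen (2001) calculation, the following Python sketch shows the arithmetic for one target gene and several reference genes. It is an illustration only and does not reproduce the Bio-Rad software: the function name and the Cq values are hypothetical, and the sketch assumes an amplification efficiency of 100% (a doubling per cycle) and combines reference genes by averaging their Cq values, which corresponds to a geometric mean of their quantities.

```python
from statistics import mean


def relative_expression(target_cq, ref_cqs, control_target_cq, control_ref_cqs):
    """Relative expression of a target gene by the 2^(-ddCq) calculation.

    target_cq / ref_cqs: Cq of the target and of the reference genes in the sample.
    control_target_cq / control_ref_cqs: the same values in the control
    (calibrator) sample. Assumes 100% amplification efficiency.
    """
    delta_sample = target_cq - mean(ref_cqs)            # dCq in the sample
    delta_control = control_target_cq - mean(control_ref_cqs)  # dCq in the control
    delta_delta = delta_sample - delta_control           # ddCq
    return 2.0 ** (-delta_delta)


# Hypothetical Cq values: the target amplifies 3 cycles earlier in the sample
# than in the control, while the three reference genes are unchanged.
fold_change = relative_expression(
    target_cq=22.0,
    ref_cqs=[18.0, 19.0, 20.0],
    control_target_cq=25.0,
    control_ref_cqs=[18.0, 19.0, 20.0],
)
```

In this toy case the ΔΔCq is −3, so the sample shows an eight-fold higher expression than the control.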
Semi-quantitative RT-PCR was performed using primers listed in Supplementary Table 1 and samples were normalised using the ACTIN gene. After 25 amplification cycles, amplification levels were estimated following in-gel staining and visual comparison against ACTIN gene amplification.

GC-MS Analysis
Terpenes were extracted independently from fresh leaves of three plants after overnight incubation in hexane (2 mL per gram), supplemented with camphor as internal standard to allow for quantification, or by hydro-distillation. An Agilent GC 6850 gas chromatograph coupled with an Agilent 5973 ion trap mass detector was used for GC-MS analyses of enzymatic activities and transiently transformed tobacco leaf extracts. The instrument was equipped with a 30 m × 0.25 mm apolar capillary column DB5. Temperatures of injector and detector were 250 °C. Helium was used as the carrier gas at a flow rate of 1.0 mL.min⁻¹. A volume of 2 µL of extract was injected with a split ratio of 1:2. Oven temperature settings were: 4 min at 60 °C after injection followed by a 4 °C.min⁻¹ temperature ramp from 60 °C to 240 °C. Temperature was then kept on hold at 240 °C for 5 min. Molecule identification was performed using Wiley, NIST 05 and IFF-LMR mass spectra databases. GC-MS analyses in the IFF analytic laboratory (Grasse) were performed on a GC-MS 6890-MS Agilent 5973. Most of the parameters were common with those applied in LBVpam except for the temperature programme, which was 60 °C for 10 min, then 2 °C.min⁻¹ from 60 °C to 300 °C. Temperature was then kept on hold at 300 °C for 3 min. A volume of 2 µL of extract was injected with a split ratio of 1:110.
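As a quick consistency check on the two oven programmes described above, the total run time follows directly from the hold times and the ramp rate. The short Python sketch below does this arithmetic; the function name is hypothetical and the calculation ignores any equilibration or cool-down time, which the text does not report.

```python
def run_time_min(initial_hold, start_c, end_c, ramp_c_per_min, final_hold):
    """Total oven programme duration in minutes for a single linear ramp."""
    ramp_duration = (end_c - start_c) / ramp_c_per_min
    return initial_hold + ramp_duration + final_hold


# Programme used for enzymatic assays and tobacco leaf extracts:
# 4 min at 60 C, 4 C/min up to 240 C, 5 min hold.
lab_run = run_time_min(4, 60, 240, 4, 5)

# Programme used in the IFF analytic laboratory:
# 10 min at 60 C, 2 C/min up to 300 C, 3 min hold.
iff_run = run_time_min(10, 60, 300, 2, 3)
```

Under these settings the first programme lasts 54 min and the second 133 min, so retention times from the two instruments are not directly comparable.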

Functional Characterisation of a Geraniol Synthase and an Eudesmol Synthase, as Well as Two mTPSs From P. × hybridum Rosat 'Grasse'
Geraniol, limonene, eudesmol and their derivatives make up to 75% of EO composition in pelargonium rosat cultivars. As such, characterising the TPSs responsible for their synthesis is an important step towards understanding how these terpenes are produced in pelargonium. To this aim, a transcriptome of P. × hybridum rosat 'Grasse' was de novo assembled from 454 reads obtained from leaf RNA. Known sequences of GES, limonene synthase and EDS were used to search for homologous sequences in the transcriptome and four TPSs were identified as good candidates based on their percentage of similarity. The four sequences were cloned and recombinant proteins were produced in the E. coli Rosetta™ (DE3) pLysS strain using Busso's expression vectors (Busso et al., 2005). Incubation of the four recombinant proteins with GPP or FPP led to the identification of three mTPSs and one sTPS. PhGES (MF503883) produced only geraniol in the presence of GPP as a substrate (Figure 2A). PhEDS (MF503882) produced 10-epi-γ-eudesmol and α-eudesmol as major and minor products, respectively, after incubation with FPP (Figure 2B). The two remaining enzymes were multiproduct mTPSs: PhCINS (MF503881) catalysed the in vitro production of more than 10 monoterpenes from GPP, with 1,8-cineole and α-terpineol as major products (Figure 2C); PhMYRS (MF503884) used GPP to produce β-myrcene as the major compound and both α- and β-pinene in lower amounts (Figure 2D). Unfortunately, none of the enzymes catalysed the formation of limonene. No sesquiterpene could be detected when PhGES, PhCINS or PhMYRS were incubated with FPP; similarly, PhEDS was unable to catalyse the production of any monoterpene in the presence of GPP. Retention times of all enzymatic products were similar to those of standards (data not shown) and the mass spectra of enzymatic products and standards were also compared (Supplementary Figure 1).

Sequence Analysis of PhGES, PhEDS, PhCINS, and PhMYRS
Alignment of the protein sequences of the three mTPSs showed a strong similarity between PhCINS and PhMYRS (identity: 63%; similarity: 77%) compared to PhGES (Figure 3). The three enzymes contained the conserved TPS motifs LSLYEASYL and DDxxD; the RR(x)8W motif was absent from the PhGES sequence, as expected for an mTPS catalysing the synthesis of acyclic products like geraniol. Because geraniol and its derivatives are the main terpenes responsible for the rose scent of pelargonium rosat, orthologs of PhGES in several rosat cultivars as well as in their parents were examined (Figure 4). The PhGES sequence was highly conserved between the different pelargonium rosat as compared to P. cv. 'Toussaint'. A sequence of three amino acids (TAL) was found only in the 'Bourbon' and 'Egypt' cultivars but not in the 'Grasse' cultivar. PhGES and PhEDS gene structures were investigated by Table 3). Exon size was conserved between the two genes whereas introns were longer on average in PhEDS. The PhGES gene structure is in accordance with previously published data on monoterpene synthase genomic structure (Martin et al., 2010), except for an unusually long intron of 542 bp in position three.

In vivo Localisation of the Four TPSs
It is generally believed that the plastidial MEP pathway and the cytosolic MVA pathway provide IPP/DMAPP for the synthesis of monoterpenes and sesquiterpenes, respectively. As such, mTPSs are expected to be localised in the plastid, whereas sTPSs are expected to be cytosolic. In silico targeting predictions of PhGES, PhCINS and PhMYRS indicated that the three enzymes were targeted to the plastid, except that PhGES was predicted to be localised in the mitochondria by the Predotar software alone (not shown). To verify these predictions, N. benthamiana leaves were transiently transformed with each TPS fused to GFP at its C-terminal end (Curtis and Grossniklaus, 2003). Confocal microscopy showed that PhGES was localised in the plastid (Figure 5), as was PhCINS (Supplementary Figure 2). Our attempt to localise PhMYRS was unsuccessful but its high similarity with PhCINS (Figure 3) led us to be confident in the in silico predictions. Plants transformed only with agrobacteria expressing the P19 viral suppressor of silencing provided a negative control of GFP fluorescence. Unexpectedly, PhEDS was predicted to be localised in the plastid, but all targeting predictions overlapped with the RR(x)8W conserved motif involved in the biosynthesis of cyclic sesquiterpenes like eudesmol, thus indicating a misprediction of PhEDS subcellular localisation. Unfortunately, our attempts to localise PhEDS in vivo were unsuccessful.
P. cap, P. capitatum; P. grav, P. graveolens; P. quer, P. quercifolium; P. tom, P. tomentosum. Commercial cultivars in brackets; pelargonium rosat cultivars: 'Bourbon', 'Grasse' and 'Egypt'; 'Clorinda' is a commercial hybrid from unknown lineage. The two main compounds of each species are shaded. Pinenes, α- and β-pinenes; phellandrenes, α- and β-phellandrenes; ocimenes, α-, β- and allo-ocimenes; menthone, sum of menthone, isomenthone and menthol; citronellol, sum of citronellol, citronellal and several citronellyl esters such as acetate, formate, butyrate and tiglate; geraniol, sum of geraniol, citral and several geranyl esters such as acetate, formate, butyrate and tiglate; MT, monoterpenes; guaiadiene, 6,9-guaiadiene and 1(5),11-guaiadiene; ST, sesquiterpenes. Average percentages and total amounts as well as their corresponding standard deviations were calculated from results obtained from three independent GC-MS analyses.

Correlation of Expression of the Four TPSs With Terpene Content in Different Pelargonium Species
To validate PhGES, PhEDS, PhCINS and PhMYRS functions in vivo, terpene content and expression of the four enzymes were investigated jointly in several Pelargonium species. Several P. × hybridum cultivars, their putative parents, P. capitatum, P. graveolens and P. radens, as well as some botanical species were analysed (Table 1). Terpene amount ranged from 0.5 mg.g⁻¹ FW up to 2.7 mg.g⁻¹ FW depending on the species. P. graveolens produced geraniol and both P. graveolens and P. capitatum produced citronellol and derivatives, giving a rose and lemon scent to these species. P. radens contained almost exclusively menthone and its derivatives relative to the other terpenes detected. Pelargonium rosat cultivars synthesised both rose- and mint-scented terpenes with different balances between geraniol and citronellol. Finally, P. cv. 'Clorinda', P. quercifolium and P. tomentosum exhibited a different pattern of terpenes with a higher amount of non-oxygenated monoterpenes like myrcene, phellandrenes and ocimenes. Expression of both PhGES and PhEDS was assessed by semi-quantitative RT-PCR using PhACTIN as reference (Figure 6). As expected, PhGES was expressed in all pelargonium rosat and in P. graveolens, which produces citronellol and geraniol as main terpenes, whereas PhEDS was only expressed in the rosat 'Grasse' and 'Egypt' cultivars, which contain 10-epi-γ-eudesmol. Overall, these results indicated a good correlation between PhGES and PhEDS gene expression and accumulation of the products synthesised by the respective enzymes in several Pelargonium species.
Expressions of PhGES and PhEDS genes were more precisely assessed by qPCR. As shown in Figure 7, the results essentially recapitulated the semi-quantitative RT-PCR results. In addition, PhGES was only slightly expressed in the species P. capitatum (previously not analysed by semiquantitative RT-PCR). Expression of PhEDS in P. capitatum was unexpected but similar results were already reported in Zingiber zerumbet (Yu et al., 2008). These results confirmed a strong correlation between PhGES and PhEDS gene expression and accumulation of the products synthesised by the respective enzymes.

TPS Family in Pelargonium Botanical Species and Phylogenetic Relationships of PhGES, PhEDS, PhCINS, and PhMYRS
RNA-Seq data are publicly available for 12 botanical species of Pelargonium and for P. × hybridum cv. rosat 'Bourbon'. Hence, this opportunity was used to explore the TPS family in the Pelargonium genus, as well as the phylogenetic placement of the four newly discovered enzymes. To these aims, transcriptomes were assembled from the 13 Illumina read sets and annotated. A total of 271 TPS sequences belonging to subfamilies -a, -b, -g and -e/f (Chen et al., 2011) were retrieved using the TPS HMM profile from the Illumina transcriptomes and from the 454 P. × hybridum cv. rosat 'Grasse' transcriptome, thus excluding homologs and some diTPS enzymes belonging to the TPS-c subfamily (see the section "Materials and Methods" for details). The number of TPS sequences per species varied from three enzymes in P. cotyledonis to 36 enzymes in P. × hybridum rosat 'Grasse' (Table 2). Because TPS homologs were searched in transcriptome data, the number of TPSs retrieved per species is likely underestimated as compared to the total number of TPSs encoded by their respective genomes. No clear link could be drawn between the number of transcriptional units per transcriptome and the number of TPS sequences identified (Table 2), showing that the number of TPSs per species does not correlate with the number of transcripts inferred during transcriptome assembly. However, analysis of volatile content from available plant specimens in the Lyon botanical garden (Supplementary Table 2) showed that the three non-scented species devoid of any detected terpene (P. cotyledonis, P. transvaalense and P. australe) had a low number of TPS sequences in their transcriptome (Table 2). Although the analysed set of plants was limited, this result could mean that non-scented species express fewer TPS genes or have fewer TPS sequences encoded in their genome.
It should be noted that the apparent discrepancy in the number of TPSs found between rosat 'Bourbon' and rosat 'Grasse' cultivars is likely due to the different sequencing and assembly methods used.
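One simple way to examine whether the number of TPS sequences tracks the number of transcriptional units per transcriptome is a rank correlation. The sketch below computes a Spearman coefficient in plain Python; it is only an illustration of that kind of check, not the analysis performed in this study, and the two input lists are placeholder values rather than the counts reported in Table 2.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, i.e. Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder values for illustration only (not the Table 2 data)
transcript_counts = [10, 20, 30, 40, 50]
tps_counts = [3, 12, 7, 20, 15]
rho = spearman(transcript_counts, tps_counts)
print(rho)
```

A coefficient near zero would be consistent with the absence of a link; with only 13 or 14 transcriptomes, any such estimate would carry wide uncertainty.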
A maximum likelihood (ML) phylogeny (Figure 8A) was inferred including the 271 TPS sequences identified across the set of Pelargonium species, the four newly discovered enzymes PhGES, PhCINS, PhMYRS, and PhEDS, as well as Arabidopsis sequences and functionally characterised enzymes from diverse species. Two cloned sequences from P. × hybridum cv. rosat 'Bourbon' homologous to PhGES and PhEDS were included in the phylogeny. As expected, PhCINS and PhMYRS clustered with the TPS-b subfamily that contains most of the mTPSs; PhEDS clustered with the TPS-a subfamily made of angiosperm sTPSs; PhGES clustered with the TPS-g subfamily, as do many other geraniol synthases (Figure 8A). Species phylogeny (Figure 8B) was reconciled with the TPS ML tree in order to infer orthologous groups of TPSs between the 13 Pelargonium species. With the exception of PhMYRS, for which no orthologs could be inferred, PhGES, PhEDS, and PhCINS possessed orthologous sequences in other Pelargonium species, with the orthologous group comprising PhGES (G-1) being strongly supported by bootstrap values (Figure 8A). PhEDS and PhGES possessed direct orthologs in P. × hybridum rosat 'Bourbon', and PhCINS seemed to be closely related to a paralogous enzyme that consequently could be a second cineole synthase expressed in P. × hybridum cv. rosat 'Grasse'. However, no clear correlation could be drawn between presence or absence of an enzyme in a given species, and terpene content analysis (Supplementary Table 4).

FIGURE 7 | Transcriptional activity of PhGES (A) and PhEDS (B) in several pelargonium species. TPS transcript levels were normalised to β-actin and tubulin then to the expression level of P. tomentosum. Error bars indicate standard deviation with n = 4. Statistical analyses were performed with the software REST. Same letter indicates no statistical differences with a p-value < 0.05.
Terpene specificity was tentatively assigned to the Pelargonium TPSs using both BLAST information and the connection of Pelargonium TPS clades to functionally characterised TPSs in other species. In almost all cases, functionally characterised enzymes with different product specificities clustered together by species (see V. vinifera sequences in the TPS-a subfamily for example, Figure 8A) and branched at the base of clades containing Pelargonium TPS sequences, consequently providing no indication as to the putative functions of the Pelargonium enzymes. As such, terpene specificity of TPSs has evolved independently in the Pelargonium genus compared to other lineages. A remarkable exception was represented by the geraniol synthase clade, where GESs from different species clustered at the base of the clade containing PhGES orthologous sequences, both forming an independent clade in the TPS-g subfamily (Figure 8A). Thus, GES function seems to be conserved throughout angiosperm evolution, although TPSs functioning as GES have also evolved outside the TPS-g subfamily (Yang et al., 2005). The GES group also contained an enzyme acting as a linalool synthase.

TABLE 2 (footnotes) | *The 6 to 10 order of magnitude difference in sequence number between the P. × hybridum cv. rosat 'Grasse' and the other transcriptomes is due to the sequencing technology and assembly procedure. **Both P. × hybridum cv. rosat 'Bourbon' and P. × hybridum cv. rosat 'Grasse' volatiles were analysed independently of the botanical species.

DISCUSSION
TPS Gene Expression Explains Terpene Diversity in P. × hybridum Cultivars and Their Parents
P. capitatum on the one hand, and P. graveolens or P. radens on the other hand, are the putative parents of P. × hybridum cultivars. P. radens and P. graveolens have been described as mint-scented species with isomenthone as the major compound (Van der Walt and Demarne, 1988), but geraniol and citronellol were reported in the EO of some accessions of P. graveolens (Lis-Balchin, 2003). Such differences in EO composition could be explained either by incorrect botanical identification or by the presence of different chemotypes within the species. The latter hypothesis is supported by the fact that at least 8 chemotypes have been described in P. capitatum (Viljoen et al., 1995). In any case, the high number of species in the Pelargonium genus highlights the importance of careful botanical identification of accessions. Several cultivars of pelargonium rosat have been obtained in different places of production. All these cultivars are characterised by a high amount of geraniol, citronellol and other derivative compounds, in proportions that differ between cultivars. In addition, the pelargonium rosat cultivars 'Grasse' and 'Egypt' can be chemotypically differentiated from the 'Bourbon' cultivar by their high content of 10-epi-γ-eudesmol.
Expression of PhGES and PhEDS was studied by semi-quantitative RT-PCR and qPCR. A good correlation was found between PhGES expression and the accumulation of geraniol and downstream products in the studied Pelargonium cultivars. Rosat 'Bourbon' and 'Grasse' cultivars, the two highest geraniol and citronellol producers, were also those that expressed the highest level of PhGES. In contrast, P. radens and P. tomentosum, which did not produce geraniol or citronellol, exhibited a low expression of PhGES. Transcriptional regulation of TPS genes has already been shown (Nagegowda, 2010), although other modes of regulation cannot be excluded. qPCR results demonstrated that PhGES was implicated in the biosynthesis of geraniol in pelargoniums in vivo. PhEDS expression was very high in the two cultivars rosat 'Egypt' and 'Grasse', both producing a high level of 10-epi-γ-eudesmol, whereas no expression was found in accessions devoid of this terpene, with the exception of P. capitatum. In this species, a high level of PhEDS transcripts was found but no 10-epi-γ-eudesmol could be detected. A similar unexpected result was previously reported by Yu et al. (2008), who found a high expression of β-eudesmol synthase and no accumulation of the corresponding product. One explanation could be the further conversion of 10-epi-γ-eudesmol to secondary non-volatile derivatives or a lack of FPP substrate. This result remains unclear and needs further investigation.

PhGES, a Key Enzyme in Controlling the Scent of Pelargonium Rosat Cultivars
Geraniol, citronellol and derivative compounds are the main terpenes involved in the fragrance of P. × hybridum rosat cultivars, all deriving from the enzymatic activity of PhGES. PhGES belongs to the TPS-g subfamily. Enzymes from this clade lack the conserved RR(x)8W motif, which facilitates isomerisation of the geranyl cation into the linalyl cation, a step necessary to form cyclic terpenes (Williams et al., 1998). Thus, TPS-g enzymes can only produce acyclic terpenes. Although the first characterised enzyme from this clade was involved in monoterpene synthesis, it is known that this clade gathers mono-, sesqui- and diterpene synthases catalysing acyclic terpene biosynthesis. PhGES is well conserved in the different rosat cultivars, as well as in their putative parents, although some do not produce geraniol. Interestingly, the PhGES sequence possessed three supplementary amino acids (TAL) in rosat 'Bourbon' and 'Egypt' cultivars, as compared to the rosat 'Grasse' cultivar. These three amino acids were also found in P. graveolens. Taken together, this indicates that the pelargonium rosat 'Grasse' cultivar could descend from a different cross than the two other rosats. The GES sequence in P. cv. 'Toussaint', a hybrid cultivated for its high citronellol content, was more divergent compared to those of rosat cultivars and P. graveolens, clearly indicating a different origin.

FIGURE 8 | Phylogeny of Pelargonium TPS and Pelargonium species. Maximum likelihood phylogeny of Pelargonium and of functionally characterised TPS from other species (A). Classification of TPS is indicated by concentric circles, from the outside to the inside: TPS subfamilies, annotation from BLAST results, putative intermediary carbocation, and putative function from orthology relationships with functionally characterised enzymes. Carbocations were tentatively deduced from known reaction pathways and functionally characterised enzymes in the clade. Orthology relationships between TPS sequences are indicated in the name of the sequence using the subfamily letter followed by a number. Orthology was reconstructed from the known Pelargonium species phylogeny (B), redrawn after (Weng et al., 2012). Some flexibility was allowed within each Pelargonium clade and sequences not belonging to the right clade are indicated in brackets. Sequences from pelargonium rosat were considered to cluster with P. citronellum sequences. Characterised TPS in this study are indicated in bold. Bootstrap support: * ≥70, ** ≥90.
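The presence or absence of the RR(x)8W motif discussed in this section can be screened for with a simple pattern match: two arginines, any eight residues, then a tryptophan. The following is a minimal sketch under stated assumptions; the function name, the optional N-terminal window and the two toy peptide strings are hypothetical and are not sequences from this study.

```python
import re

# RR(x)8W: two arginines, any eight residues, then a tryptophan
RRX8W = re.compile(r"RR.{8}W")


def has_rrx8w(protein_seq, n_term_window=None):
    """Return True if the RR(x)8W motif occurs in the protein sequence.

    n_term_window optionally restricts the search to the first N residues,
    an illustrative choice for focusing on the N-terminal region.
    """
    seq = protein_seq.upper()
    if n_term_window is not None:
        seq = seq[:n_term_window]
    return RRX8W.search(seq) is not None


# Toy sequences for illustration only (not real TPS sequences)
with_motif = "MAS" + "RR" + "SANYQPNL" + "W" + "DFEFLQS"
without_motif = "MASSANYQPNLDFEFLQS"

print(has_rrx8w(with_motif))
print(has_rrx8w(without_motif))
```

Such a check only flags the literal motif; it says nothing about whether a sequence lacking it is a functional acyclic terpene synthase.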

PhEDS, a New Sesquiterpene Synthase
Eudesmols are sesquiterpenes that are biosynthesised from a farnesyl cation that undergoes a 1,10 cyclisation leading to the germacradienyl cation (Figure 9). A nucleophilic attack of the cation by water first stabilises the molecule by addition of a hydroxyl group. Then, mesomerisation between the two double bonds allows an internal 2,7 closure with a protonation, producing another carbocation intermediate. Finally, a deprotonation stabilises the molecule, restoring a double bond whose position depends on the position of the lost proton, consequently producing different eudesmol isomers. In this paper, we provide the first cloning and functional characterisation of a 10-epi-γ-eudesmol synthase (PhEDS). To our knowledge, one β-eudesmol synthase from ginger catalysing the formation of both α- and β-eudesmol has previously been reported (Yu et al., 2008). β-eudesmol was shown to confer resistance to plants against ant attack (Marinho et al., 2005) and to possess antifungal activity (Guleria et al., 2012). Although eudesmol production was enhanced by abiotic stresses in pelargonium rosat cultivars (Blerot, 2016), the specific role of 10-epi-γ-eudesmol in these plants cannot yet be assessed without further investigation.

TPS Evolution in the Pelargonium Genus
Analysis of the P. × hybridum rosat 'Grasse', P. × hybridum rosat 'Bourbon', and the 12 botanical Pelargonium transcriptomes unveiled about 270 TPS sequences from the Pelargonium genus. The terpene specificity of most of these enzymes cannot be assigned on the sole basis of sequence homology or phylogenetic relationships. This is due to the rapid evolution of TPS sequences by duplication and mutation acquisition (Chen et al., 2011; Jullien et al., 2014), leading to convergent functional evolution in this family at low phylogenetic levels. Incidentally, functional plasticity of TPS enzymes has been demonstrated experimentally by targeted mutations of the active site (Yoshikuni et al., 2006; Kampranis et al., 2007). As a consequence, TPS sequences with different enzymatic capabilities from close species are more related to each other than homologous sequences with the same enzymatic specificities from more distantly related species, as is exemplified here by the Pelargonium-centric TPS phylogeny. It should be noted, however, that this plasticity crosses TPS subfamily borders only marginally: for example, sTPSs are predominantly found within the TPS-a subfamily whereas mTPSs are for the most part found within the TPS-b subfamily. Contrasting with this observation, closely related paralogs from the same subfamily can use both GPP and FPP as a substrate in in vitro assays but produce only monoterpenes or sesquiterpenes in planta because of their differential subcellular localisation, either in the chloroplast or in the cytosol, where only GPP or FPP is available, respectively (Nagegowda et al., 2008; Huang et al., 2010). Moreover, synthesis of some terpenes such as linalool can be ensured by enzymes from diverse origins; see for example the linalool synthase sequences placed in the TPS-g, -b and -e/f subfamilies in Figure 8A and Magnard et al. (2018).
Geraniol synthesis, however, seems to be mainly catalysed by a group of enzymes clustering in a clade localised within the TPS-g subfamily, although enzymes from other subfamilies have evolved as GES (for example, the P. frutescens GES belongs to the TPS-b subfamily, see Figure 8A). This observation raises the question of why GES enzymes from the TPS-g subfamily keep their product specificity across larger phylogenetic distances than other TPSs. A possible explanation could be that TPS-g enzymes have limited evolutionary routes because of their inability to form cyclic terpenes, as described above.
Orthologous sequences to three of the four newly discovered TPS enzymes in P. × hybridum rosat 'Grasse' could be identified by reconciling species phylogeny with the Pelargonium TPS phylogeny. It is tempting to assign the same terpene specificity to the respective orthologs of the P. × hybridum rosat 'Grasse' PhGES, PhEDS, and PhCINS, although one should remain cautious with such functional assignments given the fast evolution of TPS sequences, as discussed above. Presence of these enzymes and of their orthologs does not always correlate with the terpene composition observed in the different Pelargonium species. This apparent discrepancy can be explained by several factors: expression level of the TPS gene, posttranscriptional regulation, utilisation of the enzyme product by subsequent reactions, absence of the substrate at the time of the chemical analysis, chemical analysis sensitivity, etc.

CONCLUSION
In this study, we identified and functionally characterised for the first time a GES and an EDS in pelargonium, two key enzymes controlling scent and EO composition of this odorant and economically important plant. We characterised two other mTPSs and placed the four enzymes in an evolutionary perspective of the TPS family within the Pelargonium genus, thus laying the foundation for a better understanding of how pelargonium odour is produced. Functional characterisation of more TPSs in Pelargonium and finer chemical analysis will provide a better understanding of TPS evolution within the genus and, more generally, will provide insights into how odorant Pelargonium species evolved and how they could be used as new sources of genetic material for the perfumery industry. To this aim, large-scale multiomic studies of several odorant cultivars and species, combining transcriptomics and terpene metabolomics, will bring to light entire metabolic pathways involved in the synthesis of the compounds that make up the rich fragrance observed in Pelargonium.

AUTHOR CONTRIBUTIONS
BB and LM contributed equally to the experimental work and should therefore both be ranked as first authors. CP and AB contributed substantially to the molecular cloning of pelargonium TPSs. DS-M and SL performed the bioinformatics analysis. LS, FG, and NB were involved in the project as gardeners, maintaining the pelargonium collection in the greenhouse. SB, J-CC, and FJ supervised the experiments. FJ and DS-M wrote the manuscript. BB, LM, DS-M, FJ, SB, and J-CC contributed to the discussion and interpretation of the results and read and approved the final manuscript.

FUNDING
We thank IFF-LMR company for its financial support.