Systematic characterization of gene families and functional analysis of PvRAS3 and PvRAS4 involved in rosmarinic acid biosynthesis in Prunella vulgaris

Prunella vulgaris is an important material for Chinese medicines, with rosmarinic acid (RA) as its index component. Based on the chromosome-level genome assembly we obtained recently, 51 RA biosynthesis-related genes were identified. Sequence feature, gene expression pattern and phylogenetic relationship analyses showed that 17 of them could be involved in RA biosynthesis. In vitro enzymatic assay showed that PvRAS3 catalyzed the condensation of p-coumaroyl-CoA and caffeoyl-CoA with pHPL and DHPL. Its affinity toward p-coumaroyl-CoA was higher than that toward caffeoyl-CoA. PvRAS4 catalyzed the condensation of p-coumaroyl-CoA with pHPL and DHPL. Its affinity toward p-coumaroyl-CoA was lower than that of PvRAS3. UPLC and LC-MS/MS analyses showed the existence of RA, 4-coumaroyl-3’,4’-dihydroxyphenyllactic acid, 4-coumaroyl-4’-hydroxyphenyllactic acid and caffeoyl-4’-hydroxyphenyllactic acid in P. vulgaris. Generation and analysis of pvras3 homozygous mutants showed a significant decrease of RA, 4-coumaroyl-3’,4’-dihydroxyphenyllactic acid, 4-coumaroyl-4’-hydroxyphenyllactic acid and caffeoyl-4’-hydroxyphenyllactic acid and a significant increase of DHPL and pHPL. These results suggest that PvRAS3 is the main enzyme catalyzing the condensation of acyl donors and acceptors during RA biosynthesis, whereas the role of PvRAS4 appears minor. The results provide significant information for quality control of P. vulgaris medicinal materials.


Introduction
Prunella vulgaris L. is a perennial medicinal plant of Lamiaceae, which is widely distributed in Asia, North America, Europe and North Africa (National Pharmacopoeia Committee, 2020; Hu et al., 2023). The whole plants and spikes of P. vulgaris are commonly used to treat thyroiditis, mastitis, tuberculosis, infectious hepatitis and hypertension in East Asia, the Middle East, and Europe (Tang et al., 2023). In addition, P. vulgaris spikes are used as the main raw materials of functional herbal tea in the southern provinces of China. Its fresh leaves are used as seasonal vegetables in southeastern China. The whole plants are often used as landscape plants for urban greening (Chen et al., 2019). The demand for P. vulgaris in the production of Chinese patented medicines and functional herbal tea is approximately 60 million kilograms per year (Chen et al., 2022; Li et al., 2022).
P. vulgaris is rich in polyphenols, of which rosmarinic acid (RA) is an index component in evaluating the quality of P. vulgaris medicinal materials and Chinese patented medicines. As the main polyphenol component produced in P. vulgaris, RA has a variety of pharmacological activities, such as antioxidant, anti-inflammatory, anti-tumor, anti-allergy, anti-depression, and anti-anxiety activities (Taguchi et al., 2017). It also has pharmacological effects in improving sleep, neuroprotection, reducing testicular injury and inhibiting elastin degradation (Kwon et al., 2017), and has an obvious inhibitory effect on liver, lung and stomach tumor cells (Radziejewska et al., 2018). In addition, RA is easily absorbed and has no toxic side effects on blood cells, kidney, and liver (Noguchi-Shinohara et al., 2015).
RA is a depside condensed from two single phenolic acids (Scarpati and Oriente, 1958). One of them is derived from the general phenylpropanoid pathway and serves as the acyl donor during condensation. The other comes from the tyrosine-derived pathway and serves as the acyl acceptor (Figure 1). RA is present in some hornworts, ferns, and multiple taxa of flowering plants, and its biosynthetic pathways have probably evolved independently in different species (Petersen et al., 2009; Levsh et al., 2019). Analysis of RA biosynthesis in Coleus blumei, Phacelia campanularia and Salvia miltiorrhiza showed the existence of three proposed RA biosynthetic routes in different plants (Figure 1), which include the biosynthesis of 4-coumaroyl-4'-hydroxyphenyllactic acid from p-coumaroyl-CoA and p-hydroxyphenyllactic acid (pHPL) in C. blumei and P. campanularia (route 1), the biosynthesis of 4-coumaroyl-3',4'-dihydroxyphenyllactic acid from p-coumaroyl-CoA and 3,4-dihydroxyphenyllactic acid (danshensu, DHPL) in S. miltiorrhiza (route 2), and the biosynthesis of caffeoyl-4'-hydroxyphenyllactic acid from caffeoyl-CoA and pHPL in S. miltiorrhiza (route 3) (Eberle et al., 2009; Petersen et al., 2009; Di et al., 2013; Levsh et al., 2019; Liu et al., 2019; Lu, 2021). The biosynthetic routes of RA in P. vulgaris are largely unknown.
Recently, we sequenced and assembled the genome of P. vulgaris, which provides a solid foundation for analyzing RA biosynthetic routes in P. vulgaris (Zhang et al., 2024). In this study, a total of 51 P. vulgaris genes belonging to seven RA biosynthesis-related gene families were systematically studied through genome-wide identification, feature analysis, expression analysis, and phylogenetic analysis. Among them, seventeen were identified as candidate genes for RA biosynthesis. In vitro enzymatic assay of PvRAS3 and PvRAS4, in vivo phenolic acid compound determination and PvRAS3 transgenic analysis showed that PvRAS3 was the main enzyme catalyzing the condensation of acyl donors and acceptors during RA biosynthesis, whereas PvRAS4 played a minor role.

Plant materials and growth conditions
A wild, whole-genome-sequenced Prunella vulgaris L. line, named Bangshan-XKC, was transplanted from Bangshan village, Shunchang county, Fujian Province, China, and grown in a greenhouse at the Institute of Medicinal Plant Development in Beijing, China. Shoots were cut from the plant and surface-sterilized using 75% ethanol for 1 min and 5% sodium hypochlorite for 20 min. Subsequently, the shoots were rinsed three times with sterile water and inserted into MS medium supplemented with 30 g L⁻¹ sucrose, with the pH adjusted to 5.8. After two weeks, the resulting sterile plantlets were transferred to fresh MS medium. To induce rooting, the apical and axillary buds were cut and placed on 1/2 MS medium containing 0.1 mg L⁻¹ indole-3-butyric acid (IBA). The sterile plantlets were sub-cultivated in a tissue culture room on 1/2 MS medium supplemented with 30 g L⁻¹ sucrose under a 16/8 h light/dark photoperiod at 25°C.

Sequence retrieval and gene prediction
The deduced amino acid sequences of RA biosynthesis-related genes from S. miltiorrhiza were downloaded from NCBI GenBank (https://www.ncbi.nlm.nih.gov/protein) (Wang et al., 2015). BLAST analysis of the downloaded proteins against the chromosome-level assembly of P. vulgaris was carried out using the tBLASTn algorithm (Altschul et al., 1997; Zhang et al., 2024). An E-value cut-off of 10⁻⁵ was applied to the homologue recognition. Gene models were predicted from the retrieved P. vulgaris genomic DNA sequences based on the downloaded S. miltiorrhiza genes and through BLASTx analysis of retrieved sequences against the NR database using the default parameters (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The predicted gene models were further examined and corrected manually through BLASTn analysis against P. vulgaris transcriptome sequencing data (Altschul et al., 1997; Zhang et al., 2024).
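The E-value filtering step can be illustrated with a minimal sketch. It assumes that hits are available in the standard 12-column tabular BLAST layout, in which the eleventh column is the E-value; the text does not state which output format was used, and the function name and the two example records below are invented for illustration only.

```python
# Hypothetical helper: keep tabular BLAST hits that pass an E-value cut-off.
# Assumption: standard 12-column tabular layout, E-value in column 11.

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples with evalue <= cutoff."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column record
        evalue = float(fields[10])
        if evalue <= cutoff:
            hits.append((fields[0], fields[1], evalue))
    return hits


# Invented example records (not data from the study).
example = [
    "SmRAS1\tchr01\t78.5\t400\t80\t2\t1\t400\t1000\t2200\t1e-120\t410",
    "SmRAS1\tchr07\t31.0\t90\t60\t1\t5\t94\t500\t770\t0.003\t42.1",
]
kept = filter_hits(example)
print(kept)
```

Only the first record passes the cut-off; the second, with an E-value of 0.003, is discarded.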

Phylogenetic analysis
RA biosynthesis-related protein sequences from various plant species were downloaded from NCBI GenBank (https://www.ncbi.nlm.nih.gov/protein). Sequence alignment was carried out using the ClustalW algorithm in MEGA version 7.0.26 (Kumar et al., 2016). Neighbor-joining trees were constructed for amino acid sequences using MEGA version 7.0.26 with default parameters (Kumar et al., 2016). The number of bootstrap replications was 1000.

Quantitative real-time PCR analysis of gene expression
Total RNA was extracted from roots, stems, leaves and spikes using the EASYspin Plus Complex Plant RNA kit (Aidlab, China) as described previously (Cui et al., 2022). Genomic DNA contamination was eliminated by treating with RNase-free DNase (Aidlab, China). RNA integrity was evaluated on a 1% agarose gel. RNA quantity was determined using a NanoDrop 2000C Spectrophotometer (Thermo Scientific, USA). RNA was reverse-transcribed into single-stranded cDNA using Superscript III Reverse Transcriptase (Invitrogen, USA). qRT-PCR was carried out using TB Green Premix Ex Taq II (Takara, Japan) on a Bio-Rad CFX96 Real-Time system. Primers used for qRT-PCR were designed using Primer Premier 5 (Lalitha, 2000) and are shown in Supplementary Table 1. The amplification efficiency of each primer pair was evaluated using standard curves. Primer pairs with an appropriate PCR amplification efficiency were used for subsequent analysis (Supplementary Figure 1). PveIF-2 was selected as the reference gene as described before (Zheng et al., 2022). The specificity of amplification was assessed by dissociation curve analysis. Relative abundance of transcripts was determined using the 2^(-ΔΔCt) method. Standard deviations were calculated from three biological replicates and three PCR replicates per biological replicate.
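For readers unfamiliar with the ΔΔCt calculation, the following is a minimal sketch of the arithmetic. It assumes an amplification efficiency close to 100% (a doubling per cycle) and uses a calibrator sample whose expression is set to 1; the function name and the Ct values are hypothetical and are not data from this study.

```python
# Minimal sketch of relative quantification by the delta-delta-Ct approach.
# Assumption: amplification efficiency is ~100%, so one cycle = 2-fold change.


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene relative to a calibrator sample.

    ct_target, ct_ref         : Ct of target and reference gene in the sample
    ct_target_cal, ct_ref_cal : Ct of target and reference gene in the calibrator
    """
    delta_sample = ct_target - ct_ref          # normalize sample to reference gene
    delta_cal = ct_target_cal - ct_ref_cal     # normalize calibrator likewise
    delta_delta = delta_sample - delta_cal
    return 2 ** -delta_delta


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the calibrator, at equal reference-gene Ct.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold)  # 4.0
```

By construction, the calibrator sample itself yields a value of 1.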

PvRAS3 and PvRAS4 gene cloning and expression vector construction
Total RNA extracted from leaves of P. vulgaris was reverse-transcribed into cDNA using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen, USA). PvRAS3 and PvRAS4 were amplified by nested PCR using cDNA from P. vulgaris leaves as the template. The nesting and nested primers used for PCR are listed in Supplementary Table 1. PCR products were inserted into pGEX-4T-1 and verified by Sanger sequencing.

Heterologous expression of PvRAS3 and PvRAS4 proteins in E. coli
The pGEX-4T-1 vector with PvRAS3 or PvRAS4 was introduced into E. coli strain BL21 (DE3). Heterologous expression of PvRAS3 and PvRAS4 proteins was induced with 0.5 mmol L⁻¹ IPTG at 16 °C for 20-24 h. Cells were collected by centrifugation at 6,000 rpm for 10 min at 4 °C. After resuspension in 10 mM PBS buffer (pH 7.2), the cells were sonicated on ice. Purification of soluble proteins was carried out using the PurKine™ GST-Tag Protein Purification kit (Glutathione) (Abbkine, China). Concentration of the purified proteins was determined using the BCA Protein Assay kit (Takara Biomedical Technology, Beijing).

In vitro enzymatic activity assay of PvRAS3 and PvRAS4 recombinant proteins
The enzymatic activity assay was carried out in a 500 μl reaction system comprising 100 μg purified proteins, 1 mM caffeoyl-CoA or p-coumaroyl-CoA as the acyl donor, and 1 mM pHPL or DHPL as the acyl acceptor. The reactions were incubated at 25 °C for 60 min and terminated by adding 10 μl of 10 M acetic acid. Controls were carried out using total proteins from E. coli transformed with the empty pGEX-4T-1 vector. Reaction products were collected and analyzed using an ACQUITY UPLC system (Waters, Milford, MA, USA). MS/MS data were recorded on a Xevo G2-XS Q-ToF Mass Spectrometer (Waters, Milford, MA, USA) coupled to a Waters Acquity I-Class UPLC system (Waters, Milford, MA, USA), using electrospray ionization (ESI) in negative-ion mode. The samples were separated on an ACQUITY UPLC BEH C18 column (1.7 μm, 100×2.1 mm) at 25°C. Mobile phase A was 0.1% (v/v) formic acid in acetonitrile. Mobile phase B was 0.1% (v/v) formic acid in water. The flow rate was 0.3 mL min⁻¹. The mobile phases changed with the following gradient: 0-6 min, 5% A and 95% B; 6-8 min, 20% A and 80% B; 8-14 min, 21% A and 79% B; 14-18 min, 95% A and 5% B. MS/MS data were analyzed using the MassLynx V4.1 software (Waters) as described previously (Pan et al., 2023).

Kinetic analysis of PvRAS3 and PvRAS4 recombinant proteins
Kinetic analysis of PvRAS3 and PvRAS4 was carried out in a 200 μl reaction system consisting of Tris-HCl buffer (100 mM Tris-HCl, pH 7.0, 2 mM DTT, 4 mM MgCl2, 10% glycerol), 100 μg recombinant protein, and different concentrations of substrates. The reactions were incubated at 25 °C for 30 min and terminated by adding 10 μl of 10 M acetic acid. The reaction products were analyzed using the UPLC system as described for the in vitro enzymatic activity assay of recombinant proteins. Enzyme activity was determined by measuring the variation of substrate contents. To determine kinetic parameters, PvRAS3 or PvRAS4 was incubated with different concentrations of acyl donor and acyl acceptor. The saturating concentration of one substrate was set at 2 mM, while the concentration of the other substrate was varied at 10 μM, 50 μM, 100 μM, 250 μM, 350 μM, 500 μM, and 1000 μM. The kinetic constants of the donor substrates were calculated based on the contents of the product. The kinetic constants of the acceptor substrates were determined by monitoring the consumption of the acceptor substrates. Enzyme assays were performed in triplicate at each substrate concentration. Vmax and Km values were calculated using Origin 8.0 software with nonlinear regression analysis.
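The fitting step can be sketched as follows. This is not the Origin procedure used in the study; it is a self-contained illustration, assuming the Michaelis-Menten rate law v = Vmax·S/(Km + S), that takes a starting guess from a Hanes-Woolf linearization and refines it by Gauss-Newton least squares. The function names are hypothetical, the substrate values reuse the numeric series given above, and the rates are synthetic (generated from Vmax = 2.0 and Km = 40 in arbitrary units), not measured data.

```python
# Sketch: nonlinear least-squares fit of v = Vmax * S / (Km + S).
# Assumptions: single-substrate Michaelis-Menten kinetics, nonzero rates.


def fit_michaelis_menten(s_values, v_values, iterations=50):
    """Return (vmax, km) minimizing the sum of squared rate residuals."""
    n = len(s_values)
    # Starting guess from Hanes-Woolf: S/v = S/Vmax + Km/Vmax (a straight line).
    ys = [s / v for s, v in zip(s_values, v_values)]
    mean_x = sum(s_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in s_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(s_values, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax, km = 1.0 / slope, intercept / slope

    # Gauss-Newton refinement on the untransformed rates.
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(s_values, v_values):
            d_vmax = s / (km + s)                 # partial derivative w.r.t. Vmax
            d_km = -vmax * s / (km + s) ** 2      # partial derivative w.r.t. Km
            r = v - vmax * d_vmax                 # residual at current estimate
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * r
            b2 += d_km * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-30:
            break  # normal equations are singular; keep current estimate
        step_vmax = (b1 * a22 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        vmax += step_vmax
        km += step_km
        if abs(step_vmax) < 1e-12 and abs(step_km) < 1e-12:
            break
    return vmax, km


# Synthetic demonstration data (not measurements from the study).
demo_s = [10.0, 50.0, 100.0, 250.0, 350.0, 500.0, 1000.0]
demo_v = [2.0 * s / (40.0 + s) for s in demo_s]
fitted_vmax, fitted_km = fit_michaelis_menten(demo_s, demo_v)
print(round(fitted_vmax, 6), round(fitted_km, 6))
```

Fitting the untransformed rates, rather than relying on the linearization alone, avoids the distortion of error structure that linear transforms introduce when real, noisy data are used.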

UPLC and LC-MS/MS analyses of phenolic acids
Roots, stems, leaves and spikes of two-year-old P. vulgaris were ground in liquid nitrogen. The ground samples (0.5 g) were extracted in 10 ml of 80% ethanol and sonicated for 60 min. The extracts were collected by centrifugation and filtered using a 0.22 μm filter (Merck Millipore, USA). UPLC and LC-MS/MS analyses were performed using the ACQUITY UPLC I-Class system (Waters) as described for the in vitro enzymatic activity assay of recombinant proteins. Three biological and three technical replicates were carried out for analysis of each tissue.

Generation and analysis of pvras3 mutants
Pvras3 mutants of P. vulgaris hairy roots were generated using the CRISPR/Cas9 system described previously (Wang et al., 2022a). Briefly, PCR amplification was carried out using two pairs of primers containing two individual single guide RNA (sgRNA) sequences of PvRAS3, with the pDT1T2 vector as the template. The products were purified, digested with Bsa I, and ligated into the binary vector pHEE401E. The resulting constructs were transferred into Agrobacterium strain ATCC15834.
Leaf discs from thirty-day-old sterile plantlets were cultivated on 1/2 MS medium in the dark for two days, immersed for 10 min in a suspension of Agrobacterium cells with or without the constructs, and co-cultivated on MS medium for 2 days. The leaf discs were then transferred onto 1/2 MS medium supplemented with 30 mg L⁻¹ of hygromycin and 400 mg L⁻¹ of cefotaxime for generation of hairy roots. Leaf discs were subcultured every two weeks. The hairy roots generated were transferred to 1/2 MS medium supplemented with 200 mg L⁻¹ of cefotaxime and cultivated for about two weeks. Newly generated hairy roots were then transferred to 1/2 MS medium supplemented with 100 mg L⁻¹ of cefotaxime and cultivated for about two weeks. Finally, newly generated hairy roots from the medium with 100 mg L⁻¹ of cefotaxime were transferred to 1/2 MS medium without cefotaxime and cultivated for two weeks. Root tips 3-4 cm in length were cut, transferred to 100 ml of 1/2 MS medium in 250 ml flasks, and cultivated at 25°C in the dark with shaking at 100 rpm.
To analyze the mutations of PvRAS3 in transgenic hairy roots, genomic DNA was extracted. DNA fragments around the target site were PCR-amplified using the gene-specific primers Mut-F (GTCGTTTGCTCCCTTACAAAT) and Mut-R (GATCGAAGTGAAGGAGTCGACG). PCR products were sequenced using the primer Mut-F. Hairy roots generated from leaf discs through inoculation with Agrobacterium without the constructs were used as a control. UPLC analysis of chemical compounds was performed using the ACQUITY UPLC I-Class system (Waters). Three biological and three technical replicates were carried out for analysis of each transgenic hairy root line.
The identified genes are distributed on the 14 chromosomes of the whole genome assembly of P. vulgaris (Supplementary Figure 2) (Zhang et al., 2024). The deduced proteins vary in length from 309 to 709 amino acid residues, in pI from 5.29 to 9.22, and in molecular weight from 34.00 to 77.01 kDa (Table 1). None of them contains transmembrane regions, and they were predicted to be localized in the cytoplasm, endoplasmic reticulum, peroxisome, chloroplast or mitochondrion (Table 1). The predicted localization of PvPALs, PvHCTs and PvRASs in the cytoplasm is consistent with the experimental results from tobacco and S. miltiorrhiza (Achnine et al., 2004; Di et al., 2013). The predicted localization of PvC4Hs and PvCYP98As in the endoplasmic reticulum is consistent with the experimental results from Populus, S. miltiorrhiza, and P. vulgaris (Ro et al., 2001; Di et al., 2013; Ru et al., 2017b). Pv4CLs, PvTATs and PvHPPRs were predicted to be localized in the peroxisome, chloroplast and mitochondrion, respectively. However, Peucedanum praeruptorum 4CL, P. vulgaris TAT and S. miltiorrhiza HPPR were previously found to be located in the cytoplasm (Liu et al., 2017; Wang et al., 2017; Ru et al., 2017a). Thus, the actual subcellular localization of Pv4CLs, PvTATs and PvHPPRs remains to be experimentally analyzed.

Characterization and expression analysis of genes involved in the general phenylpropanoid pathway
The general phenylpropanoid pathway involves three enzymes, including phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumaroyl CoA ligase (4CL) (Figure 1). PAL catalyzes the conversion of L-phenylalanine to trans-cinnamic acid through deamination of L-phenylalanine. It is the first reaction of the general phenylpropanoid pathway and a rate-limiting step mediating the influx from primary metabolism into the general phenylpropanoid pathway (Raes et al., 2003). In a plant, PAL is usually encoded by a small gene family. For instance, there are two PAL genes in tobacco, three in S. miltiorrhiza, and four in Arabidopsis (Raes et al., 2003; Achnine et al., 2004; Wang et al., 2015). Genome-wide analysis showed that there were four putative PvPAL genes in P. vulgaris (Table 1), all of which contained an intron and had similar gene structures (Figure 2A). qRT-PCR analysis showed that the four PvPAL genes were differentially expressed (Figure 2B). RNA-seq analysis showed that PvPAL1 and PvPAL2 were expressed at relatively higher levels than the other two PvPALs (Figure 3A). The expression patterns revealed by qRT-PCR and RNA-seq were largely consistent for PvPALs and the other genes analyzed hereinafter. However, discrepancies were also observed, which could result from differences in detection technologies, plant tissues, data analysis methods, or other unknown factors. Analysis of the deduced PvPAL proteins showed that all of them contained the conserved "GTITASGDLVPLSYIA" motif involved in substrate binding and catalytic activity and the conserved "FL" residues important for substrate specificity (Poppe and Retey, 2005; Watts et al., 2006; Ma et al., 2013). In addition, three other conserved catalytic active sites, including "GLALVNG", "NDN" and "HNQD", were also found (Supplementary Figure 3) (He et al., 2020). This indicates that all four identified PvPALs have catalytic activity. Phylogenetic analysis of PALs from P. vulgaris, S. miltiorrhiza, Arabidopsis, Populus trichocarpa and various other plants showed that PvPAL1, PvPAL2 and PvPAL4 were grouped with SmPAL1, SmPAL3 and MoPAL involved in RA biosynthesis (Figure 2C) (Weitzel and Petersen, 2010; Song and Wang, 2011; Hou et al., 2013). Taken together with previous results for PvPAL1 (Kim et al., 2014), the presence of RA in roots, stems, leaves and flowers (Kim et al., 2014), and the results from gene expression analysis, we speculated that PvPAL1 and PvPAL2 could be the main PvPALs for RA biosynthesis.
C4H catalyzes the hydroxylation of trans-cinnamic acid to p-coumaric acid (Figure 1). It is encoded by members of the CYP73A gene subfamily. Genome-wide analysis showed that there were three putative PvC4H genes in P. vulgaris. All of them contained two introns (Figure 2D). PvC4H1 and PvC4H2 showed similar expression patterns, with the highest expression in roots, followed by spikes, stems and leaves (Figures 2E, 3A). High expression of PvC4H1 and PvC4H2 is consistent with the high content of RA in roots, stems, leaves and flowers (Kim et al., 2014). The expression of PvC4H3 was very low in the four tissues analyzed (Figure 3A). Analysis of the deduced proteins showed that all three PvC4Hs contained five conserved P450 motifs, including the proline-rich motif "PPGP", the oxygen-binding motif "AAIETT", the "ETLR" motif, the "PERF" motif, and the heme-binding motif "FGVGRRSCPG" (Supplementary Figure 4) (Khatri et al., 2023). Phylogenetic analysis of C4Hs from P. vulgaris, S. miltiorrhiza, Arabidopsis, P. trichocarpa and various other plants showed that PvC4H1 and PvC4H2 were grouped with SmC4H1 involved in RA biosynthesis (Figure 2F), indicating the involvement of PvC4H1 and PvC4H2 in RA biosynthesis (Xiao et al., 2011; Kim et al., 2014; Wang et al., 2015).
4CL is the third and last enzyme of the general phenylpropanoid pathway. It catalyzes the thioesterification of p-coumaric acid (Figure 1). The product, p-coumaroyl-CoA, can be funneled into downstream branch pathways for lignins, flavonoids, coumarins, lignans, and RA (Deng and Lu, 2017). 4CL is encoded by a multigene family. For instance, there are seventeen Pt4CLs in P. trichocarpa, ten Sm4CLs in S. miltiorrhiza, and thirteen At4CLs and At4CL-likes in Arabidopsis (Raes et al., 2003; Shi et al., 2010; Wang et al., 2015). Genome-wide analysis showed that there were seventeen putative Pv4CL genes with 3-5 introns in P. vulgaris (Figure 4A). Gene expression analysis showed that the seventeen Pv4CLs had differential expression patterns (Figure 4B). RNA-seq analysis showed that the levels of Pv4CL1, Pv4CL3, Pv4CL8, Pv4CL10 and Pv4CL11 were relatively high, whereas the levels of Pv4CL9 and Pv4CL12-Pv4CL17 were very low in the four tissues analyzed (Figure 3B).
It is generally known that 4CL proteins contain two conserved motifs, including Box I with the representative sequence "SSGTTGLPKGV" and Box II with the representative sequence "GEICIRG" (Uhlmann and Ebel, 1993). Box I is conserved in adenylate-forming enzymes and involved in adenosine monophosphate (AMP) binding. Box II is conserved in 4CL and related to the spatial conformation of the enzyme (Wang et al., 2022b). Sequence alignment of the seventeen Pv4CL proteins showed that Pv4CLs also had the two conserved motifs. However, their sequences were divergent (Supplementary Figure 5). This indicates that the seventeen identified Pv4CLs could be functionally diverse. Phylogenetic analysis of 4CLs from P. vulgaris, S. miltiorrhiza, Arabidopsis, rice and various other plants showed that Pv4CL1, Pv4CL3, Pv4CL6, Pv4CL7 and Pv4CL8 were grouped with Mo4CL1, Sm4CL2 and Sm4CL3 involved in RA biosynthesis (Figure 4C) (Zhao et al., 2006; Weitzel and Petersen, 2010; Wang et al., 2015). These Pv4CLs could be associated with RA biosynthesis in P. vulgaris (Kim et al., 2014).
PvTAT3 (Figures 3C, 5B). Sequence alignment of the seven PvTAT proteins showed that all of them contained the conserved Motif 1 for aminotransferase family-I pyridoxal phosphate binding site and Motif 2 with the highly conserved residue Arg (Supplementary Figure 6) (Lu et al., 2013; Wang et al., 2018). Phylogenetic analysis of seven PvTATs and TATs from S. miltiorrhiza, Arabidopsis and other plants showed that PvTAT3 and PvTAT4 were clustered with SmTAT1 and PfTAT involved in RA biosynthesis (Figure 5C) (Xiao et al., 2011; Lu et al., 2013). PvTAT3 was previously shown to participate in the biosynthesis of RA in P. vulgaris, and its high expression in the four tissues analyzed is consistent with the accumulation of RA (Figure 5B) (Kim et al., 2014; Ru et al., 2017a). PvTAT4 could be a novel PvTAT playing a redundant role with PvTAT3 in RA biosynthesis.
HPPR, belonging to the family of D-isomer-specific 2-hydroxyacid dehydrogenases, is the other enzyme involved in the tyrosine-derived pathway. It catalyzes the conversion of pHPP to pHPL (Kim et al., 2004) (Figure 1). HPPR is encoded by a small gene family; for instance, there are three SmHPPR genes in S. miltiorrhiza and four in Arabidopsis (Wang et al., 2015; Xu et al., 2018). Its involvement in RA biosynthesis has been verified through functional analysis of MoHPPR from Melissa officinalis (Mansouri and Mohammadi, 2021), CsHPPR from Coleus scutellarioides (Kim et al., 2004), and SmHPPR1 from S. miltiorrhiza (Xiao et al., 2011; Wang et al., 2017). Genome-wide analysis showed that there were four PvHPPR genes in P. vulgaris, all of which contained one intron and had similar structures (Figure 5D). All of them showed differential expression patterns, and the overall expression level of PvHPPR1-PvHPPR3 was higher than that of PvHPPR4 (Figures 3C, 5E). Amino acid sequence alignment showed that all four PvHPPR proteins contained the NAD(P)H binding motif with the representative sequence "GLGRIG" and the putative myristylation site with the representative sequences "GTVETR" and "GNLEA" (Supplementary Figure 7) (Wang et al., 2017). Phylogenetic analysis of four PvHPPRs and HPPRs from S. miltiorrhiza, Arabidopsis and other plants showed that PvHPPR1 and PvHPPR3 were grouped with SmHPPR1, MoHPPR and CsHPPR involved in RA biosynthesis (Figure 5F). This suggests that PvHPPR1 and PvHPPR3 could be involved in the biosynthesis of RA in P. vulgaris. The functions of PvHPPR2 and PvHPPR4 remain to be elucidated.
Figure caption: Expression of RA biosynthesis-related genes in roots, stems, leaves and spikes of P. vulgaris. (A-E) Hierarchical clustering of the expression levels of RA biosynthesis-related genes analyzed using RNA-seq clean data.
Genome-wide analysis showed that there were three PvHCT and eight PvRAS genes in P. vulgaris (Table 1). Except that PvRAS4 had no intron and PvRAS8 had two introns, the three PvHCTs and the other six PvRASs each contained one intron and shared similar gene structures (Figure 6A). qRT-PCR analysis showed that the three PvHCTs were differentially expressed (Figure 6B). RNA-seq analysis showed that PvHCT1 had the highest expression, followed by PvHCT2 (Figure 3D). The expression of PvHCT3 was very low (Figure 3D). This indicates that, among the three PvHCTs, PvHCT1 is the most likely to be involved in RA biosynthesis in P. vulgaris. Among the eight PvRASs, PvRAS3 showed the highest expression and was highly expressed in spikes, followed by stems, leaves, and roots (Figures 3D, 6B). The expression of the other seven PvRASs was relatively low in the tissues analyzed (Figure 3D).
Figure caption: Gene structures, expression patterns and phylogenetic analysis of Pv4CL genes and their deduced proteins. (A) The intron-exon structures of Pv4CL genes. (B) Fold changes of Pv4CL gene expression in roots, stems, leaves and spikes of P. vulgaris plants. The expression level in leaves was arbitrarily set to 1. (C) Phylogenetic analysis of 4CL proteins. The unrooted Neighbor-Joining tree was constructed using the MEGA program (version 7.0) with default parameters. 4CLs included are seventeen Pv4CLs and other 4CLs from S. miltiorrhiza (Sm), Arabidopsis (At), rice (Os), M. officinalis (Mo), and P. trichocarpa (Pt) (Supplementary Table 2).
Sequence alignment of PvHCT and PvRAS proteins showed that all of them contained the conserved "HXXXD" and "DFGWG" motifs (Supplementary Figure 8) (Berger et al., 2006). Phylogenetic analysis of three PvHCTs, eight PvRASs, and HCTs and RASs from various other plants showed that RASs and HCTs separated into two clades (Figure 6C). All HCTs were clustered in one clade, whereas all RASs were clustered in the other. In addition, the RAS clade could be divided into two sub-clades. PvRAS3 was clustered with functionally known RASs from C. blumei, Melissa officinalis, Lavandula angustifolia and S. miltiorrhiza in one sub-clade (Berger et al., 2006; Landmann et al., 2011; Sander and Petersen, 2011; Weitzel and Petersen, 2011; Di et al., 2013; Zhou et al., 2018; Fu et al., 2020). Taken together with the high expression of the PvRAS3 gene (Figure 6B), the results suggest the involvement of PvRAS3 in RA biosynthesis. The other RAS sub-clade included PvRAS1, PvRAS2, PvRAS4-PvRAS8, and four putative SmRASs (Figure 5C). The function of these RASs is currently unknown. Among them, the expression of PvRAS2, PvRAS4 and PvRAS8 showed patterns similar to RA distribution (Kim et al., 2014) (Figure 6B). This indicates that these PvRASs could also be associated with RA biosynthesis.
C3H is the other enzyme involved downstream in the RA biosynthetic pathway (Figure 1). It catalyzes the hydroxylation of p-coumaroyl shikimic acid, a shikimate ester of p-coumarate generated from p-coumaroyl-CoA and shikimate under the catalysis of HCT, into caffeoyl shikimic acid, a shikimate ester of caffeic acid (Schoch et al., 2001). C3H is a cytochrome P450 encoded by members of the CYP98 gene family (Schoch et al., 2001; Franke et al., 2002; Ralph et al., 2006). Similarly, the enzyme involved in the final step of the RA biosynthetic pathway is also a cytochrome P450 encoded by members of the CYP98 gene family. It introduces the hydroxyl group(s) to the products of RAS (Eberle et al., 2009; Di et al., 2013; Levsh et al., 2019; Liu et al., 2019; Fu et al., 2020) (Figure 1).

PvRAS3 and PvRAS4 were involved in RA biosynthesis in vitro
Based on gene expression patterns, phylogenetic relationships and the significance of RAS in RA biosynthesis, PvRAS3 and PvRAS4 were selected for functional analysis using experimental approaches. PvRAS3 showed the highest expression among the eight PvRAS genes identified and was clustered with functionally known RASs in the phylogenetic tree constructed (Figures 6B, C). PvRAS4 was one of the three PvRAS genes with expression patterns similar to RA distribution patterns (Kim et al., 2014) (Figure 6B). PvRAS3 and PvRAS4 cDNAs were cloned and introduced into E. coli competent cells. Recombinant proteins were induced and purified (Supplementary Figure 10). To assay the activity of the recombinant PvRAS3 enzyme, p-coumaroyl-CoA or caffeoyl-CoA was used as the acyl donor, and pHPL or DHPL was used as the acyl acceptor. Negative controls were performed with E. coli BL21 (DE3) cells transformed with the empty pGEX-4T-1 vector (Supplementary Figure 11). LC-MS/MS analysis showed that PvRAS3 could catalyze the condensation of acyl donors and acceptors to generate four compounds (Figures 8A-D). Compound 1, generated through the condensation of caffeoyl-CoA and DHPL, was identified as RA based on UPLC and LC-MS/MS analyses (Figures 8A, E) and previous publications (Landmann et al., 2011; Di et al., 2013). Compounds 2 (Figure 8B) and 3 (Figure 8C) had the formula C18H16O7 according to the MS spectra in negative mode, [M-H]⁻ (m/z = 343) (Figures 8F, G). They corresponded to an ester of caffeoyl-CoA and pHPL (caffeoyl-4'-hydroxyphenyllactic acid) or of p-coumaroyl-CoA and DHPL (4-coumaroyl-3',4'-dihydroxyphenyllactic acid), respectively, as described previously (Landmann et al., 2011). Compound 4 was determined to be the ester of p-coumaroyl-CoA and pHPL based on the negative ion spectra (m/z = 327, C18H15O6) (Figures 8D, H) and a previous publication (Landmann et al., 2011).
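The reported m/z values can be checked against the molecular formulas with a short calculation. The sketch below assumes singly deprotonated ions, [M-H]⁻, and uses standard monoisotopic masses for C, H and O and the proton mass; the function names are hypothetical, and the check covers only the two formulas given in the text.

```python
import re

# Standard monoisotopic masses (u) and the proton mass (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C18H16O7'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mz_deprotonated(formula):
    """Theoretical m/z of the singly charged [M-H]- ion of a neutral formula."""
    return monoisotopic_mass(formula) - PROTON


# Neutral formulas: C18H16O7 (compounds 2 and 3) and C18H16O6 (compound 4,
# reported in the text as its [M-H]- composition C18H15O6).
mz_343 = mz_deprotonated("C18H16O7")
mz_327 = mz_deprotonated("C18H16O6")
print(round(mz_343, 3), round(mz_327, 3))
```

The calculated values round to the nominal m/z of 343 and 327 reported above.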
For kinetic analysis of PvRAS3, p-coumaroyl-CoA or caffeoyl-CoA was used as the acyl donor and pHPL or DHPL was used as the acyl acceptor. To test the acceptor specificity, the concentration of the donor substrates remained saturated, while the levels of the acceptor substrates were varied (Supplementary Figure 12). When caffeoyl-CoA was used as the donor, the Km values of PvRAS3 with DHPL and pHPL were 197.6 and 166.8 μM, respectively (Table 2). Using p-coumaroyl-CoA as the donor, the Km values of DHPL and pHPL were 41.7 and 32.1 μM, respectively. Thus, the Km values of DHPL and pHPL with p-coumaroyl-CoA were lower than those with caffeoyl-CoA. To assess the donor affinity, the concentrations of the acceptors (DHPL and pHPL) were kept constant while the levels of caffeoyl-CoA and p-coumaroyl-CoA were varied. The results showed that the Km value of PvRAS3 for caffeoyl-CoA was approximately 5.7-fold higher than that for p-coumaroyl-CoA (Table 2). Overall, PvRAS3 exhibited a high affinity toward DHPL and pHPL when p-coumaroyl-CoA was used as the acyl donor.
Similarly, p-coumaroyl-CoA, caffeoyl-CoA, pHPL and DHPL were used as substrates for the analysis of PvRAS4 (Supplementary Figure 13). LC-MS/MS analysis showed that products could be detected when p-coumaroyl-CoA was used as the acyl donor and pHPL or DHPL was used as the acyl acceptor (Figures 9B, D). Based on MS spectra, the products were identical to compounds 2 and 4 produced by PvRAS3 (Figures 9E, F). The Kcat/Km values of PvRAS4 for p-coumaroyl-CoA and pHPL or DHPL were smaller than those of PvRAS3 (Table 3). No product was found when caffeoyl-CoA was used as the acyl donor (Figures 9A, B). The results indicated that PvRAS4 could use p-coumaroyl-CoA, but not caffeoyl-CoA, as the acyl donor. However, its affinity toward p-coumaroyl-CoA was lower than that of PvRAS3.
To analyze whether the products exist in P. vulgaris plants, phenolic acid compounds were extracted from roots, stems, leaves and flowers and analyzed using UPLC and LC-MS/MS as described (Kim et al., 2014). The results showed that all four products, namely RA, 4-coumaroyl-3',4'-dihydroxyphenyllactic acid, 4-coumaroyl-4'-hydroxyphenyllactic acid and caffeoyl-4'-hydroxyphenyllactic acid, could be detected in the tissues analyzed (Figure 10; Supplementary Figure 14). RA accumulated at the level of mg g⁻¹ fresh weight (FW), with the highest level of 6.1 mg g⁻¹ FW in flowers and the lowest level of 3.1 mg g⁻¹ FW in stems (Figure 10A). The contents of 4-coumaroyl-4'-hydroxyphenyllactic acid ranged from 51.9 μg g⁻¹ FW in roots to 140.3 μg g⁻¹ FW in leaves (Figure 10B). The contents of 4-coumaroyl-4'-hydroxyphenyllactic acid were relatively low, ranging from 32.9 μg g⁻¹ FW in roots to 93.5 μg g⁻¹ FW in flowers (Figure 10C). The contents of 4-coumaroyl-3',4'-dihydroxyphenyllactic acid ranged from 133.3 μg g⁻¹ FW in roots to 368.0 μg g⁻¹ FW in flowers (Figure 10D). The contents of RA and 4-coumaroyl-3',4'-dihydroxyphenyllactic acid in P. vulgaris were higher than, but comparable to, those in lavender flowers, which were 2 mg g⁻¹ FW and 150 μg g⁻¹ FW, respectively (Landmann et al., 2011).
To the best of our knowledge, this is the first report of the detection of 4-coumaroyl-4'-hydroxyphenyllactic acid and caffeoyl-4'-hydroxyphenyllactic acid in plants. In addition, we analyzed the contents of 3,4-dihydroxyphenyllactic acid and 4-hydroxyphenyllactic acid. 3,4-Dihydroxyphenyllactic acid was highly accumulated in roots, leaves and flowers (Figure 10E), whereas the content of 4-hydroxyphenyllactic acid was higher in leaves and flowers than in roots and stems (Figure 10F). The results suggest that both the substrates and the products of the four reactions catalyzed by PvRAS3 and/or PvRAS4 in vitro exist in P. vulgaris plants.

CRISPR/Cas9-mediated functional analysis of PvRAS3 in P. vulgaris hairy roots
Gene expression profiling and in vitro enzyme activity assays showed that PvRAS3 could be the main RAS catalyzing RA biosynthesis in P. vulgaris. To gain further insight into the involvement of PvRAS3 in RA biosynthesis, we designed two guide RNAs (gRNAs) targeting the first coding exon of PvRAS3 for CRISPR/Cas9 gene editing. Five lines of transgenic hairy roots with the same insertion and deletion patterns in the PvRAS3 gene were obtained (Figure 11A). This indicated that these transgenic lines were homozygous mutants of PvRAS3, hereinafter referred to as pvras3-1, pvras3-2, pvras3-3, pvras3-4 and pvras3-5, respectively. Analysis of RA, 4-coumaroyl-3',4'-dihydroxyphenyllactic acid, 4-coumaroyl-4'-hydroxyphenyllactic acid and caffeoyl-4'-hydroxyphenyllactic acid showed that the contents of these compounds decreased significantly in pvras3 mutants, with the contents of RA and caffeoyl-4'-hydroxyphenyllactic acid almost below the detection limit (Figures … E-G). In contrast, the contents of DHPL and pHPL increased significantly in the mutants (Figures 10C, D). This confirmed the catalytic function of PvRAS3 and its significance in RA biosynthesis.
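As a rough illustration of the first step in gRNA design, the sketch below scans a DNA sequence for candidate target sites on both strands. It assumes a Cas9 that recognizes a 20-nt protospacer followed by an NGG PAM, which the text does not specify. The input sequence is a made-up toy example, not the PvRAS3 exon, and the function names are hypothetical. Real designs would also need off-target screening against the genome.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_protospacers(seq):
    """List (strand, start, protospacer, pam) for each 20-nt site followed by NGG.

    Start positions are 0-based; for the "-" strand they refer to the
    reverse-complemented sequence, not to the input orientation.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A zero-width lookahead lets overlapping candidate sites all be reported.
        for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", s):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Toy sequence (assumption): one 20-nt protospacer followed by an AGG PAM.
toy_exon = "AA" + "GATTACAGATTACAGATTAC" + "AGG" + "TT"
candidates = find_protospacers(toy_exon)
for strand, start, protospacer, pam in candidates:
    print(strand, start, protospacer, pam)
```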

Conclusions
P. vulgaris is a species of the Lamiaceae family with significant medicinal value. RA is one of the major bioactive components in P. vulgaris medicinal materials. Through genome-wide analysis, a total of 51 P. vulgaris genes belonging to seven RA biosynthesis-related gene families were identified. Subsequent gene and protein feature analysis, gene expression analysis and phylogenetic relationship analysis showed that seventeen of them, including PvPAL1, PvPAL2, PvC4H1, PvC4H2, Pv4CL1, Pv4CL3, Pv4CL6, Pv4CL7, Pv4CL8, PvTAT3, PvTAT4, PvHPPR1, PvHPPR3, PvRAS3, PvRAS4, PvCYP98A-1 and PvCYP98A-2, could be involved in RA biosynthesis. In vitro enzymatic assays showed that both PvRAS3 and PvRAS4 were involved in RA biosynthesis. PvRAS3 could catalyze the condensation of p-coumaroyl-CoA and caffeoyl-CoA with pHPL and DHPL. The affinity of PvRAS3 toward p-coumaroyl-CoA was higher than that toward caffeoyl-CoA. PvRAS4 only catalyzed the condensation of p-coumaroyl-CoA with pHPL and DHPL. The affinity of PvRAS4 toward p-coumaroyl-CoA was lower than that of PvRAS3. These results were consistent with the in vivo determination of phenolic acid compounds and the transgenic analysis of PvRAS3. UPLC analysis of phenolic acid compounds showed the existence of RA, 4-coumaroyl-3',4'-dihydroxyphenyllactic acid, 4-coumaroyl-4'-hydroxyphenyllactic acid and caffeoyl-4'-hydroxyphenyllactic acid in roots, stems, leaves and flowers of P. vulgaris. Generation of pvras3 homozygous mutants through CRISPR/Cas9 technology and subsequent chemical compound analysis showed that the contents of RA, 4-coumaroyl-3',4'-dihydroxyphenyllactic acid, 4-coumaroyl-4'-hydroxyphenyllactic acid and caffeoyl-4'-hydroxyphenyllactic acid decreased significantly, whereas the contents of DHPL and pHPL increased significantly in pvras3 mutants. These results indicate the existence of four possible RA biosynthetic routes in P. vulgaris, which remain to be further confirmed through the analysis of PvCYP98A genes (Figure 1). Among them, routes 1 and 2 could be the main routes. PvRAS3 was the main enzyme catalyzing the condensation of acyl donors and acyl acceptors during RA biosynthesis in P. vulgaris. PvRAS4 could play a minor role. Further functional analysis of the remaining candidate genes, particularly PvCYP98A-1 and PvCYP98A-2, may provide a more complete picture of the RA biosynthetic pathway.

TABLE 1
Sequence features of RA biosynthesis-related genes in P. vulgaris.

TABLE 2
Kinetic parameters of recombinant PvRAS3 toward different substrates.

TABLE 3
Kinetic parameters of recombinant PvRAS4 toward different substrates.