Tandem UGT71B5s Catalyze Lignan Glycosylation in Isatis indigotica With Substrates Promiscuity

Lignans are a class of compounds formed by the coupling of two phenylpropanoid molecules, and they have promising nutritional and pharmacological activities. Lignan glucosides, which are formed from aglycones by uridine diphosphate (UDP) glycosyltransferases (UGTs), have abundant bioactivities. In the present study, two UGTs from Isatis indigotica Fort., namely IiUGT71B5a and IiUGT71B5b, were shown to catalyze the glycosylation of lignans with promiscuity toward various sugar acceptors and sugar donors, with pinoresinol as the preferred substrate. IiUGT71B5a efficiently produced both pinoresinol monoglycoside and diglycoside, whereas IiUGT71B5b produced only the monoglycoside and exhibited considerably lower activity than IiUGT71B5a. Substrate screening indicated that a ditetrahydrofuran moiety is the essential structural characteristic of sugar acceptors. The transcription of IiUGT71B5s was highly consistent with the spatial distribution of pinoresinol glucosides, suggesting that IiUGT71B5s may play biological roles in the modification of pinoresinol in I. indigotica roots. This study not only provides insights into lignan biosynthesis, but also elucidates the functional diversity of the UGT family.

Generally, glycosylation is catalyzed by UGT, which transfers a sugar moiety from a UDP-sugar to an acceptor molecule (Noguchi et al., 2008). To gain insights into lignan biosynthesis, we identified two lignan UGT genes, named IiUGT71B5a and IiUGT71B5b, which are responsible for glucosylation at the 4-position of pinoresinol. Meanwhile, the comprehensive catalytic properties and expression profiles of these two IiUGTs were also characterized. Thus, our findings will be important for understanding the biosynthesis of lignans, as well as for elucidating the functional diversity of the UGT family.

Plant Materials
Seeds of Isatis indigotica Fort. and Nicotiana benthamiana were kept in our laboratory. Plants were grown under a cycle of 16 h light at 25°C and 8 h dark at 18°C, with a humidity of approximately 75%.

UHPLC-Q-TOF/MS Based Metabolic Profiling
Lignan and phenylpropanoid profiling was carried out on an Agilent 1290A Infinity II ultra-high-performance liquid chromatography (UHPLC) system coupled with an Agilent 6530A accurate mass quadrupole-time of flight mass spectrometer (Q-TOF/MS; Agilent, United States) equipped with a dual AJS electrospray ionization source (ESI) operated in negative ion mode. The parameters were as follows: nitrogen drying gas temperature, 350°C; flow, 11 L·min⁻¹; nebulizer pressure, 45 psi; sheath gas temperature and flow rate, the same as those of the drying gas; capillary voltage, 4 kV; fragment voltage, 120 V; skimmer voltage, 60 V; octopole 1 RF peak voltage, 750 V; and mass range, 100-3,200 m/z. Chromatographic separations were performed using an Agilent Poroshell 120 SB-C18 column (2.7 μm, 2.1 mm × 150 mm; Agilent) at 35°C. The mobile phase consisted of an aqueous solution of 0.01% formic acid and 2 mM ammonium acetate (phase A) and mass spectrometry grade acetonitrile (ACN; phase B), with the following elution method: 5% ACN at 0 min, 20% ACN at 2 min, 25% ACN at 10 min, 95% ACN at 20 min, and a final 4.5 min of equilibration post run. The injection volume was 3.0 μl, and the flow rate was 0.3 ml/min. The main mass spectrometry parameters of the target compounds were all determined in the negative ion mode of UHPLC-Q-TOF/MS and are listed in Supplementary Table S1. All data acquisition and analysis were controlled by Agilent MassHunter Workstation Software (Agilent Technologies, United States).
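The elution method above is given only as breakpoints. As a minimal sketch, assuming the composition changes linearly between consecutive breakpoints (the text does not state the ramp shape), the acetonitrile percentage at any time during the run can be estimated as follows. The function name percent_acn is hypothetical.

```python
from bisect import bisect_right

# (time in min, % acetonitrile) breakpoints exactly as listed in the text
GRADIENT = [(0.0, 5.0), (2.0, 20.0), (10.0, 25.0), (20.0, 95.0)]


def percent_acn(t_min, gradient=GRADIENT):
    """Estimate % ACN at t_min, ASSUMING linear ramps between breakpoints."""
    times = [t for t, _ in gradient]
    # Clamp to the first/last breakpoint outside the programmed range
    if t_min <= times[0]:
        return gradient[0][1]
    if t_min >= times[-1]:
        return gradient[-1][1]
    # Locate the segment that contains t_min
    i = bisect_right(times, t_min)
    (t0, b0), (t1, b1) = gradient[i - 1], gradient[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 1, 2, 6, 10, 15, 20):
        print(f"{t:>4} min: {percent_acn(t):.1f}% ACN")
```

The same function can be reused for the LC/MS method by passing its breakpoints as the gradient argument.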

LC/MS Based Lignans and Phenylpropanoids Assay
The liquid chromatography-mass spectrometry (LC/MS) system was an Agilent 1200-6410 LC/MS, the chromatographic column was an Agilent ZORBAX SB-C18 (3.5 μm, 2.1 mm × 100 mm), the column temperature was 30°C, the flow rate was 0.3 ml/min, and the injection volume was 5 μl. The mobile phase was composed of acetonitrile (ACN; phase A) and 5 mM ammonium acetate aqueous solution (phase B), and the elution method was as follows: 14% ACN at 0 min, 50% ACN at 6 min, 85% ACN at 6.5 min, 85% ACN at 12 min, and a final 4.5 min of equilibration post run. The main mass spectrometry parameters of the target compounds were all determined in the negative ion mode of the LC/MS 6410 (Supplementary Table S2).

Expression and Purification of UGTs
The coding region of each UGT gene was subcloned into the pET-32a(+) expression vector and then transformed into Escherichia coli strain BL21(DE3) (primers are listed in Supplementary Table S3). When the OD600 of the cell cultures reached 0.5-0.7, expression was induced with 1 mM isopropyl-β-D-thiogalactoside (IPTG). After 10-16 h of incubation at 16°C and 200 rpm, the cells were harvested by centrifugation at 4°C. The tagged recombinant proteins were purified by Ni-NTA affinity chromatography (Bio-Scale Mini Profinity IMAC Cartridges, BIO-RAD, United States).

Activity Assays in vitro
Pinoresinol, (+)-pinoresinol-4-O-glucoside, lariciresinol, secoisolariciresinol, matairesinol, isolariciresinol, phillygenin, sesaminol, clemaphenol A, and coniferyl alcohol were selected as sugar acceptors. UDP-glucose, UDP-Xyl, UDP-Rha, UDP-Ara, and UDP-galacturonic acid were tested as sugar donors. The reaction was carried out in 50 μl of 100 mM phosphate buffer (pH 8.0), containing 2 mM sugar donor, 200 μM substrate, and 1 μg of purified protein. The reaction mixture without enzyme was preincubated at 30°C for 10 min, and then the purified protein was added and incubated at 30°C for 5 min to 12 h. The reaction was stopped by adding 150 μl of absolute ethanol. The reaction solution was evaporated to dryness, and reconstituted with methanol before chemical analysis.
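As a quick arithmetic check on the assay composition described above, the following sketch converts the stated concentrations and volumes into absolute amounts. The function names are hypothetical, and the ethanol fraction assumes the volumes are simply additive.

```python
def nmol(conc_uM, volume_ul):
    """Amount in nmol: uM x ul gives pmol, so divide by 1000."""
    return conc_uM * volume_ul / 1000.0


def reaction_summary(volume_ul=50.0, donor_mM=2.0, acceptor_uM=200.0,
                     ethanol_ul=150.0):
    """Summarize the in vitro assay from the values given in the text."""
    donor = nmol(donor_mM * 1000.0, volume_ul)    # sugar donor, nmol
    acceptor = nmol(acceptor_uM, volume_ul)       # sugar acceptor, nmol
    return {
        "donor_nmol": donor,
        "acceptor_nmol": acceptor,
        "donor_to_acceptor_ratio": donor / acceptor,
        # Assumes additive volumes after quenching with ethanol
        "ethanol_fraction": ethanol_ul / (ethanol_ul + volume_ul),
    }


if __name__ == "__main__":
    for key, value in reaction_summary().items():
        print(f"{key}: {value}")
```

Under these values the sugar donor is in tenfold molar excess over the acceptor, which is enough for a second glycosylation of the same acceptor molecule.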

Sequence Analyses
Multiple sequence alignments of target UGTs were performed using the Clustal-W program, and phylogenetic trees were constructed using MEGA 7.0 (Kumar et al., 2016). The neighbor-joining statistical method was used to calculate the phylogenetic tree, with 1,000 bootstrap replications. The homology models of IiUGT71B5a and IiUGT71B5b were built using the crystal structure of Medicago truncatula UGT71G1 [Shao et al., 2005; Protein Data Bank (PDB) code: 2acv.1.A] as a template with the SWISS-MODEL server at http://swissmodel.expasy.org. UDP-glucose and sesaminol bound in GTB were taken as the sugar donor and sugar acceptor, respectively, and were docked into the built model of IiUGT71B5a using Autodock 4.2. The models were visualized with the PyMOL molecular graphics system.

Transcription Analysis
Total RNA was extracted with the TransZol Up Plus RNA Kit (ER501, TransGen, China). cDNA was synthesized by one-step reverse transcription (PrimeScript™ 1st Strand cDNA Synthesis Kit, 6110A, TAKARA, China) using 2 μg of total RNA as a template. Gene expression levels were detected using real-time quantitative PCR [qRT-PCR; TB Green® Premix Ex Taq™ (Tli RNaseH Plus), RR420A, TAKARA, China; QuantStudio™ 3, Applied Biosystems, United States]. Gene-specific primers are listed in Supplementary Table S3. Each group of samples had six biological replicates, and each biological replicate was assayed three times.
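The replicate design above (six biological replicates, each assayed three times) implies a two-level summary. The sketch below is one plausible way to collapse such data and is not taken from the study: technical replicates are averaged first, and the spread is computed across biological replicates only. The function name and all Ct values are hypothetical.

```python
from statistics import mean, stdev


def summarize_ct(ct_by_replicate):
    """Collapse technical replicates, then summarize biological replicates.

    ct_by_replicate: one inner list of technical Ct values per biological
    replicate. Returns (mean, standard deviation, per-replicate means).
    """
    per_bio = [mean(technical) for technical in ct_by_replicate]
    return mean(per_bio), stdev(per_bio), per_bio


if __name__ == "__main__":
    # Hypothetical Ct values: 6 biological replicates x 3 technical replicates
    example = [
        [21.1, 21.3, 21.2],
        [20.9, 21.0, 21.1],
        [21.4, 21.5, 21.3],
        [21.0, 21.2, 21.1],
        [20.8, 20.9, 21.0],
        [21.3, 21.2, 21.4],
    ]
    avg, sd, _ = summarize_ct(example)
    print(f"mean Ct = {avg:.2f}, SD across biological replicates = {sd:.2f}")
```

Averaging technical replicates before computing the spread avoids treating the 18 measurements as independent samples.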

Subcellular Localization of UGTs
The coding regions of IiUGT71B5a and IiUGT71B5b were cloned into the plant expression vector PHB-yellow fluorescent protein (YFP) (primers are listed in Supplementary Table S3). PHB-YFP vectors carrying IiUGT71B5a and IiUGT71B5b were transferred into Agrobacterium tumefaciens strain GV3101. Cultures were grown in 10 ml of YEB medium (containing 75 μg·ml⁻¹ kanamycin and 25 μg·ml⁻¹ rifampicin) overnight (28°C, 200 rpm) and collected by centrifugation (5,000 × g, 10 min, RT). The collected cells were resuspended in Murashige and Skoog medium (10 mM MES, 100 μM acetosyringone, pH 5.8) to a final OD600 of 0.3-0.6. After incubation for 2-3 h at RT, the A. tumefaciens suspension was injected into the abaxial surface of leaves of 4-week-old N. benthamiana plantlets using needle-free syringes. The infected leaves were harvested 48-72 h after infiltration. A. tumefaciens containing PHB-YFP was infiltrated as a negative control. The YFP fluorescence was imaged using a laser scanning confocal microscope (Leica TCS SP3, Germany).

Synteny and Collinearity Analysis in Plant Genomes
Syntenic blocks were assigned via all-by-all BLASTP with cutoffs of identity ≥40% and e-value ≤1e-10. Synteny comparison and microsynteny visualization were performed using JCVI with LASTAL (Tang et al., 2008).
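The identity and e-value cutoffs above can be applied to BLASTP results as in the following sketch. It assumes the hits were written in the standard 12-column tabular format (BLAST+ output format 6), which the text does not specify, and it drops self-hits, which is also an assumption. The function name and gene identifiers are hypothetical.

```python
def filter_hits(lines, min_identity=40.0, max_evalue=1e-10):
    """Keep BLASTP hits with identity >= 40% and e-value <= 1e-10.

    Expects tab-separated rows in the order: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if query == subject:
            continue  # ASSUMPTION: self-hits are not informative here
        if identity >= min_identity and evalue <= max_evalue:
            kept.append((query, subject, identity, evalue))
    return kept


if __name__ == "__main__":
    # Hypothetical hits for illustration only
    example = [
        "geneA\tgeneB\t85.0\t480\t72\t0\t1\t480\t1\t480\t1e-150\t900",
        "geneA\tgeneC\t35.0\t300\t195\t2\t1\t300\t5\t304\t1e-40\t150",
        "geneB\tgeneD\t55.0\t60\t27\t0\t1\t60\t1\t60\t1e-5\t45",
        "geneA\tgeneA\t100.0\t482\t0\t0\t1\t482\t1\t482\t0.0\t1000",
    ]
    for hit in filter_hits(example):
        print(hit)
```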

Data Availability
The sequence data of IiUGT71B5a, IiUGT71B5b, and IiUGT71B5c have been submitted to the GenBank databases under accession numbers: MW051594, MW051595, and MW051596, respectively.

Identification of UGTs
To annotate UGT genes from the I. indigotica genome (VHIU00000000; Kang et al., 2020), HMMER was used to search for UGTs according to the plant secondary product glycosyltransferase (PSPG) motif (Yonekura-Sakakibara and Hanada, 2011; Caputi et al., 2012). As a result, 83 putative UGTs were identified, which were further assigned to 15 previously characterized groups based on phylogenetic tree construction (Figure 2; Wilson and Tian, 2019). Three IiUGT71B5s (IiUGT71B5a, IiUGT71B5b, and IiUGT71B5c) were suggested to have lignan catalytic activity, as they have a close phylogenetic relationship with a known pinoresinol glycosyltransferase (FkUGT71A18) from F. koreana (Figure 2; Ono et al., 2010).

Cloning and Functional Characterization of IiUGT71B5a and IiUGT71B5b
IiUGT71B5a has an open reading frame (ORF) of 1,449 bp encoding 482 amino acids (aa), and IiUGT71B5b has an ORF of 1,443 bp encoding 480 aa. However, IiUGT71B5c only shows an ORF of 435 bp encoding a protein (145 aa) without the PSPG motif (Supplementary Figure S3). Thus, IiUGT71B5a and IiUGT71B5b were chosen for further studies. To identify the catalytic capability of the two IiUGTs in vitro, recombinant [...] Figure 3B, 1b). Notably, 1a and 1b were detected with the same retention time as the authentic compounds of (+)-pinoresinol-4-O-glucoside and pinoresinol diglucoside, respectively (Figure 3A). The above results suggest that both IiUGT71B5a and IiUGT71B5b may be involved in lignan biosynthesis in I. indigotica, while only IiUGT71B5a may contribute to the production of pinoresinol diglucoside (Figure 3C).

Sugar Donor Preference of IiUGT71B5a
To determine the sugar donor specificity of IiUGT71B5a, UDP-Glc, UDP-Xyl, UDP-Rha, UDP-Ara, and UDP-glucuronic acid (UDP-GalA) were tested (Figure 6A). When pinoresinol (1) was used as the substrate, IiUGT71B5a could efficiently utilize UDP-Glc (conversion rate of pinoresinol > 85%) and UDP-Xyl (conversion rate of pinoresinol > 10%), but not UDP-Rha, UDP-Ara, or UDP-GlcA (Figure 6B). Similar results were observed when using (+)-pinoresinol-4-O-glucoside (1a) or lariciresinol (5) as the substrate (Figure 6C). Consistent with a previous report (Zhang et al., 2020), comparison of the structures of the five sugar donors (Figure 6D) indicated that the 4-OH configuration of the sugar group is an essential structural trait affecting selective binding of the sugar donor. Furthermore, diglucosides (1b and 1d) with glucose and xylose moieties were also detected (Supplementary Figure S8).
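The text reports conversion rates without stating how they were calculated. One common approach, shown here purely as an illustrative sketch, estimates conversion from chromatographic peak areas as the share of total signal found in products. This assumes the substrate and its glycosides have equal detector response, which would need calibration with authentic standards. The function name and peak areas are hypothetical.

```python
def conversion_rate(substrate_area, product_areas):
    """Percent of substrate converted, estimated from peak areas.

    ASSUMPTION: equal detector response for substrate and products; this is
    an illustrative definition, not the one confirmed by the study.
    """
    products = sum(product_areas)
    total = substrate_area + products
    if total <= 0:
        raise ValueError("no signal detected for substrate or products")
    return 100.0 * products / total


if __name__ == "__main__":
    # Hypothetical peak areas: remaining substrate, monoglycoside, diglycoside
    rate = conversion_rate(10.0, [70.0, 20.0])
    print(f"conversion: {rate:.1f}%")
```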

Modeling and Docking of IiUGT71B5a and IiUGT71B5b
To explore the potential structural basis for the catalytic properties of IiUGT71B5a, molecular docking was performed to model the binding sites. A glycosyltransferase from Medicago truncatula, UGT71G1, was used as the template (Shao et al., 2005). Binding sites for UDP-glucose were modeled first (Figure 7A). Several residues (Ile13, Thr141, Ser283, Ala350, Gln352, His367, Ser372, Tyr389, and Gln392) were shown to form hydrogen bonds (2.1-2.7 Å) with UDP-glucose, with a predicted affinity of −9.2 kcal·mol⁻¹ (Figure 7B). Most of these residues (Gln352, His367, Ser372, Tyr389, and Gln392) were located in the PSPG motif (Figure 7C). Given that Gln392 is conserved and considered the critical residue for the preference toward UDP-glucose (Kubo et al., 2004), pinoresinol and pinoresinol-4-O-glucoside were then docked into the predicted pocket (IiUGT71B5a-UDP-Glc). Two amino acids (Ser10 and Arg42) were predicted to interact with pinoresinol, with a predicted affinity of −7.7 kcal·mol⁻¹ (Figure 7D). Meanwhile, only one residue (Asp47) was predicted to form a hydrogen bond with pinoresinol-4-O-glucoside, with a predicted affinity of −6.9 kcal·mol⁻¹ (Figure 7E). Notably, the two ligands approached UDP-glucose at different angles and positions in the predicted pocket, which had an estimated volume of 1,121 Å³ and therefore offered a broad binding space. This might be the structural basis for the substrate promiscuity of IiUGT71B5s. Interestingly, when comparing the secondary structures of the IiUGT71B5s (90% sequence similarity), most of the differing residues were located on the surface of the proteins and near the entrance of the pocket (Figure 7F), which might be critical for their difference in catalytic capability.
The kinetic parameters were analyzed through Lineweaver-Burk plots (Supplementary Figure S7).
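The Lineweaver-Burk analysis mentioned above for the kinetic parameters linearizes the Michaelis-Menten equation as 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so a straight-line fit of 1/v against 1/[S] gives Km and Vmax from the slope and intercept. The following is a minimal sketch using ordinary least squares. The function name and the data are hypothetical: the rates are generated from chosen Km and Vmax values rather than taken from the study.

```python
def lineweaver_burk(substrate_conc, rates):
    """Estimate (Km, Vmax) from a least-squares fit of 1/v against 1/[S]."""
    x = [1.0 / s for s in substrate_conc]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # Km / Vmax
    intercept = mean_y - slope * mean_x   # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


if __name__ == "__main__":
    # Synthetic, noise-free rates from hypothetical Km = 50 and Vmax = 2
    true_km, true_vmax = 50.0, 2.0
    conc = [10.0, 25.0, 50.0, 100.0, 200.0]
    rates = [true_vmax * s / (true_km + s) for s in conc]
    km, vmax = lineweaver_burk(conc, rates)
    print(f"Km = {km:.2f}, Vmax = {vmax:.2f}")
```

Because the double-reciprocal transform amplifies error at low substrate concentrations, nonlinear fitting of the untransformed data is often preferred when measurements are noisy.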

Correlation Between Spatial Distribution of Lignans and Transcription of IiUGT71B5s
To examine the correlation between IiUGT71B5s and the corresponding lignan glucosides, chemical profiling of lignans was carried out in different organs (leaf and root) and specific root cells (epidermis and cortex, phloem, xylem, and cambium). LC/MS profiling showed that at least three lignans and three lignan glucosides were present in I. indigotica, including pinoresinol, (+)-pinoresinol-4-O-glucoside, pinoresinol diglucoside, lariciresinol, secoisolariciresinol, and secoisolariciresinol monoglucoside (Supplementary Table S5). Analysis of the individual lignans showed that (+)-pinoresinol-4-O-glucoside and pinoresinol diglucoside were mainly distributed in the roots, with the highest accumulation in the epidermis and cortex (Figure 8A). In contrast, pinoresinol was not detected in any root cells, which could be due to extremely low accumulation. In addition, we characterized coniferyl alcohol, a precursor in lignan biosynthesis, and found that it was not detected in these tissues (Supplementary Table S5). Because coniferyl alcohol may have a toxic effect on cells, it is mainly stored in plants in the form of trans-coniferin (Vaisanen et al., 2015). Interestingly, trans-coniferin accumulates to high levels in roots but is almost undetectable in leaves, and the abundance of lignans in roots is much higher than that in leaves (Supplementary Table S5). These results indicate that the synthesis and accumulation of lignans take place mainly in the roots rather than in the leaves. The corresponding transcription levels of IiUGT71B5s were analyzed by qRT-PCR. Across organs, the transcription levels of the two IiUGT71B5s were much higher in leaves than in roots. Among the root cells, the highest transcription levels of both IiUGT71B5a and IiUGT71B5b were detected in the epidermis and cortex cells, followed by phloem cells, and their expression in xylem and cambium cells was slightly lower than that in phloem (Figure 8B).
The transcription of IiUGT71B5s and the accumulation of pinoresinol glucosides are highly consistent in the roots of I. indigotica, suggesting that IiUGT71B5s could be major contributors to the accumulation of pinoresinol glucosides in roots. However, the high expression of IiUGT71B5s did not increase the accumulation of pinoresinol glucosides in the leaves, which may be due to the extremely low biosynthesis of pinoresinol in leaves. The low accumulation of pinoresinol is in turn likely caused by the low abundance of its precursor (trans-coniferin) in the leaves (Supplementary Table S5). In addition, the high transcription levels of IiUGT71B5s in leaves imply that they may also assume wider glycosylation functions in plants, and the substrate heterogeneities of IiUGT71B5s also support this conjecture (Figure 5).

Subcellular Localizations of IiUGT71B5a and IiUGT71B5b
To investigate the subcellular localizations of IiUGT71B5a and IiUGT71B5b, their ORFs were fused with YFP at the C-terminus and transiently expressed in N. benthamiana leaves. As shown in Figure 8C, the signals of both fusion proteins were present in the cytoplasm, consistent with previously reported UGTs such as MtUGT72L1, PhUGT79B31, and PbUGT72AJ2 (Pang et al., 2013; Knoch et al., 2018; Cheng et al., 2019). This result also indicates that the cytoplasm might be the subcellular site for lignan glycoside biosynthesis.

Substrate Heterogeneities of IiUGT71B5s
Although there are diverse types of lignans, only a few UGTs involved in lignan glycosylation have been discovered, including the lignan (pinoresinol and lariciresinol) glycosyltransferase UGT71C1 from A. thaliana, the pinoresinol glycosyltransferase UGT71A18 from F. koreana, the secoisolariciresinol glycosyltransferase UGT74S1 from L. usitatissimum, and the sesaminol glycosyltransferases UGT71A9, UGT94D1, UGT94AG1, and UGT94AA2 from S. indicum (Ono et al., 2010; Ghose et al., 2014; Okazawa et al., 2015; Teponno et al., 2016; Murata et al., 2017). UGTs such as AtUGT71C1, FkUGT71A18, and IiUGT71B5s are able to catalyze the glycosylation of pinoresinol; however, their substrate catalytic characteristics are not the same. For example, AtUGT71C1 has a more extensive substrate heterogeneity than IiUGT71B5s and can glycosylate phenylpropanoids, flavonoids, and lignans (Lim et al., 2003; Okazawa et al., 2015). FkUGT71A18 also exhibits relatively broad sugar acceptor specificity for lignans with a preference for ditetrahydrofuran lignans, but, unlike IiUGT71B5s, it shows good catalytic activity toward phillygenin (Figure 5B, compound 4; Ono et al., 2010). The identification of novel lignan glucosyltransferases from I. indigotica based on their similarity to FkUGT71A18, AtUGT71C1, and SiUGT71A9 highlights the structural conservation of lignan UGTs across plant species (Figure 2). Nevertheless, similar to flavonoid UGTs forming independent phylogenetic clades based on their various regio-specificities (Noguchi et al., 2009), the structural diversity of lignan glucosides strongly suggests that not all lignan UGTs belong to the UGT71 family; examples include LuUGT74S1, SiUGT94D1, SiUGT94AG1, and SiUGT94AA2 (Supplementary Figure S2). In addition to catalytic activities toward lignans, glycosylation of the 3-OH of quercetin by AtUGT71B5 was also reported (Lim et al., 2004). IiUGT71B5s, as homologs of AtUGT71B5, preferentially glycosylate the 3'-OH of quercetin, indicating that functional differentiation has occurred among UGT71B5s (Figure 2; Supplementary Figure S11).
Among these known UGTs, most have been validated to have functional plasticity with a wide range of substrate recognition toward a variety of lignans. In this study, the promiscuity of the two IiUGT71B5s toward both sugar acceptors and donors was demonstrated. On the other hand, although IiUGT71B5s are capable of catalyzing multiple lignan substrates (Figure 5), we surmise that the two IiUGT71B5s may contribute mainly to the catalysis of pinoresinol, owing to their substrate preference (Figure 5B) and the high consistency of their transcription with the accumulation of pinoresinol glucosides in the roots of I. indigotica (Figures 8A,B). Notably, the similar catalytic activities and transcription patterns suggest functional redundancy between IiUGT71B5a and IiUGT71B5b (Laruson et al., 2020). In support of this, a tandem duplication including three loci (Ii4G26670, Ii4G26680, and Ii4G26690) coding UGT71B5 genes was discovered on chromosome 4 (Supplementary Figure S12), which could be the primary source of the redundancy of UGT71B5 genes. In addition, Ii4G26690 (IiUGT71B5c) appears to be a partial gene with an ORF of 435 bp, accompanied by two extra introns (2,009 and 104 bp; Supplementary Figures S3, S13), indicating the pseudogenization of this locus. Although transcription of IiUGT71B5c was detected in the leaves and roots of I. indigotica (Supplementary Figure S9), without the PSPG box it could only encode a nonfunctional UGT (Supplementary Figure S3). However, it is also possible that the in vivo activity of IiUGT71B5s does not match their in vitro activity, as has been reported for many plant UGT proteins, including AtUGT73C6, MtUGT78G1, and LjUGTs (Peel et al., 2009; Husar et al., 2011; Yin et al., 2017). Therefore, further validation of the roles of UGT71B5s in planta is required. Interestingly, although sesaminol (Figure 5B, compound 3) and its glucosides are not produced in I. indigotica, IiUGT71B5a had the strongest activity toward sesaminol, indicating that IiUGT71B5a might be used as an efficient catalytic element for biosynthesis.

Catalytic Properties of IiUGT71B5s
IiUGT71B5s seemed to exhibit functions similar to those of previously reported lignan UGTs such as FkUGT71A18, with wide-ranging substrate promiscuity (Ono et al., 2010). However, substrate specificities were also observed in catalysis by IiUGT71B5s. First, although IiUGT71B5s catalyzed various lignan substrates, they showed obvious substrate preferences toward lignans containing ditetrahydrofuran in the skeleton, such as pinoresinol (1), clemaphenol A (2), and sesaminol (3; Figure 5). In addition, among the four ditetrahydrofuran lignan substrates tested (Figure 5A, compounds 1-4), IiUGT71B5s had extremely low activities toward phillygenin (4), the only substrate in the S configuration with a phenolic group at C-1, supporting the strong stereoselectivity of IiUGT71B5s. Furthermore, conserved activities toward sugar donors were also observed. IiUGT71B5s only showed high conversion efficiency with UDP-Glc, which is thought to correlate with the conserved glutamine (Gln, Q) located at the end of the PSPG motif (Supplementary Figure S3; Kubo et al., 2004).

Evolution of the Lignan Biosynthesis Pathway
Lignans are classified into eight categories according to the differences in their basic skeleton (Teponno et al., 2016; Fang and Hu, 2018), and they have diverse chemical compositions and distributions in the plant kingdom (Ono et al., 2010; Lau and Sattely, 2015; Murata et al., 2017; Zhang et al., 2019). Thus, the diversity of lignan biosynthesis pathways in different plants provides an ideal example for studying the evolution of the origins and the loss of plant chemical diversity. For example, the major lignans accumulated in the roots of A. thaliana are lariciresinol glucosides but not secoisolariciresinol, which is determined by variance in the activity of PLR (Nakatsubo et al., 2008). In this study, we found that the pinoresinol in I. indigotica is not only converted by PLR to lariciresinol and secoisolariciresinol (Xiao et al., 2015; Zhang et al., 2016), but also accumulates in the form of its glucosides (Figure 1; Supplementary Table S5), which may be influenced by functional differentiation of UGT71B5s. In addition, some categories of lignans, such as sesaminol and podophyllotoxin, are not produced in I. indigotica, which may be due to the functional diversity of gene families involved in the lignan biosynthesis network, including CYPs (Murata et al., 2017; Harada et al., 2020), OMTs (Lau and Sattely, 2015), and UGTs. Studying the activity, diversity, and evolution of these families will help to reveal mechanisms for the diversity of lignans in plants.
In summary, we identified two UGTs that may primarily contribute to the modification of pinoresinol. We discussed structural insights into their functional diversity, which will provide an in-depth understanding of lignan biosynthesis and the functional diversity of the UGT family in plants. In addition, these novel UGTs may facilitate further enzyme engineering to produce bioactive lignan glucosides.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
WC, LZ, and JC were the leading investigators of this research program. JC and XC designed the experiments and wrote the manuscript. XC performed most of the experiments and analyzed the data. JF performed the molecular docking. YW, SL, YX, and YD assisted in experiments and discussed the results. All authors contributed to the article and approved the submitted version.