Structural and Biochemical Insights Into Two BAHD Acyltransferases (AtSHT and AtSDT) Involved in Phenolamide Biosynthesis

Phenolamides represent one of the largest classes of plant-specialized secondary metabolites and function in diverse physiological processes, including defense responses and development. The biosynthesis of phenolamides requires the BAHD-family acyltransferases, which transfer acyl-groups from different acyl-donors specifically to amines, the acyl-group acceptors. However, the mechanisms of substrate specificity and multisite-acylation of the BAHD-family acyltransferases remain poorly understood. In this study, we provide a structural and biochemical analysis of AtSHT and AtSDT, two representative BAHD-family members that catalyze the multisite acylation of spermidine but show different product profiles. By determining the structures of AtSHT and AtSDT and using structure-based mutagenesis, we identified the residues important for substrate recognition in AtSHT and AtSDT and hypothesized that the acyl acceptor spermidine might adopt a free-rotating conformation in AtSHT, which can undergo mono-, di-, or tri-acylation; while the spermidine molecule in AtSDT might adopt a linear conformation, which only allows mono- or di-acylation to take place. In addition, through sequence similarity network (SSN) and structural modeling analysis, we successfully predicted and verified the functions of two uncharacterized Arabidopsis BAHD acyltransferases, OAO95042.1 and NP_190301.2, which use putrescine as the main acyl-acceptor. Our work provides not only an excellent starting point for understanding multisite acylation in BAHD-family enzymes, but also a feasible methodology for predicting possible acyl acceptor specificity of uncharacterized BAHD-family acyltransferases.

All these phenolamide synthesis enzymes belong to the acyl-coenzyme A (CoA)-dependent BAHD acyltransferase family, which was named after the first letters of its first four biochemically characterized enzymes: benzylalcohol O-acetyltransferase (BEAT), anthocyanin O-hydroxycinnamoyltransferase (AHCT), anthranilate N-hydroxycinnamoyl/benzoyltransferase (HCBT), and deacetylvindoline 4-O-acetyltransferase (DAT) (D'Auria, 2006). It has been established that BAHD acyltransferases catalyze the transfer of acyl groups from acyl donors (acyl-CoA) to various acyl acceptors. Several structural studies of BAHD acyltransferases have also been reported (Ma et al., 2005; Unno et al., 2007; Garvey et al., 2008, 2009; Lallemand et al., 2012; Chiang et al., 2018; Levsh et al., 2019). However, the BAHD-family enzymes in plants act on complex sets of substrates (various acyl donors and acceptors) and yield diverse products while sharing only low sequence similarity, which makes it extremely challenging to predict the substrate specificity of uncharacterized BAHD-family enzymes from their amino acid sequences. In addition, the multisite acylation mechanisms of BAHD-family acyltransferases remain poorly understood.
In this study, we first performed structural analyses of two representative BAHD-family enzymes, the Arabidopsis mono-/di-acyltransferase SDT (AtSDT) and the poly-acyltransferase SHT (AtSHT). We then explored the acceptor-molecule specificity and potential molecular mechanisms of multisite acylation by structure-based mutagenesis. Finally, we predicted and verified the acyl-acceptor substrates of two uncharacterized BAHD-family transferases in Arabidopsis using a prediction approach based on a sequence similarity network (SSN) and structural modeling. These results not only deepen our understanding of the substrate specificity and multisite acylation mechanisms of this large enzyme family, but also provide important insights into the functions of the many diverse and so far uncharacterized BAHD-family proteins in plants.

MATERIALS AND METHODS
Gene Cloning, Site-Directed Mutagenesis, Expression, and Protein Purification
Wild-type SDT (AT2G19070), SHT (AT2G23510), OAO95042.1 (AT5G07080), and NP_190301.2 (AT3G47170) genes from Arabidopsis thaliana were cloned into the pET28a vector under control of the bacteriophage T7 promoter using NdeI and XhoI. The resulting plasmids were transformed into Escherichia coli strain BL21(DE3) (Invitrogen). Single colonies of the resulting transformants were used to inoculate 50 ml LB broth containing 50 μg/ml kanamycin, and cultures were incubated for 16 h at 37°C with shaking. Aliquots (10 ml) were used to inoculate 1 L LB broth containing 50 μg/ml kanamycin; cultures were incubated at 37°C with shaking until OD600 = 0.8, induced by addition of isopropyl-β-D-thiogalactoside to 1 mM, and incubated for a further 16 h at 16°C. Cells were harvested by centrifugation (4,000 × g; 15 min at 4°C), re-suspended in buffer A (10 mM Tris-HCl, pH 8.0, 200 mM NaCl, 5 mM DTT, and 5% glycerol), and lysed using an EmulsiFlex-C5 cell disruptor (Avestin). The lysate was centrifuged (20,000 × g; 30 min at 4°C) to remove cell debris. The supernatant was loaded onto a 5 ml column of Ni2+-NTA-agarose (Qiagen) pre-equilibrated in buffer A, and the column was washed with 10 × 5 ml buffer A containing 25 mM imidazole and eluted with 50 ml buffer A containing 250 mM imidazole. The sample was concentrated to around 10.0 mg/ml and further purified by gel filtration chromatography on a HiLoad 16/60 Superdex 200 prep grade column (GE Healthcare) in 20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 5 mM MgCl2, and 1 mM β-mercaptoethanol. The peak fractions were collected, concentrated to 10 mg/ml in the same buffer using 30 kDa MWCO Amicon Ultra-15 centrifugal ultrafilters (EMD Millipore), and stored in aliquots at −80°C. Yields were ~5 mg/l, and purities were ~95%.
Site-directed mutants were prepared using a one-step PCR method, and the mutant proteins were expressed and purified following the same protocol as for the wild type.

In vitro Activity Assays
The in vitro activity assays of the purified recombinant proteins were performed at 30°C for 30 min in 100 μl of 100 mM Tris-HCl buffer (pH 7.5) containing 60 μM acyl donor (caffeoyl-CoA, feruloyl-CoA, or sinapoyl-CoA), 200 μM acyl acceptor (spermidine, spermine, or putrescine), and 10 μg purified protein. The reactions were terminated by adding 20 μl ice-cold 0.5% trifluoroacetic acid and directly subjected to liquid chromatography-mass spectrometry (LC-MS) on an Agilent 1260 system (Agilent Technologies, CA) equipped with a 6125B electrospray ionization mass spectrometer. The column oven temperature was 30°C; electrospray ionization (ESI) was in the positive mode; the capillary voltage was 3 kV; and in full-scan mode, the wavelength range was 190 to 600 nm. Samples were separated on a reverse-phase C18 column [Thermo Syncronis C18 analytical column (150 mm × 4.6 mm, 5 μm)] at a flow rate of 0.8 ml/min with the following gradient mobile phase: 0-5 min, 15% solvent B (0.2% acetic acid in acetonitrile) in solvent A (0.2% acetic acid in water); 5-25 min, 15-100% solvent B; 25-35 min, 100% solvent B; 35-40 min, 100 to 15% solvent B (Grienenberger et al., 2009; Luo et al., 2009). CoA esters were synthesized according to published methods (Semler et al., 1987) and were identified and quantified by spectrophotometry (Stockigt and Zenk, 1974). All reactions were run in two technical replicates, and each assay was repeated in at least three independent experiments.
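The gradient program described above can be written as a small piecewise function. This is only an illustrative sketch: the function name `percent_b` is hypothetical, and it assumes the composition changes linearly within each ramp segment, which the text implies but does not state.

```python
def percent_b(t_min: float) -> float:
    """Return the percentage of solvent B at time t_min (minutes).

    Segments follow the text: 0-5 min 15% B, 5-25 min 15-100% B,
    25-35 min 100% B, 35-40 min 100-15% B.
    Assumption: ramps are linear in time.
    """
    if not 0 <= t_min <= 40:
        raise ValueError("gradient is defined for 0-40 min only")
    if t_min <= 5:
        return 15.0
    if t_min <= 25:
        # 20-minute ramp from 15% to 100% B
        return 15.0 + (100.0 - 15.0) * (t_min - 5) / 20
    if t_min <= 35:
        return 100.0
    # 5-minute return ramp from 100% to 15% B
    return 100.0 - (100.0 - 15.0) * (t_min - 35) / 5


if __name__ == "__main__":
    for t in (0, 5, 15, 25, 30, 40):
        print(f"{t:>2} min: {percent_b(t):.1f}% B")
```

Under the linearity assumption, the midpoint of the main ramp (15 min) corresponds to 57.5% solvent B.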

Crystallization, Data Collection and Structure Determination
Robotic crystallization trials were performed for AtSDT co-crystallized with spermidine and CoA-HS and for AtSHT co-crystallized with spermidine and CoA-HS by using a Gryphon liquid handling system (Art Robbins Instruments), commercial screening solutions (Emerald Biosystems, Hampton Research, and Qiagen), and the sitting-drop vapor-diffusion technique (drop: 0.2 μl protein plus 0.2 μl screening solution; reservoir: 60 μl screening solution; 20°C). About 900 conditions were screened. Under several conditions, AtSDT and AtSHT crystals appeared within 2 weeks. Conditions were optimized using the hanging-drop vapor-diffusion technique at 20°C. The optimized crystallization condition for AtSDT was 0.1 M MES (pH 6.5), 25% (w/v) PEG 4000 at 20°C; the optimized crystallization condition for AtSHT was 0.2 M ammonium sulfate, 0.1 M Tris-HCl (pH 8.5), 25% (w/v) PEG 3350 at 20°C. Crystals were transferred to a reservoir solution containing 20% (v/v) glycerol and flash-cooled with liquid nitrogen.
The structures of AtSDT and AtSHT were solved by molecular replacement with MOLREP (Collaborative Computational Project Number 4, 1994) using the structure of native HCT from Coffea canephora (PDB 4G0B) as a starting model. Molecular replacement gave a clear solution, and automatic model building was performed with Phenix (Adams et al., 2010). Additional model building was done manually with Coot (Emsley et al., 2010), and the models were refined with Phenix. The final models of AtSDT and AtSHT were refined to 2.4 Å and 2.3 Å resolution, respectively, with Rwork/Rfree values of 0.19/0.24 and 0.19/0.25, respectively (Table 1).

Molecular Docking Studies
All molecular docking studies were performed using the AutoDock 4.2 package (Morris et al., 2009). Briefly, the crystal structure of AtSHT or AtSDT was docked with their potential products (CoA-HS for AtSHT/AtSDT when docking with their final or intermediate products). Non-polar hydrogens were added to the molecule and partial atomic charges were assigned using AutoDockTools (ADT; Morris et al., 2009). The coordinates of feruloyl-CoA and spermidine in the AtSHT structure were generated based on the coordinates of p-coumaroyl shikimate from the crystal structure of AtHCT (PDB 5KJT) and of HS-CoA from the crystal structure of AtSHT (PDB 6LPV) in combination with the CORINA Classic online service. A grid box with 40 × 40 × 40 grid points and 0.2 Å grid spacing, centered roughly at the feruloyl-CoA binding position, was used as the search space. One hundred runs of the Lamarckian Genetic Algorithm were performed to search the protein-ligand interactions. The results were clustered and ranked. Result analyses and figure rendering were performed using PyMOL.
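To give a sense of the size of this search space, the sketch below computes the box edges from the stated grid parameters. It is a plain arithmetic illustration, not a call into AutoDock: the function name `grid_box` and the example center are hypothetical, and it assumes the value 40 is the number of grid intervals per axis, so that the edge length is simply the count multiplied by the spacing.

```python
def grid_box(center, npts=(40, 40, 40), spacing=0.2):
    """Return (min_corner, max_corner) in angstroms for a box centered on `center`.

    Assumption: `npts` counts grid intervals along each axis, so each edge
    spans npts * spacing angstroms.
    """
    half = [n * spacing / 2 for n in npts]
    lo = tuple(c - h for c, h in zip(center, half))
    hi = tuple(c + h for c, h in zip(center, half))
    return lo, hi


if __name__ == "__main__":
    # Hypothetical center; the real one is the feruloyl-CoA binding position.
    lo, hi = grid_box(center=(0.0, 0.0, 0.0))
    edges = [h - l for l, h in zip(lo, hi)]
    print("box edges (angstrom):", edges)
```

Under this assumption each edge of the box comes to about 8 Å.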

Sequence Similarity Network and Phylogenetic Analysis
The sequence datasets of the BAHD family were gathered by using AtSDT, AtSHT, and AtHCT as search templates for BLAST with an e-value cut-off of 1 × 10^−25. All sequences were filtered by a redundancy check and a conserved-motif search. The final set of 12,768 non-redundant BAHD proteins was used to generate the sequence similarity network (SSN) with Pythoscape (Barber and Babbitt, 2012), which was visualized at an e-value cut-off of 1 × 10^−51.5 in Cytoscape. Two hundred and twenty-seven protein sequences from each of the clusters in the SSN, including AtSCT, AtSHT, AtHCT, and AtSDT, were used to generate the neighbor-joining tree with MEGA8 software (Tamura et al., 2013), and the final maps were drawn using the iTOL online software.
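The thresholding step that turns pairwise alignment scores into network clusters can be sketched as follows. This is a minimal illustration and not the Pythoscape implementation: the function name `ssn_clusters`, the input format, and the toy sequence labels are all hypothetical. An edge is kept when −log10(e-value) is at least 51.5, and clusters are the connected components of the resulting graph, found here with a simple union-find.

```python
import math
from collections import defaultdict


def ssn_clusters(pairs, threshold=51.5):
    """Cluster sequences by connecting pairs whose -log10(e-value) >= threshold.

    pairs: iterable of (seq_a, seq_b, e_value) from pairwise alignments.
    Returns clusters as sorted lists, largest cluster first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, e_value in pairs:
        root_a, root_b = find(a), find(b)  # registers both nodes
        score = math.inf if e_value == 0 else -math.log10(e_value)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Toy data with made-up labels and e-values.
    toy = [("A", "B", 1e-60), ("B", "C", 1e-55),
           ("C", "D", 1e-30), ("D", "E", 1e-80)]
    print(ssn_clusters(toy))  # [['A', 'B', 'C'], ['D', 'E']]
```

A stricter threshold splits the network into more, smaller clusters, while a looser one merges them, which is why the cut-off is reported together with the network.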

Data Availability
The crystal structures of AtSHT and AtSDT have been deposited into Protein Data Bank under accession numbers 6LPV and 6LPW. GenBank accession numbers: AT5G07080 for OAO95042.1 and AT3G47170 for NP_190301.2.
To elucidate the molecular basis underlying the differences in acceptor-molecule specificity and the possible multisite acylation mechanisms of AtSHT and AtSDT, we expressed the AtSHT and AtSDT enzymes and determined the crystal structure of AtSHT in complex with CoA-HS at 2.2 Å resolution, as well as the crystal structure of AtSDT in the apo form at 2.4 Å resolution.
Statistics of data collection and model refinement are summarized in Table 1. The overall structures of both AtSHT and AtSDT are similar to those of known BAHD-family acyltransferases. The structures consist of two pseudo-symmetric N-terminal and C-terminal domains that are connected by a long loop. The N-terminal (residues 1-173 and 391-411) and C-terminal (residues 230-390 and 412-451) domains of AtSHT feature a similar spatial arrangement, with a β-sheet core flanked by α-helices (Figure 1B). The active site is located at the interface of the two domains, with residue His155 lying between them. The extra density near His155 was assigned as CoA-HS (Figure 1B). The 3'-phosphoadenosine group of CoA-HS binds residues Arg246, Arg263, Ser387, and Thr390 in the structure, while residues Glu265, Thr262, and Arg298 interact with the diphosphate group of CoA-HS (Supplementary Figure 5); these residues are conserved in both AtSDT and AtSHT (Supplementary Figure 4). To view the relative positions of the acyl acceptor and donor, we modeled the potential acyl donor feruloyl-CoA and the acyl acceptor spermidine into the AtSHT structure by using the AtHCT/p-coumaroyl-CoA (PDB ID 5KJT) and AtHCT/p-coumaroylshikimate (PDB ID 5KJU) structures as references (Figure 1C).
In the modeled structure, several hydrophobic residues were found to surround the feruloyl head, which could improve the fit and affinity for the phenolic acid group and is consistent with a previous study (Levsh et al., 2016; Supplementary Figure 6). Interestingly, we observed a possible electron density at the modeled position of spermidine in the AtSHT map. The residues around this position form a C-shaped channel, which is highly negatively charged (Figures 2A,B). We further docked spermidine and N1,N5,N10-trihydroxyferuloyl spermidine as ligands into this position, and the results show that both fit well in the channel (Supplementary Figure 7). Based on our docking results, spermidine seems to be able to adopt three different reasonable conformations that each fit the electron density, and its interactions with the surrounding protein residues give it a head-to-tail "C" shape (Figures 2A,B). In all three possible conformations, spermidine can form hydrogen bonds with Asp314, Thr33, and the conserved catalytic residue His155. In addition, residues Asp416 and His411, as well as a water molecule, form a hydrogen-bonding network that stabilizes spermidine, while residues Cys292, Gly290, Thr312, Val386, and Ile37 further stabilize spermidine via van der Waals interactions (Figures 2A,B). Together, these interacting residues form a possible spermidine binding pocket in AtSHT.
To verify this potential spermidine binding pocket in the AtSHT structure, we mutated the relevant residues within the pocket to assess the effects on enzyme activity and reaction products. The experimental results showed that wild-type AtSHT can efficiently convert spermidine and feruloyl-CoA to monoferuloyl spermidine (46%), diferuloyl spermidine (42%), and triferuloyl spermidine (12%; Figure 2C; Supplementary Figure 2A). The AtSHT mutation D314A abolishes the hydrogen-bonding interaction with the amino group of spermidine, and the mutation C292W disrupts the binding of spermidine, thereby resulting in substantially decreased AtSHT enzymatic activity (Figure 2C). In addition, the T33A mutation destabilizes the binding of spermidine and impairs the enzyme activity (the fully substituted spermidine product was reduced from 12 to 3%), while D416A had less effect on the enzyme, as this residue interacts with spermidine via a water molecule. These results support the potential interactions within the proposed spermidine binding pocket of the AtSHT crystal structure (Figure 2).

Structural and Biochemical Characterization of AtSDT Enzyme
AtSDT adopts an overall structure that is similar to AtSHT (RMSD = 1.68 Å) and likewise contains two pseudo-symmetric N-terminal (residues 1-187 and 384-407) and C-terminal (residues 233-383 and 408-451) domains that are connected by a long loop (residues 188-232; Figure 3A). No extra electron density for the acyl donor or the reaction product CoA-HS was seen in the AtSDT map (in contrast with AtSHT), which may be due to crystal packing: a crystallographic symmetry-related AtSDT molecule blocks the entry tunnel, thereby preventing entry of the acyl donor and putative reaction product. Unexpectedly, after superimposing AtSDT on the AtSHT structure, we observed a possible electron density for spermidine at the corresponding modeled acyl-acceptor position in the AtSDT map. The residues around this position form a linear channel, which is highly negatively charged (Figures 3A,B). We further docked spermidine, N1,N10-dihydroxysinapoyl spermidine, and N1,N5,N10-trihydroxysinapoyl spermidine as ligands into this position, and the results show that spermidine and N1,N10-dihydroxysinapoyl spermidine fit well in the channel, whereas N1,N5,N10-trihydroxysinapoyl spermidine clashes with it (Supplementary Figures 7B,C). According to the docking results, spermidine in AtSDT seems to be able to adopt a reasonable linear conformation that fits the electron density and forms a hydrogen bond with the τ-nitrogen of the conserved catalytic residue His169. This spatial arrangement is consistent with the role of His169 as a general base that deprotonates the acyl-acceptor spermidine, priming it for nucleophilic attack on the carbonyl carbon of the hydroxycinnamoyl-CoA acyl donor. The specificity-determining residues Asp316 and Ser294 form hydrogen bonds with the N1 and N5 groups of spermidine. In addition, Tyr47, Trp381, and Thr379 form hydrogen bonds through a water molecule with the spermidine N10 group. These interactions stabilize the binding of spermidine.
Furthermore, the C-terminal domain residues Tyr314, Tyr318, Cys377, Glu354, Thr358, and Gly292 and the N-terminal domain residues Asp40 and Asn43 contact spermidine via van der Waals interactions (Figures 3A,B). Together, these surrounding residues form a possible spermidine binding pocket in AtSDT, in which spermidine may adopt a conformation entirely different from that in AtSHT (Figures 3A,B).
To confirm this potential spermidine binding pocket in the AtSDT structure, we mutated the relevant residues within the pocket to assess the effects on enzyme activity and reaction products. The results showed that wild-type AtSDT can efficiently couple spermidine with sinapoyl-CoA to yield monosinapoyl spermidine (2%) and disinapoyl spermidine (98%; Figure 3C; Supplementary Figure 3). Eight single or double AtSDT mutants that potentially affect the binding of spermidine were then generated. The D316A mutation abolishes the hydrogen-bonding interaction with the N5 group of spermidine, and the S294W mutation not only disrupts the potential hydrogen-bonding interaction with the N1 group but also prevents spermidine binding; both substantially decrease AtSDT enzymatic activity. The S294I mutation impairs the enzyme activity (compared with wild-type AtSDT, the production of disinapoyl spermidine was reduced from 98 to 63%), suggesting that disruption of the hydrogen bond destabilizes spermidine binding. The Y47A and W381A mutations have a greater effect on the enzyme activity than the S294I mutation (disinapoyl spermidine was reduced from 98 to 28% and 49%, respectively); these two residues form hydrogen bonds with spermidine via a water molecule, so the hydrogen bond between S294 and spermidine appears to contribute less to the enzyme activity than those involving Y47 or W381. The N43A and Y314A mutations disrupt the van der Waals interactions with spermidine and slightly impair enzyme activity (Figure 3C). Furthermore, the double mutant E354A/T358A, which changes residues located at the predicted spermidine entry channel in the AtSDT structure (Supplementary Figure 8A), showed decreased activity compared with wild-type AtSDT (Figure 3C).
Interestingly, these two residues are also close to a loop (Met364-Leu375 in AtSDT) that is less ordered than other parts of the structure and shows a large displacement when aligned with the AtSHT and AtHCT structures (Supplementary Figures 8B,C). As the loop is located near the active center, its movement might open or close the active center to the outside environment, which may help maintain the catalytic environment in the closed state and release the products in the open state. Since its proposed function resembles that of a lid, we named this loop the "lid-loop." These results are consistent with the interactions revealed by the modeled structures and suggest that key interactions in the proposed spermidine-binding pockets of AtSDT and AtSHT are critical for acyl-acceptor substrate recognition and binding.

Comparison on Spermidine Binding Pockets of AtSDT and AtSHT
Guided by the structural information on AtSDT and AtSHT, we performed structure-based sequence alignments of AtSHT, AtSDT, and other representative BAHD-family homologues (Supplementary Figure 4) and found that residues in the proposed acyl-acceptor spermidine-binding pocket of AtSDT are conserved among SDT homologues but vary in SHT homologues (Supplementary Figure 4, shown as blue stars). Conversely, the residues of the AtSHT acyl-acceptor spermidine-binding pocket are conserved among SHT homologues but vary in SDT homologues (Supplementary Figure 4, shown as green circles). These findings further suggest that AtSHT and AtSDT may have adopted different evolutionary strategies for binding the acyl acceptor spermidine.
In addition, we compared the proposed acyl-acceptor binding pockets of AtSHT and AtSDT with that of AtHCT. Although the overall structures of AtSHT and AtSDT are very similar to that of AtHCT (the RMSDs of AtHCT with AtSHT and AtSDT are 1.70 Å and 2.03 Å, respectively; Figure 4A), the interior features and the electrostatic potential of the acyl-acceptor binding pockets of AtSHT and AtSDT differ significantly from those of AtHCT. The former are composed of negatively charged residues that accommodate spermidine, while the latter is mainly composed of positively charged residues suitable for attracting and binding shikimate (Figure 4B). This suggests that it may be possible to predict and evaluate the acyl-acceptor substrate preference of unknown BAHD-family proteins based on the charge distribution of the residues comprising the acyl-acceptor binding pocket.
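As a rough illustration of the charge-based reasoning, the sketch below tallies the formal side-chain charges of the pocket residues named in the text for AtSHT and AtSDT. This is a deliberately crude count and not the electrostatic potential calculation behind Figure 4B: the function name `pocket_formal_charge` is hypothetical, Asp/Glu are counted as −1, Lys/Arg as +1, and His is assumed neutral.

```python
NEGATIVE = {"Asp", "Glu"}
POSITIVE = {"Lys", "Arg"}


def pocket_formal_charge(residues):
    """Sum formal side-chain charges for residues given as e.g. 'Asp314'.

    Assumption: Asp/Glu = -1, Lys/Arg = +1, everything else (including His) = 0.
    """
    charge = 0
    for residue in residues:
        name = residue[:3]  # three-letter residue code
        if name in NEGATIVE:
            charge -= 1
        elif name in POSITIVE:
            charge += 1
    return charge


# Pocket residues as listed in the text for each enzyme.
ATSHT_POCKET = ["Asp314", "Thr33", "His155", "Asp416", "His411",
                "Cys292", "Gly290", "Thr312", "Val386", "Ile37"]
ATSDT_POCKET = ["Asp316", "Ser294", "Tyr47", "Trp381", "Thr379", "Tyr314",
                "Tyr318", "Cys377", "Glu354", "Thr358", "Gly292", "Asp40", "Asn43"]

if __name__ == "__main__":
    print("AtSHT pocket:", pocket_formal_charge(ATSHT_POCKET))  # -2
    print("AtSDT pocket:", pocket_formal_charge(ATSDT_POCKET))  # -3
```

Both pockets come out net negative under this count, in line with their proposed preference for a positively charged polyamine acceptor; a real assessment would need a proper electrostatics calculation on the structure.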

Prediction of Unknown BAHD Family Proteins
With the structural information (especially the distribution of acyl-acceptor-interacting residues) from AtSDT and AtSHT, we expanded our analysis to other unknown BAHD-family proteins. A total of 12,768 non-redundant protein sequences were identified from the National Center for Biotechnology Information (NCBI) protein database, including 49 A. thaliana BAHD transferase proteins (Dataset 1). Since an SSN not only displays the same topology as a phylogenetic tree (Davidson et al., 2018; Burroughs et al., 2019) but also has the advantages of handling large sequence datasets and giving a better global overview, we applied SSN analysis to the BAHD-family sequences. The SSN map was generated using Pythoscape (Barber and Babbitt, 2012), where nodes represent sequences and edges represent pairwise local alignments with an e-value cut-off of 1 × 10^−51.5 (Supplementary Figure 9). The 49 A. thaliana BAHD transferase proteins were assigned to the SSN map, where 18 functionally characterized sequences (Supplementary Table S1) were marked as purple triangles and 31 functionally unknown sequences were marked as blue circles (Supplementary Figure 9). The AtSDT, AtSCT, AtSHT, and AtHCT proteins were also highlighted in this map. Interestingly, AtSDT and AtSHT were present in two different clusters in the SSN map. The AtSDT cluster contains 62 nodes comprising 437 sequences and also includes AtSCT, the spermidine dicoumaroyl transferase (sequence identity between AtSDT and AtSCT is 52.5%). AtSHT belongs to a more complicated cluster that contains several sub-clusters. The AtSHT sub-cluster contains 96 nodes comprising 592 sequences, and MdSHT and CiSHT are in the same node as AtSHT. The sub-cluster containing hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase (HCT) is connected to the AtSHT sub-cluster, indicating a close relationship with AtSHT.
Indeed, the sequence of AtHCT in this sub-cluster shows higher sequence identity with AtSHT (36.4%) than with AtSDT (22.2%) or AtSCT (20.5%). To our surprise, two uncharacterized A. thaliana BAHD proteins (OAO95042.1 and NP_190301.2) fall in the same cluster as AtSDT and AtSCT. We then evaluated their acyl-acceptor substrate preference on the basis of the charge distribution of the residues comprising the acyl-acceptor binding pocket. The structures of OAO95042.1 and NP_190301.2 were modeled using the AtSDT structure as a template with the online SWISS-MODEL server (Arnold et al., 2006), and negatively charged acyl-acceptor binding pockets were found (Supplementary Figure 11), suggesting a basic substrate with characteristics similar to polyamines. For verification, the corresponding genes, AT5G07080 and AT3G47170 for OAO95042.1 and NP_190301.2, respectively, were cloned from A. thaliana, and recombinant proteins were expressed in E. coli. The potential activity was tested using caffeoyl-, feruloyl-, or sinapoyl-CoA as acyl donors and the positively charged molecules putrescine, spermidine, or spermine as acyl acceptors. The HPLC-MS results showed that the enzymes encoded by both AT5G07080 and AT3G47170 could efficiently convert putrescine and caffeoyl-CoA to di-caffeoyl putrescine (Figures 5, 6 and Supplementary Figure 12). Further, the OAO95042.1 enzyme can also convert spermidine or spermine and feruloyl-CoA to mono-feruloyl spermidine or spermine (Figure 6 and Supplementary Figure 12). The results also suggested that the enzyme encoded by AT5G07080 has a preference for feruloyl-CoA but little acyl-acceptor specificity, while the enzyme encoded by AT3G47170 has a preference for caffeoyl-CoA and putrescine.
The successful prediction of the substrate specificity of the BAHD enzymes encoded by AT5G07080 and AT3G47170 demonstrates the feasibility of predicting and identifying possible substrates for uncharacterized BAHD-family transferases on the basis of the charge distribution of the residues comprising the acyl-acceptor binding pocket. This approach could help us better understand functionally uncharacterized BAHD-family proteins, and a similar substrate-prediction strategy may be useful for other uncharacterized plant metabolic enzymes.

DISCUSSION
Phenolamides are ubiquitous secondary metabolites in plants. They are produced by BAHD-family acyltransferases that catalyze the mono-conjugation of aromatic monoamines or the poly-conjugation of aliphatic polyamines with phenolic acids. A growing body of research has highlighted the importance of phenolamides in diverse plant physiological processes, including defense responses and development. However, BAHD-family acyltransferases usually share only low sequence similarity, which makes it extremely challenging to predict the substrate specificity of uncharacterized BAHD-family enzymes from their amino acid sequences. To address this question, our work provides a feasible methodology for predicting the substrate specificity of unknown BAHD-family transferases in plants. Phylogenetic trees have long been used for functional analysis of unknown genes; however, because the number of genes that can be included is limited, the results are less reliable and less reproducible (Shen et al., 2020). By using the SSN, we could use all the information from the BAHD-family sequences and obtain a global view of the classification of each gene, which should greatly accelerate the discovery of target genes. By combining the SSN analysis with structural information, we narrowed down the potential targets and then validated them by in vitro experiments, which are more costly. On this basis, we successfully predicted the possible substrates of two uncharacterized BAHD-family enzymes, OAO95042.1 and NP_190301.2, which turned out to be putrescine hydroxycinnamoyl transferases.
Our study also highlights a potential convergent evolutionary route for the AtSDT and AtSHT genes. Although both AtSDT and AtSHT use spermidine as an acyl acceptor, the expression patterns and distributions of these two enzymes differ (Grienenberger et al., 2009; Luo et al., 2009). AtSHT is mainly expressed in the tapetum of Arabidopsis anthers and synthesizes fully substituted products. Other enzymes that synthesize fully substituted products have a similar expression pattern, such as CiSHT, which promotes tetrahydroxycinnamoyl spermine accumulation in the pollen coat of the Asteraceae family, and MdSHT, which synthesizes trihydroxycinnamoyl spermidines in the pollen coat of core Eudicotyledons. Disrupting the function of AtSHT leads to abnormal pollen grain formation in the sht mutant of Arabidopsis, pointing to a probable function of trihydroxycinnamoyl spermidine derivatives in sporopollenin ultrastructure: the fully substituted products may provide a barrier for pollen or may act as a supporting structure. Interestingly, according to our SSN map and phylogenetic tree, all of these SHT genes are close to HCTs, which are key enzymes in lignin metabolism, suggesting that HCTs and SHTs may have evolved from a common ancestor. On the other hand, AtSDT is mainly expressed in the seed and the root of Arabidopsis and synthesizes mono- or di-substituted phenolamides. Unlike the SHTs, the enzymes that synthesize mono- or di-substituted phenolamides seem to be widely distributed in different organs and to function in plant biotic or abiotic stress responses. In our SSN map and phylogenetic tree, AtSDT is far from HCT but close to our newly discovered putrescine transferases OAO95042.1 and NP_190301.2.
Furthermore, according to our structures, residues in the proposed acyl-acceptor spermidine-binding pocket of AtSDT are conserved among SDT homologues but vary in SHT homologues (Supplementary Figure 4, shown as blue stars). Conversely, the residues of the AtSHT acyl-acceptor spermidine-binding pocket are conserved among SHT homologues but vary in SDT homologues (Supplementary Figure 4, shown as green circles). Taken together, we suggest that AtSHT and AtSDT may have undergone convergent evolution and thereby gained similar spermidine transferase activity.
The molecular mechanisms of multisite acylation by BAHD-family acyltransferases remain poorly understood. In this study, we addressed this question by determining the crystal structures of AtSHT and AtSDT, two BAHD-family members from Arabidopsis thaliana that catalyze the multisite acylation of spermidine but show different product profiles. We closely compared the differences in their potential spermidine binding pockets. The shape of the possible electron density for spermidine in the AtSHT structure, together with our molecular docking results, suggests that spermidine may adopt a freely rotating conformation in the center of the binding pocket (Supplementary Figure 13A) by interacting with residues Thr33, Asp314, Asp416, and His155, establishing equal probabilities for acylation of the N1, N5, and N10 atoms of spermidine. In contrast, the shape of the possible electron density for spermidine in the AtSDT structure, in combination with the molecular docking results, suggests a linear conformation at the center of the binding pocket (Supplementary Figure 13B). Therefore, spermidine in AtSDT could only be docked into the binding pocket in two different orientations ("N1 to N10" or "N10 to N1"). In view of these differences in spermidine conformation between AtSHT and AtSDT, we propose a "linear/rotation" model, which may help explain the different acylation activities of AtSHT and AtSDT (Supplementary Figure 14). The full acylation activity of AtSHT is enabled by the "freely rotating" conformation adopted by the acyl-acceptor spermidine in the binding pocket, while AtSDT binds spermidine only in a linear conformation that is limited to "head-tail" acylation (Supplementary Figure 14). That is, the acyl-acceptor spermidine adopts a freely rotating conformation in AtSHT and can undergo mono-, di-, or tri-acylation, while the spermidine molecule in AtSDT adopts a linear conformation, which only allows mono- or di-acylation to take place.
Our biochemical results support this proposal: changing the spermidine binding pattern decreases or abolishes the production of fully acylated products, which matches our proposed "linear/rotation" model (Supplementary Figure 14). Meanwhile, by superimposing the AtSDT and AtSHT structures on AtHCT, we observed a potential movement of the "lid-loop," which is located near the active center and may function in maintaining the catalytic environment and releasing products (Supplementary Figure 8).
In summary, our structural and biochemical analyses of AtSDT and AtSHT provide a starting point for predicting the biochemical functions of uncharacterized BAHD-family enzymes and for understanding multisite acylation in BAHD-family enzymes. However, to further elucidate the molecular mechanism underlying the differing acylation activities of AtSHT and AtSDT, crystal structures of AtSDT and AtSHT in complex with their acyl donors and acceptors are still needed.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: http://www.wwpdb.org/, 6LPV and 6LPW.

AUTHOR CONTRIBUTIONS
CW, JL, and WL designed experiments. CW and JL performed the bulk of the experiments. MM contributed to protein expression, purification, and crystallization. ZL and WH contributed to enzymatic assay experiments. PZ, CW, and WL analyzed the data and wrote the manuscript. PZ conceived