Bacterial Glycosyltransferases: Challenges and Opportunities of a Highly Diverse Enzyme Class Toward Tailoring Natural Products

The enzyme subclass of glycosyltransferases (GTs; EC 2.4) currently comprises 97 families as specified by CAZy classification. One of their important roles is in the biosynthesis of disaccharides, oligosaccharides, and polysaccharides by catalyzing the transfer of sugar moieties from activated donor molecules to other sugar molecules. In addition, GTs catalyze the transfer of sugar moieties onto aglycons, which is of great relevance for the synthesis of many high-value natural products. Bacterial GTs show a higher sequence similarity in comparison to mammalian ones. Even though most GTs are poorly explored, state-of-the-art technologies such as protein engineering, domain swapping, or computational analysis strongly enhance our understanding and utilization of this very promising class of proteins. This perspective article focuses on bacterial GTs, especially on classification, screening, and engineering strategies to alter substrate specificity. Future developments in these fields, as well as obstacles and challenges, are highlighted and discussed.


INTRODUCTION
Glycosyltransferases (GTs) represent a subclass of enzymes that catalyze the synthesis of glycosidic linkages by the transfer of a sugar residue from a donor substrate to an acceptor. Acceptor substrates are mono-, di-, or oligo-carbohydrates, as well as proteins, lipids, DNA, and numerous other small molecules (Lairson et al., 2008). GTs therefore play essential roles in the biosynthesis pathways of oligo- and polysaccharides, as well as in protein glycosylation and the formation of valuable natural products (Lairson et al., 2008). Amongst the donor substrates, nucleotide-sugar conjugates are the most prominent (∼65%), but lipid phosphate sugars and phosphate sugars are also used (Ardèvol and Rovira, 2015). The regio- and stereo-specific transfer of the sugar can occur via the inverting or the retaining mechanism, which also defines the stereochemical outcome (α- or β-glucosides). The inverting mechanism is a single displacement in which the acceptor performs a nucleophilic attack on the C-1 of the sugar donor, inverting the anomeric stereochemistry. This mechanism is widely accepted and chemically elucidated (Schuman et al., 2013).
For retaining GTs different mechanisms have been proposed, and the exact mechanism is still a matter of debate (Schuman et al., 2013). Recent findings based on quantum mechanics/molecular mechanics (QM/MM) dynamics simulations indicate that two different enzyme families might have evolved, which follow either a double displacement mechanism (two SN2 reactions) or a front-face mechanism. One factor determining which mechanism is used is the presence or absence of a putative nucleophilic residue near the anomeric carbon of the donor sugar. Furthermore, a competition between the front-face and the double displacement mechanism was calculated by QM/MM for nucleophile-containing GTs (Rojas-Cervellera et al., 2013). In the front-face mechanism, the departure of the leaving group and the nucleophilic attack occur asynchronously at the same face of the glycoside (Ardèvol and Rovira, 2015).
Several approaches are used for the classification of GTs. The most prominent one is classification by amino acid sequence similarity, as done by the Carbohydrate-Active enZYmes Database (CAZy). The CAZy database groups the different GTs into families. It comprises 97 families based on ∼215,930 entries (as of 27 November 2015). An additional ∼4,015 sequences are not classified to date. Families are named by the prefix GT followed by the number of the family. Next to the EC 2.4 families, the CAZy classification also includes six families belonging to EC 3.X and EC 5.X, with around 395 entries. In 2012 the CAZy classification contained only ∼87,000 entries divided into 90 families, which shows the fast development in the field of sequence identification of GTs (Gloster, 2014). In November 2015, the three families 36, 46, and 86 were still listed but did not contain any sequences, either because no characterized members exist (GT-46) or because they had been deleted and merged with other GT families (GT-36 and GT-86) based on newer findings. Of the ∼215,930 listed sequences, less than 1% (1,919) have been characterized. Structures are available for 161 of these, distributed over 41 families, which include solely three crystal structures, two of them for bacterial GTs. The statistical insights of the several GT families as classified by CAZy are displayed in Table 1.
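The proportions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts given in the text; the variable and function names are illustrative, and relating the 161 structures to the characterized entries follows the wording of the paragraph rather than an independent CAZy query.

```python
# Counts quoted in the text (CAZy, 27 November 2015); names are illustrative.
total_sequences = 215_930      # classified GT entries
characterized = 1_919          # biochemically characterized entries
with_structure = 161           # characterized entries with a structure
entries_2012 = 87_000          # approximate number of entries in 2012


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


characterized_pct = percent(characterized, total_sequences)
structure_pct = percent(with_structure, characterized)
growth_factor = total_sequences / entries_2012

print(f"characterized: {characterized_pct:.2f}% of all listed sequences")
print(f"with structure: {structure_pct:.1f}% of characterized entries")
print(f"growth 2012 to 2015: {growth_factor:.2f}-fold")
```

The first figure confirms the "less than 1%" statement, and the growth factor makes the pace of sequence deposition between 2012 and 2015 explicit.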
Classification toward substrate specificity with the CAZy database is supported by the conserved three-dimensional architecture of all nucleotide-sugar-dependent GTs. All structures of GTs solved to date adopt one of three folds, termed GT-A, GT-B, and GT-C (Gloster, 2014). The GT-A enzymes are generally dependent on divalent metal ions and comprise two β/α/β Rossmann-like domains. A highly conserved DXD motif within the active site coordinates the metal ion, which stabilizes the charged phosphate groups of the nucleotide sugar donor and thereby supports departure of the leaving group. The three-dimensional architecture of GT-B enzymes comprises two β/α/β Rossmann-like domains that face each other. These enzymes are generally independent of metal ions, and the active site is located between the two Rossmann-like domains. The residues of the active site are involved in leaving group departure. The third fold, GT-C, was identified in 2003; these enzymes are hydrophobic integral membrane proteins that have a modified DXD signature in the first extracellular loop and mainly use lipid-phosphate-linked sugar donors (Liu and Mushegian, 2003; Gloster, 2014). The limited number of structural folds in GTs might result from an evolutionary origin in only a few precursor sequences. Only GTs belonging to GT2 and GT4 are found in ancient Archaea, and it is assumed that the other families may have evolved from these two families (Coutinho et al., 2003). The next step toward improved functional prediction and substrate specificity is the classification into clans, which is performed by grouping families that display a similar fold, an analogous catalytic apparatus, and an identical mechanism (Coutinho et al., 2003; Osmani et al., 2009). These approaches will end up in so-called subfamilies that further improve functional prediction, but they are also guided by the needs of the researcher.
Mammalian and bacterial GTs show only very low identity at the amino acid sequence level, even if they synthesize the same glycan linkages (Brockhausen, 2014). Plant and bacterial GTs also have low identity at the nucleotide and amino acid level, as do bacterial GTs amongst themselves, but similarities might be identified at the structural level (Gloster, 2014). Next to their role in protein and natural product glycosylation or oligo- and polysaccharide biosynthesis, GTs can be used for tailored chemo-enzymatic synthesis of novel, modified natural products (Luzhetskyy and Bechthold, 2008). Additionally, they play essential roles in fundamental biological processes and might be exploited for novel medical applications (Wu et al., 2012; Zhan et al., 2015). This emerging field may develop rapidly, especially through in-depth characterization of GTs and alteration of their sugar functionality.

TABLE 1 | Rows as recovered from the source; numbers in parentheses represent bacterial GTs.
GT families 97, 94, 92, 90, 76, 75, 74, 73, 67, 61, 56, 54, 53, 51, 49, 48, 47, 42, 40, 38, 37, 31, 29, 26, 25, 18, 17, 16, 14, 11: 28,445 (21,184); 356 (82); 16 (10).
Retaining, GT families 96, 95, 93, 89, 88, 79, 77, 71, 69, 62, 60, 45, 44, 34, 32: 5,903 (1,798); 81 (25); 7 (6).

SCREENING FOR GTs ACTIVITY
Screening of GT activity is essential to characterize the GTs identified in sequenced genomes and to prove the predicted activity and specificity. In recent years several assays for GTs have been published; they were reviewed by Palcic and Sujino (2001) and, with a focus on fluorescence-based methods, by Wagner and Pesnot (2010). In general, the depletion of the acceptor or the nucleotide donor, or the formation of the nucleotide or the glycosylated product, can be monitored. Because GT activity does not change the fluorescence or UV/visible spectra, other methods to monitor substrate conversion are needed. Additionally, the assays have to be highly sensitive because of the low concentration of GTs in natural sources. Radiochemical assays are therefore a good choice for quantitative monitoring of GT activity, using commercially available radiolabeled donors followed by separation of the unreacted donor from the labeled reaction products. Depending on the acceptor (saccharide, protein, etc.), different separation techniques (electrophoresis, ion exchange chromatography, thin layer chromatography, organic solvent extraction) and analysis techniques (GC-MS) can be applied (Palcic and Sujino, 2001). In recent years, several scintillation proximity assays have been developed to realize high throughput screening (HTS) without tedious washing steps. These assays work by immobilizing the acceptor of the glycosylation reaction on scintillant-coated microspheres, which emit a light signal when the radiolabeled sugar comes into close proximity during the reaction (Hood et al., 1998; Miyashiro et al., 2005; Ahsen et al., 2008). Non-radioactive, highly sensitive assays are realized by immunological methods such as antibodies or lectins, which directly identify the reaction products (Cummings and Etzler, 2009). Mostly glycolipid acceptors are used, which have to be removed after the reaction.
In recent years several HT methods have been developed that adsorb the acceptor to the surface of micro titer plate (MTP) wells, turning the acceptor into an immobilized product that can be specifically stained after washing steps (Lira-Navarrete et al., 2011). Spectrophotometric assays can be realized by coupling different enzymes. For example, the NDP released during glycosyl transfer can react with phosphoenolpyruvate (pyruvate kinase), which releases pyruvate; pyruvate is then detected by following the decrease in absorbance at 340 nm as NADH is oxidized (lactate dehydrogenase). Several adapted versions have been developed, which can be applied in high throughput and small volume, also for membrane-bound GTs (Brown et al., 2012). A pragmatic approach is the use of pH-sensitive assays. The hydrolysis of the sugar-nucleotide donor substrates into the corresponding nucleoside diphosphate results in the release of proton equivalents for all GT-catalyzed glycosylation reactions. These protons cause a color change of pH indicators, as shown for the first time with phenol red (Deng and Chen, 2004) or bromothymol blue (Persson and Palcic, 2008). Another assay variant, based on malachite green, monitors the free phosphate (Pi) released from the leaving nucleotides by phosphatases (Wu et al., 2011). The highly specific phosphatases do not act on sugar-nucleotides as substrates. Therefore, the concentration of released Pi is directly proportional to the number of sugar molecules transferred, which additionally enables measurement of kinetic parameters. These kinds of colorimetric assays can be easily applied in HT studies (Shen et al., 2010). Fluorescence-based methods combine high sensitivity with operational simplicity and suitability for HTS (Gribbon and Sewing, 2003). Chemosensors with high binding selectivity toward pyrophosphate monoesters were successfully used to read out GT activity as well as for inhibitor screening (Wongkongkatep et al., 2006).
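As an illustration of how a coupled pyruvate kinase/lactate dehydrogenase readout could be converted into a glycosyl transfer rate, the following minimal sketch fits the slope of the absorbance at 340 nm and applies the Beer-Lambert law. It assumes a 1:1 stoichiometry between released NDP and oxidized NADH, the commonly used NADH extinction coefficient of 6,220 M^-1 cm^-1 at 340 nm, and a 1 cm path length; the absorbance readings are invented for illustration, and blank subtraction is omitted.

```python
# Commonly used molar extinction coefficient of NADH at 340 nm (assumption).
NADH_EXTINCTION_340 = 6220.0  # M^-1 cm^-1


def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def transfer_rate_um_per_min(times_min, a340, path_cm=1.0):
    """Glycosyl transfer rate in micromolar per minute.

    Assumes one NADH is oxidized per NDP released, so the rate of
    NADH consumption equals the rate of sugar transfer.
    """
    d_a_dt = slope(times_min, a340)           # absorbance units per minute
    molar_rate = -d_a_dt / (NADH_EXTINCTION_340 * path_cm)
    return molar_rate * 1e6


# Hypothetical readings: absorbance falls by 0.0622 per minute.
times = [0, 1, 2, 3, 4]
a340 = [1.0000, 0.9378, 0.8756, 0.8134, 0.7512]

rate = transfer_rate_um_per_min(times, a340)
print(f"transfer rate: {rate:.2f} uM/min")
```

In a real measurement a no-enzyme or no-acceptor control would be subtracted first, since background hydrolysis of the donor also releases NDP.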
Assays using fluorescent probes selective for UTP/UDP, GDP, and CMP were developed (Chen et al., 2009), and commercial nucleotide immunodetection systems (Transcreener™) are available (Lowery and Kleman-Leyer, 2006). Additionally, more and more coupled assay variants have emerged in recent years (Kumagai et al., 2014). Mass spectrometry (MS) has gained high impact in GT screening and characterization (Norris et al., 2001; Yang et al., 2005; Ban et al., 2012; Lauber et al., 2015). A highly sophisticated HTS approach exists that is based on immobilized oligosaccharide acceptors placed on gold-coated islands in the geometry of a 384-well MTP (Ban et al., 2012). Mixing and incubating unpurified, in vitro expressed proteins with different sugar donors on the immobilized acceptor molecules, followed by MS analysis, enables characterization of GT specificity and yields additional kinetic information. This example impressively shows the power of MS-based methods to screen and characterize GTs. One of the main obstacles for GT screening and characterization approaches is the membrane localization of GTs (associated or integral). Recent findings in the field of synthetic membranes, such as nanodiscs, might massively enhance the functional expression and in vitro screening options for GTs (Inagaki et al., 2013). Additionally, optimized expression and biotransformation systems such as Pichia pastoris will massively enhance screening and characterization efficiency (Ahmad et al., 2014; Ge et al., 2014).

ENGINEERING OF GTs
The acceptor and donor specificities of GTs reside in different, well-separated domains. Engineering of GT sequences is a powerful tool for altering the acceptor and donor specificity by targeting the corresponding domains. Enhanced insights into the glycosylation mechanism were obtained from structural information for, e.g., plant GTs (Wang, 2009). For the GT-A fold enzymes the N- and C-terminal domains show dissimilar architecture. The N-terminal domain consists of several β-sheets, each flanked by α-helices in a Rossmann fold, and is responsible for recognition of the sugar-nucleotide donor (Davies et al., 2005; Jank et al., 2007; Mittler et al., 2007; Erb et al., 2009). The C-terminal domain mainly contains mixed β-sheets and is responsible for binding the acceptor molecule. In the case of GT-B fold GTs, the N- and C-terminal domains are formed by two similar Rossmann folds and have reversed functions. The N-terminal region includes the acceptor binding site and the C-terminal domain binds the donor substrate (Erb et al., 2009). The C-terminal domains show higher similarity since they recognize the same or similar donors, whereas the N-terminal domains show lower similarity due to the greater variety of acceptor molecules (Wang, 2009). The C- and N-terminal domains are connected via a linker and form a cleft, which acts as the substrate binding site in the case of GT-B fold UDP-GTs. Several examples describe minor mutations that significantly alter the acceptor or donor acceptance of selected GTs (Hoffmeister et al., 2001, 2002; Williams et al., 2007, 2008b). Williams and Thorson (2008) developed a fluorescence-based HTS in conjunction with error-prone PCR/saturation mutagenesis to modify the proficiency and promiscuity of GTs, resulting in 200-300-fold improved enzyme activity.
The massively altered substrate specificity (13 different UDP-sugars) was reached by mutating three amino acid residues, which were identified as "hot-spots" for directed evolution. The exchange of acceptor- and donor-recognizing domains was described to be partly successful. In some cases swapping of larger sequence elements successfully altered the substrate specificity, whereas exchange of the whole C- or N-terminal region might also lead to inactive versions (Kohara et al., 2007), indicating that acceptor recognition is not strictly encoded in the single domains. Sequential domain swapping approaches can be successfully applied to identify the amino acids (even single ones are described) that are responsible for altered regioselectivity of glycosylation (Cartwright et al., 2008). For bacterial GTs several swapping experiments are described that lead to chimeric GT variants with exchanged specificity (Fischbach et al., 2007; Hansen et al., 2009; Krauth et al., 2009; Park et al., 2009). These experiments were mainly performed with highly homologous GTs, and the true modular nature of, e.g., GT-B enzymes has yet to be proved (Williams et al., 2008a). Especially for bacterial GTs involved in polysaccharide production, only limited information concerning substrate specificity at the structural level is available (Naegeli et al., 2014). However, recent work has shown enhanced possibilities to further predict substrate and product specificity (Sánchez-Rodríguez et al., 2014).
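The two engineering strategies discussed in this section, domain swapping and saturation of a few hot-spot positions, can be sketched at the sequence level as follows. This is an illustrative sketch only: the toy sequences, the split points, and the hot-spot positions are hypothetical and do not correspond to any real GT, and the function names are not taken from any published tool.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids


def make_chimera(n_parent, c_parent, n_end, c_start):
    """Join the N-terminal part of one parent to the C-terminal part of another.

    For a GT-B fold enzyme this would combine the acceptor-binding domain
    of n_parent with the donor-binding domain of c_parent.
    """
    return n_parent[:n_end] + c_parent[c_start:]


def combinatorial_variants(sequence, positions):
    """Yield every sequence with all 20 residues at each 0-based position."""
    for combo in product(AMINO_ACIDS, repeat=len(positions)):
        residues = list(sequence)
        for pos, aa in zip(positions, combo):
            residues[pos] = aa
        yield "".join(residues)


# Toy parents (hypothetical); "GGS" stands in for the interdomain linker.
gt_x = "MKTAAAAGGSLLLL"
gt_y = "MRSCCCCGGSWWWW"

chimera = make_chimera(gt_x, gt_y, n_end=7, c_start=7)

# Full saturation of three hypothetical hot-spot positions: 20**3 variants.
library = set(combinatorial_variants(chimera, [3, 4, 5]))

print(chimera)
print(len(library))
```

The library size grows as 20 to the power of the number of randomized positions, which is one reason why a high throughput screen is needed as soon as more than a few hot spots are combined.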

STRUCTURAL MODELING OF GTs
Difficulties with high-level expression, purification, and crystallization hamper crystal structure determination for GTs (Breton et al., 2006). Additionally, the ratio of loops to secondary structure elements is high in GTs. Most of these loops are highly flexible, which results in low electron density and thus limits the detailed description of the catalytic domains. GTs also show a donor-substrate-induced conformational change (open and closed active conformation), which mainly involves the flexible loops (Boix et al., 2001, 2002; Qasba et al., 2002; Ramakrishnan et al., 2002). The low degree of sequence similarity within most of the CAZy families renders molecular modeling difficult. Fold recognition as a theoretical approach, known as "threading" (Godzik, 2003), categorizes GTs, but it has had some limitations so far and needs experimental proof. Additionally, the weak scores in fold recognition of many GTs might indicate not yet identified novel folds, as recently shown when a fourth fold (GT-D) was proposed (Zhang et al., 2014). Nevertheless, multivariate sequence analysis in combination with fold recognition proved to be useful for predicting folds and mechanisms of Escherichia coli and Synechocystis GTs (Rosén et al., 2004). However, most models still have a low confidence index for flexible loops and the highly variable regions, which represents the major problem for modeling acceptor sites. Modeling can therefore only be applied when the target and the template have sufficient identity to allow docking of nucleotide sugars and acceptors (Heissigerová et al., 2003). Specificity toward the sugar donor and acceptor was shown to be determined by a few critical residues in the binding site (Meech et al., 2012; Naegeli et al., 2014). The flexible loops involved in the GT mechanism in particular have been the subject of molecular dynamics (MD) simulation studies (Persson et al., 2001; Ramakrishnan and Qasba, 2001; Gunasekaran and Nussinov, 2004; Šnajdrová et al., 2004). These results demonstrate the correlated motions of several loops as well as the importance of contacts between loops in the mechanism (Breton et al., 2006). Docking of substrates is a difficult task because of the presence of phosphate and a divalent cation as well as the flexibility of the nucleotide sugar, but appropriate energy parameters have been developed (Petrova et al., 1999), and recent reports show highly promising results that improve our understanding and prediction of the substrate specificity of bacterial GTs in particular (Zhang et al., 2014; Pandey et al., 2015; Zuegg et al., 2015).

FIGURE 1 | General flow-chart for the specific characterization of glycosyltransferases (GTs), based on optimized modeling and docking experiments in combination with final binding affinity estimation, to predict substrate specificity.

Frontiers in Microbiology | www.frontiersin.org

FUTURE PERSPECTIVES
Most computational methods make use of sequence-based comparisons for accurate prediction of substrate specificity. However, these approaches are rather limited due to the high sequence variability within the GT families. Sánchez-Rodríguez et al. (2014) used a sequence-based strategy combined with a network-based approach to infer the putative substrate classes of predicted GTs, thereby taking genomic organization into account. Because several GT structures have been determined in recent years (Gloster, 2014), structure-based approaches might be a promising alternative for substrate specificity prediction in the future. However, accurate prediction of ligand-protein binding affinity for a set of similar binders is a major challenge. In general, docking calculations alone perform unsatisfactorily in these settings. Docking calculations followed by MD simulations and free energy calculations can be applied to improve the predictions, keeping in mind that the glycosylation pattern of bacterial GTs is highly diverse and complex (e.g., rare sugars). The transferred sugar moiety might therefore not have been discovered yet (Figure 1).
A number of studies have shown that refining docking calculations by performing MD and free energy calculations starting from docked ligand positions can increase the accuracy of binding affinity predictions (Stjernschantz et al., 2006; Carlsson et al., 2008; Wünsch et al., 2012; Jiang et al., 2013). The improved accuracy of the simulations is mainly due to the increased level of molecular detail, using a flexible and explicitly solvated protein. To this end, one could try to identify binding pockets on the GT structures by docking calculations with the sugar molecules. One could then calculate and compare the binding affinities of different sugar molecules by means of MD simulations combined with free energy calculations, e.g., with the Molecular Mechanics Poisson-Boltzmann Surface Area method (MM-PBSA), to determine substrate specificity (Kollman et al., 2000). Additionally, further approaches such as HTS crystallization and characterization (Zhu et al., 2013) of bacterial GTs are necessary to enhance our knowledge of bacterial GTs, especially those involved in oligo- and polysaccharide biosynthesis. Further structural, modeling, and mutational studies are needed, and are under way, to advance the understanding of this highly attractive class of enzymes. These approaches, in combination with novel expression systems and sophisticated tools to analyze integral membrane GTs as well as carbohydrate analysis (Rühmann et al., 2014), will enable efficient utilization of this enzyme class to tailor natural products.
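The comparison of sugar donors by MM-PBSA can be sketched as an average over MD frames of the difference between the free energy of the complex and the free energies of the separated receptor and ligand. The following is a minimal sketch under stated assumptions: the per-frame energies (in kcal/mol) are invented for illustration, the entropy term is omitted, and the function names do not belong to any MD package; in practice these energies would come from a dedicated MM-PBSA tool applied to the trajectory.

```python
from statistics import mean


def mmpbsa_binding_energy(frames):
    """Average binding free energy estimate over MD frames.

    frames: iterable of (G_complex, G_receptor, G_ligand) tuples in kcal/mol.
    Each frame contributes G_complex - G_receptor - G_ligand.
    """
    return mean(g_c - g_r - g_l for g_c, g_r, g_l in frames)


def rank_donors(frames_by_donor):
    """Return donor names sorted from most to least favorable binding."""
    energies = {d: mmpbsa_binding_energy(f) for d, f in frames_by_donor.items()}
    return sorted(energies, key=energies.get), energies


# Hypothetical per-frame energies for two docked sugar donors.
frames_by_donor = {
    "UDP-glucose": [(-5230.0, -5000.0, -200.0), (-5228.0, -5001.0, -199.0)],
    "UDP-galactose": [(-5222.0, -5000.0, -201.0), (-5220.0, -4999.0, -200.0)],
}

ranking, energies = rank_donors(frames_by_donor)
for donor in ranking:
    print(f"{donor}: {energies[donor]:.1f} kcal/mol")
```

Because the estimates for similar binders often differ by only a few kcal/mol, such a ranking should be read together with the spread over frames and, where possible, with experimental activity data.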

AUTHOR CONTRIBUTIONS
JS, DH, and VS outlined the structure of the manuscript. JS and DH wrote the main parts of the manuscript. JS, NS, and NW did statistics on CAZy database and were involved in writing. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
This work was supported by the German Research Foundation (DFG) and the Technische Universität München within the funding programme Open Access Publishing.