Naturally Occurring and Engineered Enzymes for Peptide Ligation and Cyclization

The renaissance of peptides as prospective therapeutics has fostered the development of novel strategies for their synthesis and modification. In this context, besides the development of new chemical peptide ligation approaches, the use of enzymes as versatile tools has gained increasing attention. Owing to inherent properties such as excellent regio- and chemoselectivity, enzymes have become invaluable instruments in both academic and industrial laboratories. This mini-review focuses on natural and engineered peptide ligases that can form a new peptide (amide) bond between the C-terminal carboxy group and the N-terminal amino group of peptides and/or proteins. The pros and cons of several enzyme classes are summarized: Sortases, Asparaginyl Endoproteases, Trypsin-related enzymes and, as the central focus, subtilisin-derived variants. The most recent developments with regard to ligation and cyclization are highlighted.


INTRODUCTION
Due to the increasing length and complexity of peptide pharmaceuticals, there is a growing demand for their green and efficient production (Lau and Dunn, 2018). Established methods such as recombinant expression and solid phase peptide synthesis (SPPS) have several disadvantages, which drives the need for new ligation and modification technologies. With recombinant expression, it is difficult to incorporate (multiple) unnatural amino acids (AAs) or to include peptide modifications such as (fatty acid) acylation or (C-terminal) amidation, all of which are much more straightforward using SPPS. However, producing longer peptides by classical SPPS remains a challenge because the yield decreases with increasing peptide length. Each additional cycle generates more impurities, and consequently the purification of the final product becomes more demanding and increasingly costly. Therefore, several methods have been developed to ligate smaller peptide fragments, which can be produced in higher yield and purity. Many chemical ligation methods, such as native chemical ligation (Dawson et al., 1994; Rohde and Seitz, 2010; Conibear et al., 2018; Kulkarni et al., 2018; Agouridas et al., 2019), α-ketoacid-hydroxylamine ligation (Bode et al., 2006; Bode, 2012, 2015; Bode, 2017), Staudinger ligation (Maly et al., 2000; Nilsson et al., 2000; Köhn and Breinbauer, 2004), and serine/threonine ligation (Liu and Tam, 1994; Li et al., 2010; Zhang et al., 2013; Tung et al., 2015; Lee et al., 2016; Liu and Li, 2018), have become powerful tools in chemical biology, giving access to synthetic proteins through fragment ligation strategies. Besides chemical ligation, enzymatic ligation strategies have gained increased attention in recent years due to their inherent properties, such as excellent regio- and chemoselectivity and catalysis under mild conditions (Schmidt et al., 2017b; Nuijens and Schmidt, 2019).
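The length dependence of SPPS yield can be made concrete with a simple model. The sketch below assumes, purely for illustration, that every coupling cycle proceeds with the same stepwise yield, so that the theoretical fraction of full-length product is that yield raised to the number of cycles. The function name `crude_yield` and the default stepwise yield of 99% are assumptions and are not taken from the cited literature.

```python
def crude_yield(n_residues: int, stepwise_yield: float = 0.99) -> float:
    """Theoretical fraction of full-length product after linear SPPS.

    Idealized model: a uniform yield per coupling cycle and
    n_residues - 1 cycles (the first residue is preloaded on the resin).
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    if not 0.0 <= stepwise_yield <= 1.0:
        raise ValueError("stepwise_yield must be between 0 and 1")
    return stepwise_yield ** (n_residues - 1)


if __name__ == "__main__":
    # Compare short fragments with a long peptide under the same assumption.
    for length in (10, 20, 40, 80):
        print(f"{length:>3} residues: {crude_yield(length):.1%} full-length product")
```

Under this model the loss compounds with every added residue, which is one way to see why assembling a long peptide from shorter, purer fragments can be attractive.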
The enzymes used for enzymatic ligation mainly comprise proteases and engineered variants thereof, as well as transpeptidases. Even though proteases are very abundant in nature, few enzymes, namely ligases, have been found that naturally catalyze the reverse reaction, i.e., peptide bond formation. Triggered by this, researchers have started exploiting and engineering proteases to act as ligases (Jakubke, 1995). In this mini-review, we describe the currently available set of ligases and recent developments, both for intermolecular ligation and for intramolecular ligation (cyclization). We consider only enzymes that catalyze the formation of a native peptide bond. Roughly four classes of peptide ligases have been discovered to date, i.e., Sortases, Asparaginyl Endoproteases, Trypsin-related enzymes and subtilisin-derived variants. The main focus will be on the recent rise and applications of subtilisin-type enzymes.

SORTASES
In nature, Sortase A from Staphylococcus aureus catalyzes the covalent anchoring of surface proteins to the cell wall (Marraffini et al., 2006). It first cleaves an LPXTG recognition motif (X = any amino acid) between threonine and glycine and then couples the threonyl carboxylate to the N-terminal amino group of a pentaglycine peptide attached to peptidoglycan (Figure 1.1A). This transpeptidation reaction has been applied as a synthetic tool for peptide and protein conjugation (Schmidt et al., 2017b) as well as for peptide and protein (C-to-N, i.e., head-to-tail) cyclization (Antos et al., 2009; Wu et al., 2011; van't Hof et al., 2015). During catalysis, the motif R1-LPXT-G-R2 (R1, R2, R3 = proteins, synthetic peptides, solid supports or cells) is recognized and the Thr-Gly amide bond is cleaved by the active site thiol to form an acyl-enzyme (thioester) complex, with the glycine-containing fragment as leaving group. The thioester complex is resolved by nucleophilic attack of a peptide with an N-terminal glycine (G-R3), yielding the product R1-LPXT-G-R3, which is linked by a native peptide bond; see Figure 1.1C for ligation examples.
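The transpeptidation described above can be sketched at the level of one-letter sequences. The following is a minimal illustration only: the function name `sortase_ligate` and the example sequences are hypothetical, and the code models nothing but the sequence bookkeeping (no kinetics, hydrolysis or reversibility).

```python
import re

# LPXTG sorting motif; X is any of the 20 canonical amino acids.
SORTING_MOTIF = re.compile(r"LP[ACDEFGHIKLMNPQRSTVWY]TG")


def sortase_ligate(acyl_donor: str, nucleophile: str) -> str:
    """Return the product sequence of a string-level Sortase A ligation.

    The donor is cut between Thr and Gly of its first LPXTG motif; the
    Gly and everything after it form the leaving group. The nucleophile
    must start with Gly, which is joined to the Thr carboxylate.
    """
    match = SORTING_MOTIF.search(acyl_donor)
    if match is None:
        raise ValueError("acyl donor contains no LPXTG motif")
    if not nucleophile.startswith("G"):
        raise ValueError("nucleophile needs an N-terminal glycine")
    acyl_part = acyl_donor[: match.end() - 1]  # keep ...LPXT, drop G and the rest
    return acyl_part + nucleophile


if __name__ == "__main__":
    # Hypothetical donor with a C-terminal tag and a triglycine nucleophile.
    print(sortase_ligate("MKAAALPETGGHHHHHH", "GGGK"))
```

Note that the product again contains an LPXTG motif, which illustrates why the recognition sequence remains as a footprint in the ligation product.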
The main drawbacks of Sortases are the strict sequence requirements (the recognition motif remains present in the ligation product), the poor catalytic efficiency, and the reversibility of the reaction, which leads to low yields and product hydrolysis. To expand sortagging beyond the standard LPXT-G motif, Sortase homologs as well as engineered variants have been reported, although with limited success (Dorr et al., 2014; Antos et al., 2016; Nikghalb et al., 2018). Besides variants with an altered substrate scope, Sortase variants with increased thermal and chemical stability (Pelay-Gimeno et al., 2018) or activity (Beerli et al., 2015) have been described. Another way to circumvent the poor reaction kinetics is proximity-based Sortase-mediated ligation (PBSL), which enables ligation efficiencies of over 95%. For PBSL, the target protein and Sortase are linked using the SpyTag-SpyCatcher protein pair. Although the SpyTag is cleaved off after ligation and the target protein is released, this approach requires elaborate reaction engineering, and the SpyCatcher-modified, His6-tagged Sortase is required in equimolar amounts (Wang et al., 2017). Besides protein engineering, another successful strategy is reactant engineering that renders the transpeptidation reaction irreversible. One approach uses modified depsipeptide substrates that, upon transpeptidation, release non-reactive fragments, e.g., a non-reactive hydroxyacetate moiety (Williamson et al., 2012, 2014), or fragments that spontaneously form a diketopiperazine. In conclusion, when addition of the sorting sequence LPXTG to a peptide or protein does not interfere with its function, sortagging represents a powerful tool for site-selective bioconjugation (Figure 1.1C). Nevertheless, its broad application is still hampered by the low catalytic efficiency (large quantity of enzyme required), long reaction times, moderate yields and the high molar equivalents of one of the substrates needed to drive the equilibrium toward product.
Despite these shortcomings, sortagging has become a popular tool in chemical biology, mainly because the enzyme and its substrates are easily accessible.

ASPARAGINYL ENDOPROTEASES
A more recently discovered and promising alternative to Sortases is the application of asparaginyl endoproteases (AEPs) such as Butelase 1 (Nguyen et al., 2014; James et al., 2017; Jackson et al., 2018). Butelase 1, isolated from the tropical plant Clitoria ternatea, is an Asx-specific (Asx = Asn or Asp) cysteine transpeptidase that natively catalyzes peptide head-to-tail cyclization in the biosynthesis of cyclotides (Craik et al., 1999). As with Sortase, AEP enzymes cleave a recognition sequence, in this case N-HV or D-HV, to form a thioester acyl-enzyme intermediate that is resolved by nucleophilic attack of a peptide N-terminal amine (Figure 1.2A). A major advantage is the relatively short recognition sequence: the His-Val motif is cleaved off and only an Asx residue is left as a footprint at the ligation site.
Butelase 1 has a broad tolerance for the first (N-terminal) residue to be coupled (any AA except Pro, Asp, and Glu), but at the second position Ile, Leu, Val, or Cys is required (Nguyen et al., 2016a). Compared to Sortases, Butelase 1 features substantially higher catalytic efficiency (only ∼0.005 molar equivalents of enzyme required). The peptide substrates containing the Asx-His-Val motif can be easily prepared via straightforward SPPS or recombinant expression. Butelase 1 has been shown to efficiently promote intermolecular peptide ligation as well as head-to-tail macrocyclization of peptides of 10 residues or longer in nearly quantitative yields (Figure 1.2B) (Nguyen et al., 2014, 2015b, 2016b; Hemu et al., 2016). As in nature, it preferably catalyzes cyclization over hydrolysis. For example, kalata B1, GFP, and human growth hormone (somatropin) were cyclized with excellent efficiency (>95% yield) (Nguyen et al., 2015b). Furthermore, Tam and coworkers recently reported the first chemical synthesis of large circular bacteriocins such as the 70mers AS-48 and uberolysin (Hemu et al., 2016). Interestingly, Butelase 1 has the ability to cyclize peptides consisting almost exclusively of D-AAs, except for the C-terminal Asx residue (Nguyen et al., 2016a). Besides cyclization, Butelase 1 can be used for the modification of live bacterial cell surfaces (Bi et al., 2017), for the semi-synthesis of ubiquitin (Nguyen et al., 2015a), and to prepare large circular bacteriocins, the largest antimicrobial peptides known to date (Hemu et al., 2016). Other possibilities are the preparation of peptide dendrimers using lysine-derived scaffolds (Cao et al., 2016) or even the modification of proteins (Nguyen et al., 2015a). For example, Ploegh et al. described a one-pot dual labeling approach for the sequential modification of heterodimeric proteins such as antibodies, with different labels at the light and heavy chain, respectively, as well as an approach for the sequential C-to-C fusion of two proteins of interest (Harmand et al., 2018). Butelase 1 can also be applied in the synthesis of protein C-terminal thioesters, thus enabling tandem chemoenzymatic ligations (e.g., via NCL) (Liu et al., 2015). Butelase prefers intramolecular over intermolecular ligations; for the latter, a large excess of nucleophile is required and the N-terminus of the acyl donor should be protected or lie outside the Butelase substrate scope.
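The substrate rules summarized above lend themselves to a quick sequence check. The sketch below encodes only what is stated in the text: an acyl donor ending in Asx-His-Val, and an N-terminus whose first residue is not Pro, Asp or Glu and whose second residue is Ile, Leu, Val or Cys. The function names and the example peptide are hypothetical, and the check ignores everything else that may influence a real reaction.

```python
P1_PRIME_EXCLUDED = set("PDE")  # first residue: any AA except Pro, Asp, Glu
P2_PRIME_ALLOWED = set("ILVC")  # second residue: Ile, Leu, Val or Cys


def is_butelase_nucleophile(peptide: str) -> bool:
    """Check the two N-terminal positions against the stated preferences."""
    if len(peptide) < 2:
        return False
    return peptide[0] not in P1_PRIME_EXCLUDED and peptide[1] in P2_PRIME_ALLOWED


def butelase_cyclize(linear: str) -> str:
    """Return the residues of the head-to-tail cyclic product.

    The linear precursor must end in N-HV or D-HV. The HV dipeptide is
    released and the remaining Asx is joined to the N-terminal residue,
    so the returned string should be read as a ring.
    """
    if not linear.endswith(("NHV", "DHV")):
        raise ValueError("precursor must end in Asx-His-Val")
    core = linear[:-2]  # drop the HV leaving group, keep the Asx footprint
    if not is_butelase_nucleophile(core):
        raise ValueError("N-terminus is outside the stated substrate scope")
    return core


if __name__ == "__main__":
    # Hypothetical precursor: Gly-Ile N-terminus, Asn-His-Val C-terminus.
    print(butelase_cyclize("GIAKTWSRFLNHV"))
```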
Like the glycine released in Sortase reactions, the HV dipeptide cleaved off by Butelase acts as a nucleophile that competes with the substrate of interest, so a large excess of reactant is required. To overcome this limitation, thiodepsipeptide substrates have been used successfully to render the reaction irreversible (Nguyen et al., 2015a). However, this strategy involves the use of unstable thioester substrates and does not prevent hydrolysis of the product. Besides the drawbacks of sequence specificity (Asx footprint), hydrolytic activity and reversibility, Butelase 1 has to be isolated from plants, which limits its potential in biotechnological applications. So far, recombinant expression has not been successful, although this will probably be achieved in the near future. Recently, a markedly less active AEP, named OaAEP1, has been recombinantly expressed in Escherichia coli. Although titers were low (<2 mg/L), OaAEP1 has the advantage of being a fully characterized enzyme that is able to cyclize a diverse range of substrates, albeit ∼90 times more slowly than Butelase 1 (Harris et al., 2015; Yang et al., 2017). Later studies showed that the catalytic efficiency of native OaAEP1 could be improved through structure-based enzyme engineering (Yang et al., 2017); nevertheless, Butelase 1 is still most often the enzyme of choice.
Clearly, Butelase-type enzymes have some advantages over Sortases, such as a minimal recognition motif, a broader substrate scope and much higher catalytic activity (Figure 1.2C). However, poor accessibility of the enzymes has so far limited their application.

TRYPSIN RELATED ENZYMES
The use of native Trypsin and engineered variants for peptide synthesis has been known for decades (Nuijens et al., 2012). Recently, Bordusa et al. described an engineered Trypsin variant, termed Trypsiligase, which can be used for the N- and C-terminal modification of protein or peptide substrates (Liebscher et al., 2014b). Trypsiligase adopts an inactive, partially disordered, zymogen-like conformation and represents a striking example of substrate-activated catalysis, as it is active only in the presence of a YRH tripeptide motif and Zn2+ ions (Liebscher et al., 2014b). Trypsiligase can be used for the efficient labeling of proteins bearing an N-terminal RH motif, which proceeds via activated substrates (such as peptidyl 4-guanidinophenyl esters) as acyl donors (Figure 1.3A) (Meyer et al., 2016). C-terminal modification can be achieved by a transpeptidation reaction between a peptide or protein carrying the Y-RH recognition sequence and an RH-X (X = peptide, tag) nucleophilic acyl acceptor peptide (Figure 1.3B) (Liebscher et al., 2014a). The ligation reaction is usually complete within minutes and requires ∼0.1 molar equivalents of enzyme together with an excess of the corresponding acyl acceptor substrate (often 10 eq.), because the acceptor needs to compete with the RH leaving group.
The Y-RH recognition motif is rare and is found in only 0.5% of all known protein sequences (Liebscher et al., 2014b). Therefore, the use of Trypsiligase is restricted with regard to the synthesis of native peptides and proteins, similarly to Sortase A (Figure 1.3C). Another drawback is the presence of the Y-RH sequence in the ligation product (C-terminal protein modification), which leads to a reversible reaction and undesired hydrolysis.
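How rare a recognition motif is within a given sequence set can be estimated with a simple scan. The sketch below is an illustration only: the function name `fraction_with_motif` is hypothetical, the toy sequences are made up, and it assumes that rarity is expressed as the fraction of sequences containing the tripeptide at least once, which may differ from how the cited figure was derived.

```python
def fraction_with_motif(sequences, motif: str = "YRH") -> float:
    """Fraction of sequences that contain the motif at least once."""
    sequences = list(sequences)
    if not sequences:
        raise ValueError("no sequences given")
    hits = sum(1 for seq in sequences if motif in seq.upper())
    return hits / len(sequences)


if __name__ == "__main__":
    # Made-up toy sequences, not real proteins.
    toy_set = ["MKYRHAAL", "MAAAKLLG", "GGSGGSGG", "MSTYRKHL"]
    print(f"{fraction_with_motif(toy_set):.0%} of the toy sequences contain YRH")
```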

SUBTILISIN-DERIVED VARIANTS
Ligases from nature such as Sortase and Butelase rely on a cysteine residue in the active site that forms a thioester with the acyl-donor peptide. Over 50 years ago, the active site serine of a subtilisin protease was chemically converted to cysteine. Although this enzyme showed a higher ratio of acylation (ligase activity) to hydrolysis (protease activity), its overall activity was extremely low (Polgár and Bender, 1967). A few decades later, Wells et al. discovered that one additional mutation was required to reduce the steric crowding created by the thiol residue and thereby restore enzyme activity. This double mutant of a serine protease from Bacillus amyloliquefaciens, i.e., subtilisin BPN′, was termed Subtiligase (Braisted et al., 1997; Weeks and Wells, 2019). Although this mutant exhibits considerable ligase activity, it still lacks satisfactory efficacy, as a huge excess of the acyl acceptor fragment is required to suppress substantial amounts of hydrolysis. However, its ligase specificity has recently been engineered in a proteome-wide screening approach, enabling the N-terminal labeling of diverse proteins (Weeks and Wells, 2017). During the past 10 years there has been a revival of subtilisin-based peptide ligases, driven by the discovery of a novel, Ca2+-independent and stable variant termed Peptiligase (Toplak et al., 2016). This variant efficiently catalyzes peptide bond formation between a C-terminal ester fragment [preferably a carboxyamidomethyl ester (Nuijens et al., 2016b), Figure 1.4A] and an acyl-acceptor nucleophile with, in many cases, insignificant amounts of hydrolysis. Since the ester-to-amide conversion via a thioester intermediate (acyl-enzyme complex) is virtually irreversible, a quantitative yield is theoretically attainable with a one-to-one molar ratio of the substrates. Peptiligase has a very high catalytic efficiency (<0.0003 molar equivalents required) and the enzyme can be easily obtained from Bacillus subtilis (>0.5 g/L) (Pawlas et al., 2019).
The ligation reaction of unprotected peptide fragments proceeds in aqueous media (neutral to slightly basic pH) at ambient temperature with very high ligation yields (up to 98% in <1 h). Only a low molar excess of acyl acceptor (1.1-2 molar equivalents) is required (Schmidt et al., 2017b). Compared to other peptide ligases, Peptiligase is exceptionally thermostable (Tm = 66 °C) and tolerates the presence of organic co-solvents [e.g., up to 50% (v/v) dimethylformamide (DMF)] and denaturing agents (e.g., 2 M urea or guanidinium chloride), therefore also enabling the ligation of poorly soluble or folded peptides (Toplak et al., 2016).
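To put the enzyme loadings quoted in this review side by side, the following sketch converts molar equivalents into an absolute enzyme amount for a given reaction scale. The equivalents are the approximate figures given in the text (for Peptiligase an upper bound); the function name `enzyme_nmol`, the 100 µmol scale, and the assumption that equivalents are counted relative to the limiting substrate are illustrative choices. No figure is given for Sortase A, so it is omitted.

```python
# Approximate molar equivalents of enzyme quoted in the text.
ENZYME_EQUIVALENTS = {
    "Trypsiligase": 0.1,
    "Butelase 1": 0.005,
    "Peptiligase": 0.0003,  # stated as an upper bound (<0.0003)
}


def enzyme_nmol(substrate_umol: float, enzyme: str) -> float:
    """Enzyme amount in nmol for a substrate amount given in µmol.

    Assumes equivalents are relative to the limiting substrate.
    """
    if substrate_umol < 0:
        raise ValueError("substrate amount cannot be negative")
    return substrate_umol * 1000.0 * ENZYME_EQUIVALENTS[enzyme]


if __name__ == "__main__":
    scale_umol = 100.0  # hypothetical reaction scale
    for name in ENZYME_EQUIVALENTS:
        print(f"{name:<13} {enzyme_nmol(scale_umol, name):>10.1f} nmol")
```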
Peptiligase has six distinct substrate recognition pockets: four recognizing the C-terminal part of the acyl donor peptide (S1-S4), and two involved in binding the N-terminal part of the acyl acceptor peptide (S1′ and S2′). After its discovery, it was found that the S1′ pocket in particular was highly discriminating, only able to accommodate small AAs such as Gly, Ser, and Ala. However, using computational design and site-directed engineering, the substrate scope of this pocket could be radically broadened (Nuijens et al., 2016a). Several years of engineering focused on ligation efficiency and broad substrate scope resulted in the discovery of Omniligase-1 (Nuijens et al., 2016c). This enzyme provides an excellent basis for efficient and completely footprint-free inter- and intramolecular peptide ligation for almost any peptide sequence. For instance, it was shown that Omniligase-1 could be applied for the synthesis of the 39mer pharmaceutical peptide exenatide in excellent yield (Pawlas et al., 2019). Most importantly, it was later shown that the enzymatic ligation technology using Omniligase-1 is scalable and robust enough for industrial application (Nuijens et al., 2016b). Exenatide was prepared at >100 gram scale with a quantified ligation yield of 88% using crude fragments made by chemical synthesis. The overall yield proved almost twice as high as that of established solid phase production methods, and the product was obtained within pharmacopeia specifications. Besides exenatide, it was shown that ligation to proteins or polymers, such as human serum albumin or the polymer XTEN, is also possible. Using 4 equivalents of ester, over 95% N-terminal ligation efficiency, leading to products of >500 AAs, could be achieved (Nuijens, 2016). Finally, besides peptides and conjugates, Omniligase-1 has been applied for the head-to-tail cyclization of peptides.
Peptides of over 12 AAs, even when containing isopeptide bonds, polyethylene glycol or D-AAs in the sequence, were cyclized with over 95% efficiency (Schmidt et al., 2017a). In the same article, the one-pot synthesis and folding of the naturally occurring cyclotide MCoTI-II at multi-gram scale was described, as well as the combination of Omniligase-1 catalyzed cyclization with chemical rigidification using tris(bromomethyl)benzene. Later, other disulfide-rich peptides such as kalata B1 and RTD-1 were synthesized and successfully folded to their native conformations. It was shown that, due to the broad substrate scope and traceless ligation, different sites could be used to synthesize the cyclic peptides. Most recently, the cyclization technology was combined with small organic scaffolds and other ligation technologies such as oxime ligation and click chemistry (Richelle et al., 2018; Streefkerk et al., 2019). By combining enzymatic and chemical ligation technologies, tetracyclic peptides possessing two distinct biological activities could be synthesized in a one-pot fashion.
In addition to Peptiligase variants with a broad substrate scope such as Omniligase-1, enzyme engineering efforts also yielded Peptiligase variants with redesigned substrate profiles that allow selective peptide couplings, for instance variants that can discriminate between small and large side chains, between hydrophobic and polar residues, or between negative and positive charge. One example is the development of a Peptiligase variant for the synthesis of Thymosin-alpha-1, termed Thymoligase. This enzyme has a preference for a positively charged AA in P1 (Lys or Arg) and a negatively charged AA in P1′ (Asp or Glu). Two crude 14-mer peptides could be ligated with high efficiency to make the 28-mer product, which could be isolated in >98% purity after a single preparative HPLC step. Besides peptides, the substrate-specific ligases could also be used for the selective coupling to heterodimeric proteins, such as the heavy/light chain of antibodies or the A- and B-chain of insulin (Nuijens, 2016). Antibodies with two different tags could be prepared with almost quantitative ligation efficiency and heavy vs. light chain selectivity. A summary of possible peptide ligation and cyclization reactions is illustrated in Figure 1.4B.
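The P1/P1′ preference described for Thymoligase can be turned into a simple search for candidate ligation junctions. The sketch below is an illustration under stated assumptions: the thymosin-alpha-1 sequence is supplied from general knowledge rather than from this review (and without its N-terminal acetyl group), the function names are hypothetical, and only the two stated positions are checked.

```python
# Thymosin-alpha-1 in one-letter code; supplied for illustration from
# general knowledge, not given in the text above.
THYMOSIN_ALPHA_1 = "SDAAVDTSSEITTKDLKEKKEVVEEAEN"

P1_ALLOWED = set("KR")        # positively charged residue at P1
P1_PRIME_ALLOWED = set("DE")  # negatively charged residue at P1'


def candidate_junctions(sequence: str) -> list:
    """Return 1-based positions of P1 residues where the P1/P1' rule holds."""
    return [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] in P1_ALLOWED and sequence[i + 1] in P1_PRIME_ALLOWED
    ]


def split_at(sequence: str, p1_position: int) -> tuple:
    """Split into an acyl-donor fragment (ends at P1) and an acyl-acceptor
    fragment (starts at P1')."""
    if not 1 <= p1_position < len(sequence):
        raise ValueError("p1_position must lie inside the sequence")
    return sequence[:p1_position], sequence[p1_position:]


if __name__ == "__main__":
    for pos in candidate_junctions(THYMOSIN_ALPHA_1):
        donor, acceptor = split_at(THYMOSIN_ALPHA_1, pos)
        print(pos, len(donor), len(acceptor))
```

Under these assumptions, one of the candidate junctions divides the sequence into two 14-residue fragments, which is consistent with the fragment sizes mentioned above; which junction was actually used is not stated here.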
In conclusion, both Peptiligase and Subtiligase variants represent valuable tools for peptide-peptide ligation as well as for the site-specific modification of proteins (Figure 1.4C) (Schmidt, 2019). In particular, Peptiligase variants such as Omniligase-1 have the potential to become established as the preferred method for the cost-efficient and environmentally friendly synthesis of long (pharmaceutical) peptides and protein conjugates. Peptiligase-mediated coupling is scalable and can be used either as a versatile stand-alone technology or in addition to chemical ligation methodologies (e.g., NCL) or intein-based protein ligation, in both academic research labs and industrial settings.

RECOMMENDATIONS
Clearly, a diverse set of ligases is available for peptide ligation and cyclization, all with their own specific advantages and disadvantages. When an enzyme recognition motif in the ligation product is not an issue and one of the reactants can be used in high excess, Sortase-mediated ligation is the most straightforward. The enzyme is efficient and easily accessible to any laboratory. For peptide cyclizations, Butelase is one of the most efficient enzymes known, although its application can be challenging because the enzyme is hard to obtain. It is less efficient for intermolecular ligations, since a large excess of one of the reactants is required and protecting groups might be needed. For protein labeling, Trypsiligase is a highly selective ligase with a very specific recognition motif. Because of its similarities to Sortase (recognition motif, transpeptidation, excess of one reactant), Sortase is more often applied, simply because it is commercially available. For traceless ligation and cyclization of peptides, Peptiligases are the best option. There is no need for a large excess of the reagents and the enzymes are easy to produce. However, an active ester starting material is required, which is relatively easy to prepare for peptides but not straightforward for proteins.

AUTHOR CONTRIBUTIONS
All authors contributed to the writing of this mini review.

ACKNOWLEDGMENTS
Parts of the thesis: Enzymatic tools for peptide ligation and cyclization, Development and applications by MS are included in this article.